
Most AI Engagements Fail Because Nobody Set Expectations
Here's a pattern we see constantly: a company hires an AI consultant, spends 90 days in meetings and exploration, and walks away with a slide deck and no working software. The consultant calls it a "strategic assessment." The client calls it a waste of $75,000.
We run AI consulting engagements differently. By day 90, you should have at least one AI capability running in production — not a prototype or a demo, but a real tool delivering measurable value. Here's exactly how that process works.
Days 1–14: Discovery and Data Audit
The first two weeks are about understanding what you actually have to work with. AI is only as good as the data and workflows behind it, and most companies overestimate their data readiness.
Week 1: Business Discovery
- Stakeholder interviews — We talk to the people doing the work, not just the executives who approved the budget. The best AI opportunities come from frontline frustrations.
- Process mapping — Document the workflows that are candidates for AI augmentation. We're looking for tasks that are repetitive, high-volume, and follow identifiable patterns.
- Impact scoring — Rank opportunities by business value, feasibility, and time to value. A quick win that saves 10 hours per week beats a transformative project that takes 18 months.
Week 2: Data Audit
- Data inventory — What data do you have, where does it live, and how clean is it? We've walked into companies that thought they had 5 years of structured customer data and actually had 5 years of inconsistent spreadsheets.
- Quality assessment — Missing values, duplicate records, format inconsistencies. We score data readiness on a 1–5 scale across completeness, accuracy, consistency, and accessibility.
- Gap analysis — What data would you need to collect to enable the highest-value AI use cases? Sometimes the answer is "start logging this one field" and you're 80% there.
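The 1–5 readiness scoring from the data audit can be sketched as a simple rubric. The dimension names match the list above; the equal weighting and the example scores are illustrative assumptions, not our actual scoring model.

```python
# Illustrative sketch of the 1-5 data readiness scoring described above.
# Dimension names match the audit; equal weighting is a hypothetical choice.

DIMENSIONS = ("completeness", "accuracy", "consistency", "accessibility")

def readiness_score(scores: dict[str, int]) -> float:
    """Average the four 1-5 dimension scores into an overall rating."""
    for dim in DIMENSIONS:
        value = scores[dim]
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} score must be 1-5, got {value}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Example: accurate, mostly complete data that lives in scattered spreadsheets.
audit = {"completeness": 4, "accuracy": 4, "consistency": 2, "accessibility": 2}
print(readiness_score(audit))  # 3.0
```

A dataset like this usually means the model work is easy and the plumbing is the project.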
Deliverable by day 14: A prioritized roadmap of 3–5 AI opportunities with feasibility scores, estimated ROI, and data readiness ratings.
Days 15–30: Architecture and First Build
This is where most consulting engagements stall. Ours accelerate.
Selecting the Right Approach
Not every problem needs a custom model. Our decision framework:
- Off-the-shelf API (OpenAI, Anthropic, Google) — Best for text generation, summarization, classification, and code generation. Fastest time to value, lowest cost.
- Fine-tuned model — When you need domain-specific accuracy that base models can't deliver. Common for specialized classification, industry jargon, or proprietary workflows.
- Custom model — Rare. Only when your problem is genuinely novel and no existing architecture fits. We talk most clients out of this.
- Agentic workflows — When you need AI to chain multiple steps, make decisions, and interact with external systems. This is where we're seeing the most ROI in 2026.
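The decision framework above can be encoded as a rough triage function. The rules here are a deliberate simplification for illustration — real engagements also weigh cost, risk, and data readiness.

```python
# Toy encoding of the approach-selection framework above. The decision
# order is a simplification, not an exhaustive checklist.

def select_approach(*, multi_step: bool, domain_specific: bool,
                    novel_architecture: bool) -> str:
    """Pick a starting point; real engagements weigh cost and risk too."""
    if novel_architecture:
        return "custom model"        # rare; we talk most clients out of this
    if multi_step:
        return "agentic workflow"    # chained steps, decisions, external systems
    if domain_specific:
        return "fine-tuned model"    # base models miss the jargon
    return "off-the-shelf API"       # fastest time to value, lowest cost

print(select_approach(multi_step=False, domain_specific=False,
                      novel_architecture=False))  # off-the-shelf API
```

Note the ordering: the cheapest viable option wins by default, and you only move down the list when a concrete requirement forces you to.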
Building the First Solution
We pick the highest-impact, lowest-risk opportunity from the roadmap and build it. Not a proof of concept — a production-grade solution with:
- Error handling and fallback logic
- Logging and monitoring
- Cost tracking (LLM API costs can surprise you — we've seen teams accidentally spend $10,000/month on poorly optimized prompts)
- Human-in-the-loop checkpoints where accuracy matters
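The checklist above can be sketched as a thin wrapper around the model call: retries with a fallback path, structured logging, per-call cost tracking, and a human-review escape hatch. `call_model` and the token price are stand-ins, not a real provider SDK.

```python
# Sketch of the production checklist above. `call_model` and PRICE_PER_1K_TOKENS
# are hypothetical stand-ins for a real LLM client and its pricing.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-feature")

PRICE_PER_1K_TOKENS = 0.01  # illustrative blended rate; track yours per model

def call_model(prompt: str) -> tuple[str, int, float]:
    """Stand-in for an LLM API call: returns (text, tokens_used, confidence)."""
    return f"summary of: {prompt[:20]}", len(prompt.split()) * 2, 0.92

def run_feature(prompt: str, *, confidence_floor: float = 0.8,
                max_retries: int = 2) -> dict:
    total_cost = 0.0
    for attempt in range(1, max_retries + 1):
        try:
            text, tokens, confidence = call_model(prompt)
            total_cost += tokens / 1000 * PRICE_PER_1K_TOKENS
            log.info("attempt=%d tokens=%d cost=$%.4f", attempt, tokens, total_cost)
            if confidence < confidence_floor:
                # Human-in-the-loop checkpoint: low-confidence output goes to review.
                return {"output": text, "cost": total_cost, "needs_review": True}
            return {"output": text, "cost": total_cost, "needs_review": False}
        except Exception:
            log.exception("model call failed on attempt %d", attempt)
    # Fallback logic: retries exhausted, escalate rather than fail silently.
    return {"output": None, "cost": total_cost, "needs_review": True}

result = run_feature("Summarize the Q3 invoice backlog for the finance team")
print(result["needs_review"])  # False
```

The cost accumulator is the piece teams most often skip — and it's the one that catches a poorly optimized prompt before the monthly invoice does.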
Deliverable by day 30: First AI feature deployed to a staging environment with real data flowing through it.
Days 31–60: Iteration, Testing, and Production
Measuring What Matters
We set up tracking for three categories of metrics:
- Accuracy metrics — How often does the AI get it right? For classification tasks, we target 90%+ accuracy before production deployment. For generative tasks, we use human evaluation scores.
- Business metrics — Time saved, cost reduced, revenue influenced. These are the numbers that justify the engagement.
- Operational metrics — Latency, error rates, API costs, throughput. These determine whether the solution scales.
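A minimal version of that tracking is a per-call record covering all three categories, rolled up into a summary. The field names mirror the lists above; storage and dashboards are left out for brevity, and the sample numbers are invented.

```python
# Minimal sketch of tracking the three metric categories above. Sample
# records and numbers are invented for illustration.
from dataclasses import dataclass

@dataclass
class CallRecord:
    correct: bool         # accuracy: did a human or label agree with the output?
    latency_ms: float     # operational
    cost_usd: float       # operational
    minutes_saved: float  # business: time saved vs. the manual path

def summarize(records: list[CallRecord]) -> dict:
    n = len(records)
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "avg_latency_ms": sum(r.latency_ms for r in records) / n,
        "total_cost_usd": sum(r.cost_usd for r in records),
        "hours_saved": sum(r.minutes_saved for r in records) / 60,
    }

records = [CallRecord(True, 420, 0.002, 6), CallRecord(True, 380, 0.002, 6),
           CallRecord(False, 510, 0.003, 0), CallRecord(True, 400, 0.002, 6)]
print(summarize(records)["accuracy"])  # 0.75
```

A 75% accuracy like this example would fail our 90% bar for classification — which is exactly the kind of signal you want before production, not after.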
The Refinement Cycle
Weeks 5–8 follow a tight iteration loop:
- Deploy to a subset of users or workflows
- Collect feedback and measure accuracy
- Refine prompts, adjust thresholds, improve preprocessing
- Expand to more users
- Repeat
This is where the real learning happens. Edge cases surface, prompt engineering gets refined, and you discover the 20% of inputs that cause 80% of the errors.
Common Adjustments We Make
- Prompt restructuring — The difference between a 75% accurate prompt and a 95% accurate prompt is usually in the system instructions and few-shot examples, not the model itself.
- Chunking strategy — For document processing, how you split text dramatically affects quality. We typically test 3–4 chunking approaches.
- Temperature tuning — Lower temperatures (0.1–0.3) for factual tasks, higher (0.7–0.9) for creative tasks. Most teams default to 0.7 for everything and wonder why their extraction pipeline hallucinates.
- Guardrails — Output validation, content filtering, and confidence thresholds. If the model isn't confident, escalate to a human.
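To make the chunking point concrete, here's a sketch of two of the approaches we might compare: fixed-size character windows with overlap versus paragraph-boundary splits. The sizes are illustrative defaults, not recommendations — in practice we tune them against retrieval quality.

```python
# Two chunking approaches to compare for document processing. Sizes are
# illustrative; tune them against retrieval quality on your own documents.

def fixed_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Sliding character windows; simple, but can cut sentences mid-thought."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def paragraph_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank lines, then merge short paragraphs up to max_chars."""
    chunks: list[str] = []
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        if chunks and len(chunks[-1]) + len(para) + 1 <= max_chars:
            chunks[-1] += "\n" + para
        else:
            chunks.append(para)
    return chunks

doc = "First paragraph about invoices.\n\nSecond paragraph about approvals."
print(len(paragraph_chunks(doc)))  # 1  (both paragraphs fit within max_chars)
```

Paragraph-aware splitting usually wins for prose-heavy documents because it keeps complete thoughts together; fixed windows are the fallback when the source has no usable structure.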
Deliverable by day 60: First AI feature live in production with measurable business impact.
Days 61–90: Scale and Knowledge Transfer
Expanding the Scope
With one solution in production, we move to the second and third items on the roadmap. The infrastructure is in place, the team understands the process, and iteration is faster.
Typical second-phase projects we see:
- Internal knowledge bases — AI-powered search across company documents, Slack history, and support tickets
- Customer-facing features — Intelligent recommendations, natural language interfaces, automated responses
- Process automation — Invoice processing, data entry, report generation, quality checks
Knowledge Transfer
An AI engagement whose value walks out the door with the consultants has failed. We spend the final weeks ensuring your team can maintain and extend what we've built:
- Documentation — Architecture decisions, prompt libraries, runbooks for common issues
- Training sessions — Hands-on workshops for your engineering team covering prompt engineering, monitoring, and troubleshooting
- Handoff package — Full source code, infrastructure-as-code templates, cost projections, and a recommended roadmap for the next 6 months
Deliverable by day 90: Two to three AI features in production, comprehensive documentation, and a trained internal team.
What This Actually Costs
Let's be transparent about pricing. A 90-day AI consulting engagement with a firm like ours typically runs:
- Discovery and strategy only: $25,000–$40,000 (if you just need the roadmap)
- Full engagement with production delivery: $75,000–$150,000 (depending on complexity and number of solutions)
- Enterprise scale (multiple teams, custom models): $150,000–$300,000+
The ROI threshold we target: your AI solutions should save or generate at least 3x the engagement cost within the first 12 months. If we can't project that return during discovery, we'll tell you.
Red Flags to Watch For
Not all AI consultants operate this way. Watch out for:
- No working software by day 60. If you're still in "assessment" phase after two months, you're being milked.
- Vague deliverables. "AI strategy framework" is not a deliverable. Running code is a deliverable.
- Model obsession. If your consultant spends more time debating GPT-4 vs. Claude vs. Gemini than understanding your business problem, they're optimizing the wrong thing.
- No cost projections. AI has ongoing costs — LLM APIs, compute, monitoring. If your consultant hasn't modeled these, they haven't planned for production.
- No knowledge transfer plan. You should be less dependent on the consultant at day 90 than at day 1, not more.
The Bottom Line
A good AI consulting engagement is a sprint, not a study group. You should walk into day 1 expecting that by day 90, you'll have real AI capabilities running in production, a team that understands how to maintain them, and a clear roadmap for what comes next.
We've run these engagements for companies ranging from 10-person startups to enterprise teams with hundreds of employees. The timeline is consistent — what changes is scope, not speed. The companies that get the most value are the ones that show up ready to make decisions, provide access to data quickly, and commit internal resources to work alongside us.