Building an AI-Powered App: What the Discovery Phase Really Looks Like

By Chris Boyd
Most AI Projects Fail Before a Single Line of Code Is Written

Gartner reported in 2025 that over 50% of AI projects never make it to production. Having built AI-powered applications for clients since the early days of GPT integration, we can tell you the pattern: it's almost never a technology problem. It's a discovery problem. Teams skip the hard questions, jump straight to building, and end up with something that's technically impressive and practically useless.

The discovery phase for an AI app is different from traditional app discovery. There are additional unknowns, different risks, and decisions that are expensive to change later. Here's exactly what that phase looks like when it's done right.

Week 1: Problem Definition and Data Audit

The first week is about getting brutally honest about what you're trying to solve and whether AI is actually the right tool.

Defining the problem (not the solution)

Most clients come to us with a solution in mind: "We want an AI chatbot" or "We need a machine learning model for predictions." We start by backing up to the problem.

  • What decision are you trying to make faster or better?
  • What does a human currently do to solve this, and how long does it take?
  • What happens when the human gets it wrong?
  • How would you measure success — what number needs to change?

These questions matter because they determine whether you need a fine-tuned model, an off-the-shelf API, a retrieval-augmented generation (RAG) system, or maybe not AI at all. We've had discovery sessions that ended with a recommendation to build a rules engine instead of an AI system — and saved the client six figures.

The data audit

AI is only as good as the data feeding it. During week one, we inventory:

  • What data exists? Databases, documents, spreadsheets, emails, chat logs
  • What format is it in? Structured (database tables, CSVs) or unstructured (PDFs, free text, images)
  • How much do you have? For fine-tuning, you typically need thousands of examples. For RAG, you need a comprehensive knowledge base.
  • How clean is it? Duplicate records, inconsistent formatting, missing fields — we assess quality honestly
  • Who owns it? Privacy regulations, licensing restrictions, and user consent all affect what you can use

The data audit is often the most eye-opening part of discovery. Clients who thought they had plenty of training data discover it's fragmented across six systems. Others who assumed their data was too messy find it's actually quite usable with the right preprocessing.
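The audit checks above can be sketched in a few lines of code. Here's a rough, simplified version of the kind of script we run early on — the field names are illustrative, and a real audit also covers licensing and consent, which no script can check for you:

```python
from collections import Counter

def audit_records(rows, key_field, required_fields):
    """Rough data-quality audit: count duplicate keys and per-field fill rates."""
    keys = Counter(r.get(key_field, "") for r in rows)
    duplicates = sum(n - 1 for n in keys.values() if n > 1)
    missing = Counter()
    for r in rows:
        for f in required_fields:
            if not (r.get(f) or "").strip():
                missing[f] += 1
    total = len(rows)
    return {
        "total": total,
        "duplicate_keys": duplicates,
        "fill_rate": {f: 1 - missing[f] / total for f in required_fields},
    }

# Illustrative records: one duplicate email, one missing name
rows = [
    {"email": "a@example.com", "name": "Ana"},
    {"email": "a@example.com", "name": ""},
    {"email": "b@example.com", "name": "Ben"},
]
report = audit_records(rows, key_field="email", required_fields=["email", "name"])
```

Even a crude report like this turns "our data is probably fine" into a number you can plan around.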

Week 2: Architecture and Model Selection

With a clear problem and data inventory, we move to the technical architecture.

Build vs. buy vs. combine

The AI landscape in 2026 gives you three broad options:

  1. Use an API (OpenAI, Anthropic, Google) — fastest to market, lowest upfront cost, but you're dependent on a third party's pricing, performance, and availability
  2. Fine-tune an existing model — higher cost ($5,000-50,000+ for training), but better accuracy on your specific use case and more control
  3. Build a custom model — only justified when you have proprietary data that creates a competitive moat and off-the-shelf models genuinely can't handle your domain

For 80% of the projects we scope, option 1 (API) or a hybrid of options 1 and 2 is the right call. We've seen too many companies burn $200,000 on custom model training when a well-designed prompt chain using Claude or GPT-5.4 would have delivered 90% of the accuracy at 5% of the cost.
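To make "prompt chain" concrete, here's a minimal sketch of the pattern: one call classifies the input, and a second call uses that classification to pick a tailored prompt. The `llm` callable and the stub below are placeholders, not any vendor's actual SDK — in production you'd wrap a real API client:

```python
def classify_then_summarize(document: str, llm) -> dict:
    """Two-step prompt chain. `llm` is any callable str -> str, e.g. a thin
    wrapper around a vendor API client (placeholder, not a real SDK call)."""
    # Step 1: classify, so step 2 can use a category-specific prompt
    category = llm(
        f"Classify this document as invoice, contract, or other:\n{document}"
    ).strip().lower()
    # Step 2: summarize with instructions tailored to the category
    summary = llm(f"Summarize this {category} in three bullet points:\n{document}")
    return {"category": category, "summary": summary}

# Stub model for illustration only
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Classify"):
        return "invoice"
    return "- total due\n- due date\n- payee"

result = classify_then_summarize("Invoice #1234: $500 due March 1", fake_llm)
```

Because each step is a plain function of text in, text out, you can swap providers or models later without rearchitecting.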

Key architecture decisions

During week two, we nail down:

  • RAG vs. fine-tuning vs. prompt engineering — each has different cost, accuracy, and latency profiles
  • Embedding strategy — how your data gets vectorized and searched (we typically use pgvector with PostgreSQL or a dedicated vector store for larger datasets)
  • Guardrails and safety — what the AI should never do, content filtering, output validation
  • Fallback behavior — what happens when the AI's confidence is low or the service is unavailable
  • Latency requirements — a real-time chat feature has very different needs than a batch processing pipeline
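Guardrails and fallback behavior usually end up as a wrapper around every model call. Here's a simplified sketch of the shape, assuming a generic `llm` callable — the validation and fallback logic are where the product-specific decisions from this week get encoded:

```python
import time

def guarded_call(llm, prompt, validate, fallback, timeout_s=5.0, retries=1):
    """Wrap a model call with output validation and a deterministic fallback.
    `validate` returns True for acceptable output; `fallback` is whatever the
    product does when the model fails (canned answer, human handoff, etc.)."""
    for _ in range(retries + 1):
        start = time.monotonic()
        try:
            out = llm(prompt)
        except Exception:
            continue  # transient API error: retry
        if time.monotonic() - start <= timeout_s and validate(out):
            return out
    return fallback

# Illustrative use: require a numeric answer, fall back to a safe message
answer = guarded_call(
    llm=lambda p: "42",
    prompt="What is 6 x 7?",
    validate=lambda s: s.strip().isdigit(),
    fallback="Sorry, I can't answer that right now.",
)
```

Deciding what `validate` and `fallback` do for each feature is exactly the kind of question that's cheap to answer in week two and expensive to retrofit later.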

Cost modeling

We build a detailed cost model during discovery because AI infrastructure costs are notoriously hard to predict without doing the math.

Example from a recent project — a document analysis tool processing 500 documents per day:

  • API costs (Claude Sonnet, ~4,000 tokens per document): ~$900/month
  • Embedding generation and storage: ~$150/month
  • Vector database hosting: ~$200/month
  • Application hosting: ~$300/month
  • Total infrastructure: ~$1,550/month

That same project using a fine-tuned model would have cost $35,000 upfront for training plus ~$600/month in inference — cheaper long-term but a higher barrier to entry and less flexibility to iterate.
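The arithmetic behind those numbers is simple enough to put in a spreadsheet or a few lines of code. The sketch below backfits a blended per-token rate from the ~$900/month figure above (roughly $15 per million tokens, input and output combined) — that rate is an assumption derived from this example, not a published price sheet:

```python
# Assumed figures from the example above
DOCS_PER_DAY = 500
TOKENS_PER_DOC = 4_000
BLENDED_RATE_PER_M = 15.00  # USD per 1M tokens, input+output (assumption)

monthly_tokens = DOCS_PER_DAY * 30 * TOKENS_PER_DOC          # 60M tokens/month
api_cost = monthly_tokens / 1_000_000 * BLENDED_RATE_PER_M   # ~$900/month

# Break-even vs. the fine-tuned option: $35,000 upfront, but running costs
# drop from ~$1,550/month to ~$600/month
monthly_savings = 1_550 - 600
break_even_months = 35_000 / monthly_savings                 # ~37 months
```

A three-year break-even is a long time in a field where model prices have been falling — which is part of why we lean toward the API option by default.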

Week 3: Proof of Concept

This is where discovery for AI apps diverges most from traditional app discovery. Before we commit to a full build timeline, we run a focused proof of concept.

What the PoC tests

The PoC isn't a demo. It's a technical validation that answers three questions:

  1. Does the AI produce acceptable output quality? We test against a curated set of 50-100 real examples and measure accuracy, relevance, and consistency.
  2. Is the latency acceptable for the user experience? An AI feature that takes 45 seconds to respond won't work for a real-time interface.
  3. Do the costs align with the business model? If each AI interaction costs $0.50 and users average 20 interactions per session, that's $10/user/session — do the economics work?

We typically build the PoC in 3-5 days using the actual data and the proposed architecture. It's intentionally scrappy — no UI polish, no user auth, no production infrastructure. Just the core AI pipeline running against real inputs.
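The evaluation harness at the heart of a PoC can be very small. Here's a simplified sketch: run the pipeline over a labeled test set and report accuracy and p95 latency (the toy pipeline below is just a lookup table — a real PoC runs the actual model against real data):

```python
import time

def evaluate_poc(pipeline, test_cases):
    """Run `pipeline` (input -> output) over [(input, expected), ...] pairs;
    report exact-match accuracy and p95 latency in seconds."""
    correct, latencies = 0, []
    for inp, expected in test_cases:
        start = time.monotonic()
        out = pipeline(inp)
        latencies.append(time.monotonic() - start)
        correct += (out == expected)
    latencies.sort()
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return {"accuracy": correct / len(test_cases), "p95_latency_s": p95}

# Toy stand-in for the real pipeline, with one deliberate wrong answer
answers = {"2+2": "4", "3+3": "6", "5+5": "10"}
cases = [("2+2", "4"), ("3+3", "6"), ("5+5", "11")]
report = evaluate_poc(lambda q: answers[q], cases)
```

Fifty to a hundred curated examples with known-good answers is usually enough to separate "promising" from "not viable" — exact-match scoring is the simplest case; fuzzier tasks need a rubric or a second model as grader.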

Success criteria

Before we start the PoC, we agree on specific thresholds:

  • Accuracy target (e.g., "correctly classifies 85%+ of test cases")
  • Latency target (e.g., "responds in under 5 seconds for 95% of queries")
  • Cost per interaction target (e.g., "under $0.15 per AI call at projected volume")
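Writing those thresholds down as an explicit go/no-go gate keeps the decision honest. A minimal sketch, using the example thresholds above:

```python
def poc_passes(metrics, targets):
    """Go/no-go gate: every measured metric must meet its agreed threshold.
    Returns (overall_pass, per-check detail) so failures are attributable."""
    checks = {
        "accuracy": metrics["accuracy"] >= targets["min_accuracy"],
        "latency": metrics["p95_latency_s"] <= targets["max_p95_latency_s"],
        "cost": metrics["cost_per_call"] <= targets["max_cost_per_call"],
    }
    return all(checks.values()), checks

# Thresholds from the bullets above; measured values are illustrative
targets = {"min_accuracy": 0.85, "max_p95_latency_s": 5.0, "max_cost_per_call": 0.15}
ok, detail = poc_passes(
    {"accuracy": 0.88, "p95_latency_s": 3.2, "cost_per_call": 0.11}, targets
)
```

The point of agreeing on the gate before the PoC runs is that nobody gets to move the goalposts after seeing the results.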

If the PoC doesn't hit these targets, we don't just push forward and hope — we discuss whether to adjust the approach, adjust expectations, or pause the project. Killing a $5,000 PoC is infinitely better than killing a $150,000 build.

Week 4: Scope, Timeline, and Quote

With the PoC results in hand, we have the confidence to scope the full project accurately.

What the final discovery deliverable includes

  • Technical architecture document — system diagram, data flow, model selection rationale, infrastructure plan
  • Feature specification — every feature with acceptance criteria, AI-specific behavior definitions, and edge case handling
  • Risk register — what could go wrong (model degradation, API deprecation, data quality issues) and mitigation strategies
  • Fixed-price quote — total build cost with a clear scope boundary
  • Timeline — typically 8-16 weeks for an AI-powered MVP, depending on complexity
  • Running cost estimate — monthly infrastructure and API costs at launch, at 10x users, and at 50x users

What Discovery Costs (And Why It's Worth It)

Our AI discovery phase runs $5,000-12,000 depending on complexity. That includes the data audit, architecture design, PoC build, and complete scoping documentation.

Is it worth it? Consider the alternative: skipping discovery and committing to a $100,000+ build based on assumptions. If the model doesn't perform well enough, if the data isn't suitable, or if the unit economics don't work — you find out after you've spent the money, not before.

Every AI project we've shipped successfully started with discovery. Every AI horror story we've been hired to rescue skipped it.

Starting the Conversation

If you're considering an AI-powered application, the best thing you can do is start with clarity about the problem you're solving and honesty about the data you have. The technology is genuinely powerful in 2026 — the gap between success and failure isn't technical capability, it's disciplined planning.

Bring us a problem, not a solution. We'll figure out the right approach together.

Ready to get started?

Book a Consultation