
How to scope an AI build engagement

TL;DR

A good AI scope has six components: the workflow target, the success metric, a data inventory, the integration points, the constraints, and a timeline anchor. Most failed AI projects were under-scoped on at least one of these before the contract was signed. This guide defines each component and shows what a good scope and a bad scope look like.

The single biggest predictor of whether an AI build succeeds is the quality of the scope written before it starts. Vague scope is not a paperwork problem — it is the mechanism by which budgets double and timelines slip.

This guide breaks scope into six components and contrasts a good scope with a bad one, so you can pressure-test a scope before committing.

Why scope decides the outcome

AI work carries more inherent uncertainty than ordinary software, so the cost of loose scope compounds faster. When the scope is 'build an AI assistant', every party fills the gaps with a different assumption, and those gaps become disputes the moment the bill arrives.

Good scope does not eliminate uncertainty; it locates it. It says exactly what is being built, how success is measured, and what is explicitly out of bounds — so the unknowns that remain are small and named.

The six components

1) Workflow target: the specific operation being changed, not a capability.
2) Success metric: a measurable definition of done.
3) Data inventory: what data exists, where, and in what state.
4) Integration points: the exact systems to connect to.
5) Constraints: data residency, latency, budget, non-negotiables.
6) Timeline anchor: a fixed date the work is paced against.

Each maps to something concrete: the success metric becomes the eval suite; the data inventory determines feasibility; the integration points are where most hidden cost lives.
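One way to make the six components tangible is to hold them in a small record and check which ones are still blank. The sketch below is illustrative only: the `Scope` class, its field names, and `missing_components` are assumptions made for this example, not part of any real tool.

```python
from dataclasses import dataclass, fields


@dataclass
class Scope:
    """Hypothetical record of the six scope components (names are illustrative)."""
    workflow_target: str = ""
    success_metric: str = ""
    data_inventory: str = ""
    integration_points: str = ""
    constraints: str = ""
    timeline_anchor: str = ""


def missing_components(scope: Scope) -> list[str]:
    """Return the names of components left blank, in declaration order."""
    return [f.name for f in fields(scope) if not getattr(scope, f.name).strip()]


# A draft that names four of the six components.
draft = Scope(
    workflow_target="Draft first-response replies for billing tickets",
    success_metric="Scored against a 200-ticket golden set",
    constraints="No customer data leaves our cloud",
    timeline_anchor="Live in eight weeks",
)
print(missing_components(draft))  # ['data_inventory', 'integration_points']
```

A blank field is only the crudest signal. A filled-in field can still be too vague to price, which is a judgment the code cannot make for you.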

Good scope vs bad scope

Bad: 'Use AI to improve customer support.' Good: 'Draft first-response replies for billing tickets, scored against a 200-ticket golden set, integrated with our help desk, with no customer data leaving our cloud, live in eight weeks.' The second is buildable and quotable; the first is an invitation to bill by the hour.

The test for any scope line is whether two vendors would price it the same. If they would not, the line is too vague to commit to.
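If you do collect two quotes for the same scope line, the comparison can be sketched as a relative-spread check. Everything here is an assumption for illustration: the function name `quotes_agree`, the 15% tolerance, and the example figures are not drawn from this guide or from real pricing.

```python
def quotes_agree(quote_a: float, quote_b: float, tolerance: float = 0.15) -> bool:
    """True if two quotes differ by at most `tolerance`, relative to the larger quote.

    The default tolerance is an arbitrary illustrative value, not a recommendation.
    """
    if quote_a <= 0 or quote_b <= 0:
        raise ValueError("quotes must be positive")
    return abs(quote_a - quote_b) / max(quote_a, quote_b) <= tolerance


# Illustrative figures only.
print(quotes_agree(40_000, 44_000))  # True: spread is about 9%
print(quotes_agree(40_000, 90_000))  # False: spread is about 56%
```

A wide spread does not say which vendor is wrong. It says the line leaves enough room for two reasonable readers to price different projects.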

The mistakes that kill projects

The fatal scoping mistakes are: defining a capability instead of a workflow; leaving success undefined; discovering the data is unusable after signing; and treating integration as a detail. Each one converts a fixed engagement into an open-ended one.

A Diagnostic exists to produce exactly this scope — but you can do much of it yourself first, and arrive at the conversation with the unknowns already narrowed.

How scope connects to the four principles

Scope is not a procurement formality that happens before the real engineering; it is the first engineering act, and each of its six components feeds directly into one of the four methodology principles. The success metric becomes the eval suite — evals before features only works if the scope already names what 'working' looks like in numbers. The data inventory and integration points decide where telemetry must be instrumented, because you cannot record quality on a path the scope never traced. The constraints — data residency, latency, budget — determine how owned infrastructure is shaped, since they dictate which cloud, which model accounts, and which storage the system has to run on.

The timeline anchor is what makes a lean pod on a fixed clock viable at all. A two-to-three-person pod working an eight-week clock can only commit to a date if the scope has bounded what it is committing to. Read the other way round, this is a useful test of your own scope: take each line and ask which principle it serves. If a line maps to none of them — if it neither defines done, nor names where to measure, nor shapes what you own, nor paces the clock — it is probably decoration, and decoration in a scope is where cost quietly accumulates.
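The "which principle does this line serve" test can be run as a simple tagging exercise. In the sketch below the tagging is done by hand and the code only reports the untagged lines; the `Principle` enum, the `decoration` function, and the sample scope lines are hypothetical names and data chosen for this example.

```python
from enum import Enum
from typing import Optional


class Principle(Enum):
    """The four questions a scope line can answer (labels paraphrase the text above)."""
    EVALS_BEFORE_FEATURES = "defines done"
    TELEMETRY = "names where to measure"
    OWNED_INFRASTRUCTURE = "shapes what you own"
    FIXED_CLOCK = "paces the clock"


def decoration(scope_lines: dict[str, Optional[Principle]]) -> list[str]:
    """Return the scope lines that were not mapped to any principle."""
    return [line for line, principle in scope_lines.items() if principle is None]


# Each line is tagged manually by whoever reviews the scope.
tagged = {
    "Replies scored against a 200-ticket golden set": Principle.EVALS_BEFORE_FEATURES,
    "Integrated with our help desk": Principle.TELEMETRY,
    "No customer data leaves our cloud": Principle.OWNED_INFRASTRUCTURE,
    "Live in eight weeks": Principle.FIXED_CLOCK,
    "Leverage state-of-the-art AI": None,
}
print(decoration(tagged))  # ['Leverage state-of-the-art AI']
```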

This is also why a vendor's methodology and your scope cannot be assessed separately. A scope written for a vendor who has no eval discipline will still drift, because there is nothing downstream to hold the success metric to account. The strongest scope in the world cannot rescue a delivery model that profits from the work continuing — and the leanest, most disciplined pod cannot rescue a scope that never said what done means.

Questions to ask a vendor before you sign

A scope is tested in the conversation that follows it, and the questions a vendor asks back tell you more than the proposal they send. The first thing to probe is the success metric: ask how they intend to turn your definition of done into something automated and repeatable. A serious vendor will talk about a golden dataset, scoring rubrics, and thresholds; a weaker one will reassure you that they will 'know it when they see it,' which is precisely the open-endedness good scope exists to prevent. Ask, too, what they would refuse to quote until they had seen your data — a vendor who will price any scope sight-unseen is either padding heavily or planning to bill the difference later.
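To make "automated and repeatable" concrete, here is a minimal sketch of a success metric expressed as a pass rate over a golden set, compared with an agreed threshold. The names (`GoldenCase`, `pass_rate`, `meets_threshold`), the exact-match scorer, and the 0.9 threshold are all assumptions for illustration. A real rubric would be richer, and a real threshold is whatever the scope agrees.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    """One entry in a hypothetical golden set: an input and its reference answer."""
    ticket: str
    reference_reply: str


def pass_rate(
    cases: list[GoldenCase],
    draft_reply: Callable[[str], str],
    score: Callable[[str, str], float],
    pass_mark: float,
) -> float:
    """Fraction of golden cases whose drafted reply scores at or above pass_mark."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1
        for case in cases
        if score(draft_reply(case.ticket), case.reference_reply) >= pass_mark
    )
    return passed / len(cases)


def meets_threshold(rate: float, threshold: float) -> bool:
    """True if the measured pass rate reaches the threshold agreed in the scope."""
    return rate >= threshold


# Toy stand-ins: a stub "model" and an exact-match scorer.
def stub_model(ticket: str) -> str:
    return "refund"


def exact_match(drafted: str, reference: str) -> float:
    return 1.0 if drafted == reference else 0.0


golden = [
    GoldenCase("Why was I charged twice?", "refund"),
    GoldenCase("Where is my invoice?", "invoice"),
]
rate = pass_rate(golden, stub_model, exact_match, pass_mark=1.0)
print(rate, meets_threshold(rate, threshold=0.9))  # 0.5 False
```

The useful property is that the same function runs before signing, at delivery, and in production, so "done" is a number both sides can recompute.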

The second line of questioning is about variance and what happens when the build turns out harder than expected. Ask directly: under your model, do you earn more by finishing or by continuing? Ask what the scoping step produces, whether the price is fixed against it, and what the warranty is measured on. The honest answers here are specific — a fixed price quoted only after a Diagnostic, a warranty measured against the agreed eval thresholds, a clear statement of what is out of scope. Vagueness in the answers is a forecast of vagueness in the invoice.

Finally, ask what you will hold when the engagement ends. Where will the code live? Whose cloud account runs it? Who owns the model keys and the telemetry store? The answers separate a builder from a landlord, and the distinction matters most precisely when the relationship is going well, because that is when a buyer is least inclined to check. Scope the exit before you scope the build; it is far cheaper to negotiate ownership on the way in than to discover its absence on the way out.

Sequencing the scope: what to settle first

The six components are not equally urgent, and trying to perfect all of them in parallel is itself a scoping mistake. There is an order that de-risks the work fastest. Settle the data inventory first, because it is the component most likely to be wrong in a way that invalidates everything downstream — a workflow target and a success metric built on data that turns out to be incomplete, inaccessible, or legally encumbered are sophisticated answers to the wrong question. Confirm the data exists, that you can lawfully use it, and that it is representative of production before you invest effort anywhere else.

With the data confirmed, fix the workflow target and the success metric together, because they constrain each other: the metric is only meaningful against a specific operation, and the operation is only worth changing if its success can be measured. Then map integration points, which is where the largest unbudgeted cost usually hides — the unglamorous work of connecting to systems that behave nothing like their documentation. Constraints and the timeline anchor come last not because they matter least but because they are the easiest to state once the substance is settled; a date and a residency rule are quick to write and quick to verify.
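The order described above can be written down as a small lookup that answers "what should we settle next?". This is a sketch under assumed names: `SETTLE_ORDER`, `next_to_settle`, and the component identifiers are invented for this example.

```python
# Groups are settled in order; components within a group are settled together.
SETTLE_ORDER: list[tuple[str, ...]] = [
    ("data_inventory",),
    ("workflow_target", "success_metric"),
    ("integration_points",),
    ("constraints", "timeline_anchor"),
]


def next_to_settle(settled: set[str]) -> tuple[str, ...]:
    """Return the unsettled components of the earliest incomplete group."""
    for group in SETTLE_ORDER:
        remaining = tuple(name for name in group if name not in settled)
        if remaining:
            return remaining
    return ()  # every component is settled


print(next_to_settle(set()))                                  # ('data_inventory',)
print(next_to_settle({"data_inventory", "workflow_target"}))  # ('success_metric',)
```

Because the function always returns the earliest incomplete group, polishing the timeline anchor never moves you forward while the data inventory is still open.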

The honest limit of doing this yourself is the data and integration work. You can write a strong workflow target, a testable success metric, and a clear set of constraints without a vendor in the room. What you usually cannot fully resolve alone is whether the data will actually support the metric and whether the integrations are as clean as they look — and that uncertainty is exactly what the two-week Diagnostic is built to price down before the Build is quoted.

Common misconceptions about scoping

The most persistent misconception is that detailed scope slows a project down — that pinning everything before the build is bureaucratic overhead that delays the interesting work. The opposite is true for AI specifically, because the uncertainty that loose scope leaves unresolved does not disappear; it is merely deferred to a moment when it is far more expensive to confront. A success metric left vague at signing becomes a dispute at delivery. An integration assumed at scoping becomes a fortnight of unplanned work at build. Detail is not the cost; it is the thing that prevents the cost.

A second misconception is that a longer scope is a better scope. Length is not the signal — testability is. A page of buildable, priceable lines beats ten pages of aspiration, and padding a scope with capabilities the project does not need is a reliable way to inflate both the quote and the risk surface. The discipline is subtractive: a good scope is as much a record of what is explicitly out of bounds as of what is in. Naming the exclusions plainly is what keeps an eight-week clock honest.

The third misconception is that scope is fixed once signed. In practice scope is a living constraint that telemetry and evals keep honest — the success metric agreed up front is the yardstick production data is later measured against, and a regression below it is a warranty matter rather than a renegotiation. What must not move is the definition of done; what can be learned is how reality compares to it. Treating scope as a one-time document rather than a measured commitment is how engagements that started well still drift in their second half.


Frequently asked questions

What should an AI scope document include?

Six components: the workflow target, a measurable success metric, a data inventory, the integration points, the constraints (residency, latency, budget), and a timeline anchor. Each removes a class of later dispute.

What does a good AI scope look like?

Specific and testable: the exact workflow, a measurable definition of done, the systems to integrate, the data constraints, and a fixed date. The test is whether two vendors would price it identically.

What's the most common scoping mistake?

Defining a capability ('use AI for support') instead of a workflow ('draft first replies for billing tickets, scored against a golden set'). Capabilities cannot be priced or finished; workflows can.

How does scope relate to the eval suite?

The success metric in the scope becomes the eval suite. A scope with no measurable success criterion cannot produce evals, which is why such projects end up open-ended and disputed.

Can the Diagnostic do the scoping for us?

Yes — producing this scope is exactly what the two-week Diagnostic delivers. Doing the groundwork yourself first makes the Diagnostic faster and the resulting Build cheaper.

Who should write the scope — us or the vendor?

Both, in sequence. You can write the workflow target, the success metric, and the constraints before any vendor is involved, which sharpens every conversation that follows. What you usually cannot finish alone is confirming the data supports the metric and the integrations are clean — that is what the two-week Diagnostic prices down before a Build is quoted. Arriving with the unknowns already narrowed makes the Diagnostic faster and the Build cheaper.

How detailed should an AI scope be before we talk to a vendor?

Detailed enough that two vendors would price it the same, and no more. The signal is testability, not length: a single page of buildable, priceable lines beats ten pages of aspiration. Pin the workflow, the measurable success criterion, and what is explicitly out of bounds. Leave the data feasibility and integration depth as named unknowns — those are precisely what a scoping step exists to resolve, and guessing at them only creates false precision.

What scope red flags should make us walk away from a vendor?

Three. A fixed Build price quoted with no scoping step is a guess or a plan to bill the difference later. A success criterion the vendor describes as 'we'll know it when we see it' has no eval behind it and will drift. And a system that lives in the vendor's cloud, repo, or model accounts is lock-in by design. Each red flag converts a defined engagement into an open-ended one.

Can scope change after the Build starts?

The definition of done should not. That is the one thing the fixed price, the warranty, and the eval suite all depend on — move it mid-Build and the engagement becomes open-ended. What can change is your understanding of how reality compares to that definition, surfaced by telemetry. A regression below the agreed thresholds is a warranty matter, not a renegotiation. Genuine new scope is new work, quoted separately, not absorbed silently into the original clock.

Start with a Diagnostic

Two weeks. €5,000. A mapped bottleneck and a production-ready plan — with no obligation to proceed to a Build.
