Are you ready for an 8-week AI build? A readiness checklist

TL;DR

Most AI builds that fail were not ready to start. This checklist assesses readiness across five areas — data, stakeholder alignment, success criteria, infrastructure access, and commercial commitment. If you are weak on more than one, a Diagnostic comes first; if you are strong across all five, you are ready to build.

An eight-week fixed-price build only works if the ground is prepared. The most common reason a build slips is not the engineering — it is that one of five preconditions was missing and nobody checked.

Use this checklist before committing to a Build. It is the readiness assessment a Diagnostic performs, made into something you can run yourself.

How to use this

Score yourself honestly across the five areas below. A 'no' is not a disqualification — it is a thing to fix before the clock starts. The point of the checklist is to surface the gaps now, when they are cheap to close, rather than in week three of a build, when they are expensive.

Strong across all five means you are build-ready. Weak in one means close it first. Weak in two or more means start with a Diagnostic, which exists precisely to resolve these unknowns.
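The rule above can be written down as a small function. This is a minimal sketch, not part of any published tooling: the area keys and the `recommend` name are illustrative assumptions, and each area is reduced to a simple strong/weak boolean.

```python
# Hypothetical identifiers for the five areas; the names are assumptions.
AREAS = (
    "data",
    "stakeholder_alignment",
    "success_criteria",
    "infrastructure_access",
    "commercial_commitment",
)


def recommend(strong: dict[str, bool]) -> str:
    """Map a strong/weak self-assessment to the next step described above."""
    missing = [area for area in AREAS if area not in strong]
    if missing:
        raise ValueError(f"unscored areas: {missing}")

    weak = [area for area in AREAS if not strong[area]]
    if not weak:
        return "build-ready"
    if len(weak) == 1:
        return f"close first: {weak[0]}"
    return "start with a Diagnostic"
```

Refusing to score a partial assessment is a deliberate choice in this sketch: an unanswered area is treated as an error rather than silently counted as strong.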

The five areas

1) Data readiness: does representative data exist, is it accessible, and is it good enough to build evals from?
2) Stakeholder alignment: is there one decision-maker who owns the outcome, not a committee?
3) Success criteria: can you state what 'working' means in measurable terms?
4) Infrastructure access: can a team provision in your environment without a months-long approval chain?
5) Commercial commitment: are the budget and the eight-week calendar genuinely committed?

These map directly onto the four principles: data and success criteria feed evals; infrastructure access enables owned infrastructure; commitment makes the fixed clock real.

Reading your score

If you are strong on all five, a Build can start with confidence and the fixed price is low-risk. If data or success criteria are weak, those are exactly what a Diagnostic produces — it maps the bottleneck and writes the eval spec, turning a 'not yet' into a 'ready'.

If stakeholder alignment or commitment is the gap, fix that before spending on engineering at all. No methodology survives a build that the organisation has not actually committed to.

What to do with the result

A strong score means your next step is a Diagnostic to lock scope and price the Build — short, because you are already prepared. A mixed score means the Diagnostic does double duty: it closes the readiness gaps and produces the build plan.

Either way, the checklist has done its job if it moved a problem from week three of a build to the week before it starts.

How the five areas trade off against each other

The checklist reads as five independent scores, but in practice they interact, and the interactions are where the real judgement lives. Strength in one area can compensate for weakness in another — or expose it. A committed decision-maker with a cleared calendar can pull messy data into shape fast, so a 'no' on data alongside a strong 'yes' on stakeholder alignment is a recoverable position. The reverse is not: pristine data behind a committee with no owner tends to stay unused, because nobody is empowered to decide what 'good enough' means.

The two areas that cannot be compensated for are stakeholder alignment and commitment. They are upstream of everything else — they are what funds the work that fixes the other gaps. Data, success criteria, and infrastructure access are all things a Diagnostic can resolve, because they are engineering and information problems. Alignment and commitment are organisational, and no amount of method substitutes for them. If you are weighing a mixed score, weight those two heavily and treat the technical three as closeable.
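One way to make this weighting concrete is to split the weak areas into two groups instead of counting them. The sketch below is an illustration under assumptions: the area keys, the group names, and the `triage` function are hypothetical, chosen only to mirror the distinction drawn in the paragraph above.

```python
# Hypothetical area keys; the grouping follows the text above.
ORGANISATIONAL = {"stakeholder_alignment", "commercial_commitment"}
TECHNICAL = {"data", "success_criteria", "infrastructure_access"}


def triage(weak: set[str]) -> dict[str, list[str]]:
    """Separate weak areas that block all spending from those a Diagnostic can close."""
    unknown = weak - (ORGANISATIONAL | TECHNICAL)
    if unknown:
        raise ValueError(f"unknown areas: {sorted(unknown)}")
    return {
        # Organisational gaps: fix before spending on engineering at all.
        "fix_before_engineering": sorted(weak & ORGANISATIONAL),
        # Technical gaps: engineering and information problems, closeable.
        "closeable_in_diagnostic": sorted(weak & TECHNICAL),
    }
```

For example, `triage({"data", "stakeholder_alignment"})` puts `stakeholder_alignment` in the first group and `data` in the second, which reflects that the two gaps are not equally serious even though a simple count would treat them the same.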

There is also a hidden coupling between data and success criteria: you often cannot write a measurable definition of 'working' until you have looked at representative data, and you cannot judge whether your data is good enough until you know what you are trying to measure. They are chicken-and-egg, which is exactly why the Diagnostic tackles both in the same two weeks rather than sequentially — the eval spec and the data assessment are written together because each depends on the other.

Edge cases where the simple rule misleads

The rule (strong on all five, build; weak on two or more, diagnose first) holds in the ordinary case, but a few situations break it, and they are worth naming plainly. The first is the false 'yes' on data. Many operators score themselves strong because the data exists somewhere, only to discover in week two that it is unlabelled, inconsistent across systems, or locked behind an export that legal takes six weeks to approve. Existing is not the same as accessible and representative; if you cannot put a sample in front of an engineer this week, score it honestly as a 'not yet'.

The second is the false 'yes' on success criteria. A target like 'reduce handling time' feels measurable but is not an eval — it is an outcome with no per-input definition of a correct answer. The genuine test is narrower: for a single representative input, can you say what the system should output and how you would score whether it did? If you cannot, you have a business goal, not a success criterion, and the gap is larger than the checklist score suggests.
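The per-input test can be illustrated with a single eval case. This is a sketch under assumptions: the support-ticket example, the `order_status` label, and the names `EvalCase`, `exact_match`, and `run_case` are all hypothetical, and a real eval suite would likely need a richer scorer than exact match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One representative input, the output it should produce, and how to score it."""
    input_text: str
    expected: str
    score: Callable[[str, str], float]


def exact_match(expected: str, actual: str) -> float:
    # Simplest possible scorer: 1.0 on a case-insensitive match, else 0.0.
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def run_case(case: EvalCase, system: Callable[[str], str]) -> float:
    """Score a system (any callable from input text to output text) on one case."""
    return case.score(case.expected, system(case.input_text))


# Hypothetical case: a support message that should be routed to "order_status".
case = EvalCase(
    input_text="Where is my order #1234?",
    expected="order_status",
    score=exact_match,
)
```

If you can fill in `input_text`, `expected`, and `score` for a representative input, you have a success criterion. 'Reduce handling time' cannot be written in this shape, which is what marks it as a business goal.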

The third edge case is the inverse: a perfect five-out-of-five on a problem that is too small to need an eight-week Build at all. Readiness measures whether you can build, not whether you should. A fully ready company with a one-week problem is better served by a tightly scoped piece of work than by paying for a clock it does not need — and an honest Diagnostic will say so rather than upsell the Build.

The most common ways operators misuse this checklist

The first misuse is grading optimistically to justify a decision already made. The checklist only works as a diagnostic if you let it return an uncomfortable answer; scored to confirm a build you have already committed to internally, it becomes theatre. The discipline is to treat every 'yes' as a claim you would have to defend with evidence in week one — if you would struggle to produce that evidence, it is a 'no'.

The second is treating the five areas as a gate to clear once rather than a state to maintain. Readiness can decay: the accountable owner gets reassigned, the budget gets re-allocated mid-quarter, the data source you assessed gets migrated. A score taken three months before a build starts can be stale by the time the clock starts. Re-run it close to the actual start date, because the cost of a precondition that quietly lapsed is the same whether it was never there or simply went away.
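A staleness check of this kind is easy to automate. The sketch below is illustrative only: the `needs_rescore` name and the 30-day default are assumptions, since the text says only that a score should be re-run close to the start date and does not fix a threshold.

```python
from datetime import date


def needs_rescore(scored_on: date, build_start: date, max_age_days: int = 30) -> bool:
    """Return True if the readiness score is older than max_age_days at build start.

    The 30-day default is an assumed threshold, not one stated in the checklist.
    """
    if build_start < scored_on:
        raise ValueError("build_start is earlier than the scoring date")
    return (build_start - scored_on).days > max_age_days
```

With the default threshold, a score taken on 1 January for a build starting on 1 April of the same year is flagged for a re-run, while one taken twelve days before the start is not.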

The third misuse is using the checklist to grade a vendor rather than yourself. It is built to assess your side of the engagement: the preconditions you control. The vendor's method, evals discipline, and infrastructure stance are a separate question. A strong readiness score and a weak vendor still produce a poor build; the checklist removes the half of the risk that is yours to own, not the half that belongs to whoever you hire.

How the result feeds the Diagnostic

The checklist is not a substitute for the Diagnostic — it is the input that determines what the Diagnostic costs you in time and attention. A clean five-out-of-five does not mean you skip scoping; it means the two-week Diagnostic spends its time confirming and locking rather than discovering, and the resulting Build quote arrives faster and with a tighter range. A mixed score means the same two weeks do double duty, closing the open preconditions and producing the build plan in one pass, which is precisely what the Diagnostic is scoped to absorb.

What changes between the two cases is where the Diagnostic's effort goes, not whether you need it. With strong data and criteria, the Diagnostic concentrates on architecture and the eval thresholds; with weak ones, it spends its first days on data access and on turning a business goal into a scorable definition of done. Either way the deliverable is the same — a mapped bottleneck, an eval spec, and a fixed-price Build scope — but the readiness score tells you in advance which conversations will be hard.

This is also why more than 60% of Diagnostics proceed to a Build: by the time the Diagnostic ends, the readiness gaps that would otherwise have surfaced mid-build have already been resolved or named. The honest limit of the checklist is that it cannot do this resolution itself — it can only tell you whether the Diagnostic will be a short confirmation or a longer piece of groundwork. Both are legitimate starting points; the only wrong move is starting the Build clock with the gaps still open.

Frequently asked questions

What makes a company ready for an AI build?

Strength across five areas: representative accessible data, a single accountable decision-maker, measurable success criteria, fast infrastructure access in your environment, and genuine budget and calendar commitment.

What if our data isn't ready?

Then a Diagnostic comes first. Building the eval suite requires representative data; if it is missing or messy, the Diagnostic surfaces that and defines what is needed before a fixed-price Build is sensible.

Do we need success metrics before starting?

Yes — or a Diagnostic to define them. 'Working' must be measurable before a build, because the eval suite and the fixed price are written against it. Undefined success is the most common cause of open-ended AI projects.

Why does stakeholder alignment matter so much?

Because a build with a committee and no single owner stalls on decisions. One accountable decision-maker keeps an eight-week clock realistic; a build the organisation has not truly committed to will slip regardless of method.

What is the next step after the checklist?

A two-week Diagnostic — short if you scored strongly, or doing double duty to close gaps and produce the build plan if your score was mixed.

Can we start a Build with a single weak area, or must everything be green first?

A single weak area is usually closeable without delaying the Build, especially if it is one of the technical three — data, success criteria, or infrastructure access. Close it first if it is quick; otherwise the Diagnostic resolves it as part of scoping. The areas you cannot start weak on are stakeholder alignment and commitment, because no method compensates for an organisation that has not truly decided to build.

How long does readiness last once we have it?

Treat readiness as a state, not a permanent pass. It can decay when an owner is reassigned, a budget is re-allocated mid-quarter, or a data source is migrated. A score taken months before the work starts can be stale by the time the clock begins, so re-run the checklist close to the actual start date — a precondition that quietly lapsed costs the same as one that was never there.

We scored strongly on all five — do we still need a Diagnostic?

Yes, but it is shorter and lower-risk. A strong score does not remove the need to map the bottleneck and write the eval spec the fixed Build price is quoted against; it means the Diagnostic confirms and locks rather than discovers. Skipping straight to a fixed-price Build without that step means the price is a guess, however ready you are.

Does a perfect score mean an eight-week Build is the right move?

Not necessarily. The checklist measures whether you can build, not whether you should. A fully ready company with a problem that needs only a week of work is better served by tightly scoped work than by paying for a clock it does not need. Readiness is a precondition for a Build, not an argument for one — an honest Diagnostic will tell you if your problem is smaller than the engagement.

Start with a Diagnostic

Two weeks. €5,000. A mapped bottleneck and a production-ready plan — with no obligation to proceed to a Build.
