Why Most AI Programmes Stall After the First Use Case

The Second Use Case Is the Real Test

Eighty-nine percent of organisations have AI pilot programmes. The number that have scaled beyond pilot? Roughly one in four, according to MIT Sloan research. That gap should alarm anyone running an AI programme.

The first use case always works. It has executive attention, the best data engineers, a handpicked dataset, and forgiving success criteria. It ships. The steering committee claps. Someone writes a LinkedIn post. Then the second use case starts, and everything grinds to a halt.

Not because the model failed. Because every decision made during the first use case was a one-off. The data pipeline was custom. The feature store was a folder. The evaluation was manual. The governance was a conversation. None of it transfers. The World Economic Forum's AI at Work insights highlight this pattern explicitly: organisations invest heavily in initial AI deployments but fail to build the organisational scaffolding that makes subsequent deployments viable.

Diagnosis: What Actually Breaks

The bottleneck is not technical talent. It is not model quality. It is the absence of shared infrastructure between use cases.

Data access is rebuilt every time. The first use case team spent six weeks negotiating access to customer data, cleaning it, building a pipeline. The second use case team needs the same customer data plus product data. They start from scratch. Different pipeline, different cleaning logic, different assumptions about what "active customer" means. The pattern is depressingly common: two teams in the same organisation spending three months each building pipelines to the same source system -- with contradictory business logic.

Semantic definitions do not exist. What is "revenue"? Booked revenue? Recognised revenue? ARR? The first use case team defined it one way in their notebook. The second team defined it differently. Both are correct within their context. Neither is reusable. This is the problem dbt's semantic layer was designed to solve -- a single, governed place where business metrics are defined once and consumed everywhere. Most organisations do not have one.
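
To make "defined once and consumed everywhere" concrete, here is a minimal sketch in plain Python. It is not dbt syntax and not how any particular semantic layer is implemented. The names `Definition` and `DefinitionRegistry`, and the 90-day rule for "active customer", are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class Definition:
    """One governed business term. All fields are illustrative."""
    name: str
    version: str
    description: str
    rule: Callable[[date, date], bool]


class DefinitionRegistry:
    """Single place where a term is defined once and looked up by name."""

    def __init__(self) -> None:
        self._definitions: dict[str, Definition] = {}

    def define(self, definition: Definition) -> None:
        # Refuse a second, competing definition of the same term.
        if definition.name in self._definitions:
            raise ValueError(f"'{definition.name}' is already defined")
        self._definitions[definition.name] = definition

    def lookup(self, name: str) -> Definition:
        return self._definitions[name]


def purchased_within_90_days(last_purchase: date, as_of: date) -> bool:
    """Hypothetical business rule: active means a purchase in the last 90 days."""
    return as_of - last_purchase <= timedelta(days=90)


registry = DefinitionRegistry()
registry.define(
    Definition(
        name="active_customer",
        version="1.0.0",
        description="Purchased within the last 90 days (assumed rule)",
        rule=purchased_within_90_days,
    )
)

# Every use case team resolves the term through the registry
# instead of re-deriving it in a notebook.
is_active = registry.lookup("active_customer").rule
print(is_active(date(2024, 1, 10), date(2024, 3, 1)))
print(is_active(date(2023, 10, 1), date(2024, 3, 1)))
```

The design choice that matters is the refusal in `define`: a second team cannot quietly introduce its own meaning for a term that already exists. It has to change the shared one.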

Evaluation is ad hoc. The first model was evaluated by the data scientist who built it. "Looks good" was the standard. The second use case is higher stakes -- credit scoring, claim routing, demand forecasting. Suddenly "looks good" is not enough, but there is no evaluation harness, no benchmark dataset, no systematic way to test model behaviour before deployment.
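
A harness does not have to be large to replace "looks good". The sketch below is one possible shape, under stated assumptions: the claim-routing labels, the benchmark cases, and the names `EvalCase`, `run_suite`, and `toy_model` are all hypothetical, and `toy_model` merely stands in for a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One benchmark example with the label the model is expected to produce."""
    case_id: str
    features: dict
    expected: str


def run_suite(
    predict: Callable[[dict], str],
    cases: list[EvalCase],
    min_pass_rate: float,
) -> dict:
    """Run every case through `predict` and compare against the expected label."""
    if not cases:
        raise ValueError("benchmark suite is empty")
    failures = [c.case_id for c in cases if predict(c.features) != c.expected]
    pass_rate = 1 - len(failures) / len(cases)
    return {
        "pass_rate": pass_rate,
        "failures": failures,
        "passed": pass_rate >= min_pass_rate,
    }


# Hypothetical shared benchmark for a claim-routing use case.
BENCHMARK = [
    EvalCase("good-1", {"amount": 120, "has_photos": True}, "fast_track"),
    EvalCase("good-2", {"amount": 300, "has_photos": True}, "fast_track"),
    EvalCase("bad-1", {"amount": 25_000, "has_photos": False}, "manual_review"),
    EvalCase("bad-2", {"amount": 900, "has_photos": False}, "manual_review"),
]


def toy_model(features: dict) -> str:
    """Stand-in for a real model: routes on claim amount only."""
    return "fast_track" if features["amount"] < 1_000 else "manual_review"


report = run_suite(toy_model, BENCHMARK, min_pass_rate=1.0)
print(report)
```

Here the toy model fast-tracks case `bad-2`, a small claim with no photos, so the suite reports a failure instead of leaving the verdict to whoever built the model.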

Governance is invisible. Who approved the first model for production? Usually the same person who built it. That works once. It does not work at five, ten, or fifty use cases.

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph LR
    UC1["Use Case 1"]
    UC2["Use Case 2"]
    UC3["Use Case 3"]
    UC1 --> D1["Custom Pipeline"]
    UC1 --> E1["Ad-hoc Eval"]
    UC1 --> G1["Informal Approval"]
    UC2 --> D2["Custom Pipeline"]
    UC2 --> E2["Ad-hoc Eval"]
    UC2 --> G2["Informal Approval"]
    UC3 --> D3["Custom Pipeline"]
    UC3 --> E3["Ad-hoc Eval"]
    UC3 --> G3["Informal Approval"]
    style UC1 fill:#1a2540,stroke:#00d4ff,color:#00d4ff,stroke-width:2px
    style UC2 fill:#1a2540,stroke:#00d4ff,color:#00d4ff,stroke-width:2px
    style UC3 fill:#1a2540,stroke:#00d4ff,color:#00d4ff,stroke-width:2px
    style D1 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style D2 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style D3 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style E1 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style E2 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style E3 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style G1 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style G2 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style G3 fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b

Every use case rebuilds the same components from scratch. The cost is not just duplication -- it is contradiction. Three teams, three definitions of "customer," three governance standards.

Reframe: Cities Solved This a Century Ago

Here is what makes this problem interesting. Civil engineering solved it in the 1800s.

Before municipal infrastructure, every building in London had its own water supply -- a well, a cistern, a deal with a water carrier. Sewage was the building owner's problem. Gas lighting required per-building contracts. Building one structure was straightforward. Building a city was impossible.

The breakthrough was not better buildings. It was shared infrastructure. Standardised water mains, sewage systems, electrical grids. You did not build each building with its own plumbing standard. You built the common layer once, and every subsequent building connected to it.

Enterprise AI is stuck in the pre-municipal phase. Each use case is a standalone building with its own well. The first building works fine. The second building works fine. But by the fifth, you are drowning in incompatible plumbing, and the cost of each new building keeps rising instead of falling.

The unit economics of AI programmes should improve with scale. Shared data pipelines, common semantic definitions, reusable evaluation frameworks -- each new use case should be cheaper and faster than the last. When the opposite happens, when use case three takes longer than use case one, it is a clear signal that the common layer is missing.

I'm not sure this analogy stretches to every dimension -- cities also have zoning laws and building codes that enterprises rarely match -- but the core insight holds. Shared scaffolding is what makes scale possible.

The Minimum Viable Common Layer

The fix is not a massive platform investment. It is identifying the smallest set of shared components that make the second use case meaningfully faster than the first. I call this the Minimum Viable Common Layer (MVCL).

Four components. Not twelve. Not a two-year platform programme.

1. Shared data access. A curated set of core data products -- customer, product, transaction, interaction -- available through a single interface. Not a data lake. Not a warehouse dump. Governed, documented, access-controlled data products that any use case team can consume without a six-week data engineering sprint. Unity Catalog on Databricks or Snowflake's governance layer can serve this role, but the technology matters less than the decision to build it.

2. Semantic definitions. One place where "revenue," "active customer," "churn," and "conversion" are defined, versioned, and enforced. The dbt semantic layer is one implementation. The point is that business logic lives in one governed location, not scattered across fifty notebooks.

3. Evaluation harness. A standard way to test model outputs before deployment. Benchmark datasets, evaluation metrics, regression tests. Not necessarily complex -- even a shared test suite with known-good examples and known-bad examples moves you from "looks good" to "passes criteria."

4. Governance framework. A lightweight decision framework: what risk tier is this use case? Who approves deployment? What monitoring is required? This does not need to be a bureaucracy. A one-page decision matrix per risk tier is enough to start.
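
The one-page decision matrix in component 4 can be small enough to live in code. This is a sketch under assumptions, not a recommended policy: the tier names, the two classification questions, the approver roles, and the monitoring labels are all hypothetical placeholders for whatever your organisation agrees on.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class TierPolicy:
    """Who must approve and what must be monitored for one risk tier."""
    approvers: tuple[str, ...]
    monitoring: tuple[str, ...]


# Hypothetical matrix: one row per risk tier.
DECISION_MATRIX = {
    RiskTier.LOW: TierPolicy(("team_lead",), ("weekly_metric_review",)),
    RiskTier.MEDIUM: TierPolicy(("team_lead", "data_owner"), ("drift_alerts",)),
    RiskTier.HIGH: TierPolicy(
        ("team_lead", "data_owner", "risk_officer"),
        ("drift_alerts", "human_review_sample"),
    ),
}


def classify(affects_customers: bool, automated_decision: bool) -> RiskTier:
    """Assumed two-question triage; a real framework would ask more."""
    if affects_customers and automated_decision:
        return RiskTier.HIGH
    if affects_customers or automated_decision:
        return RiskTier.MEDIUM
    return RiskTier.LOW


def missing_approvals(tier: RiskTier, signed_off: set[str]) -> list[str]:
    """Roles that still need to sign off before deployment."""
    return [role for role in DECISION_MATRIX[tier].approvers if role not in signed_off]


# Example: an automated credit decision that affects customers.
tier = classify(affects_customers=True, automated_decision=True)
print(tier, missing_approvals(tier, {"team_lead"}))
```

The builder's own sign-off is no longer enough for a high-risk use case, which is the point: approval stops being a conversation and becomes a checkable condition.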

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph TD
    MVCL["Minimum Viable Common Layer"]
    MVCL --> DA["Shared Data Access\nCurated data products"]
    MVCL --> SD["Semantic Definitions\nSingle source of business logic"]
    MVCL --> EH["Evaluation Harness\nStandardised testing"]
    MVCL --> GF["Governance Framework\nRisk-tiered approvals"]
    DA --> UC["Use Case Teams"]
    SD --> UC
    EH --> UC
    GF --> UC
    UC --> V["Faster delivery,\nconsistent quality,\nlower marginal cost"]
    style MVCL fill:#1a2540,stroke:#00d4ff,color:#00d4ff,stroke-width:2px
    style DA fill:#1a2540,stroke:#ffb347,color:#ffb347
    style SD fill:#1a2540,stroke:#ffb347,color:#ffb347
    style EH fill:#1a2540,stroke:#ffb347,color:#ffb347
    style GF fill:#1a2540,stroke:#ffb347,color:#ffb347
    style UC fill:#1a2540,stroke:#ffffff,color:#ffffff
    style V fill:#0a2a1e,stroke:#00ff88,color:#00ff88,stroke-width:2px

The tradeoff is real. Building the common layer slows down use case one. The team that could ship a demo in eight weeks now takes twelve because they are building reusable components instead of throwaway scripts. This is hard to sell to impatient sponsors. But the payoff hits on use case two: what would have been another twelve-week build becomes a four-week build. By use case five, teams are deploying in days.

Application: How ING Built the Common Layer

ING Bank offers a public example of what building shared AI infrastructure looks like. When ING partnered with McKinsey to build a GenAI chatbot in just 7 weeks, achieving a 25% productivity gain, the speed was not accidental. ING had already invested in shared data infrastructure and governance standards across its banking operations. The chatbot team did not start from scratch on data access, metric definitions, or compliance frameworks.

Contrast this with the typical pattern. A bank builds an automated credit scoring model -- celebrated success. Then the fraud detection team needs the same customer data from a different angle. They spend eight weeks recreating the data pipeline, with subtle differences in how they define "customer tenure" and "transaction value." The evaluation is a spreadsheet. Governance is an email chain.

The difference is the common layer. ING's approach -- shared data products, governed definitions, standardised evaluation -- meant that each subsequent AI use case could build on existing infrastructure rather than reinventing it. McKinsey's documentation of the project highlights that the 7-week timeline was possible precisely because the foundational data and governance layers already existed.

The uncomfortable truth: building shared infrastructure before the first use case is politically difficult. The first use case often needs to exist to justify the investment. But the organisations that pause after use case one to build the common layer -- shared data products, a metric layer defining core business terms, standard evaluation harnesses, a lightweight approval matrix -- see each subsequent build shrink from months to weeks, and eventually to days.

The Real Question

The metric that matters is not "how many AI use cases have we launched." It is "how much faster and cheaper is use case N+1 compared to use case N?" If that number is not improving, your AI programme is not scaling. It is just repeating.
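
The N+1 question can be tracked with almost no tooling. A minimal sketch, assuming delivery time in weeks is the only input (cost would need its own series): the function names are hypothetical, and apart from the twelve-week and four-week figures used earlier in this article, the numbers are invented for illustration.

```python
def marginal_ratios(weeks_per_use_case: list[float]) -> list[float]:
    """Delivery time of each use case divided by the previous one.

    A ratio below 1.0 means use case N+1 shipped faster than use case N.
    """
    if any(w <= 0 for w in weeks_per_use_case):
        raise ValueError("delivery times must be positive")
    return [
        nxt / prev
        for prev, nxt in zip(weeks_per_use_case, weeks_per_use_case[1:])
    ]


def is_compounding(weeks_per_use_case: list[float]) -> bool:
    """True only if every use case was faster than the one before it."""
    ratios = marginal_ratios(weeks_per_use_case)
    return bool(ratios) and all(r < 1.0 for r in ratios)


# Illustrative series: 12 and 4 echo the figures above, the rest are assumed.
with_common_layer = [12, 4, 3, 2, 1]
# Hypothetical programme that rebuilds everything each time.
without_common_layer = [8, 12, 12]

print(marginal_ratios(with_common_layer), is_compounding(with_common_layer))
print(marginal_ratios(without_common_layer), is_compounding(without_common_layer))
```

A programme whose ratios hover at or above 1.0 is the "just repeating" case, however many use cases it has launched.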

Most organisations are building standalone houses when they should be laying municipal pipes. The first house will take a bit longer. Every house after that will be fundamentally cheaper. That is the difference between an AI programme that stalls after the showcase and one that compounds.

Sources

MIT Sloan Management Review -- Scaling AI Results
World Economic Forum -- AI at Work Insights
dbt Labs -- Semantic Layer Documentation
McKinsey -- Banking on Innovation: How ING Uses Generative AI to Put People First

Daniel Piatkowski | Data & Analytics veteran shaping AI-native enterprises | elicify.ai