AI Doesn't Fail Because of Technology. It Fails Because of Trust.

The Trust Deficit

Here is a number that should unsettle every AI programme sponsor: McKinsey's latest State of AI survey found that inaccuracy remains the top reported risk of generative AI adoption, cited by more respondents than any other concern -- above cost, above privacy, above workforce disruption.

Not accuracy. Trust in accuracy.

The models are good enough. GPT-4o passes the bar exam. Claude writes production code. Gemini reasons over million-token contexts. Yet most enterprises still pen their agents into sandbox demos and tightly supervised pilots. The bottleneck is not capability. It is confidence. Specifically, the confidence of the humans who must sign off on letting an agent operate in production -- processing real claims, routing real money, talking to real customers.

I keep hearing the same phrase from CDOs and CIOs: "The tech works. I just can't get the business to trust it." That sentence contains the entire problem.

Diagnosis: Why Trust Keeps Breaking

Trust breaks in predictable ways. The failure modes are remarkably consistent across industries.

Governance gaps. An agent makes a decision. Who approved the policy that let it? Who is accountable when it gets it wrong? In most organisations, the answer is a shrug. No decision authority matrix exists for AI. No one defined which actions require human approval and which do not. The agent operates in an accountability vacuum, and the first time it does something unexpected, leadership pulls the plug. Not because the action was catastrophic -- but because nobody could explain who was responsible for it.

Opacity. The agent processed 400 insurance claims overnight. Three were flagged as anomalies by a downstream team. When the operations lead asks "why did the agent approve these?", the answer is silence. No decision log. No reasoning trace. No audit trail. The agent is a black box with production access, and that terrifies every compliance officer I have spoken to.

Inconsistency. The agent handles the same claim type differently on Tuesday than it did on Monday, because the prompt changed, or the context window shifted, or the retrieval step pulled different documents. Non-determinism is a feature of LLMs and a dealbreaker for enterprise operations. Humans tolerate variation from other humans -- we expect it. We do not tolerate it from systems we were told would be reliable.

These three failures -- of governance, of transparency, of reliability -- form a trust debt that compounds. Each incident erodes confidence. Each erosion tightens the constraints on the next pilot. Eventually the organisation concludes "AI doesn't work for us" when the real conclusion should be "we never built the infrastructure for trust."

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph TD
    INCIDENT["Agent Incident<br/>(unexpected output)"]
    INCIDENT --> Q1{"Can you explain<br/>who approved this?"}
    Q1 -->|"No"| GOV["Governance Gap"]
    Q1 -->|"Yes"| Q2{"Can you explain<br/>why it decided this?"}
    Q2 -->|"No"| TRANS["Transparency Gap"]
    Q2 -->|"Yes"| Q3{"Will it behave<br/>the same tomorrow?"}
    Q3 -->|"No"| REL["Reliability Gap"]
    Q3 -->|"Yes"| TRUST["Trust Maintained"]
    GOV --> ERODE["Trust Eroded"]
    TRANS --> ERODE
    REL --> ERODE
    ERODE --> RESTRICT["Tighter Constraints<br/>on Next Deployment"]
    RESTRICT --> STALL["AI Programme Stalls"]
    style GOV fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style TRANS fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style REL fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style ERODE fill:#2a1a1a,stroke:#ffb347,color:#ffb347
    style STALL fill:#2a1a1a,stroke:#ff6b6b,color:#ff6b6b
    style TRUST fill:#0a2a1e,stroke:#00ff88,color:#00ff88,stroke-width:2px
    style INCIDENT fill:#1a2540,stroke:#ffffff,color:#00d4ff,stroke-width:2px
    style RESTRICT fill:#1a2540,stroke:#ffb347,color:#ffb347

Reframe: What Aviation Already Figured Out

Commercial aviation did not become the safest form of transport by building better planes. It became safe by building trust systems around imperfect machines.

Think about what the industry actually did. Checklists -- standardised pre-flight procedures that eliminated variance in how pilots prepared for takeoff. Black boxes -- flight data recorders and cockpit voice recorders that made every incident traceable, every failure analysable. Crew Resource Management (CRM) -- structured training in how crew members communicate, challenge one another, and share decision authority in the cockpit, including how they monitor automated systems. And critically, confidential incident reporting through systems like NASA's Aviation Safety Reporting System, where pilots report near-misses without career consequences, feeding a learning loop that prevents the next accident.

None of these innovations improved the aircraft engine. They improved the system of trust around it.

AI needs the same infrastructure. Not better models -- better black boxes. Not smarter agents -- clearer checklists for what they can and cannot do. Not more capable copilots -- structured protocols for how humans and agents share decision authority.

The NIST CAISI Request for Information on AI Agent Systems, published in January 2026, is moving in this direction, explicitly asking for input on standards around agent accountability, boundary enforcement, and auditability. OWASP's Top 10 for Large Language Model Applications maps the attack surface. The building blocks for trust standards exist. Most enterprises have not assembled them.

Framework: The Three Pillars of Agent Trust

Trust is not a feeling. It is an engineering outcome. Build these three systems and trust follows.

1. Governance: who decides what.

Define a decision authority matrix for every agent. Which actions can the agent take autonomously? Which require human-in-the-loop approval? Which are prohibited entirely? This is not a policy document that lives in SharePoint. It is an enforceable configuration -- coded into the agent's orchestration layer, auditable, versioned.

Tools like NemoClaw enforce governed context boundaries -- ensuring agents access only approved data through the semantic layer, not raw tables. The agent cannot exceed its mandate because the mandate is architecturally enforced, not just documented.
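To make "enforceable configuration" concrete, here is a minimal sketch of a decision authority matrix in Python. Everything in it is an assumption for illustration: the action names, the 1,000 threshold, the version string, and the authorize function are hypothetical and do not describe any particular product or orchestration framework. The point is only that the matrix is data the orchestration layer checks before every action, with unknown actions denied by default.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Authority(Enum):
    AUTONOMOUS = "autonomous"          # agent may act alone
    HUMAN_APPROVAL = "human_approval"  # agent must wait for sign-off
    PROHIBITED = "prohibited"          # agent may never take this action


@dataclass(frozen=True)
class Rule:
    authority: Authority
    max_amount: Optional[float] = None  # hypothetical ceiling for autonomous action


# Hypothetical matrix for a claims agent. Keep it in version control so
# every change to the agent's mandate is reviewed and auditable.
MATRIX_VERSION = "example-v1"
MATRIX = {
    "approve_claim": Rule(Authority.AUTONOMOUS, max_amount=1_000.0),
    "issue_refund": Rule(Authority.HUMAN_APPROVAL),
    "delete_customer_record": Rule(Authority.PROHIBITED),
}


def authorize(action: str, amount: float = 0.0) -> Authority:
    """Return the authority level for an action; unknown actions are denied."""
    rule = MATRIX.get(action)
    if rule is None:
        return Authority.PROHIBITED
    over_limit = rule.max_amount is not None and amount > rule.max_amount
    if rule.authority is Authority.AUTONOMOUS and over_limit:
        # Above the ceiling, an otherwise autonomous action needs a human.
        return Authority.HUMAN_APPROVAL
    return rule.authority
```

In this sketch, authorize("approve_claim", 250.0) returns Authority.AUTONOMOUS, the same action at 5,000 is routed to a human, and an action that is not in the matrix at all is treated as prohibited.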

2. Transparency: what happened and why.

Every agent action must produce a decision record: what input it received, what reasoning it applied, what tools it called, what output it generated, and what confidence level it assigned. This is the AI equivalent of the black box. When something goes wrong -- and it will -- you need the full trace. Not a log file buried in CloudWatch. A structured, queryable decision audit that a compliance team can review and a domain expert can understand.
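A decision record of this kind can be sketched in a few lines of Python. The field names follow the list above; the DecisionLog class, the agent ID, and the claim references are hypothetical, and the in-memory list stands in for whatever queryable store an organisation actually uses.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    agent_id: str
    input_summary: str   # what input the agent received
    reasoning: str       # what reasoning it applied
    tools_called: list   # what tools it called
    output: str          # what output it generated
    confidence: float    # what confidence level it assigned
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class DecisionLog:
    """In-memory stand-in for a structured, queryable audit store."""

    def __init__(self):
        self._rows = []

    def append(self, record: DecisionRecord) -> None:
        # Store JSON text so each row is plain, portable structured data.
        self._rows.append(json.dumps(asdict(record)))

    def query(self, **filters) -> list:
        """Return records whose fields equal every given filter value."""
        rows = [json.loads(row) for row in self._rows]
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


# Hypothetical example data.
log = DecisionLog()
log.append(DecisionRecord(
    agent_id="claims-agent-v1",
    input_summary="Claim C-1001: windscreen damage, 240.00",
    reasoning="Amount below autonomous limit; policy active",
    tools_called=["policy_lookup"],
    output="approved",
    confidence=0.93,
))
log.append(DecisionRecord(
    agent_id="claims-agent-v1",
    input_summary="Claim C-1002: fire damage, 50000.00",
    reasoning="Amount above autonomous limit",
    tools_called=["policy_lookup"],
    output="escalated",
    confidence=0.71,
))
escalated = log.query(output="escalated")
```

The useful property is the query, not the storage: when the operations lead asks why a claim was approved, the answer is a filter on structured records rather than a search through free-text logs.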

3. Reliability: predictable behaviour under known conditions.

Non-determinism is acceptable in creative tasks. It is unacceptable in claims processing, payment routing, or regulatory reporting. Reliability engineering for agents means: pinned model versions, deterministic retrieval configurations, regression test suites that run before every deployment, and behavioural contracts that define expected output ranges for known input classes. The agent should behave the same way on Tuesday as it did on Monday for the same inputs.
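As a rough illustration of a behavioural contract, the sketch below runs an agent several times on known inputs and fails if the outputs vary or fall outside the allowed set. The pinned configuration values, the contract cases, and stub_agent are all hypothetical placeholders; a real suite would call the deployed agent with its actual pinned model and retrieval settings.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# Hypothetical pinned configuration; record it with every test run.
PINNED_CONFIG = {
    "model": "example-model-2025-01-01",  # placeholder, not a real model ID
    "temperature": 0.0,
    "retrieval_top_k": 5,
}


@dataclass
class Contract:
    name: str
    claim: dict               # a known input class, represented by one case
    allowed: FrozenSet[str]   # outputs the business accepts for that input


def check_contracts(agent: Callable[[dict], str],
                    contracts: List[Contract],
                    runs: int = 3) -> List[str]:
    """Run each case several times; report inconsistent or unexpected outputs."""
    failures = []
    for contract in contracts:
        outputs = {agent(contract.claim) for _ in range(runs)}
        if len(outputs) > 1:
            failures.append(f"{contract.name}: inconsistent across runs {sorted(outputs)}")
        elif not outputs <= contract.allowed:
            failures.append(f"{contract.name}: unexpected output {sorted(outputs)}")
    return failures


def stub_agent(claim: dict) -> str:
    """Deterministic stand-in for a real agent call."""
    return "approve" if claim["amount"] <= 1000 else "escalate"


CONTRACTS = [
    Contract("small_claim", {"type": "windscreen", "amount": 240.0}, frozenset({"approve"})),
    Contract("large_claim", {"type": "fire", "amount": 50000.0}, frozenset({"escalate"})),
]

failures = check_contracts(stub_agent, CONTRACTS)
```

Run before every deployment, an empty failures list is the gate: any prompt change, model upgrade, or retrieval tweak that alters behaviour on a known input class shows up as a named failure instead of a Tuesday surprise.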

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph LR
    G["GOVERNANCE<br/>Decision authority matrix<br/>Enforced boundaries<br/>Accountability chains"]
    T["TRANSPARENCY<br/>Decision records<br/>Reasoning traces<br/>Queryable audit logs"]
    R["RELIABILITY<br/>Pinned model versions<br/>Regression suites<br/>Behavioural contracts"]
    G --> TRUST["AGENT<br/>TRUST"]
    T --> TRUST
    R --> TRUST
    TRUST --> PROD["Production<br/>Deployment"]
    PROD --> VALUE["Enterprise<br/>Value"]
    style G fill:#1a2540,stroke:#00d4ff,color:#00d4ff,stroke-width:2px
    style T fill:#1a2540,stroke:#ffb347,color:#ffb347,stroke-width:2px
    style R fill:#1a2540,stroke:#00ff88,color:#00ff88,stroke-width:2px
    style TRUST fill:#1a2540,stroke:#ffffff,color:#ffffff,stroke-width:3px
    style PROD fill:#0a2a1e,stroke:#00ff88,color:#00ff88,stroke-width:2px
    style VALUE fill:#0a2a1e,stroke:#00ff88,color:#00ff88,stroke-width:2px

Application: Klarna and Lemonade -- Two Trust Stories

Two public cases illustrate trust infrastructure as the deciding factor.

Klarna: what happens without it. Klarna announced that its AI assistant was doing the equivalent work of 700 full-time customer service agents at a fraction of the cost, and celebrated publicly. Then customer satisfaction dropped. CEO Sebastian Siemiatkowski admitted the company "focused too much on efficiency." They started rehiring humans. The AI was capable. But the trust infrastructure -- escalation paths, quality feedback loops, governance over when AI should hand off to a human -- was not there. Without those systems, capability alone could not sustain production deployment.

Lemonade: what happens with it. Lemonade's AI Jim processes 55% of claims fully autonomously in a regulated insurance market. The trust framework is visible in the architecture.

Governance: a decision authority matrix defines clear tiers. Simple claims within defined parameters -- agent decides autonomously. Complex or high-value claims route to human adjusters. These tiers are enforced architecturally, not just documented.

Transparency: every claim decision produces a structured record. When something goes wrong, the full trace is available -- what data the agent saw, what rules it applied, why it decided what it decided. Regulators can audit any decision.

Reliability: the system handles 96% of first notices of loss consistently. The 2-second claim settlement record is not a one-off demo -- it represents a repeatable process within defined parameters.

Lemonade's operations leaders can explain to regulators what the system is doing. Klarna's could not explain why customer satisfaction was dropping. The difference is not model quality. It is trust infrastructure.

Implication: Trust Is the Product

The enterprises that will capture value from agentic AI are not the ones with the best models. They are the ones that build trust infrastructure first -- governance, transparency, reliability -- as deliberately as they build data pipelines. Aviation did not wait for a perfect aircraft to start flying passengers. It built the systems that made imperfect aircraft trustworthy. The same discipline applied to AI agents is what separates a successful production deployment from another cancelled pilot. Trust is not a soft problem. It is a systems engineering problem. Treat it like one.

Sources

McKinsey, "The State of AI" -- mckinsey.com
NIST CAISI, "Request for Information Regarding AI Agent Systems" (January 2026) -- nist.gov
OWASP, "Top 10 for Large Language Model Applications" -- owasp.org
Fortune, "Klarna reverses AI-first customer service strategy" -- fortune.com
Lemonade Insurance, "AI Jim sets new world record" -- lemonade.com

Daniel Piatkowski, Data & Analytics veteran shaping AI-native enterprises, elicify.ai