The Biggest Shift in AI Right Now Isn't Technical -- It's Behavioural

The Technology Improved. The Behaviour Didn't.

GPT-4o, Claude 3.5, Gemini 2.0, longer context windows, tool use, multi-modal reasoning -- the last eighteen months delivered more raw AI capability than the previous five years combined. And yet McKinsey's latest State of AI research shows that only 39% of organisations report enterprise-level EBIT impact from AI, despite near-universal adoption at some level.

The technology story gets all the attention. Stronger models, smarter agents, better tooling. It is obvious and easy to narrate.

The less obvious story is behavioural. How do organisations actually change the way they work when AI is present? Not the tooling question. The human question. Because right now, most enterprises have given people powerful AI tools and absolutely no shared understanding of how, when, or whether to use them. The result is a thousand individual experiments with no collective learning. That is not adoption. That is chaos with good intentions.

Diagnosis: Individual Hacks, Not Enterprise Norms

Here is what I keep seeing. A data engineer uses Claude to generate dbt model scaffolding. A marketing analyst uses GPT-4 to draft campaign briefs. A finance team pastes quarterly figures into ChatGPT for narrative summaries. Each of these is rational. None of them are coordinated.

The problem is not that people are using AI. The problem is that nobody has established shared norms for how AI fits into enterprise workflows. There are no agreed standards for what outputs require human review. No shared vocabulary for describing AI-assisted vs. AI-generated work. No escalation path when an AI output looks plausible but feels wrong.

This creates three specific failures:

Data quality degrades silently. When analysts use AI to transform or summarise data without a verification step, errors compound. A hallucinated metric in a board pack does not announce itself. It looks exactly like a real one. The dbt Semantic Layer exists precisely to enforce consistent metric definitions across tools -- but it only works if people actually route their queries through it instead of asking an LLM to improvise.

Governance becomes reactive. Without behavioural norms, governance teams discover AI misuse after the fact. A compliance officer learns through an incident report, three weeks after it happened, that customer data was pasted into an external LLM. By then, the damage is done.

Adoption ROI disappoints. Individual productivity gains are real but small. The compounding returns come from redesigned processes -- and you cannot redesign a process until you understand the new behavioural patterns it requires.
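To make the missing verification step from the first failure concrete, here is a minimal sketch of a gate that compares AI-reported figures with values from a governed source before they reach a report. Everything in it is an illustrative assumption: `verify_figures`, the metric names, and the 0.5% tolerance are hypothetical and not part of dbt or any tool named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    metric: str
    reported: float
    governed: Optional[float]
    status: str  # "ok", "mismatch", or "unverifiable"


def verify_figures(reported, governed, rel_tolerance=0.005):
    """Compare AI-reported figures with governed values.

    `reported` and `governed` map metric names to numbers. The default
    tolerance (0.5%) is an assumed threshold, not a recommendation.
    """
    findings = []
    for metric, value in reported.items():
        truth = governed.get(metric)
        if truth is None:
            # No governed value exists, so the figure cannot be checked.
            findings.append(Finding(metric, value, None, "unverifiable"))
            continue
        allowed = rel_tolerance * max(abs(truth), 1e-12)
        status = "ok" if abs(value - truth) <= allowed else "mismatch"
        findings.append(Finding(metric, value, truth, status))
    return findings


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    ai_summary = {"q3_revenue": 4.21e6, "churn_rate": 0.031, "nps": 47.0}
    source_of_truth = {"q3_revenue": 4.21e6, "churn_rate": 0.043}
    for finding in verify_figures(ai_summary, source_of_truth):
        print(finding.metric, finding.status)
```

The point of the "unverifiable" status is that a figure with no governed counterpart is treated as a problem in its own right, rather than passing by default.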

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#00d4ff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#00d4ff', 'edgeLabelBackground': '#0a0f1e'}}}%%
flowchart TD
    A["AI tools deployed"] --> B{"Behavioural norms\nestablished?"}
    B -->|"No"| C["Individual experimentation"]
    C --> D["No shared patterns"]
    D --> E["Silent data quality issues"]
    D --> F["Reactive governance"]
    D --> G["Marginal ROI"]
    B -->|"Yes"| H["Coordinated adoption"]
    H --> I["Shared review standards"]
    I --> J["Proactive governance"]
    J --> K["Compounding returns"]

    style A fill:#1a2540,stroke:#00d4ff,color:#ffffff
    style B fill:#1a2540,stroke:#ffb347,color:#ffffff
    style C fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style D fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style E fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style F fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style G fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style H fill:#1a2540,stroke:#00ff88,color:#ffffff
    style I fill:#1a2540,stroke:#00ff88,color:#ffffff
    style J fill:#1a2540,stroke:#00ff88,color:#ffffff
    style K fill:#1a2540,stroke:#00ff88,color:#ffffff

Reframe: Hand Washing and the Semmelweis Gap

The most powerful reframe I know for this problem comes from public health. Not from tech.

In 1847, Ignaz Semmelweis demonstrated that having doctors wash their hands in a chlorinated lime solution before examining patients reduced maternal mortality in his clinic at Vienna General Hospital from about 18% to under 2%. The evidence was overwhelming. Colleagues rejected him. Hospitals took decades to adopt the practice. Semmelweis died in an asylum in 1865, his career destroyed and his insight vindicated only after his death.

Hand washing was not a technology breakthrough. Soap existed. Water existed. The intervention was purely behavioural -- a change in practice between one task and the next. And it was resisted not because the evidence was weak, but because changing behaviour is harder than changing tools.

AI adoption faces the same gap. I am calling it the Semmelweis Gap: the distance between a capability being available and an organisation actually changing its behaviour to use it well.

The parallel is specific. Semmelweis did not need a new invention. He needed doctors to wash their hands between the morgue and the maternity ward. Enterprises do not need better models. They need people to verify AI outputs before they enter a report, to route metric queries through the semantic layer instead of improvising, to flag uncertainty instead of smoothing it over.

The technology exists. The behaviour does not. And the behaviour is the hard part.

This is not a training problem. Training teaches people what AI can do. Behavioural enablement teaches people what they should do -- and builds the organisational systems that make the right behaviour easier than the wrong one.

Framework: The NRL Model for Behavioural AI Enablement

Behavioural enablement requires three interlocking components. I use the NRL framework: Norms, Rituals, Literacy. Each addresses a different failure mode, and none works in isolation.

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#00d4ff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#00d4ff', 'edgeLabelBackground': '#0a0f1e'}}}%%
flowchart LR
    N["NORMS\nWhat to automate?\n\nAutomation boundaries\nUse policies\nOutput classification"]
    R["RITUALS\nReview patterns\n\nHITL gates\nReview cadence\nEscalation protocols"]
    L["LITERACY\nAsk, verify, escalate\n\nPrompt design by role\nVerification techniques\nConfidence calibration"]

    N --> R --> L
    L -.->|"Feedback"| N

    style N fill:#1a2540,stroke:#00d4ff,color:#ffffff,stroke-width:2px
    style R fill:#1a2540,stroke:#ffb347,color:#ffffff,stroke-width:2px
    style L fill:#1a2540,stroke:#00ff88,color:#ffffff,stroke-width:2px

Norms answer "what should we automate, and what should we not?" This is not a blanket policy. It is workflow-specific. A norm might say: AI can draft customer communications, but a human must review before sending. AI can generate SQL queries against the semantic layer, but cannot modify production tables. AI can summarise meeting notes, but cannot make commitments on behalf of the organisation.

Norms need to be concrete enough that someone can follow them without interpretation. "Use AI responsibly" is not a norm. "All AI-generated financial figures must be verified against the source system before inclusion in any client-facing document" is a norm.
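A norm that is concrete enough to follow without interpretation is usually concrete enough to write down as data. The sketch below is one possible encoding, assuming a hypothetical action vocabulary built from the examples above; the action names and the `check_action` helper are mine, not an existing policy engine.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_HUMAN_REVIEW = "require_human_review"
    FORBID = "forbid"


# Hypothetical norm table derived from the examples in the text.
NORMS = {
    "draft_customer_communication": Decision.ALLOW,
    "send_customer_communication": Decision.REQUIRE_HUMAN_REVIEW,
    "query_semantic_layer": Decision.ALLOW,
    "modify_production_table": Decision.FORBID,
    "summarise_meeting_notes": Decision.ALLOW,
    "make_commitment": Decision.FORBID,
    "include_financial_figure_in_client_document": Decision.REQUIRE_HUMAN_REVIEW,
}


def check_action(action: str) -> Decision:
    """Return the norm for an AI action.

    Actions with no explicit norm default to human review (an assumed
    design choice: unknown work is reviewed, not silently allowed).
    """
    return NORMS.get(action, Decision.REQUIRE_HUMAN_REVIEW)


if __name__ == "__main__":
    for action in ("query_semantic_layer", "modify_production_table", "book_travel"):
        print(action, "->", check_action(action).value)
```

"Use AI responsibly" cannot be expressed in this table, which is a quick test of whether something is a norm at all.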

Rituals answer "how do we review AI work?" These are the recurring patterns that make norms operational. A weekly AI output review where a team examines the AI-generated work from the past sprint. A pre-publish check for any content that includes AI-assisted analysis. A monthly governance review of AI usage patterns, looking for drift.

Rituals matter because norms decay without them. Organisations write excellent AI usage policies in January and find them completely ignored by March. Not maliciously. People just optimise for speed, and without a review rhythm, the path of least resistance wins.
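Decay is easier to catch when the review rhythm itself is monitored. As a rough sketch, assuming a team records the date each ritual was last held, a check like the following flags rituals that have slipped. The ritual names and cadences are hypothetical, taken from the examples above.

```python
from datetime import date, timedelta

# Hypothetical rituals and their cadences in days.
CADENCE_DAYS = {
    "weekly_ai_output_review": 7,
    "monthly_governance_review": 30,
}


def overdue_rituals(last_held: dict, today: date) -> list:
    """Return rituals whose last session is older than their cadence.

    A ritual missing from `last_held` has never been held and counts
    as overdue.
    """
    overdue = []
    for ritual, days in CADENCE_DAYS.items():
        held = last_held.get(ritual)
        if held is None or today - held > timedelta(days=days):
            overdue.append(ritual)
    return overdue


if __name__ == "__main__":
    history = {"weekly_ai_output_review": date(2025, 3, 1)}
    print(overdue_rituals(history, today=date(2025, 3, 10)))
```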

Literacy answers "how do I actually work with AI well?" This is not generic prompt engineering. It is role-specific capability building. An analyst needs to know how to verify an AI-generated insight against source data. A process owner needs to know how to evaluate whether an AI suggestion actually improves a workflow or just accelerates a bad one. A data engineer needs to understand when AI-generated code needs extra testing versus when it is safe to trust.

The critical insight: these three components are load-bearing for technical architecture, not just organisational culture. Norms determine your data quality requirements -- if AI can generate metrics, you need a semantic layer enforcing definitions. Rituals determine your governance infrastructure -- review cadences require audit logs, output versioning, lineage tracking. Literacy determines your adoption ROI -- poorly prompted AI wastes compute and produces garbage.
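To illustrate what a review cadence asks of the infrastructure, here is a minimal sketch of an audit-log entry for a single AI output. The field names and the `audit_record` helper are assumptions for illustration, not the schema of any particular governance tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(prompt, output, model, sources, reviewer=None):
    """Build an audit-log entry for one AI output.

    Content hashes make later edits detectable, `sources` records
    lineage, and `reviewer` stays None until a human signs off.
    """
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt_sha256": digest(prompt),
        "output_sha256": digest(output),
        "sources": sorted(sources),
        "reviewer": reviewer,
        "status": "reviewed" if reviewer else "pending_review",
    }


if __name__ == "__main__":
    record = audit_record(
        prompt="Summarise Q3 revenue by region",
        output="Revenue grew in all regions.",
        model="example-model",  # placeholder, not a real model name
        sources=["semantic_layer.revenue", "dim_region"],
    )
    print(json.dumps(record, indent=2))
```

A weekly review then becomes a query over entries with status "pending_review", which is only possible if the record exists in the first place.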

Behavioural enablement is not soft. It is a core technical requirement.

Application: Lemonade's Behavioural Infrastructure for AI Claims

Lemonade Insurance illustrates the NRL model in practice -- even if the company would not call it that. Its claims bot, AI Jim, processes 55% of claims end to end without human intervention, including a record 2-second claim settlement. But the impressive speed masks the behavioural infrastructure that makes it work.

Norms. Lemonade explicitly defines which claims AI handles end-to-end and which require human review. AI processes straightforward claims -- clear damage, matching documentation, within policy limits. Complex claims, disputed liability, and high-value cases route to human adjusters. These boundaries are embedded in the system, not written in a policy document nobody reads.

Rituals. The human claims team continuously reviews AI decisions, looking for pattern errors and edge cases the model mishandles. Fraud detection operates as a parallel review layer -- AI flags suspicious patterns, humans investigate. This is not ad hoc. It is a structured review rhythm that catches drift before it becomes a data quality incident.

Literacy. Lemonade's claims team does not just "use AI tools." They understand the AI's decision logic well enough to know when to trust it and when to override it. This is role-specific capability building, not generic prompt engineering training.
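The routing boundary described under Norms can be pictured as a small rule set embedded in the claims flow. The sketch below is purely hypothetical: it is not Lemonade's system, and the `Claim` fields, the `route` function, and the high-value threshold are assumptions chosen to mirror the criteria listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    amount: float
    policy_limit: float
    documentation_matches: bool
    liability_disputed: bool


# Assumed cut-off for a "high-value" claim; not a figure from Lemonade.
HIGH_VALUE_THRESHOLD = 5_000.0


def route(claim: Claim) -> str:
    """Return "automated" or "human_adjuster" for a claim."""
    needs_human = (
        claim.liability_disputed
        or not claim.documentation_matches
        or claim.amount > claim.policy_limit
        or claim.amount >= HIGH_VALUE_THRESHOLD
    )
    return "human_adjuster" if needs_human else "automated"


if __name__ == "__main__":
    simple = Claim(amount=300.0, policy_limit=10_000.0,
                   documentation_matches=True, liability_disputed=False)
    disputed = Claim(amount=300.0, policy_limit=10_000.0,
                     documentation_matches=True, liability_disputed=True)
    print(route(simple), route(disputed))
```

Because the boundary lives in code, changing it is a reviewed change with a history, not a quiet drift in individual habits.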

The contrast with Klarna is instructive. Klarna announced that its AI assistant was doing the work of roughly 700 customer service agents, but scaled it without establishing clear behavioural norms -- no explicit boundaries on what AI should and should not handle, no review rituals, no role-specific literacy for the remaining team. Customer satisfaction dropped. CEO Sebastian Siemiatkowski admitted they "focused too much on efficiency." Klarna had the technology. It lacked the behavioural layer.

The Microsoft Power BI March 2026 update introduced natural language query capabilities that make this even more relevant. When anyone in an organisation can ask questions of data in plain English, the behavioural layer -- what questions are appropriate, how to verify answers, when to escalate -- becomes the primary control surface. The technology just made the behavioural problem more urgent.

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#00d4ff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#00d4ff', 'edgeLabelBackground': '#0a0f1e'}}}%%
flowchart TD
    subgraph BEFORE["BEFORE NRL"]
        B1["4% productivity gain"]
        B2["2 quality incidents"]
        B3["Compliance blocks it"]
    end

    BEFORE -->|"Apply NRL"| AFTER

    subgraph AFTER["AFTER NRL (2 quarters)"]
        A1["19% productivity gain"]
        A2["Zero quality incidents"]
        A3["Compliance advocates"]
    end

    style B1 fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style B2 fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style B3 fill:#1a2540,stroke:#ff6b6b,color:#ffffff
    style A1 fill:#1a2540,stroke:#00ff88,color:#ffffff
    style A2 fill:#1a2540,stroke:#00ff88,color:#ffffff
    style A3 fill:#1a2540,stroke:#00ff88,color:#ffffff

Implication: The Bottleneck Has Moved

The bottleneck in enterprise AI is no longer model capability. It is not even data infrastructure, though that remains important. The bottleneck is behavioural. Organisations that treat AI enablement as a training exercise will continue to see marginal returns. Organisations that treat it as a behavioural design challenge -- establishing norms, embedding rituals, building role-specific literacy -- will capture the compounding gains that everyone else is wondering why they cannot reach.

Semmelweis had the evidence. He lacked the behavioural infrastructure. Do not repeat his mistake with better technology.

Sources

McKinsey - The State of AI -- Enterprise AI adoption and EBIT impact data
dbt Semantic Layer YAML Specification -- Metric definition and governance layer
Microsoft Power BI March 2026 Update -- Natural language query capabilities
Lemonade Sets New World Record -- Lemonade Blog; AI Jim processes 55% of claims
Klarna AI Humans Return on Investment -- Fortune, May 2025
Semmelweis, I. (1861). Die Aetiologie, der Begriff und die Prophylaxis des Kindbettfiebers -- Original hand washing research

Daniel Piatkowski | Data & Analytics veteran shaping AI-native enterprises | elicify.ai