Most Enterprise "AI Automation" Is Just Faster Typing

Enterprises spent the last decade encoding business logic into workflows. Step A, then B, then C. Approvals at fixed gates. Exception handling via predefined branches. BPMN diagrams on whiteboards, translated into Airflow DAGs or Power Automate flows.

Now those same enterprises are plugging LLMs into these workflows and calling it AI transformation. It is not. It is the equivalent of bolting a jet engine onto a rail cart. The engine is powerful. The cart still only goes where the rails point.

MIT Sloan's research on scaling AI found that most organisations struggle to move beyond isolated pilots precisely because they bolt AI onto existing processes rather than rethinking the process shape itself. The constraint is not model capability. It is process architecture.

Diagnosis: Why Workflows Break Under Uncertainty

Classic workflows encode certainty. They assume you know the path before you start walking it. That assumption held when the work was predictable -- invoice processing, order fulfilment, compliance checks with stable rule sets.

Agent-driven work is different. Consider a due diligence review for a potential acquisition. The agent needs to pull financial filings, cross-reference them against news sentiment, identify regulatory risks specific to the target's operating jurisdictions, and flag inconsistencies that warrant deeper investigation. The sequence is not predetermined. What the agent finds at step two changes what it should do at step three.

Three specific breakdowns happen when you force this kind of work into a traditional workflow:

State amnesia. Workflows are stateless between steps. Each task gets inputs, produces outputs, passes them forward. But an agent reasoning over a complex problem needs memory -- what it tried, what failed, what context it accumulated. A claims processing agent that cannot remember it already checked the policy database is going to check it again. And again.

Brittle integrations. Workflows connect to systems through hard-coded API calls. When the agent needs a tool it was not pre-wired to, it stops. There is no protocol for "I need to look something up I was not designed to look up." Every new capability requires a developer to add a connector and redeploy.

Invisible reasoning. Workflows log task completion. Agent flows need to log why. What did the agent consider? What alternatives did it reject? When a workflow fails, you check the error log. When an agent flow produces a wrong answer, you need the full reasoning trace. Most workflow engines do not capture this.
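To make the first breakdown, state amnesia, concrete, here is a minimal Python sketch of the missing piece. It assumes a per-task memory that records which tool calls already ran, so a repeated policy lookup is answered from memory instead of hitting the system again. `WorkingMemory` and the `query_policy_db` stub are hypothetical names, not part of any real library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class WorkingMemory:
    """Hypothetical per-task memory: remembers which tool calls already ran."""
    results: dict[tuple, Any] = field(default_factory=dict)
    calls_made: int = 0

    def call(self, tool: Callable[..., Any], *args: Any) -> Any:
        key = (tool.__name__, args)
        if key in self.results:  # already checked: reuse the earlier answer
            return self.results[key]
        self.calls_made += 1  # only real invocations are counted
        self.results[key] = tool(*args)
        return self.results[key]


def query_policy_db(policy_id: str) -> dict:
    # Stub standing in for a real policy database lookup (hypothetical).
    return {"policy_id": policy_id, "active": True}


memory = WorkingMemory()
for _ in range(3):  # the agent "decides" to check the same policy three times
    policy = memory.call(query_policy_db, "POL-123")
print(memory.calls_made)  # 1
```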

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph LR
    subgraph WORKFLOW ["Classic Workflow"]
        W1["Step A"] --> W2["Step B"] --> W3["Step C"] --> W4["Done"]
    end
    subgraph AGENTFLOW ["Agent Flow"]
        A1["Plan"] --> A2["Execute Tool"]
        A2 --> A3["Inspect Result"]
        A3 -->|"Revise plan"| A1
        A3 -->|"Continue"| A4["Next Action"]
        A4 --> A5["Inspect Result"]
        A5 -->|"Revise plan"| A1
        A5 -->|"Done"| A6["Complete"]
    end
    style WORKFLOW fill:#2a1a1a,stroke:#ff6b6b,color:#ffffff
    style AGENTFLOW fill:#0a2a1e,stroke:#00ff88,color:#ffffff
    style W1 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style W2 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style W3 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style W4 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style A1 fill:#1a2540,stroke:#00d4ff,color:#00d4ff
    style A2 fill:#1a2540,stroke:#ffb347,color:#ffb347
    style A3 fill:#1a2540,stroke:#00ff88,color:#00ff88
    style A4 fill:#1a2540,stroke:#ffb347,color:#ffb347
    style A5 fill:#1a2540,stroke:#00ff88,color:#00ff88
    style A6 fill:#1a2540,stroke:#00ff88,color:#00ff88

Reframe: "Yes, And..." -- What Improv Theatre Teaches About Agent Design

Here is the connection that reframed this problem for me: improv theatre.

In scripted theatre, every actor follows the script. Lines are memorised. Blocking is rehearsed. The director controls the outcome. Workflows operate the same way -- predefined steps, predictable results. If someone deviates, the show breaks.

Improv has a foundational principle: "yes, and..." One performer makes an offer. The next accepts it and builds on it. The scene emerges from what is discovered in the moment, not from what was planned beforehand. The performers are skilled, but the path is not scripted. What matters is their ability to respond to what actually happened, not what was supposed to happen.

Agent flows work the same way. The ReAct pattern (Reasoning + Acting), introduced in 2022 by Yao et al. of Princeton and Google Research, formalises exactly this. The agent reasons about the current state, takes an action, observes the result, then reasons again. Each step builds on what was discovered, not what was planned. Plan, act, observe, revise. "Yes, and..."
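The reason, act, observe loop fits in a few lines. This is a toy sketch of the loop, not the ReAct paper's implementation: `scripted_reasoner` is a hypothetical stand-in for an LLM call, and the two tools are stubs returning fixed strings. The point is that each action is chosen after seeing the previous observation, not taken from a fixed list of steps.

```python
from typing import Callable

# Stub tools (hypothetical); a real agent would call external systems.
TOOLS: dict[str, Callable[[str], str]] = {
    "fetch_filing": lambda q: "revenue restated in Q3",
    "search_news": lambda q: "regulator opened inquiry",
}


def scripted_reasoner(history: list[tuple[str, str, str]]) -> tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action, argument) given history."""
    if not history:
        return ("Start with the filings.", "fetch_filing", "target annual report")
    last_observation = history[-1][2]
    if "restated" in last_observation:
        # What was found at step one changes what happens at step two.
        return ("A restatement warrants a news check.", "search_news", "restatement")
    return ("Enough evidence gathered.", "finish", last_observation)


def react_loop(max_steps: int = 5) -> tuple[str, list[tuple[str, str, str]]]:
    history: list[tuple[str, str, str]] = []  # (thought, action, observation)
    for _ in range(max_steps):
        thought, action, arg = scripted_reasoner(history)  # reason
        if action == "finish":
            return arg, history
        observation = TOOLS[action](arg)  # act
        history.append((thought, action, observation))  # observe
    return "step budget exhausted", history


answer, trace = react_loop()
print(answer)      # regulator opened inquiry
print(len(trace))  # 2
```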

But improv is not chaos. The best improv troupes have structure: scene formats, character conventions, time limits, a shared understanding of what makes a scene work. The structure enables the freedom. Remove the structure and you get noise, not creativity.

Agent flows need the same kind of disciplined structure. Without it, "yes, and..." becomes "yes, and... we have no idea what happened or why."

Framework: Three Engineering Shifts for Agent Flows

Moving from workflows to agent flows requires three foundational changes. I have not seen an enterprise succeed at agentic AI without addressing all three.

Shift 1: State and Memory as First-Class Citizens

Workflows pass data between steps. Agent flows accumulate context.

An agent processing a complex insurance claim needs to remember: which documents it has reviewed, what inconsistencies it found, which external databases it queried, what hypotheses it formed and discarded. This is not a variable passed between tasks. It is a working memory that grows and evolves during execution.

Practically, this means persistent state stores -- not just message queues. Conversation-level memory, task-level memory, and cross-session memory are three different concerns, each requiring different storage and retrieval strategies. I'm not sure the industry has settled on the right abstractions here yet. Most teams I talk to are building custom state management, which is expensive and fragile.
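One way to keep the three concerns separate is sketched below. The class and method names are hypothetical, and plain in-memory structures stand in for what would, in production, be different stores. The sketch only shows the separation of lifetimes: conversation memory is cleared per session, task memory per task, and cross-session memory survives both, receiving only a compact summary.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical split of agent memory into three lifetimes."""
    conversation: list[str] = field(default_factory=list)     # cleared per session
    task: dict[str, list[str]] = field(default_factory=dict)  # cleared per task
    long_term: dict[str, str] = field(default_factory=dict)   # survives both

    def note(self, kind: str, item: str) -> None:
        # kind examples: "reviewed_docs", "inconsistencies", "hypotheses_discarded"
        self.task.setdefault(kind, []).append(item)

    def end_task(self, claim_id: str, outcome: str) -> None:
        # Promote only a compact summary into cross-session memory.
        self.long_term[claim_id] = outcome
        self.task.clear()

    def end_session(self) -> None:
        self.conversation.clear()


mem = AgentMemory()
mem.conversation.append("user: please check claim C-42")
mem.note("reviewed_docs", "police_report.pdf")
mem.note("hypotheses_discarded", "duplicate claim")
mem.end_task("C-42", "approved after document review")
mem.end_session()
print(mem.task, mem.conversation, mem.long_term)
# {} [] {'C-42': 'approved after document review'}
```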

Shift 2: Tool Contracts Instead of Ad-Hoc Integrations

Workflows call APIs. Agent flows negotiate with tools.

The difference matters. When a workflow calls an API, the integration is hard-coded: endpoint, payload format, authentication, error handling -- all baked in at development time. When an agent needs to interact with a system, it needs to discover what tools are available, understand their capabilities and constraints, invoke them correctly, and handle failures gracefully.

The Model Context Protocol (MCP) is the most promising effort to standardise this. MCP defines a contract: here is what this tool does, here are its inputs and outputs, here are its constraints. The agent does not need a custom integration for every system. It needs a protocol that lets it discover and use tools dynamically.
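To make "tool contract" concrete, here is a hand-rolled sketch; it is not the MCP SDK. The descriptor is loosely modeled on how MCP advertises a tool (a `name`, a `description`, and a JSON Schema `inputSchema`), while the `ToolRegistry` class, the `check_coverage` tool and the simplified type check are hypothetical. The agent lists what is available, then validates its arguments against the declared contract before invoking anything.

```python
from typing import Any, Callable

# Simplified mapping from JSON Schema type names to Python types.
JSON_TYPES: dict[str, Any] = {"string": str, "number": (int, float), "boolean": bool}


class ToolRegistry:
    """Hypothetical registry: tools are described by contracts, not wired in."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[dict, Callable[..., Any]]] = {}

    def register(self, contract: dict, fn: Callable[..., Any]) -> None:
        self._tools[contract["name"]] = (contract, fn)

    def list_tools(self) -> list[dict]:
        # Discovery: the agent reads contracts instead of hard-coded endpoints.
        return [contract for contract, _ in self._tools.values()]

    def call(self, name: str, args: dict) -> Any:
        contract, fn = self._tools[name]
        schema = contract["inputSchema"]
        for key in schema.get("required", []):
            if key not in args:
                raise ValueError(f"{name}: missing required argument '{key}'")
        for key, value in args.items():
            spec = schema["properties"].get(key)
            if spec is None:
                raise ValueError(f"{name}: unexpected argument '{key}'")
            # bool is a subclass of int in Python, so exclude it from "number".
            wrong_bool = isinstance(value, bool) and spec["type"] != "boolean"
            if wrong_bool or not isinstance(value, JSON_TYPES[spec["type"]]):
                raise TypeError(f"{name}: '{key}' must be of type {spec['type']}")
        return fn(**args)


registry = ToolRegistry()
registry.register(
    {
        "name": "check_coverage",
        "description": "Return whether a policy covers a given peril.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "policy_id": {"type": "string"},
                "peril": {"type": "string"},
            },
            "required": ["policy_id", "peril"],
        },
    },
    # Stub implementation; a real tool would query the policy system.
    lambda policy_id, peril: peril in {"theft", "water damage"},
)

print([tool["name"] for tool in registry.list_tools()])  # ['check_coverage']
print(registry.call("check_coverage", {"policy_id": "P-1", "peril": "theft"}))  # True
```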

This is a genuine shift in integration architecture. Instead of point-to-point connectors maintained by platform teams, you get standardised tool contracts that agents can discover at runtime. The tradeoff is real, though -- dynamic tool discovery introduces latency and failure modes that static integrations avoid.

Shift 3: Evaluation and Tracing as Part of Development

Workflows have unit tests. Agent flows need evaluation harnesses.

You cannot unit-test an agent the way you test a function. The agent's behaviour depends on the sequence of observations it makes, the state it accumulates, and the reasoning it applies -- all of which vary between runs. An agent that produces the right answer through wrong reasoning is a time bomb.
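A minimal sketch of what an evaluation harness adds over a unit test, assuming a hypothetical `run_agent` stub that returns both an answer and the list of tools it called. Each case is run several times, and the answer and the reasoning path are scored separately: a run only fully passes if the answer is right *and* every required evidence-gathering step appears, so a correct answer reached by skipping the policy lookup is counted as a failure.

```python
from dataclasses import dataclass


@dataclass
class Run:
    answer: str
    tools_called: list[str]


def run_agent(claim: dict, attempt: int) -> Run:
    """Hypothetical stub agent. On every third attempt it skips the policy
    lookup but still happens to return the expected answer."""
    if attempt % 3 == 2:
        return Run("approve", ["assess_damage"])
    return Run("approve", ["lookup_policy", "assess_damage"])


def evaluate(case: dict, runs: int = 6) -> dict[str, float]:
    """Run one case repeatedly; score the answer and the reasoning path separately."""
    right_answer = full_pass = 0
    for attempt in range(runs):
        result = run_agent(case["claim"], attempt)
        if result.answer != case["expected_answer"]:
            continue
        right_answer += 1
        # A full pass also requires every mandatory evidence-gathering step.
        if set(case["required_tools"]) <= set(result.tools_called):
            full_pass += 1
    return {"answer_accuracy": right_answer / runs, "full_pass_rate": full_pass / runs}


case = {
    "claim": {"id": "C-7", "item": "phone screen"},
    "expected_answer": "approve",
    "required_tools": ["lookup_policy", "assess_damage"],
}
report = evaluate(case)
print(report)  # answer_accuracy is 1.0, but full_pass_rate is only 4/6
```

A plain unit test on the final answer would pass all six runs here; the harness exposes the two runs that got lucky.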

Tracing every reasoning step, tool invocation, and state mutation is not a debugging afterthought. It is a development requirement. When your claims processing agent approves a payout it should not have, you need to reconstruct the full decision chain: what data did it see, what tools did it call, what reasoning led to the approval, and where did that reasoning go wrong.
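The sketch below shows the minimum a trace needs to support that reconstruction: every reasoning step, tool call and state change is appended as a structured event tagged with a decision ID and a step number. `Tracer`, `Event` and the event kinds are hypothetical names; a production system would more likely send these events to a tracing backend than keep them in a Python list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Event:
    decision_id: str
    step: int
    kind: str  # "reasoning" | "tool_call" | "state_change" | "decision"
    detail: dict[str, Any]


@dataclass
class Tracer:
    events: list[Event] = field(default_factory=list)

    def log(self, decision_id: str, kind: str, **detail: Any) -> None:
        # Step numbers are per decision, so each chain reads 0, 1, 2, ...
        step = sum(1 for e in self.events if e.decision_id == decision_id)
        self.events.append(Event(decision_id, step, kind, detail))

    def decision_chain(self, decision_id: str) -> list[dict[str, Any]]:
        # Reconstruct, in order, everything that led to one decision.
        return [asdict(e) for e in self.events if e.decision_id == decision_id]


tracer = Tracer()
tracer.log("payout-991", "tool_call", tool="lookup_policy", result={"deductible": 500})
tracer.log("payout-991", "reasoning", text="Damage estimate exceeds deductible.")
tracer.log("payout-991", "state_change", key="approved_amount", value=1200)
tracer.log("payout-991", "decision", outcome="approve")

for event in tracer.decision_chain("payout-991"):
    print(json.dumps(event))
```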

OpenClaw encodes agent flows as plan, tool, verify, continue; NanoClaw workers are bounded executors scoped to single tool domains. This pattern -- explicit verification after every tool call, bounded execution scope per worker -- addresses both the tracing and the blast-radius problems simultaneously.
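The two ideas, verification after every tool call and a bounded execution scope, can be illustrated together. This is a hypothetical sketch, not OpenClaw's or NanoClaw's actual API: `BoundedWorker` receives a plan produced elsewhere, only accepts tools from its own domain, stops after a fixed step budget, and runs a verifier after each tool call before it is allowed to continue.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Tool = Callable[[dict], Any]
Verifier = Callable[[Any], bool]


@dataclass
class BoundedWorker:
    """Hypothetical worker scoped to one tool domain, with a hard step budget."""
    domain: str
    tools: dict[str, tuple[Tool, Verifier]]  # tool name -> (tool, verifier)
    max_steps: int = 3
    log: list[str] = field(default_factory=list)

    def run(self, plan: list[tuple[str, dict]]) -> list[Any]:
        """Execute a plan produced elsewhere: tool, verify, continue."""
        results: list[Any] = []
        for step, (name, args) in enumerate(plan):
            if step >= self.max_steps:
                self.log.append("stopped: step budget exhausted")
                break
            if name not in self.tools:  # limit the blast radius
                raise PermissionError(f"{name} is outside domain '{self.domain}'")
            tool, verify = self.tools[name]
            result = tool(args)
            if not verify(result):  # verify before continuing
                self.log.append(f"verification failed after {name}")
                break
            self.log.append(f"ok: {name}")
            results.append(result)
        return results


billing = BoundedWorker(
    domain="billing",
    tools={
        "get_invoice": (lambda a: {"id": a["id"], "total": 120.0},
                        lambda r: r["total"] >= 0),
        "get_payments": (lambda a: [],  # stub: no payments on record
                         lambda r: len(r) > 0),
    },
)
results = billing.run([("get_invoice", {"id": "INV-1"}),
                       ("get_payments", {"invoice_id": "INV-1"})])
print(results)      # [{'id': 'INV-1', 'total': 120.0}]
print(billing.log)  # ['ok: get_invoice', 'verification failed after get_payments']

try:
    billing.run([("update_policy", {"id": "P-1"})])
except PermissionError as err:
    print(err)      # update_policy is outside domain 'billing'
```

Scoping the worker to one domain means a bad plan fails loudly at the boundary instead of reaching systems the worker was never meant to touch.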

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph TD
    S1["Shift 1: State + Memory"]
    S2["Shift 2: Tool Contracts"]
    S3["Shift 3: Eval + Tracing"]
    S1 --- CORE["Agent Flow\nEngineering"]
    S2 --- CORE
    S3 --- CORE
    S1D["Persistent context\nWorking memory\nCross-session recall"]
    S2D["MCP protocol\nDynamic discovery\nStandardised interfaces"]
    S3D["Reasoning traces\nStep-level verification\nEval harnesses"]
    S1 --> S1D
    S2 --> S2D
    S3 --> S3D
    style CORE fill:#1a2540,stroke:#ffffff,color:#00d4ff,stroke-width:2px
    style S1 fill:#1a2540,stroke:#00d4ff,color:#00d4ff,stroke-width:2px
    style S2 fill:#1a2540,stroke:#ffb347,color:#ffb347,stroke-width:2px
    style S3 fill:#1a2540,stroke:#00ff88,color:#00ff88,stroke-width:2px
    style S1D fill:#1a2540,stroke:#ffffff,color:#ffffff
    style S2D fill:#1a2540,stroke:#ffffff,color:#ffffff
    style S3D fill:#1a2540,stroke:#ffffff,color:#ffffff

Application: Lemonade's Claims Pipeline -- Workflow to Agent Flow

Lemonade Insurance's AI Jim is the clearest public example of a claims pipeline built as an agent flow rather than a workflow. The traditional insurance process is a classic linear workflow: receive claim, validate policy, assess damage, calculate payout, approve, pay. Industry average: days to weeks.

The agent flow version works differently. AI Jim receives the claim and plans its approach based on claim characteristics. A straightforward claim -- broken phone screen, stolen bicycle -- gets fast-tracked: policy validation, damage assessment, payout calculation, all in one pass. A complex or high-value incident triggers a different plan: additional verification, fraud pattern matching, human adjuster routing.
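The branching described above amounts to choosing a plan from the claim's characteristics before doing any work. The sketch is hypothetical and is not Lemonade's logic: the claim types, the 1,000 threshold and the step names are illustrative assumptions.

```python
from dataclasses import dataclass

# Illustrative assumptions only, not Lemonade's actual rules.
SIMPLE_TYPES = {"phone_screen", "bicycle_theft"}
HIGH_VALUE = 1_000.0


@dataclass
class Claim:
    claim_type: str
    amount: float


def plan_for(claim: Claim) -> list[str]:
    """Choose the plan from the claim's characteristics before doing any work."""
    if claim.claim_type in SIMPLE_TYPES and claim.amount < HIGH_VALUE:
        # Fast track: everything in one pass, no human involved.
        return ["validate_policy", "assess_damage", "calculate_payout", "pay"]
    # Complex or high-value: extra checks, then a human adjuster.
    return ["validate_policy", "extra_verification", "fraud_pattern_match",
            "assess_damage", "route_to_adjuster"]


print(plan_for(Claim("phone_screen", 180.0)))
# ['validate_policy', 'assess_damage', 'calculate_payout', 'pay']
print(plan_for(Claim("water_damage", 12_000.0)))
# ['validate_policy', 'extra_verification', 'fraud_pattern_match', 'assess_damage', 'route_to_adjuster']
```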

The critical difference is what happens when something unexpected surfaces. In a traditional workflow, an inconsistency between the claimant's account and the evidence would route to a human queue and wait. In Lemonade's agent flow, AI Jim revises its plan: it pulls additional data, runs a second analysis, and either resolves the inconsistency or escalates with a structured summary of exactly what does not add up and why.
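The resolve-or-escalate branch can be sketched as a bounded revision loop. Everything here is hypothetical: `fetch_more_evidence` stands in for pulling additional data, the consistency check is a toy field-by-field comparison, and `Escalation` is one possible shape for a structured summary of what does not add up, not Lemonade's format.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Escalation:
    """Hypothetical structured brief handed to a human adjuster."""
    claim_id: str
    mismatches: list[str]
    evidence_checked: list[str]


def find_mismatches(account: dict, evidence: dict) -> list[str]:
    # Toy consistency check: fields present in both sources that disagree.
    return [f"{k}: claimant says {account[k]!r}, evidence says {evidence[k]!r}"
            for k in account if k in evidence and account[k] != evidence[k]]


def resolve_or_escalate(
    claim_id: str,
    account: dict,
    evidence: dict,
    fetch_more_evidence: Callable[[list[str]], Optional[dict]],
    max_revisions: int = 2,
) -> Union[str, Escalation]:
    checked = ["initial evidence"]
    mismatches = find_mismatches(account, evidence)
    for _ in range(max_revisions):
        if not mismatches:
            break
        extra = fetch_more_evidence(mismatches)  # revise plan: pull more data
        if extra is None:
            break  # no further sources available
        checked.append(extra.get("source", "additional data"))
        new_facts = {k: v for k, v in extra.items() if k != "source"}
        evidence = {**evidence, **new_facts}
        mismatches = find_mismatches(account, evidence)  # second analysis
    if not mismatches:
        return "resolved"
    return Escalation(claim_id, mismatches, checked)


# Usage with one hypothetical extra source that confirms the original evidence.
extra_sources = [{"source": "store receipt", "date": "2024-03-01"}]


def fetch_next_source(_mismatches: list[str]) -> Optional[dict]:
    return extra_sources.pop(0) if extra_sources else None


result = resolve_or_escalate(
    "C-88",
    account={"date": "2024-03-02", "location": "home"},
    evidence={"date": "2024-03-01", "location": "home"},
    fetch_more_evidence=fetch_next_source,
)
print(result)
```

Here the extra source still contradicts the claimant's date, so the claim escalates with both the mismatch and the list of sources already checked, rather than as a raw file.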

Lemonade's published figures: 55% of claims processed fully autonomously, 96% of first notices of loss handled by AI, and a record of a single claim settled in 2 seconds. Complex claims still route to human adjusters, but with a pre-analysed brief instead of a raw file. The pattern matches what I keep seeing across the industry: the hardest part is not building the agent but building the state management and tracing infrastructure that makes the agent trustworthy.

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
flowchart LR
    subgraph OLD ["Old: Linear Workflow"]
        O1["Receive"] --> O2["Validate"] --> O3["Assess"] --> O4["Calculate"] --> O5["Approve"] --> O6["Pay"]
    end
    subgraph NEW ["New: Agent Flow"]
        N1["Receive + Plan"] --> N2["Execute Tools"]
        N2 --> N3{"Unexpected\nfinding?"}
        N3 -->|"Yes"| N4["Revise Plan"]
        N4 --> N2
        N3 -->|"No"| N5["Resolve or\nEscalate"]
    end
    style OLD fill:#2a1a1a,stroke:#ff6b6b,color:#ffffff
    style NEW fill:#0a2a1e,stroke:#00ff88,color:#ffffff
    style N1 fill:#1a2540,stroke:#00d4ff,color:#00d4ff
    style N2 fill:#1a2540,stroke:#ffb347,color:#ffb347
    style N3 fill:#1a2540,stroke:#ffffff,color:#ffffff
    style N4 fill:#1a2540,stroke:#00d4ff,color:#00d4ff
    style N5 fill:#1a2540,stroke:#00ff88,color:#00ff88
    style O1 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style O2 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style O3 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style O4 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style O5 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
    style O6 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b

Implication: Stop Automating the Script

The shift from workflows to agent flows is not incremental. It is architectural. State management, tool contracts, and evaluation infrastructure are not nice-to-haves bolted on later -- they are prerequisites. Enterprises that treat agent flows as "smarter workflows" will hit the same ceiling that MIT Sloan identified: isolated pilots that never scale. The ones that redesign the process architecture -- that move from "follow the script" to "yes, and..." -- will build systems that handle uncertainty by design. That is where the compounding value lives.



Daniel Piatkowski, Data & Analytics veteran shaping AI-native enterprises (elicify.ai)