Most Enterprise "AI Automation" Is Just Faster Typing
Enterprises spent the last decade encoding business logic into workflows. Step A, then B, then C. Approvals at fixed gates. Exception handling via predefined branches. BPMN diagrams on whiteboards, translated into Airflow DAGs or Power Automate flows.
Now those same enterprises are plugging LLMs into these workflows and calling it AI transformation. It is not. It is the equivalent of bolting a jet engine onto a rail cart. The engine is powerful. The cart still only goes where the rails point.
MIT Sloan's research on scaling AI found that most organisations struggle to move beyond isolated pilots precisely because they bolt AI onto existing processes rather than rethinking the process shape itself. The constraint is not model capability. It is process architecture.
Diagnosis: Why Workflows Break Under Uncertainty
Classic workflows encode certainty. They assume you know the path before you start walking it. That assumption held when the work was predictable -- invoice processing, order fulfilment, compliance checks with stable rule sets.
Agent-driven work is different. Consider a due diligence review for a potential acquisition. The agent needs to pull financial filings, cross-reference them against news sentiment, identify regulatory risks specific to the target's operating jurisdictions, and flag inconsistencies that warrant deeper investigation. The sequence is not predetermined. What the agent finds at step two changes what it should do at step three.
Three specific breakdowns happen when you force this kind of work into a traditional workflow:
State amnesia. Classic workflows carry little state between steps. Each task gets inputs, produces outputs, and passes them forward. But an agent reasoning over a complex problem needs memory -- what it tried, what failed, what context it accumulated. A claims processing agent that cannot remember it already checked the policy database is going to check it again. And again.
Brittle integrations. Workflows connect to systems through hard-coded API calls. When the agent needs a tool it was not pre-wired to use, it stops. There is no protocol for "I need to look something up I was not designed to look up." Every new capability requires a developer to add a connector and redeploy.
Invisible reasoning. Workflows log task completion. Agent flows need to log why. What did the agent consider? What alternatives did it reject? When a workflow fails, you check the error log. When an agent flow produces a wrong answer, you need the full reasoning trace. Most workflow engines do not capture this.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph LR
subgraph WORKFLOW ["Classic Workflow"]
W1["Step A"] --> W2["Step B"] --> W3["Step C"] --> W4["Done"]
end
subgraph AGENTFLOW ["Agent Flow"]
A1["Plan"] --> A2["Execute Tool"]
A2 --> A3["Inspect Result"]
A3 -->|"Revise plan"| A1
A3 -->|"Continue"| A4["Next Action"]
A4 --> A5["Inspect Result"]
A5 -->|"Revise plan"| A1
A5 -->|"Done"| A6["Complete"]
end
style WORKFLOW fill:#2a1a1a,stroke:#ff6b6b,color:#ffffff
style AGENTFLOW fill:#0a2a1e,stroke:#00ff88,color:#ffffff
style W1 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style W2 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style W3 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style W4 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style A1 fill:#1a2540,stroke:#00d4ff,color:#00d4ff
style A2 fill:#1a2540,stroke:#ffb347,color:#ffb347
style A3 fill:#1a2540,stroke:#00ff88,color:#00ff88
style A4 fill:#1a2540,stroke:#ffb347,color:#ffb347
style A5 fill:#1a2540,stroke:#00ff88,color:#00ff88
style A6 fill:#1a2540,stroke:#00ff88,color:#00ff88
Reframe: "Yes, And..." -- What Improv Theatre Teaches About Agent Design
Here is the connection that reframed this problem for me: improv theatre.
In scripted theatre, every actor follows the script. Lines are memorised. Blocking is rehearsed. The director controls the outcome. Workflows operate the same way -- predefined steps, predictable results. If someone deviates, the show breaks.
Improv has a foundational principle: "yes, and..." One performer makes an offer. The next accepts it and builds on it. The scene emerges from what is discovered in the moment, not from what was planned beforehand. The performers are skilled, but the path is not scripted. What matters is their ability to respond to what actually happened, not what was supposed to happen.
Agent flows work the same way. The ReAct pattern -- Reasoning + Acting, introduced by Yao et al. at Princeton and Google Research -- formalises exactly this. The agent reasons about the current state, takes an action, observes the result, then reasons again. Each step builds on what was discovered, not what was planned. Plan, act, observe, revise. "Yes, and..."
But improv is not chaos. The best improv troupes have structure: scene formats, character conventions, time limits, a shared understanding of what makes a scene work. The structure enables the freedom. Remove the structure and you get noise, not creativity.
Agent flows need the same kind of disciplined structure. Without it, "yes, and..." becomes "yes, and... we have no idea what happened or why."
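To make the loop concrete, here is a minimal sketch of a reason-act-observe cycle in Python. It is an illustration under loud assumptions, not the ReAct authors' code: `scripted_policy` stands in for an LLM call, and the tools `lookup_filings` and `search_news` are hypothetical stubs that return canned strings. The `max_steps` budget plays the role of the improv time limit: structure that bounds the freedom.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical tools for illustration; a real agent would call external systems.
def lookup_filings(company: str) -> str:
    return f"{company}: revenue restated in 2023"

def search_news(topic: str) -> str:
    return f"news on '{topic}': regulator opened an inquiry"

TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_filings": lookup_filings,
    "search_news": search_news,
}

@dataclass
class Step:
    thought: str
    action: str
    argument: str
    observation: str

def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, argument).

    The next action depends on the previous observation, not on a fixed sequence.
    """
    if not history:
        return ("Start with the filings.", "lookup_filings", goal)
    last = history[-1]
    if last.action == "lookup_filings" and "restated" in last.observation:
        return ("A restatement is a red flag; check the news.",
                "search_news", f"{goal} restatement")
    return ("Enough evidence gathered.", "finish", "flag for deeper review")

def run_agent(goal: str, policy=scripted_policy, max_steps: int = 5):
    """Reason, act, observe, repeat, within a hard step budget."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(goal, history)   # reason
        if action == "finish":
            return argument, history
        observation = TOOLS[action](argument)                # act
        history.append(Step(thought, action, argument, observation))  # observe
    return "stopped: step budget exhausted", history

result, trace = run_agent("Acme Corp")
budget_result, budget_trace = run_agent("Acme Corp", max_steps=1)
```

The second call shows the guardrail: with a budget of one step the agent stops rather than wandering, and the partial history is still returned for inspection.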
Framework: Three Engineering Shifts for Agent Flows
Moving from workflows to agent flows requires three foundational changes. I have not seen an enterprise succeed at agentic AI without addressing all three.
Shift 1: State and Memory as First-Class Citizens
Workflows pass data between steps. Agent flows accumulate context.
An agent processing a complex insurance claim needs to remember: which documents it has reviewed, what inconsistencies it found, which external databases it queried, what hypotheses it formed and discarded. This is not a variable passed between tasks. It is a working memory that grows and evolves during execution.
Practically, this means persistent state stores -- not just message queues. Conversation-level memory, task-level memory, and cross-session memory are three different concerns, each requiring different storage and retrieval strategies. I'm not sure the industry has settled on the right abstractions here yet. Most teams I talk to are building custom state management, which is expensive and fragile.
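As one possible shape for task-level working memory, consider the sketch below. Every name in it (`TaskMemory`, `recall_or_call`, `save_snapshot`, `load_snapshot`, `check_policy_db`) is hypothetical, and the JSON snapshot is only a placeholder for a real persistent store. It covers a single concern, task-level memory; conversation-level and cross-session memory would need their own storage and retrieval strategies.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Callable, Dict, List

@dataclass
class TaskMemory:
    """Working memory for one task: it grows as the agent works."""
    reviewed_documents: List[str] = field(default_factory=list)
    tool_results: Dict[str, Any] = field(default_factory=dict)
    discarded_hypotheses: List[str] = field(default_factory=list)

    def recall_or_call(self, key: str, tool: Callable[[], Any]) -> Any:
        """Return a remembered tool result; call the tool only on first use."""
        if key not in self.tool_results:
            self.tool_results[key] = tool()
        return self.tool_results[key]

def save_snapshot(memory: TaskMemory) -> str:
    """Serialise memory so a later step or session can restore it."""
    return json.dumps(asdict(memory))

def load_snapshot(blob: str) -> TaskMemory:
    return TaskMemory(**json.loads(blob))

# Hypothetical tool that records how often it is actually invoked.
calls: List[str] = []

def check_policy_db() -> Dict[str, bool]:
    calls.append("policy_db")
    return {"policy_active": True}

memory = TaskMemory()
memory.reviewed_documents.append("claim_form.pdf")
memory.recall_or_call("policy_db:claim-123", check_policy_db)
memory.recall_or_call("policy_db:claim-123", check_policy_db)  # served from memory
memory.discarded_hypotheses.append("duplicate claim")

restored = load_snapshot(save_snapshot(memory))
```

The policy database is queried once even though the agent asks twice, which is the "check it again, and again" failure avoided. The restored copy carries the reviewed documents and discarded hypotheses with it.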
Shift 2: Tool Contracts Instead of Ad-Hoc Integrations
Workflows call APIs. Agent flows negotiate with tools.
The difference matters. When a workflow calls an API, the integration is hard-coded: endpoint, payload format, authentication, error handling -- all baked in at development time. When an agent needs to interact with a system, it needs to discover what tools are available, understand their capabilities and constraints, invoke them correctly, and handle failures gracefully.
The Model Context Protocol (MCP) is the most promising effort to standardise this. MCP defines a contract: here is what this tool does, here are its inputs and outputs, here are its constraints. The agent does not need a custom integration for every system. It needs a protocol that lets it discover and use tools dynamically.
This is a genuine shift in integration architecture. Instead of point-to-point connectors maintained by platform teams, you get standardised tool contracts that agents can discover at runtime. The tradeoff is real, though -- dynamic tool discovery introduces latency and failure modes that static integrations avoid.
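The idea of a tool contract can be sketched in plain Python. To be clear, this is not the MCP SDK or its wire format; `ToolContract`, `ToolRegistry`, and the `get_policy` tool are hypothetical names chosen for illustration. The sketch shows the three behaviours the text describes: discovery at runtime, validation of inputs against the declared contract, and failures returned as data the agent can reason about.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class ToolContract:
    name: str
    description: str
    required_inputs: Dict[str, type]   # parameter name -> expected Python type
    handler: Callable[..., Any]

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolContract] = {}

    def register(self, contract: ToolContract) -> None:
        self._tools[contract.name] = contract

    def discover(self) -> List[Dict[str, Any]]:
        """What an agent sees at runtime: contracts, not implementations."""
        return [
            {
                "name": t.name,
                "description": t.description,
                "inputs": {k: v.__name__ for k, v in t.required_inputs.items()},
            }
            for t in self._tools.values()
        ]

    def invoke(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        """Validate inputs against the contract, then call the handler."""
        tool = self._tools.get(name)
        if tool is None:
            return {"ok": False, "error": f"unknown tool: {name}"}
        for param, expected in tool.required_inputs.items():
            if param not in kwargs:
                return {"ok": False, "error": f"missing input: {param}"}
            if not isinstance(kwargs[param], expected):
                return {"ok": False, "error": f"{param} must be {expected.__name__}"}
        try:
            return {"ok": True, "result": tool.handler(**kwargs)}
        except Exception as exc:  # surface handler failures instead of crashing the agent
            return {"ok": False, "error": str(exc)}

registry = ToolRegistry()
registry.register(ToolContract(
    name="get_policy",
    description="Fetch a policy record by its identifier.",
    required_inputs={"policy_id": str},
    handler=lambda policy_id: {"policy_id": policy_id, "status": "active"},
))

catalog = registry.discover()
good = registry.invoke("get_policy", policy_id="P-42")
bad = registry.invoke("get_policy", policy_id=42)
missing = registry.invoke("get_claim", claim_id="C-1")
```

Adding a capability here means registering one more contract, not redeploying the agent. The cost is the extra lookup and validation on every call, which is the latency tradeoff mentioned above in miniature.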
Shift 3: Evaluation and Tracing as Part of Development
Workflows have unit tests. Agent flows need evaluation harnesses.
You cannot unit-test an agent the way you test a function. The agent's behaviour depends on the sequence of observations it makes, the state it accumulates, and the reasoning it applies -- all of which vary between runs. An agent that produces the right answer through wrong reasoning is a time bomb.
Tracing every reasoning step, tool invocation, and state mutation is not a debugging afterthought. It is a development requirement. When your claims processing agent approves a payout it should not have, you need to reconstruct the full decision chain: what data did it see, what tools did it call, what reasoning led to the approval, and where did that reasoning go wrong.
OpenClaw encodes agent flows as plan, tool, verify, continue; NanoClaw workers are bounded executors scoped to single tool domains. This pattern -- explicit verification after every tool call, bounded execution scope per worker -- addresses both the tracing and the blast-radius problems simultaneously.
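A rough sketch of what step-level tracing plus one evaluation check might look like follows. It is not taken from any of the frameworks named above; `Tracer`, `TraceEvent`, and `tool_called_before_decision` are hypothetical names, and the in-memory list stands in for a real trace store. The check at the end is the kind of assertion an evaluation harness could run: it asks whether the reasoning path was sound, not only whether the final answer was.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import Any, Callable, Dict, List, Optional

@dataclass
class TraceEvent:
    step: int
    kind: str                 # "reasoning", "tool_call", or "state_change"
    detail: Dict[str, Any]
    timestamp: float = field(default_factory=time.time)

class Tracer:
    def __init__(self) -> None:
        self.events: List[TraceEvent] = []

    def record(self, kind: str, **detail: Any) -> None:
        self.events.append(TraceEvent(len(self.events) + 1, kind, detail))

    def call_tool(self, name: str, tool: Callable[..., Any], **kwargs: Any) -> Any:
        """Run a tool and log its inputs and output (or error) as one event."""
        try:
            result = tool(**kwargs)
        except Exception as exc:
            self.record("tool_call", tool=name, inputs=kwargs, error=repr(exc))
            raise
        self.record("tool_call", tool=name, inputs=kwargs, output=result)
        return result

    def decision_chain(self, kind: Optional[str] = None) -> List[TraceEvent]:
        return [e for e in self.events if kind is None or e.kind == kind]

    def to_jsonl(self) -> str:
        """One JSON object per line, ready for whatever log store is in use."""
        return "\n".join(json.dumps(asdict(e), default=str) for e in self.events)

def tool_called_before_decision(tracer: Tracer, tool_name: str) -> bool:
    """Eval check: was the named tool called before the first state change?"""
    for event in tracer.events:
        if event.kind == "tool_call" and event.detail.get("tool") == tool_name:
            return True
        if event.kind == "state_change":
            return False
    return False

tracer = Tracer()
tracer.record("reasoning", thought="Small claim; verify the policy first.")
policy = tracer.call_tool("get_policy", lambda policy_id: {"status": "lapsed"},
                          policy_id="P-42")
tracer.record("reasoning", thought="Policy lapsed.", rejected="approve payout")
tracer.record("state_change", key="decision", value="escalate")
```

With this in place, reconstructing a bad approval is a matter of reading `tracer.decision_chain()` in order: the data seen, the tools called, the alternatives rejected, and the point where the state changed.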
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph TD
S1["Shift 1: State + Memory"]
S2["Shift 2: Tool Contracts"]
S3["Shift 3: Eval + Tracing"]
S1 --- CORE["Agent Flow\nEngineering"]
S2 --- CORE
S3 --- CORE
S1D["Persistent context\nWorking memory\nCross-session recall"]
S2D["MCP protocol\nDynamic discovery\nStandardised interfaces"]
S3D["Reasoning traces\nStep-level verification\nEval harnesses"]
S1 --> S1D
S2 --> S2D
S3 --> S3D
style CORE fill:#1a2540,stroke:#ffffff,color:#00d4ff,stroke-width:2px
style S1 fill:#1a2540,stroke:#00d4ff,color:#00d4ff,stroke-width:2px
style S2 fill:#1a2540,stroke:#ffb347,color:#ffb347,stroke-width:2px
style S3 fill:#1a2540,stroke:#00ff88,color:#00ff88,stroke-width:2px
style S1D fill:#1a2540,stroke:#ffffff,color:#ffffff
style S2D fill:#1a2540,stroke:#ffffff,color:#ffffff
style S3D fill:#1a2540,stroke:#ffffff,color:#ffffff
Application: Lemonade's Claims Pipeline -- Workflow to Agent Flow
Lemonade Insurance's AI Jim is the clearest public example of a claims pipeline redesigned from workflow to agent flow. The traditional insurance process is a classic linear workflow: receive claim, validate policy, assess damage, calculate payout, approve, pay. Industry average: days to weeks.
The agent flow version works differently. AI Jim receives the claim and plans its approach based on claim characteristics. A straightforward claim -- broken phone screen, stolen bicycle -- gets fast-tracked: policy validation, damage assessment, payout calculation, all in one pass. A complex or high-value incident triggers a different plan: additional verification, fraud pattern matching, human adjuster routing.
The critical difference is what happens when something unexpected surfaces. In a traditional workflow, an inconsistency between the claimant's account and the evidence would route to a human queue and wait. In Lemonade's agent flow, AI Jim revises its plan: it pulls additional data, runs a second analysis, and either resolves the inconsistency or escalates with a structured summary of exactly what does not add up and why.
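The plan-then-revise shape described here can be sketched as follows. This is emphatically not Lemonade's implementation, which is not public in this detail. The `Claim` fields, the `FAST_TRACK_LIMIT` threshold, the step names, and the date check in `run_step` are all assumptions made up for illustration. The sketch also simplifies the "resolve" branch: any unexpected finding ends in escalation with a summary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

FAST_TRACK_LIMIT = 1000.0  # hypothetical threshold, not a real figure

@dataclass
class Claim:
    claim_id: str
    amount: float
    description: str
    evidence: Dict[str, str] = field(default_factory=dict)  # ISO dates as strings

def initial_plan(claim: Claim) -> List[str]:
    """Choose a plan from claim characteristics instead of one fixed path."""
    if claim.amount <= FAST_TRACK_LIMIT:
        return ["validate_policy", "assess_damage", "calculate_payout"]
    return ["validate_policy", "assess_damage", "fraud_check", "route_to_adjuster"]

def run_step(step: str, claim: Claim) -> List[str]:
    """Hypothetical tool execution; returns unexpected findings, if any."""
    incident = claim.evidence.get("incident_date")
    receipt = claim.evidence.get("receipt_date")
    if step == "assess_damage" and incident and receipt and receipt > incident:
        return ["receipt is dated after the incident"]
    return []

def process(claim: Claim) -> Dict[str, Any]:
    plan = initial_plan(claim)
    completed: List[str] = []
    findings: List[str] = []
    while plan:
        step = plan.pop(0)
        completed.append(step)
        unexpected = run_step(step, claim)
        if unexpected:
            findings.extend(unexpected)
            # Revise the remaining plan rather than parking the claim in a queue.
            plan = [s for s in ("fraud_check", "route_to_adjuster")
                    if s not in completed]
    outcome = "escalated" if "route_to_adjuster" in completed else "resolved"
    return {"claim_id": claim.claim_id, "outcome": outcome,
            "steps": completed, "findings": findings}

simple = process(Claim("C-1", 250.0, "cracked phone screen",
                       {"incident_date": "2024-03-01", "receipt_date": "2023-11-15"}))
odd = process(Claim("C-2", 250.0, "stolen bicycle",
                    {"incident_date": "2024-03-01", "receipt_date": "2024-03-05"}))
```

Both claims start on the same fast-track plan. The second one leaves it only because of what the damage assessment turned up, and the adjuster receives the finding along with the steps already completed.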
The results are public. 55% of claims are processed fully autonomously. The record is a single claim settled in 2 seconds. 96% of first notices of loss are handled by AI. Complex claims still route to human adjusters, but with a pre-analysed brief instead of a raw file. The pattern confirms what I keep seeing across the industry: the hardest part is not building the agent -- it is building the state management and tracing infrastructure that makes the agent trustworthy.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
flowchart LR
subgraph OLD ["Old: Linear Workflow"]
O1["Receive"] --> O2["Validate"] --> O3["Assess"] --> O4["Calculate"] --> O5["Approve"] --> O6["Pay"]
end
subgraph NEW ["New: Agent Flow"]
N1["Receive + Plan"] --> N2["Execute Tools"]
N2 --> N3{"Unexpected\nfinding?"}
N3 -->|"Yes"| N4["Revise Plan"]
N4 --> N2
N3 -->|"No"| N5["Resolve or\nEscalate"]
end
style OLD fill:#2a1a1a,stroke:#ff6b6b,color:#ffffff
style NEW fill:#0a2a1e,stroke:#00ff88,color:#ffffff
style N1 fill:#1a2540,stroke:#00d4ff,color:#00d4ff
style N2 fill:#1a2540,stroke:#ffb347,color:#ffb347
style N3 fill:#1a2540,stroke:#ffffff,color:#ffffff
style N4 fill:#1a2540,stroke:#00d4ff,color:#00d4ff
style N5 fill:#1a2540,stroke:#00ff88,color:#00ff88
style O1 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style O2 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style O3 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style O4 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style O5 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
style O6 fill:#1a2540,stroke:#ff6b6b,color:#ff6b6b
Implication: Stop Automating the Script
The shift from workflows to agent flows is not incremental. It is architectural. State management, tool contracts, and evaluation infrastructure are not nice-to-haves bolted on later -- they are prerequisites. Enterprises that treat agent flows as "smarter workflows" will hit the same ceiling that MIT Sloan identified: isolated pilots that never scale. The ones that redesign the process architecture -- that move from "follow the script" to "yes, and..." -- will build systems that handle uncertainty by design. That is where the compounding value lives.
Sources
- Yao et al. -- ReAct: Synergizing Reasoning and Acting in Language Models (arXiv)
- MIT Sloan -- Scaling AI for Results (January 2026)
- Model Context Protocol (MCP) -- Introduction and Specification
- Lemonade Insurance: AI Jim sets new world record
Daniel Piatkowski, Data & Analytics veteran shaping AI-native enterprises, elicify.ai