Business Context
Every enterprise AI conversation eventually hits the same wall: the model works in the demo but breaks in production. The cause is rarely model quality. It is everything around the model.
The AI stack is maturing fast. In the first weeks of 2026 alone, Anthropic shipped Claude Opus 4.6 with extended thinking, Weaviate launched agent-native query skills, and Qdrant released version 1.17 with hybrid search improvements. These are not incremental updates. They signal that the industry is converging on a layered architecture for AI -- each layer solving a specific production pain that models alone cannot fix.
The Problem
Here is what I keep seeing in enterprise AI deployments. A team builds a proof of concept. The model performs well. Leadership greenlights production. Then reality hits.
The model hallucinates because it has no structured access to company data. It calls the wrong API because there is no tool registry. When it fails, nobody can trace why because there is no observability. When it succeeds, nobody can prove it used governed data because there is no semantic layer. And when the second team builds their own agent, they solve all these problems differently, creating a fragmented mess of bespoke integrations.
This is point-to-point wiring. Every agent team reinvents the wheel: their own retrieval pipeline, their own tool connectors, their own evaluation scripts. The pattern does not scale, and organisations can spend months building plumbing that should be shared infrastructure.
The fix is not better models. It is standardised layers.
The Solution: Eight Layers of the AI Stack
The networking world solved a strikingly similar problem four decades ago. Before the OSI model, every network vendor built proprietary end-to-end stacks. Connecting an IBM mainframe to a DEC minicomputer required custom bridging at every level: physical cables, packet formats, session management, application protocols. Scaling was painful because every new connection was a bespoke project.
The OSI model fixed this by defining seven standardised layers. Each layer had a clear contract with the layers above and below it. You could swap Ethernet for Token Ring at the lower layers (physical and data link) without touching the application. You could change routing protocols without rewriting your email client. The layers created composability.
The AI stack needs the same discipline. And it is starting to emerge.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
graph TB
L8["8 -- Governance
Policy enforcement, audit, compliance"]
L7["7 -- Tracing + Evals
Observability, scoring, regression detection"]
L6["6 -- Agent Runtime
Orchestration, state, multi-step execution"]
L5["5 -- Semantic Layer
Business meaning, governed metrics, context"]
L4["4 -- Retrieval
Vector DBs, hybrid search, reranking"]
L3["3 -- Tool + Data Integration
MCP, API access, structured tool use"]
L2["2 -- Routing / Gateway
Model selection, fallback, rate limiting"]
L1["1 -- Model Layer
LLMs, SLMs, fine-tuned models"]
L8 --> L7
L7 --> L6
L6 --> L5
L5 --> L4
L4 --> L3
L3 --> L2
L2 --> L1
style L8 fill:#1a2540,stroke:#ff6b6b,color:#ffffff
style L7 fill:#1a2540,stroke:#ffb347,color:#ffffff
style L6 fill:#1a2540,stroke:#00d4ff,color:#ffffff
style L5 fill:#1a2540,stroke:#00ff88,color:#ffffff
style L4 fill:#1a2540,stroke:#00d4ff,color:#ffffff
style L3 fill:#1a2540,stroke:#ffb347,color:#ffffff
style L2 fill:#1a2540,stroke:#ffffff,color:#ffffff
style L1 fill:#1a2540,stroke:#00d4ff,color:#ffffff
Each layer addresses a distinct failure mode:
Layer 1 -- Model. The foundation. Claude, GPT, Gemini, Llama, Mistral. Raw reasoning capability. This is where most attention goes, but it is the least differentiating layer for enterprises. You will swap models. Build for that.
Layer 2 -- Routing/Gateway. Model selection, fallback logic, rate limiting, cost control. When Claude is down, route to Gemini. When the query is simple, use a smaller model. This layer turns model dependency into model optionality.
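To make the routing idea concrete, here is a minimal gateway sketch in Python. It assumes each model sits behind a simple `prompt -> completion` function and that a crude word-count heuristic decides what counts as "simple". `Provider`, `route`, `ProviderError` and the tier names are hypothetical, and rate limiting and cost tracking are left out.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider call when the backend is unavailable."""


@dataclass
class Provider:
    name: str
    tier: str                   # "small" or "large" (hypothetical tiers)
    call: Callable[[str], str]  # prompt -> completion


def route(prompt: str, providers: list[Provider], is_simple: Callable[[str], bool]) -> tuple[str, str]:
    """Try the preferred tier first, then fall back through the remaining providers."""
    preferred = "small" if is_simple(prompt) else "large"
    ordered = [p for p in providers if p.tier == preferred] + [p for p in providers if p.tier != preferred]
    errors = []
    for provider in ordered:
        try:
            return provider.name, provider.call(prompt)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")  # record the failure, try the next provider
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical providers standing in for real model clients.
def primary_down(prompt: str) -> str:
    raise ProviderError("503 from primary")


def backup_large(prompt: str) -> str:
    return "detailed answer"


def small_model(prompt: str) -> str:
    return "short answer"


def is_simple(prompt: str) -> bool:
    return len(prompt.split()) < 8  # crude heuristic; a real gateway might use a classifier


providers = [
    Provider("large-primary", "large", primary_down),
    Provider("large-backup", "large", backup_large),
    Provider("small", "small", small_model),
]
print(route("Summarise this 40-page contract and flag any unusual indemnity clauses", providers, is_simple))
print(route("What is our refund window?", providers, is_simple))
```

The design choice worth noting is that callers only ever see `route`: swapping or adding a model means editing the provider list, not the agents.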
Layer 3 -- Tool + Data Integration. This is where MCP (Model Context Protocol) lives. Anthropic's open standard gives models a uniform way to discover and call tools -- databases, APIs, file systems -- through a consistent interface. Before MCP, every tool integration was custom. MCP is doing for AI tool use what HTTP did for web communication: creating a shared protocol so the ecosystem can compose.
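MCP messages are JSON-RPC 2.0, and tool use comes down to two methods: `tools/list` to discover what a server offers and `tools/call` to invoke a tool by name with JSON arguments. The sketch below fakes both sides in one process to show that exchange. The `lookup_account` tool and the `handle` function are hypothetical, and a real session would also perform the `initialize` handshake and run over a transport such as stdio rather than a direct function call.

```python
import json

# Hypothetical tool that a CRM-backed MCP server might expose.
TOOLS = {
    "lookup_account": {
        "description": "Return basic details for a CRM account by ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"account_id": {"type": "string"}},
            "required": ["account_id"],
        },
        "handler": lambda args: f"Account {args['account_id']}: status=active",
    }
}


def handle(raw: str) -> str:
    """Toy in-process server answering tools/list and tools/call over JSON-RPC 2.0."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        text = tool["handler"](req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is the standard JSON-RPC "Method not found" error code.
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


# Client side: discover the tools, then call one by name with JSON arguments.
listing = json.loads(handle(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
print([t["name"] for t in listing["result"]["tools"]])
call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "lookup_account", "arguments": {"account_id": "A-42"}}}
reply = json.loads(handle(json.dumps(call)))
print(reply["result"]["content"][0]["text"])
```

Because the model discovers tools and their input schemas at runtime, adding a new data source means standing up another server, not changing the agent.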
Layer 4 -- Retrieval. Vector databases like Weaviate and Qdrant. This layer turns unstructured knowledge into queryable context. Weaviate's new agent skills aim to push retrieval closer to autonomous operation, with the database helping to interpret query intent rather than only returning matches. Qdrant's 1.17 release refines hybrid search, which combines dense vector similarity with sparse keyword matching. Retrieval is no longer just "find similar documents." It is becoming an intelligent layer.
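Hybrid search has to merge two differently scored result lists. Reciprocal rank fusion (RRF) is one widely used way to do it, and it is one of the fusion options Qdrant documents for hybrid queries. The sketch below is plain Python illustrating the technique, not Qdrant's client API; the document IDs are made up and `k = 60` is the constant commonly used in the RRF literature.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["doc_pricing", "doc_q4_summary", "doc_roadmap"]        # semantic (vector) neighbours
sparse = ["doc_q4_summary", "doc_q4_appendix", "doc_pricing"]   # keyword (e.g. BM25) hits
for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
    print(f"{doc_id:16} {score:.4f}")
```

RRF only looks at ranks, which is why it works even though dense similarity scores and keyword scores live on different scales. Here `doc_q4_summary` comes out on top because it ranks well in both lists.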
Layer 5 -- Semantic Layer. Business meaning. Governed definitions. What does "revenue" mean in this context? What is an "active customer"? Without a semantic layer, two agents querying the same data will interpret it differently. This is the layer most enterprises skip, and it causes the most insidious failures -- not crashes, but quietly wrong answers.
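One way to picture a semantic layer is as a registry of governed definitions that every agent must resolve business terms through. In the minimal sketch below, the 90-day rule for an "active customer", the owning team and all names are hypothetical. In practice the definition would live in a platform's semantic layer rather than in agent code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class Customer:
    id: str
    last_order: date
    is_test_account: bool


@dataclass(frozen=True)
class GovernedMetric:
    name: str
    owner: str        # team accountable for the definition
    definition: str   # human-readable meaning, shown to agents and reviewers
    rule: Callable[[Customer, date], bool]


# Hypothetical governed definition, owned by one team and shared by every agent.
REGISTRY = {
    "active_customer": GovernedMetric(
        name="active_customer",
        owner="customer-analytics",
        definition="Ordered within the last 90 days; test accounts excluded.",
        rule=lambda c, today: not c.is_test_account and (today - c.last_order) <= timedelta(days=90),
    )
}


def count(metric_name: str, customers: list[Customer], today: date) -> int:
    """Resolve the term through the registry instead of letting each agent write its own filter."""
    metric = REGISTRY[metric_name]  # a KeyError means the term has not been governed yet
    return sum(metric.rule(c, today) for c in customers)


today = date(2026, 2, 15)
customers = [
    Customer("c1", date(2026, 2, 1), False),   # ordered 14 days ago: active
    Customer("c2", date(2025, 10, 1), False),  # ordered 137 days ago: not active
    Customer("c3", date(2026, 1, 20), True),   # recent, but a test account
]
print(count("active_customer", customers, today))  # 1
```

Two agents calling `count("active_customer", ...)` get the same number by construction; the failure mode of silently different filters disappears.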
Layer 6 -- Agent Runtime. Orchestration, state management, multi-step execution. OpenClaw sits in this layer, providing the scaffolding for agents to plan, execute, and recover across complex workflows.
Layer 7 -- Tracing + Evals. Observability for AI. When an agent makes a bad decision, trace the full chain: which model, which tools, which data, which reasoning steps. Evaluation frameworks catch regressions before users do.
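Here is a minimal sketch of both halves of this layer, assuming plain Python functions stand in for model and tool calls: a decorator that records one span per call (kind, name, inputs, output or error, duration), and a regression gate that scores the agent against a fixed set of expected answers. `traced`, `TRACE`, `regression_gate` and the 2-point tolerance are hypothetical. A production setup would ship spans to an observability backend and use richer scoring than exact string match.

```python
import functools
import time
import uuid

TRACE: list[dict] = []  # stand-in for an observability backend


def traced(kind: str):
    """Decorator that appends one span per call: kind, name, args, output or error, duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"id": uuid.uuid4().hex[:8], "kind": kind, "name": fn.__name__, "args": args}
            start = time.perf_counter()
            try:
                span["output"] = fn(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                span["ms"] = round((time.perf_counter() - start) * 1000, 2)
                TRACE.append(span)
        return wrapper
    return decorator


@traced("retrieval")
def search_policies(query: str) -> list[str]:
    return ["policy_17"]  # hypothetical retrieval result


@traced("model")
def answer(question: str) -> str:
    docs = search_policies(question)
    return f"Covered under {docs[0]}"


def regression_gate(cases, baseline_accuracy: float, tolerance: float = 0.02):
    """Score the agent on a fixed eval set; fail if accuracy drops below baseline minus tolerance."""
    correct = sum(answer(q) == expected for q, expected in cases)
    accuracy = correct / len(cases)
    return accuracy, accuracy >= baseline_accuracy - tolerance


cases = [("Is water damage covered?", "Covered under policy_17")]
print(regression_gate(cases, baseline_accuracy=1.0))  # (1.0, True)
print([(s["kind"], s["name"]) for s in TRACE])         # [('retrieval', 'search_policies'), ('model', 'answer')]
```

Spans are appended when a call finishes, so the nested retrieval span lands before the model span that called it; a real tracer would also record parent IDs to rebuild the full chain.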
Layer 8 -- Governance. Policy enforcement, compliance, audit trails. Which agents can access which data? What decisions require human approval? This layer wraps everything below it in enterprise-grade controls.
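Governance rules like these can be expressed as a small policy function that sits in front of every data access. The sketch assumes a hypothetical allow-list of agents per dataset and an invented 1,000-unit threshold above which a write needs human approval. It denies by default and logs every decision for audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    agent: str
    dataset: str
    action: str          # "read" or "write"
    amount: float = 0.0  # e.g. payout value attached to a write


# Hypothetical policy table: which agents may touch which datasets.
ALLOWED = {
    "support-agent": {"crm"},
    "claims-agent": {"claims", "policies"},
}
APPROVAL_THRESHOLD = 1_000.0  # writes above this value require a human sign-off

AUDIT_LOG: list[tuple[str, str, str, str]] = []


def authorise(req: Request) -> str:
    """Return 'allow', 'deny' or 'needs_approval', recording every decision for audit."""
    if req.dataset not in ALLOWED.get(req.agent, set()):
        decision = "deny"  # unknown agents and unlisted datasets are denied by default
    elif req.action == "write" and req.amount > APPROVAL_THRESHOLD:
        decision = "needs_approval"
    else:
        decision = "allow"
    AUDIT_LOG.append((req.agent, req.dataset, req.action, decision))
    return decision


print(authorise(Request("support-agent", "claims", "read")))           # deny
print(authorise(Request("claims-agent", "claims", "write", 250.0)))    # allow
print(authorise(Request("claims-agent", "claims", "write", 5_000.0)))  # needs_approval
```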
Implementation: Where the Layers Connect
The real power of a layered stack is how layers compose. Consider a concrete implementation path.
Start at Layer 3. Deploy MCP servers for your core data sources -- your data warehouse, your CRM, your document store. This immediately gives any model at Layer 1 structured access to your enterprise data through a standard protocol, rather than bespoke connectors per agent.
Layer 4 sits alongside Layer 3, not above it. Your vector database handles unstructured retrieval while MCP handles structured tool calls. A well-designed agent runtime at Layer 6 knows when to use which. NemoClaw bridges retrieval and semantics, ensuring agents access governed meaning, not raw data. This is the difference between an agent that retrieves "documents about Q4 revenue" and one that retrieves "Q4 revenue as defined by the finance team's governed metric, excluding one-time items."
Layer 5 -- the semantic layer -- is where I see the biggest gap in most deployments. Teams build retrieval and tool integration but skip the business meaning layer. The result: agents that can access data but misinterpret it. If you run Databricks, the Unity Catalog semantic layer handles this. If you run Snowflake, Cortex Analyst provides similar capability. The platform matters less than the principle: agents must access meaning, not just data.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#ffffff', 'lineColor': '#ffffff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#ffffff', 'edgeLabelBackground': '#1a2540'}}}%%
flowchart LR
AGENT["Agent Runtime"]
AGENT -->|"structured query"| MCP["MCP Server
Tool + Data"]
AGENT -->|"unstructured query"| VDB["Vector DB
Retrieval"]
MCP --> SEM["Semantic Layer
Governed Meaning"]
VDB --> SEM
SEM --> DW["Data Warehouse"]
SEM --> DOCS["Document Store"]
style AGENT fill:#1a2540,stroke:#00d4ff,color:#ffffff
style MCP fill:#1a2540,stroke:#ffb347,color:#ffffff
style VDB fill:#1a2540,stroke:#00d4ff,color:#ffffff
style SEM fill:#1a2540,stroke:#00ff88,color:#ffffff
style DW fill:#1a2540,stroke:#ffffff,color:#ffffff
style DOCS fill:#1a2540,stroke:#ffffff,color:#ffffff
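The routing decision in the diagram can be sketched in a few lines. This is a deliberately naive, hypothetical dispatcher: it sends any question that names a governed metric down the structured path (an MCP tool called `query_metric`, an invented name) together with the governed definition, and everything else to vector search. A real runtime would more likely let the model choose between tools from their descriptions, but the principle of attaching governed meaning to the query is the same.

```python
import re

# Hypothetical governed metrics published by the semantic layer.
GOVERNED_METRICS = {
    "revenue": "SUM(net_revenue) excluding one-time items (finance-owned definition)",
    "active customer": "ordered within the last 90 days, test accounts excluded",
}


def dispatch(question: str) -> dict:
    """Route questions naming a governed metric to the structured (MCP) path, the rest to retrieval."""
    q = question.lower()
    for term, definition in GOVERNED_METRICS.items():
        if re.search(rf"\b{re.escape(term)}\b", q):  # whole-word match on the governed term
            return {"path": "structured", "tool": "query_metric", "metric": term, "definition": definition}
    return {"path": "unstructured", "tool": "vector_search", "query": question}


print(dispatch("What was Q4 revenue?"))
print(dispatch("Find the board memo about the pricing change"))
```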
Layers 7 and 8 -- tracing and governance -- wrap the entire system. Every tool call, every retrieval, every model inference gets logged. Governance policies determine which agents can access which layers and under what conditions. Without these outer layers, you have capability without accountability.
Example: How Lemonade Insurance Built a Layered Claims Stack
Lemonade Insurance offers a public example of what a layered AI claims architecture looks like in practice. Its AI claims agent, AI Jim, handles 55% of claims with full automation and set a record 2-second claim settlement. But the architecture behind that speed is not a monolith. It is layered.
Consider what makes this work. The model layer handles initial claim assessment and fraud detection. A retrieval layer matches incoming claims against historical patterns. The semantic layer ensures terms like "total loss" and "personal property" are interpreted consistently across policy types. Governance and tracing layers capture every decision step -- critical for an insurer that must justify claim decisions to regulators.
The layered approach is what makes Lemonade's system resilient. When the AI is uncertain, claims route to human reviewers with full context -- the tracing layer provides the decision chain, the retrieval layer shows similar cases, the semantic layer ensures the reviewer sees the same definitions the AI used. The model layer is swappable; routing simple claims to a lighter model and complex ones to a more capable model is an architecture decision, not a rewrite.
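This is not Lemonade's code. It is only a hypothetical sketch of the escalation pattern described above: decisions at or above an assumed 0.9 confidence threshold settle automatically, and everything else goes to a human together with the trace ID, similar cases and governed definitions the model relied on. All field names, IDs and the threshold are invented.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    claim_id: str
    decision: str                 # e.g. "pay" or "deny"
    confidence: float             # calibrated confidence in [0, 1]
    trace_id: str                 # link into the tracing layer
    similar_cases: list[str]      # from the retrieval layer
    definitions: dict[str, str]   # governed terms from the semantic layer


def route_claim(a: Assessment, auto_threshold: float = 0.9) -> dict:
    """Auto-settle confident decisions; otherwise hand a reviewer the same context the model used."""
    if a.confidence >= auto_threshold:
        return {"claim": a.claim_id, "route": "auto", "decision": a.decision}
    return {
        "claim": a.claim_id,
        "route": "human_review",
        "proposed": a.decision,
        "trace_id": a.trace_id,
        "similar_cases": a.similar_cases,
        "definitions": a.definitions,
    }


low = Assessment(
    claim_id="CLM-1001",
    decision="pay",
    confidence=0.62,
    trace_id="tr-9f2",
    similar_cases=["CLM-0871", "CLM-0533"],
    definitions={"personal property": "movable items owned by the policyholder (illustrative)"},
)
high = Assessment("CLM-1002", "pay", 0.97, "tr-a11", [], {})
print(route_claim(low)["route"])   # human_review
print(route_claim(high)["route"])  # auto
```

The threshold is a business decision rather than a model property, which is why it belongs in configuration that governance can review, not inside the model prompt.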
The result is not a marginal improvement over monolithic claims processing. It is the difference between a fragile demo and a production system handling real money at scale.
Strategic Takeaway
The AI stack is not a vendor pitch. It is an architecture pattern. Layering is what made networking composable: the OSI reference model gave the idea its vocabulary, even though the internet itself runs on the leaner TCP/IP suite. The emerging AI stack will do the same for enterprise AI by making each layer independently improvable and replaceable.
The enterprises that get this right will not be the ones with the best model. They will be the ones with the best stack -- where every layer has a clear responsibility, a clean interface, and an upgrade path that does not require rewriting everything above it. That is what AI-native architecture actually looks like.
Sources
- Anthropic Claude Release Notes (Feb 2026)
- Model Context Protocol (MCP) Introduction
- Weaviate Blog -- Agent Skills (Feb 2026)
- Qdrant 1.17 Release (Feb 2026)
- OSI Model -- Wikipedia
- Lemonade Sets New World Record -- Lemonade Blog
Daniel Piatkowski, Data & Analytics veteran shaping AI-native enterprises, elicify.ai