What Makes an AI Agent: Model, Tools, and the Harness
Most products you bought as AI Agents this year are workflows with a chatbot bolted on. This article sets out the three-ingredient test, the three-tier hierarchy, and the diagnostic question that exposes Agent-Washing in 60 seconds.
Most products you bought this year as AI Agents are workflows with a chatbot bolted on. That is a procurement problem, not a technology problem. The vendor pitch said autonomous. The architecture is reactive. The invoice does not distinguish between the two, but next year's failure-rate report will.
This article sets out the three-ingredient framework the industry has converged on for defining a complete AI Agent stack, the three-tier hierarchy that separates a script from a workflow from an Actor, and the diagnostic question that exposes Agent-Washing in any vendor demo before you sign the contract.
The adoption-to-scaling gap
88% of enterprises report regular use of Artificial Intelligence inside their organizations [1]. Fewer than 10% have scaled true AI Agents in any single function [1]. The gap between those two numbers is a definitional failure measured in procurement budgets.
Gartner forecasts that over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls [2]. On that forecast, more than four in ten "agent" initiatives on a 2026 roadmap are write-offs in waiting. That is the cancellation rate organizations get when they buy the wrong category of software.
The market has trained you to talk about models. Frontier models generate the demo cycle and most of the press attention, and they are also the cheapest, most replaceable ingredient in any agent stack. The reason your last two agent contracts underdelivered sits in a different layer, one almost no vendor builds and almost no buyer asks about.
The three tiers: program, workflow, agent
Three architecturally distinct categories of software run inside your business right now. The market sells all three under the same label, and that conflation is where the procurement losses start.
A classical program executes deterministic, human-written control flow. It does exactly what its author told it to do, and nothing else. This is what Chapter 2 of AI Agents: They Act, You Orchestrate calls the Puppet: symbolic AI, expert systems, the email script that runs every Tuesday at 09:00. Useful within its rails, blind to its environment.
An agentic workflow uses a large language model to make local decisions inside a sequence of steps a human still defined. The LLM picks the wording of the email. The human picked the order of the steps. Anthropic, in its definitional reference "Building effective agents," draws the line precisely: workflows are "systems where LLMs and tools are orchestrated through predefined code paths" [3]. This is the Servant from Chapter 2 of the book, the AI Assistant: fluent and reactive, but always waiting for the next prompt.
A true AI Agent decides its own next action based on what it perceives. The Perceive-Reason-Act Cycle is the three-stroke engine I describe in the book: observe the environment, reason about the next move, act on the world, and then feed the result back into the next observation. The cycle continues without a prompt between steps and adapts as it goes. This is the Actor, a different class of software.
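The cycle can be sketched in a few lines of Python. This is a minimal illustration, not the book's or any vendor's implementation: the `Counter` environment, the `reason` function, and `run_cycle` are hypothetical names invented for this example, and a hard-coded rule stands in for the model.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Toy environment (hypothetical): a number the agent must move to a target."""
    value: int = 0

    def observe(self) -> int:
        return self.value

    def apply(self, action: str) -> None:
        if action == "increment":
            self.value += 1
        elif action == "decrement":
            self.value -= 1


def reason(observation: int, goal: int) -> str:
    """Stand-in for the model: pick the next action from the latest observation."""
    if observation < goal:
        return "increment"
    if observation > goal:
        return "decrement"
    return "stop"


def run_cycle(env: Counter, goal: int, max_steps: int = 100) -> list[str]:
    """Perceive, reason, act, then let the result shape the next observation."""
    history: list[str] = []
    for _ in range(max_steps):
        observation = env.observe()          # perceive
        action = reason(observation, goal)   # reason
        if action == "stop":
            break
        env.apply(action)                    # act
        history.append(action)
    return history


env = Counter(value=3)
actions = run_cycle(env, goal=5)
print(actions, env.value)
```

Nothing outside the loop tells the agent which action comes next. The sequence emerges from what it observes, which is the property that separates this tier from a predefined pipeline.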
Before you evaluate any agent product, place it on the Autonomy Spectrum. If every step is hand-written control flow, you have a Puppet. If a human defined the sequence of steps, or the system needs you to prompt it after every step, you have a Servant, whether it is sold as a workflow or an assistant. If the system takes a goal and operates against it without your hand on the wheel, you have an Actor. The marketing label does not enter into the diagnosis.
Model + Tools + Harness, and why the model is the cheap part
Model + Tools + Harness is the three-ingredient decomposition that defines a complete AI Agent stack. The industry has converged on this framing in 2026 because all three ingredients have finally matured at the same time. It is the cleanest definition you will get for a category that vendors are now incentivized to blur.
The Model is the reasoning kernel, the part that takes a goal and decides what to attempt. Models are now interchangeable infrastructure. OpenClaw, the open-source agent runtime widely cited as a canonical example of the new agent architecture, runs on Anthropic, OpenAI, Google, and DeepSeek models alike. Swap one for another and the agent still functions. The Model is the commodity ingredient in the stack.
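One way to picture that interchangeability is a narrow interface between the agent and its model. The sketch below is an assumption-laden illustration, not OpenClaw's design: `Model`, `VendorAModel`, `VendorBModel`, and `Agent` are hypothetical names, and the two vendor classes return canned strings where a real integration would call a provider's API.

```python
from typing import Protocol


class Model(Protocol):
    """Anything that turns a prompt into text can serve as the reasoning kernel."""

    def complete(self, prompt: str) -> str: ...


class VendorAModel:
    """Hypothetical stand-in for one provider."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-a] plan for: {prompt}"


class VendorBModel:
    """Hypothetical stand-in for a second provider."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-b] plan for: {prompt}"


class Agent:
    """The agent depends only on the Model interface, never on a vendor."""

    def __init__(self, model: Model) -> None:
        self.model = model

    def plan(self, goal: str) -> str:
        return self.model.complete(goal)


agent = Agent(VendorAModel())
first = agent.plan("reconcile invoices")
agent.model = VendorBModel()   # swap the model; the agent code is unchanged
second = agent.plan("reconcile invoices")
print(first)
print(second)
```

If a vendor's product cannot be described this way, with the model behind a replaceable seam, that is worth asking about.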
The Tools are the APIs the agent can invoke: shell access, file system, browser control, calendar, email, internal databases. These are off-the-shelf. Every cloud vendor ships them, so the Tools are not the differentiator either.
The Harness is the persistence and memory layer that gives the agent continuity across time: session management, memory files, loop iteration, sandboxing, proactive triggers, and recovery from failure. The Harness converts an LLM call into a system capable of running for hours without your input. It is the part almost no vendor builds, because it is invisible, unglamorous, and architecturally hard.
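A small sketch shows what "continuity across time" means in practice. It assumes a JSON memory file and a fixed list of steps. The `Harness` class, its `budget` parameter, and the step names are hypothetical and chosen for illustration only; a production harness would also handle sandboxing, triggers, and failure recovery.

```python
import json
import tempfile
from pathlib import Path


class Harness:
    """Hypothetical minimal harness: persists progress so a task survives restarts."""

    def __init__(self, memory_path: Path) -> None:
        self.memory_path = memory_path
        if memory_path.exists():
            self.state = json.loads(memory_path.read_text())
        else:
            self.state = {"completed": []}

    def run(self, steps: list[str], budget: int) -> None:
        """Execute up to `budget` pending steps, saving state after each one."""
        done = 0
        for step in steps:
            if step in self.state["completed"]:
                continue                      # finished in an earlier session
            if done == budget:
                break                         # session ends; state is already on disk
            self.state["completed"].append(step)   # a real harness would call a tool here
            self.memory_path.write_text(json.dumps(self.state))
            done += 1


steps = ["fetch", "analyze", "draft", "send"]
memory_file = Path(tempfile.mkdtemp()) / "memory.json"

Harness(memory_file).run(steps, budget=2)   # first session stops after two steps
resumed = Harness(memory_file)              # a fresh process would start the same way
resumed.run(steps, budget=10)               # resumes at the first unfinished step
print(resumed.state["completed"])
```

The second `Harness` instance knows nothing about the first except what was written to disk, which is the point: memory lives in the Harness, not in the model's context window.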
The Harness is also the application-layer difference behind the products people genuinely use: Claude Code, Manus, Factory's Droids. Many of the "agent platforms" you have been pitched have models and tools but no Harness, which is why they collapse when asked to sustain a task for more than a few minutes.
Before you sign the next vendor contract, give the salesperson 60 seconds to describe their Harness: persistent memory, session management, loop iteration, proactive triggers. If they cannot answer, they do not have one.
The loop is the architectural difference
The feature that converts an LLM into an AI Agent is the loop. The size of the model and the polish of the interface are downstream of that.
A chatbot answers a prompt and stops there. A workflow runs its predefined pipeline and stops as well. An Actor runs the Perceive-Reason-Act Cycle continuously, observing the result of each action and choosing the next one for as long as the goal demands. Anthropic puts it cleanly: agents "are typically just LLMs using tools based on environmental feedback in a loop" [3].
OpenClaw makes this concrete. Its agent runtime iterates up to 20 tool calls per request and can persist for up to 48 hours per task [4]. Inside that 48-hour window, the agent reads context, calls a tool, observes the result, reasons about the next step, calls another tool, recovers from errors, and writes intermediate state to a memory file. No human types between steps. The persistence of the loop is what separates an Actor from everything below it on the Autonomy Spectrum, and the intelligence of the underlying model is downstream of that fact.
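Two of the behaviours described here, a cap on tool calls and recovery from errors, can be sketched briefly. This is not OpenClaw's code: `flaky_fetch` and `run_request` are hypothetical, and `MAX_TOOL_CALLS` simply mirrors the per-request figure quoted above.

```python
from typing import Callable

MAX_TOOL_CALLS = 20  # mirrors the cap quoted above; not taken from OpenClaw's source


def flaky_fetch(attempt: int) -> str:
    """Hypothetical tool that fails on its first call and succeeds afterwards."""
    if attempt == 0:
        raise TimeoutError("upstream timed out")
    return "42 rows"


def run_request(tool: Callable[[int], str], max_calls: int = MAX_TOOL_CALLS) -> list[str]:
    """Call the tool until it succeeds or the budget is spent, logging every outcome."""
    log: list[str] = []
    for attempt in range(max_calls):
        try:
            result = tool(attempt)
        except Exception as exc:
            log.append(f"call {attempt}: error: {exc}")   # observe the failure, then retry
            continue
        log.append(f"call {attempt}: ok: {result}")
        return log
    log.append("budget exhausted")
    return log


log = run_request(flaky_fetch)
print(log)
```

The log doubles as the loop history: a record of what was attempted, what failed, and how the run ended.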
The capability curve is also moving against organizations that wait. METR research shows that the length of task an AI system can complete autonomously with 50% reliability has been doubling roughly every seven months for the last six years [5]. If that trend holds, the achievable task length grows roughly sixfold in 18 months, so a goal that a Servant fails at within three turns today could plausibly be completed end-to-end by an Actor by then. Every quarter you spend deploying agentic workflows instead of true Actors is a quarter of ground given to competitors who picked the right category of software.
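The compounding behind a seven-month doubling period is easy to check. The helper below is a back-of-the-envelope sketch: `growth_factor` is a hypothetical name, and it assumes the doubling rate stays constant, which METR's data describes for the past but does not guarantee for the future.

```python
DOUBLING_MONTHS = 7  # doubling period reported by METR, as cited above


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplier on autonomous task length after `months`, assuming steady doubling."""
    return 2 ** (months / doubling_months)


print(round(growth_factor(18), 1))     # multiplier after 18 months
print(round(growth_factor(6 * 12)))    # multiplier over the six-year window
```

Under that assumption the 18-month multiplier comes out a little under six, and the six-year multiplier is above a thousand.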
Run this test on your current "agent" stack: hand it a real multi-step objective and walk away for an hour. If on return there is no loop history and no record of how it handled obstacles, you own a vending machine for LLM responses.
Agent-Washing is the failure mechanism behind the 40%
Agent-Washing is the practice of attaching the "AI Agent" label to what is architecturally a workflow or an assistant. I named it because the practice now drives the procurement spend of the enterprise market. Once "agent" became commercially valuable, every vendor with a chatbot rebranded.
The Autonomy Spectrum is the three-tier classification from Chapter 2 separating Puppet (deterministic script), Servant (reactive assistant), and Actor (true AI Agent). Run its three questions on any system before you sign.
- Autonomy. Can the system complete a multi-step goal end-to-end without your prompt between steps? If the demo requires you to press a button after each action, you are looking at a Servant.
- Proactivity. Has the system ever initiated action without being told? An Actor wakes up, checks state, and acts on its own schedule. A Servant waits for you.
- Memory. Does it remember last week? Memory in the architectural sense means the Harness writing state to persistent storage and replaying it on the next run. Retrieval-augmented generation glued on at the last minute does not clear that bar.
If any answer is no, you own a Servant in a costume, and the project is queuing for the 40% cancellation pile.
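The three questions reduce to a simple rule, sketched here so it can be dropped into a vendor-evaluation checklist. The `Answers` fields and the `classify` function are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Answers to the three diagnostic questions (field names are illustrative)."""
    autonomy: bool      # completes a multi-step goal without prompts between steps
    proactivity: bool   # has initiated action without being told
    memory: bool        # persists state and replays it on the next run


def classify(answers: Answers) -> str:
    """Return "Actor" only when all three answers are yes, as the text requires."""
    if answers.autonomy and answers.proactivity and answers.memory:
        return "Actor"
    return "Servant"


print(classify(Answers(autonomy=True, proactivity=True, memory=True)))
print(classify(Answers(autonomy=True, proactivity=False, memory=True)))
```

The rule is deliberately strict: one "no" is enough to fail. It does not separate Servants from Puppets, since the three questions only test whether a system clears the Actor bar.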
Only 21% of organizations have a mature governance model for autonomous AI Agents [6]. The other 79% are poorly placed to tell their workflows from their Actors, and many cannot audit a Harness because there is none to audit. The architectural mismatch tends to surface on the day the project gets cancelled for "unclear value," at which point the board asks a question nobody on the team is prepared to answer.
The reframe: the model is the commodity, the Harness is the asset
The market trained you to obsess over the Model because Model releases generate the headlines. Frontier launches are where the attention goes. The Model is also the most substitutable ingredient in the entire stack. Swap ChatGPT for Claude, Claude for Gemini, Gemini for an open-weights model running on your own hardware, and a well-built Harness keeps running. Remove the Harness, and you no longer have an agent.
Models have been compounding for three years, but that is not why 2026 became the year of AI Agents. The Harness layer finally matured, and that is what unlocked the category. The companies that win the next decade will be the ones that build or buy the best Harness, because the Harness compounds across model upgrades, tool additions, and product iterations in a way the Model itself does not.
Your competitive position in the Agent-First Era is the depth of your Harness rather than the brand of your Model. That sentence is the procurement strategy in one line.
The choice in front of you is straightforward. You either learn to ask "where is the Harness?" in every vendor meeting, or you keep paying agent prices for workflow capability. The 40% cancellation rate is the bill that arrives for organizations that never learned to read the architecture beneath the label.
This article scratches the surface of one framework from AI Agents: They Act, You Orchestrate by Peter van Hees. The book maps 18 chapters across the architecture of the Agent-First Era, from the Autonomy Spectrum and Perceive-Reason-Act Cycle to the AIOS stack, the Delegation Ladder, and the Human Premium Stack that decides which careers survive Synthetic Labor. If the gap between Agent-Washing and true Actors resonated, the book gives you the complete operating manual. Get your copy:
🇺🇸 Amazon.com | 🇬🇧 Amazon.co.uk | 🇫🇷 Amazon.fr | 🇩🇪 Amazon.de | 🇳🇱 Amazon.nl | 🇧🇪 Amazon.com.be
References
[1] McKinsey, "The state of AI in 2025," cited in Datagrid, "26 AI Agent Statistics (Adoption Trends and Business Impact)," updated November 2025. https://www.datagrid.com/blog/ai-agent-statistics
[2] Gartner, cited in Paul Okhrem, "Enterprise AI Agents Adoption Statistics 2026," updated May 2026. https://paul-okhrem.com/enterprise-ai-agents-statistics-2026/
[3] Anthropic, "Building effective agents," 19 December 2024. https://www.anthropic.com/research/building-effective-agents
[4] MindStudio, "The Open-Source AI Agent That Actually Does Things," 23 February 2026. https://www.mindstudio.ai/blog/what-is-openclaw-ai-agent/
[5] METR, "Measuring AI Ability to Complete Long Tasks," 19 March 2025. https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
[6] Deloitte, cited in Paul Okhrem, "Enterprise AI Agents Adoption Statistics 2026," updated May 2026. https://paul-okhrem.com/enterprise-ai-agents-statistics-2026/