Building an AI agent from scratch in Python used to be much more complicated than it is now. In 2026, with mature LLM APIs that support tool use natively, you can build a working AI agent in about 50 lines of Python without any framework. The result won’t compete with Claude Code or a polished CrewAI deployment, but it teaches you what’s happening under every agent framework you’ll ever use.
I’ve built half a dozen AI agents this way over the past year – throwaway prototypes and production tools. Building from scratch is worth doing even if you move to a framework later, because framework abstractions make sense only after you’ve seen what they’re abstracting. What follows is the working guide: the agent loop, tool use, memory, and when to graduate to a Python AI agent framework.
Quick answer: what’s an AI agent in Python?
An AI agent in Python is a program that runs an LLM in a loop, lets the LLM call tools (functions) in response to its outputs, and continues until a task is complete. The minimum components are an LLM client (Anthropic, OpenAI, or similar), a loop that handles back-and-forth tool calls, a set of tools the agent can invoke, and some way to track conversation state. The whole pattern can be implemented in around 50 lines of Python using the Anthropic or OpenAI Python SDK. Frameworks like LangChain, CrewAI, and the Claude Agent SDK wrap this pattern with extra structure.
What an AI agent actually is
An AI agent is the loop pattern that emerges when you give an LLM tools and let it decide when to call them. Without the loop, you have a chatbot that responds to one message at a time. With the loop, you have a system that takes a goal, decides what tools to use to achieve it, executes them, observes the results, and continues until the goal is met or it gives up.
The pattern has three moving parts. The LLM generates outputs in response to a context (a system prompt plus conversation history). The tool layer is the set of functions the LLM can invoke; each tool has a name, a description, an input schema, and an implementation. The agent loop is the Python code that pumps inputs into the LLM, parses any tool calls the LLM produces, executes those tool calls, and feeds the results back in for the next iteration.
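The three parts can be sketched without calling any API. In the minimal sketch below, `fake_llm` is a hypothetical stand-in for a real model (it requests one tool call, then answers), and `TOOLS`, `add`, and `run_loop` are illustrative names rather than part of any SDK. The message shapes are simplified and do not match a real provider's format.

```python
from typing import Callable

# Tool layer: name -> implementation.
# (A real API also needs a description and an input schema per tool.)
def add(a: int, b: int) -> int:
    return a + b

TOOLS: dict[str, Callable[..., object]] = {"add": add}

def fake_llm(messages: list[dict]) -> dict:
    """Hypothetical stand-in for an LLM: asks for one tool call, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "text", "text": f"The answer is {last['content']}."}
    return {"type": "tool_call", "name": "add", "input": {"a": 2, "b": 2}}

def run_loop(goal: str, max_iterations: int = 5) -> str:
    """Agent loop: call the model, run any requested tool, feed the result back."""
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_iterations):
        output = fake_llm(messages)
        if output["type"] == "text":
            return output["text"]  # no tool requested: the agent is done
        result = TOOLS[output["name"]](**output["input"])
        messages.append({"role": "assistant", "content": output})
        messages.append({"role": "tool", "content": str(result)})
    return "Max iterations reached"

print(run_loop("What is 2+2?"))  # The answer is 4.
```

Swapping `fake_llm` for a real API call and `TOOLS` for real functions is, in outline, what the rest of this guide does.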
That’s the whole pattern. Every agent framework you’ve encountered – LangChain agents, CrewAI crews, the Claude Agent SDK – is some variation of this loop with additional structure layered on top.
Step 1: Set up the LLM client
This guide uses the Anthropic SDK because Claude’s tool use API is clean to work with. Install the library and set your API key:
pip install anthropic
export ANTHROPIC_API_KEY="your-key-here"
A bare-minimum LLM call looks like this:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.content[0].text)
That’s not an agent yet – it’s just an LLM call. To make it an agent, we need to add tools and a loop. The next two sections build those pieces on top of this foundation.
Step 2: Add the agent loop
The agent loop is the core of what makes an agent different from a single LLM call. Here’s the minimum loop structure:
def run_agent(user_message: str, tools: list, max_iterations: int = 10) -> str:
    # Uses the `client` created in Step 1 and `handle_tool_calls` from Step 3
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        # If the model wants to use a tool, execute it and loop again
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = handle_tool_calls(response.content)
            messages.append({"role": "user", "content": tool_results})
            continue
        # Otherwise, the agent has finished: return the text it produced
        return "".join(block.text for block in response.content if block.type == "text")
    return "Max iterations reached"
Each iteration takes one of two branches. If the model produces a tool call, we execute the tool, append the result to the conversation, and continue the loop. If the model produces a regular response (no tool use), the agent is done and returns the result. The max_iterations ceiling is the only other way out: it prevents infinite loops if the agent gets stuck.
The conversation history grows with each iteration because the model needs the full context to make sensible decisions. This is also why agents can get expensive: long agent runs accumulate a lot of context, and you’re paying for the input tokens on every iteration.
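A rough back-of-the-envelope sketch shows why cost grows faster than the iteration count. The token counts here are made-up illustrative numbers, not measurements, and `cumulative_input_tokens` is a hypothetical helper.

```python
def cumulative_input_tokens(
    base_tokens: int, tokens_added_per_iteration: int, iterations: int
) -> int:
    """Total input tokens sent across a run where every call resends the full history."""
    total = 0
    context = base_tokens
    for _ in range(iterations):
        total += context                       # this call sends the whole context so far
        context += tokens_added_per_iteration  # the tool call and its result are appended
    return total

# Illustrative numbers only: 1,000-token starting context, 500 tokens added per iteration.
print(cumulative_input_tokens(1_000, 500, 10))  # 32500
```

The last call in that run sends only 5,500 tokens, but the run as a whole sends 32,500, because the total grows roughly with the square of the number of iterations.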
Step 3: Add tools the agent can call
Tools are where agents become useful. A tool needs a name, a description (which the LLM reads to decide when to call it), an input schema, and a Python function that implements the actual behavior.
Here’s a complete example with two tools – one that reads a file and one that searches the web:
tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file on the local filesystem.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file"}
            },
            "required": ["path"],
        },
    },
    {
        "name": "web_search",
        "description": "Search the web and return the top results.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"],
        },
    },
]
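def handle_tool_calls(content_blocks):
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue
        try:
            if block.name == "read_file":
                with open(block.input["path"], encoding="utf-8") as f:
                    result = f.read()
            elif block.name == "web_search":
                # my_search_function is a placeholder: plug in your own
                # search implementation; it should return a string
                result = my_search_function(block.input["query"])
            else:
                result = f"Unknown tool: {block.name}"
        except Exception as exc:
            # Return the error to the model instead of raising,
            # so the agent can course-correct
            result = f"Error: {exc}"
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result,
        })
    return results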
The agent now has real capabilities. Give it a task like “read config.json and search the web for what each setting does,” and it will typically call read_file first, then web_search several times based on what it found.
The quality of descriptions matters more than first-time builders expect. A tool described as “Search the web” gets used unpredictably; “Search the web and return top 5 results. Use this for current events, recent products, or anything not in training data” gets used predictably. Treat tool descriptions as the prompt-engineering surface that controls when each tool fires.
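One way to keep each description next to the code it describes is a small registry. This is a sketch under assumptions: `tool`, `REGISTRY`, `tool_definitions`, `dispatch`, and the `word_count` example tool are all hypothetical names, and the definitions follow the same name/description/input_schema shape as the tools list above.

```python
from typing import Callable

# name -> {"definition": dict sent to the model, "func": Python implementation}
REGISTRY: dict[str, dict] = {}

def tool(name: str, description: str, input_schema: dict) -> Callable:
    """Register a function together with the metadata the model will read."""
    def decorator(func: Callable) -> Callable:
        REGISTRY[name] = {
            "definition": {
                "name": name,
                "description": description,
                "input_schema": input_schema,
            },
            "func": func,
        }
        return func
    return decorator

@tool(
    name="word_count",
    description=(
        "Count the words in a piece of text. Use this when the user asks for "
        "an exact word count; do not estimate counts yourself."
    ),
    input_schema={
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count"}},
        "required": ["text"],
    },
)
def word_count(text: str) -> str:
    return str(len(text.split()))

def tool_definitions() -> list[dict]:
    """Build the list to pass as the `tools` argument."""
    return [entry["definition"] for entry in REGISTRY.values()]

def dispatch(name: str, tool_input: dict) -> str:
    """Look up and run a tool by name, with a fallback for unknown names."""
    entry = REGISTRY.get(name)
    if entry is None:
        return f"Unknown tool: {name}"
    return entry["func"](**tool_input)
```

With this layout, `dispatch(block.name, block.input)` could replace an if/elif chain, and editing a description means editing the decorator directly above the function it controls.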
Step 4: Add memory and state
The loop above maintains conversation memory within a single agent run, which is enough for most single-task agents. For agents that need to remember information across sessions (a personal assistant, a long-running research agent), you need explicit memory management.
The simplest pattern: persist the messages list to disk between sessions and reload on startup. For more sophisticated needs, summarize old messages periodically to keep context within the model’s window. Tools like Mem0 or the Claude Agent SDK’s built-in compaction handle this automatically.
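The persist-and-reload pattern might look like the sketch below. `to_jsonable`, `save_messages`, and `load_messages` are hypothetical helper names. The code assumes SDK content blocks are pydantic models that expose `model_dump()`; whether the dumped blocks can be sent back to the API unchanged is worth verifying against your SDK version.

```python
import json
from pathlib import Path

def to_jsonable(value):
    """Recursively convert messages to plain JSON types.

    Assumption: SDK content blocks expose model_dump(); anything else is
    expected to already be a dict, list, or scalar.
    """
    if hasattr(value, "model_dump"):
        return to_jsonable(value.model_dump())
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value

def save_messages(messages: list, path: Path) -> None:
    """Write the conversation history to disk as JSON."""
    path.write_text(json.dumps(to_jsonable(messages), indent=2), encoding="utf-8")

def load_messages(path: Path) -> list:
    """Reload the history, or start empty on the first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```

A session would then start with `messages = load_messages(Path("history.json"))` and end with `save_messages(messages, Path("history.json"))`; the file name is only an example.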
The memory question that surprises first-time builders: the model has no memory between iterations except what’s in the messages array. If you want the agent to remember something across a long task, the information has to be in that array – in the conversation history, the system prompt, or a tool result.
When to use a Python AI agent framework instead
After building a few agents from scratch, the question of whether to adopt a framework becomes practical.
Stay with scratch-built agents when the agent is small, the workflow is simple, or you specifically want full control over the loop. The 50 lines of code are yours to read, debug, and modify. Frameworks add abstraction layers that make sense for complex agents and feel heavy for simple ones.
Adopt a framework when you’re reinventing concepts the frameworks already handle: multi-step planning, multi-agent coordination, tool execution with retries, evaluation harnesses, observability. The popular Python AI agent frameworks (Claude Agent SDK, LangGraph, CrewAI, OpenAI Agents SDK) have invested in these. The decision rule: if you’re writing more than ~300 lines of agent infrastructure that isn’t core to your domain, switch to one of the established options.
Common gotchas when building from scratch
A few production-relevant pitfalls show up consistently.
The max_iterations ceiling is non-optional. Without it, a confused agent loops indefinitely, burning tokens. Always set a hard ceiling.
Tool errors should be returned to the model, not raised. When a tool call fails, return the error as a tool result so the agent can course-correct. Raising kills the loop.
Token costs add up fast. Each iteration sends the full conversation history. Long agent runs produce surprisingly large bills. Monitor token usage and set spending alerts.
Tool descriptions are prompts. Write them like API docs for the model. Vague descriptions produce unpredictable tool selection.
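The "return errors, don't raise" rule can be sketched as a small wrapper. `run_tool_safely` and the `read_file` example function are hypothetical names; the `is_error` flag on the result block follows the Anthropic tool_result format, so check your provider's equivalent if you use a different API.

```python
from typing import Callable

def run_tool_safely(
    func: Callable[..., object], tool_input: dict, tool_use_id: str
) -> dict:
    """Run one tool and always return a tool_result block, even on failure."""
    try:
        content = str(func(**tool_input))
        is_error = False
    except Exception as exc:  # broad on purpose: any failure should reach the model
        content = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }

def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

# A missing file becomes a result the model can react to, not a crash.
result = run_tool_safely(read_file, {"path": "/no/such/dir/file.txt"}, "toolu_example")
print(result["is_error"])  # True
```

Catching the exception at this boundary keeps the loop alive, and the error text gives the model something concrete to act on, such as retrying with a different path.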
If you’ve built a production AI agent in Python from scratch and have honest numbers on what changed when you moved to or away from a framework, that writeup is the gap worth filling. Most published content shows the from-scratch pattern or the framework code, but rarely the migration story between them.