How to build an AI agent from scratch in Python

March 24, 2026 by Rohit Shukla

Building an AI agent from scratch in Python used to be much more complicated than it is now. In 2026, with mature LLM APIs that support tool use natively, you can build a working AI agent in about 50 lines of Python without any framework. The result won’t compete with Claude Code or a polished CrewAI deployment, but it teaches you what’s happening under every agent framework you’ll ever use.

I’ve built half a dozen AI agents this way over the past year – throwaway prototypes and production tools. Building from scratch is worth doing even if you move to a framework later, because framework abstractions make sense only after you’ve seen what they’re abstracting. What follows is the working guide: the agent loop, tool use, memory, and when to graduate to a Python AI agent framework.

Quick answer: what’s an AI agent in Python?

An AI agent in Python is a program that runs an LLM in a loop, executes the tools (functions) the LLM asks for, feeds the results back, and continues until the task is complete. The minimum components are an LLM client (Anthropic, OpenAI, or similar), a loop that handles back-and-forth tool calls, a set of tools the agent can invoke, and some way to track conversation state. The whole pattern can be implemented in around 50 lines of Python using the Anthropic or OpenAI Python library. Frameworks like LangChain, CrewAI, and the Claude Agent SDK wrap this pattern with extra structure.

What an AI agent actually is

An AI agent is the loop pattern that emerges when you give an LLM tools and let it decide when to call them. Without the loop, you have a chatbot that responds to one message at a time. With the loop, you have a system that takes a goal, decides what tools to use to achieve it, executes them, observes the results, and continues until the goal is met or it gives up.

The pattern has three moving parts. The LLM generates outputs in response to a context (a system prompt plus conversation history). The tool layer is the set of functions the LLM can invoke; each tool has a name, a description, an input schema, and an implementation. The agent loop is the Python code that pumps inputs into the LLM, parses any tool calls the LLM produces, executes those tool calls, and feeds the results back in for the next iteration.

That’s the whole pattern. Every agent framework you’ve encountered – LangChain agents, CrewAI crews, the Claude Agent SDK – is some variation of this loop with additional structure layered on top.
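To make the three parts concrete before any API is involved, here is a minimal sketch of the loop with a stubbed model. Everything in it is an assumption for illustration: fake_llm stands in for a real LLM call, the "add" tool and the message shapes are invented, and none of it matches a real provider's wire format.

```python
from typing import Callable

def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for a real model: requests one tool call, then answers."""
    last = messages[-1]
    if "tool_result" not in last:
        # No tool result seen yet, so "decide" to call the add tool
        return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 2}}
    return {"type": "final", "text": f"The answer is {last['tool_result']}."}

# Tool layer: name -> implementation (hypothetical example tool)
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}

def run_loop(goal: str, llm=fake_llm, max_iterations: int = 5) -> str:
    """Agent loop: call the model, run any requested tool, feed the result back."""
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_iterations):
        output = llm(messages)
        if output["type"] == "final":
            return output["text"]
        result = TOOLS[output["name"]](**output["args"])
        messages.append({"role": "assistant", "tool_call": output})
        messages.append({"role": "user", "tool_result": result})
    return "Max iterations reached"

print(run_loop("What is 2+2?"))
```

Swapping fake_llm for a real API call and TOOLS for real functions is, roughly, what the steps below do.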

Step 1: Set up the LLM client

This guide uses the Anthropic SDK because Claude’s tool use API is genuinely clean; the same pattern works with other LLM clients. Install the library and set your API key:

pip install anthropic
export ANTHROPIC_API_KEY="your-key-here"

A bare-minimum LLM call looks like this:

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.content[0].text)

That’s not an agent yet – it’s just an LLM call. To make it an agent, we need to add tools and a loop. The next two sections build those pieces on top of this foundation.

Step 2: Add the agent loop

The agent loop is the core of what makes an agent different from a single LLM call. Here’s the minimum loop structure:

def run_agent(user_message: str, tools: list, max_iterations: int = 10) -> str:
    # Uses the `client` created in Step 1; handle_tool_calls is defined in Step 3.
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

        # If the model wants to use a tool, execute it and feed the result back
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = handle_tool_calls(response.content)
            messages.append({"role": "user", "content": tool_results})
            continue

        # Otherwise, the agent has finished: return the text blocks of the reply
        return "".join(block.text for block in response.content if block.type == "text")

    return "Max iterations reached"

Each iteration takes one of two branches. If the model produces a regular response (no tool use), the agent is done and returns the result. If the model produces a tool call, we execute the tool, append the result to the conversation, and continue the loop. The max_iterations ceiling is the second way out: it prevents infinite loops if the agent gets stuck.

The conversation history grows with each iteration because the model needs the full context to make sensible decisions. This is also why agents can get expensive: long agent runs accumulate a lot of context, and you’re paying for the input tokens on every iteration.
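A rough back-of-the-envelope sketch shows why the cost grows faster than the iteration count. It assumes every iteration resends the whole history, each turn adds a fixed number of tokens, and prompt caching is ignored. The function name and the numbers are illustrative, not measurements.

```python
def cumulative_input_tokens(base_tokens: int, tokens_per_turn: int, iterations: int) -> int:
    """Total input tokens billed when each iteration resends the full history.

    Iteration i (0-based) sends base_tokens + i * tokens_per_turn input tokens.
    """
    return sum(base_tokens + i * tokens_per_turn for i in range(iterations))

# Hypothetical run: 500-token starting prompt, 800 tokens added per iteration
print(cumulative_input_tokens(500, 800, 10))  # 41000
print(cumulative_input_tokens(500, 800, 20))  # 162000
```

With those made-up numbers, doubling the run from 10 to 20 iterations roughly quadruples the input tokens, because the total grows with the square of the iteration count.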

Step 3: Add tools the agent can call

Tools are where agents become useful. A tool needs a name, a description (which the LLM reads to decide when to call it), an input schema, and a Python function that implements the actual behavior.

Here’s an example with two tools – one that reads a file and one that searches the web. The search implementation, my_search_function, is a placeholder you supply yourself:

tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file on the local filesystem.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file"}
            },
            "required": ["path"],
        },
    },
    {
        "name": "web_search",
        "description": "Search the web and return the top results.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"],
        },
    },
]

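def handle_tool_calls(content_blocks):
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue
        is_error = False
        try:
            if block.name == "read_file":
                with open(block.input["path"], encoding="utf-8") as f:
                    result = f.read()
            elif block.name == "web_search":
                # my_search_function is a placeholder for your own search code;
                # it should return a string
                result = my_search_function(block.input["query"])
            else:
                result, is_error = f"Unknown tool: {block.name}", True
        except Exception as exc:
            # Return the failure to the model instead of crashing the loop
            result, is_error = f"Error running {block.name}: {exc}", True

        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result,
            "is_error": is_error,
        })
    return results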

The agent now has real capabilities. Give it a task like “read config.json and search the web for what each setting does,” and it will call read_file first, then web_search multiple times based on what it found.
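Once you have more than a handful of tools, the if/elif chain gets unwieldy. One possible alternative, sketched here under my own naming (TOOL_FUNCTIONS, the tool decorator, and dispatch are all hypothetical helpers, not part of any SDK), is a small registry that maps tool names to functions:

```python
from typing import Callable

# Registry of tool implementations, keyed by the name the model will use
TOOL_FUNCTIONS: dict[str, Callable[..., str]] = {}

def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the loop can look it up."""
    TOOL_FUNCTIONS[func.__name__] = func
    return func

@tool
def read_file(path: str) -> str:
    """Return the contents of a text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def dispatch(name: str, tool_input: dict) -> str:
    """Call the registered tool, or report an unknown name back as text."""
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        return f"Unknown tool: {name}"
    return func(**tool_input)
```

Inside handle_tool_calls, the branch on block.name would then become a single dispatch(block.name, block.input) call. The schema list passed to the API still has to be kept in sync with the registry by hand in this sketch.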

The quality of descriptions matters more than first-time builders expect. A tool described as “Search the web” gets used unpredictably; “Search the web and return top 5 results. Use this for current events, recent products, or anything not in training data” gets used predictably. Treat tool descriptions as the prompt-engineering surface that controls when each tool fires.

Step 4: Add memory and state

The loop above maintains conversation memory within a single agent run, which is enough for most single-task agents. For agents that need to remember information across sessions (a personal assistant, a long-running research agent), you need explicit memory management.

The simplest pattern: persist the messages list to disk between sessions and reload on startup. For more sophisticated needs, summarize old messages periodically to keep context within the model’s window. Tools like Mem0 or the Claude Agent SDK’s built-in compaction handle this automatically.
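A minimal sketch of the persist-and-reload pattern, assuming the messages list from the loop above. The helper names and the JSON file are my own choices. It also assumes SDK content blocks are Pydantic models exposing model_dump(); verify that a round-tripped history is accepted by your SDK version before relying on it.

```python
import json
from pathlib import Path

def to_jsonable(content):
    """Convert SDK content blocks to plain dicts; leave strings and dicts alone."""
    if isinstance(content, list):
        return [
            b.model_dump(exclude_none=True) if hasattr(b, "model_dump") else b
            for b in content
        ]
    return content

def save_messages(messages: list[dict], path: str) -> None:
    """Write the conversation history to disk as JSON."""
    serializable = [
        {"role": m["role"], "content": to_jsonable(m["content"])} for m in messages
    ]
    Path(path).write_text(json.dumps(serializable, indent=2), encoding="utf-8")

def load_messages(path: str) -> list[dict]:
    """Reload a saved history, or start fresh if no file exists yet."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text(encoding="utf-8"))
```

On startup you would call load_messages, append the new user message, run the loop, and call save_messages before exiting.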

The memory question that surprises first-time builders: the model has no memory between iterations except what you send it on each call. If you want the agent to remember something across a long task, the information has to be in the request – in the conversation history, the system prompt, or a tool result.
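The flip side is that anything you drop from the history is gone for good, so trimming has to be deliberate. Here is one simple sketch, cruder than summarization: keep the original task plus the most recent messages. The function name and the keep_last default are assumptions, and it expects the alternating user/assistant layout produced by the loop in Step 2.

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the original task plus the most recent messages.

    The kept tail is moved forward until it starts on an assistant message, so a
    tool_result is never kept without the tool_use message that precedes it.
    """
    if len(messages) <= keep_last + 1:
        return messages
    start = len(messages) - keep_last
    while start < len(messages) and messages[start]["role"] != "assistant":
        start += 1
    return [messages[0]] + messages[start:]
```

The trade-off is that the agent forgets whatever was in the dropped middle. If those early tool results matter, summarizing them into a single message is the safer route.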

When to use a Python AI agent framework instead

After building a few agents from scratch, the question of whether to adopt a framework becomes practical.

Stay with scratch-built agents when the agent is small, the workflow is simple, or you specifically want full control over the loop. Those 50 lines of code are yours to read, debug, and modify. Frameworks add abstraction layers that make sense for complex agents and feel heavy for simple ones.

Adopt a framework when you’re reinventing concepts the frameworks already handle: multi-step planning, multi-agent coordination, tool execution with retries, evaluation harnesses, observability. The popular Python AI agent frameworks (Claude Agent SDK, LangGraph, CrewAI, OpenAI Assistants) have invested in these. The decision rule: if you’re writing more than ~300 lines of agent infrastructure that isn’t core to your domain, switch to one of the established options.

Common gotchas when building from scratch

A few production-relevant pitfalls show up consistently.

The max_iterations ceiling is non-optional. Without it, a confused agent loops indefinitely, burning tokens. Always set a hard ceiling.

Tool errors should be returned to the model, not raised. When a tool call fails, return the error as a tool result so the agent can course-correct. Raising kills the loop.

Token costs add up fast. Each iteration sends the full conversation history. Long agent runs produce surprisingly large bills. Monitor token usage and set spending alerts.

Tool descriptions are prompts. Write them like API docs for the model. Vague descriptions produce unpredictable tool selection.
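For the token-cost gotcha, one lightweight guard is to tally usage per response and stop the run once a budget is exceeded. This sketch assumes each response carries a usage object with input_tokens and output_tokens attributes, as the Anthropic Messages API reports them. The UsageTracker class and the budget figure are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    """Running token tally for one agent run, with a hard budget."""
    max_total_tokens: int
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage) -> None:
        """Add one response's usage; expects .input_tokens and .output_tokens."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    def over_budget(self) -> bool:
        return self.total >= self.max_total_tokens
```

In the loop from Step 2, you would create a tracker before the for loop (for example UsageTracker(max_total_tokens=200_000)), call tracker.record(response.usage) after each client.messages.create call, and return early when tracker.over_budget() is true.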

FAQ

How do I build an AI agent in Python?

To build an AI agent in Python, install an LLM client library (Anthropic, OpenAI, or similar), define tools as functions with descriptions and input schemas, and write a loop that calls the LLM with tools enabled, executes any tool calls the model produces, and continues until the model returns a final answer. The minimum working agent is around 50 lines of code. The pattern has three components: an LLM client, a tool layer, and an agent loop. Every popular Python AI agent framework wraps this pattern with extra structure for production concerns.

What Python libraries do I need for AI agents?

The Python libraries for AI agents fall into two categories. For building from scratch, you need an LLM client library: anthropic for Claude, openai for GPT models, google-genai for Gemini. That’s enough to implement the agent loop manually. For production-grade agents, frameworks like claude-agent-sdk, langchain, langgraph, crewai, and openai-agents add structure: agent loops, tool execution, memory, observability. For agents that need vector search, add chromadb, pgvector, or similar. Most production agents use a framework plus targeted libraries for memory, search, and observability.

What’s the best Python AI agent framework?

The best Python AI agent framework depends on what you’re building. The Claude Agent SDK is the strongest choice for agents that need long-running autonomous work and tight Claude integration. LangGraph wins on agents requiring explicit state-machine control over workflow. CrewAI is best for role-based multi-agent setups where multiple agents collaborate. The OpenAI Assistants API and the new OpenAI Agents library work well for OpenAI-heavy stacks. For your first agent, build from scratch in 50 lines and adopt a framework once you can name the specific complexity it solves.

Can I build an AI agent without a framework?

Yes, you can build an AI agent without a framework, and doing so once is genuinely educational. The pattern is straightforward: an LLM client, a loop that handles tool calls, a set of tools with descriptions and schemas, and a max_iterations ceiling. A working agent runs about 50 lines of Python. The trade-off is that you handle everything yourself: retries, error handling, observability, memory management. For simple agents this is fine. For complex production agents, frameworks handle the infrastructure work and let you focus on the agent’s specific behavior.

How long does an AI agent loop run?

An AI agent loop runs until the model produces a final response without requesting another tool call, the loop hits its max_iterations ceiling, or you implement an explicit termination condition. Simple agents typically run 2-5 iterations. Complex agents doing research can run 20-50. Setting max_iterations to 10-15 is reasonable for most use cases – tasks that need more iterations should be broken into sub-tasks. Always set a ceiling; never trust the agent to terminate on its own.

If you’ve built a production AI agent in Python from scratch and have honest numbers on what changed when you moved to or away from a framework, that writeup is the gap worth filling. Most published content shows the from-scratch pattern or the framework code, but rarely the migration story between them.

Written by

Rohit Shukla

👋 Hi, I’m Rohit Shukla! I am a full-stack developer with expertise in Angular, Golang, and Java, and I am passionate about building scalable applications, backend systems, and APIs. Over 4 years, I have worked on various projects, improving my skills in modern web technologies, AI, and cloud computing.