It’s been over a year since From Buzz to Building - Introduction to GenAI for Developers - Part 2 - The Technical Stack, where I documented the current state of technical stacks in GenAI applications. In AI years, that is approximately 4 score and 7 years of innovation and change. We’re due for an update.

In the last survey of the landscape, the hot topics were Retrieval Augmented Generation (RAG), Vector Databases, and Model Context Protocol (MCP). The assumed application was a chatbot or knowledge management tool. The term ‘agent’ was in its infancy.

Today, the buzz is all about AI Agents. The RAG implementations of 2025 are still relevant, and often more production-ready than agents, but everyone is starting to believe in the potential of agents. This is where we find the current buzz, so this is what we’ll cover in this post.

(Diagram: AI Agent Architectures)

AI Agent Technical Stack

Many of the foundational components from 2025 remain in place today. The core functionality is driven by Large Language Models (LLMs) that are used via API calls or local hosting. The LLMs can be general-purpose or fine-tuned for specific tasks. There is a need for observability and evaluation to understand and improve upon a non-deterministic system.

AI agents change things thanks to their superpower: tools. Just as tools have helped catalyze periods of human innovation (shout out to the hammer, the wheel, and sliced bread), tools have been the driving force in the world of agents. An LLM on its own can produce code, but it has no idea whether that code works. With tools, the code can be executed and improved until it is ready to show to the user.

Tools have been around for a while, but recent models are now “smarter” and much better at using them. The injection of tools into the AI agent technical stack significantly impacts several components, which will be the focus of today’s discussion. How do stateless agents understand what tools are available? How do they communicate with one another or a user in a long-running session? How do we ensure things are run safely? If you think a bull in a china shop can be costly, wait until you see an AI agent with limitless use of tools.
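At its core, the tool pattern is a loop: the model proposes an action, the system runs it, and the result is fed back until the model decides it is done. Here is a minimal sketch of that loop in Python, with a stubbed "model" standing in for a real LLM API call; the tool name and the stub's decision logic are purely illustrative.

```python
# Minimal sketch of an agent tool loop. stub_model stands in for a real LLM
# call; in practice the model would see the history and emit a tool call.

def run_python(code: str) -> str:
    """Hypothetical tool: execute a snippet and return the result or an error."""
    try:
        namespace = {}
        exec(code, namespace)
        return str(namespace.get("result", "ok"))
    except Exception as e:
        return f"error: {e}"

TOOLS = {"run_python": run_python}

def stub_model(history):
    # A real agent would call an LLM here; this stub "fixes" its code on error,
    # mimicking the execute-and-improve cycle described above.
    last = history[-1]
    if last.startswith("error"):
        return ("run_python", "result = sum(range(5))")  # corrected attempt
    if last == "start":
        return ("run_python", "result = sum(range(5)")   # buggy first attempt
    return ("final", last)

def agent_loop(max_steps=5):
    history = ["start"]
    for _ in range(max_steps):
        action, payload = stub_model(history)
        if action == "final":
            return payload
        history.append(TOOLS[action](payload))
    return history[-1]
```

The key point is the feedback edge: the tool's output (including the error) goes back into the model's context, which is what lets the agent self-correct.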

AI Models

The brains of the operation. In agentic workflows, agents must take in context, reason about the current state, and decide on the next action, often a tool call. Model choice matters here: instruction-following and tool-calling ability vary widely across models, and they largely determine how reliably an agent can execute multi-step work.

Agent Hosting and Serving

Models can be hosted locally or accessed via cloud providers and APIs. An agent will use the models, but also have unique requirements in how they must be hosted. Agents need to spin up quickly and support long-running stateful execution, human-in-the-loop pauses, and real-time streaming.
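One concrete hosting requirement worth seeing in code is pause/resume: a long-running agent should checkpoint its state between steps so that a human-in-the-loop pause (or a crash) survives a restart. A toy sketch, with a hypothetical checkpoint file and a loop that stands in for real agent work:

```python
# Sketch of checkpointed, resumable agent execution. The file name and the
# "work" inside the loop are illustrative; real platforms persist much richer
# state (messages, pending tool calls, approvals).
import json
from pathlib import Path

CKPT = Path("agent_ckpt.json")

def save(state):
    CKPT.write_text(json.dumps(state))

def load():
    return json.loads(CKPT.read_text()) if CKPT.exists() else {"step": 0}

def run_until(pause_at):
    """Run (or resume) the agent until the given step, checkpointing each one."""
    state = load()
    while state["step"] < pause_at:
        state["step"] += 1          # a real agent would do a unit of work here
        save(state)
    return state
```

Calling `run_until(2)` and later `run_until(5)` picks up from step 2 rather than starting over, which is the property hosting platforms provide at scale.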

Further, access to tools presents a need for tight security. More on this in the Sandboxes section.

LangSmith Deployments and the Vercel AI SDK specialize in hosting agents. Alternatively, you can use n8n, which combines agent hosting with our next topic…

Agent Frameworks

Agents can get complicated quickly. A single agent with a few tools is easy enough to maintain. But, multi-agent workflows with 20+ potential tool calls, across many users, each with multiple messages and their own unique context… You need a way to orchestrate all of these moving parts.

Agent frameworks offer abstractions that make it easier to control the conversations and functionality of your agents. The cost is some performance overhead and a potentially bloated codebase, but you can get your hands dirty building agents much faster.

Some popular agent frameworks are LangGraph, CrewAI, Microsoft Semantic Kernel, and LlamaIndex.
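To demystify what these frameworks abstract, here is the core idea sketched by hand: a small state graph where each node reads and updates shared state, and returns the name of the next node. The node names and routing are made up for illustration and are not the API of any of the frameworks above.

```python
# Hand-rolled sketch of the orchestration pattern agent frameworks provide:
# nodes mutate shared state, edges decide what runs next. Real frameworks add
# persistence, streaming, retries, and multi-agent routing on top of this.

def plan(state):
    state["steps"] = ["research", "write"]   # the "planner" node
    return "execute"

def execute(state):
    step = state["steps"].pop(0)             # the "worker" node
    state.setdefault("done", []).append(step)
    return "execute" if state["steps"] else "end"

GRAPH = {"plan": plan, "execute": execute}

def run(entry="plan"):
    state, node = {}, entry
    while node != "end":
        node = GRAPH[node](state)
    return state
```

With 2 nodes this is trivial; with 20+ tools, multiple users, and branching conversations, you quickly appreciate having a framework manage the graph for you.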

Data Layer

Each LLM interaction is stateless. Even when you are having a "conversation" and send a new message, the LLM doesn't actually remember your previous messages. The entire conversation (or some sort of summary of it) is sent back to the LLM with every request.
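This statelessness is easy to see in code. The sketch below uses a fake model function in place of a real chat-completion API; the only "memory" the model has is whatever is in the `messages` list for that single call.

```python
# Sketch of why "conversation" is an illusion: the client re-sends the full
# message history on every call. fake_llm stands in for a real chat API.

def fake_llm(messages):
    # A real LLM only "knows" what is inside `messages` for this one call.
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return f"I have seen {len(user_turns)} user message(s)."

history = [{"role": "system", "content": "You are helpful."}]
for text in ["hi", "remember me?"]:
    history.append({"role": "user", "content": text})
    reply = fake_llm(history)          # the whole history goes out every time
    history.append({"role": "assistant", "content": reply})
```

Managing that ever-growing `history` (truncating, summarizing, persisting it) is exactly what the data layer is for.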

Moreover, to give the LLM the appearance of gaining intelligence, you need to feed it context that equips it with the right "expertise". Through the art of Context Engineering (a more comprehensive, and probably better, term than Prompt Engineering) and writing Agent Skills, you can build context that best suits your agent's use case.

Of course, this context needs to be stored somewhere as data. Long-term "memory" for agents is typically equated with database storage, much as we saw in From Buzz to Building - Introduction to GenAI for Developers - Part 2 - The Technical Stack. A database stores vector embeddings of past information, which can be retrieved to enrich the model's context.
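The retrieval mechanics boil down to: embed the stored facts, embed the query, and return the closest matches by cosine similarity. A toy sketch, where `embed()` is a fake character-count "embedding" (real systems call an embedding model with hundreds of dimensions) and the "database" is just a list:

```python
# Toy sketch of long-term memory as vector retrieval. embed() is deliberately
# fake; swap in a real embedding model and a vector database in practice.
import math

def embed(text):
    # Fake 3-dim embedding; real embeddings come from a model.
    return [text.count("a"), text.count("e"), len(text)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

memory = ["user prefers dark mode", "user's dog is named Rex"]
index = [(embed(m), m) for m in memory]

def recall(query, k=1):
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    return [m for _, m in sorted(index, key=lambda p: -cosine(p[0], q))][:k]
```

The retrieved memories are then prepended to the prompt, which is how a stateless model appears to "remember" your dog's name.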

In agentic interactions, an LLM may spin up, learn a slight user preference, and be expected to remember it the next day as a fresh LLM. For these types of interactions, agentic systems are starting to use a dedicated "memory" layer.

Memory

This is not your grandfather’s memory, which probably referenced something like RAM or registers. It is a suite of approaches for persisting and retrieving information an agent has gained, in a way that allows it to be efficiently added to the agent’s context.

This may be as simple as a markdown file like INSTRUCTIONS.md that provides additional context on every message, or an agent-maintained MEMORY.md that the agent automatically updates to remember things about you.
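File-based memory really is this simple. A sketch, assuming a hypothetical MEMORY.md in the working directory: new facts get appended, and the whole file is prepended to every prompt.

```python
# Sketch of file-based agent memory: append facts to MEMORY.md and inject
# the file into every prompt. File name and prompt format are illustrative.
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")

def remember(fact: str) -> None:
    """Append a fact the agent should retain across sessions."""
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {fact}\n")

def build_prompt(user_message: str) -> str:
    memory = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    return f"Known about this user:\n{memory}\nUser: {user_message}"
```

The obvious limitation is that the file grows without bound and is injected wholesale, which is where the retrieval-based approaches above (and products like Zep and mem0) come in.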

Tools like Zep and mem0 offer more sophisticated approaches if you’d like to explore further.

Code Execution

Remember that bull in a china shop? This is where we keep it from smashing everything. Agents don’t just think about actions, they take them. They can execute dangerous code and delete databases.

The code execution layer needs to give agents the right tools while making sure they can’t burn the house down using them.

Tool Libraries and MCPs

In From Buzz to Building - Introduction to GenAI for Developers - Part 2 - The Technical Stack, Model Context Protocol (MCP) was just starting to gain traction. Today, it’s the de facto standard for connecting agents to tools. The ubiquitous description is to think of MCP as USB-C for AI tools. Before MCP, every tool integration was a custom job. Now, you define a tool server once, and any MCP-compatible agent can discover and use it.

Need your agent to search the web, read a database, and send a Slack message? Instead of writing three custom integrations, you point it at three MCP servers. The agent discovers what’s available and figures out when to use each one.
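The core idea MCP standardizes is discoverability: a server exposes tool names and descriptions, and a client can list them and call them without custom glue. The sketch below illustrates that idea in plain Python; it is not the actual MCP wire protocol or SDK, and the tool is stubbed.

```python
# Plain-Python illustration of what MCP standardizes: tools register a schema
# (here just a description), clients discover them, then call by name.
TOOL_REGISTRY = {}

def tool(description):
    """Hypothetical decorator that registers a function as a discoverable tool."""
    def wrap(fn):
        TOOL_REGISTRY[fn.__name__] = {"description": description, "fn": fn}
        return fn
    return wrap

@tool("Search the web for a query string.")
def web_search(query: str) -> str:
    return f"results for {query!r}"          # stubbed result

def list_tools():
    # What an agent sees when it connects: names and descriptions only.
    return {name: meta["description"] for name, meta in TOOL_REGISTRY.items()}

def call_tool(name, **kwargs):
    return TOOL_REGISTRY[name]["fn"](**kwargs)
```

An MCP server does the same dance over a standardized protocol, which is why any compliant agent can connect to any compliant server.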

Of course, you don’t have to build every tool from scratch. Tool libraries give you pre-built collections of common integrations. Composio and Toolhouse offer hundreds of ready-made tool integrations that can be plugged into your agents. If you’d rather keep things closer to home, most agent frameworks (like the ones we discussed before) ship with their own tool ecosystems.

Sandboxes

So your agent can execute code. Great. Now what happens when it decides to rm -rf / or makes an API call that racks up a $10,000 bill?

Sandboxes are isolated execution environments that let agents run code without access to your actual system. The agent thinks it has a full machine at its disposal, but it’s really playing in a padded room. If it does something destructive, you throw away the sandbox and spin up a new one.

E2B is one of the go-to options here. Their cloud sandboxes are purpose-built for AI agents. Daytona takes a similar approach with development environment sandboxes. For simpler use cases, spinning up a Docker container per agent execution works fine.
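For the Docker route, the isolation comes down to the flags you pass. A sketch of a "container per execution" command builder; the image name and resource limits are illustrative, and actually running it requires Docker installed.

```python
# Sketch of per-execution sandboxing via Docker: untrusted agent code runs in
# a throwaway container with no network and capped resources.

def sandbox_cmd(code: str, image: str = "python:3.12-slim") -> list[str]:
    """Build the docker run command that would isolate one code execution."""
    return [
        "docker", "run",
        "--rm",                     # throw the sandbox away afterward
        "--network", "none",        # no outbound API calls or exfiltration
        "--memory", "256m",         # cap resources
        "--pids-limit", "64",       # no fork bombs
        image, "python", "-c", code,
    ]

# subprocess.run(sandbox_cmd("print(2 + 2)"), capture_output=True) would
# execute the snippet inside the container and capture its output.
```

If the agent does something destructive inside, `--rm` means the damage disappears with the container.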

The key consideration is latency vs. isolation. A heavier sandbox (full VM) is more secure but slower to spin up. A lighter sandbox (container, WASM) is faster but has a thinner wall between the agent and your infrastructure. Pick based on what your agent is actually doing. An agent that writes and tests code needs strong isolation. An agent that just queries a read-only API… maybe not so much.

Operations and Infrastructure

This is the “keeping the lights on” layer, but for agents, it goes beyond traditional monitoring. When a standard API call fails, you get an error code and move on. When an agent fails mid-way through a 15-step workflow, you need to understand which step went wrong, why the agent made that decision, and whether the six steps it already completed need to be rolled back.

LLM Observability tools from the RAG era (Langfuse, LangSmith, Arize AI) have evolved to handle agentic traces, where a single user request might trigger dozens of LLM calls, tool invocations, and decision points.
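The mechanism underneath these tools is tracing: every model call and tool invocation is recorded as a span so a multi-step run can be reconstructed after the fact. A minimal sketch with a hypothetical decorator and stubbed calls; real platforms capture far richer metadata (inputs, outputs, token counts, parent/child relationships).

```python
# Sketch of agentic tracing: wrap every model/tool call so each one lands in
# a trace with its kind, name, and duration.
import functools
import time

TRACE = []

def traced(kind):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            out = fn(*args, **kwargs)
            TRACE.append({"kind": kind, "name": fn.__name__,
                          "secs": time.perf_counter() - start})
            return out
        return inner
    return wrap

@traced("tool")
def lookup_weather(city):
    return f"sunny in {city}"        # stubbed tool call

@traced("llm")
def draft_reply(weather):
    return f"Forecast: {weather}"    # stubbed model call
```

When a 15-step workflow fails, that ordered trace is what tells you which step went wrong and what the agent saw before it decided.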

Evaluation also gets harder. In a RAG system, you can measure retrieval relevance and answer accuracy. With agents, you also need to evaluate whether it picked the right tool, whether it broke the task down correctly, and whether it recovered when something failed. Braintrust and AgentOps are building evaluation frameworks for these agentic concerns.

Guardrails / Safety

Sandboxes control where agents execute. Guardrails control what they’re allowed to do.

On the input side, you need to worry about users trying to trick the agent into doing something it shouldn’t. Prompt injection is still a real and unsolved problem, and an agent with tool access makes it scarier than a chatbot that can only generate text.

On the output side, you need checks before anything reaches the user. Is the agent about to share sensitive data? Did it generate something harmful or off-topic?

Then there are execution-level controls. Rate limits on API calls. Spending caps. Approval workflows for high-stakes actions. This is the human-in-the-loop pattern, where certain actions require a human to say “yes, go ahead” before the agent proceeds.
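Execution-level controls are mostly plain gatekeeping logic. A sketch combining a spending cap with human approval for high-stakes actions; the cap, the action names, and the `approve()` callback are all illustrative.

```python
# Sketch of execution-level guardrails: a spending cap plus human-in-the-loop
# approval for high-stakes actions. approve() stands in for a human prompt.

SPEND_CAP = 10.00
HIGH_STAKES = {"send_email", "delete_record"}

def guarded_call(action, cost, spent, approve=lambda a: False):
    """Return (allowed, new_spent). Denies over-cap or unapproved actions."""
    if spent + cost > SPEND_CAP:
        return False, spent                  # spending cap hit
    if action in HIGH_STAKES and not approve(action):
        return False, spent                  # human said no (or wasn't asked)
    return True, spent + cost
```

Cheap read-only actions flow straight through, while anything in the high-stakes set blocks until a human says "yes, go ahead."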

Guardrails AI and NVIDIA NeMo Guardrails offer frameworks for defining these rules. Some agent frameworks bake in their own guardrail systems as well.

Closing Thoughts

If you’ve read both this post and From Buzz to Building - Introduction to GenAI for Developers - Part 2 - The Technical Stack, (first, wow, you’re a champ. Second…) the pattern is clear: the stack keeps growing. What started as “pick a model and hook up a vector database” now includes tool protocols, sandboxed execution, memory systems, and safety controls layered on top.

Is it more complicated? Yes. But the capabilities justify it. A year ago, the best we could do was “ask a question, get an answer with some relevant context.” Today, agents can plan multi-step tasks, actually execute them, and course-correct when things go sideways.

As a one-time AI skeptic, I’m excited to see how AI agents transform the way we work. There is a big shift coming. In the same way Google changed how we work, AI agents will too.

Getting your hands dirty with agents now is time well spent. The mental model for building with them is different from traditional software in ways that are hard to appreciate until you’ve tried it.

What I didn’t cover here is how to actually pick the right components for your use case. That’s a conversation about requirements, trade-offs, and budget that doesn’t fit neatly into a survey post. Maybe next time.