It can be really tough to figure out what's genuinely useful and what's just a lot of hot air in the world of AI.
We're constantly trying to sift through the excitement and the uncertainty, distinguish true signals from background noise, and ultimately work out what actually makes sense for specific industries.
Having built AI agents for various sectors, we've had to look at their underlying architecture from a slightly different angle.
Large language models (LLMs) tend to perform best in areas where training data is plentiful, where folks are a bit more tolerant of risk, and where it's easier to check if things are working correctly.
And the big lesson we've learned is that what works brilliantly in one area doesn't necessarily translate across to others.
Every industry, every specialism, has its own unique set of limitations, its own risk profile, and its own messy data reality.
So What Makes an AI Agent Tick?
When we talk about AI agents and these sophisticated multi-agent systems, we're not talking about one single magical entity. Instead, think of them as a clever blend of several key components.
Old School AI: Traditional ML
First up, we've got the stuff that was around long before generative AI burst onto the scene. This is your classic AI and machine learning (ML), and it's still incredibly important, sometimes even more so than the newer tech.
This includes things like:
- Machine Learning algorithms: All those clever bits of maths that help computers learn from data, like regression (predicting numbers), classifiers (sorting things into categories), and clustering (finding groups).
- Predictive models: Systems that try to guess what's going to happen next based on past data.
- Recommendation systems: You know, like when Netflix suggests your next binge-watch or Amazon recommends a product you may like.
- Custom algorithms and statistical models: Tailor-made solutions for specific problems.
- Domain-specific ML pipelines: Specialised sequences of ML processes designed for particular industries.
Here's the thing: not every company that says it "uses AI" is actually playing in the agent space. Many are just running these deterministic ML pipelines or analytical engines.
The real magic happens when you start combining these traditional methods with human oversight and modern agent architecture. Imagine taking insights from these models and then dynamically acting on them, much faster, with a human guiding the overall direction. It's about intelligence driving action, with humans still firmly in the driver's seat. This is where you start to see the true potential of human-AI skill fusion.
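To make that fusion concrete, here's a minimal sketch, assuming a scikit-learn-style classifier and an entirely made-up churn dataset: a classic model scores the risk, and a thin agent layer decides what to do with the score while keeping a human checkpoint for high-stakes actions. The threshold and the returned action names are illustrative placeholders, not a real API.

```python
# Traditional ML supplies the signal; a lightweight agent layer supplies the
# action, with a human checkpoint for anything high-stakes. All data and
# action names here are invented for illustration.
from sklearn.linear_model import LogisticRegression
import numpy as np

# Toy training data: [monthly_spend, support_tickets] -> churned (1) or not (0)
X = np.array([[20, 5], [90, 0], [15, 7], [80, 1], [25, 6], [95, 0]])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

def act_on_prediction(customer_features, escalation_threshold=0.7):
    """Classic ML scores the risk; the agent layer decides what happens next."""
    churn_risk = model.predict_proba([customer_features])[0][1]
    if churn_risk >= escalation_threshold:
        # High-stakes call: draft a retention offer, but let a human approve it.
        return {"action": "draft_retention_offer", "needs_human_review": True}
    return {"action": "log_and_monitor", "needs_human_review": False}

print(act_on_prediction([22, 6]))  # high churn risk, so this flags for review
```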
The Logic Machine: Workflows and Automations
Next, we have workflows, bots, and automations. Now, these aren't "AI agents" in the modern sense; they're essentially just logic. But they're a crucial part of any agent system, though the level of their involvement depends heavily on the task and domain.
We're talking about:
- Hard-coded flows: Simple "if X, then do Y" instructions.
- Sequential workflows: Tasks that happen one after the other.
- Parallel task runners: Things happening at the same time.
- Bots and rule-based automations: Think of these as digital assistants following a strict script.
- Multi-step deterministic procedures: Pretty much what it says on the tin, a set of defined steps.
- Robotic Process Automation (RPA): Software robots mimicking human actions to automate repetitive tasks.
These might look like agents, but they're quite rigid. They can't adapt, they can't improvise, and they certainly don't "think". They just follow instructions, and if you need any customisation or strategic changes, you have to manually tweak them. For operations that need a bit more flexibility, you'll want to layer well-designed agents or agentic architecture on top of these workflows. It's similar to how Google is now integrating new AI features into Android, adding intelligent layers to existing functionality.
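For illustration, here's what that rigidity looks like in practice. This is a toy ticket-routing workflow; the fields and queue names are invented, and the point is simply that every branch is hand-written logic that never adapts on its own.

```python
# A deliberately rigid "if X, then do Y" workflow, as a contrast to an agent.
# Every branch is hard-coded; if the business rule changes, a human edits the
# code. It follows the script, and only the script.
def route_ticket(ticket: dict) -> str:
    if ticket["priority"] == "urgent":
        return "page_on_call_engineer"
    if "refund" in ticket["subject"].lower():
        return "billing_queue"
    if ticket["customer_tier"] == "enterprise":
        return "dedicated_support_queue"
    return "general_queue"  # no branch matched: fall through, never improvise

print(route_ticket({"priority": "normal", "subject": "Refund request",
                    "customer_tier": "smb"}))
# -> billing_queue
```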
The Brains of the Operation: Generative AI
This is arguably the most important piece of the puzzle for modern agents. At a minimum, any agentic system needs at least one of these hooked up. It's a complex and fast-moving area, so let's break it down a bit further.
#### Foundations: Large Language Models and Beyond
This sub-category includes the big names you hear about all the time:
- LLMs: Think GPT, Claude, Gemini, Llama, and all those open-source models.
- Vision and Multimodal models: AI that can understand and generate content not just from text, but also images, video, and other types of data.
- RLHF (Reinforcement Learning from Human Feedback) and instruction-tuned variants: Models that have been specifically trained to follow instructions and align with human preferences.
These LLMs are generally brilliant at things like coding and content creation. However, their performance can really drop off a cliff when the training data is sparse, biased, unstructured, or proprietary. That's when you get incorrect and overly confident outputs, especially when the real-world operational data isn't publicly available. This is a common issue that contributes to what some call 'AI slop', where poor quality data leads to poor quality outputs.
#### Remembering Things: Context Engineering and Memory Systems
Here's a crucial point: LLMs don't have a native memory. Every single request you make to them starts fresh, with no recollection of previous chats. Context is an engineering layer you add on top; it's not inherent to the model itself. This is absolutely vital for designing good human-in-the-loop systems and user experiences. (There's a small code sketch of this after the list below.)
This involves:
- Conversation history: Keeping track of what's been said before.
- Long-term memory: Storing information over extended periods.
- RAG (Retrieval-Augmented Generation): A clever technique where the AI pulls relevant information from a knowledge base before generating a response.
- Knowledge graphs: Structured networks of information that help the AI understand relationships.
- Document ingestion: Getting data into the system from various documents.
- Summaries, compression, and distillation pipelines: Methods to condense and refine information.
- Fine-grained context injection: Precisely feeding the AI the exact information it needs.
- Domain knowledge bases: Specialised libraries of information for a particular field.
- Structured memory orchestration: Organising and managing how the AI accesses and uses its memory.
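Here's a minimal sketch of that idea. The `call_llm()` function is a hypothetical stub standing in for any chat-completion API, and the keyword-match retrieval is a deliberately naive stand-in for a real RAG pipeline; the point is that the "memory" lives entirely outside the model and gets rebuilt on every request.

```python
# The model itself remembers nothing, so every request reassembles its
# "memory" from external stores: a running conversation history plus a
# (very naive) retrieval step over a toy knowledge base.
conversation_history: list[dict] = []   # memory lives outside the model
knowledge_base = {
    "refund policy": "Refunds are issued within 14 days of purchase.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(query: str) -> list[str]:
    """Naive retrieval: keyword match standing in for a real RAG pipeline."""
    return [text for key, text in knowledge_base.items() if key in query.lower()]

def call_llm(messages: list[dict]) -> str:
    """Hypothetical stub for any chat-completion API; swap in a real client."""
    return f"(reply grounded in {len(messages)} assembled messages)"

def ask(user_message: str) -> str:
    context_docs = retrieve(user_message)
    # The prompt is rebuilt fresh each time: retrieved facts + running history.
    messages = (
        [{"role": "system", "content": "Ground answers in: " + " ".join(context_docs)}]
        + conversation_history
        + [{"role": "user", "content": user_message}]
    )
    reply = call_llm(messages)
    conversation_history.append({"role": "user", "content": user_message})
    conversation_history.append({"role": "assistant", "content": reply})
    return reply

print(ask("What is your refund policy?"))
print(ask("And what about shipping?"))  # the second call carries the history
```

Swap in a vector store and a real model and the shape stays the same: retrieve, assemble, call, persist.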
It's fascinating just how much memory systems can influence an agent's behaviour. You could have two agents using the exact same LLM and workflow, but if they have different memory stacks, they'll act like completely different entities. This is a key reason why understanding the hidden limits of consumer AI chatbots is so important.
#### The "Thinking" Layer: Reasoning Frameworks
This is where the agent figures out how to solve a problem. It's about structuring the model's approach, rather than just its outputs.
This can include:
- Chain-of-thought prompting: Guiding the AI to break down problems step-by-step.
- Self-reflection loops: Allowing the AI to review and refine its own work.
- Task decomposition patterns: Breaking large tasks into smaller, manageable ones.
- Planning scaffolds: Providing a framework for the AI to plan its actions.
- Multi-step reasoning architectures: Designing systems that can handle complex, multi-stage thought processes.
- Dynamic, model-generated "thinking" workflows: Where the AI actually creates its own plan of action on the fly.
It's important to differentiate: workflows are fixed logic, but reasoning is about dynamic, model-generated logic. The need for this reasoning ability varies a lot. It's brilliant for creative tasks, coding, or anything with a lot of variables. However, reasoning alone won't fix poor quality training data. An LLM can "reason" its way to a completely wrong answer with absolute confidence if it doesn't have a solid foundation of domain knowledge. As one expert notes, "reasoning doesn’t fix weak training data—it can just as easily lead you to the wrong answer with 100% confidence and sound logic" (source: https://www.forbes.com/sites/forbestechcouncil/2023/11/20/the-rise-of-ai-agents-unleashing-autonomous-intelligence/).
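To make two of those patterns concrete, here's a rough sketch combining chain-of-thought prompting with a self-reflection loop. As before, `call_llm()` is a hypothetical stub for a chat-completion API, and the prompts are illustrative rather than battle-tested.

```python
# Chain-of-thought plus self-reflection: the model is asked to expose its
# intermediate steps, then critique and revise its own draft. The revision
# logic is dynamic, decided by the model rather than a hard-coded workflow.
def call_llm(messages):  # hypothetical stub; swap in a real client
    return "OK"          # a real model would return its draft or critique

def solve_with_reflection(task: str, max_revisions: int = 2) -> str:
    # Chain-of-thought: ask for the steps, not just the answer.
    draft = call_llm([{"role": "user",
                       "content": f"Think step by step, then answer:\n{task}"}])
    for _ in range(max_revisions):
        # Self-reflection: the model reviews its own work...
        critique = call_llm([{"role": "user",
                              "content": f"Task: {task}\nDraft: {draft}\n"
                                         "List any flaws, or reply OK if none."}])
        if critique.strip() == "OK":
            break
        # ...and revises based on that critique.
        draft = call_llm([{"role": "user",
                           "content": f"Revise the draft to fix: {critique}\n"
                                      f"Draft: {draft}"}])
    return draft
```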
Agents: The Grand Fusion
So, an AI agent isn't a single component; it's a sophisticated blend, a fusion of all these buckets:
- A foundational model (the generative AI layer)
- Context engineering and human-in-the-loop design
- Reasoning frameworks
- Workflows and automations
- Sometimes, traditional ML or analytics
- Specific domain instructions
- An understanding of its current 'state'
- A toolkit of available actions
The real challenge is that highly specialised domains demand completely different combinations of these components. For example, building an AI to generate eye-catching YouTube thumbnails will need a different fusion than an agent designed for complex financial analysis.
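Here's a skeletal illustration of that fusion. The tools, the `decide()` heuristic, and the goal are all invented for the example; in a real system, the decision step would route through an LLM plus the memory and workflow layers described above.

```python
# One agent, in miniature: domain instructions, a notion of current state,
# a toolkit of available actions, and a reasoning step that picks the next
# action. Everything here is illustrative scaffolding.
AVAILABLE_TOOLS = {
    "fetch_report": lambda state: {**state, "report": "Q3 numbers"},
    "summarise":    lambda state: {**state, "summary": f"Summary of {state['report']}"},
    "finish":       lambda state: state,
}

DOMAIN_INSTRUCTIONS = "Produce a summary of the latest report, then stop."

def decide(state: dict) -> str:
    """Reasoning stand-in: pick the next tool based on the current state."""
    if "report" not in state:
        return "fetch_report"
    if "summary" not in state:
        return "summarise"
    return "finish"

def run_agent(max_steps: int = 5) -> dict:
    state = {"instructions": DOMAIN_INSTRUCTIONS}   # the agent's current 'state'
    for _ in range(max_steps):
        action = decide(state)                      # reasoning picks from the toolkit
        state = AVAILABLE_TOOLS[action](state)
        if action == "finish":
            break
    return state

print(run_agent())
```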
Why Architecture Varies So Much
One of the biggest factors dictating agent architecture is the training data. If you're building an agent for coding or content creation, you've got vast datasets available, leading to really strong performance. But for areas like advertising, legal work, finance, healthcare, or taxes, the available data is often small, unstructured, or proprietary. This means the "out-of-the-box" quality from a general LLM is usually much lower.
Also, LLMs are always playing catch-up; their knowledge typically trails the present by 6-12 months. Without clever techniques like retrieval-augmented generation (RAG), grounding, or fine-tuning, agents can perform quite poorly.
Then there's risk tolerance. Industries with a low tolerance for mistakes, where the stakes are high, need far more agents, more constraints, more workflows, and more human checkpoints. This isn't about over-engineering; it's about responsible engineering. Think about the careful governance frameworks developing in regions like ASEAN or Latin America; they reflect this need for careful, responsible AI deployment.
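As a hedged sketch of what "more human checkpoints" can look like in code, here's a risk-gated execution step. The action names and approval flow are assumptions for illustration, not a prescribed framework.

```python
# A risk-gated checkpoint: high-stakes actions are never auto-executed,
# only queued for a human to sign off. Action names are invented examples.
HIGH_RISK_ACTIONS = {"move_budget", "pause_campaign", "change_bids"}

def execute_with_checkpoint(action: str, params: dict,
                            approved_by_human: bool = False) -> dict:
    if action in HIGH_RISK_ACTIONS and not approved_by_human:
        # Low risk tolerance: route to a reviewer instead of executing.
        return {"status": "pending_review", "action": action, "params": params}
    return {"status": "executed", "action": action, "params": params}

print(execute_with_checkpoint("move_budget",
                              {"from": "campaign_a", "to": "campaign_b"}))
# -> pending_review until a human signs off
```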
Clearing Up the Confusion
A lot of the current discussion around AI agents is heavily skewed by coding and content-creation use cases, where models perform really well from the get-go. This can lead to some misleading assumptions:
- Just because an architecture works for coding, it doesn't mean it'll work everywhere else.
- The lines between agents, agentic workflows, traditional AI/ML, reasoning, context engineering, and generative AI models aren't always clear to outsiders. This is why understanding the difference between Small vs. Large Language Models is so crucial.
- Training data bias is absolutely critical for architecture, and this gap isn't going away anytime soon.
- Multi-agent designs might look overly complex, but in specialised, high-risk industries, they're a necessity.
The Advertising Example: Why More Agents Are Better
Let's take advertising operations as an example. Why does it need so many agents?
- Sparse operational training data: Models might know about ad content or documentation, but they often lack the nitty-gritty details of execution, strategy, platform configurations, or actual performance data.
- Biased platform knowledge: Documentation often pushes what benefits the platform, not necessarily what's best for the client. You need deep domain expertise to truly understand effective operational approaches.
- Low verifiability: Success can be slow to appear, attribution is often murky, and platforms can be opaque.
- High risk: We're talking about real money, so mistakes can snowball and are hard to fix mid-campaign.
- Deeply contextual: What works depends on the brand, industry, budget, channel, and timing.
This means you need more memory, more scaffolding, more workflow constraints, more specialised agents (each handling specific toolsets), and many more human-in-the-loop checkpoints. It's a far cry from a simple content generation task.
The real game-changer isn't just automation; it's about bridging those resource-prohibitive gaps. Experienced professionals have always known what should be done: more granular bid adjustments, faster optimisation cycles, real-time cross-platform balancing, and large-scale multivariate testing. These weren't impossible, but they weren't worth the human effort given the constraints.
Agents are changing that equation. They allow us to:
- Execute tactics that were previously "not worth an analyst's time".
- Manage complexity that was impossible to handle manually at scale.
- Achieve testing velocity that wasn't operationally feasible.
- Perform execution at a granularity that would otherwise require ten times the team.
The folks who truly understand these operational gaps, and where resources fall short, are the ones best placed to help design and fine-tune multi-agent architectures that genuinely deliver value. It's about empowering humans to do more by offloading the repetitive, complex, or time-consuming tasks to intelligent systems.