268 - Agents

“AI Agents” are receiving a lot of attention right now, largely because people fail to understand that a “Proof-of-Concept” (POC) on a toy problem in no way demonstrates real-world viability. Toy problems are narrow, carried out under tightly controlled experimental conditions, and subject to none of the real world’s messy complexity.

You can take the worst idea in any brainstorming session, build a toy problem around it, and produce a “faux-POC” with a heavy dose of confirmation bias to make it look viable. A strong example: more than 100 “cognitive architectures” had already been proposed and/or demonstrated on toy systems by 2020, and only one of them ever worked in the real world beyond the toy-system stage.

If you have a PR department geared toward fraud and backed by billions of dollars in VC funding, as at least two well-known startups do today, you can make a lot of people believe in complete garbage. Thus far no one has gone to prison for it, so companies have no incentive to stop and every incentive to double down.

A big part of the problem for “AI Agents” is that navigating real-world complexity in many-step processes requires a combination of discrete data, processes, and states that you won’t find in any LLM or RL-driven system. For 99.9% of use cases, agents built on narrow optimizers, lacking any intact and persistent graph structure, simply aren’t viable.
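To make concrete what “discrete data, processes, and states” held in a persistent graph structure mean here, consider a minimal, hypothetical sketch in Python (the workflow, its states, and its transitions are invented purely for illustration): the process lives in an explicit transition graph, so every step is either legal or rejected outright, rather than sampled from a distribution over tokens.

```python
# Minimal, hypothetical sketch of discrete, persistent state:
# a workflow stored as an explicit graph of named states and the
# transitions each one allows. Every step is checkable and auditable.
# The states and transitions are invented for illustration only.

WORKFLOW = {
    "received":  {"validated", "rejected"},
    "validated": {"fulfilled", "rejected"},
    "fulfilled": set(),   # terminal state
    "rejected":  set(),   # terminal state
}

def advance(state: str, next_state: str) -> str:
    """Move to `next_state` only if the graph explicitly allows it."""
    if next_state not in WORKFLOW[state]:
        raise ValueError(f"illegal transition: {state} -> {next_state}")
    return next_state

state = "received"
state = advance(state, "validated")  # fine: edge exists in the graph
state = advance(state, "fulfilled")  # fine: edge exists in the graph
# advance(state, "validated")        # raises: "fulfilled" is terminal
```

The point of the structure is that an illegal step fails loudly and immediately; a probabilistic next-token sampler has no equivalent hard boundary to fail against.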

LLM- and RL-driven agents scale exponentially poorly along the dimensions of time/steps and complexity, and the two compound upon one another at a doubly exponential rate. Biasing mechanisms can hide the damage from any given KPI, and within toy problems a very high degree of hand-engineering can clamp the errors down and more or less mitigate them, but amid real-world complexity these problems grow unchecked and often unseen.
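Even the simplest version of this compounding is brutal. As a back-of-the-envelope sketch (the per-step reliabilities are assumed for illustration, not measured from any system): if each step of an n-step pipeline succeeds independently with probability p, the whole task succeeds with probability p^n.

```python
# Toy illustration, not a benchmark: how end-to-end reliability decays
# with step count when each step succeeds independently. The per-step
# success rates below are assumed figures chosen for illustration.

def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return per_step ** steps

for per_step in (0.99, 0.95, 0.90):
    for steps in (5, 20, 50, 100):
        rate = end_to_end_success(per_step, steps)
        print(f"p={per_step:.2f}, n={steps:3d} -> {rate:.1%}")
```

Even a 99%-reliable step, chained 100 times, completes the full task barely a third of the time (0.99^100 ≈ 36.6%), and this sketch is optimistic: it assumes steps fail independently, whereas in practice an early error corrupts the state that every later step depends on.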

A hammer is exceptionally useful when faced with a nail, but a hammer is not a spoon, so it will never hold water. “GenAI” is a hammer, just one tool in the toolbox, useful for very niche problems.

People try to paint that hammer in many colors (CoT, MoE, RAG, RLHF, etc.) and pretend it can serve many functions, but actually using it for those functions looks every bit as absurd as trying to eat soup with a hammer.

“AI Agents” built on LLMs and RL are just a tornado of painted hammers flying around narrow optimizers, never escaping the fundamental uses for which a hammer is viable, only increasing in number and in swings taken, unchanging in viability. The tornado might pick up a bit of soup and spray it everywhere, but the cost will tend to be far higher than anyone can afford.
