July 5, 2025

326 - Agents in a Nutshell

What we know about LLMs and “Agents” today, in a Nutshell:

The top 10% of your employees, people who are already experts at what they do, can potentially benefit from tools such as LLMs (but not “agents”), depending on the specific use case. This was demonstrated in a multi-year study of meta-material discovery.
The same tools used by the other 90%, or worse yet, full automation, invariably produce a myriad of engineering, process, and cognitive debts that often blow up before anyone notices them. The cybersecurity domain has been flooded with examples of this engineering debt: generated code that introduces vulnerabilities and references fictional dependency repositories, many of the commonly hallucinated ones now registered and filled with malware.
Dependence on these tools by non-experts (the other 90%) leads to the now-demonstrated and entirely predictable cognitive decline of that group, meaning they not only incur those substantial debts today but will incur worse ones tomorrow.
AI benchmarks have been systematically gamed by every major company in virtually every possible way, to the point where their leaderboards are often inversely correlated with actual performance. Examples include data contamination, "synthetic data" (also contamination), outright paying for benchmarks and exclusive access to them, and other forms of corruption and cheating.
AI "agents", even those from the most overtly fraudulent companies in the domain, have demonstrated such an absurd lack of real-world performance as to border on satire, to the point where they can't even manage a simple vending machine. We can laugh as Anthropic's "Claudius" "hallucinates" non-stop and tanks a toy business, but it won't be a laughing matter if your company is dumb enough to deploy such an "agent".
While it shouldn't need to be said, LLMs and derivative systems ARE NOT SEARCH ENGINES; they cannot perform any actual search function, and injecting them into such functions lets BS filter into every result. Just as you wouldn't accept 34% raw sewage leaking into your morning coffee, Perplexity AI's 34% "hallucination" rate is no more acceptable for search. (For comparison, Grok's is 94%, so nearly pure bile.)

LLMs can be useful tools, but only in the hands of people who are already experts. When companies brag about how many millions of people are using them, that inevitably means the overwhelming majority of those users are suffering damage, causing it, or both, through their use of these tools.

"Agents" are the degenerative extreme of these tools, not useful even to experts, because their extended sequences of actions go largely unchecked as they drift into absolute absurdity. It is technically possible to build an "agent" so tightly bounded that this cannot happen, but only for toy problems, where older, conventional forms of AI would be far easier and simpler to implement. "AI employees" based on LLMs are, in no uncertain terms, fraud.