March 20, 2024

163 - Exploit Patterns

The best paper I can recommend this week is "Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks", which came to my attention when several of the people behind leading research in the domain got excited about it, including the author who first published on Indirect Prompt Injection, calling it his favorite paper this year.

After reading through and thinking about this paper, I can understand his excitement. It is another major leap forward in the cybersecurity space, as it not only reliably carries executable payloads through pre-processing pipelines like RAG and past "guardrails" (fraudulent "security/alignment" measures), but it also automatically discovers "new exploitable patterns and vulnerabilities in the input space of LLMs."

Importantly, those automatically discovered exploitable patterns are highly diverse, inline, and don't rely on the kinds of tags and non-ASCII characters that can be sanitized with simple filters. They also showed that this approach could "bootstrap" hand-crafted prompt injections, greatly boosting their effectiveness, while performing only slightly lower than the average automatically discovered and exploited attacks.

*Note: The authors also don't engage in Irresponsible Disclosure, refraining from collaboration with AI industry frauds, which speaks to their credit.

While the various frauds of the AI industry talk about the (95% imaginary) business value of LLMs, and things like RAG that are duct taped to them, hyping up capacities that they do zero research in and can never deliver, like non-trivial understanding, reasoning, alignment, and cybersecurity, an increasing stream of papers like this are being published.

The simple fact is that if LLMs had any shred of the capacities that snake oil peddlers claim then none of these attack vectors could perform as they do. All of the "What ifs" in the world can't compete with actual evidence.

Also, remember that all of these LLMs converge on the data they are fed. They fit a curve for that data, and although many processing pipeline steps and biases may be added to that curve-fitting, the result is that white-box and black-box LLMs, regardless of minor architectural differences and tweaks (like RAG, MoE, CoT, etc.), will largely be subject to the same core vulnerabilities. Not all attack methods are equally transferable, but every method that automates new vectors of attack discovery and generation offers cumulative value to the adversarial toolbox.

Every time a door like this opens the attack surface that may be automated against grows exponentially, because many of these attacks may be combined to achieve benefits and potency not possible with any one method in isolation.