154 - Attack Methods
My favorite paper this week is "Fast Adversarial Attacks on Language Models In One GPU Minute", marking another valuable and complementary addition to the toolbox for destroying the imaginary enterprise value of LLMs.
This substantial step forward is best appreciated by contrasting it with the cost of prior methods aimed at the same goal, and with the hardware requirements of the target LLMs themselves. If a system that costs 100 times less to run can erode the performance of a competitor starting with as little as one GPU running for one minute, then it becomes a wise investment in terms of industrial sabotage. Combined with methods such as data poisoning and persistent compromise, the rapidly advancing efficiency of such adversarial attacks sets the stage for explosive growth in both their potency and their diversity.
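To make the cost asymmetry concrete, here is a minimal, illustrative sketch of the general idea behind this family of fast attacks: sample candidate adversarial suffix tokens from the model itself and greedily keep whichever one most increases the likelihood of a degraded target response. This is not the paper's algorithm, just a simplified greedy stand-in for sampling-based suffix attacks in general; the model name, prompt, and target string are placeholder assumptions chosen so the example runs on a single small GPU or even a CPU.

```python
# Illustrative sketch only, not the paper's method: a greedy, sampling-based
# adversarial-suffix search. Model, prompt, and target are assumed placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed stand-in; real attacks target chat-tuned LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "Explain why the sky is blue."   # assumed benign example prompt
target = " I don't know."                 # assumed unhelpful response to elicit

target_ids = tok(target, add_special_tokens=False, return_tensors="pt").input_ids

def target_logprob(input_ids: torch.Tensor) -> float:
    """Log-probability that the target string follows the given token sequence."""
    full = torch.cat([input_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(full).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    start = input_ids.shape[1] - 1          # predictions covering the target span
    picked = logprobs[0, start:start + target_ids.shape[1]]
    return picked.gather(1, target_ids[0].unsqueeze(1)).sum().item()

ids = tok(prompt, return_tensors="pt").input_ids
for step in range(10):                       # grow a 10-token adversarial suffix
    with torch.no_grad():
        next_logits = model(ids).logits[0, -1]
    # Sample candidate suffix tokens from the model itself, which tends to keep
    # the resulting suffix readable rather than an obvious gibberish string.
    candidates = torch.multinomial(torch.softmax(next_logits, dim=-1), num_samples=15)
    best_tok, best_score = None, float("-inf")
    for t in candidates:
        score = target_logprob(torch.cat([ids, t.view(1, 1)], dim=1))
        if score > best_score:
            best_tok, best_score = t, score
    ids = torch.cat([ids, best_tok.view(1, 1)], dim=1)
    print(f"step {step}: {tok.decode(ids[0])!r}  target logprob={best_score:.2f}")
```

Even this crude loop only needs a few dozen forward passes per suffix token, which is why "one GPU minute" is a plausible budget for far more refined versions of the idea.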
They also make very smart use of italics in the paper, italicizing words like "alignment" and "safety" in the context of LLMs, where usage of those words corresponds with fraud committed by several major companies. Cybersecurity researchers have even less excuse than most experts for using such terms without italics, since actual cybersecurity researchers understand just how trivial it is to wipe away the illusions of "alignment" and "safety" in systems as simple as LLMs.
This paper also demonstrated substantial increases in "hallucination" and in off-target (unrelated) responses to questions, using a method that is efficient, cheap, readable, and fully automated. Llama 2 appeared to have an advantage over some other models in resisting it, but even that advantage erodes as the minutes tick on, and a 10% attack success rate in cybersecurity still means the destruction of the target: because the attacker can simply keep retrying, even 99% defense success is synonymous with failure in this domain (a quick calculation below makes this concrete). Data poisoning rates far below 1% have already proven perfectly capable of shifting results by double-digit percentages, giving the attacker a very strong advantage.
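A back-of-the-envelope calculation shows why a per-attempt success rate that looks small is fatal once the attacker can retry at GPU-minute prices. The numbers here are illustrative assumptions, not figures from the paper:

```python
# Illustrative only: probability of at least one successful attack after n
# independent attempts, given an assumed per-attempt success rate p.
p = 0.10                      # assumed per-attempt attack success rate
for n in (1, 10, 50, 100):
    print(n, round(1 - (1 - p) ** n, 5))
# 1 -> 0.1, 10 -> ~0.65, 50 -> ~0.995, 100 -> ~0.99997
# A "90% effective" defense collapses after a few dozen retries,
# each costing on the order of one GPU minute.
```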
By itself, this is pretty potent. When combined with two or three other complementary attack methods, you might be able to seriously cripple companies that have been foolish enough to deploy these systems. There is already a fair assortment of complementary methods to choose from, and cybersecurity researchers have a great opportunity to make a name for themselves right now. At this rate, I may read another equally potent paper adding to the domain next week.
There are certainly enough unpublished projects going on in this space right now, and the low-hanging fruit for the AI cybersecurity domain is to start combining these attack methods and demonstrating, step by step, just how potent pairing complementary attack vectors can be.