334 - Heuristic Skewing

One of the risks associated with the prolific use of weak AI at great scale comes from the “popular” answers being constantly reinforced, while the “good”, “reasoned”, “random”, or “factual” answers to the same questions are ignored. This becomes an increasing problem over time for the very same reason: each round of reinforcement makes the “popular” answers more popular, further skewing the distribution of internet data via generated AI slop.

Many people are already familiar with the concept of “model collapse”, and the research literature shows that some weak stop-gap measures can slow that danger, though absurd funds thrown at scaling won’t outpace even model collapse alone for much longer. In model collapse the “artifacts”, the inherent flaws in the generation process, are steadily amplified as models train on generated output, much as inbreeding amplifies genetic defects.
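As a rough illustration of one facet of model collapse, the sketch below repeatedly “retrains” on samples drawn from the previous generation’s own output. It is a toy under stated assumptions, not a model of any real training pipeline: the category count, sample size, and long-tailed starting weights are all hypothetical, and it captures only the loss of rare outcomes through finite resampling, not the amplification of generation artifacts as such.

```python
import random
from collections import Counter


def resample_generation(counts, sample_size, rng):
    """Draw a new 'training set' from the empirical distribution of the last one."""
    categories = list(counts.keys())
    weights = [counts[c] for c in categories]
    draws = rng.choices(categories, weights=weights, k=sample_size)
    # Counter only keeps categories that were actually drawn at least once.
    return Counter(draws)


def simulate_collapse(num_categories=50, sample_size=200, generations=30, seed=0):
    """Return the number of surviving categories after each generation.

    All parameters are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    # Hypothetical long-tailed starting distribution: weight 1/(i+1).
    categories = list(range(num_categories))
    weights = [1.0 / (i + 1) for i in categories]
    counts = Counter(rng.choices(categories, weights=weights, k=sample_size))

    surviving = [len(counts)]
    for _ in range(generations):
        counts = resample_generation(counts, sample_size, rng)
        surviving.append(len(counts))
    return surviving


if __name__ == "__main__":
    for generation, alive in enumerate(simulate_collapse()):
        print(generation, alive)
```

Because each generation can only draw from categories present in the previous one, a rare answer that drops out once can never return, so the count of surviving categories can only hold steady or shrink.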

Factor in this “Heuristic Skewing”, where the high point in any distribution, the “most popular” answer, is gradually favored ever more heavily, and you have a potential accelerant for the first problem. That comorbidity also extends to many matters of social engineering, where large groups of people are systematically manipulated, now through the additional attack vector of chatbot responses.
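The skewing described above can be sketched as a simple feedback loop. This is a minimal toy, assuming that each round of generation and reuse acts like raising every answer’s share to a power slightly above 1 and renormalizing. The `exponent` value and the starting shares are hypothetical, chosen only to make the drift visible, and are not measurements of any real system.

```python
def sharpen(dist, exponent=1.2):
    """Raise each share to a power > 1 and renormalize.

    Assumption: an exponent above 1 stands in for any process that
    favors already-popular answers a little more than their share.
    """
    weights = [p ** exponent for p in dist]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_skew(dist, rounds=10, exponent=1.2):
    """Return the distribution after each round, starting with the original."""
    history = [list(dist)]
    for _ in range(rounds):
        dist = sharpen(dist, exponent)
        history.append(dist)
    return history


if __name__ == "__main__":
    # Hypothetical starting shares for four competing answers.
    start = [0.40, 0.30, 0.20, 0.10]
    for round_number, shares in enumerate(simulate_skew(start)):
        print(round_number, [round(p, 3) for p in shares])
```

In this toy the most popular answer gains share every round while the least popular loses it, even though nothing about the answers’ quality has changed. The same loop applied to a near-even split, such as 51% against 49%, drifts in the same direction, only more slowly at first.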

For examples, we need look no further than two of the worst use cases for “GenAI” to come out of 2023’s first wave of hype: “medical & therapy chatbots” and “AI search”. In the case of medical diagnostics and psychological therapy, a distribution which takes the currently popular answer (51% or more) and skews it upward over time reliably creates something like a self-reinforcing cognitive bias. By favoring those increasingly popular peaks, that bias produces increasingly and systematically wrong answers, all else being equal.

Apply the same dynamics to “AI Search” and you get all of the same problems, but with many new additions. Unlike the medical and therapy bots, search functions are full of feedback signals, such as all of the websites that they direct increasing traffic toward, which may then find ways of recursively reinforcing these patterns. This means that while the medical bot likely only damages the distributions that it touches internally, the “AI search” damages the real distributions found online through the increasingly poor selection quality of websites.

A reminder: LLMs and their “GenAI” derivatives are not, and never will be, “search engines”, as any expert can rant on at some length. However, attempts to use them as such do contribute to this form of damage across the information ecosystems of the internet.

This growing emphasis on points of popularity also drags any associated data in, like a gravity well, as the distributions are further skewed and the skewing effect branches out.

Likewise, any popularity-based contest may become heavily impacted by this over time, providing extra incentives for bad actors to intentionally skew the numbers.

Comments:

This means increasing cybersecurity risks from increasingly predictable “AI-assisted or generated” activity, and increasingly captive audiences with decreasing awareness of their own captivity.

“A rising tide lifts all boats”, but a tide of AI slop is more like a sewer system flooding your schools, and you probably don’t want your kids touching anything floating in that.