330 - Results Postmortem
To briefly consolidate the fruits of the now extremely long reply threads from my random number experiment: whether or not the prompt explicitly used the word “random” (which isn’t actually essential to the principle being demonstrated), and despite minor variations in wording, most people landed on 23, 27, 37, or 42. The only exceptions were cases where a model invoked code to generate a number, which most did not do, and which at least one falsely claimed to have done. The first few replies heavily emphasized “27”, with the outputs gradually diverging as more people began testing.
Some interesting ways people have already explored the principle further include adjusting the exact wording, applying it to choosing a color (“turquoise” appears to be a favorite), and applying it to vacation destination selection.
Matthew Kamerman also explored the “randomness” angle further with Co-Pilot, finding a strong tendency to avoid using any digit more than once when generating “random” numbers between 1 and 1 billion, along with a heavy bias against picking numbers with fewer digits than the maximum allowed. The anti-repetition bias is known to be effectively hard-coded into some of these models, and it has been exploited before in cybersecurity. Strong biases like these massively reduce the search space for an attacker, leaving systematic weaknesses. I’m very tempted to scrape the comment data from the replies into a spreadsheet and break down the probability distributions, though I don’t trust the tools available for that purpose enough to install them.
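To put a rough number on that search-space claim, here is a quick back-of-the-envelope sketch of my own (not something from the thread), assuming only the two biases described above: no repeated digits, and all nine available digits used for the 1-to-1-billion range.

```python
from math import perm

# Sketch under assumed biases (no repeated digits, full nine digits used),
# not measured data: how many candidates in 1..1,000,000,000 survive?

total = 1_000_000_000

# First digit: 9 options (1-9). Remaining 8 positions: an ordered choice
# of 8 digits from the 9 digits not yet used.
biased_candidates = 9 * perm(9, 8)   # 3,265,920

print(f"candidates under the bias: {biased_candidates:,}")
print(f"fraction of full range:    {biased_candidates / total:.3%}")
print(f"search-space reduction:    ~{total / biased_candidates:.0f}x")
```

Those two biases alone leave roughly 3.3 million candidates out of a billion, a reduction of around 300x, before any softer preferences are even taken into account.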
While the technical side of things remained unsurprising, though still novel in some cases, I personally found the psychology behind the wide spread of reactions far more interesting.
Most people appear quite happy to test these things, as doing so falls loosely into the category of “playing with” these systems. However, confirmation bias, belief bias, and base-rate neglect play a heavy-handed role from there on out.
Among those who do test, most test only once, and many paraphrase rather than use the exact wording. This produces a narrow distribution of possible outputs whenever code isn’t invoked for the “random” task, but a very wide range of reactions from the humans interpreting that data.
Early replies consistently landed on 27, and while its frequency gradually declined, it still remained fairly high. Among those who got other results, I’d estimate that roughly 20% quickly leapt to their own easily debunked conclusions, most obviously having not actually read the post or the adjacent comments. This is consistent with “Your Brain on ChatGPT” and related findings, in which individuals may be suffering from “cognitive atrophy”, though it could also simply be elevated levels of the more typical cognitive biases, arising for other reasons.
Only three true trolls appeared and had to be blocked; the vast majority of people favored actual conversations, a few of which proved interesting.
Thank you to all who participated.