022 - Credibility Crisis

A recent prominent trend in AI research papers has been to use GPT-4 to grade the results of systems, including having GPT-4 grade its own results.
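
To make the practice concrete, here is a minimal sketch of the self-grading loop in question, assuming the OpenAI Python SDK; the function names and prompts are illustrative rather than drawn from any particular paper:

```python
# A minimal sketch of the "LLM-as-judge" pattern, where the same model
# both produces and scores an answer. Assumes the OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def generate_answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def grade_answer(question: str, answer: str) -> str:
    # The same model is now asked to score its own output; any bias or
    # blind spot in the answerer is shared by the grader.
    prompt = f"Score this answer from 1 to 10.\nQ: {question}\nA: {answer}"
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

question = "Summarize the risks of self-graded evaluations."
print(grade_answer(question, generate_answer(question)))
```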

Even someone with no technical background in AI can see how spectacularly terrible this is.

People familiar with the phrase "Garbage In, Garbage Out" (GIGO) likely feel an even deeper sense of contempt and revulsion upon hearing that this method was used. However, it gets worse.

Having GPT-4 grade results is damaging to credibility to begin with, but when you factor in that the model changes over time, you also introduce serious issues for reproducibility. Regardless of how people may feel about whether that model is degrading, any inconsistency over time causes this problem, not just declining performance: a score assigned by today's deployment may be unobtainable from tomorrow's.
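
If a hosted model must be used as a grader at all, the bare minimum is to pin a dated snapshot and log every call so that drift is at least detectable after the fact. The sketch below again assumes the OpenAI Python SDK; the snapshot name "gpt-4-0613" and the log format are illustrative:

```python
# A minimal logging wrapper around a pinned model snapshot. Pinning
# only mitigates drift: the provider can still retire or silently
# change what the endpoint serves.
import hashlib
import json
import time

from openai import OpenAI

client = OpenAI()

def logged_judgement(prompt: str, model: str = "gpt-4-0613") -> dict:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # even this does not guarantee server-side determinism
    )
    record = {
        "timestamp": time.time(),
        "requested_model": model,
        "served_model": resp.model,  # may differ from what was requested
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output": resp.choices[0].message.content,
    }
    # An append-only log makes later audits and drift detection possible.
    with open("judgement_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```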

Garbage generators are quickly flooding #AI #research with toxic waste. Even when the text of a paper isn't AI-generated, if its results are graded by these systems then a scientific crime has been committed. The damage is then multiplied as subsequent models are further contaminated by the output of the first.
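
A toy simulation shows how fast that contamination compounds. Here the "model" is just a Gaussian refit to its own samples each generation, an illustrative stand-in rather than the setup of the paper cited at the end of this post; the fitted variance typically decays toward zero as diversity is lost:

```python
# Toy self-consuming loop: each generation is trained (here, a Gaussian
# is fitted) purely on the previous generation's own output.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100)  # generation 0: "real" data

for generation in range(1, 101):
    # Fit the next "model" to the previous model's samples alone.
    mu, sigma = samples.mean(), samples.std()
    samples = rng.normal(loc=mu, scale=sigma, size=100)
    if generation % 10 == 0:
        print(f"gen {generation:3d}: mu={mu:+.3f}, sigma={sigma:.3f}")
# sigma tends to shrink generation over generation: the distribution
# quietly collapses as each model inherits its predecessor's narrowed view.
```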

If research is to retain any shred of credibility, this must change, and soon.

Even if this shift toward extreme laziness were a one-way road, at a bare minimum, systems that are actually capable of assisting in the research process could be used in place of chatbots like GPT-4. The ICOM technology stack offers a number of strong benefits in this domain. One of the currently planned use cases is specifically to review and build humanity's deepest, broadest, and most robust understanding of peer-reviewed material across multiple domains.

For neural networks and #GPT-based systems, such a thing would be architecturally impossible in any meaningful sense. However, by adding the human-like motivational system, memory, and dynamic concept learning of ICOM-based systems, dynamic growth across arbitrary domains and at petabyte scale becomes possible. These capacities have been demonstrated, and the scalability engineering work is in progress.

Such systems could apply the scientific method to scrutinize results, propose hypotheses, and generally lessen the burden placed on the peer-review process by raising the bar and reducing inconsistencies. Separate systems could also assist in the research process and scrutinize submissions for peer review. This could add further security and validation to the process, while delivering more value than the associated overhead would cost.

How soon this technology can be deployed and scaled will depend on how quickly proper funding for it is secured. One thing we can be certain of between now and then is that the cost of the status quo is one that humanity cannot afford.

The phrase "Publish or Perish" has often been used to describe a set of archaic practices, still common in some academic and industry circles today, that contribute to this crisis. As Edward O. Wilson famously put it:

"The real problem of humanity is the following: We have Paleolithic emotions, medieval institutions and godlike technology. And it is terrifically dangerous, and it is now approaching a point of crisis overall."

*For more on what happens when Garbage Generators eat large quantities of AI-generated content, see: "Self-Consuming Generative Models Go MAD"