021 - Asymmetric Trust

Trust is a critical component that human society requires to function.

It is asymmetric, in that it is much harder to rebuild than it is to lose. Trust is currently being lost at an alarming pace across many domains, a process accelerated by "generative AI". While criticisms of fraud in peer review are often met with skepticism, Nature, one of the few journals whose credibility remains largely unquestioned, recently published a damning article on the topic.

Randomized controlled trials (RCTs) in medicine are among the most rigorous peer-reviewed studies anyone can publish. These papers face some of the greatest scrutiny of any research published today...and yet:

"...44% of these trials contained at least some flawed data: impossible statistics, incorrect calculations or duplicated numbers or figures, for instance. And 26% of the papers had problems that were so widespread that the trial was impossible to trust, he judged - either because the authors were incompetent, or because they had faked the data."

These are some of the most heavily scrutinized peer-reviewed medical papers, and the figures above cover only the subset whose authors willingly shared their raw data. How many fraudulent papers simply never submitted theirs?

Even if these numbers merely held steady across the papers that never shared their raw data, this paints a very grim picture for sources like ArXiv.org, where there is no peer review at all, only a fraternity-like endorsement system gating entry. Setting aside that this choice of gating mechanism encourages bad actors and discourages credible submissions, the contents of these pre-print repositories circulate freely.

This is partly an artifact of archaically slow peer-review publishing processes that cannot keep up with the pace of modern industries, but it has consequences. ArXiv papers can easily mirror the dynamics of systematic reviews contaminated by falsified data in the papers and studies they cite, and at such a high base rate of contamination, the odds of citing at least one tainted source become extremely high.
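To see why, here is a minimal back-of-the-envelope sketch. It assumes each cited study is independently flawed with probability p, using the 26% and 44% figures quoted above as bounds; the independence assumption and the citation counts are illustrative simplifications, not claims from the Nature article.

```python
def p_at_least_one_flawed(p: float, n: int) -> float:
    """Probability that at least one of n cited studies is flawed,
    assuming each is independently flawed with probability p."""
    return 1.0 - (1.0 - p) ** n

for p in (0.26, 0.44):        # lower/upper contamination rates from the quote
    for n in (5, 10, 20):     # hypothetical citation counts
        print(f"p={p:.2f}, n={n:2d}: {p_at_least_one_flawed(p, n):.3f}")
```

Even at the lower bound, a review resting on just ten such studies has a greater than 95% chance of citing at least one flawed source; at twenty, contamination is a near certainty.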

The ICOM technology stack has demonstrated a strong aptitude for two core capacities: learning human-like concepts from any data source, and critically examining those concepts and sources through the scientific method. Addressing the problem of academic "paper mills" and other forms of fraud requires both.

From a statistical perspective, if the most scrutinized peer-reviewed medical data carries between 26% and 44% contamination/noise, and AI papers on ArXiv are likely in even worse shape, how much clarity might be gained by removing that noise?
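As a purely hypothetical illustration of that question, the sketch below models a screening step that catches some fraction of the flawed papers (its `sensitivity`) while never discarding sound ones, an idealization for the sake of arithmetic, not a description of how ICOM actually works, and computes the contamination left in the corpus afterward.

```python
def residual_contamination(p: float, sensitivity: float) -> float:
    """Contamination rate remaining after a filter removes a fraction
    `sensitivity` of flawed papers (perfect specificity assumed)."""
    flawed_left = p * (1.0 - sensitivity)
    sound = 1.0 - p
    return flawed_left / (flawed_left + sound)

for p in (0.26, 0.44):        # contamination bounds from the quote
    for s in (0.5, 0.9):      # hypothetical filter sensitivities
        print(f"p={p:.2f}, sensitivity={s:.1f}: "
              f"{residual_contamination(p, s):.3f}")
```

Under these assumptions, a filter that catches half the flawed papers drops contamination from 26-44% down to roughly 15-28%; one that catches 90% drives it down to 3-7%.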

Many answers may be buried under the rubble of #misinformation, and some serious spring cleaning will be required to unearth them.