261 - Training Data
“Training Data” is a term so ubiquitous in the AI/ML space that it is assumed by default to be a deciding factor in everything. However, systems that aren’t fundamentally built to fit the curve of data distributions don’t inherently require “Training Data”, and some of those systems vastly outperform neural networks in the real world.
Somewhere someone probably just spit out their coffee in surprise and confusion, as the concept of systems that don’t rely on training data is entirely alien to many in AI/ML. Most wouldn’t have the faintest idea of where to even begin trying to build such systems absent the crutch of training data.
Normally, a large body of data, often many datasets pooled together, is used to train a neural network: a process of brute-force math (iterative optimization, typically some form of gradient descent) turns that data into a set of “weights” for the network. Those weights become a fuzzy representation of the data that was fed in, curve-fit to its distributions.
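To make "curve-fitting to the distributions" concrete, here is a minimal sketch of the conventional approach, reduced to its simplest possible case: two weights fitted to noisy points by gradient descent. The function name `train_linear`, the learning rate, and the synthetic data are all assumptions chosen for illustration, not anything taken from a real system.

```python
import random


def train_linear(points, lr=0.01, epochs=5000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Illustrative synthetic "training data": y = 3x + 1 plus small noise.
random.seed(0)
data = [(k / 10, 3.0 * (k / 10) + 1.0 + random.gauss(0, 0.1)) for k in range(50)]

w, b = train_linear(data)
print(f"learned weights: w={w:.2f}, b={b:.2f}")
```

The two numbers that come out are all that remains of the fifty data points: an approximate summary of their distribution rather than a record of any individual point. A neural network does the same thing at vastly larger scale, with many more weights.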
However, an entirely different set of dynamics is not only possible but frequently preferable. Three key factors are required to make the alternative work: a graph database structure, a human-like motivational system embedded within that structure, and a working cognitive architecture to process both the data in the database and the motivational values attached to all nodes and surfaces within it.
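As a rough illustration of the first two ingredients only, the sketch below stores observations directly as graph structure and attaches a value to each node and connection. Everything in it is hypothetical: the names `Node`, `ValueGraph`, `observe`, and `ranked_neighbors`, and the use of a single signed number as a stand-in for a "motivational value", are assumptions made for this example. It is not the architecture described here, and it omits the cognitive architecture entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical stand-in for a "motivational value": one signed scalar.
    valence: float = 0.0


@dataclass
class ValueGraph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: dict = field(default_factory=dict)  # (src, dst) -> connection valence

    def observe(self, src, dst, edge_valence=0.0, dst_valence=0.0):
        """Store one observation directly as structure; nothing is retrained."""
        self.nodes.setdefault(src, Node(src))
        node = self.nodes.setdefault(dst, Node(dst))
        node.valence = dst_valence
        self.edges[(src, dst)] = edge_valence

    def ranked_neighbors(self, src):
        """Return the neighbors of src, highest combined valence first."""
        scored = [
            (self.nodes[dst].valence + edge_valence, dst)
            for (s, dst), edge_valence in self.edges.items()
            if s == src
        ]
        return [name for _, name in sorted(scored, reverse=True)]


graph = ValueGraph()
graph.observe("stove", "burn", edge_valence=-0.5, dst_valence=-0.9)
graph.observe("stove", "warm meal", edge_valence=0.4, dst_valence=0.7)
ranked = graph.ranked_neighbors("stove")
print(ranked)
```

The point of the sketch is the contrast with weight fitting: each observation is added once and stays individually inspectable, together with the value attached to it, instead of being averaged into a set of weights.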
This architecture and its dynamics allow human-like data efficiency in the learning process (an over 10,000x improvement compared to neural networks), as well as a graph-native structure that preserves kinds of value that can’t be reliably stored in weights. The architecture has held cutting-edge status for half a decade now, and even with many billions of dollars wasted, companies like OpenAI are no closer to catching up than they were 5 years ago.
A useful litmus test to keep in mind: the term “Training Data” is mutually exclusive with the term “General Intelligence”. If someone uses both terms to refer to the same system, you can reliably determine that they either don’t have the faintest clue what they’re talking about or are a bad actor.
Neural networks have many viable use cases, but they are a hammer, not the entire toolbox. When one of our systems exceeded a score of 80% on the ARC-AGI challenge, it didn’t use “training data” at all, whereas OpenAI’s latest massive model could only manage 21% on the same benchmark while burning orders of magnitude more compute.
This is the difference that using the right architecture for the job makes: the orders of magnitude separating success from failure.