Data Driving

Driving with Data

As of this writing, GPT-5 has finally been released. Before this day, a great many theorized about its future capabilities, about the jobs that would be on the line and the bright future we were headed toward. Sam Altman even had the confidence to post a photo of the Death Star looming over a planet, no doubt alluding to what was to come. Instead, OpenAI’s newest iteration was more about lowering costs, consolidating resources, and shifting its personality, not terribly different from its GPT-4.5 release. Modest improvements for each paying user, at most, and a terrific letdown for a great many.


I’d be lying if I said I wasn’t excited to see what would come next as well. It truly is rare for a product to generate this much anticipation among so many consumers!

It’s not OpenAI’s fault. It’s not Meta’s fault; it’s not even the fault of the scientists working at these companies. Ultimately, it’s the very nature of data-driven models. Time to get a bit technical.

Data-driven models are, as the name suggests, trained on data. The concept behind machine learning as a whole is that, given properly labelled data, patterns and trends in a dataset, visible or invisible to the human eye and mind, can be identified and used to predict future data points, assuming there exist patterns and trends to latch onto. More technically, these systems operate by optimizing mathematical functions (specifically loss functions) over billions or even trillions of examples. A neural network is adjusted via gradient descent until it produces outputs that best fit the input/output relationships found in its training data.
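
To make that concrete, here is a minimal sketch, assuming nothing about any particular model: a tiny linear model fit to synthetic data by gradient descent on a mean-squared-error loss. The data, parameters, and learning rate are all illustrative stand-ins.

```python
# A minimal sketch of "learning from data": fit a tiny linear model
# by gradient descent on a loss function over example input/output pairs.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                        # "training data": 1000 examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)     # labels the model should learn to predict

w = np.zeros(3)                                       # model parameters, initially arbitrary
lr = 0.1                                              # learning rate

for step in range(500):
    pred = X @ w
    loss = np.mean((pred - y) ** 2)                   # mean-squared-error loss function
    grad = 2 * X.T @ (pred - y) / len(y)              # gradient of the loss w.r.t. the parameters
    w -= lr * grad                                    # step downhill on the loss surface

print(w)  # approaches true_w: the parameters that best fit the data's input/output relationship
```

The loop does nothing clever; it simply nudges the parameters in whatever direction reduces the loss, which is all "learning from data" means at this level.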

Large Language Models (LLMs for short) are not much different. Transformers operate via sequential layers of attention mechanisms that determine which parts of the input data should influence which parts of the output^1. Rather than learning explicit rules about language, they learn statistical relationships between tokens: essentially, which words are likely to follow which other words, conditioned on context. The hunch was that, with sufficient scale, LLMs could learn enough to approximate what it means to reason and potentially transcend their own data, much as machine learning models attempt to, but pattern matching upon thought itself.
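
As an illustration of the attention idea, here is a minimal sketch of a single self-attention head, with toy dimensions and random embeddings standing in for learned ones; real transformers stack many such heads and layers, with learned query/key/value projections and a final next-token prediction step.

```python
# A toy single attention head: each output position is a weighted mix of the
# input vectors, where the weights say how much each token influences each other token.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # how relevant is token j to token i
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax: each row sums to 1
    return weights @ V                                  # mix the value vectors accordingly

seq_len, d_model = 4, 8                                 # 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))                 # stand-ins for token embeddings
out = attention(x, x, x)                                # self-attention over the sequence
print(out.shape)                                        # (4, 8): one context-mixed vector per token
```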

I do not believe there is a hard rule in the universe that says one cannot inductively determine what a deductive or abductive mind would do in any given scenario. What is difficult is when you need training data, labelled or unlabelled, to do so. At its theoretical limits, an unassisted Large Language Model can only excel at what it has been trained to predict. This does not mean it cannot generate novel thoughts or sentences. Nor does it mean it will not replace a great deal of our manual and intellectual labor. But in an important, and more physical than philosophical, way, it cannot generalize.

Square Hole; Data-Driven Peg

I will pose one example here, leaving a philosophical analysis, and perhaps studies that conduct a more sophisticated test of these claims, for another time. Imagine you need to determine how to wash a strange dish. It is shaped like a square, but with two small square-shaped holes parallel to each other on opposite sides. The item has a hollow shell and tends to have things trapped inside from the fish tank you pulled it out of. The holes are big enough to pass a teaspoon through, but not a tablespoon. How would one go about cleaning this? An LLM would actually have no problem answering this question. Such a scenario is not particularly unique; cleaning unorthodox objects is a notorious complication in industry and home life. But what about a transformer-trained robot? And why would these two be different?

In order to determine how to clean a peculiar object, you must know what it means to clean an object. This technically depends on the individual and the situation; many believe that water and friction are good enough most of the time. So long as no particulates are visible, the dish is clean. The gold standard in households is soap and water, while more dire states sometimes require bleach or other harsh chemicals. For something pulled from a fish tank, harsh chemicals are ill-advised if it is to return there; vinegar and water are the better option.

Now, ChatGPT has quite the advantage here. Ultimately, solutions for problems like these exist across the net. For the robot, these are maneuvers that may never have occurred before, or at least not in the context of this object. Which means any data-driven model attempting to solve this has to either:

  • Learn what it means to clean objects, or
  • Be taught what it means to clean this object.

The former is a general, pattern-matched and rule-based conceptual system of dishwashing, one that suggests the model has not confused the forest for the trees, and that dishwashing is something it fully understands, with or without specific examples.

The latter suggests its neurons fizzle out on sufficiently unique cases: having only learned the similarities between scenarios of washing dishes, rather than their conceptual foundation, it requires more examples to learn.

As I currently understand it, there is no reason the first scenario cannot occur. It is hard to wrap one’s head around, and we don’t have the right architecture yet, but it could be possible. What is certain is that LLMs specifically have not accomplished this yet. This is most clear with mathematics, where the rules are foundational enough that calculators never fail; yet even accounting for the statistical variance of LLMs, it is clear that the abstract concept of arithmetic has not been derived through training alone. The same pattern appears in domains like storytelling or humor, where LLMs can imitate surface forms but often lack the deep coherence or self-reflection that true conceptual mastery would produce. Robots built off of transformers today will fail to clean a great many dishes.
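
To make the arithmetic point concrete, here is a toy contrast, purely illustrative and not a model of any real system: a rule-based adder carries the abstract concept and never fails, while a purely memorization-based "adder" that has only seen training pairs fizzles out the moment its inputs fall outside that data.

```python
# Illustrative contrast: possessing the rule itself vs. only possessing examples of it.

def rule_based_add(a: int, b: int) -> int:
    return a + b                              # the abstract rule: works for any integers

# A "trained" adder that has only ever seen sums of numbers below 100.
training_pairs = {(a, b): a + b for a in range(100) for b in range(100)}

def pattern_matched_add(a: int, b: int):
    # No underlying concept to fall back on; only recall of previously seen cases.
    return training_pairs.get((a, b), None)

print(rule_based_add(1234, 5678))             # 6912, despite never "seeing" these numbers
print(pattern_matched_add(1234, 5678))        # None: outside the training distribution
```

The first function generalizes because it carries the rule itself; the second only carries the examples.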

Put simply, LLMs, by their nature, fail at two particular tasks: deduction and abduction, things we humans can accomplish rather plainly and effectively. These resources describe them in great detail, but, in short: deduction is the propensity to follow a rule-based logic system to derive facts, and abduction is the ability to generate probable post-hoc rationalizations for a given scenario. The consequences of this are better elaborated here, but I digress. If we grant that this is both a big deal and currently unsolved, what do I pose? The better question is: what have I discovered…