The "Attention Is All You Need" paper started the LLM gold rush, but it came from Google scientists who initially expected it only to do language translation. As more data was used in training sets and the number of parameters grew by orders of magnitude, it became harder to understand what the model was doing.
Emergent Instruction Following
An Ars Technica article from 2023 gives this summary of how "emergent" AI behaviour was being portrayed in the popular press:
The biggest breakthrough came in the jump from GPT2 to GPT3 in 2020. GPT2 had about 1.5 billion parameters, which would easily fit in the memory of a consumer graphics card. GPT3 was 100 times bigger, with 175 billion parameters in its largest manifestation. GPT3 was much better than GPT2. It can write entire essays that are internally consistent and almost indistinguishable from human writing.
But there was also a surprise. The OpenAI researchers discovered that in making the models bigger, they didn’t just get better at producing text. The models could learn entirely new behaviors simply by being shown new training data. In particular, the researchers discovered that GPT3 could be trained to follow instructions in plain English without having to explicitly design the model that way.
Instead of training specific, individual models to summarize a paragraph or rewrite text in a specific style, you can use GPT-3 to do so simply by typing a request. You can type “summarize the following paragraph” into GPT3, and it will comply. You can tell it, “Rewrite this paragraph in the style of Ernest Hemingway,” and it will take a long, wordy block of text and strip it down to its essence.
They are, I believe, talking about RLHF post-training when they say "trained to follow instructions in plain English" (my emphasis).
Abuse of Academic Privileges for Corporate Gain
But as I say in the video, this should not be surprising. Datasets like this exams one from 2022 are on the public internet for the taking.
Even worse, via academic library access, researchers at OpenAI could get huge datasets from universities and governments relating to education and training. Of course, the claim was that it was all to further science. Fast forward a couple of years, and Sam Altman has taken all of that and privatised it for profit.
A little bit of Transformer Architecture
The absolutely excellent 3Blue1Brown YouTuber Grant Sanderson has produced some stellar explainers on the architecture of Transformer LLMs.
The important take-away from this diagram is that the "magical" attention layer tunes weights based on queries into positionally-encoded information across the training set being fed in, even for big training runs. This means that the model gains a surface-level structural mapping of how, say, a question-and-answer dialogue is laid out.
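To make the attention layer less "magical", here is a minimal numpy sketch of the scaled dot-product attention at its core. This is a single head with no learned projection matrices, and the random Q, K, V matrices are invented stand-ins for real query/key/value vectors, so treat it as an illustration of the operation, not a model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each position mixes value vectors
    weighted by how well its query matches every key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

# 3 token positions, 4-dimensional vectors (arbitrary toy sizes)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one mixed value vector per position
```

Every output row is just an average of the value vectors, weighted by similarity scores: there is tuning of weights based on queries into other positions, but no step that resembles reasoning.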
In this screenshot we see inside the multi-layer perceptron. There are many of these stacked in the transformer, but it's important to understand that a "perceptron" is just a very basic mathematical equation. The inputs to the equation are the current weights in the nodes (depicted as small spheres here) and the incoming data fed forward through the layer, multiplying out to the final resulting tensor (shown as a "matrix" on the right in white square brackets).
These are all floating-point numbers. There is zero correlation with how a human brain works here; impressive as this is, a perceptron is not a human neurone. One of the biggest differences is that once the weights in a perceptron are set (the data from the training set is encoded during the pre-training phase, and afterwards in RLHF), they do not change.
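That "very basic mathematical equation" can be written out in full: a weighted sum of inputs passed through a nonlinearity. The numbers below are invented for illustration; the point is that after training, `w` and `b` are frozen, and the only thing that varies is the incoming data:

```python
import numpy as np

def perceptron(x, w, b):
    """One 'neuron': ReLU(w . x + b), a weighted sum squashed by max(0, .)."""
    return np.maximum(0.0, w @ x + b)

w = np.array([0.5, -1.0, 2.0])   # weights: fixed once training is done
b = 0.1                          # bias: also fixed
x = np.array([1.0, 0.5, 0.25])   # incoming data fed forward
print(perceptron(x, w, b))       # prints 0.6
```

Stack millions of these per layer, and billions overall, and you have the MLP blocks of a transformer; nothing in the arithmetic changes, only the scale.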
A GPT trained in 2022 cannot spontaneously learn things that happened in 2026. There are tons of hacks to make it seem as though they can, but they don't.
When You Have Very, Very Large Sets of Weights, You Have Lots of Attention Encoded
Very large datasets encoded into LLMs mean that the attention process captures juxtapositions of all sorts of documents and inputs: associations, rules about what follows what, that can seem like understanding.
The important things to understand about this are:
* There is no reasoning here: GPTs do not reason, they predict next tokens
* The encoding is lossy, and sampling biases appear
* Patterns can be captured automatically, like math problem solving or sycophantic chatting as if between good friends
RLHF alignment is the process of using human feedback to coax the pre-trained GPT toward a different weighting that suits the goals of the LLM vendor building the GPT, while still leveraging the encoded information.
Lossy encoding: to try to explain this, look at an analogous process in an old-school convolutional neural network (warning: this is not how LLMs work; the perceptron layers in an LLM are massive by comparison):
This image is from Stanford's Deep Learning tutorial. It shows how feature extraction works as a 3x3 convolution is mapped over the source image. This could extract edges, or other features that are then used in deeper layers of the CNN.
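The mechanism is simple enough to hand-roll. Below is a sketch of a "valid"-mode 2D convolution with the classic Sobel vertical-edge kernel; the tiny toy image (dark left half, bright right half) is invented for the demonstration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image ('valid' mode): one dot product per window."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Sobel x-kernel: responds where brightness changes left-to-right
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Synthetic 5x5 image: dark left half, bright right half
image = np.zeros((5, 5))
image[:, 3:] = 1.0
print(convolve2d(image, sobel_x))
```

The output responds only around the dark-to-bright boundary; the absolute brightness of each region is gone. That discarding of everything the kernel wasn't looking for is the lossiness being described.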
The analogy I want to draw is that when creating the initial encoding by automatically training on datasets, the LLMs are effectively running a sampling window over the entirety of the data. There are massive differences between an LLM and a CNN, but the point is the losses: when you capture a feature, you by definition throw away the other information. If you have a much deeper layer that can detect faces or smiles, or guns, tanks and soldiers, then it cannot understand other aspects of the image.
In an LLM, it's not being told what to ignore, but it is attending to the context by making queries across to other positions, and thus it's averaging out the values in its weights to capture an encoding of the dataset. These losses cannot be overstated.
An LLM is like that know-it-all person at a party who’s skimmed everything, but knows nothing in depth, and certainly doesn’t know anything beyond superficial “this appears after that” associations.
Why Claims of Emergent Behaviour are Dangerous
Vendors of AI SaaS products have been constantly engaged in drumming up mystique and wonder around their products. At times they've insinuated that they have no idea what their products are capable of, and that they could "achieve AGI" (a marketing term for anything that makes us money) and end humanity. Wow, who wouldn't want that behaviour to emerge?
So "emergent behaviour" as a narrative is like declaring their LLM factory a goldmine: a never-ending bonanza of free technology upgrades that justifies more and more investment so that bigger and bigger models can be built.
Who knows what exciting behaviour will emerge next, they enthuse.
Regarding so-called emergent instruction following behaviour:
* It required alignment training during the RLHF phase of model development
* It required users to add in extra context during the model session
* And it's bounded by the datasets available (as I say in my video)
If you don’t have instructions and text showing those instructions being followed in a document set that is fed into a training corpus, the model cannot learn that certain text follows from those instructions.
Generative AI is a next-token prediction engine. It maps what you prompt onto what should follow, based on its limited encoded associations. It may have additional context fed into it via RAG, from a vector store, depending on what kind of LLM setup we are talking about.
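That "what should follow" mapping can be caricatured with a toy bigram counter: count which token follows which in a corpus, then predict the most frequent follower. It is a deliberately crude stand-in for an LLM's learned next-token distribution, and the corpus is invented, but it makes the point that prediction comes from seen sequences, not reasoning:

```python
from collections import Counter, defaultdict

# Toy training corpus (invented). A real LLM ingests billions of tokens
# and conditions on long contexts, but the principle is "what follows what".
corpus = "summarize the text . rewrite the text . summarize the paragraph".split()

followers = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    followers[cur][nxt] += 1            # count each observed bigram

def predict_next(token):
    # Most common continuation seen in training; no reasoning involved.
    return followers[token].most_common(1)[0][0]

print(predict_next("the"))        # 'text' (seen twice vs 'paragraph' once)
print(predict_next("summarize"))  # 'the'
```

A token never seen in the corpus has no followers at all, which is the bigram-scale version of the point above: if instruction-following text isn't in the training data, there is nothing to predict from.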
But believing that LLMs are somehow reasoning intelligently about their outputs is dangerously anthropomorphic.
It's also corrupting the ethical responsibility of the vendors of these SaaS products, as they can claim anything bad was "unpredictable" and "emergent". They will go off and create yet more system prompts (a kind of ground-level context added to every query) in the hope that the bad thing won't happen again.
Conclusion
Fight bad framing. This notion of “emergent” behaviour is yet another marketing gimmick at this stage.
Emergent behaviour was the idea that sudden big jumps in capability occur when you pump in more money and more data. It's like a gambling addict whose eyes light up at the spinning dials of the one-armed bandit. But as further studies show, this behaviour is in all likelihood incremental, and in large part comes from context and, as I argue above, directly from the data.
Emergent behaviour feeds into the self-serving narratives of the generative AI vendor CEOs who want to have their investors and the public — via idiotic and corrupt billionaire welfare — keep bailing out their failing companies.
And I call BS on that.