How LLMs Make Meaning
Context is King
Imagine I gave you two sentences to complete: “She borrowed books from the…” and “She borrowed money from the…”. I'm confident you'd know how to finish them.
This feels intuitive. We know ‘borrowing books’ is a different idea from ‘borrowing money’, and that difference changes where each is borrowed from.
Large Language Models (LLMs) work in a similar way.
Modelling Meaning
By processing vast amounts of text, an LLM builds a model of the patterns of relationships between words in a language. These patterns represent concepts, and each word's place within them is captured in a set of numbers called an 'embedding'. This is how words become linked: not directly to each other, but through shared numerical patterns in their embeddings.
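To make that idea concrete, here is a minimal sketch in Python. The three-number vectors and the words chosen are invented purely for illustration; real models learn embeddings with hundreds or thousands of dimensions from data.

```python
import numpy as np

# Toy, hand-picked three-number 'embeddings', purely for illustration.
# Real models learn vectors with hundreds or thousands of dimensions.
embeddings = {
    "books":   np.array([0.9, 0.1, 0.0]),
    "library": np.array([0.8, 0.2, 0.1]),
    "money":   np.array([0.1, 0.9, 0.2]),
    "bank":    np.array([0.2, 0.8, 0.3]),
}

def similarity(a, b):
    # Related concepts point in similar directions, so the cosine of the
    # angle between their vectors is close to 1.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity(embeddings["books"], embeddings["library"]))  # high: related concepts
print(similarity(embeddings["books"], embeddings["bank"]))     # much lower
```

Words that appear in similar contexts end up with similar numbers, which is why ‘books’ scores as far closer to ‘library’ than to ‘bank’.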
As the model processes the often unique sentences that you, the user, write, a mechanism called 'attention' reflects this context in a temporary set of numbers for each word. In other words, the sentence a word sits in is used to create a refined, context-specific version of its meaning.
In our example, “She borrowed money from the…”, by the time the model reaches the word ‘the’, it has, in a sense, paid attention to the context of ‘borrowing money’.
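Here is a very rough sketch of that idea, again with invented numbers. Real attention uses learned query, key and value projections stacked across many layers; this toy version simply weights each surrounding word by how similar it is to the current one.

```python
import numpy as np

# A very rough sketch of the idea behind attention, with invented numbers.
toy_vectors = {
    "she":      np.array([0.2, 0.1, 0.9]),
    "borrowed": np.array([0.5, 0.5, 0.3]),
    "money":    np.array([0.1, 0.9, 0.2]),
    "the":      np.array([0.1, 0.6, 0.2]),
}

def softmax(scores):
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

def contextual_vector(word, context_words):
    query = toy_vectors[word]
    keys = [toy_vectors[w] for w in context_words]
    # Score each context word against the current word, turn the scores
    # into weights that sum to 1, then blend the context into a single
    # temporary vector that stands for the word *in this sentence*.
    weights = softmax(np.array([query @ k for k in keys]))
    return weights, sum(w * k for w, k in zip(weights, keys))

weights, the_in_context = contextual_vector("the", ["she", "borrowed", "money"])
print(weights)         # 'money' carries the largest weight here
print(the_in_context)  # a temporary, context-flavoured vector for 'the'
```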
From Meaning to Words
To decide the next word, the model takes the rich, context-laden meaning (a set of numbers) from this process and compares it against the embedding of every word it knows. The embedding for ‘bank’ will be the closest mathematical match, and because of this, ‘bank’ is selected as the next word.
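A toy sketch of this final step might look like the following, using an invented three-word vocabulary and made-up numbers; a real model scores tens of thousands of tokens and turns those scores into probabilities.

```python
import numpy as np

# Score the context-rich vector against the embedding of every word in a
# (tiny, invented) vocabulary, then turn the scores into probabilities.
vocabulary = {
    "bank":    np.array([0.2, 0.8, 0.3]),
    "library": np.array([0.8, 0.2, 0.1]),
    "shop":    np.array([0.4, 0.4, 0.4]),
}

# Invented stand-in for the meaning of "...borrowed money from the".
context_vector = np.array([0.25, 0.75, 0.3])

scores = {word: float(context_vector @ vec) for word, vec in vocabulary.items()}
exp_scores = {word: np.exp(s) for word, s in scores.items()}
total = sum(exp_scores.values())
probabilities = {word: e / total for word, e in exp_scores.items()}

print(probabilities)                              # 'bank' gets the highest probability
print(max(probabilities, key=probabilities.get))  # -> 'bank'
```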
That an LLM can embody unique strings of concepts in these values is mind-bending, and it is worth reflecting on as you consider how to make the best use of this extraordinary technology. However, never forget that at its core this is a probabilistic process: the model generates outputs rather than retrieving facts, and your mental model for using it should always account for both the risks and the opportunities this creates.
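To see why ‘probabilistic’ matters in practice, here is one last illustrative sketch with invented numbers: rather than always taking the single best match, the next word can be sampled from the probability distribution, which is one reason the same prompt can produce different, and occasionally wrong, answers.

```python
import numpy as np

# Instead of always taking the single best match, sample from the
# probability distribution, so a plausible-but-wrong word is
# occasionally produced. The numbers here are invented.
words = ["bank", "shop", "library"]
probabilities = [0.75, 0.15, 0.10]

rng = np.random.default_rng()
print(rng.choice(words, size=10, p=probabilities))  # mostly 'bank', but not always
```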