Image: Natural language processing (NLP) is a branch of artificial intelligence (AI). Using NLP algorithms, machines understand, generate or translate human language as it is written or spoken.
A classical computing algorithm performs a specific task using a collection of instructions and operations that are finite (they stop after a finite number of steps) and unambiguous (they are clear and precise). In other words, classical algorithms are programmed to produce exact results; they are deterministic and leave no room for adaptation.
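To make the contrast concrete, here is a minimal sketch of such a classical algorithm in Python (Euclid's algorithm for the greatest common divisor, chosen here purely as an illustration): it consists of precise steps, terminates after a finite number of them, and always returns the same exact result for the same input.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: a finite, unambiguous sequence of steps."""
    while b != 0:           # the loop ends after a finite number of steps
        a, b = b, a % b     # each operation is precisely defined
    return a                # an exact, reproducible result

print(gcd(48, 36))  # always prints 12, no matter how many times it is run
```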
An AI algorithm is based on artificial neural networks designed to learn from training data without being explicitly programmed.
AI algorithms are not finite in the same sense, because they can continue to learn and improve with experience. They are often ambiguous, in that they can produce different results for similar inputs. They are also non-linear models: small variations in the inputs can lead to large variations in the output. This is why neural networks have so many parameters; these parameters correspond to the connection weights that are adjusted during the training process.
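To illustrate this non-linearity, the sketch below is a hand-made toy (two hidden units with fixed weights and a ReLU activation, not the architecture of any real model): a change in one input may have no effect at all, and then a slightly larger change produces a disproportionately large jump in the output.

```python
def relu(v):
    return max(0.0, v)

def tiny_net(x1, x2):
    # Two hidden units with hand-picked weights, followed by a linear output.
    h1 = relu(10.0 * x1 - 4.0)
    h2 = relu(10.0 * x2 - 6.0)
    return 5.0 * h1 + 5.0 * h2

print(round(tiny_net(0.50, 0.50), 2))  # 5.0  (the second hidden unit is inactive)
print(round(tiny_net(0.50, 0.59), 2))  # 5.0  (still inactive: no change at all)
print(round(tiny_net(0.50, 0.62), 2))  # 6.0  (a 0.03 change in x2 moves the output by 1.0)
```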
Weight adjustment is a fundamental concept in machine learning and artificial neural networks, inspired by the functioning of the human brain.
In the human brain, biological neurons are connected to each other by synapses, and the strength of the connection between two neurons is called its "synaptic weight". Synaptic weights change during human learning through a process, still poorly understood, called "synaptic plasticity": the ability of synapses to change their connection strength based on experience.
In addition, AI algorithms are built on statistical and mathematical models. This means that they do not produce exact results, but probable ones: the same neural network can produce different results for similar inputs.
To limit these effects, the synaptic weights must be carefully tuned.
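As a hedged sketch of what "probable rather than exact results" looks like in practice, the snippet below samples a next word from an invented probability distribution (the words and numbers are made up for illustration and come from no real model): running it twice on the same input can produce different outputs.

```python
import random

# Hypothetical next-word probabilities after the prompt "The house is ..."
# (illustrative values only, not taken from any real model).
next_word_probs = {"white": 0.55, "big": 0.25, "old": 0.15, "blue": 0.05}

def sample_next_word():
    words = list(next_word_probs)
    weights = list(next_word_probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# The same prompt can yield different completions, because the output is sampled.
print([sample_next_word() for _ in range(5)])
```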
In the case of ChatGPT, 175 billion parameters determine the behavior of the model.
Parameters are adjusted using the model's training data.
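A toy sketch of that adjustment (a single weight fitted to invented data by gradient descent, nothing like the scale of a real language model) could look like this:

```python
# Training data invented for illustration: inputs x and targets y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0              # the single parameter, before training
learning_rate = 0.05

for epoch in range(100):
    for x, y in data:
        error = w * x - y               # how far the prediction is from the target
        w -= learning_rate * error * x  # nudge the weight to reduce the error

print(round(w, 3))  # close to 2.0, the value that best fits the training data
```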
For example, the parameters of a language model can encode the probability that a word appears in a sentence, the probability that one word follows another, the probability that a word is used in a particular context, and so on.
In the case of ChatGPT, the language model was trained on a data set of 500 billion words of text and code. The model's parameters are used to generate text similar to the text in the training data, that is, to determine which words are most likely to appear in a given sentence.
For example, if the training data contains a sentence like "The house is white", the model will learn that the words "the", "house", "is" and "white" are likely to appear together.
The more often the sentence appears in the training data, the higher the synaptic weights associated with it become during training. This means that the model is more likely to generate this sentence, "The house is white", as output.
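A count-based bigram model is a drastic simplification of what a large language model learns, but it shows the same mechanism: the more often a phrase appears in the training data, the higher the probability assigned to it. The tiny corpus below is invented for illustration.

```python
from collections import Counter

corpus = [
    "the house is white",
    "the house is white",
    "the house is big",
    "the car is red",
]

# Count how often each word follows each other word.
bigram_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for first, second in zip(words, words[1:]):
        bigram_counts[(first, second)] += 1

def prob_next(word, candidate):
    total = sum(c for (w, _), c in bigram_counts.items() if w == word)
    return bigram_counts[(word, candidate)] / total if total else 0.0

# "white" follows "is" twice in the corpus, "big" and "red" only once each.
print(prob_next("is", "white"))  # 0.5
print(prob_next("is", "big"))    # 0.25
print(prob_next("is", "red"))    # 0.25
```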
The model also takes into account the context in which the sentence appears. For example, the sentence "The house is white" is more likely to appear in a text about housing estates than in one about travel agencies.
Language rules can also influence the likelihood of a sentence appearing. For example, the sentence "The house is white" is grammatically correct in English, while the sentence "The white house is" is not.
There are many other factors that determine the likelihood of a sentence appearing as output from a language model. These factors may be model or application domain specific.
NB: The language model is not a copying machine. It learns from the data and generates text that is similar to the training data, but it does not copy text from the training data verbatim.
An AI can also be built with classical computing algorithms, as in expert systems, or with statistical learning techniques known as "machine learning", as in recommendation systems. However, these approaches have limitations when it comes to solving complex or unstructured problems. Furthermore, in traditional computing it is difficult to handle problems that have not been encountered before.
Thanks to the development of deep learning techniques (neural networks with several hidden layers), AI can solve complex and unstructured problems without being explicitly programmed. Deep learning allows computer programs to learn directly from data.
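As a hedged sketch of what "several hidden layers" means, the snippet below builds a tiny untrained network with two hidden layers; the layer sizes and random weights are illustrative only, and a real deep model has many more layers and billions of learned parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# A minimal "deep" network: one input layer, two hidden layers, one output.
layer_sizes = [4, 16, 16, 1]
weights = [rng.normal(scale=0.5, size=(m, n))
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    activation = x
    for i, W in enumerate(weights):
        activation = W @ activation
        if i < len(weights) - 1:                      # hidden layers only
            activation = np.maximum(activation, 0.0)  # ReLU non-linearity
    return activation

x = np.array([0.2, -0.1, 0.7, 0.3])
print(forward(x))  # the untrained network's output for one input vector
```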
However, machine learning models are complex. They can contain billions of parameters, all of which must be learned, weighted and optimized. This requires a lot of data and computing power, and the training process is often long. Despite these constraints, developing AI this way is vastly more productive than developing expert systems. Without the concept of artificial neural networks, it would have been impossible for humans to achieve ChatGPT in such a short time.