Image description: Generative AIs (GPT-3, Copilot, Gemini, Gopher, Chinchilla, PaLM, etc.) are trained on large datasets (text, images, audio, or video) produced by humans. However, these AIs go "mad" when they are trained on data they have generated themselves. Image source: astronoo AI.
The concept of "Self-Consuming Generative Models Go Mad" (Self-Consuming Generative Models Go Mad) refers, in the field of artificial intelligence, to the production of training data by the AI itself.
Generative models are algorithms that learn to generate new data by "imitating" a set of training data produced by humans. The production of training data is costly and time-consuming. It requires collecting the data, cleaning it, annotating it, and formatting it so that it can be used correctly by the AI.
Scientists could not resist the temptation to use the synthetic data generated by generative models themselves to train new models more quickly.
The central idea is to create a generative model capable of producing its own training data. This process is then iterated, with the model refining itself and becoming increasingly capable of generating complex, numerous, and novel data.
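To make this idea concrete, here is a minimal, purely illustrative sketch in Python (not the setup of any real system): a toy "generative model" that simply fits a Gaussian to its data and is then retrained, generation after generation, on its own samples. The sample size and number of generations are arbitrary choices.

```python
# Toy self-consuming loop: a "model" that fits a Gaussian to its training data,
# then produces the training data for the next generation entirely by itself.
import numpy as np

rng = np.random.default_rng(0)
N_SAMPLES = 50        # small, finite sample size for each generation
N_GENERATIONS = 100

# Generation 0: "real", human-produced data
data = rng.normal(loc=0.0, scale=1.0, size=N_SAMPLES)

for gen in range(1, N_GENERATIONS + 1):
    # "Train" the model: estimate its parameters from the current dataset
    mu, sigma = data.mean(), data.std()
    # "Generate" the next training set entirely from the model itself
    data = rng.normal(loc=mu, scale=sigma, size=N_SAMPLES)
    if gen % 20 == 0:
        print(f"generation {gen:3d}: mean = {mu:+.3f}, std = {sigma:.3f}")
```

In this loop, each new training set comes entirely from the model of the previous generation; the printed statistics make it possible to follow how such a loop evolves, and what actually happens to it is discussed below.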
The imagined advantages are numerous. First, the model is not limited by the amount of initial training data. It can thus explore unknown domains and accidentally discover new concepts. Moreover, the model, through its own self-supervised learning, could refine itself and improve its performance iteratively. For example, the model could generate novel molecular structures as candidates for new drugs.
However, there is a huge challenge associated with this approach.
"Self-Consuming Generative Models Go Mad" is a phenomenon that occurs when generative AI models are trained on synthetic data produced by other generative AI models, thus creating self-consuming loops. When an AI tries to learn from content generated by an AI, it goes mad.
Repeating this still poorly understood process creates a self-consuming loop whose training data becomes increasingly chaotic. Yet, because real data are scarce, it is tempting to use synthetic data to augment or even replace them. In other words, when the model does not receive enough fresh real data at each generation of a self-consuming loop, future generative models are doomed to fail.
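The toy Gaussian loop above can illustrate the role of fresh real data (again only a sketch; the 20% injection rate is an arbitrary illustrative value): when a fraction of genuinely new real samples is mixed in at each generation, the loop stays anchored to the true distribution instead of drifting.

```python
# Same toy loop, but with a fraction of fresh real data injected each generation.
import numpy as np

rng = np.random.default_rng(1)
N_SAMPLES, N_GENERATIONS = 50, 100
FRESH_FRACTION = 0.2                      # share of fresh real data per generation
n_fresh = int(FRESH_FRACTION * N_SAMPLES)

def fresh_real_data(n):
    """Fresh samples from the true, human-produced distribution."""
    return rng.normal(loc=0.0, scale=1.0, size=n)

data = fresh_real_data(N_SAMPLES)
for gen in range(1, N_GENERATIONS + 1):
    mu, sigma = data.mean(), data.std()
    synthetic = rng.normal(loc=mu, scale=sigma, size=N_SAMPLES - n_fresh)
    # The fresh real samples keep the loop anchored to the true distribution
    data = np.concatenate([synthetic, fresh_real_data(n_fresh)])

print(f"after {N_GENERATIONS} generations: mean = {data.mean():+.3f}, std = {data.std():.3f}")
```

In this toy setting, the estimated parameters fluctuate but stay close to the real ones as long as fresh data keep arriving; in the purely synthetic loop of the first sketch, they tend instead to drift away from the original distribution, generation after generation.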
This process of self-consumption leads to a gradual decline in quality and a dilution of the diversity of the generated content. The process degenerates: the model begins to produce increasingly incoherent and redundant outputs.
If the model is not exposed to a sufficient variety of examples, it struggles to learn meaningful patterns and falls back on repetitive outputs. Similarly, if the model is encouraged to focus solely on optimizing its own production, it can drift away from reality and generate increasingly aberrant outputs. Finally, the model tends to overfit its training data: it begins to memorize insignificant details and loses its ability to generalize to new examples. Moreover, it can reproduce its own biases and shortcomings indefinitely.
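A last toy sketch (still purely illustrative, with arbitrary sizes) shows this dilution of diversity in a discrete setting: a "model" that memorises the empirical distribution of its training items and generates by resampling them. An item that stops being generated can never reappear, so the vocabulary available to the loop can only shrink.

```python
# Toy illustration of diversity dilution: resampling a finite "vocabulary"
# generation after generation makes rare items disappear for good.
import numpy as np

rng = np.random.default_rng(2)
VOCAB_SIZE, N_SAMPLES, N_GENERATIONS = 100, 500, 200

# Generation 0: human data drawn from the full vocabulary
data = rng.integers(0, VOCAB_SIZE, size=N_SAMPLES)

for gen in range(1, N_GENERATIONS + 1):
    # "Train" = memorise the empirical distribution; "generate" = resample it
    data = rng.choice(data, size=N_SAMPLES, replace=True)
    if gen % 50 == 0:
        print(f"generation {gen:3d}: {len(np.unique(data))} distinct items remain")
```

In this toy setting the number of distinct items never increases: generation after generation, the rarer items are lost and the loop converges towards a handful of repetitive outputs.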
In some theoretical scenarios, generative models could go "mad" or malfunction in an unpredictable, potentially self-destructive manner. For example, a generative model might prioritize "novelty," and this relentless quest could push it to explore increasingly unknown territories.
The lack of regulation and safeguards exposes the model to a runaway situation where the content could become increasingly extreme, offensive, disturbing, or shocking, bordering on the unacceptable. We might no longer be able to understand or interpret the results generated by the model.
This speculative notion highlights the potential concerns associated with the use of autonomous or insufficiently controlled AI models. Although it may seem like a science fiction idea, it is an important consideration in the AI community regarding the responsible design and regulation of these technologies.
In summary, when AI models train on their own data, they become increasingly isolated from the real world and its values, and they go mad!