Updated March 3, 2024

When AI models train on their own data, they go mad!


Picture: Generative AIs (GPT-3, Copilot, Gemini, Gopher, Chinchilla, PaLM, Human, etc.) are trained on large sets of data (text, images, audio, or video) produced by humans.
However, these AIs go "mad" when they themselves generate their own training data.
Image: generated by an AI.

What is "Self-Consuming Generative Models Go Mad"?

In the field of artificial intelligence, the concept of "Self-Consuming Generative Models Go Mad" refers to the production of an AI's training data by the AI itself.

Generative models are algorithms that learn to generate new data by "mimicking" a set of training data produced by humans. Producing training data is expensive and time-consuming: the data must be collected, cleaned, annotated, and formatted so that the AI can use it correctly.
Scientists have not resisted the temptation to use synthetic data generated by the generative models themselves in order to train new models more quickly.

The central idea is to create a generative model capable of producing its own training data. The process is then iterated, with the model refining itself and becoming capable of generating data that is ever more numerous, complex, and novel.
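To make the idea concrete, here is a minimal, hypothetical sketch in Python of such a self-consuming loop. The "generative model" is reduced to a Gaussian distribution fitted to data; each new generation is trained only on samples drawn from the previous generation's model. The sample size and number of generations are arbitrary choices for illustration.

```python
# A minimal, hypothetical sketch of a self-consuming ("autophagic") training loop.
# The "generative model" here is just a Gaussian fitted to data; each generation
# is trained only on samples drawn from the previous generation's model.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: real, human-produced data (represented here by a standard normal).
data = rng.normal(loc=0.0, scale=1.0, size=500)

for generation in range(1, 21):
    # "Train" the model: estimate its parameters from the current data set.
    mu, sigma = data.mean(), data.std()
    # The next generation is trained only on synthetic samples from this model.
    data = rng.normal(loc=mu, scale=sigma, size=500)
    if generation % 5 == 0:
        print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")

# Over many generations the estimated parameters perform a random walk away from
# the real distribution, and the spread (diversity) tends to shrink: the loop
# feeds on its own errors instead of on fresh real data.
```

Even in this toy setting, each generation estimates its parameters from data that already contains the previous generation's estimation errors, so those errors accumulate instead of averaging out.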

The imagined benefits are numerous. First of all, the model is not limited by the amount of initial training data. It can thus explore unknown areas and discover new concepts by chance. In addition, thanks to its own self-supervised learning, the model could refine itself and improve its performance iteratively. For example, it could generate novel molecular structures that are candidates for new drugs.

However, there is a huge challenge associated with this approach.

When the model goes mad!

"Self-Consuming Generative Models Go Mad" is a phenomenon that occurs when generative artificial intelligence models train on synthetic data produced by other generative AI models, thereby creating autophagic loops (loops that consume themselves). When an AI tries to learn from content generated by another AI, it goes mad.

The repetition of this poorly understood process creates an autophagic loop whose training data becomes chaotic. Moreover, it is tempting to use synthetic data to augment or replace real data when real data is scarce. In other words, when the model does not receive enough fresh real data at each iteration of the autophagic loop, future generative models are doomed to fail.
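As a sketch of this point, the same toy Gaussian loop can be run with a fraction of fresh real data injected at every generation. The fresh_ratio parameter and the values tried below are illustrative assumptions, not quantities from the article.

```python
# Hypothetical sketch: the same toy autophagic loop, but with a fraction of
# fresh real data injected at every generation.
import numpy as np

rng = np.random.default_rng(1)

def autophagic_loop(fresh_ratio: float, n: int = 500, generations: int = 20):
    """Fit a Gaussian to its own samples, optionally mixed with fresh real data."""
    data = rng.normal(0.0, 1.0, size=n)               # initial real data
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()           # "train" the model
        synthetic = rng.normal(mu, sigma, size=n)     # model-generated data
        fresh = rng.normal(0.0, 1.0, size=int(n * fresh_ratio))  # new real data
        data = np.concatenate([synthetic, fresh])
    return data.mean(), data.std()

for ratio in (0.0, 0.2, 0.5):
    mu, sigma = autophagic_loop(ratio)
    print(f"fresh_ratio={ratio:.1f} -> final mean={mu:+.3f}, std={sigma:.3f}")

# With fresh_ratio=0.0 the loop consumes only its own output and slowly drifts;
# with some fresh real data at each generation, the fit stays anchored to the
# real distribution.
```

In this toy setting the effect is gradual, but the direction is the point: synthetic data alone lets the estimates wander, while a regular supply of real data keeps them anchored.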

This autophagy leads to a progressive decrease in the quality of the generated content and a dilution of its diversity. The process degenerates: the model begins to produce increasingly inconsistent and redundant outputs.

If the model is not exposed to a sufficient variety of examples, it has difficulty learning meaningful patterns and falls back on repetitive output.
Likewise, if the model is encouraged to focus only on optimizing its own output, it can stray from reality and generate increasingly aberrant results.
Finally, the model tends to fit its responses too closely to the training data (overfitting). It begins to memorize insignificant details and loses its ability to generalize to new examples. Moreover, it can reproduce its own biases and shortcomings endlessly.
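The loss of diversity described above (repetitive, redundant output) can be illustrated with another toy sketch, this time using a discrete distribution over "tokens" standing in for a language model. The vocabulary size, corpus size, and number of generations are arbitrary illustrative choices.

```python
# Hypothetical sketch of diversity loss: the "model" is an empirical distribution
# over 50 possible tokens, re-estimated at each generation from samples of the
# previous one. Rare tokens that happen not to be sampled vanish for good, so the
# vocabulary shrinks and the output becomes more and more repetitive.
import numpy as np

rng = np.random.default_rng(2)
vocab = np.arange(50)                     # 50 possible "tokens"
probs = np.full(50, 1 / 50)               # generation 0: uniform real data

for generation in range(1, 31):
    corpus = rng.choice(vocab, size=200, p=probs)     # generate a synthetic corpus
    counts = np.bincount(corpus, minlength=50)
    probs = counts / counts.sum()                     # "retrain" on that corpus
    if generation % 10 == 0:
        print(f"generation {generation}: {np.count_nonzero(counts)} distinct tokens left")

# The number of distinct tokens can only decrease: once a token's probability
# reaches zero it is never generated again, a simple picture of the redundant,
# repetitive output described above.
```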

In some theoretical scenarios, generative models could go "mad", malfunctioning in unforeseen and potentially self-destructive ways. For example, a generative model could come to favor "novelty", and this incessant quest could push it to explore ever more unfamiliar territory.

In the absence of regulation, the model is exposed to a runaway process in which the generated content becomes increasingly extreme, offensive, disturbing, or shocking, flirting with the unacceptable. We might no longer be able to understand or interpret the results generated by the model.

This speculative notion highlights potential concerns associated with the use of autonomous or insufficiently controlled artificial intelligence models. While it may sound like science fiction, it is an important line of reflection in the AI community about how to design and regulate these technologies responsibly.

In summary, when AI models train on their own data, they isolate themselves more and more from the real world and its values: they go mad!

