The term "stochastic parrot" was popularized by researchers Emily Bender (1973-) and Timnit Gebru (1982-) to emphasize that LLMs (Large Language Models) do not "understand" the language they use. They reproduce patterns learned from massive data, without consciousness or intentionality. When a parrot mimics a word like "Hello," it does not intend to greet someone. It is an automatic behavior, triggered by external stimuli (e.g., the presence of humans) or internal stimuli (e.g., boredom).
Applied to AI, the adjective "stochastic" refers to how these models operate: each word is chosen by sampling from a probability distribution computed over possible next tokens, like a parrot randomly repeating learned phrases without grasping their meaning.
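To make this "probabilistic operation" concrete, here is a minimal Python sketch of next-token sampling. The toy vocabulary and probabilities are invented for illustration and do not come from any real model, which would compute such a distribution with a neural network over tens of thousands of tokens.

```python
import random

# Toy next-token distribution for the prompt "Hello".
# These tokens and probabilities are invented for the example;
# a real LLM computes them with a neural network.
next_token_probs = {
    "world": 0.55,
    "there": 0.25,
    "friend": 0.15,
    "again": 0.05,
}

def sample_next_token(probs: dict) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# The same prompt can yield different continuations on each run:
# the choice is stochastic, not the result of understanding.
for _ in range(5):
    print("Hello", sample_next_token(next_token_probs))
```

Each run can print different continuations for the same prompt, which is precisely what "stochastic" captures.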
A dangerous feedback loop has taken hold on the Internet: the more views a piece of content attracts, the more revenue or notoriety it generates, and the more it encourages the production of even more shallow content. Humans, smarter than the machines, quickly grasped this dynamic and realized they could use AI to automate the production of online content (texts, images, videos).
Each new piece of AI-generated content attracts a little attention, encouraging the production of even more. Over time, the total volume of content grows exponentially while the average quality collapses. Feeds become saturated with artificial content, real information gets lost in the noise, and public trust erodes.
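A toy simulation of this loop is sketched below; every number (initial corpus sizes, quality scores, growth factor) is an arbitrary assumption chosen only to illustrate the dynamic, not a measurement.

```python
# Toy model of the content feedback loop. All values are illustrative
# assumptions, not empirical data.
human_items, human_quality = 1_000, 1.0   # original human-made corpus
ai_items, ai_quality = 100, 0.2           # initial stock of AI-generated content
growth_per_step = 1.5                     # each step the AI stock grows by 50%

for step in range(1, 11):
    ai_items = int(ai_items * growth_per_step)
    total_items = human_items + ai_items
    average_quality = (human_items * human_quality
                       + ai_items * ai_quality) / total_items
    print(f"step {step:2d}: items = {total_items:6d}, "
          f"average quality = {average_quality:.2f}")
```

Volume explodes while the average quality drifts down toward that of the cheapest content: quantity up, quality down.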
The neologism "enshittification" is a critical and sociological term used to describe the progressive degradation of online content quality, in favor of quantity, visibility, and immediate profit. Enshittification refers to the massive filling of the web with mediocre, shallow, or even misleading content, produced automatically or opportunistically to generate traffic, clicks, or advertising revenue.
The more clicks mediocre content generates, the more advertisers demand it, encouraging users to produce even more. The system feeds on itself until saturation, or until global trust collapses. When everyone speaks at once in order to be seen, no one listens, and knowledge ends up dissolving into the digital noise.
When the bubble bursts, it will not just be an economic or media crash but a collapse of global informational value, with profound consequences for platforms, creators, and the public. In the end, everyone loses.
The Internet is gradually turning into a vast information dump where noise overpowers the signal. Quality and truthfulness fade away in the face of mass production and competition for audience.
Digital platforms (Amazon, Spotify, YouTube, TikTok, etc.) systematically promote AI-generated "engaging" content for its massive audience potential and advertising revenue. However, this strategy reveals a destructive paradox: the more algorithms favor these productions, the more they devalue the entire ecosystem, creating an informational bubble where quantity stifles quality. The more AI content there is, the less each piece is worth.
The race for ever more powerful models requires colossal resources, which have become inaccessible to most players. Investors risk finding that returns on investment are not forthcoming.
Saturation has already been reached. Google is the most visible example: its search results, once ranked by relevance and source reliability, are now saturated with SEO-optimized but valueless content. Pages produced automatically by content farms or language models flood search engine indexes, making the search for reliable information increasingly laborious. Search engine algorithms struggle to sort the relevant from the irrelevant and unwittingly amplify this digital noise. Relevance is replaced by virality, and knowledge dissolves into a mass of artificially generated, hollow content. Search engines, once symbols of access to knowledge, are becoming a digital dump where users can no longer distinguish signal from noise.
Without raw material (human works), AIs will no longer be able to improve. We cannot keep feeding AIs their own output, digested a thousand times over; this degenerative loop is what researchers call model collapse.
Read the article: "Self-Consuming Generative Models Go MAD".
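The degeneration that article studies can be illustrated with a deliberately simplified sketch: a one-dimensional Gaussian stands in for a generative model, each generation is fitted only to filtered samples from the previous one, and its diversity collapses. The filtering threshold and sample counts are assumptions made for this demo, not values from the article.

```python
import random
import statistics

# Self-consuming training loop, toy version: each generation is fitted
# only to synthetic samples from the previous generation, and only the
# most "typical" samples are kept (a stand-in for cherry-picking the
# best generations before republishing them). All numbers are
# illustrative assumptions, not taken from the cited article.
random.seed(42)

mu, sigma = 0.0, 1.0              # generation 0: the original "human" data
samples_per_generation = 1_000

for generation in range(1, 11):
    # Generate synthetic data with the current model...
    raw = [random.gauss(mu, sigma) for _ in range(samples_per_generation)]
    # ...keep only samples within one standard deviation of the mean...
    kept = [x for x in raw if abs(x - mu) < sigma]
    # ...and refit the next model on that filtered synthetic data only.
    mu = statistics.fmean(kept)
    sigma = statistics.stdev(kept)
    print(f"generation {generation:2d}: std = {sigma:.3f}")

# The standard deviation shrinks at every generation: with no fresh
# human data coming in, the model's diversity collapses onto a narrow
# mode, the toy analogue of "going MAD".
```

The filtering step mirrors the common practice of republishing only the best-looking AI outputs, which this toy model suggests accelerates the loss of diversity in a self-consuming loop.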
| Content Type | Example | Platform / Mechanism | Problem |
|---|---|---|---|
| Automated Books | - Novels or guides generated by AI, filled with repetitions or trivial information - Repackaged counterfeits of existing books | Amazon Kindle Direct Publishing, Lulu, Apple Books, Kobo Writing Life, Google Play Books, JD.com, Dangdang, WeChat Reading | Huge volume of hollow, absurd, or unreadable publications, automatically produced by software, without human review or author. |
| Blog Articles or News | - Low-effort articles generated automatically to maximize traffic | Google Search / AdSense, Facebook Instant Articles, Apple News, LinkedIn, Medium, WeChat Official Accounts, Toutiao, Baidu Baijia, Weibo | Degradation of information quality. Proliferation of shallow content to capture traffic. Half of AI-generated news sites contain false information. |
| Images | - Stereotypical illustrations on social media to solicit clicks | ArtStation, Shutterstock, Canva, Getty Images, Adobe Stock, Weibo, Xiaohongshu, Douyin, Baidu Tieba | Saturation of image banks without original value. Decline in human creativity. Deepfakes are undetectable to 70% of internet users. |
| Videos | - Short clickbait clips - Automatically generated animations or deepfakes | YouTube, TikTok, Instagram, Facebook Reels, Douyin, Kuaishou, Bilibili, WeChat Channels | Empty content designed to attract attention. Exaggerated or misleading clickbait. Increased exposure to misinformation. Monetization via ads. |
| Music | - Full playlists of AI-generated tracks - Catalogs filled with synthetic creations | Spotify, SoundCloud, Apple Music, YouTube Music, QQ Music, NetEase Cloud Music, Kugou, Kuwo | Payment for works without real creativity. Saturation of the music market. Virtual artists generate millions of streams. |
| Viral Hook Content | - Massively generated humorous images and texts - Maximizing likes with fake accounts | Facebook, Instagram, Reddit, X (Twitter), WeChat Moments, Douyin, Weibo, Xiaohongshu | Mass production to capture attention. Dilution of original content. Easily copyable and adaptable by other users. |
| Product Design and 3D Models | - Mass-generated designs for objects, furniture, jewelry | Etsy, Thingiverse, Cults3D, MyMiniFactory, Taobao, Tmall, JD.com, 1688.com | Market flooding with non-functional designs. Devaluation of designers' work. |
| Tutorials | - Attractive but erroneous tutorials generated automatically | Stack Overflow, Quora, Reddit, YouTube, Zhihu, Baidu Zhidao, Bilibili, CSDN | Pollution of knowledge bases. Large-scale propagation of errors. Loss of trust in information sources. |
| Applications and Code | - Basic applications - Copied-pasted vulnerable scripts - Dubious open-source packages | GitHub, GitLab, App Store, Google Play Store, Gitee, Coding.net, Chinese app stores (Huawei, Xiaomi, Tencent) | Increased security risks. Proliferation of unoptimized or malicious code. |
In research, several signs indicate a progressive contamination of the scientific corpus by synthetic or automated content, which can alter the reliability of sources and the reference chain.
Article factories ("paper mills") produce thousands of pseudo-scientific publications generated or rephrased by AI, sometimes even accepted by journals. Their content often lacks any real experimentation but is optimized to look "scientific" (superficial formalism, fabricated citations, vague methodologies).
Rushed or insufficiently trained researchers reuse AI-generated formulations (introductions, summaries, literature reviews) and insert them into their work. This introduces subtle semantic errors and undetected approximations, which sometimes slip past peer review and then propagate through the literature.
Platforms like Google Scholar, ResearchGate, or Semantic Scholar now index automatically generated papers. These texts pollute recommendation algorithms and academic search engines, distorting relevance metrics and increasing the risk of unfounded citations.
When an AI reformulates an excerpt from multiple papers without properly citing the authors, the reference chain is broken. The reader believes they are reading a reliable synthesis, when in fact it is a mix of undifferentiated sources, which harms scientific transparency and complicates fact-checking.