Image: Artificial intelligence, inspired by biological processes, increasingly simulates certain aspects of human cognition while continuing to evolve and improve in its own domains. Image source: astronoo AI.
Generative models for text, images, videos, and music share common principles in their architecture and learning processes, despite the diversity of the data types they handle.
For text-generating AI, large language models (LLMs) are undeniably advanced applications of machine learning. LLMs are pre-trained on vast amounts of text, enabling them to learn language structures, relationships between words, and contexts of use. In other words, they can predict the next word in a sentence with remarkable accuracy. These models excel in text generation, automatic translation, chatbots, and virtual assistants, but they lack the ability to reason: to form ideas, make judgments, or make decisions.
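The idea of "predicting the next word" can be illustrated with a deliberately tiny sketch. This is not how an LLM works internally (real models use deep neural networks trained on billions of words); it is a toy bigram model over an invented corpus, shown only to make the prediction task concrete.

```python
from collections import Counter, defaultdict

# Invented toy corpus; a real LLM trains on billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word, or None."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often here
```

Even this crude statistical predictor captures the core principle: the model learns regularities from text and exploits them to guess what comes next, without any understanding of what the words mean.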
Future machines will need to learn about the physical world in a manner similar to humans and animals. By doing so, they could become more efficient and approach human-level intelligence.
Humans and animals develop an intuitive and contextual understanding of their environment through observation and experience. They gradually assimilate an understanding of objects, forces, and causal relationships. For example, a child learns that objects fall when dropped; even without a formal understanding of gravity, they adjust their behavior accordingly.
In other words, machines will need not only to detect objects but also to understand their behavior in different situations. This means they will have to interpret sensory data in a contextual way, like an animal that knows when a noise is threatening or when food is appetizing.
Understanding the physical world enables persistent memory, the ability to plan actions and achieve goals, and, in short, the ability to reason. While AI progress is impressive, many hurdles remain before we can speak of human intelligence.
Current models under development, based on Inference by Optimization, are a promising approach to simulating human intelligence.
Inference is a concept that allows conclusions to be drawn based on observations. Inference plays a crucial role in decision-making, reasoning, and learning.
Optimization is about finding the best possible solution to achieve a specific goal. When optimizing, one seeks the best balance between different criteria, such as speed and accuracy.
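The speed-versus-accuracy balance mentioned above can be made concrete with a small sketch. The cost model here is entirely assumed for illustration: error shrinks as the square root of the number of samples taken, while time grows linearly with it, and the optimizer simply searches for the sample count that minimizes their combined cost.

```python
import math

def cost(n, time_weight=0.001):
    """Assumed trade-off: more samples -> lower error, more time."""
    error = 1 / math.sqrt(n)   # accuracy improves with more samples
    time = time_weight * n     # but each sample costs time
    return error + time

# Optimization: exhaustively search for the best balance point.
best_n = min(range(1, 1001), key=cost)
print(best_n)  # neither 1 sample (inaccurate) nor 1000 (too slow)
```

Taking one sample is fast but inaccurate; taking a thousand is accurate but slow. The optimum lies in between, which is exactly what "seeking the best balance between criteria" means.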
Inference by Optimization can be observed in the cognitive development of children, even before they start speaking. For example, a baby trying to pull on a toy attached to a play gym quickly learns that pulling harder or in a different direction can make the toy move. The child optimizes their technique by observing the results of their actions. The child remembers past experiences with each toy and optimizes their choice based on what brought them the most pleasure or interest.
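The baby's trial-and-error process resembles what computer scientists call hill climbing: try a small variation, keep it if the result improves. The sketch below uses an invented "toy movement" function that the learner never sees directly; it only observes how much the toy moved after each attempt.

```python
import random

random.seed(0)

def toy_movement(angle):
    """Hypothetical environment: pulling near 40 degrees works best.
    The learner does not know this function, only its outcomes."""
    return 1.0 - abs(angle - 40) / 180

# Trial and error: start with a guess, keep tweaks that work better.
angle, best = 90.0, toy_movement(90.0)
for _ in range(200):
    candidate = angle + random.uniform(-15, 15)
    moved = toy_movement(candidate)
    if moved > best:          # remember what brought a better result
        angle, best = candidate, moved

print(round(angle))  # drifts toward the effective pulling angle
```

No explicit model of the toy is needed: repeated attempts, each compared against the best result so far, are enough to optimize the behavior, just as the child optimizes their technique by observing outcomes.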
The human brain is often compared to an Optimization System. It uses inference by optimization to reason, constantly updating its beliefs based on new observations.
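"Updating beliefs based on new observations" has a precise mathematical counterpart in Bayes' rule. The sketch below uses an invented example: deciding whether a coin is biased toward heads (an assumed 0.8 probability of heads) or fair, revising the belief after each flip.

```python
def update(prior_biased, observation):
    """One Bayesian update step for the belief 'the coin is biased'."""
    like_biased = 0.8 if observation == "H" else 0.2  # assumed bias
    like_fair = 0.5
    numer = like_biased * prior_biased
    denom = numer + like_fair * (1 - prior_biased)
    return numer / denom

belief = 0.5  # start undecided: 50/50 between biased and fair
for obs in "HHHTH":
    belief = update(belief, obs)

print(round(belief, 3))  # mostly heads -> stronger belief in bias
```

Each observation shifts the belief a little: heads pushes it toward "biased", tails pushes it back, mirroring how the brain is thought to continually revise its internal estimates as evidence accumulates.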
When a person makes a decision, they evaluate the different available options and seek to maximize certain criteria, such as well-being, satisfaction, or benefit. This decision-making process often involves evaluating the risks and rewards associated with each choice under uncertainty. But in many cases, the brain uses heuristics: approximate mental shortcuts or practical rules that allow quick decisions without an exhaustive analysis of all options.
Thus, humans build their understanding of reality based on experiential learning and the model of the surrounding world. Individuals adjust their behaviors by integrating new knowledge based on what they have experienced. For example, when someone prepares a new recipe, they adjust the ingredient quantities based on the taste obtained from previous attempts.
Human intelligence is deeply tied to aspects such as emotion, self-awareness, perception of the world, and social interaction. The way humans and animals come to understand their environment, reason with "common sense," or plan complex actions seems natural to us, but it is still beyond the reach of artificial intelligence, as of 2024.
Current AI models do not have this understanding of the physical world, which limits their ability to predict future situations. It is essential for artificial intelligence to learn from multimodal data, among which videos will play a crucial role. This presents enormous challenges in data and information processing, but it is a rapidly expanding field of research, with many promising directions for its future development.
Artificial General Intelligence (AGI) requires a combination of massive data, learning through interaction in the real world or simulations, and advances in architectures and algorithms. The road to AGI is still long, but by drawing inspiration from the cognitive mechanisms of the brain, it is likely that future systems will exhibit intelligence comparable to that of humans.