Generating robust and general real-world behavior by exploiting regularities at multiple levels of abstraction
Selecting the most appropriate action to achieve a task

Intelligent behavior must consider information about the world. Based on this information, an intelligent agent must select the most appropriate action to achieve a task. We hypothesize that agents perform action selection by considering information at different levels of abstraction. Abstractions capture regularities in the relationship between perception and suitable action. These regularities are either present in the physics of the world or can be imposed via feedback control. Both types of regularities restrict the evolution of the system state. This facilitates action selection because only actions consistent with these restrictions need to be considered. Regularities exist at different levels of abstraction, with each level contributing different limitations to the evolution of the system state. We hypothesize that the exploitation of regularities at different levels of abstraction is, in fact, a principle of intelligence.
To gather support for this hypothesis, we will apply the principle to two of the example behaviors from RU1, namely “Escaping from an escape room” and “Collective shepherding”. We will identify different levels of abstraction, investigate how these levels depend on specific tasks, and determine which regularities the corresponding abstractions exploit. We aim to understand how abstraction levels interact or, more precisely, how actions generated on the basis of regularities at different levels of abstraction combine into an overall behavior that solves the given task. By understanding these relationships, we will make them usable for the systematic solution of multi-level decision and control problems, delivering a sound methodological underpinning for generating intelligent behavior based on multi-layer control.
Project Results
First Results
The team’s main scientific findings indicate that an agent’s ability to solve complex sensorimotor problems by leveraging regularities is fundamentally tied to how it represents its experiences. In this context, regularities are defined as reproducible and predictable relationships between an agent’s embodiment, environmental features, and the behavioral consequences of its actions. A good representation transforms high-dimensional observational data into a lower-dimensional, semantically meaningful latent space that captures the underlying factors of the data-generating process. Experiments demonstrated that the choice of representation is critical: the same physical regularity became easily learnable or obscure depending on the representation used, significantly impacting the efficiency of policy learning. This led to the insight that the challenge of discovering regularities is intrinsically linked to the problem of finding appropriate representations.
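As a purely illustrative example of the kind of representation described above, the sketch below trains a small autoencoder to compress high-dimensional observations into a low-dimensional latent space. The framework (PyTorch), all dimensions, and the synthetic data are assumptions made for illustration; the source does not specify which models the team used.

```python
# Minimal sketch (not the team's actual model): learning a low-dimensional
# representation of high-dimensional observations with an autoencoder.
# Dimensions and data are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM = 1024, 8  # assumed dimensionalities

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM),      # latent code z
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, OBS_DIM),         # reconstruction of the observation
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):                      # toy training loop on random data
    x = torch.randn(64, OBS_DIM)             # stand-in for sensor observations
    x_hat, z = model(x)
    loss = nn.functional.mse_loss(x_hat, x)  # reconstruction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Whether such a latent space exposes the relevant regularity depends entirely on what the reconstruction objective preserves, which is exactly the representation problem identified above.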
The team’s current research explores disentanglement as a promising approach for learning these representations, aiming to transform high-dimensional sensory data into lower-dimensional, meaningful factors (a sketch of one common disentanglement objective follows the list below). For embodied systems, disentanglement must capture the dynamic interplay between an agent’s actions and its sensory inputs. However, the team’s work, particularly with the RBO Hand 3 and synthetic datasets, highlighted significant complexities:
- Embodied disentanglement is distinct: A robot’s actions actively influence its sensory inputs, and factors related to self-motion and external interactions are often deeply intertwined, not easily separated by simple models.
- Generalization is not guaranteed: Models appearing to disentangle known factors may not generalize well to unseen combinations, as real-world factors often interact.
- Metrics can be misleading: Common disentanglement metrics don’t always correlate with downstream task performance or utility.
- Superficial learning is a risk: Networks can learn to “disentangle” features based on arbitrary or even incorrect supervisory signals, without truly capturing underlying physical factors.
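To make the disentanglement approach concrete, here is a minimal sketch of a β-VAE objective, one common method for encouraging factorized latent representations. This is an assumed illustration, not the team’s actual model; the architecture and the hyperparameter beta=4.0 are hypothetical.

```python
# Hypothetical sketch of a beta-VAE objective, a common route to
# disentangled representations; not the team's actual model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    def __init__(self, obs_dim=1024, latent_dim=8):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, 2 * latent_dim)  # outputs mean and log-variance
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def forward(self, x):
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization trick
        return self.decoder(z), mu, log_var

def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    # Reconstruction term plus beta-weighted KL divergence to a unit Gaussian;
    # beta > 1 pressures the latent dimensions toward independent factors.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```

Raising beta strengthens the pressure toward independent latent dimensions at the cost of reconstruction quality; as the list above notes, however, a good score on such an objective does not guarantee that the learned factors match the true physical ones.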
These findings indicate that while disentanglement is a powerful concept, it is not a complete solution on its own. The focus has therefore shifted toward understanding that regularities are often embedded within the structure of lower-dimensional task-execution manifolds, which capture the constraints governing successful action. “Computational sensing” is emerging as a key process for discovering and operating on these manifolds, integrating raw sensor data with predictions, task knowledge, and an understanding of the robot’s morphology. Disentanglement is now viewed as a valuable tool that can aid in understanding the structure within and around these critical manifolds, rather than as an end goal in itself. Additionally, the team is exploring the ability to switch between different regularities or controllers, recognizing this as an important step toward combining regularities to address complex problems effectively.
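As a toy illustration of what operating on a task-execution manifold could mean, the sketch below approximates such a manifold as a linear subspace fitted (via PCA) to states recorded during successful task executions, then projects new states onto it. This construction is an assumption made purely for illustration; the source does not describe how the team represents or discovers these manifolds.

```python
# Toy illustration (an assumption, not the team's method): approximate a
# task-execution manifold as a linear subspace of successful-trial states
# via PCA, then project new states onto that subspace.
import numpy as np

rng = np.random.default_rng(0)
successful_states = rng.normal(size=(500, 12))   # stand-in for recorded states

# PCA via SVD of the centered data.
mean = successful_states.mean(axis=0)
_, _, vt = np.linalg.svd(successful_states - mean, full_matrices=False)
basis = vt[:3]                                   # assume a 3-D manifold

def project_to_manifold(state):
    """Snap a state to the nearest point on the linear manifold approximation."""
    coords = (state - mean) @ basis.T            # low-dimensional coordinates
    return mean + coords @ basis                 # back to the ambient state space

new_state = rng.normal(size=12)
constrained = project_to_manifold(new_state)
```

The projection discards state components that never varied across successful executions, which is one simple way a manifold can restrict the actions that need to be considered.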