SCIoI @ECCV 2024 – Advancing robotic vision with event cameras and intelligent motion tracking

At the European Conference on Computer Vision (ECCV) 2024, held this year in Milan from 29 September to 4 October, thousands of scientists, engineers, and industry leaders came together to explore cutting-edge developments that may shape the future of artificial intelligence. Recognized as one of the top global conferences in computer vision, ECCV serves as a major platform for presenting advances in image recognition, autonomous driving, and human-computer interaction, with workshops and presentations diving deep into machine learning techniques and real-world applications. For the SCIoI researchers Guillermo Gallego and Friedhelm Hamann, together with their team members Suman Ghosh, Shuang Guo, and Hanxiong Li, the conference offered a stage to present their latest research through a variety of papers and workshops, fostering dialogues that may well influence the direction of computer vision for years to come.

At the main conference, the SCIoI members presented three papers and a live demo addressing different topics of their research with event cameras. With a time resolution of microseconds, six thousand times the contrast of conventional cameras, and potentially much smaller file sizes for the recorded data, event cameras promise a revolution in photography. Unlike conventional cameras, they do not record entire images at once, but instead register only the changes in brightness (the “events”) for each light-sensitive pixel separately. Because these devices mimic the visual systems of animals, they are often called “silicon retinas” and find application in the field of robotics. At SCIoI, the researchers use these cameras to interpret visual scenes and, potentially, to create new eyes for robots, allowing them to navigate their environment accurately.
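To make the “events” more tangible: an event camera outputs a stream of tuples, each recording a pixel location, a timestamp, and whether the brightness went up or down. The following is a minimal, illustrative Python sketch (not code from the SCIoI papers; all names, sizes, and values are assumptions for the example) showing such a stream and one common way to turn it into a simple image for visualization.

```python
import numpy as np

# Each event: pixel coordinates (x, y), a timestamp in microseconds,
# and a polarity (+1 = brightness increased, -1 = brightness decreased).
# A short, made-up stream of five events for illustration:
events = np.array([
    # x,   y,   t_us, polarity
    [120,  45,  1000,  +1],
    [121,  45,  1012,  +1],
    [300, 210,  1015,  -1],
    [122,  46,  1030,  +1],
    [301, 210,  1041,  -1],
])

def accumulate_events(events, height=480, width=640):
    """Sum event polarities per pixel to obtain a simple 'event image'."""
    img = np.zeros((height, width), dtype=np.int32)
    for x, y, _, p in events:
        img[int(y), int(x)] += int(p)
    return img

event_image = accumulate_events(events)
print("Pixels that registered a change:", np.count_nonzero(event_image))
```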

ECCV focuses on fundamental research areas such as image and video analysis, 3D reconstruction, object detection and recognition, deep learning, neural network architectures, and AI applications in vision technology. With their presentations at the conference, Guillermo, Friedhelm and the team contributed to shaping the future direction of computer vision research.

“Motion-prior contrast maximization for dense continuous-time motion estimation” (Friedhelm Hamann et al.)


This paper explores a smart new way to track motion that feels almost like giving “eyes” and “intuition” to event cameras. Although these cameras handle tricky lighting and fast movement with ease, typical video-based tracking methods are not well suited to their data.

To bridge this gap, the researchers introduce an approach that combines a “motion prior” (basically, a guess at how things are likely to move) with self-supervised motion tracking. This setup enables the system to learn on its own and adapt to real-world scenarios, tracking complex, curvy, or unpredictable motion paths without the need for vast labeled datasets. It’s like giving the system a natural instinct for following motion!
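The workhorse behind this kind of motion estimation is an idea called contrast maximization: warp all events to a common reference time according to a candidate motion, build an image of the warped events, and score how sharp (high-contrast) that image is; the motion that best explains the events produces the sharpest image. The sketch below is a deliberately simplified, single-velocity illustration of that principle, not the authors’ implementation, which estimates dense, continuous-time flow and adds the learned motion prior; all variable names and sizes are assumptions.

```python
import numpy as np

def contrast_of_warped_events(events, flow_xy, t_ref=0.0,
                              height=180, width=240):
    """Warp events with one candidate flow (pixels/second) to time t_ref,
    accumulate them into an image, and return its variance (contrast).
    events: rows of (x, y, t_seconds, polarity)."""
    vx, vy = flow_xy
    img = np.zeros((height, width), dtype=np.float64)
    for x, y, t, p in events:
        # Slide the event back along the candidate motion to the reference time.
        xw = int(round(x - vx * (t - t_ref)))
        yw = int(round(y - vy * (t - t_ref)))
        if 0 <= xw < width and 0 <= yw < height:
            img[yw, xw] += p
    return img.var()

# Toy search: events generated by an edge moving at 100 px/s to the right.
events = np.array([[50 + 100 * t, 60, t, +1] for t in np.linspace(0, 0.1, 50)])
candidates = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
best = max(candidates, key=lambda f: contrast_of_warped_events(events, f))
print("Sharpest warp for candidate flow:", best)  # expected: (100.0, 0.0)
```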

The results from real-world motion-tracking benchmarks speak for themselves: the method didn’t just match what’s possible with standard tools, it actually outperformed them, improving tracking accuracy. The system could quickly learn from synthetic data (artificial data designed for training) and then apply its skills to real-world situations, making it versatile and reliable.

What’s really exciting is that this approach can track motion in complex environments—think of navigating a busy road with lots of moving parts—without needing loads of labeled data to “teach” it each time. It’s fast, smart, and flexible, setting a new standard for how event cameras can be used in robotics, self-driving cars, and any tech that needs to see and react in real time.

“Motion and structure from event-based normal flow” (Zhongyang Ren, Guillermo Gallego et al.)


This paper tackles a challenging problem in 3D vision: figuring out how a camera moves and understanding the depth of objects in the scene, using only data from event cameras. Because these cameras don’t capture a continuous sequence of full images but only changes in brightness at each pixel, it’s tough to determine exact object positions, or even which parts of the data represent the same moving object over time.

The authors introduce a new way to calculate both the camera’s movement and the 3D structure of the scene by using something called “event-based normal flow.” This process allows them to make accurate motion and depth estimates by focusing on the instant changes detected by the event camera. They developed two solvers (you might even say “helpers”): a fast, linear one to get quick solutions for simpler tasks, and a continuous-time, nonlinear solver that works in real time, even if the camera is moving unpredictably. In other words, the linear helper is great for quick, simple answers, like keeping track of a car driving in a straight line. But when things get complicated, like a camera on a drone flying around tight corners, the nonlinear helper steps in. It can adjust to sudden twists and turns, staying focused on the exact path the camera or object is taking.
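To give a flavor of what the fast linear solver does, the toy example below estimates a camera’s rotation rate from normal-flow measurements in the special case of a purely rotating camera, where the classic image-motion equations become linear in the unknown angular velocity. This is an illustrative simplification with assumed names and conventions, not the paper’s solvers, which also recover translation and scene structure.

```python
import numpy as np

def rotational_flow_matrix(x, y):
    """Image motion caused by camera rotation at normalized coordinates (x, y):
    flow = B(x, y) @ omega, with omega = (wx, wy, wz) in rad/s."""
    return np.array([[x * y, -(1.0 + x * x),  y],
                     [1.0 + y * y, -x * y,   -x]])

def solve_rotation_from_normal_flow(points, normals, normal_flows):
    """Least-squares estimate of omega: each measurement contributes one
    linear equation  n^T B(x, y) omega = u_n."""
    A = np.stack([n @ rotational_flow_matrix(x, y)
                  for (x, y), n in zip(points, normals)])
    omega, *_ = np.linalg.lstsq(A, np.asarray(normal_flows), rcond=None)
    return omega

# Synthetic check: simulate a known rotation and recover it from normal flow.
rng = np.random.default_rng(0)
omega_true = np.array([0.1, -0.2, 0.05])            # rad/s
points = rng.uniform(-0.5, 0.5, size=(200, 2))       # normalized image coords
normals = rng.normal(size=(200, 2))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
full_flows = [rotational_flow_matrix(x, y) @ omega_true for x, y in points]
normal_flows = [n @ f for n, f in zip(normals, full_flows)]
print(solve_rotation_from_normal_flow(points, normals, normal_flows))
# prints approximately [ 0.1  -0.2   0.05]
```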

Their approach can initialize and stabilize more complex motion-estimation methods, proving useful for robotics, drones, and autonomous navigation systems that demand reliable, agile vision in dynamic settings.

“Event-based mosaicing bundle adjustment” (Shuang Guo and Guillermo Gallego)


This paper introduces a new method that makes mapping and orientation estimation with event cameras both faster and more accurate, delivering clearer results with less processing. The technique uses a mathematical optimization called “bundle adjustment,” traditionally used for aligning frames from standard cameras, adapted here specifically for event-camera data. By working directly with the raw data from the camera’s unique event stream, this approach efficiently processes only the most relevant information, saving time and computational resources.

The key innovation is a custom structure that identifies and processes “blocks” of data, avoiding redundant calculations and speeding up the entire mapping and alignment process. Tested on both synthetic and real-world scenarios, the method reduced errors by about 50%. The outcome is more refined, high-resolution panoramas, even in complex scenes, without needing extensive pre-processing.
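For readers curious about the underlying mathematics, a bundle adjustment problem in its generic form looks as follows. This is only an illustrative template (the paper’s actual objective is built from an event-based model relating camera orientations to a panoramic map), but it shows where the block structure comes from: each measurement links one camera orientation to just a small part of the map.

```latex
% Generic bundle adjustment objective (illustrative template, not the
% paper's exact event-based formulation): jointly refine the camera
% orientations R_k and the map parameters M.
\min_{\{R_k\},\, M} \;\; \sum_{i} \bigl\| r_i\bigl(R_{k(i)},\, M\bigr) \bigr\|^2
% Each residual r_i couples a single orientation R_{k(i)} with only a few
% entries of M, so the Gauss-Newton normal equations decompose into small
% sparse blocks that can be solved much faster than one large dense system.
```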

Applications range from high-speed panoramas on mobile devices to mapping in challenging low-light or high-motion environments, such as drone navigation, where clarity and speed are crucial. This method represents a leap forward in using event cameras effectively for tasks that demand both detail and efficiency.

Workshop on neuromorphic vision

During the workshop on neuromorphic vision, the SCIoI members contributed many more of their research perspectives with eight papers, including “MouseSIS: A frames-and-events dataset for space-time instance segmentation of mice.” True to the SCIoI spirit, interdisciplinarity is key here: researchers from the analytic side (studying intelligent behavior, in this case of mice) and researchers from the synthetic side (working with event-camera data of that behavior) join forces to investigate principles of intelligence.

At ECCV 2024, the SCIoI team showed how event cameras can open up practical new ways for machines to track movement and respond to the world around them. Their research offers real tools for clearer, faster, and more reliable robotic vision in everyday situations. With these insights, SCIoI’s work is helping make vision technology smarter, simpler, and more adaptable to real-life needs, setting the stage for more responsive robotics in everything from self-driving cars to dynamic drones.
