Abstract:
Models of vision have come a long way in the past ten years. Deep neural networks can recognise objects with near-human accuracy and predict brain activity in high-level visual regions. However, most networks require supervised training with ground-truth labels for millions of images, whereas brains must somehow learn from sensory experience alone. We have been using unsupervised deep learning, combined with computer-rendered artificial environments, as a framework for understanding how brains learn rich scene representations without ground-truth information about the world. I will show how an unsupervised deep neural network, trained on an artificial environment of surfaces with different shapes, materials and lighting, spontaneously comes to encode those factors in its internal representations. Most strikingly, the model makes patterns of errors in its perception of material that follow, on an image-by-image basis, the patterns of errors made by human observers. Unsupervised deep learning may provide a coherent framework for how many of our perceptual dimensions form, in mid-level vision and beyond.
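The abstract does not specify the model architecture, but as a rough illustration of what "unsupervised deep learning on rendered images" can look like in practice, the sketch below shows a small convolutional variational autoencoder trained only to reconstruct unlabelled images. The architecture, image size, latent dimensionality and the random placeholder data are illustrative assumptions, not details taken from the talk; a real experiment would train on the computer-rendered surface images instead.

```python
# Minimal sketch of an unsupervised model in the spirit of the abstract:
# a convolutional VAE that learns from images alone, with no labels for
# shape, material or lighting. All specifics here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvVAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        # Encoder: 3x64x64 image -> flattened feature vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8x8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        # Decoder: latent code -> reconstructed image
        self.fc_dec = nn.Linear(latent_dim, 128 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation trick: sample a latent code during training
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = self.decoder(self.fc_dec(z).view(-1, 128, 8, 8))
        return recon, mu, logvar


def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence from the unit Gaussian prior;
    # the training signal comes entirely from the images themselves.
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl


if __name__ == "__main__":
    # Random placeholder images (N x 3 x 64 x 64, values in [0, 1]) standing
    # in for renderings of surfaces with varied shape, material and lighting.
    images = torch.rand(256, 3, 64, 64)
    model = ConvVAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(5):
        for i in range(0, len(images), 32):
            batch = images[i:i + 32]
            recon, mu, logvar = model(batch)
            loss = vae_loss(recon, batch, mu, logvar)
            opt.zero_grad()
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: loss {loss.item():.1f}")
```

After training on renderings, one would inspect the latent codes (here `mu`) to ask whether dimensions corresponding to shape, material and lighting emerge without ever having been labelled, which is the kind of analysis the abstract describes.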
Hosted by Martin Rolfs
The Zoom link will be sent the day before the lecture.