Neuroscience and Computer Vision Collaborate to Better Understand Visual Information Processing

A new dataset of unprecedented size, comprising brain scans of four volunteers who each viewed 5,000 images, will help researchers better understand how the brain processes images, according to neuroscience and computer vision scientists.

In their study, published in the journal Scientific Data, researchers at Carnegie Mellon University and Fordham University said that acquiring functional magnetic resonance imaging (fMRI) scans at this scale presented unique challenges.

Each volunteer took part in 20 or more hours of MRI scanning, which tested both the volunteers' perseverance and the experimenters' ability to coordinate across scanning sessions. The design decision to scan the same people over so many sessions was necessary for disentangling the neural responses associated with individual images.

The team dubbed the resulting dataset BOLD5000. It allows cognitive neuroscientists to better leverage the deep learning models that have dramatically improved artificial vision systems. Deep learning was originally inspired by the architecture of the human visual system, and scientists may improve it further by pursuing new insights into how human vision works and by having studies of human vision better reflect modern computer vision methods. To that end, BOLD5000 measured neural activity arising from viewing images taken from two accessible computer vision datasets, ImageNet and COCO.

Michael J. Tarr, a co-author of the study, said that the intertwining of brain science and computer science means that scientific discoveries can flow in both directions. Further studies of vision that employ the BOLD5000 dataset should help neuroscientists better understand the organization of knowledge in the human brain.

Improving computer vision was an essential part of the BOLD5000 project from its outset. The senior author, Elissa Aminoff, initiated this research direction with co-author Abhinav Gupta, an associate professor in the Robotics Institute.

One of the challenges the researchers faced in connecting biological and computer vision is that most human neuroimaging studies include very few stimulus images, often 100 or fewer, and those images are typically simplified to depict only single objects against a neutral background. In contrast, BOLD5000 includes more than 5,000 real-world, complex images of scenes, single objects and interacting objects.

Tarr cautioned that the BOLD5000 dataset is still far too small. He suggested that a reasonable fMRI dataset would require at least 50,000 stimulus images and many more volunteers to make headway, because the class of deep neural nets used to analyze visual imagery is trained on millions of images. So far, the field's response has been positive.