Yeah, but what has ML ever done for neuroscience?

This question has been going round the neurotwitters over the past day or so.

Let’s limit ourselves to ideas that came from machine learning and have influenced how we think about neural implementation in the brain. Physics doesn’t count!

  • Reinforcement learning is always my go-to, though we have to remember that the initial connection came from neuroscience! In Sutton and Barto 1990, they explicitly note that “The TD model was originally developed as a neuron like unit for use in adaptive networks”. There is also the obvious connection to the Rescorla-Wagner model of Pavlovian conditioning. But the work showing dopamine as a prediction error is too strong to ignore (a toy TD sketch follows this list).
  • ICA is another great example. Tony Bell was specifically thinking about how neurons represent the world when he developed the Infomax-based ICA algorithm (according to a story from Terry Sejnowski). This is obviously the canonical example of V1 receptive field construction (an Infomax sketch also follows this list).
    • Conversely, I personally would not count sparse coding. Although developed as another way of thinking about V1 receptive fields, it was not – to my knowledge – an outgrowth of an idea from ML.
  • Something about Deep Learning for hierarchical sensory representations, though I am not yet clear on what the principle is that we have learned. Progressive decorrelation through hierarchical representations has long been the canonical view of sensory and systems neuroscience. Just see the preceding paragraph! But can we say something has flowed back from ML/DL? From Yamins and DiCarlo (and others), can we say that optimizing performance at the output layer is sufficient to get decorrelation similar to that in the nervous system?
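To make the dopamine-as-prediction-error point concrete, here is a minimal TD(0) sketch in Python; the chain task, learning rate, and discount factor are illustrative choices of mine, not taken from any particular paper.

```python
# Minimal TD(0) value learning on a 5-state chain with reward at the end.
# The TD error (delta) is the quantity the Schultz/Dayan/Montague work
# identified with phasic dopamine: large when reward is unpredicted, and
# shifting to earlier, predictive states as learning proceeds.
import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.9
V = np.zeros(n_states)                  # learned reward predictions per state

def td_update(s, r, s_next):
    """One TD(0) step; returns the prediction error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

for episode in range(200):
    for s in range(n_states - 1):
        r = 1.0 if s + 1 == n_states - 1 else 0.0   # reward only at the final state
        td_update(s, r, s + 1)
```

Early in training the error is large at the rewarded state; after training it has migrated back to the states that predict the reward, which is the same backward shift seen in the dopamine recordings.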
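And since Infomax is the other canonical example, here is a hedged sketch of the Bell-Sejnowski Infomax update in its natural-gradient form on toy mixed sources; the mixing matrix, step size, and iteration count are made up, and the V1 receptive-field result comes from running this kind of rule on whitened natural-image patches rather than on toy data.

```python
# Infomax ICA (natural-gradient form): adapt an unmixing matrix W so that the
# entropy of the sigmoid-transformed outputs is maximized, which separates
# super-Gaussian sources.
import numpy as np

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))           # two super-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])    # unknown mixing matrix
X = A @ S                                 # observed mixtures

W = np.eye(2)                             # unmixing matrix to learn
lr = 0.01
for _ in range(500):
    U = W @ X                             # current source estimates
    Y = 1.0 / (1.0 + np.exp(-U))          # logistic nonlinearity
    # natural-gradient Infomax step, averaged over the batch
    W += lr * (np.eye(2) + (1 - 2 * Y) @ U.T / X.shape[1]) @ W
```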

And yet… what else? Bayes goes back to Helmholtz, in a way, and at least precedes “machine learning” as a field. Are there examples of the brain implementing… an HMM? t-SNE? SVMs? Discriminant analysis (okay, maybe this is another example)?

My money is on ideas from Deep Learning filtering back into neuroscience – dropout and LSTMs and so on – but I am not convinced they have made a major impact yet.

Deep learning and vision

Object recognition is hard. Famously, an attempt to use computers to automatically identify tanks in photos in the 1980s failed in a clever way:

But the scientists were worried: had it actually found a way to recognize if there was a tank in the photo, or had it merely memorized which photos had tanks and which did not? This is a big problem with neural networks, after they have been trained you have no idea how they arrive at their answers, they just do. The question was did it understand the concept of tanks vs. no tanks, or had it merely memorized the answers? So the scientists took out the photos they had been keeping in the vault and fed them through the computer. The computer had never seen these photos before — this would be the big test. To their immense relief the neural net correctly identified each photo as either having a tank or not having one…

Eventually someone noticed that in the original set of 200 photos, all the images with tanks had been taken on a cloudy day while all the images without tanks had been taken on a sunny day. The neural network had been asked to separate the two groups of photos and it had chosen the most obvious way to do it – not by looking for a camouflaged tank hiding behind a tree, but merely by looking at the colour of the sky. The military was now the proud owner of a multi-million dollar mainframe computer that could tell you if it was sunny or not.

But Deep Learning – and huge data sets – have driven a major breakthrough in object recognition over the last few years:

Today, Olga Russakovsky at Stanford University in California and a few pals review the history of this competition and say that in retrospect, SuperVision’s comprehensive victory was a turning point for machine vision. Since then, they say, machine vision has improved at such a rapid pace that today it rivals human accuracy for the first time. [NE: I don’t think this is quite true…]

Convolutional neural networks consist of several layers of small neuron collections that each look at small portions of an image. The results from all the collections in a layer are made to overlap to create a representation of the entire image. The layer below then repeats this process on the new image representation, allowing the system to learn about the makeup of the image.
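As a concrete (if toy) version of that description, here is a minimal stacked-convolution sketch in PyTorch; the layer sizes and input resolution are arbitrary choices for illustration, nothing like the actual SuperVision/AlexNet architecture.

```python
# Each Conv2d layer slides small filters over its input, the overlapping
# responses form a new "image" of feature maps, and the next layer repeats
# the process on that representation.
import torch
import torch.nn as nn

tiny_convnet = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, padding=2),   # look at small 5x5 patches of the image
    nn.ReLU(),
    nn.MaxPool2d(2),                              # summarize overlapping responses
    nn.Conv2d(16, 32, kernel_size=5, padding=2),  # repeat on the new representation
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # classify from the top-level features
)

x = torch.randn(1, 3, 32, 32)    # one fake 32x32 RGB image
logits = tiny_convnet(x)         # shape: (1, 10)
```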

An interesting question is how the top algorithms compare with humans when it comes to object recognition. Russakovsky and co have compared humans against machines and their conclusion seems inevitable. “Our results indicate that a trained human annotator is capable of outperforming the best model (GoogLeNet) by approximately 1.7%,” they say… But the trend is clear. “It is clear that humans will soon outperform state-of-the-art image classification models only by use of significant effort, expertise, and time,” say Russakovsky and co.

The Connectionists

Labrigger (via Carson Chow) pointed to a sprawling debate on the Connectionists mailing list concerning Big Data and theory in neuroscience. See the list here (“Brain-like computing fanfare and big data fanfare”). There seem to be three main threads of debate:

(1) Is Big Data what we need to understand the brain?

(2) What is the correct level of detail?

(3) Can we fruitfully do neuroscience in the absence of models? Do we have clearly-posed problems?

Here are some key comments you can read.

There are a few points that need to be made. First, one of the ongoing criticisms through the thread concerns the utility of Deep Learning models. It is repeatedly asserted that one beauty of the brain is that it doesn’t necessarily need gobs of data to be able to perform many important behaviors. This is actually not true in the slightest: the data has been collected through many generations of evolution. In fact, Deep Learning ‘assembles’ its network through successive training of layers in a manner vaguely reminiscent of the development of the nervous system.

In terms of the correct level of detail, James Bower is ardent in promoting the idea that we need to go down to the nitty-gritty. In the cerebellum, for instance, you need to understand the composition of ion channels on the dendrites to understand the function of the cells. Otherwise, you miss the compartmentalized computations being performed there. And someone else points out that, from another view, this is not even reduced enough: why aren’t they considering transcription? James Bower responds with:

One of the straw men raised when talking about realistic models is always: “at what level do you stop, quantum mechanics?”. The answer is really quite simple, you need to model biological systems at the level that evolution (selection) operates and not lower. In some sense, what all biological investigation is about, is how evolution has patterned matter. Therefore, if evolution doesn’t manipulate at a particular level, it is not necessary to understand how the machine works.

…although genes are clearly a level that selection operates on…

But I think the underlying questions here really are:

(1) What level of detail do we need to understand in order to predict behavior of [neurons/networks/organisms]?

(2) Do we understand enough of the nervous system – or general organismal biology – to make theoretical predictions that we can test experimentally?

I think Geoff Hinton’s comment is a good answer:

A lot of the discussion is about telling other people what they should NOT be doing. I think people should just get on and do whatever they think might work. Obviously they will focus on approaches that make use of their particular skills. We won’t know until afterwards which approaches led to major progress and which were dead ends. Maybe a fruitful approach is to model every connection in a piece of retina in order to distinguish between detailed theories of how cells get to be direction selective. Maybe its building huge and very artificial neural nets that are much better than other approaches at some difficult task. Probably its both of these and many others too. The way to really slow down the expected rate of progress in understanding how the brain works is to insist that there is one right approach and nearly all the money should go to that approach.

Learning to see through semantics

Humans have a visual bias: everything in vision seems easy and natural to us, and it can seem a bit of a mystery why computers are so bad at it. But there is a reason such a massive chunk (about 30%) of cortex is devoted to it. It’s really hard! To do everything that it needs to, the brain splits the stream of visual information into a few different streams. One of these streams, which runs down the ventral portion of the brain, is linked to object recognition and representing abstract forms.

For companies like Facebook or Google, copying this would be something of a holy grail. Think how much better image search would be if you could properly pull out what objects are in an image. As it is, though, reliably identifying objects in arbitrary images is still hard.

Jon Shlens recently visited from Google and gave a talk about their recent research on improving image search (which I see will be presented as a poster at NIPS this week). To get at abstract form, they decided, they needed a way to represent the concept behind each image. There is one really obvious way to do this: use words. Semantic space is rich and very easily trainable (and something Google has ample practice at).

[Figure: Shlens filters]

First, they want a way to do things very quickly. One way to get at the structure of an image is to use different ‘filters’ that represent underlying properties of the image. When moved across an image, the combination of these filters can reconstruct the image and identify its important underlying components. Unfortunately, computing all of those comparisons means many, many dot products, which is slow. Instead, they compare just a few points on each filter (left in the figure above), which speeds things up without much loss of sensitivity.
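Here is a toy illustration of that subsampling intuition (the actual system in Dean et al., as I understand it, uses winner-take-all hashing, which is more involved): rank filters by comparing only a handful of sampled positions, then score the shortlisted filters exactly.

```python
# Approximate filter responses by sampling a few coordinates instead of
# computing the full dot product for every filter in the bank.
import numpy as np

rng = np.random.default_rng(1)
patch = rng.standard_normal(1024)             # a flattened image patch
filters = rng.standard_normal((500, 1024))    # a bank of 500 filters

idx = rng.choice(1024, size=32, replace=False)    # the few points we compare

approx_scores = filters[:, idx] @ patch[idx]      # cheap: 32 multiplies per filter
top_candidates = np.argsort(approx_scores)[-20:]  # shortlist of likely responders

exact_scores = filters[top_candidates] @ patch    # full dot products only for the shortlist
```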

Once they can do that quickly, they train a deep-learning artificial neural network (ANN) on the images to try to classify them. This does okay. The fancy-pants part is where they also train an ANN on words in Wikipedia. This gives them relationships between all sorts of words and puts the words in an underlying continuous space. Now words have a ‘distance’ between them that tells how similar they are.
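The “distance between words” part is easy to make concrete; here is a minimal sketch with random stand-in vectors (the real model was trained on Wikipedia text, and the vocabulary and dimensionality here are my own placeholders):

```python
# Once words live in a continuous embedding space, similarity is just a
# cosine between vectors; trained embeddings put "oboe" near "bassoon" and
# far from "punching bag".
import numpy as np

rng = np.random.default_rng(2)
vocab = ["oboe", "bassoon", "punching bag", "letter opener"]
embeddings = {w: rng.standard_normal(300) for w in vocab}   # 300-d stand-ins

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# With trained vectors this would be high for (oboe, bassoon) and low for
# (oboe, punching bag); with random stand-ins it is just noise.
sim = cosine(embeddings["oboe"], embeddings["bassoon"])
```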

[Figure: ANN guesses]

By combining the word data with the visual data, they get a ~83% improvement in performance. More importantly, even when the system is wrong it is only kind of wrong. Look at the sample above: on the left are the guesses of the combined semantic-visual engine and on the right are those of the vision-only model. With vision only, the guesses for the same object vary widely: a punching bag, a whistle, a bassoon, and a letter opener may all be long, straight objects, but they’re not exactly in the same class of things. On the other hand, an English horn, an oboe, and a bassoon are pretty similar (good guesses); even a hand is related, in that it is used to play an instrument. Clearly the semantic-visual engine understands the class of object it is looking at even if it can’t get the precise word 100% of the time. The engine also does very well on unseen data and scales well across many labels.
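My understanding of the combination step (this is the DeViSE idea, sketched here with random placeholder matrices standing in for trained parameters): map the CNN’s visual features into the word-embedding space with a learned transform, then label the image with the nearest word vector.

```python
# Label an image by projecting its visual features into the semantic space
# and picking the closest word vector; wrong answers then tend to be
# semantically nearby words rather than arbitrary ones.
import numpy as np

rng = np.random.default_rng(3)
word_vecs = rng.standard_normal((1000, 300))   # one embedding per candidate label
M = rng.standard_normal((300, 4096))           # learned visual-to-semantic map (placeholder)
visual_features = rng.standard_normal(4096)    # top-layer CNN features for one image

projected = M @ visual_features                # the image now lives in word space
scores = word_vecs @ projected                 # similarity to every label's vector
predicted_label = int(np.argmax(scores))       # nearest word wins
```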

This all makes me wonder: what other sensory modalities could they add? It’s Google, so potentially they could be crawling data from a ‘link-space’ representation. In animals we could add auditory and mechanosensory (touch) input. And does this mean that the study of vision is missing something? Could animals have a sort of ‘semantic’ representation of the world in order to better understand visual or other sensory information? Perhaps multimodal integration is actually the key to understanding our senses.

References

Frome A, Corrado GS, Shlens J, Bengio S, Dean J, Ranzato M, & Mikolov T (2013). DeViSE: A Deep Visual-Semantic Embedding Model. NIPS.

Dean T, Ruzon MA, Segal M, Shlens J, Vijayanarasimhan S, & Yagnik J (2013). Fast, Accurate Detection of 100,000 Object Classes on a Single Machine. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. DOI: 10.1109/CVPR.2013.237.