Yeah, but what has ML ever done for neuroscience?

This question has been going round the neurotwitters over the past day or so.

Let’s limit ourselves to ideas that came from machine learning that have had an influence on neural implementation in the brain. Physics doesn’t count!

  • Reinforcement learning is always my go-to, though we have to remember the initial connection from neuroscience! In Sutton and Barto 1990, they explicitly note that “The TD model was originally developed as a neuron like unit for use in adaptive networks”. There is also the obvious connection to the Rescorla-Wagner model of Pavlovian conditioning. But the work to show dopamine as prediction error is too strong to ignore.
  • ICA is another great example. Tony Bell was specifically thinking about how neurons represent the world when he developed the Infomax-based ICA algorithm (according to a story from Terry Sejnowski). This obviously is the canonical example of V1 receptive field construction.
    • Conversely, I personally would not count sparse coding. Although developed as another way of thinking about V1 receptive fields, it was not – to my knowledge – an outgrowth of an idea from ML.
  • Something about Deep Learning for hierarchical sensory representations, though I am not yet clear on what the principle is that we have learned. Progressive decorrelation through hierarchical representations has long been the canonical view of sensory and systems neuroscience. Just see the preceding paragraph! But can we say something has flowed back from ML/DL? From Yamins and DiCarlo (and others), can we say that maximizing the output layer is sufficient to get similar decorrelation as the nervous system?
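
To make the prediction-error idea in the reinforcement learning bullet concrete, here is a minimal TD(0) sketch. The three-step chain task, the function names, and the values of alpha and gamma are my own illustrative assumptions, not taken from Sutton and Barto or from the dopamine literature. The point is only that the error term starts out large at the reward and shrinks as the reward becomes predicted.

```python
# Minimal TD(0) sketch. The chain task, alpha, and gamma are illustrative assumptions.

def td_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update in place and return the prediction error (delta)."""
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta
    return delta


def run_chain(n_states=3, episodes=200, alpha=0.1, gamma=1.0):
    """Walk a fixed chain of states; only the final transition is rewarded.

    Returns the learned values plus the per-step prediction errors
    from the first and the last episode.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state, never updated
    first_deltas, last_deltas = [], []
    for episode in range(episodes):
        deltas = []
        for state in range(n_states):
            reward = 1.0 if state == n_states - 1 else 0.0
            deltas.append(td_update(values, state, state + 1, reward, alpha, gamma))
        if episode == 0:
            first_deltas = deltas
        last_deltas = deltas
    return values, first_deltas, last_deltas


if __name__ == "__main__":
    values, first, last = run_chain()
    print("first episode errors:", first)
    print("last episode errors: ", [round(d, 4) for d in last])
```

With a single state and no lookahead term, the same update reduces to a Rescorla-Wagner-style delta rule, which is the connection mentioned above.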

And yet… what else? Bayes goes back to Helmholtz, in a way, and at least precedes “machine learning” as a field. Are there examples of the brain implementing…. an HMM? t-SNE? SVMs? Discriminant analysis (okay, maybe this is another example)?

My money is on ideas from Deep Learning filtering back into neuroscience – dropout and LSTMs and so on – but I am not convinced they have made a major impact yet.

How a neural network can create music

Playing chess, composing classical music, __: computer programmers love creating ‘AIs’ that can do this stuff. Music, especially, is always fun: there is a long history of programs that can create new songs so good that they fool professional musicians (who cannot tell the difference between a Chopin song and a generated song – listen to some here; here is another video).

I do not know how these have worked; I would guess a genetic algorithm, hidden Markov model, or neural network of some sort. Thankfully Daniel Johnson has just created such a neural network and laid out the logic behind it in beautiful detail:

Music composing neural network

The power of this is that it enables the network to have a simple version of memory, with very minimal overhead. This opens up the possibility of variable-length input and output: we can feed in inputs one-at-a-time, and let the network combine them using the state passed from each time step.

One problem with this is that the memory is very short-term. Any value that is output in one time step becomes input in the next, but unless that same value is output again, it is lost at the next tick. To solve this, we can use a Long Short-Term Memory (LSTM) node instead of a normal node. This introduces a “memory cell” value that is passed down for multiple time steps, and which can be added to or subtracted from at each tick. (I’m not going to go into all of the details, but you can read more about LSTMs in the original paper.)…

However, there is still a problem with this network. The recurrent connections allow patterns in time, but we have no mechanism to attain nice chords: each note’s output is completely independent of every other note’s output. Here we can draw inspiration from the RNN-RBM combination above: let the first part of our network deal with time, and let the second part create the nice chords. But an RBM gives a single conditional distribution of a bunch of outputs, which is incompatible with using one network per note.

The solution I decided to go with is something I am calling a “biaxial RNN”. The idea is that we have two axes (and one pseudo-axis): there is the time axis and the note axis (and the direction-of-computation pseudo-axis). Each recurrent layer transforms inputs to outputs, and also sends recurrent connections along one of these axes. But there is no reason why they all have to send connections along the same axis!
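
The short-term memory problem described in the quote can be seen in a toy sketch. This is not Daniel Johnson's code: the weights and gate settings below are fixed by hand rather than learned, the names are hypothetical, and the "memory cell" is a heavily simplified stand-in for a real LSTM (no output gate, scalar state). It only shows why a state that is overwritten each tick fades quickly, while a cell that is added to and mostly kept does not. The biaxial part is not sketched here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plain_rnn(inputs, w_in=1.0, w_rec=0.5):
    """Plain recurrent unit: each tick the state is recomputed from input and old state."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def memory_cell(inputs, forget_bias=6.0, write_gain=10.0):
    """Simplified LSTM-style cell with hand-set (assumed, not learned) gates."""
    c = 0.0
    cells = []
    for x in inputs:
        forget = sigmoid(forget_bias)                 # close to 1: keep the stored value
        write = sigmoid(write_gain * (abs(x) - 0.5))  # opens only for a sizeable input
        c = forget * c + write * math.tanh(x)         # add to the cell instead of overwriting
        cells.append(c)
    return cells


if __name__ == "__main__":
    pulse = [1.0] + [0.0] * 10  # one input, then silence
    print("plain RNN state after 10 silent ticks:", round(plain_rnn(pulse)[-1], 4))
    print("memory cell after 10 silent ticks:   ", round(memory_cell(pulse)[-1], 4))
```

In a trained LSTM the gates are functions of the input and previous output with learned weights; fixing them by hand just keeps the example short.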

What blows me away – and yes, I am often blown away these days – is how relatively simple all these steps are. By using logical, standard techniques for neural networks (and these are not deep), the programmer on the street can create programs that are easily able to do things that were almost unfathomable a decade ago. This is not just pattern separation, but also generation.

What has neuroscience done for machine intelligence? (Updated)

Today on the twitters, Michael Hendricks asked, “Why do AI people bother with how animal brains work? Most good inventions work by doing things totally unlike how an animal would.”

The short answer is that animal brains can already solve the problems that AI researchers want to solve; so why not look into how they are accomplishing it?

The long answer is that in the end, the algorithms that we ultimately use may end up being dramatically different – but we need a starting point somewhere. Looking at some of the algorithms that have a neural inspiration, it is clear that thinking about how the nervous system works can lead machine learning/AI researchers to clear solutions to their problems:

  1. Neural networks. In the 1940s and 50s, McCulloch, Pitts, and Hebb all contributed to modeling how a nervous system might work. In some sense, neural nets are trapped in this 1940s view of the nervous system; but why not? At an abstract level, it’s close…ish.
  2. Deep learning. Currently the Hot Shit in machine learning, these are like “neural networks 2.0”. Some quick history: traditionally, neural networks were done one layer at a time, with strict feedforward connectivity. One form of recurrent neural network proposed by Hopfield can be used to memorize patterns, or create ‘memories’. A variant on this, proposed by (computational neuroscientist) Terry Sejnowski and Geoff Hinton is the Boltzmann machine. If you combine multiple layers of Boltzmann machines with ideas from biological development, you get Deep Learning (and you publish it in the journal Neural Computation!).
  3. Independent Component Analysis. Although this story is possibly apocryphal, one of the earliest algorithms for computing ICA was developed – by Tony Bell and Terry Sejnowski (again) – by thinking about how neurons maximize their information about the physical world.
  4. Temporal difference learning. To quote from the Scholarpedia page: “This line of research work began with the exploration of Klopf’s 1972 idea of generalized reinforcement which emphasized the importance of sequentiality in a neuronal model of learning”
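
As a small illustration of the Hopfield-style "memories" mentioned in item 2, here is a minimal sketch using Hebbian outer-product weights and asynchronous updates. The function names, the 8-unit patterns, and the update schedule are assumptions chosen for brevity; this is one textbook formulation, not a reconstruction of any particular paper's code.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights for +1/-1 patterns, with no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, sweeps=5):
    """Update units one at a time until nothing changes or the sweeps run out."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            total = sum(weights[i][j] * state[j] for j in range(n))
            new = 1 if total >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


if __name__ == "__main__":
    # Two orthogonal patterns (made up for the example).
    p1 = [1, 1, 1, 1, -1, -1, -1, -1]
    p2 = [1, -1, 1, -1, 1, -1, 1, -1]
    weights = train_hopfield([p1, p2])
    noisy = [-1] + p1[1:]  # p1 with its first unit flipped
    print(recall(weights, noisy) == p1)
```

The stored patterns act as attractors: starting from a corrupted version, the updates pull the state back to the nearest stored pattern.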

Additionally, companies like Qualcomm and the Brain Corporation are attempting to use ideas from spiking neural networks to make much more energy-efficient devices.

In the other direction, neuroscientists can find that the brain appears to be implementing already-known ML algorithms (see this post on Nicole Rust). Many ideas and many biological specifics will be useless – but research runs on the hope of finding the tiny fraction of an idea that is useful to a new problem.


Over on reddit, downtownslim offers two more examples:

Neocognitron was the foundation for the ConvNet. Fukushima came up with the model, LeCun figured out how to train it.

Support Vector Machines: This last one is quite interesting, not many people outside the neural computation community know that Support Vector machines were influenced by the neural network community. They were originally called Support Vector Networks.

Learning to see through semantics

Humans have a visual bias: everything in vision seems easy and natural to us, and it can seem a bit of a mystery why computers are so bad at it. But there is a reason such a massive chunk (about 30%) of cortex is devoted to it. It’s really hard! To do everything that it needs to, the brain splits up the stream of visual information into a few different streams. One of these streams, which goes down the ventral (purple, above) portion of the brain, is linked to object recognition and representing abstract forms.

For companies like Facebook or Google, copying this would be something of a holy grail. Think how much better image search would be if you could properly pull out what objects are in the image. As it is, though, these things are fairly hard.

Jon Shlens recently visited from Google and gave a talk about their recent research on improving the search (which I see will be presented as a poster at NIPS this week). In order to extract abstract form, they decided, they must find a way to abstract the concept of each image. There is one really obvious way to do this: use words. Semantic space is rich and very easily trainable (and something Google has ample practice at).

Shlens filters

First, they want a way to do things very quickly. One way to get at the structure of an image is to use different ‘filters’ that represent underlying properties of the image. When moved across an image, the combination of these filters can reconstruct the image and identify what are the important underlying components. Unfortunately, these comparisons go relatively slowly over many, many dot products. Instead, they just choose a few points on the filters to compare (left) which improves performance without a loss of sensitivity.

Once they can do that quickly, they train a deep-learning artificial neural network (ANN) on the images to try to classify them. This does okay. The fancy-pants part is where they also train an ANN on words in Wikipedia. This gives them relationships between all sorts of words and puts the words in an underlying continuous space. Now words have a ‘distance’ between them that tells how similar they are.
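
To give a feel for what a ‘distance’ between words means, here is a minimal sketch using cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings are learned from text and have hundreds of dimensions, and nothing here is taken from the Google system itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors (assumption): two instruments placed close together,
# one unrelated object placed far away.
toy_vectors = {
    "oboe": [0.9, 0.8, 0.1],
    "bassoon": [0.8, 0.9, 0.2],
    "punching bag": [0.1, 0.2, 0.9],
}


def nearest(word, vectors):
    """Return the other word whose vector is most similar to the given word's."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    print(nearest("oboe", toy_vectors))
```

Once labels live in a space like this, a wrong guess that lands near the right label is still a semantically sensible guess.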

ANN guess

By combining the word data with the visual data, they get a ~83% improvement in performance. More importantly, even when the system is wrong it is only kind of wrong. Look at the sample above: on the left are the guesses of the combined semantic-visual engine and on the right is the vision-only guesser. With vision-only, guesses vary widely for the same object: a punching bag, a whistle, a bassoon, and a letter opener may all be long straight objects but they’re not exactly in the same class of things. On the other hand, an English horn, an oboe and a bassoon are pretty similar (good guesses); even a hand is similar in that it is used for an instrument. Clearly the semantic-visual engine can understand the class of object it is looking at even if it can’t get the precise word 100% of the time. This engine does very well on unseen data and scales very well across many labels.

This all makes me wonder: what other sensory modalities could they add? It’s Google, so potentially they could be crawling data from a ‘link-space’ representation. In animals we could add auditory and mechanosensory (touch) input. And does this mean that the study of vision is missing something? Could animals have a sort of ‘semantic’ representation of the world in order to better understand visual or other sensory information? Perhaps multimodal integration is actually the key to understanding our senses.


Frome A, Corrado GS, Shlens J, Bengio S, Dean J, Ranzato M, & Mikolov T (2013). DeViSE: A Deep Visual-Semantic Embedding Model. NIPS.

Dean T, Ruzon MA, Segal M, Shlens J, Vijayanarasimhan S, & Yagnik J (2013). Fast, Accurate Detection of 100,000 Object Classes on a Single Machine. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. DOI: 10.1109/CVPR.2013.237

Why does Gary Marcus hate computational neuroscience?

OK, this story on the BRAIN Initiative in the New Yorker is pretty weird:

To progress, we need to learn how to combine the insights of molecular biochemistry…with the study of computation and cognition… (Though some dream of eliminating psychology from the discussion altogether, no neuroscientist has ever shown that we can understand the mind without psychology and cognitive science.)

Who, exactly, has suggested eliminating psychology from the study of neuroscience? Anyone? And then there’s this misleading paragraph:

The most important goal, in my view, is buried in the middle of the list at No. 5, which seeks to link human behavior with the activity of neurons. This is more daunting than it seems: scientists have yet to even figure out how the relatively simple, three-hundred-and-two-neuron circuitry of the C. Elegans worm works, in part because there are so many possible interactions that can take place between sets of neurons. A human brain, by contrast, contains approximately eighty-six billion neurons.

As a C. elegans researcher, I have to say: it’s true there’s a lot we don’t know about worm behavior! There are also not quite as many worm behavioralists as there are, say, human behavioralists. But there is a lot that we do know. We know full circuits for several behaviors, and with the tools that we have now, that number is going to explode over the next few years.

But then we learn that, whatever else, Gary Marcus really doesn’t like the work that computational neuroscientists have done to advance their tools and models:

Perhaps the least compelling aspect of the report is one of its justifications for why we should invest in neuroscience in the first place: “The BRAIN Initiative is likely to have practical economic benefits in the areas of artificial intelligence and ‘smart’ machines.” This seems unrealistic in the short- and perhaps even medium-term: we still know too little about the brain’s logical processes to mine them for intelligent machines. At least for now, advances in artificial intelligence tend to come from computer science (driven by its longstanding interest in practical tools for efficient information processing), and occasionally from psychology and linguistics (for their insights into the dynamics of thought and language).

Interestingly, he gives his own fields, psychology and linguistics, a pass for how much more they’ve done. So besides, obviously, the study of neural networks, let’s think about what other aspects of AI have been influenced by neuroscience. I’d count deep learning as a bit separate, and clearly Google’s pretty excited about that. Algorithms for ICA, a dimensionality reduction method used in machine learning, were influenced by ideas about how the brain uses information (Tony Bell). Work on the roles of dopamine and serotonin has contributed to reinforcement learning. Those are just the first things that I can think of off the top of my head (interestingly, almost all of this sprouted out of the lab of Terry Sejnowski). There have been strong efforts on dimensionality reduction – an important component of machine learning – from many, many labs in computational neuroscience. These all seem important to me; what, exactly, does Gary Marcus want? He doubles down on it in the last paragraph:

There are plenty of reasons to invest in basic neuroscience, even if it takes decades for the field to produce significant advances in artificial intelligence.

What’s up with that? There are even whole companies whose sole purpose is to design better algorithms based on principles from spiking networks. Based on his previous output, he seems dismissive of modern AI (such as deep learning). Artificial intelligence is no longer the symbolism we used to think it was: it’s powerful statistical techniques. We don’t live in the time of Chomskian AI anymore! It’s the era of Norvig. And modern AI focuses on statistical principles which are highly influenced by ideas from neuroscience.

What is the question about your field that you dread being asked? (Human collective behavior)

At Edge:

And with this hurricane of digital records, carried along in its wake, comes a simple question: How can we have this much data and still not understand collective human behavior?

There are several issues implicit in a question like this. To begin with, it’s not about having the data, but about the ideas and computational follow-through needed to make use of it—a distinction that seems particularly acute with massive digital records of human behavior. When you personally embed yourself in a group of people to study them, much of your data-collection there will be guided by higher-level structures: hypotheses and theoretical frameworks that suggest which observations are important. When you collect raw digital traces, on the other hand, you enter a world where you’re observing both much more and much less—you see many things that would have escaped your detection in person, but you have much less idea what the individual events mean, and have no a priori framework to guide their interpretation. How do we reconcile such radically different approaches to these questions?

In other words, this strategy of recording everything is conceptually very simple in one sense, but it relies on a complex premise: that we must be able to take the resulting datasets and define richer, higher-level structures that we can build on top of them.

What could a higher-level structure look like? Consider one more example—suppose you have a passion for studying the history of the Battle of Gettysburg, and I offer to provide you with a dataset containing the trajectory of every bullet fired during that engagement, and all the movements and words uttered by every soldier on the battlefield. What would you do with this resource? For example, if you processed the final day of the data, here are three distinct possibilities. First, maybe you would find a cluster of actions, movements, and words that corresponded closely to what we think of as Pickett’s Charge, the ill-fated Confederate assault near the close of the action. Second, maybe you would discover that Pickett’s Charge was too coarse a description of what happened—that there is a more complex but ultimately more useful way to organize what took place on the final day at Gettysburg. Or third, maybe you wouldn’t find anything interesting at all; your analysis might spin its wheels but remain mired in a swamp of data that was recorded at the wrong granularity.

We don’t have that dataset for the Battle of Gettysburg, but for public reaction to the 2012 U.S. Presidential Election, or the 2012 U.S. Christmas shopping season, we have a remarkable level of action-by-action detail. And in such settings, there is an effort underway to try defining what the consequential structures might be, and what the current datasets are missing—for even with their scale, they are missing many important things. It’s a convergence of researchers with backgrounds in computation, applied mathematics, and the social and behavioral sciences, at the start of what is by every indication a very hard problem. We see glimpses of the structures that can be found—Trending Topics on Twitter, for example, is in effect a collection of summary news events induced by computational means from the sheer volume of raw tweets—but a general attack on this question is still in its very early stages.

What is the question about your field that you dread being asked?

(In neuroscience?  Anything.)