# #cosyne14 day 2: what underlies our neural representation of the world?

Now that I’ve been armed with a tiny notepad, I’m being a bit more successful at remembering what I’ve seen. For other days (as they appear): 1, 3, 4

Connectivity and computations

The second day started with a talk by Thomas Mrsic-Flogel motivated by the question: how does the organization of the cortex give rise to computations? He focused on connectivity between excitatory neurons in layer 2/3 of V1 in mice. Traditionally when we think of these neurons, we think of how they respond to visual stimulation: what patterns of light, what shapes or edges are they responding to? This ‘receptive field’ has a characteristic shape, and the neuron tends to respond to certain orientations of edges [see left]. These neurons receive input from more primary visual neurons, but you still want to know: what type of input do they receive from other neurons in the same layer?

By imaging the neurons during behavior and then making posthumous brain slices, they are able to match the direct connectivity with the visual responses. It turns out that a neuron is most likely to connect to other neurons that respond in a similar way. Yet despite our fetish for connectomics, it is not the fact of a connection that matters but the strength of that connection. And if you look at the excitatory neurons that are providing input to a postsynaptic neuron, the sum of their responses, weighted by connection strength, is exactly what the postsynaptic neuron responds to!

So theoretically, if you cut off all the external input to that neuron, it would still respond to the same visual input as before. In fact, if you only use the strongest 12% of the connections, that’s enough to maintain the visual representation. Of course, L2/3 neurons do receive external input so this mechanism is probably for denoising?
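To make the weighted-sum idea concrete, here is a toy sketch in Python (NumPy). Everything in it is made up for illustration: the receptive fields are random vectors, the `similarity ** 8` rule for connection strength is my assumption rather than anything measured, and `predicted_response` is a hypothetical helper. It only shows how a handful of strong, like-minded inputs can dominate the sum.

```python
import numpy as np

def predicted_response(pre_rfs, weights, keep_fraction=1.0):
    """Weighted sum of presynaptic receptive fields.

    pre_rfs: (n_pre, n_pixels) array, one receptive field per presynaptic cell.
    weights: (n_pre,) connection strengths.
    keep_fraction: keep only this fraction of the strongest connections.
    """
    weights = np.asarray(weights, dtype=float)
    n_keep = max(1, int(round(keep_fraction * weights.size)))
    strongest = np.argsort(weights)[-n_keep:]   # indices of the largest weights
    mask = np.zeros_like(weights)
    mask[strongest] = 1.0
    return (weights * mask) @ pre_rfs

rng = np.random.default_rng(0)
n_pre, n_pixels = 200, 64

# Toy "postsynaptic" receptive field, flattened to a vector.
target_rf = rng.standard_normal(n_pixels)

# Presynaptic cells resemble the target to varying degrees, plus private noise.
similarity = rng.uniform(0.0, 1.0, n_pre)
pre_rfs = similarity[:, None] * target_rf + 0.5 * rng.standard_normal((n_pre, n_pixels))

# ASSUMPTION: connection strength rises steeply with response similarity.
weights = similarity ** 8

full = predicted_response(pre_rfs, weights)
sparse = predicted_response(pre_rfs, weights, keep_fraction=0.12)

corr_full = np.corrcoef(full, target_rf)[0, 1]
corr_sparse = np.corrcoef(sparse, target_rf)[0, 1]
print(f"all connections: r = {corr_full:.3f}; strongest 12%: r = {corr_sparse:.3f}")
```

In this toy setup the strongest 12% of inputs recover nearly the same receptive field as the full set, which is the flavor of the result described above, not a reproduction of it.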

Everything’s non-linear Hebb

A long-standing question with a thousand answers is why primary visual neurons respond in the manner that they do (first image above). There have been several theories (most notably from Olshausen (1996) and Bell & Sejnowski (1996)) dealing with sparsity of responses or the fact that these are the optimal independent components of natural images. But the strange fact is that almost everything you do gives these same receptive fields! Why is that? Carlos Stein Naves de Brito (whew) dug into it and found that the commonality to all these algorithms is that they are essentially implementing a non-linear Hebbian learning rule ($\Delta w \propto x f(wx)$). One result from ICA is that it doesn’t matter which nonlinearity $f$ you use, because if it doesn’t work you can just use $-f$ and it will… so this is a very nice result. The paper will be well worth reading.
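Here is a minimal sketch of what a rule of that form can do, assuming a few things that were not in the talk: a cubic nonlinearity $f(u) = u^3$, a batch update, weight normalization after every step, and a two-dimensional toy dataset in which one hidden source is sparse (Laplacian) and the other is Gaussian. The function name `nonlinear_hebb` is hypothetical.

```python
import numpy as np

def nonlinear_hebb(x, f, w0, lr=1.0, n_steps=200):
    """Batch non-linear Hebbian learning: dw ~ mean over samples of x * f(w.x).

    The weight vector is renormalized to unit length after each step
    (an assumption made here to keep the weights bounded).
    """
    w = np.asarray(w0, dtype=float)
    w = w / np.linalg.norm(w)
    for _ in range(n_steps):
        u = x @ w                                  # projection of each sample onto w
        dw = (x * f(u)[:, None]).mean(axis=0)      # Hebbian term x * f(w.x), averaged
        w = w + lr * dw
        w = w / np.linalg.norm(w)
    return w

rng = np.random.default_rng(1)
n_samples = 20000

# Two unit-variance sources: one sparse (Laplacian), one Gaussian.
sparse_source = rng.laplace(scale=1.0 / np.sqrt(2.0), size=n_samples)
gauss_source = rng.standard_normal(n_samples)
sources = np.column_stack([sparse_source, gauss_source])

# Rotate the sources so the sparse direction is hidden in the observed data.
theta = np.pi / 3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
x = sources @ rotation.T
true_direction = rotation[:, 0]    # direction of the sparse source in data space

w = nonlinear_hebb(x, f=lambda u: u ** 3, w0=[1.0, 0.0])
alignment = abs(w @ true_direction)
print(f"|cos(angle)| between learned w and the sparse direction: {alignment:.3f}")
```

With these choices the learned weight vector lines up with the sparse source, which is the ICA-like behavior the paragraph describes. Other nonlinearities would need their own check.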

Maximum entropy, minimal assumptions

Elad Schneidman gave his normal talk about using maximum entropy models to understand the neural code. Briefly, there is a class of statistical models that reproduce chosen statistics of the observed data while making as few other assumptions about the organization of that data as possible (see also: Ising models). If you use a model that only looks at pairwise (second-order) correlations, ie correlations between pairs of neurons, that’s enough to describe how populations of neurons will respond to white noise.
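For reference, the pairwise model is usually written in the following form. The notation is mine, not taken from the talk: $\sigma_i \in \{0, 1\}$ marks whether neuron $i$ spiked in a time bin, $h_i$ and $J_{ij}$ are parameters fit so that the model matches the measured firing rates and pairwise correlations, and $Z$ is the normalizing constant.

$$P(\sigma_1, \dots, \sigma_N) = \frac{1}{Z} \exp\left( \sum_{i} h_i \sigma_i + \sum_{i<j} J_{ij} \sigma_i \sigma_j \right)$$

Among all distributions that match those rates and pairwise correlations, this is the one with the largest entropy, which is where the "minimal assumptions" come from.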

But it turns out that it’s not enough to describe their response to natural stimuli! The correlations induced by these stimuli must trigger fundamentally different computations than the white noise. The model that does work is something they call the reliable interaction model (RIM). It uses few parameters and fits using only the most common patterns (instead of trying to find all orders of correlation, ie correlations between triplets of neurons etc). This fits extremely well, which suggests that a sparse network of interactions underlies a highly structured neural code (Ganmor et al., 2011).

If you then examine the population responses to stimuli, you’ll find that the brain responds to the same stimulus with different population responses. They’re using this to construct a ‘thesaurus’ of words, in which they find high structure when using the Jensen-Shannon divergence $D(p(s|r_1), p(s|r_2))$ between the stimulus distributions implied by two responses $r_1$ and $r_2$. What I think they are missing (and are going to miss with their analyses) is a focus on the dynamics. What is the context in which each synonymous word arises? Why are there so many synonymous words? etc. But it promises to be pretty interesting.
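As a small illustration of how such a ‘thesaurus’ distance could be computed, here is a sketch of the Jensen-Shannon divergence between two stimulus posteriors. The three posteriors are invented numbers, and the function names are mine. Two responses count as near-synonyms when their divergence is small.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded between 0 and 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)                 # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical posteriors p(s | r) over three stimuli for three population responses.
p_s_given_r1 = [0.70, 0.20, 0.10]
p_s_given_r2 = [0.60, 0.30, 0.10]
p_s_given_r3 = [0.05, 0.05, 0.90]

d12 = js_divergence(p_s_given_r1, p_s_given_r2)
d13 = js_divergence(p_s_given_r1, p_s_given_r3)
print(f"D(r1, r2) = {d12:.3f} bits, D(r1, r3) = {d13:.3f} bits")
```

Here `r1` and `r2` imply almost the same stimulus and come out close together, while `r3` points to a different stimulus and lands far away.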

Why so many neurons?

When we measure the response of a population of neurons to something that stimulates them, it often seems like the dimensionality of the stimulus (velocity, orientation, etc) is much, much lower than the number of neurons being used to represent it. So why do we need so many neurons? Is it not more efficient to just use one neuron per dimension?

I didn’t entirely follow the logic of Peiran Gao’s talk (I got distracted by Twitter…) but they relate it to the complexity of the task and say that random projection theory predicts how many neurons are needed, which is much more than the dimensionality of the task.

References

Ganmor E, Segev R, & Schneidman E (2011). Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proceedings of the National Academy of Sciences of the United States of America, 108 (23), 9679-84 PMID: 21602497