How DeepMind learns to win

About a year ago, Google bought DeepMind for half a billion dollars on the strength of software that could learn to beat video games. Over the past year, DeepMind has detailed how they did it.


Let us say that you were an artificial intelligence with access to a computer screen, a way to play the game (an imaginary video game controller, say), and the current score. How should you learn to beat the game? Well, you have access to three things: the state of the screen (your input), a selection of actions, and a reward (the score). What you would want to do is find the best action to go along with every state.

A well-established way to do this without any explicit modeling of the environment is through Q-learning (a form of reinforcement learning). In Q-learning, every time you encounter a certain state and take an action, you have some guess of what reward you will get. But the world is a complicated, noisy place, so you won’t necessarily always get the same reward back in seemingly-identical situations. So you can just take the difference between the reward you find and what you expected, and nudge your guess a little closer.
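To make that concrete, here is a minimal tabular sketch of the update (my own illustration, not DeepMind's code; the learning rate and discount factor are arbitrary placeholder values):

```python
from collections import defaultdict

# Q maps (state, action) pairs to our current guess of future reward.
Q = defaultdict(float)
alpha = 0.1   # learning rate: how far to nudge the guess each time
gamma = 0.99  # discount factor: how much we care about future reward

def q_update(state, action, reward, next_state, actions):
    # The best we think we can do from the next state onward.
    best_next = max(Q[(next_state, a)] for a in actions)
    # The surprise: what we actually got (plus discounted future value)
    # minus what we expected to get.
    td_error = reward + gamma * best_next - Q[(state, action)]
    # Nudge the old guess a little closer to reality.
    Q[(state, action)] += alpha * td_error
```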

This is all fine and dandy, though when you’re looking at a big screen you’ve got a large number of pixels – and a huge number of possible states. Some of them you may never even get to see! Every twist and contortion of a couple of pixels is, theoretically, a completely different state. This makes it implausible to visit each state, try each action, and play it again and again to get a good estimate of the reward.
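A quick back-of-the-envelope calculation shows how hopeless enumerating every state would be: even a toy screen far smaller than an Atari frame has an astronomical number of possible states.

```python
# Toy example: a 10x10 screen where each pixel is just on or off.
# (Real Atari frames are 210x160 pixels with many gray levels, so this
# wildly undercounts the true number of possible screens.)
n_pixels = 10 * 10
n_states = 2 ** n_pixels
print(f"{n_states:.2e} possible screens")  # ~1.27e+30
```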

What we could do, if we were clever about it, is to use a neural network to learn features about the screen. Maybe sometimes this part of the screen is important as a whole and maybe other times those two parts of the screen are a real danger.

But that is difficult for the Q-learning algorithm. The DeepMind authors list three reasons: (1) correlations in the sequence of observations, (2) small updates to Q can significantly change the policy and hence the data distribution, and (3) correlations between the action values and the target values. It is how they tackle these problems that is the main contribution to the literature.

The strategy is to implement a deep convolutional neural network to find ‘filters’ that can more easily represent the state space. The network takes in the states – the images on the screen – processes them, and then outputs a value for each possible action. In order to get around problems (1) and (2) above (the correlations in observations and the shifting data distribution), they take a ‘replay’ approach. Actions that have been taken, along with the states and rewards, are stored into memory; when it is time to update the neural network, they grab some of the old state–action pairs out of their bag of memories and learn from those. They liken this to consolidation during sleep, where the brain replays things that had happened during the day.
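In outline, the replay memory is just a big bag of (state, action, reward, next state) tuples that you sample from at random. A minimal sketch (my own; the capacity and batch size are arbitrary choices, not the paper’s settings):

```python
import random
from collections import deque

class ReplayMemory:
    """A bare-bones bag of memories; the oldest ones fall off the end."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Random draws break up the correlations between consecutive frames.
        return random.sample(list(self.buffer), batch_size)
```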

Further, even though they train the network with their memories after every action, the targets it learns toward come from a second copy of the network. That copy stays in stasis and only ‘updates itself’ with what has been learned after a certain stretch of time – again, like it is going to “sleep” to better learn what it had done during the day. Keeping the targets frozen for a while is what tackles problem (3), the correlation between action values and target values.
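A sketch of the two-copy arrangement (again my own illustration; the weights are stand-ins and the sync interval is a placeholder, not the paper’s value):

```python
import copy

# The online network's weights get updated after every action; the frozen
# copy that supplies the learning targets is only refreshed once in a while.
online_params = {"conv1": [0.0], "fc": [0.0]}   # stand-in for real network weights
target_params = copy.deepcopy(online_params)

SYNC_EVERY = 10_000  # steps between letting the frozen copy catch up

def maybe_sync(step, online_params, target_params):
    if step % SYNC_EVERY == 0:
        # Only now does the frozen network absorb what the online one has learned.
        return copy.deepcopy(online_params)
    return target_params
```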

Here is an explanation of the algorithm in a hopefully useful form:

[Figure: schematic of the deep Q-learning algorithm]

Throughout the article, the authors claim that this may point to new directions for neuroscience research. This being published in Nature, any claims to utility should be taken with a grain of salt. That being said! I am always excited to see what lessons arise when theories are forced to confront reality!

What this shows is that reinforcement learning is a good way to train a neural network in a model-free way. Given that all learning is temporal difference learning (or: TD learning is semi-Hebbian?), this is a nice result, though I am not sure how original it is. It also shows that the replay approach – which I believe is quite novel – is a good one. But is this something that sleep/learning/memory researchers can learn from? Perhaps it is a stab in the direction of why replay is useful (to deal with correlations).

References

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, & Hassabis D (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533. PMID: 25719670.

Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, & Riedmiller M (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602v1.

Is it white and gold? Is it blue and black?

[Image: the dress photo]

By now, I am sure that you have seen this picture. Some people see it as blue and black and some people see it as white and gold. Two people can be sitting right next to each other and see totally different things! It happened to me last night.

Wired attempts to explain it:

“What’s happening here is your visual system is looking at this thing, and you’re trying to discount the chromatic bias of the daylight axis,” says Bevil Conway, a neuroscientist who studies color and vision at Wellesley College. “So people either discount the blue side, in which case they end up seeing white and gold, or discount the gold side, in which case they end up with blue and black.” (Conway sees blue and orange, somehow.) [ed. that makes her the devil.]

Essentially, it is an issue of color constancy: the color we perceive depends on its context. Brightening and darkening the image supports that:

See also XKCD:

But that explains only one, trivial why – why one ‘color’ can look different depending on context. What it does not explain is why some people see it as white and gold and others see exactly the opposite. Why is there this individual-level variation?

It seems to exist right on some threshold: some people have an in-built or learned bias to favor – well, something. Light images? Dark images? Overhead light? And others have a different bias. If it were simply a matter of light or dark, presumably you could lock five people in a closet and when they came out they would see it one way (maybe blue and black). Push five others out in the sun and they’d see it differently (white and gold). But I haven’t seen a good explanation of this, nor of why it is so bimodal. I would bet someone money there will be a scientific paper on this illusion published within the next year or two.

In conclusion, it’s white and gold because that’s all I can see. Case closed.


Microcircuits are SO HOT right now

[Figure: Google Scholar publications mentioning ‘microcircuits’, by year]

So hot.

We use the tools that we have, and right now that means genetic specificity with calcium imaging and channelrhodopsin. In other words: how do groups of identified neurons operate? In even fewer words: microcircuits.

I am probably reading too much into things, but it seems like microcircuits are the new hotness. Every week, there’s a new paper using the word (soon to solidify its buzzword status). I looked up the publications that used the term and found something interesting. Compare the number of publications I found through Google Scholar* (above) – which indexes a very broad and interdisciplinary mix of journals – and PubMed (below) – which indexes mostly biomedical journals:

[Figure: PubMed publications mentioning ‘microcircuits’, by year]

The number of publications in Google Scholar is fairly steady until 1999, when it starts steadily increasing. There’s very little action in PubMed until 2002, when it starts rocketing off. What’s happening? Many of the papers on Google Scholar have a computational or physics-y bent, appearing in such exciting places as the ‘International Journal of Man-Machine Studies’. For years, these poor computational fools labored away unnoticed until the experimental tools caught up to the theory: hence the sudden interest.

The very first reference that I can find is A reinterpretation of the mathematical biophysics of the central nervous system in the light of neurophysiological findings by N. Rashevsky in 1945. Yes, that Rashevsky. Unfortunately, I can’t seem to get access to the paper itself. Same goes for the next paper in 1957 from D. Stanley-Jones (Reverberating circuits). And then it took off from there…

In a different tradition, we can trace this fine term to Eric Kandel, who appears to have coined it in its neuroscience sense in a 1978 paper where they “reduced this isolated reflex to a microcircuit (consisting of a single sensory cell and single motor cell) so as to causally relate the contribution of individual cells to the expression and plastic properties of the behavior.” Nary a peep was heard from microcircuitry until 1991, when Douglas and Martin mapped a “functional microcircuit for visual cortex”.

(What is the equivalent of Mainen and Sejnowski in microcircuits? Someone has to write that crisp paper so that they, too, can get ALL the citations.)

* technically, I searched for ‘microcircuits neural’

The Chronicle vs. The Human Brain Project

In case you haven’t seen this hilariously vicious anti-Human Brain Project article:

If you want to make a neuroscientist scoff, mention the billion-dollar-plus Human Brain Project…Even before it began, the project was ridiculed by those in the know. Words like “hooey” and “crazy” were thrown around, along with less family-friendly terms…Almost no one—except for those on the project’s ample payroll—seemed to think it was a good idea.

In reply to an interview request, Konrad Kording, a neuroscientist at Northwestern University, wrote back: “Why do you want to talk about this embarrassing corpse?” He added a smiley emoji, but he’s not really kidding. Mr. Kording has nothing nice to say about a project that, according to him, has become a reliable punchline among his colleagues. “I’m 100-percent convinced that virtually all the money spent on it will lead to no insight in neuroscience whatsoever,” he said. “It’s a complete waste. It’s boneheaded. It’s wrong. Everything that they’re trying to do makes no sense whatsoever.”

Jeremy Freeman is similarly skeptical, if a touch more diplomatic. Mr. Freeman, a neuroscientist at the Howard Hughes Medical Institute, sees it as “kind of an absurd project” and misguided to boot. “Insofar as the goal is to establish a working simulation of the entire human brain, or even a single cortical column, I believe that it’s premature,” he said, chuckling. “I also don’t think rushing toward a simulation is the right avenue for progress.”

et cetera. I mean, regardless of how you feel about the project, you have to appreciate inspired academic vitriol when you see it (unless you are the target of it, obviously).

Konrad Kording’s objections to the HBP are probably more informative, however (more detail in link):

1) We lack the knowledge needed to build meaningful bottom up models and I will just give a few examples:
a) We know something about a small number of synapses but not how they interact
b) We know something about a small number of cell types, but not about the full multimodal statistics (genes, connectivity, physiology)

The degree of the lack of knowledge is mindboggling. There are far more neurons in the brain than people on the planet.

2) We do not know how to combine multimodal information

3) We do not know what the right language is for simulating the brain.

Von Neumann

It was the anniversary of John von Neumann’s death last Sunday. If I had an intellectual hero, it would be von Neumann; he was basically an expert in everything. Like many of those involved with the Manhattan Project, he died fairly young (at 53) of cancer. From On This Day In Math:

John von Neumann (28 Dec 1903, 8 Feb 1957 at age 53) Hungarian-American mathematician who made important contributions in quantum physics, logic, meteorology, and computer science. He invented game theory, the branch of mathematics that analyses strategy and is now widely employed for military and economic purposes. During WW II, he studied the implosion method for bringing nuclear fuel to explosion and he participated in the development of the hydrogen bomb. He also set quantum theory upon a rigorous mathematical basis. In computer theory, von Neumann did much of the pioneering work in logical design, in the problem of obtaining reliable answers from a machine with unreliable components, the function of “memory,” and machine imitation of “randomness.”

While he was in the hospital being treated for cancer, he worked on notes that became The Computer and the Brain. On twitter, @mxnmnkmnd pointed me to work that von Neumann had done on computability in neural networks. Claude Shannon wrote a review of this work:

One important part of von Neumann’s work on automata relates to the problem of designing reliable machines using unreliable components…Given a set of building blocks with some positive probability of malfunctioning, can one by suitable design construct arbitrarily large and complex automata for which the overall probability of incorrect output is kept under control? Is it possible to obtain a probability of error as small as desired, or at least a probability of error not exceeding some fixed value (independent of the particular automaton) ?

We have, in human and animal brains, examples of very large and relatively reliable systems constructed from individual components, the neurons, which would appear to be anything but reliable, not only in individual operation but in fine details of interconnection… Merely performing the same calculation many times and then taking a majority vote will not suffice. The majority vote would itself be taken by unreliable components and thus would have to be taken many times and majority votes taken of the majority votes. And so on. We are face to face with a “Who will watch the watchman” type of situation.

So how do we do it? Von Neumann offers two solutions. The first is what I would call the “mathematician’s” approach:

This solution involves the construction from three unreliable sub-networks, together with certain comparing devices, of a large and more reliable sub-network to perform the same function. By carrying this out systematically throughout some network for realizing an automaton with reliable elements, one obtains a network for the same behavior with unreliable elements….In the first place, the final reliability cannot be made arbitrarily good but only held at a certain level. If the individual components are quite poor the solution, then, can hardly be considered satisfactory. Secondly, and even more serious from the point of view of application, the redundancy requirements for this solution are fantastically high in typical cases. The number of components required increases exponentially…

The second approach is the more statistical one, and is probably an important link between the McCulloch-Pitts school of computers as logic devices and the information-theoretic approach that is more relevant to today:

The second approach involves what von Neumann called the multiplex trick. This means representing a binary output in the machine not by one line but by a bundle of N lines, the binary variable being determined by whether nearly all or very few of the lines carry the binary value 1. An automaton design based on reliable components is, in this scheme, replaced by one where each line becomes a bundle of lines, and each component is replaced by a subnetwork which operates in the corresponding fashion between bundles of input and output lines…

He also makes some estimates of the redundancy requirements for certain gains in reliability. For example, starting with an unreliable “majority” organ whose probability of error is 1/200, by a redundancy of 60,000 to 1 a sub-network representing a majority organ for bundles can be constructed whose probability of error is 10^-20 . Using reasonable figures this would lead to an automaton of the complexity and speed of the brain operating for a hundred years with expectation about one error. In other words, something akin to this scheme is at least possible as the basis of the brain’s reliability.
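As a rough back-of-the-envelope illustration of why bundles help (my own sketch, not von Neumann’s actual analysis – and it ignores the fact that the majority organ taking the vote is itself unreliable, which is where the enormous redundancy figures come from): if each of n lines is independently wrong with probability p, the chance that a majority of the bundle is wrong falls off very quickly as n grows.

```python
from math import comb

def majority_error(n, p):
    """Probability that more than half of n independent lines,
    each wrong with probability p, carry the wrong value."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# With p = 1/200 (the error rate quoted above), even small bundles of lines
# are already very reliable under a perfect majority vote:
for n in (1, 9, 99):
    print(n, majority_error(n, 1 / 200))
```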

So not only is the second approach more feasible, it’s just plain better.

This is still extremely relevant. I went to a very nice talk two weeks ago on fruit fly larvae. These are worm-like creatures that, much like the nematode C. elegans, move around and do a lot of the same things that C. elegans does. Yet they have orders of magnitude more neurons! Why do they need so many? It does not seem like they exhibit that much more behavior (I don’t know, maybe they do). It could be pattern separation – perhaps they can break the world into tinier pieces – but a better candidate may be error correction. I would hazard a guess that the C. elegans nervous system is more sensitive to noise. The fact that its neural responses seem slower – there are no spikes, and neurons respond over seconds rather than milliseconds – would indicate that it solves the problem through temporal integration. Whatever works.

The original paper is very readable and full of quite interesting ideas; go read it.

Two other quotes I like from these:

If we think of the brain as some kind of computing machine it is perfectly possible that the external language we use in communicating with each other may be quite different from the internal language used for computation (which includes, of course, all the logical and information-processing phenomena as well as arithmetic computation). In fact von Neumann gives various persuasive arguments that we are still totally unaware of the nature of the primary language for mental calculation. He states “Thus logics and mathematics in the central nervous system, when viewed as languages, must be structurally essentially different from those languages to which our common experience refers.”

and

“It also ought to be noted that the language here involved may well correspond to a short code in the sense described earlier, rather than to a complete code: when we talk mathematics, we may be discussing a secondary language, built on the primary language truly used by the central nervous system. Thus the outward forms of our mathematics are not absolutely relevant from the point of view of evaluating what the mathematical or logical language truly used by the central nervous system is. However, the above remarks about reliability and logical and arithmetic depth prove that whatever the system is, it cannot fail to differ considerably from what we consciously and explicitly consider as mathematics.”

Inequality in faculty placement


How does your PhD institution affect your chances at a faculty position?

Across disciplines, we find steep prestige hierarchies, in which only 9 to 14% of faculty are placed at institutions more prestigious than their doctorate…Under a meritocracy, the observed placement rates would imply that faculty with doctorates from the top 10 units are inherently two to six times more productive than faculty with doctorates from the third 10 units. The magnitude of these differences makes a pure meritocracy seem implausible, suggesting the influence of nonmeritocratic factors like social status.

Some factoids:

  • This falloff in faculty production is sufficiently steep that only the top 18 to 36% of institutions are net producers of within-discipline faculty
  • Differences by gender are greatest for graduates of the most prestigious institutions in computer science and business, where median placement for women graduating from the top 15% of units is 12 to 18% worse than for men from the same institutions. That is, the hierarchy is slightly steeper for elite women than for elite men in these disciplines.
  • These results are broadly consistent with an academic system organized in a classic core-periphery pattern (17), in which increased prestige correlates with occupying a more central, better connected, and more influential network position…As a result, faculty at central institutions literally perceive a “small world” as compared to faculty located in the periphery.
  • Reinforcing the association of centrality and insularity with higher prestige, we observe that 68 to 88% of faculty at the top 15% of units received their doctorate from within this group, and only 4 to 7% received their doctorate from below the top 25% of units.

You can find their prestige rankings for Computer Science, Business, and History in their supplemental material (figure S10).

Obviously, things are more complicated when you have postdocs or are in a “high status” lab in a “low prestige” university.

The problems of academic insularity and the flow of good ideas are evident.

Reference

Clauset A, Arbesman S, & Larremore DB (2015). Systematic inequality and hierarchy in faculty hiring networks. Science Advances.

Walter Pitts was Will Hunting

Apparently Walter Pitts (of McCulloch-Pitts) was Good Will Hunting:

Standing face to face, they were an unlikely pair. McCulloch, 42 years old when he met Pitts, was a confident, gray-eyed, wild-bearded, chain-smoking philosopher-poet who lived on whiskey and ice cream and never went to bed before 4 a.m. Pitts, 18, was small and shy, with a long forehead that prematurely aged him, and a squat, duck-like, bespectacled face. McCulloch was a respected scientist. Pitts was a homeless runaway. He’d been hanging around the University of Chicago, working a menial job and sneaking into Russell’s lectures, where he met a young medical student named Jerome Lettvin.

This article is so great I could quote the whole thing. But I won’t! You get only this much, and then you must go and read all of it:

Pitts was soon to make a similar impression on one of the towering intellectual figures of the 20th century, the mathematician, philosopher, and founder of cybernetics, Norbert Wiener. In 1943, Lettvin brought Pitts into Wiener’s office at the Massachusetts Institute of Technology (MIT). Wiener didn’t introduce himself or make small talk. He simply walked Pitts over to a blackboard where he was working out a mathematical proof. As Wiener worked, Pitts chimed in with questions and suggestions. According to Lettvin, by the time they reached the second blackboard, it was clear that Wiener had found his new right-hand man. Wiener would later write that Pitts was “without question the strongest young scientist whom I have ever met … I should be extremely astonished if he does not prove to be one of the two or three most important scientists of his generation, not merely in America but in the world at large.”

…His work with Wiener was “to constitute the first adequate discussion of statistical mechanics, understood in the most general possible sense, so that it includes for example the problem of deriving the psychological, or statistical, laws of behavior from the microscopic laws of neurophysiology … Doesn’t it sound fine?”

That winter, Wiener brought Pitts to a conference he organized in Princeton with the mathematician and physicist John von Neumann, who was equally impressed with Pitts’ mind. Thus formed the beginnings of the group who would become known as the cyberneticians, with Wiener, Pitts, McCulloch, Lettvin, and von Neumann its core. And among this rarified group, the formerly homeless runaway stood out. “None of us would think of publishing a paper without his corrections and approval,” McCulloch wrote. “[Pitts] was in no uncertain terms the genius of our group,” said Lettvin. “He was absolutely incomparable in the scholarship of chemistry, physics, of everything you could talk about history, botany, etc. When you asked him a question, you would get back a whole textbook … To him, the world was connected in a very complex and wonderful fashion.”

Here is the original research article – fascinating historically and not at all what I would have expected. Very Principia Mathematica-y for neuroscience, but I suppose that was the time?

On Quantity of Information

“On Quantity of Information”

by Walter Pitts

Random remarks are traced by little boys
In wet cement; synapses in the brain
Die off; renewing uplift glyphs mountain
And valley in peneplane; the mouth rounds noise
To consonants in truisms: Thus expands law
Cankering the anoetic anonymous.
“If any love magic, he is most impious:
Him I cut off, who turn his world to straw,
Making him know Me.” So speaks the nomothete
Concealed in crystals, contracting myosin,
Imprisoning man by close-packing in his own kind.
We, therefore, exalt entropy and heat,
Fist-fight for room, trade place, momentum, spin,
Successful enough if life is undesigned.

From Nautilus