Learn by consuming the brains of your enemies

A few people have sent this my way and asked about it:

In a paper published Monday in the journal eNeuro, scientists at the University of California-Los Angeles reported that when they transferred molecules from the brain cells of trained snails to untrained snails, the animals behaved as if they remembered the trained snails’ experiences…

In experiments by Dr. Glanzman and colleagues, when these snails get a little electric shock, they briefly retract their frilly siphons, which they use for expelling waste. A snail that has been shocked before, however, retracts its siphon for much longer than a new snail recruit.

To understand what was happening in their snails, the researchers first extracted all the RNA from the brain cells of trained snails, and injected it into new snails. To their surprise, the new snails kept their siphons wrapped up much longer after a shock, almost as if they’d been trained.

Next, the researchers took the brain cells of trained snails and untrained snails and grew them in the lab. They bathed the untrained neurons in RNA from trained cells, then gave them a shock, and saw that they fired in the same way that trained neurons do. The memory of the trained cells appeared to have been transferred to the untrained ones.

The full paper is here.

The long and short of this is that there is a particular reflex (a form of memory) that changes when the snails have experienced a lot of shocks. How memory is encoded is a bit debated, but one strongly supported mechanism (especially in these snails) is that there are changes in the amount of particular proteins that are expressed in some neurons. These proteins might make more of one channel or receptor, which makes the neuron more or less likely to respond to signals from other neurons. So for instance, when a snail receives its first shock a neuron responds and it withdraws its gills. Over time, each shock builds up more proteins that make the neuron respond more and more. How much of each protein is built depends on the amount of RNA (the “blueprint” for the proteins, if you will) that is located in the vicinity of the neuron that can receive this information. There are a lot of sophisticated mechanisms that determine how and where these RNAs are built and then shipped off to the place in the neuron where they can be of the most use.

This new paper shows that in these snails, you can just dump the RNA on these neurons from someone else and the RNA has already encoded something about the type of protein it will produce. This is not going to work in most situations (I think?) so it is surprising and cool that it does here! But hopefully you can begin to see what is happening and how the memory is transferring. The RNA is now in the cell, it is now marked in a way that will lead it to produce some protein that will change how the cell responds to input, etc, etc.

One of the people who asked me about this asked specifically in relation to AI. Could this be used as a new method of training in Deep Networks somehow? The closest analogy I can think of is if you have two networks with the same architecture that have been trained in the same way (this is evolution). Then you train one of them a little more, maybe on new stimuli or maybe on a new task, or maybe you are doing reinforcement learning and you have a network that predicts a different action-value pair. Now the analogy would be if you chose a few units (neurons) and directly copied the weights from the first network into the second network. Would this work? Would this be useful? I doubt it, but maybe? But see this interesting paper on knowledge distillation that was pointed out to me by John O’Malia.
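To make the analogy concrete, here is a minimal sketch of what "copying a few units" could mean for two networks with identical architecture. The dictionary layout (`w_in`, `b`, `w_out`) and the function name `transplant_units` are made up for illustration; nothing here comes from the snail paper, and whether such a transplant would actually help is exactly the open question.

```python
import copy

def transplant_units(donor, recipient, unit_indices):
    """Return a copy of `recipient` whose chosen hidden units carry the donor's weights."""
    patched = copy.deepcopy(recipient)
    for u in unit_indices:
        patched["w_in"][u] = list(donor["w_in"][u])   # incoming weights of hidden unit u
        patched["b"][u] = donor["b"][u]               # bias of hidden unit u
        for out_row, donor_row in zip(patched["w_out"], donor["w_out"]):
            out_row[u] = donor_row[u]                 # outgoing weight from unit u
    return patched

# Two toy networks with the same shape: 2 inputs, 3 hidden units, 1 output.
trained = {"w_in": [[0.9, -0.4], [0.1, 0.8], [-0.5, 0.5]],
           "b": [0.1, -0.2, 0.3],
           "w_out": [[1.2, -0.7, 0.4]]}
naive = {"w_in": [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]],
         "b": [0.0, 0.0, 0.0],
         "w_out": [[0.1, 0.1, 0.1]]}

# Copy hidden units 0 and 2 from the trained network into the naive one.
patched = transplant_units(trained, naive, unit_indices=[0, 2])
```

Only the selected units change; unit 1 in `patched` keeps the naive weights, and `naive` itself is left untouched.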

What HASN’T Deep Learning replicated from the brain?

The brain represents the world in particular ways. Here are a few:

1. The visual world on the retina

The retina is thought to whiten images, or transform them so that they always have roughly the same average, maximum and minimum (so that you can see in very bright and very dark environments). This was originally shown very nicely in two papers from Atick and Redlich (1990, 1992). Essentially, you want to smooth the visual scene around each point depending on the noise. You get receptive fields that look something like this:

Or more generally this:

A denoising autoencoder – a network that tries to replicate a corrupted image which smooths locally – has neural representations that look similar:

2. The visual world in first order visual cortex

Similarly, if you want to efficiently represent the visual world (once it is denoised) you want to represent things sparsely or independently. This was shown by Olshausen and Field 1996 and Bell and Sejnowski 1997 and is equivalent to doing ICA on natural images. Note that doing PCA on natural images will give you Fourier components.
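The remark about PCA and Fourier components can be checked on a toy case. PCA components are eigenvectors of the pixel covariance matrix, and if the image statistics are translation invariant that matrix depends only on the distance between pixels. The sketch below assumes a 1-D "image" with wrap-around edges and an exponentially decaying correlation (both are illustrative assumptions, not taken from the papers above) and verifies that cosine waves are eigenvectors of such a covariance.

```python
import math

n = 32      # number of pixels in a 1-D image with wrap-around edges (assumed)
tau = 4.0   # correlation length of the toy image statistics (assumed)

def circ_dist(i, j):
    d = abs(i - j)
    return min(d, n - d)

# Covariance that depends only on pixel separation (translation invariance).
cov = [[math.exp(-circ_dist(i, j) / tau) for j in range(n)] for i in range(n)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def fourier_mode(k):
    return [math.cos(2 * math.pi * k * i / n) for i in range(n)]

def eigen_residual(k):
    """How far cov @ v is from being a multiple of v for the k-th cosine mode."""
    v = fourier_mode(k)
    cv = matvec(cov, v)
    lam = sum(a * b for a, b in zip(cv, v)) / sum(a * a for a in v)  # Rayleigh quotient
    return max(abs(a - lam * b) for a, b in zip(cv, v))

residuals = [eigen_residual(k) for k in range(1, 5)]
```

The residuals come out at the level of floating-point noise, so each cosine mode is (numerically) a principal component of this covariance. Real image patches are only approximately translation invariant, so this is an idealisation.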

If you train a Deep Network on ImageNet (AlexNet example below), the filters on the first layer look similar:

3. The auditory world

The best representation of the auditory world is also efficiently encoded. Lewicki 2002 shows that if you run ICA on acoustic data you get filters that look nearly identical to the sounds neurons respond to (wavelet basis functions).

I have not seen a visualization of the first few layers of a neural network that classifies speech (for instance) but I would guarantee it has features that look like wavelets.

4. Spatial cells

Your sense of place in the world is encoded by a series of grid cells – which are a periodic representation of place – and place cells, which are precise locations in space. Dordek et al 2016 showed that non-negative PCA on place cells will give you grid cells. This is similar to the result that PCA on images gives you Fourier components. Note that Dordek et al also use a single-layer feedforward neural network and show that it has a similar property.

It turns out that if you train a Deep recurrent network on a navigation task, you get grid cells (once you have assumed place cells).


What else is left? Olfaction is a mess and doesn’t have a coherent coding principle as far as I can tell (the olfactory space is not clearly defined). Mechanosensation (touch) has been hard to define but Zhao et al 2017 can find first-order touch receptive fields with an autoencoder (like with vision). You can get CPGs (oscillatory movement generators) with recurrent neural networks by training an input signal to be associated with a particular sequence of movements. I’m struggling to think of other internal representations that are well understood.

A long-standing principle in neuroscience has been that successive layers of the brain are attempting to decorrelate their responses to produce ever-finer features. Tishby and Zaslavsky 2015 suggest that a similar principle applies to Deep Networks: you have a constrained input and output, and networks are trying to find the representations that encode the most information between input and output given the limited bandwidth that they have (numbers of layers, numbers of units). It should not be surprising that this entails something like different forms of PCA or ICA or another signal-detection framework.

One of the nice things about Deep Networks is that you do not have to explicitly code for this in order to find these features – they are costless in a way. You can train for a particular task – a visually-driven one, a path-driven one, an acoustic-driven one – and these features will just fall out. Not only will these features fall out, but neurons which are deeper in the pathway will also have similar activity. This is a much harder problem, and one to which “run PCA again” or “run ICA again” will not give a good answer.

What other neural representations have we not yet seen in neural networks?

Monday Open Question: does neuroscience have anything to offer AI?

A review was published this week in Neuron by DeepMind luminary Demis Hassabis and colleagues about Neuroscience-inspired Artificial Intelligence. As one would expect from a journal called Neuron, the article was pretty positive about the use of neurons!

There have been two key concepts from neuroscience that are ubiquitous in the AI field today: Deep Learning and Reinforcement Learning. Both are very direct descendants of research from the neuroscience community. In fact, saying that Deep Learning is an outgrowth of neuroscience obscures the amount of influence neuroscience has had. It did not just gift the idea of connecting artificial neurons together to build a fictive brain, but much more technical ideas such as convolutional neural networks that use a single function repeatedly across its input as the retina or visual cortex does; hierarchical processing in the way the brain goes from layer to layer; divisive normalization as a way to keep outputs within a reasonable and useful range. Similarly, Reinforcement Learning and all its variants have continued to expand and be developed by the cognitive community.
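As a concrete illustration of one of these borrowed ideas, here is a rough sketch of divisive normalization: each unit's squared response is divided by the pooled squared activity of the whole population plus a constant. The exact form (squaring, the `sigma` constant, pooling over all units) is one common textbook variant and is an assumption here, not something taken from the review.

```python
def divisive_normalization(responses, sigma=1.0):
    """Divide each squared response by the pooled squared activity of the population."""
    pooled = sigma ** 2 + sum(r * r for r in responses)
    return [r * r / pooled for r in responses]

dim = divisive_normalization([1.0, 2.0, 3.0])
bright = divisive_normalization([100.0, 200.0, 300.0])  # same pattern, 100x the intensity
```

However large the inputs get, every output stays between 0 and 1 and the ordering of the units is preserved, which is the "reasonable and useful range" property mentioned above.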

Sounds great! So what about more recent inspirations? Here, Hassabis & co offer up the roles of attention, episodic memory, working memory, and ‘continual learning’. But reading this, I became less inspired than morose (see this thread). Why? Well, look at the example of attention. Attention comes in many forms: automatic, voluntary, bottom-up, top-down, executive, spatial, feature-based, object-based, and more. It sometimes means a sharpening of the collection of things a neuron responds to, so instead of being active in response to an edge oriented this, that, or another way, it is only active when it sees an edge oriented that way. But it sometimes means a narrowing of the area in space that it responds to. Sometimes responses between neurons become more diverse (decorrelated).

But this is not really how ‘attention’ works in deep networks. All of these examples seem primarily motivated by the underlying psychology, not the biological implementation. Which is fine! But does that mean that the biology has nothing to teach us? Even at best, I am not expecting Deep Networks to converge precisely to mammalian-based neural networks, nor that everything the brain does should be useful to AI.

This leads to some normative questions: why hasn’t neuroscience contributed more, especially to Deep Learning? And should we even expect it to?

It could just be that the flow of information from neuroscience to AI is too weak. It’s not exactly like there’s a great list of “here are all the equations that describe how we think the brain works”. If you wanted to use a more nitty-gritty implementation of attention, where would you turn? Scholarpedia? What if someone wants to move step-by-step through all the ways that visual attention contributes to visual processing? How would they do it? Answer: they would become a neuroscientist. Which doesn’t really help, time-wise. But maybe, slowly over time, these two fields will be more integrated.

More to the point, why even try? AI and neuroscience are two very different fields; one is an engineering discipline of “how do we get this to work” and the other a scientific discipline of “why does this work”. Who is to say that anything we learn from neuroscience would even be relevant to AI? Animals are bags of meat that have a nervous system trying to solve all sorts of problems (like wiring length, energy costs between neurons, physical transmission delays, the need to regulate blood osmolality, etc) that AI has no real interest in or need for including but that may be fundamental to how the nervous system has evolved. Is the brain the bird to AI’s airplane, accomplishing the same job but engineered in a totally different way?

Then in the middle of writing this, a tweet came through my feed that made me think I had a lot of this wrong (I also realized I had become too fixated on ‘the present’ section of their paper and less on ‘the past’ which is only a few years old anyway).

The ‘best paper’ award at the CVPR 2017 conference went to this paper which connects blocks of layers together, passing forward information from one to the next.

That looks a lot more like what cortex looks like! Though obviously sensory systems in biology are a bit more complicated:

And the advantages? “DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters”
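The connectivity pattern itself is simple to sketch: within a block, every layer receives the concatenation of the block input and all earlier layers' outputs. The code below only tracks that wiring; `toy_layer` is a hypothetical stand-in for a real convolutional layer, and the sizes are arbitrary.

```python
def dense_block(x, num_layers, growth_rate, layer_fn):
    """Each layer sees the concatenation of the block input and all earlier outputs."""
    features = [list(x)]
    input_widths = []
    for i in range(num_layers):
        concatenated = [v for f in features for v in f]
        input_widths.append(len(concatenated))
        features.append(layer_fn(concatenated, growth_rate, i))
    return [v for f in features for v in f], input_widths

def toy_layer(inputs, growth_rate, i):
    # Hypothetical stand-in for a real layer: emits `growth_rate` new features.
    mean = sum(inputs) / len(inputs)
    return [max(0.0, mean + 0.1 * (i + j)) for j in range(growth_rate)]

out, widths = dense_block([0.5, -0.2, 0.1, 0.3], num_layers=3,
                          growth_rate=2, layer_fn=toy_layer)
```

With 4 input features and a growth rate of 2, successive layers see 4, 6, and 8 inputs. Each layer only has to add a few new features because everything computed earlier is still directly available, which is roughly where the "feature reuse" claim in the quote comes from.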

So are the other features of cortex useful in some way? How? How do we have to implement them to make them useful? What are the drawbacks?

Neuroscience is big and unwieldy, spanning a huge number of different fields. But most of these fields are trying to solve exactly the same problem that Deep Learning is trying to solve in very similar ways. This is an incredibly exciting opportunity – a lot of Deep Learning is essentially applied theoretical neuroscience. Which of our hypotheses about why we have attention are true? Which are useless?

Yeah, but what has ML ever done for neuroscience?

This question has been going round the neurotwitters over the past day or so.

Let’s limit ourselves to ideas that came from machine learning that have had an influence on neural implementation in the brain. Physics doesn’t count!

  • Reinforcement learning is always my go-to, though we have to remember the initial connection from neuroscience! In Sutton and Barto 1990, they explicitly note that “The TD model was originally developed as a neuron like unit for use in adaptive networks”. There is also the obvious connection to the Rescorla-Wagner model of Pavlovian conditioning. But the work to show dopamine as prediction error is too strong to ignore.
  • ICA is another great example. Tony Bell was specifically thinking about how neurons represent the world when he developed the Infomax-based ICA algorithm (according to a story from Terry Sejnowski). This obviously is the canonical example of V1 receptive field construction
    • Conversely, I personally would not count sparse coding. Although developed as another way of thinking about V1 receptive fields, it was not – to my knowledge – an outgrowth of an idea from ML.
  • Something about Deep Learning for hierarchical sensory representations, though I am not yet clear on what the principle is that we have learned. Progressive decorrelation through hierarchical representations has long been the canonical view of sensory and systems neuroscience. Just see the preceding paragraph! But can we say something has flowed back from ML/DL? From Yamins and DiCarlo (and others), can we say that maximizing the output layer is sufficient to get similar decorrelation as the nervous system?
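For readers who have not seen it, the prediction-error idea behind the first bullet fits in a few lines. This is a minimal sketch of a Rescorla-Wagner style update for a single cue (the learning rate and the reward schedule are arbitrary choices); the TD model extends the same error signal across time steps.

```python
def rescorla_wagner(rewards, learning_rate=0.2):
    """Track the value of a cue; each update is driven by reward minus prediction."""
    value = 0.0
    errors = []
    for r in rewards:
        delta = r - value               # prediction error
        value += learning_rate * delta  # move the prediction toward the outcome
        errors.append(delta)
    return value, errors

# The cue is followed by a reward of 1.0 on every one of 30 trials.
value, errors = rescorla_wagner([1.0] * 30)
```

The error is large on the first trial and shrinks toward zero as the reward becomes predicted, which is the qualitative pattern the dopamine work is known for.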

And yet… what else? Bayes goes back to Helmholtz, in a way, and at least precedes “machine learning” as a field. Are there examples of the brain implementing…. an HMM? t-SNE? SVMs? Discriminant analysis (okay, maybe this is another example)?

My money is on ideas from Deep Learning filtering back into neuroscience – dropout and LSTMs and so on – but I am not convinced they have made a major impact yet.

RIP Marvin Minsky, 1927-2016

Marvin Minsky in Detroit

I awoke to sad news this morning – Marvin Minsky passed away at the age of 88. Minsky’s was the first serious work on artificial intelligence that I ever read and one of the reasons I am in neuroscience today.

Minsky is infamously known for his book Perceptrons, which famously showed that the neural networks of the time had problems with computations such as XOR (here is the solution, which every neuroscientist should know!).
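For reference, one well-known way to get XOR is to add a hidden layer of threshold units. The sketch below uses hand-picked weights (not necessarily the construction behind the link above): one hidden unit computes OR, another computes AND, and the output fires for "OR but not AND".

```python
def step(x):
    return 1 if x > 0 else 0

def xor_net(a, b):
    h_or = step(a + b - 0.5)          # fires if at least one input is on
    h_and = step(a + b - 1.5)         # fires only if both inputs are on
    return step(h_or - h_and - 0.5)   # OR but not AND

table = [xor_net(a, b) for a in (0, 1) for b in (0, 1)]
```

A single threshold unit cannot do this because no one line separates the inputs (0,1) and (1,0) from (0,0) and (1,1); the hidden layer is what makes the problem separable.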

Minsky is also known for the Dartmouth Summer Research Conference, whose proposal is really worth reading in full.

Fortunately, Minsky put many of his writings online which I have been rereading this morning. You could read his thoughts on communicating with Alien Intelligence:

All problem-solvers, intelligent or not, are subject to the same ultimate constraints–limitations on space, time, and materials. In order for animals to evolve powerful ways to deal with such constraints, they must have ways to represent the situations they face, and they must have processes for manipulating those representations.

ECONOMICS: Every intelligence must develop symbol-systems for representing objects, causes and goals, and for formulating and remembering the procedures it develops for achieving those goals.

SPARSENESS: Every evolving intelligence will eventually encounter certain very special ideas–e.g., about arithmetic, causal reasoning, and economics–because these particular ideas are very much simpler than other ideas with similar uses.

He also mentions this, which sounds fascinating. I was not aware of this but cannot find the actual paper. If anyone can send me the citation, please leave a comment!

A TECHNICAL EXPERIMENT. I once set out to explore the behaviors of all possible processes–that is, of all possible computers and their programs. There is an easy way to do that: one just writes down, one by one, all finite sets of rules in the form which Alan Turing described in 1936. Today, these are called “Turing machines.” Naturally, I didn’t get very far, because the variety of such processes grows exponentially with the number of rules in each set. What I found, with the help of my student Daniel Bobrow, was that the first few thousand such machines showed just a few distinct kinds of behaviors. Some of them just stopped. Many just erased their input data. Most quickly got trapped in circles, repeating the same steps over again. And every one of the remaining few that did anything interesting at all did the same thing. Each of them performed the same sort of “counting” operation: to increase by one the length of a string of symbols–and to keep repeating that. In honor of their ability to do what resembles a fragment of simple arithmetic, let’s call them “A-Machines.” Such a search will expose some sort of “universe of structures” that grows and grows. For our combinations of Turing machine rules, that universe seems to look something like this:

minsky turing machines

In Why Most People Think Computers Can’t, he gets off a couple of cracks at people who think computers can’t do anything humans can:

Most people assume that computers can’t be conscious, or self-aware; at best they can only simulate the appearance of this. Of course, this assumes that we, as humans, are self-aware. But are we? I think not. I know that sounds ridiculous, so let me explain.

If by awareness we mean knowing what is in our minds, then, as every clinical psychologist knows, people are only very slightly self-aware, and most of what they think about themselves is guess-work. We seem to build up networks of theories about what is in our minds, and we mistake these apparent visions for what’s really going on. To put it bluntly, most of what our “consciousness” reveals to us is just “made up”. Now, I don’t mean that we’re not aware of sounds and sights, or even of some parts of thoughts. I’m only saying that we’re not aware of much of what goes on inside our minds.

Finally, he has some things to say on Symbolic vs Connectionist AI:

Thus, the present-day systems of both types show serious limitations. The top-down systems are handicapped by inflexible mechanisms for retrieving knowledge and reasoning about it, while the bottom-up systems are crippled by inflexible architectures and organizational schemes. Neither type of system has been developed so as to be able to exploit multiple, diverse varieties of knowledge.

Which approach is best to pursue? That is simply a wrong question. Each has virtues and deficiencies, and we need integrated systems that can exploit the advantages of both. In favor of the top-down side, research in Artificial Intelligence has told us a little—but only a little—about how to solve problems by using methods that resemble reasoning. If we understood more about this, perhaps we could more easily work down toward finding out how brain cells do such things. In favor of the bottom-up approach, the brain sciences have told us something—but again, only a little—about the workings of brain cells and their connections.

Apparently, he viewed the symbolic/connectionist split like so:

minsky connectionist vs symbolic

How a neural network can create music

Playing chess, composing classical music, __: computer programmers love creating ‘AIs’ that can do this stuff. Music, especially, is always fun: there is a long history of programs that can create new songs that are so good that they fool professional musicians (who cannot tell the difference between a Chopin song and a generated song – listen to some here; here is another video).

I do not know how these have worked; I would guess a genetic algorithm, hidden Markov model, or neural network of some sort. Thankfully Daniel Johnson has just created such a neural network and laid out the logic behind it in beautiful detail:

Music composing neural network

The power of this is that it enables the network to have a simple version of memory, with very minimal overhead. This opens up the possibility of variable-length input and output: we can feed in inputs one-at-a-time, and let the network combine them using the state passed from each time step.

One problem with this is that the memory is very short-term. Any value that is output in one time step becomes input in the next, but unless that same value is output again, it is lost at the next tick. To solve this, we can use a Long Short-Term Memory (LSTM) node instead of a normal node. This introduces a “memory cell” value that is passed down for multiple time steps, and which can be added to or subtracted from at each tick. (I’m not going to go into all of the details, but you can read more about LSTMs in the original paper.)…

However, there is still a problem with this network. The recurrent connections allow patterns in time, but we have no mechanism to attain nice chords: each note’s output is completely independent of every other note’s output. Here we can draw inspiration from the RNN-RBM combination above: let the first part of our network deal with time, and let the second part create the nice chords. But an RBM gives a single conditional distribution of a bunch of outputs, which is incompatible with using one network per note.

The solution I decided to go with is something I am calling a “biaxial RNN”. The idea is that we have two axes (and one pseudo-axis): there is the time axis and the note axis (and the direction-of-computation pseudo-axis). Each recurrent layer transforms inputs to outputs, and also sends recurrent connections along one of these axes. But there is no reason why they all have to send connections along the same axis!

What blows me away – and yes, I am often blown away these days – is how relatively simple all these steps are. By using logical, standard techniques for neural networks (and these are not deep), the programmer on the street can create programs that are easily able to do things that were almost unfathomable a decade ago. This is not just pattern separation, but also generation.
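To give a flavour of how simple the pieces are, here is a toy contrast between a state that is overwritten at every tick and a "memory cell" whose value persists and is only added to or subtracted from when a write gate is open. This is a deliberately stripped-down caricature: a real LSTM learns its gates from data and has more of them, and the gate values below are set by hand purely for illustration.

```python
def plain_recurrent(inputs):
    """State is simply overwritten by the current input at every tick."""
    state, trace = 0.0, []
    for x in inputs:
        state = x
        trace.append(state)
    return trace

def memory_cell(inputs, write_gates):
    """Cell value persists across ticks and only changes when the gate is open."""
    cell, trace = 0.0, []
    for x, g in zip(inputs, write_gates):
        cell = cell + g * x   # gate = 0 leaves the stored value untouched
        trace.append(cell)
    return trace

inputs = [1.0, 0.0, 0.0, 0.0, -0.5]
gates = [1.0, 0.0, 0.0, 0.0, 1.0]   # hand-set here; learned in a real LSTM

short_term = plain_recurrent(inputs)
long_term = memory_cell(inputs, gates)
```

The overwritten state forgets the first input after one tick, while the cell still holds it four ticks later and then has 0.5 subtracted from it when the gate reopens.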

Rationality and the machina economicus

Science magazine had an interesting series of review articles on Machine Learning last week. Two of them were different perspectives on the exact same question: how does traditional economic rationality fit into artificial intelligence?

At the core of much AI work are concepts of optimal ‘rational decision-makers’. That is, the intelligent program is essentially trying to maximize some defined objective function, known in economics as maximizing utility. Where the computer and economic traditions diverge is in their implementation: computers need algorithms, and often need to take into account non-traditional resource constraints such as time, whereas in economics this is left unspecified outside of trivial cases.

economics of thinking

How can we move from the classical view of a rational agent who maximizes expected utility over an exhaustively enumerable state-action space to a theory of the decisions faced by resource-bounded AI systems deployed in the real world, which place severe demands on real-time computation over complex probabilistic models?

We see the attainment of an optimal stopping time, in which attempts to compute additional precision come at a net loss in the value of action. As portrayed in the figure, increasing the cost of computation would lead to an earlier ideal stopping time. In reality, we rarely have such a simple economics of the cost and benefits of computation. We are often uncertain about the costs and the expected value of continuing to compute and so must solve a more sophisticated analysis of the expected value of computation.

Humans and other animals appear to make use of different kinds of systems for sequential decision-making: “model-based” systems that use a rich model of the environment to form plans, and a less complex “model-free” system that uses cached values to make decisions. Although both converge to the same behavior with enough experience, the two kinds of systems exhibit different tradeoffs in computational complexity and flexibility. Whereas model-based systems tend to be more flexible than the lighter-weight model-free systems (because they can quickly adapt to changes in environment structure), they rely on more expensive analyses (for example, tree-search or dynamic programming algorithms for computing values). In contrast, the model-free systems use inexpensive, but less flexible, look-up tables or function approximators.
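The optimal stopping time described in the excerpts above can be illustrated with a toy model. Assume (purely for illustration) that the value of the eventual action saturates as more computation is spent on it, and that each step of computation has a fixed cost; the best time to stop is where the net value peaks. The saturating curve and all the constants are assumptions of this sketch, not taken from the paper.

```python
import math

def net_value(t, cost_per_step, v_max=10.0, tau=5.0):
    """Value of acting after t steps of computation, minus what the computation cost."""
    return v_max * (1.0 - math.exp(-t / tau)) - cost_per_step * t

def best_stopping_time(cost_per_step, horizon=100):
    """Search whole-number stopping times for the one with the highest net value."""
    return max(range(horizon + 1), key=lambda t: net_value(t, cost_per_step))

cheap = best_stopping_time(cost_per_step=0.1)
expensive = best_stopping_time(cost_per_step=0.5)
```

With these numbers the ideal stopping time is 15 steps when computation is cheap and 7 steps when it is five times as expensive, matching the claim that raising the cost of computation moves the stopping time earlier.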

That being said, what does economics have to offer machine learning? Parkes and Wellman try to offer an answer and basically say – game theory. Which is not something that economics can ‘offer’ so much as ‘offered a long, long time ago’. A recent interview with Parkes puts this in perspective:

Where does current economic theory fall short in describing rational AI?

Machina economicus might better fit the typical economic theories of rational behavior, but we don’t believe that the AI will be fully rational or have unbounded abilities to solve problems. At some point you hit the intractability limit—things we know cannot be solved optimally—and at that point, there will be questions about the right way to model deviations from truly rational behavior…But perfect rationality is not achievable in many complex real-world settings, and will almost surely remain so. In this light, machina economicus may need its own economic theories to usefully describe behavior and to use for the purpose of designing rules by which these agents interact.

Let us admit that economics is not fantastic at describing trial-to-trial individual behavior. What can economics offer the field of AI, then? Systems for multi-agent interaction. After all, markets are what are at the heart of economics:

At the multi-agent level, a designer cannot directly program behavior of the AIs but instead defines the rules and incentives that govern interactions among AIs. The idea is to change the “rules of the game”…The power to change the interaction environment is special and distinguishes this level of design from the standard AI design problem of performing well in the world as given.

For artificial systems, in comparison, we might expect AIs to be truthful where this is optimal and to avoid spending computation reasoning about the behavior of others where this is not useful…. The important role of mechanism design in an economy of AIs can be observed in practice. Search engines run auctions to allocate ads to positions alongside search queries. Advertisers bid for their ads to appear in response to specific queries (e.g., “personal injury lawyer”). Ads are ranked according to bid amount (as well as other factors, such as ad quality), with higher-ranked ads receiving a higher position on the search results page.

Early auction mechanisms employed first-price rules, charging an advertiser its bid amount when its ad receives a click. Recognizing this, advertisers employed AIs to monitor queries of interest, ordered to bid as little as possible to hold onto the current position. This practice led to cascades of responses in the form of bidding wars, amounting to a waste of computation and market inefficiency. To combat this, search engines introduced second-price auction mechanisms, which charge advertisers based on the next-highest bid price rather than their own price. This approach (a standard idea of mechanism design) removed the need to continually monitor the bidding to get the best price for position, thereby ending bidding wars.
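The difference between the two pricing rules is easy to see in a small sketch. The function below ranks advertisers by bid only (it ignores ad quality and any reserve price, which real ad auctions include) and charges either the advertiser's own bid or the next-highest bid; the names and numbers are invented.

```python
def run_auction(bids, num_slots, rule):
    """Rank advertisers by bid; return (advertiser, price per click) for each slot."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for position in range(min(num_slots, len(ranked))):
        name, bid = ranked[position]
        if rule == "first":
            price = bid   # pay your own bid
        else:
            # "second": pay the next-highest bid, or 0 if nobody ranks below you
            price = ranked[position + 1][1] if position + 1 < len(ranked) else 0.0
        results.append((name, price))
    return results

bids = {"ann": 5.0, "bob": 3.0, "cat": 1.0}
first = run_auction(bids, num_slots=2, rule="first")
second = run_auction(bids, num_slots=2, rule="second")
```

Under the first-price rule "ann" pays 5.0 and would like to shade her bid down toward 3.0, which is what invites constant monitoring. Under the second-price rule she already pays 3.0, so lowering her bid (while staying above "bob") changes nothing.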

But what comes across most in the article is how much economics needs to seriously consider AI (and ML more generally):

The prospect of an economy of AIs has also inspired expansions to new mechanism design settings. Researchers have developed incentive-compatible multiperiod mechanisms, considering such factors as uncertainty about the future and changes to agent preferences because of changes in local context. Another direction considers new kinds of private inputs beyond preference information.

I would have loved to see an article on “what machine learning can teach economics” or how tools in ML are transforming the study of markets.

Science also had one article on “trends and prospects” in ML and one on natural language processing.


Parkes, D., & Wellman, M. (2015). Economic reasoning and artificial intelligence. Science, 349 (6245), 267-272. DOI: 10.1126/science.aaa8403

Gershman, S., Horvitz, E., & Tenenbaum, J. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349 (6245), 273-278. DOI: 10.1126/science.aac6076

Small autonomous drones

Nature has a fascinating review on drones – and especially microdrones!


For those who don’t have access, here are some highlights (somewhat technical):

Propulsive efficiencies for rotorcraft degrade as the vehicle size is reduced; an indicator of the energetic challenges for flight at small scales. Smaller size typically implies lower Reynolds numbers, which in turn suggests an increased dominance of viscous forces, causing greater drag coefficients and reduced lift coefficients compared with larger aircraft. To put this into perspective, this means that a scaled-down fixed-wing aircraft would be subject to a lower lift-to-drag ratio and thereby require greater relative forward velocity to maintain flight, with the associated drag and power penalty reducing the overall energetic efficiency. The impacts of scaling challenges (Fig. 3) are that smaller drones have less endurance, and that the overall flight times range from tens of seconds to tens of minutes — unfavourable compared with human-scale vehicles.

There are, however, manoeuvrability benefits that arise from decreased vehicle size. For example, the moment of inertia is a strong function of the vehicle’s characteristic dimension — a measure of a critical length of the vehicle, such as the chord length of a wing or length of a propeller in a similar manner as used in Reynolds number scaling. Because the moment of inertia of the vehicle scales with the characteristic dimension, L, raised to the fifth power, a decrease in size from a 11 m wingspan, four-seat aircraft such as the Cessna 172 to a 0.05 m rotor-to-rotor separation Blade Pico QX quadcopter implies that the Cessna has about 5 × 10^11 the inertia of the quadcopter (with respect to roll)…This enhanced agility, often achieved at the expense of open-loop stability, requires increased emphasis on control — a challenge also exacerbated by the size, weight and power constraints of these small vehicles.
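The inertia figure in that excerpt is quick to verify from the fifth-power scaling and the two lengths it quotes (11 m and 0.05 m). The function name below is just for illustration.

```python
def inertia_ratio(length_large, length_small, exponent=5):
    """Ratio of moments of inertia if inertia scales as length ** exponent."""
    return (length_large / length_small) ** exponent

ratio = inertia_ratio(11.0, 0.05)   # Cessna wingspan vs quadcopter rotor separation
```

The lengths differ by a factor of 220, and 220 to the fifth power is about 5.15 × 10^11, consistent with the "about 5 × 10^11" in the review.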

microdrone flight vs mass


Improvements in microdrones will come from becoming more insect-like and adapting knowledge from biological models:


In many situations, such as search and rescue, parcel delivery in confined spaces and environmental monitoring, it may be advantageous to combine aerial and terrestrial capabilities (multimodal drones). Perching mechanisms could allow drones to land on walls and power lines in order to monitor the environment from a high vantage point while saving energy. Agile drones could move on the ground by using legs in conjunction with retractable or flapping wings. In an effort to minimize the total cost of transport, which will be increased by the additional locomotion mode, these future drones may benefit from using the same actuation system for flight control and ground locomotion…

Many vision-based insect capabilities have been replicated with small drones. For example, it has been shown that small fixed-wing drones and helicopters can regulate their distance from the ground using ventral optic flow while a GPS was used to maintain constant speed and an IMU was used to regulate roll angle. The addition of lateral optic flow sensors also allowed a fixed-wing drone to detect near-ground obstacles. Optic flow has also been used to perform both collision-free navigation and altitude control of indoor and outdoor fixed-wing drones without a GPS. In these drones, the roll angle was regulated by optic flow in the horizontal direction and the pitch angle was regulated by optic flow in the vertical direction, while the ground speed was measured and maintained by wind-speed sensors. In this case, the rotational optic flow was minimized by flying along straight lines interrupted by short turns or was estimated with on-board gyroscopes and subtracted from the total optic flow, as suggested by biological models.

The future ecology of stock traders

I am beyond fascinated by the interactions between competing intelligences that exist in the stock market. It is a bizarre mishmash of humans, AIs, and both (cyborgpeople?).

One recent strategy that exploits this interaction is ‘spoofing’. The description from the link:

  • You place an order to sell a million widgets at $104.
  • You immediately place an order to buy 10 widgets at $101.
  • Everyone sees the million-widget order and is like, “Wow, lotta supply, the market is going down, better dump my widgets!”
  • So someone is happy to sell you 10 widgets for $101 each.
  • Then you immediately cancel your million-widget order, leaving you with 10 widgets for which you paid $1,010.
  • Then you place an order to buy a million widgets for $101, and another order to sell 10 widgets at $104.
  • Everyone sees the new million-widget order, and since no one has any attention span at all, they are like, “Wow, lotta demand, the market is going up, better buy some widgets!”
  • So someone is happy to buy 10 widgets from you for $104 each.
  • Then you immediately cancel your million-widget order, leaving you with no widgets, no orders and $30 in sweet sweet profits.
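The arithmetic in that walkthrough can be checked in a few lines. This sketch only tallies the two small orders that actually get filled; the function name is hypothetical, and it ignores fees, slippage and the (cancelled) million-widget orders.

```python
# Tally of the two small filled orders from the spoofing walkthrough.
# Assumption: no fees or slippage; the big orders never execute.

def round_trip(quantity, buy_price, sell_price):
    """Return (cost, proceeds, profit) for buying then selling `quantity` units."""
    cost = quantity * buy_price
    proceeds = quantity * sell_price
    return cost, proceeds, proceeds - cost


cost, proceeds, profit = round_trip(quantity=10, buy_price=101, sell_price=104)
print(f"paid ${cost}, received ${proceeds}, profit ${profit}")
```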

Amusingly enough, you don’t even need a fancy computer program for it – you can just hire a bunch of people who are really good at fast video games and they can click click click those keys fast enough for you.

Now some day trader living in his parents’ basement is accused of using this technique and causing the flash crash of 2010 (it possibly wasn’t him directly, but he could have caused some cascade that led to it).

I’m sitting here with popcorn, waiting to see how the ecosystem of varied intelligences evolves in competition with each other. Sounds like Wall Street needs to take some crash courses in ecology.

How DeepMind learns to win

About a year ago, DeepMind was bought for half a billion dollars by Google for creating software that could learn to beat video games. Over the past year, DeepMind has detailed how they did it.


Let us say that you were an artificial intelligence that had access to a computer screen, a way to play the game (an imaginary video game controller, say), and the current score. How should you learn to beat the game? Well, you have access to three things: the state of the screen (your input), a selection of actions, and a reward (the score). What you would want to do is find the best action to go along with every state.

A well-established way to do this without any explicit modeling of the environment is through Q-learning (a form of reinforcement learning). In Q-learning, every time you encounter a certain state and take an action, you have some guess of what reward you will get. But the world is a complicated, noisy place, so you won’t necessarily always get the same reward back in seemingly-identical situations. So you can just take the difference between the reward you find and what you expected, and nudge your guess a little closer.
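To make the "nudge your guess" idea concrete, here is a minimal sketch of the textbook tabular Q-learning update. The learning rate, discount factor, state names and action names are all made up for illustration. The standard update also folds in a discounted estimate of the best future value, which the description above glosses over.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate: how far to nudge the guess (assumed value)
GAMMA = 0.99  # discount on future reward (assumed value)


def q_update(q, state, action, reward, next_state, actions,
             alpha=ALPHA, gamma=GAMMA):
    """One tabular Q-learning step; returns the prediction error."""
    # Best value we currently expect from the state we landed in.
    best_next = max(q[(next_state, a)] for a in actions)
    # What we "found": the reward plus the discounted future estimate.
    target = reward + gamma * best_next
    # Difference between what we found and what we expected.
    td_error = target - q[(state, action)]
    # Nudge the old guess a little closer to the target.
    q[(state, action)] += alpha * td_error
    return td_error


q = defaultdict(float)         # every unseen guess starts at 0.0
actions = ["left", "right"]    # hypothetical action set
error = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(error, q[("s0", "right")])
```

With everything initialised to zero, the first reward of 1.0 is a complete surprise, so the guess moves a tenth of the way towards it.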

This is all fine and dandy, though when you’re looking at a big screen you’ve got a large number of pixels – and a huge number of possible states. Some of them you may never even get to see! Every twist and contortion of two pixels is, theoretically, a completely different state. This would make it implausible to check each state, choose the action and play it again and again to get a good estimate of reward.

What we could do, if we were clever about it, is to use a neural network to learn features about the screen. Maybe sometimes this part of the screen is important as a whole and maybe other times those two parts of the screen are a real danger.

But that is difficult for the Q-learning algorithm. The DeepMind authors list three reasons: (1) correlations in the sequence of observations, (2) small updates to Q significantly change the policy and the data distribution, and (3) correlations between action values and target values. It is how they tackle these problems that is the main contribution to the literature.

The strategy is to implement a Deep Convolutional Neural Network to find ‘filters’ that can more easily represent the state space. The network takes in the states – the images on the screen – processes them, and then outputs a value. In order to get around problems (1) and (3) above (the correlations in observations), they take a ‘replay’ approach. Actions that have been taken are stored in memory; when it is time to update the neural network, they grab some of the old state-action pairs out of their bag of memories and learn from those. They liken this to consolidation during sleep, where the brain replays things that had happened during the day.
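The bag of memories can be sketched as a fixed-size buffer with random sampling. This is my own minimal version, not the authors' code: the class name, the capacity and the tuple layout are assumptions. Drawing at random is what breaks up the correlations between consecutive frames.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size bag of past transitions (hypothetical minimal version)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest memory when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random draw, so consecutive frames rarely land together.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for t in range(5):                       # five fake time steps
    memory.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)

batch = memory.sample(batch_size=2)      # two remembered transitions
print(len(memory), batch)
```

Only the three most recent transitions survive here because the capacity is tiny; a real buffer would be far larger.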

Further, even though they train the network with their memories after every action, this is not the network that is playing the game. The network that is playing the game stays in stasis and only ‘updates itself’ with what it has learned after a certain stretch of time – again, like it is going to “sleep” to better learn what it had done during the day.
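The "stays in stasis" trick boils down to keeping two copies of the weights and only copying one onto the other every so often. Here is a toy sketch of just that mechanism, with a dictionary standing in for a network. The class name, the sync interval and the fake gradients are all assumptions on my part.

```python
import copy


class PeriodicCopy:
    """Two copies of some weights: one learns every step, one is refreshed rarely."""

    def __init__(self, weights, sync_every):
        self.live = dict(weights)               # updated after every step
        self.frozen = copy.deepcopy(self.live)  # stays in stasis between syncs
        self.sync_every = sync_every
        self.steps = 0

    def train_step(self, gradient, lr=0.1):
        # Plain gradient step on the live copy only.
        for name, grad in gradient.items():
            self.live[name] -= lr * grad
        self.steps += 1
        # Every `sync_every` steps the frozen copy catches up in one go.
        if self.steps % self.sync_every == 0:
            self.frozen = copy.deepcopy(self.live)


learner = PeriodicCopy({"w": 1.0}, sync_every=4)  # interval is an assumed value
for _ in range(3):
    learner.train_step({"w": 1.0})
frozen_after_3 = learner.frozen["w"]   # untouched so far
learner.train_step({"w": 1.0})
frozen_after_4 = learner.frozen["w"]   # now matches the live copy
print(frozen_after_3, frozen_after_4)
```

Between syncs the frozen copy gives the learner a stable reference point, instead of one that shifts with every small update.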

Here is an explanation of the algorithm in a hopefully useful form:


Throughout the article, the authors claim that this may point to new directions for neuroscience research. This being published in Nature, any claims to utility should be taken with a grain of salt. That being said! I am always excited to see what lessons arise when theories are forced to confront reality!

What this shows is that reinforcement learning is a good way to train a neural network in a model-free way. Given that all learning is temporal difference learning (or: TD learning is semi-Hebbian?), this is a nice result though I am not sure how original it is. It also shows that the replay way of doing it – which I believe is quite novel – is a good one. But is this something that sleep/learning/memory researchers can learn from? Perhaps it is a stab in the direction of why it is useful (to deal with correlations).


Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, & Hassabis D (2015). Human-level control through deep reinforcement learning. Nature, 518 (7540), 529-533 PMID: 25719670

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, & Martin Riedmiller (2013). Playing Atari with Deep Reinforcement Learning. arXiv: 1312.5602v1