I have just returned from the Reinforcement Learning and Decision Making (RLDM) conference and it was…quite the learning experience. As a neuroscientist, I tend to read only scientific papers written by other neuroscientists, so it is easy to forget how big the Reinforcement Learning community really is. The talks were split pretty evenly between the neuroscience/psychology community and the machine learning/CS community, with a smattering of other people (including someone who works on forestry problems to find the optimal response to forest fires!). All in all, it was incredibly productive and I learned a ton about the machine learning side of things while meeting great people.
I think my favorite fact that I learned was from Tom Stafford, which is that there is a color called the ‘tritan line’ which is visible to visual cortex but not to certain other visual areas (specifically the superior colliculus). Just the idea that there is a color invisible to certain visual areas is…bizarre and awesome. The paper he presented is discussed on his blog here.
There were a few standout talks.
Joe Kable gave a discussion of the infamous marshmallow task, where a young child is asked to not eat a marshmallow while the adult leaves the room for some indeterminate amount of time. It turns out that if the child believes the adult’s returning time is distributed in a Gaussian fashion then it makes sense to wait, but if the returning time follows a heavy-tailed distribution then it makes sense to eat the marshmallow. This is because the predicted amount of time until the adult returns increases as time passes for a heavy-tailed distribution. And indeed, if you ask subjects to do a delay task, they act as if the distribution of delay times is heavy-tailed. See his paper here.
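The intuition can be checked with a quick simulation (a sketch of my own, not from Kable’s paper): for a light-tailed wait-time distribution, the expected remaining wait shrinks the longer you have already waited, while for a heavy-tailed one it grows — so at some point giving up and eating the marshmallow becomes the rational move.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_residual_wait(samples, t):
    """Average remaining wait time, given we have already waited t."""
    remaining = samples[samples > t] - t
    return remaining.mean()

n = 1_000_000
# Light-tailed wait times: roughly Gaussian around 5 minutes.
gaussian_waits = np.abs(rng.normal(loc=5.0, scale=1.0, size=n))
# Heavy-tailed wait times: Pareto with shape 1.5 (illustrative choice).
pareto_waits = rng.pareto(1.5, size=n) + 1.0

for t in [1.0, 3.0, 6.0]:
    print(f"waited {t}: gaussian residual {mean_residual_wait(gaussian_waits, t):.2f}, "
          f"pareto residual {mean_residual_wait(pareto_waits, t):.2f}")
```

The Gaussian residual wait falls as time passes, while the Pareto residual wait keeps climbing — exactly the condition under which early quitting is optimal.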
Yin Li used monkeys to ask how an animal’s learning rate changes depending on the situation. There is no one optimal learning rate: it depends on the statistics of the environment. If you are tracking a target with little noise punctuated by sudden dramatic changes (small variance in between sudden changes in mean), then you want a high learning rate; you are not at risk of being overly responsive to the internal variability of the signal while it is stationary. On the other hand, if there is a very noisy signal whose mean does not change much, then you want a low learning rate. When a monkey is given a task like this, it does about as well as a Bayesian-optimal model. I’m not sure which one he used, though I think this is a problem that has gotten attention in vision (see Wark et al and DeWeese & Zador). Anyway, when they try to fit a bog-standard Reinforcement Learning model, it cannot fit the data. This riled up the CS people in the audience, who suggested that something called “adaptive learning RL” could have fit the data — a technique I was not aware of. Although Li’s point was that the basic RL algorithm is insufficient to explain behavior, the exchange also highlights the lack of crosstalk between the two RL kingdoms.
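Here is a minimal illustration of that tradeoff (my own sketch, not Li’s model): a delta-rule tracker with a fixed learning rate, run in the two kinds of environments described above. A high learning rate wins when the mean jumps around with little observation noise; a low learning rate wins when the mean is stable but the observations are noisy.

```python
import numpy as np

rng = np.random.default_rng(1)

def track_error(means, noise_sd, alpha, rng):
    """Mean squared error of a delta-rule estimate: est += alpha * (obs - est)."""
    est, sq_err = 0.0, 0.0
    for mu in means:
        obs = mu + rng.normal(0.0, noise_sd)
        est += alpha * (obs - est)
        sq_err += (est - mu) ** 2
    return sq_err / len(means)

steps = 5000
# Environment A: mean jumps every 50 steps, little observation noise.
jumpy_means = np.repeat(rng.uniform(-5, 5, size=steps // 50), 50)
# Environment B: fixed mean, very noisy observations.
stationary_means = np.full(steps, 2.0)

for alpha in (0.9, 0.05):
    print(f"alpha={alpha}: jumpy MSE {track_error(jumpy_means, 0.1, alpha, rng):.3f}, "
          f"stationary MSE {track_error(stationary_means, 2.0, alpha, rng):.3f}")
```

No single alpha does well in both environments, which is why an adaptive learning rate (or a Bayesian change-point model) is needed to match the monkeys’ behavior.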
Michael Littman gave an absolutely fantastic talk asking how multiple agents should coordinate their behavior. If you use RL, one possibility is just to treat other agents as randomly moving objects…but “that’s a bit autistic”, as Littman put it. Instead, you can do something like minimax or maximin. Then you just need to find the Nash equilibrium! Unfortunately this doesn’t always converge to the correct answer, there can be multiple equilibria, and it requires access to the other agent’s value. Littman suggested that side payments can solve a lot of these problems (I think someone was paging Coase).
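To make the minimax/maximin idea concrete, here is a toy example (a hypothetical payoff matrix of my own, not from Littman’s talk) of pure-strategy security levels in a two-player zero-sum game. Note that the maximin and minimax values don’t coincide here, so no pure-strategy saddle point exists — a small echo of the convergence caveats above.

```python
import numpy as np

# Hypothetical zero-sum game: entries are the row player's payoff.
A = np.array([[3, -1],
              [0,  2]])

# Maximin for the row player: pick the row whose worst-case payoff is largest.
row_security = A.min(axis=1)      # worst case for each row
row_choice = row_security.argmax()

# Minimax for the column player: pick the column that minimizes the
# row player's best-case payoff.
col_exposure = A.max(axis=0)      # best case for the row player, per column
col_choice = col_exposure.argmin()

print("row plays", row_choice, "guaranteeing at least", row_security.max())
print("col plays", col_choice, "conceding at most", col_exposure.min())
```

Since the guaranteed value (0) is strictly below the conceded value (2), neither pure strategy pair is stable; a mixed-strategy equilibrium exists by the minimax theorem, but finding and agreeing on one is exactly where the problems Littman described begin.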
Finally, the amazing Alex Kacelnik gave a fascinating talk about parasitism in birds, particularly cuckoos. It turns out that when you take into account costs of eggs and such, it might actually be beneficial to the host to raise 1-2 parasite eggs; at least, it’s not straightforward that killing the parasites is the optimal decision. Anne Churchland asked whether neurons in the posterior parietal cortex of rats show mixed sensory and decision signals, and then showed that they are orthogonal on the level of the population. Paul Phillips gave a very lucid talk detailing the history of dopamine and TD learning. Tom Dietterich showed how reinforcement learning is being used by the government to make decisions for fire and invasive-species control. And Pieter Abbeel showed robots! See, for instance, the PR2 Willow Garage fetching beer (other videos):
Here are some other robots he mentioned.
Some final thoughts:
1. CS people are interested in convergence proofs, etc. But in the end, a lot of their talks were really just them acting as engineers trying to get things to work in the real world. That’s not that far from what psychologists and neuroscientists are doing: trying to figure out why things are working the way that they are.
2. In that spirit, someone in psych/neuro needs to take the leading edge of what CS people are doing and apply it to human/animal models of decision-making. I’d never heard of “adaptive learning RL”; what else is there?
3. At the outset, it would help if each field made clear what its open research questions are. At the end, maybe there could be some discussion of how to get the fields to collaborate more.
4. Invite some economists! They have this whole thing called Decision Theory… and would have a lot to contribute.