Learning socially but not socially learning

How do we distinguish learning from our friends from learning because our friends happen to be around? When I was younger, Goldeneye on the Nintendo 64 was the game to play, but I was sadly N64-less. Did I learn how to play Goldeneye because my friends were good at it and showed me, or because whenever I was around them, Goldeneye was available for me to play? But here’s a fact: I suck at Goldeneye. If I learned anything from my friends vis-a-vis Goldeneye, it was how to be humble in the face of continual defeat.

When animals are foraging for food, they face a similar problem. If they forage on their own, they don’t lose any of their reward (like their self-respect) to other animals. But by foraging socially they are able to increase the likelihood that they will find some food.

One of the biggest problems that social foraging can solve is that of risk-aversion: the preference for guaranteed rewards over risky ones, even when the risky ones are more rewarding over the long run. In many cases, this preference is a simple reflection of the learning that all animals undergo. Risky options deliver very large rewards some of the time and very small rewards the rest of the time. When given the choice between multiple options, a string of bad luck on the risky option will lead a learning animal to give up on that option and stick with the less risky one.

Yet learning dynamics are slightly different when an animal is surrounded by other animals. Animals are not identical clones of each other (usually) but have a variety of personal preferences for risk and reward. If you plop sparrows in front of both more and less risky options, of course they’ll generally prefer the less risky options. But sparrows come in groups! And when in groups, they can be scroungers, hanging back and waiting to see what others are doing. This lets them take advantage of both the range of preferences in the group and the range of learning in the group.

Interestingly, individuals only learned about the desirability of an option when they were the ones to have chosen it, not when they watched (and joined) another individual making the choice. I’m not sure if there is a lesson here about group learning. Perhaps it is better for the group to keep each individual’s knowledge uncorrelated, so that when it is combined the group’s knowledge will be as diverse as possible? Either way, the sparrows are not socially learning: they are not learning how to do something by watching another animal. Rather, they are learning by being in a social group that allows them to take advantage of the learning of each individual in the group.

References

Ilan, T., Katsnelson, E., Motro, U., Feldman, M.W., & Lotem, A. (2013). The role of beginner’s luck in learning to prefer risky patches by socially foraging house sparrows. Behavioral Ecology, 24(6), 1398-1406.

Reinforcement Learning and Decision Making (RLDM) 2013

I have just returned from the Reinforcement Learning and Decision Making (RLDM) conference and it was…quite the learning experience. As a neuroscientist, I tend to only read other scientific papers written by neuroscientists so it is easy to forget how big the Reinforcement Learning community really is. The talks were split pretty evenly between the neuroscience/psychology community and the machine learning/CS community, with a smattering of other people (including someone who works on forestry problems to find the optimal response to forest fires!). All in all, it was incredibly productive and I learned a ton about the machine learning side of things while meeting great people.

I think my favorite fact that I learned was from Tom Stafford: there are colors along something called the ‘tritan line’ that are visible to visual cortex but not to certain other visual areas (specifically the superior colliculus). Just the idea that there are colors invisible to a particular visual area is…bizarre and awesome. The paper he presented is discussed on his blog here.

There were a few standout talks.

Joe Kable gave a discussion of the infamous marshmallow task, where a young child is asked to not eat a marshmallow while the adult leaves the room for some indeterminate amount of time. It turns out that if the child believes the adult’s return time is distributed in a Gaussian fashion then it makes sense to wait, but if the return time follows a heavy-tailed distribution then it makes sense to eat the marshmallow. This is because, for a heavy-tailed distribution, the expected time remaining until the adult returns grows the longer you have already waited. And indeed, if you ask subjects to do a delay task they act as if the distribution of delay times is heavy-tailed. See his paper here.
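To see why, here is a quick Monte Carlo sketch (my own toy illustration with made-up parameters, not Kable’s analysis): for a light-tailed return-time distribution the expected remaining wait shrinks the longer you have waited, while for a heavy-tailed one it grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_remaining_wait(samples, waited):
    """Average additional wait, given the adult hasn't returned after `waited` minutes."""
    remaining = samples[samples > waited] - waited
    return remaining.mean()

# Hypothetical return-time distributions (parameters are purely illustrative)
gaussian = np.abs(rng.normal(loc=10, scale=2, size=1_000_000))   # light-tailed
heavy = (rng.pareto(a=1.5, size=1_000_000) + 1) * 5              # heavy-tailed (Pareto)

for waited in (0, 5, 10, 15):
    print(f"waited {waited:2d} min | "
          f"Gaussian: {expected_remaining_wait(gaussian, waited):5.1f} more min | "
          f"heavy-tailed: {expected_remaining_wait(heavy, waited):6.1f} more min")
```

With the Gaussian the expected extra wait keeps shrinking as time passes, so waiting stays sensible; with the Pareto it keeps growing, so at some point giving up (eating the marshmallow) becomes the rational move.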

Yin Li used monkeys to ask how an animal’s learning rate changes depending on the situation. There is no one optimal learning rate: it depends on the environment. If you are tracking a target with little noise until sudden dramatic changes (small variance in between sudden changes in mean), then you want a high learning rate; you are not at risk of being overly responsive to the internal variability of the signal while it is stationary. On the other hand, if there is a very noisy signal whose mean does not change much, then you want a low learning rate. When a monkey is given a task like this, it does about as well as a Bayesian-optimal model. I’m not sure which one he used, though I think this is a problem that has gotten attention in vision (see Wark et al. and DeWeese & Zador). Anyway, when they try to fit a bog-standard Reinforcement Learning model, it cannot fit the data. This riled up the CS people in the audience, who suggested that something called “adaptive learning RL” could have fit the data, a technique I am not aware of. Although Li’s point was that the basic RL algorithm is insufficient to explain the behavior, it also highlights the lack of crosstalk between the two RL kingdoms.
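That intuition is easy to check with a little simulation (my own sketch, not from the talk): a simple delta-rule tracker with a fixed learning rate does better with a high rate in a change-point world and a low rate in a noisy-but-stationary world.

```python
import numpy as np

rng = np.random.default_rng(1)

def tracking_error(signal, alpha):
    """Delta-rule tracker (estimate += alpha * prediction error); returns mean squared error."""
    estimate, sq_errors = 0.0, []
    for obs in signal:
        sq_errors.append((obs - estimate) ** 2)
        estimate += alpha * (obs - estimate)
    return np.mean(sq_errors)

T = 10_000
# World A: quiet signal whose mean jumps abruptly every 500 steps
means = np.repeat(rng.normal(0, 5, size=T // 500), 500)
world_a = means + rng.normal(0, 0.1, size=T)
# World B: one fixed mean buried in heavy observation noise
world_b = 1.0 + rng.normal(0, 5, size=T)

for name, world in [("change-points", world_a), ("stationary + noisy", world_b)]:
    for alpha in (0.05, 0.8):
        print(f"{name:18s} alpha={alpha:.2f}  MSE={tracking_error(world, alpha):7.2f}")
```

A high learning rate wins in world A (it locks onto each new mean quickly) and loses in world B (it chases noise), which is exactly why a single fixed learning rate cannot be optimal across environments.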

Michael Littman gave an absolutely fantastic talk asking how multiple agents should coordinate their behavior. If you use RL, one possibility is just to treat other agents as randomly moving objects…but “that’s a bit autistic”, as Littman put it. Instead, you can do something like minimax or maximin. Then you just need to find the Nash equilibrium! Unfortunately this doesn’t always converge to the correct answer, there can be multiple equilibria, and it requires access to the other agent’s value. Littman suggested that side payments can solve a lot of these problems (I think someone was paging Coase).
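As a toy illustration of the maximin idea (my own sketch, with a made-up zero-sum payoff matrix, not anything from the talk), an agent can simply pick the action whose worst-case payoff is largest:

```python
import numpy as np

# Hypothetical zero-sum game: entries are our payoff; the other agent gets the negative.
# Rows are our actions, columns are the other agent's actions.
payoff = np.array([[3, -2],
                   [1,  0]])

worst_case = payoff.min(axis=1)            # worst payoff each of our actions can be held to
maximin_action = int(worst_case.argmax())  # action 1 here: it guarantees at least 0
print(f"worst-case payoffs: {worst_case}, maximin action: {maximin_action}")
```

In general the maximin strategy may need to be a mixed (randomized) strategy, which is where the equilibrium computations, and their pitfalls, come in.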

Finally, the amazing Alex Kacelnik gave a fascinating talk about parasitism in birds, particularly cuckoos. It turns out that when you take into account the costs of eggs and such, it might actually be beneficial to the host to raise 1-2 parasite eggs; at least, it’s not straightforward that killing the parasites is the optimal decision. Anne Churchland asked whether neurons in the posterior parietal cortex of rats show mixed sensory and decision signals, and then showed that these signals are orthogonal at the level of the population. Paul Phillips gave a very lucid talk detailing the history of dopamine and TD learning. Tom Dietterich showed how reinforcement learning is being used by the government to make decisions about fire and invasive-species control. And Pieter Abbeel showed robots! See, for instance, the Willow Garage PR2 fetching beer (other videos).

Here are some other robots he mentioned.

Some final thoughts:

1. CS people are interested in convergence proofs, etc. But in the end, a lot of their talks were really just them acting as engineers trying to get things to work in the real world. That’s not that far from what psychologists and neuroscientists are doing: trying to figure out why things are working the way that they are.

2. In that spirit, someone in psych/neuro needs to take the leading-edge of what CS people are doing and apply it to human/animal models of decision-making. I’ve never heard of Adaptive Reinforcement Learning; what else is there?

3. At the outset, it would help if they could make clear what the open research questions are for each field. At the end, maybe there could be some discussion of how to get the fields to collaborate more.

4. Invite some economists! They have this whole thing called Decision Theory… and would have a lot to contribute.

 

Risk aversion

[This post is a stub that will be expanded as time goes on and I learn more, or figure out how to present the question better.]

Humans, and many animals, tend to like predictability. When things get crazy, chaotic, unpredictable – we tend to avoid those things. This is called risk aversion: preferring safe, predictable outcomes to unpredictable ones.

Take the choice between a guaranteed $1,000,000 and a 10% chance of $10,000,000 (with a 90% chance of nothing at all). How many people would choose the riskier option? Very few, it turns out. But we aren’t always risk-averse. When animals search for food, they tend to prefer safer areas to riskier ones until they start getting exceptionally peckish. Once starving, animals are often risk-seeking, and are willing to go to great lengths for the chance to get food.

Why are we risk-averse? There are a few reasons. First off, unpredictability means that the information we have about our environment is not as useful, and possibly downright wrong. On the other hand, it may just come from experience. Imagine that you are given the choice between two boxes, each of which will give a reward when opened, and rewards reset when the box is closed. One of these boxes will give you a large reward some of the time and no reward the rest of the time, while the other box will always give you a little reward. Over the long run the two boxes will give you the same amount of reward, but when you start opening them up? You are likely to hit a dry spell from the risky box. Whenever you get no reward from that box, you feel more inclined to open the safer box. This gives you a nice little reward! So now you like this box a little better. Maybe you think it’s a good idea to peek in the risky box now? Ah, foiled again, that box sucks, better stick with the safe box that you know.

This is the basic logic behind the Reinforcement Learning model of risk-aversion as characterized by Yael Niv in 2002 (does anyone know an older reference?).
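Here is a minimal simulation of that story (my own sketch, not the model from the paper): a delta-rule learner with softmax choice, facing a safe box and a risky box with identical average payoffs, ends up choosing the safe box more than half the time.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(q, beta=3.0):
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

def fraction_safe(alpha=0.3, trials=300, runs=1000):
    safe_choices = 0
    for _ in range(runs):
        q = np.zeros(2)                      # learned values: q[0] = safe box, q[1] = risky box
        for _ in range(trials):
            a = rng.choice(2, p=softmax(q))
            # Safe box always pays 1; risky box pays 2 half the time (same mean payoff).
            r = 1.0 if a == 0 else (2.0 if rng.random() < 0.5 else 0.0)
            q[a] += alpha * (r - q[a])       # delta-rule update
            safe_choices += (a == 0)
    return safe_choices / (trials * runs)

# With these made-up parameters the safe fraction typically comes out well above 0.5.
print(f"fraction of safe choices: {fraction_safe():.2f}")
```

The asymmetry comes from sampling: when the risky box is underestimated it gets chosen less, so the underestimate lingers, whereas overestimates get sampled and corrected quickly.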

See also: Ellsberg Paradox, Prospect Theory

Neuroscience is useful: NBA edition

[Photo: Antoine Walker]

Although I wasn’t able to attend it, Yonatan Loewenstein apparently gave a talk at a Cosyne workshop about decision-making and related it to NBA players.  I was curious to find the paper and while ultimately I could not, I did find that he had a different one that was interesting.  One of the most commonly used methods in neuroscience to model learning is reinforcement learning.  In reinforcement learning, you learn from the consequences of your actions; intuitively, a reward will act to reinforce a behavior.  Although inspired by psychological theories of learning, it has gained support in neuroscience from the patterns of activity of dopamine cells which provide exactly the learning error signal you’d expect.

Basketball is a dynamic game where players are constantly evaluating their chance of making a shot, and whether they should pass or put up a 2 or 3 point field goal attempt (FGA).  One of the most contentious issues in basketball (statistics) is the ‘hot hand effect’: if you’ve successfully made a 3 point shot, are you more likely to make the next one?  Maybe it’s just one of those nights where you are on, your form is perfect, and every shot will sink.  Problem is, statistically speaking there is no evidence for it.

[Figure: probability of attempting a 3-pointer, given whether the previous 3-point attempts were made or missed]

But the players sure think that it exists!  Now look at the figure.  Here, the blue line represents how likely a player is to attempt a 3 point field goal if their last (0, 1, 2, 3) shots were made 3 pointers.  In general, players shoot 3 pointers about ~40% of the time.  If they made their last 3 pointer, they now have a ~50% chance of shooting a 3 pointer on their next attempt.  And if they make that one?  They have a 55% chance of shooting a 3 pointer.  Similarly, the red line follows the probability of shooting a 3 pointer if the last few shots were missed 3 pointers.

Okay, so basketball players believe in the hot hand, and act like it.  Why do they act like it?  If we take our model of the learning process, Reinforcement Learning, and apply it to the data, we actually get a great prediction of how likely a player is to shoot a 3 pointer!  The internal machinery that we use for learning the value of an action is also a good model for learning the value of taking a 3 pointer – and making a 3 pointer will only reinforce (get it?) the idea that the next shot should be a 3 pointer!
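As a rough illustration of that machinery (my own toy sketch, not the actual model fit by Neiman & Loewenstein), here is a delta-rule learner whose propensity to attempt a 3 rises after a made 3 and falls after a missed one:

```python
import numpy as np

rng = np.random.default_rng(3)

def shot_selection(alpha=0.2, beta=2.0, n_shots=5000, p_make3=0.35):
    """Toy agent: the value of 'attempt a 3' is nudged up after makes and down after
    misses, and a logistic rule turns that value into the probability of attempting a 3."""
    v = 0.0                                  # learned value of attempting a 3-pointer
    after_make, after_miss = [], []
    last_outcome = None
    for _ in range(n_shots):
        p_attempt3 = 1.0 / (1.0 + np.exp(-beta * v))
        take3 = rng.random() < p_attempt3
        if last_outcome == "make":
            after_make.append(take3)
        elif last_outcome == "miss":
            after_miss.append(take3)
        if take3:
            made = rng.random() < p_make3
            v += alpha * ((1.0 if made else -1.0) - v)   # delta-rule update
            last_outcome = "make" if made else "miss"
        else:
            last_outcome = None              # only condition on the previous 3-point attempt
    return np.mean(after_make), np.mean(after_miss)

p_after_make, p_after_miss = shot_selection()
print(f"P(attempt 3 | last 3 made)   = {p_after_make:.2f}")
print(f"P(attempt 3 | last 3 missed) = {p_after_miss:.2f}")
```

The learner attempts more 3s after a make than after a miss, the same qualitative asymmetry as in the figure above, even though nothing about its true shooting percentage has changed.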

Alas, this type of behavior does not help anything; a player who makes a 3 pointer is 6% less likely to make his next 3 than if he had missed his last 3 pointer.  In fact, if we take our Reinforcement Learning model and see how each player behaves, we can estimate how susceptible that player is to this learning.  Some players won’t change how they shoot (unsusceptible) and some players will learn a lot from each shot, with the history of made and missed shots having huge effects on how likely they are to shoot another 3.  And believe it or not, the players that are least susceptible to this learning are the ones who get the most points out of each 3 point shot.  Unless you are Antoine Walker; then you will just shoot a lot of bad 3 pointers for the hell of it.

Finding non-existent ‘hidden patterns’ in noise is a natural human phenomenon and is a natural outgrowth of learning from past experiences.  So tell your parents!  Learning: not always good for you.

References

Neiman, T., & Loewenstein, Y. (2011). Reinforcement learning in professional basketball players. Nature Communications, 2. DOI: 10.1038/ncomms1580