One of the most popular games in the study of cooperation is the iterated prisoner’s dilemma. In each round, two players choose to cooperate or defect: mutual cooperation is the most beneficial outcome overall, but the best result for a single player is to defect while the other cooperates. The most famously successful strategy is tit-for-tat: cooperate on the first turn, then do whatever your partner did on the previous turn. Two tit-for-tat players will converge onto harmonious cooperation and maximize their reward, while a single tit-for-tat player will avoid being conned into cooperating with a persistent defector.
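Since tit-for-tat comes up throughout, here is a minimal sketch of the game (the payoff values are the conventional ones; the function names and round count are mine):

```python
# Payoffs for (my move, partner's move): mutual cooperation -> 3 each,
# mutual defection -> 1 each, lone defector -> 5, lone cooperator -> 0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):
    """Cooperate on the first move, then copy the partner's last move."""
    return "C" if not history else history[-1]

def play(strategy_a, strategy_b, rounds=10):
    hist_a, hist_b = [], []          # each side's record of the *other's* moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a)  # each strategy sees its opponent's past
        move_b = strategy_b(hist_b)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_b)
        hist_b.append(move_a)
    return score_a, score_b

# Two tit-for-tat players lock into mutual cooperation: 3 points per round.
print(play(tit_for_tat, tit_for_tat, rounds=10))  # -> (30, 30)
```

Against a persistent defector, a tit-for-tat player loses only the first round and then matches defection for defection.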
William H. Press and colleague Freeman Dyson (!) have found a new solution to the iterated prisoner’s dilemma. Their paper has been covered in detail well elsewhere and has some very good commentary by the authors, so I won’t spend a ton of time explaining it. Basically, within this class of strategies you can find one that unilaterally sets your opponent’s average score, and another that enforces a fixed linear relationship between your score and your opponent’s. By combining these two, you get what they call the ‘extortionate’ strategy, where you try to extort as much as possible from your opponent.
This set of strategies has two parameters; one of these parameters (“χ”) measures how much you want to extort from your opponent. The other (“φ”) is a bit unclear, but I think we might be able to (kind of) give it an intuition a bit later on. An interesting point to note is that the tit-for-tat strategy is one case of this class of extortionate strategies: when χ is set to its minimum (1, indicating fairness instead of total extortion) and the mysterious φ is set to its maximum, you get the tit-for-tat strategy.
I was curious, what happens when a bunch of extortionate strategies get together and duke it out? What’s the best way to extort an extortionist?
Let’s look at the math of the strategy; feel free to skip this next paragraph if you wish. Here, for example, is the probability that a player should cooperate if on the previous turn they cooperated (“c”) and their opponent defected (“d”):

p_cd = 1 − φ[1 + χ(T − P)/(P − S)]

The traditional payoff is (R, S, T, P) = (3, 0, 5, 1), so we can simplify to:

p_cd = 1 − φ(1 + 4χ)

And if we want fairness, set χ = 1 (we’ll come back to this later):

p_cd = 1 − 5φ

For this to stay a valid probability, φ has allowed values between 0 and 1/5.
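Putting all four conditional cooperation probabilities together gives the whole extortionate strategy. Here is a quick sketch (the function name and defaults are mine, and the formulas are my transcription of the paper’s, so double-check against the original):

```python
def extortionate_strategy(chi, phi, R=3, S=0, T=5, P=1):
    """Cooperation probabilities after the four outcomes (cc, cd, dc, dd),
    following Press & Dyson's extortionate family.
    chi >= 1 is the extortion factor (1 = perfectly fair);
    phi is a scale factor that must keep every entry in [0, 1]."""
    p_cc = 1 - phi * (chi - 1) * (R - P) / (P - S)
    p_cd = 1 - phi * (1 + chi * (T - P) / (P - S))
    p_dc = phi * (chi + (T - P) / (P - S))
    p_dd = 0.0
    return (p_cc, p_cd, p_dc, p_dd)

# Fairness (chi = 1) with phi at its maximum of 1/5 recovers tit-for-tat:
# cooperate whenever the opponent just cooperated, defect whenever they defected.
print(extortionate_strategy(1, 1/5))  # -> (1.0, 0.0, 1.0, 0.0)
```

Note that the probability of cooperating after mutual defection is always zero in this family, which will matter later.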
Okay, back from the math! We can figure out what the best strategies are by using a genetic algorithm. Basically, every generation extorters compete against each other many times, and the top 20% are selected to breed the next generation. But breeding isn’t always a perfect copy: there is a 3% mutation rate, which lets novel strategies enter the population. Let’s see what the average reward across time is (and, err, divide by 1000):
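This isn’t my actual simulation code, but the loop can be sketched roughly like this (the population size, the number of opponents sampled per fitness evaluation, the sampling ranges for χ and φ, and restarting every pairing from mutual cooperation are all my assumptions):

```python
import random

def strategy_probs(chi, phi, R=3, S=0, T=5, P=1):
    """Cooperation probabilities after (cc, cd, dc, dd): Press & Dyson's
    extortionate family, repeated here so this snippet stands alone."""
    return (1 - phi * (chi - 1) * (R - P) / (P - S),
            1 - phi * (1 + chi * (T - P) / (P - S)),
            phi * (chi + (T - P) / (P - S)),
            0.0)

def random_extorter():
    chi = random.uniform(1, 5)
    phi = random.uniform(0, 1 / (1 + 4 * chi))  # keeps all probs in [0, 1]
    return (chi, phi)

def play(pa, pb, rounds=200):
    """Player A's average payoff per round. States are (cc, cd, dc, dd)
    from A's point of view; both players start by cooperating."""
    payoff_a = (3, 0, 5, 1)
    state, total = 0, 0
    for _ in range(rounds):
        a = random.random() < pa[state]
        b = random.random() < pb[(0, 2, 1, 3)[state]]  # B sees cd/dc swapped
        state = (0 if b else 1) if a else (2 if b else 3)
        total += payoff_a[state]
    return total / rounds

def evolve(pop_size=50, generations=100, mutation_rate=0.03):
    pop = [random_extorter() for _ in range(pop_size)]
    for _ in range(generations):
        scores = []
        for ind in pop:  # fitness: mean score against a few random opponents
            opps = random.sample(pop, 5)
            scores.append((sum(play(strategy_probs(*ind), strategy_probs(*o))
                               for o in opps) / 5, ind))
        scores.sort(reverse=True)
        parents = [ind for _, ind in scores[:pop_size // 5]]  # top 20% breed
        pop = [random_extorter() if random.random() < mutation_rate
               else random.choice(parents) for _ in range(pop_size)]
    return pop
```

The final population is a list of surviving (χ, φ) pairs, which is what the histograms below are drawn from.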
Aha! Something happened there at generation ~350! If you run the genetic algorithm for longer, it stays at this value. It seems pretty clear that there’s one strategy that’s superior to the others. And in fact, it’s the fairness strategy: χ = 1 dominates all the others in this model! In other words, even if you are trying to extort as much as possible from other players, the best extortionate strategy against other extorters is to be perfectly fair!
But we don’t get back pure tit-for-tat; there’s that messy φ parameter to worry about. At the end of 20,000 generations, let’s see what the distribution of this value is, scaled so that its minimum is 0 and its maximum (φ = 1/5, tit-for-tat) is 1:
It’s all over the place! If you rerun the simulation again and again, you get a different distribution of these values, but they always seem to be >0.3 and never settle at 1 (tit-for-tat)! We can measure how diverse the distribution is with the entropy of the possible states:
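By entropy I mean the Shannon entropy of the binned φ values, which can be sketched like so (the bin count is my choice; the original plot may have used a different one):

```python
import math
from collections import Counter

def entropy_bits(values, bins=16):
    """Shannon entropy, in bits, of a sample of values in [0, 1],
    after binning into equal-width bins: a rough diversity measure.
    With 16 bins the maximum possible entropy is log2(16) = 4 bits."""
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A population concentrated on a single strategy has zero bits of diversity,
# while a population spread evenly over all bins has the maximum.
print(entropy_bits([0.5] * 100))                 # everyone identical
print(entropy_bits([i / 16 for i in range(16)]))  # -> 4.0
```

Roughly speaking, k bits of entropy corresponds to about 2^k equally important strategies, which is how I read the numbers below.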
What this is basically showing is that after the initial random set of values, the distribution of strategies oscillates up and down around ~2-3 bits, or something like 4-8 strategies of relative importance. Sometimes one will start to be more successful against the others, sending diversity slowly down, until new strategies evolve against it, sending diversity back up. But fairness always wins.
(As an interesting side-note: the paper provides a formula for estimating the expected reward when two strategies compete; when I pit two strategies with χ = 1 against each other, I get a singularity (“infinity”?)…am I doing something wrong or what’s going on…?)
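For anyone who wants to poke at this, here is the expected-reward formula as I read it: the paper expresses the score as a ratio of two 4×4 determinants, D(p, q, f)/D(p, q, 1). The matrix below is my transcription, so treat it with suspicion, but it passes a sanity check and reproduces the singularity:

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def press_dyson_score(p, q, f):
    """Expected per-round payoff of player X via the determinant formula
    D(p, q, f) / D(p, q, 1).  p and q are the two players' cooperation
    probabilities after (cc, cd, dc, dd); f is X's payoff vector."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    def D(f1, f2, f3, f4):
        # q's middle entries appear swapped because the opponent
        # sees "cd" as "dc" from its own point of view.
        return det([[-1 + p1 * q1, -1 + p1, -1 + q1, f1],
                    [p2 * q3,      -1 + p2, q3,      f2],
                    [p3 * q2,      p3,      -1 + q2, f3],
                    [p4 * q4,      p4,      q4,      f4]])
    return D(*f) / D(1, 1, 1, 1)

# Sanity check: two purely random players average (3 + 0 + 5 + 1)/4 = 2.25.
rand = (0.5, 0.5, 0.5, 0.5)
print(press_dyson_score(rand, rand, (3, 0, 5, 1)))  # -> 2.25

# Two tit-for-tat (chi = 1, phi = 1/5) players: the denominator D(p, q, 1)
# is exactly zero, so the formula is 0/0 -- a genuine singularity.
try:
    press_dyson_score((1, 0, 1, 0), (1, 0, 1, 0), (3, 0, 5, 1))
except ZeroDivisionError:
    print("singular: D(p, q, 1) = 0")
```

One plausible reading (an assumption on my part): with χ = 1 both players have p_cc = 1 and p_dd = 0, so both full cooperation and full defection are absorbing, the Markov chain has no unique stationary distribution, and the determinant ratio blows up.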
So what is this mystery parameter? With pure tit-for-tat, two players can get locked into alternating defect-cooperate cycles, which is less beneficial than everyone cooperating all the time. By backing φ off its maximum, maybe you can push each other out of that cycle and into the beneficial cycle of full cooperation. That would suggest that φ represents a search or exploration parameter. My intuition for this strategy is now that its two parameters split cleanly: χ represents fairness and φ represents sociality. Although fairness is best, exploring your options and understanding your opponent is also critical…to being an extortionist.
Update: See this post which is much more informative than mine! It explains all…
Press WH, & Dyson FJ (2012). Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences of the United States of America, 109 (26), 10409-13 PMID: 22615375