
We live in a world with other people. Almost every decision we make involves someone else in one way or another, and we face a constant choice about just how much to trust the person on the other side of that decision. Should we take advantage of them, go for the quick score and hope we never see them again – or should we settle for a more reasonable reward, cooperating in the hope that this peaceful relationship will continue long into the future?
We see decisions of this type everywhere, but the best strategy for deciding how to act is far less obvious. The Golden Rule tells us to “do unto others as you would have them do unto you”. It seems rather naive at first glance, but if we run the numbers, we find something quite amazing.
A Dilemma
To study these types of decisions, we first have to define exactly what we’re talking about. Let’s say a “dilemma” involves two people, each of whom can independently decide to work with the other for a shared reward or to screw the other one over and take it all for themselves. If you both decide to work together, you both get a medium-sized reward. If you decide to take advantage of someone but they trust you, you get a big reward (and the other person gets nothing). If you’re both jerks and try to take advantage of each other, you both get a tiny fraction of what you could have had. Let’s call these two people Alice and Bob – here’s a table to make things a bit clearer.
| | Alice cooperates | Alice defects |
|---|---|---|
| Bob cooperates | Everyone wins! A medium-sized reward to both for mutual cooperation. | Poor Bob. He decided to trust Alice, who screwed him and got a big reward. Bob gets nothing. |
| Bob defects | Poor Alice. She decided to trust Bob, who took advantage of her and got a big reward. Alice gets nothing. | No honour among thieves… both Bob and Alice take the low road and fight over the scraps of a small reward. |
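This particular ordering of rewards (exploiting a trusting partner pays best, then mutual cooperation, then mutual defection, with being exploited worst of all) is known as the Prisoner’s Dilemma. It was formalized and studied by Melvin Dresher and Merrill Flood in 1950, while they were working for the RAND Corporation.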
Sale, One Day Only!
Now, of course, the question is: if you’re in this situation, what is the best thing to do? First, suppose that we’re never, ever going to see this other person again. This is a one-time deal. Absent any moral consideration, the most profitable option is to try to take advantage of the other person and hope they are clueless enough to let you: capitalism at its finest. Whatever the other person does, defecting leaves you better off. If they cooperate, defecting earns you the big reward instead of the medium one; if they defect, it earns you the small reward instead of nothing. You could try to cooperate, but that leaves you open to being screwed by the other party. So if each person is rational and acts in their own interest, each will try to one-up the other.

But there’s just one problem – if both people act this way, they both get much less than they would if they had simply cooperated. This seems very strange, because the economic models that banks and other institutions use to predict human behavior assume exactly this type of logic: the model of the rational consumer. Yet when both parties take this approach, it leads to nearly the worst possible outcome.
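The dominance argument above is easy to check with numbers. The sketch below is a minimal Python illustration that assumes the point values from Axelrod’s tournament (5 for exploiting a cooperator, 3 each for mutual cooperation, 1 each for mutual defection, 0 for being exploited); the one-shot story only fixes their order, and the names `PAYOFF` and `best_reply` are hypothetical.

```python
# Payoffs for "me", keyed by (my_move, their_move).
# Assumption: Axelrod's tournament values; any payoffs in the same order
# give the same best replies.
C, D = "C", "D"
PAYOFF = {
    (D, C): 5,  # I exploit a cooperator: big reward
    (C, C): 3,  # mutual cooperation: medium reward
    (D, D): 1,  # mutual defection: small reward
    (C, D): 0,  # I am exploited: nothing
}


def best_reply(their_move):
    """Return the move that maximizes my payoff against a fixed move by the other player."""
    return max((C, D), key=lambda mine: PAYOFF[(mine, their_move)])


# Whatever the other player does, defecting pays more...
for theirs in (C, D):
    print(f"If they play {theirs}, my best reply is {best_reply(theirs)}")

# ...yet when both players follow that logic, each ends up worse off.
print(f"Both defect: {PAYOFF[(D, D)]} each; both cooperate: {PAYOFF[(C, C)]} each")
```

Running it reports `D` as the best reply in both cases, while mutual defection pays 1 point each against 3 for mutual cooperation.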
It seems that there is no clear ideal strategy for a one-time deal. Each choice leaves you open to losses in a different way. At this point it’s easy to throw up your hands, leave logic behind, and take a moral stance. You’ll cooperate because you’re a good person – or you’ll take advantage of the suckers because life just isn’t fair.
And this appears to leave us where we are today – some good people, some bad people, and the mythical invisible hand of the market to sort them all out. But there’s just one little issue. We live in a world with reputations, with friends, and with foes – there are no true “one time” deals. The world is small, and people remember.
In it for the Long Run
So instead of thinking of a single dilemma, let’s think about what we should do if we get to play this game more than once. If someone screws you in the first round, you’ll remember – and probably won’t cooperate the next time. If you find someone who always cooperates, you can join them and work together for your mutual benefit – or decide that they’re an easy mark and take them for everything they’ve got.

But what is the best strategy? To find out, Robert Axelrod decided in 1980 to hold a contest. He sent out the word, and game theorists, scientists, and mathematicians all submitted entries to a battle royale to determine which strategy was best.
Each entry was a computer program implementing a specific strategy for playing this dilemma repeatedly against the other entries. Each pair of programs played the dilemma for 200 rounds, deciding each round whether to cooperate or defect. Scoring worked like this: five points for a successful deception (you defect, they cooperate), three points each for mutual cooperation, one point each if you both tried to screw each other (mutual defection), and no points if you were taken advantage of (you cooperate, they defect). Each program played every other program as well as a copy of itself, and the program with the largest total score over all the rounds won.
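The scoring rules above translate into a short round-robin. This is a minimal Python sketch, not Axelrod’s actual tournament code: it assumes each strategy is a function that receives its own move history and its opponent’s and returns `"C"` or `"D"`, and the names `play_match` and `round_robin` are hypothetical. How self-play was credited is also an assumption; here an entry banks one side’s score from its match against its own copy.

```python
from itertools import combinations_with_replacement

C, D = "C", "D"
# Points for (my_move, their_move), as in Axelrod's tournament.
PAYOFF = {(D, C): 5, (C, C): 3, (D, D): 1, (C, D): 0}
ROUNDS = 200


def play_match(strategy_a, strategy_b, rounds=ROUNDS):
    """Play one iterated match and return (score_a, score_b).

    A strategy is any function (my_history, their_history) -> "C" or "D";
    this interface is an assumption made for the sketch.
    """
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(entries):
    """Each entry plays every other entry once and a copy of itself once.

    entries maps a name to a strategy function; returns the total score per name.
    """
    totals = {name: 0 for name in entries}
    for name_a, name_b in combinations_with_replacement(entries, 2):
        score_a, score_b = play_match(entries[name_a], entries[name_b])
        totals[name_a] += score_a
        if name_a != name_b:
            # Self-play credits only one side, so an entry isn't counted twice.
            totals[name_b] += score_b
    return totals


# Usage: a two-entry field with the simplest possible strategies.
entries = {"ALL-C": lambda mine, theirs: C, "ALL-D": lambda mine, theirs: D}
print(round_robin(entries))  # {'ALL-C': 600, 'ALL-D': 1200}
```

In this tiny field ALL-D finishes ahead, because ALL-C has no way to respond to being exploited.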
So what might some very simple programs look like?
ALL-C (always cooperate) is exactly what it sounds like. Cooperation is the only way, and this program never gets tired of being an upstanding guy.
ALL-D (always defect) is the counterpoint to this, and has a single goal. No matter what happens, always, always, always try to screw the other person over.
RAND is the lucky dunce – don’t worry too much, just decide to cooperate or defect at random.
You can predict how these strategies might do if they played against each other. Two ALL-C strategies would endlessly cooperate in a wonderful dance of mutual benefit. Two ALL-D strategies would continually fight, endlessly grinding against each other and gaining little. ALL-C pitted against ALL-D would fare about as well as a fluffy bunny in a den of wolves – eternally cooperating and hoping for reciprocation, but always getting the shaft with ALL-D profiting.
So an environment of ALL-C would be a cooperative utopia – unless a single ALL-D strategy came in, and started bleeding them dry. But an environment entirely made of ALL-D would be a wasteland – no one would have any success due to constant fighting. And the RAND strategy is literally no better than a coin flip.
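These matchups can be played out directly. The Python sketch below assumes Axelrod’s point values and 200-round matches; the function names `all_c`, `all_d`, `rand` and `play_match` are hypothetical, and RAND uses a fixed seed only so that runs are repeatable.

```python
import random

C, D = "C", "D"
# Points for (my_move, their_move), as in Axelrod's tournament.
PAYOFF = {(D, C): 5, (C, C): 3, (D, D): 1, (C, D): 0}


def all_c(my_history, their_history):
    return C  # always cooperate, no matter what


def all_d(my_history, their_history):
    return D  # always defect, no matter what


_rng = random.Random(1980)  # fixed seed so runs are repeatable


def rand(my_history, their_history):
    return _rng.choice((C, D))  # a coin flip every round


def play_match(a, b, rounds=200):
    """Play an iterated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


for a, b in [(all_c, all_c), (all_d, all_d), (all_c, all_d), (rand, all_c), (rand, all_d)]:
    print(f"{a.__name__} vs {b.__name__}: {play_match(a, b)}")
```

The deterministic pairings come out exactly as described: 600 points each for two ALL-C players, 200 each for two ALL-D players, and 0 against 1,000 when ALL-D meets ALL-C. The RAND results depend on the seed.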
Time to Think
So what should we do? Those simple strategies don’t seem very good at all. If we think about it, however, there’s a reason they do so poorly: they don’t remember. No matter what the other side does, they’ve already made up their minds. Intelligent strategies remember their opponents’ previous actions and act accordingly. Most of the programs submitted to Axelrod’s competition incorporated some sort of memory. For instance, if you can figure out that you’re playing against ALL-C, it’s time to defect. Just like people in the real world, these programs tried to build up some notion of “reputation” that would let them act in the most productive manner.
And so Axelrod’s competition was on. Programs from all over the world competed against each other, each trying to maximize its own score. Some of the top minds in this new field had implemented a wide variety of strategies. Disk drives chattered, monitors flickered, and eventually a champion was crowned.
And the Winner Is…

When the dust settled, the winner was clear – and the victory was both surprising and inspiring. At first glance the eventual champion seemed to be a 90 lb weakling: a mere four lines of code submitted by Anatol Rapoport, a mathematical psychologist from the University of Toronto. It was called “Tit-for-Tat”, and it did exactly that. It cooperated on the first move of every game, and from then on did exactly what the other player had done on the previous move. It cooperated with the “nice” strategies, butted heads with the “mean” strategies, and managed to finish ahead of far more complex approaches.
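Tit-for-Tat’s rule is short enough to write out in full. This is not Rapoport’s original code; the Python sketch below only mirrors the rule described above, assuming Axelrod’s point values, and the function names and the `(my_history, their_history)` interface are hypothetical.

```python
C, D = "C", "D"
# Points for (my_move, their_move), as in Axelrod's tournament.
PAYOFF = {(D, C): 5, (C, C): 3, (D, D): 1, (C, D): 0}


def tit_for_tat(my_history, their_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    if not their_history:
        return C
    return their_history[-1]


def always_cooperate(my_history, their_history):
    return C


def always_defect(my_history, their_history):
    return D


def play_match(a, b, rounds=200):
    """Play an iterated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


for opponent in (always_cooperate, always_defect, tit_for_tat):
    print(f"tit_for_tat vs {opponent.__name__}: {play_match(tit_for_tat, opponent)}")
```

Against ALL-C it cooperates for all 200 rounds (600 points each). Against ALL-D it is exploited only in the first round and then defects back, finishing 199 to 204. Against a copy of itself it cooperates throughout (600 points each).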
The simplest and shortest strategy won, a program that precisely enforced the Golden Rule. But what exactly made Tit-for-Tat so successful? Axelrod analyzed the results of the tournament and came up with a few principles of success.
- Don’t get greedy. Tit-for-Tat can never score more than the opponent it is playing. But it never allows itself to take a beating either, and it skips the brutal losses of two “evil” strategies grinding against each other. It actively seeks out win-win situations instead of gambling on the higher payoff.
- Be nice. The single best predictor of whether a strategy would do well was whether it was never the first to defect. Some entries tried to emulate Tit-for-Tat with a twist, throwing in the occasional defection to boost their score. It didn’t work.
- Reciprocate, and forgive. Other programs tended to cooperate with Tit-for-Tat since it consistently rewarded cooperation and punished defection. And Tit-for-Tat easily forgives – no matter how many defections it has seen, if a program decides to cooperate, it will join them and reap the rewards.
- Don’t get too clever. Tit-for-Tat is perfectly transparent, and other programs quickly learn that it is very, very difficult to exploit. There are no secrets and no hypocrisy, and Tit-for-Tat gets along very well with itself, unlike strategies biased toward deception.
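Two of these principles, not getting greedy and forgiving, can be checked directly. The Python sketch below assumes Axelrod’s point values and a hypothetical `(my_history, their_history)` strategy interface; the opponents `random_opponent` and `reformed_defector` are invented for illustration and are not entries from the tournament.

```python
import random

C, D = "C", "D"
# Points for (my_move, their_move), as in Axelrod's tournament.
PAYOFF = {(D, C): 5, (C, C): 3, (D, D): 1, (C, D): 0}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else C


def random_opponent(seed, p_cooperate):
    """Hypothetical opponent that cooperates with probability p_cooperate each round."""
    rng = random.Random(seed)
    return lambda my_history, their_history: C if rng.random() < p_cooperate else D


def reformed_defector(my_history, their_history):
    """Hypothetical opponent: defects for its first 10 moves, then cooperates forever."""
    return D if len(my_history) < 10 else C


def play_match(a, b, rounds=200):
    """Return (score_a, score_b, moves_a) for an iterated match."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, hist_a


# Don't get greedy: Tit-for-Tat never outscores its opponent, and it trails
# by at most one exploited round (5 points), however the opponent plays.
margins = set()
for seed in range(1000):
    tft, opp, _ = play_match(tit_for_tat, random_opponent(seed, seed / 1000))
    margins.add(opp - tft)
print("Opponent's lead over Tit-for-Tat:", sorted(margins))

# Reciprocate, and forgive: once the opponent reforms, Tit-for-Tat
# cooperates again one round later and never brings up the past.
_, _, tft_moves = play_match(tit_for_tat, reformed_defector)
print("Tit-for-Tat's moves 10-14:", "".join(tft_moves[9:14]))  # DDCCC
```

The printed margins are only ever 0 or 5. Each time the opponent switches from cooperating to defecting, Tit-for-Tat is exploited once, and each switch back is repaid once, so Tit-for-Tat ends exactly 5 points behind when the opponent’s last move was a defection and level otherwise.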
The contest attracted so much attention that a second one was organized, and this time every single entry was aware of the strategy and success of Tit-for-Tat. Sixty-three new entries arrived, all gunning for the top spot. And once again, Tit-for-Tat rose to the top. Axelrod used the results of these tournaments to develop ideas about how cooperative behaviour could evolve naturally, and eventually wrote a bestselling book called The Evolution of Cooperation.
But his biggest accomplishment may be showing us that being nice does pay off – and giving us the numbers to prove it.