I’ve been working on an objective ranking system lately that could be applied to groups with large numbers of individual competitors, like the sport of mixed martial arts (MMA). The biggest issue compared to typical ranking systems is that there are so many participants that they cannot all compete against each other in a round-robin tournament or similar within a reasonable time frame. In order to calculate a global ranking, all of the players must be compared against each other through a sort of “six degrees of separation” style comparison, which is vulnerable to bias and calculation error.

This problem has already been solved in the chess world with the Elo rating system, a statistical approach that requires frequent competition in order to generate statistically significant results. Unfortunately competitors in sports like mixed martial arts or boxing do not compete nearly as frequently as chess players (for obvious reasons) and this approach drowns in a sea of statistical noise. Typically combat sport rankings are done by a knowledgeable observer by hand, through consensus of many observers, or by models with a large number of tunable parameters. It is very interesting to consider that humans appear to be able to easily determine who should be ranked highly, and that many algorithmic approaches largely match these evaluations but make some seemingly obvious mistakes. My goal was to find an approach that produced rankings that seemed sensible to a human observer with a minimum of tunable parameters (preferably none).

Data Structure

The initial step is to structure our data in a sensible way. We have a large number of participants, connected by individual competition which can either result in a win or a loss. One way of structuring this data would be in a directed graph, where competitors are represented by nodes and matches as edges with direction defined by who wins or loses. We seem to be focused on losses (or win/loss ratio) as the biggest factor – a competitor with 40 wins and zero losses is typically regarded as better than a competitor with 60 wins and 20 losses. Let’s set the direction of the edge from the losing competitor to the winning competitor. A “good” competitor’s node will therefore have many incoming edges and few outgoing edges, and tend to be at the center of force-directed graph layouts.

Evaluation Algorithms

There are many possible evaluation algorithms which will produce a ranking from this data structure. After many trials, two appeared to stand above the rest.

1. The first is recommended in the journal article Ranking from unbalanced paired-comparison data by H.A. David published in Volume 74, Issue 2 of Biometrika in 1987.
2. David also discussed the Kendall-Wei algorithm in his paper, of which Google’s PageRank algorithm is a special case. PageRank is used to rank webpages which are represented as a directed graph based on the concept of network flow, and may also be applied to other directed graphs including our case. The PageRank algorithm contains one tunable parameter, a damping factor which is currently set to the default 0.85.

It was found that both algorithms seemed to emphasize different aspects important to MMA ranking. David’s “Unbalanced Pair Comparison” emphasized a grittier statistics-based approach, highlighting fighters such as Anderson Silva, Rashad Evans, and Jon Fitch. Google’s PageRank seemed to take a more social approach emphasizing fighters with a wide range of quality opponents, like Georges St-Pierre, Matt Hughes, and Forest Griffin. It was very interesting how one algorithm appeared to highlight the “hardcore mma fan” perspective, while the other seemed to be pulled straight from the UFC head office.

It was decided that both would be calculated, scores normalized, and used in combination to generate a consensus ranking similar to consensus rankings generated from human experts. This was inspired by IBM’s Watson which uses a consensus of multiple algorithms to evaluate answers to trivia questions. Two possible improvements are hypothesized but undertested:

1. Perhaps additional independent ranking algorithms incorporated in this consensus would improve accuracy. The big issue appears to be “independent” algorithms which do not simply restate the work of other algorithms, and of those, finding algorithms which display ranking behavior useful for our application.
2. Unlike Watson, confidence levels are not used. This would be a useful addition given situations like extreme upsets. A newer beta version of this ranking system determines if highly ranked fighters coincide with centrality metrics in an attempt to implement this, but is not complete at this time of this post.

Results

The ranking system was run on every UFC event from UFC 1 (November 12, 1993) to Fight for the Troops 2 (January 22, 2011). Both algorithms are shown ranked alone for comparison, and their scores were equally weighted to produce the final results.

 Lightweight (155lbs) Overall Rank PageRank Unbalanced Pair 1. Gray Maynard 1. B.J. Penn 1. Gray Maynard 2. B.J. Penn 2. Gray Maynard 2. George Sotiropolous 3. Frankie Edgar 3. Frankie Edgar 3. Frankie Edgar 4. George Sotiropolous 4. Kenny Florian 4. Jim Miller 5. Jim Miller 5. Joe Lauzon 5. Nik Lentz

First up are the lightweights – and the results aren’t too shabby. No one seems to want to admit it due to his sometimes snooze-inducing style, but Gray Maynard is a beast who is likely to cause B.J. Penn significant issues if they ever fought. Frankie Edgar deserves to be right up there but not number one, and chronically underrated George Sotiropolous and Jim Miller round out the pack.

 Welterweight (170lbs) Overall Rank PageRank Unbalanced Pair 1. Georges St-Pierre 1. Georges St-Pierre 1. Matt Hughes 2. Matt Hughes 2. Matt Hughes 2. Josh Koscheck 3. Josh Koscheck 3. Matt Serra 3. Georges St-Pierre 4. Martin Kampmann 4. Dennis Hallman 4. Martin Kampmann 5. Dennis Hallman 5. Martin Kampmann 5. Rick Story

Georges St-Pierre is the obvious frontrunner at 170. Matt Hughes at number two is a bit more debatable, but a long title reign and consistent quality opposition provide a reasonable rationale. Josh Koscheck is perpetually always the bridesmaid, never the bride at third, and Martin Kampmann and Dennis Hallman round out a somewhat thin division.

Middleweight (185lbs)
Overall Rank PageRank Unbalanced Pair
1. Anderson Silva 1. Anderson Silva 1. Anderson Silva
2. Jon Fitch 2. Jon Fitch 2. Jon Fitch
3. Yushin Okami 3. Vitor Belfort 3. Yushin Okami
4. Michael Bisping 4. Nate Marquardt 4. Michael Bisping
5. Nate Marquardt 5. Yushin Okami 5. Demian Maia

Anderson Silva provides another easy choice for number one at 185lbs. Both Jon Fitch and Yushin Okami deserve their spots with a consistent if slightly dull record. Michael Bisping has slowly been grinding his way up the charts, and Nate Marquardt rounds out the top five.

Light Heavyweight (205lbs)
Overall Rank PageRank Unbalanced Pair
2. Lyoto Machida 2. Lyoto Machida 2. Jon Jones
4. Quinton Jackson 4. Quinton Jackson 4. Lyoto Machida
5. Mauricio Rua 5. Mauricio Rua 5. Thiago Silva

Rashad Evans appears to have made a sensible call waiting for his title shot at UFC 128. The hypercompetitive light heavyweight division is always a tough one to call. A split in the consensus between the two algorithms produces a top five that seems to emphasize number of fights in the Octagon, with champion Mauricio “Shogun” Rua a surprising fifth. Too early to call Evans over Rua? Only time will tell.

Heavyweight (265lbs)
Overall Rank PageRank Unbalanced Pair
1. Frank Mir 1. Frank Mir 1. Frank Mir
2. Cain Velasquez 2. Brock Lesnar 2. Junior Dos Santos
3. Junior Dos Santos 3. Cain Velasquez 3. Cain Velasquez
4. Brock Lesnar 4. Antonio Rodrigo Nogueira 4. Cheick Kongo
5. Shane Carwin 5. Shane Carwin 5. Brendan Schaub

I initially disagreed with Frank Mir as number one here – Cain Velasquez seems to be the obvious choice. But the ranking process seems to trust number of fights over new hype, and the rest of the top five is bang on what I would choose. You can’t win them all – or perhaps I’m just being unfair to Frank Mir.

Conclusions

The approach produced excellent rankings from UFC-only data, largely coinciding with established and more complete authorities like FightMatrix. The approach used two ranking algorithms which traversed a directed graph, and produced scores which were normalized and added to produce a final score which was sorted to produce final rankings. One tunable parameter (PageRank damping factor) exists in the model, but was left at the default value of 0.85. Further work will focus on additional ranking algorithms which may be incorporated into the consensus, parametric analysis of the PageRank damping factor, and determining confidence scores.