Summer Cleaning

It’s been a while since I’ve posted on here – I’m currently in the process of cleaning up the site and updating the look and feel. I’ve made the decision not to delete any old content despite how awkward it may appear now. It’s a bit crazy to realize that this site has been online, in one form or another, for over nine years.

Please let me know if the changes break any functionality. In the meantime, here are some Lorenz attractor renders I completed when I got my new computer. Plenty more points than the old renders… gotta use that horsepower for something.

[Figure: Lorenz attractor renders]

The Theis Equation and Flow

Mathematics is remarkably effective at describing the physical world, in part because of isomorphisms: relationships between concepts that reveal a shared underlying structure. In 1935 Charles Vernon Theis was working on groundwater flow, a subject with little mathematical treatment at the time. He thought that a well tapping a confined aquifer might be described using the same mathematics as a thin wire drawing heat from a large plate, a problem that was already well established. With a little help from C. I. Lubin, and by mapping the parameters describing underground water flow onto those describing heat flow in solid materials, he developed the Theis equation, which is used to this day to model the response of a confined aquifer to pumping over time.
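
For reference, the relationship Theis arrived at is usually written as follows (this is the standard textbook form, not something specific to this post):

    s(r, t) = \frac{Q}{4 \pi T} W(u), \qquad u = \frac{r^2 S}{4 T t}, \qquad W(u) = \int_u^{\infty} \frac{e^{-x}}{x} \, dx

where s is the drawdown at distance r from the well after pumping time t, Q is the pumping rate, T is the transmissivity, S is the storativity, and W(u) is the exponential integral, or “well function”.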

Using Processing, I developed a small program that visualizes the potentiometric surface of a confined aquifer subject to pumping. This particular example uses aquifer and pumping parameters from a Geo-Slope whitepaper.

The source code may be downloaded here. All values, including the aquifer, pumping, visualization, and numerical parameters, may be varied to apply to a wide variety of situations. The exponential integral (or “well function”) is calculated using a numerical approximation accurate to at least 1 part in 10,000,000.
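
If you just want drawdown numbers rather than the visualization, a minimal Python sketch is below. It is not the Processing source described above: SciPy’s exp1 is used for the exponential integral E1(u), which equals the well function W(u), and the parameter values at the bottom are illustrative placeholders rather than the Geo-Slope values.

    from math import pi
    from scipy.special import exp1  # E1(u), equal to the Theis well function W(u)

    def theis_drawdown(r, t, Q, T, S):
        """Drawdown s(r, t) around a fully penetrating well in a confined aquifer.

        r: distance from the well [m]    t: time since pumping began [s]
        Q: pumping rate [m^3/s]    T: transmissivity [m^2/s]    S: storativity [-]
        """
        u = r ** 2 * S / (4.0 * T * t)
        return Q / (4.0 * pi * T) * exp1(u)

    # Illustrative values only: drawdown 50 m from the well after one day of pumping.
    print(theis_drawdown(r=50.0, t=86400.0, Q=0.01, T=5e-4, S=2e-4))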

[Figure: lorenz-attractor-1600-04]

Convergence in the Lorenz Attractor

Most visualizations of the Lorenz attractor show the long history of a single point after it has converged to the attractor. I was interested in what the surrounding space looked like, so I randomly selected 20,000 starting points from a three-dimensional Gaussian distribution with a standard deviation of 100. Each point was iterated forward, with a short history displayed as a trail.
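
A rough sketch of the experiment, assuming the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3) and a vectorized fourth-order Runge-Kutta step – the step size and iteration counts are guesses, and starting points far from the origin may need smaller steps:

    import numpy as np

    sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0

    def lorenz(p):
        x, y, z = p[:, 0], p[:, 1], p[:, 2]
        return np.column_stack((sigma * (y - x), x * (rho - z) - y, x * y - beta * z))

    def rk4_step(p, dt):
        k1 = lorenz(p)
        k2 = lorenz(p + 0.5 * dt * k1)
        k3 = lorenz(p + 0.5 * dt * k2)
        k4 = lorenz(p + dt * k3)
        return p + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    rng = np.random.default_rng(0)
    pts = rng.normal(0.0, 100.0, size=(20000, 3))  # Gaussian starting points, sd = 100

    trail = [pts]
    for _ in range(500):               # keep a short history per point
        pts = rk4_step(pts, 0.001)
        trail.append(pts)
    # Drawing each point's last few snapshots as a fading line gives the trails
    # described above.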

Interestingly enough, the points do not simply fall in from arbitrary directions as they would in a gravity field; instead they show structure, swirling along a clear path up the z axis.

[Figure: mma-directed-graph]

Graph Theory, Algorithmic Consensus, and MMA Ranking

I’ve been working lately on an objective ranking system that could be applied to groups with large numbers of individual competitors, like the sport of mixed martial arts (MMA). The biggest issue compared to typical ranking systems is that there are so many participants that they cannot all compete against each other in a round-robin tournament or similar format within a reasonable time frame. To calculate a global ranking, all of the competitors must instead be compared against each other through a sort of “six degrees of separation” style comparison, which is vulnerable to bias and calculation error.

This problem has already been solved in the chess world with the Elo rating system, a statistical approach that requires frequent competition in order to generate statistically significant results. Unfortunately, competitors in sports like mixed martial arts or boxing do not compete nearly as frequently as chess players (for obvious reasons), and this approach drowns in a sea of statistical noise. Combat sport rankings are typically produced by hand by a knowledgeable observer, through the consensus of many observers, or by models with a large number of tunable parameters. It is very interesting that humans appear to be able to easily determine who should be ranked highly, and that many algorithmic approaches largely match these evaluations yet still make some seemingly obvious mistakes. My goal was to find an approach that produced rankings that seemed sensible to a human observer with a minimum of tunable parameters (preferably none).

Data Structure

The initial step is to structure our data in a sensible way. We have a large number of participants connected by individual matches, each of which results in a win or a loss. One way of structuring this data is as a directed graph, where competitors are represented by nodes and matches by edges whose direction is defined by who won. We seem to treat losses (or win/loss ratio) as the biggest factor – a competitor with 40 wins and zero losses is typically regarded as better than a competitor with 60 wins and 20 losses. Let’s set the direction of each edge from the losing competitor to the winning competitor. A “good” competitor’s node will therefore have many incoming edges and few outgoing edges, and will tend to sit at the center of force-directed graph layouts.
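
A minimal sketch of this structure using NetworkX (the package acknowledged at the end of this post); the fight list here is made up purely for illustration:

    import networkx as nx

    fights = [
        ("Fighter A", "Fighter B"),   # (winner, loser) -- placeholder names
        ("Fighter A", "Fighter C"),
        ("Fighter C", "Fighter B"),
    ]

    G = nx.DiGraph()
    for winner, loser in fights:
        G.add_edge(loser, winner)     # edge points from the loser to the winner

    # A "good" competitor has many incoming edges and few outgoing ones:
    print(G.in_degree("Fighter A"), G.out_degree("Fighter A"))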

Evaluation Algorithms

There are many possible evaluation algorithms which will produce a ranking from this data structure. After many trials, two appeared to stand above the rest.

  1. The first is the method recommended in the journal article Ranking from unbalanced paired-comparison data by H.A. David, published in Volume 74, Issue 2 of Biometrika in 1987.
  2. David also discussed the Kendall-Wei algorithm in his paper, of which Google’s PageRank algorithm is a special case. PageRank ranks webpages represented as a directed graph using the concept of network flow, and it may be applied to other directed graphs, including ours. The algorithm contains one tunable parameter, a damping factor, which is left at the default of 0.85 here.
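
NetworkX ships a PageRank implementation, so the second approach is nearly a one-liner; a minimal sketch, run on the loser-to-winner graph G from the earlier sketch with the default damping factor:

    import networkx as nx

    pagerank_scores = nx.pagerank(G, alpha=0.85)   # {fighter: score}
    for fighter, score in sorted(pagerank_scores.items(), key=lambda kv: -kv[1]):
        print(f"{fighter}: {score:.4f}")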

It was found that the two algorithms emphasized different aspects important to MMA ranking. David’s “Unbalanced Pair Comparison” took a grittier, statistics-based approach, highlighting fighters such as Anderson Silva, Rashad Evans, and Jon Fitch. Google’s PageRank took a more social approach, emphasizing fighters with a wide range of quality opponents, like Georges St-Pierre, Matt Hughes, and Forrest Griffin. It was very interesting how one algorithm appeared to capture the “hardcore MMA fan” perspective, while the other seemed to be pulled straight from the UFC head office.

It was decided that both would be calculated, their scores normalized, and the results combined to generate a consensus ranking, similar to the consensus rankings generated from human experts (a minimal sketch of this step follows the list below). This was inspired by IBM’s Watson, which uses a consensus of multiple algorithms to evaluate answers to trivia questions. Two possible improvements are hypothesized but not yet well tested:

  1. Additional independent ranking algorithms incorporated into this consensus might improve accuracy. The big issue appears to be finding “independent” algorithms that do not simply restate the work of the others – and, of those, ones whose ranking behavior is actually useful for our application.
  2. Unlike Watson, confidence levels are not used. This would be a useful addition given situations like extreme upsets. A newer beta version of this ranking system checks whether highly ranked fighters coincide with graph centrality metrics in an attempt to implement this, but it is not complete at the time of this post.
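
Returning to the consensus step described above, here is a minimal sketch of the normalize-and-sum combination; the score dictionaries are made-up placeholders standing in for the real outputs of the two algorithms:

    pagerank_scores = {"Fighter A": 0.41, "Fighter B": 0.22, "Fighter C": 0.37}
    paired_comparison_scores = {"Fighter A": 1.8, "Fighter B": -0.4, "Fighter C": 0.9}

    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

    pr = normalize(pagerank_scores)
    pc = normalize(paired_comparison_scores)
    consensus = {f: pr[f] + pc[f] for f in pr}

    for fighter in sorted(consensus, key=consensus.get, reverse=True):
        print(fighter, round(consensus[fighter], 3))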

Results

The ranking system was run on every UFC event from UFC 1 (November 12, 1993) to Fight for the Troops 2 (January 22, 2011). Both algorithms are shown ranked alone for comparison, and their scores were equally weighted to produce the final results.

Lightweight (155lbs)
Rank  Overall               PageRank           Unbalanced Pair
1     Gray Maynard          B.J. Penn          Gray Maynard
2     B.J. Penn             Gray Maynard       George Sotiropoulos
3     Frankie Edgar         Frankie Edgar      Frankie Edgar
4     George Sotiropoulos   Kenny Florian      Jim Miller
5     Jim Miller            Joe Lauzon         Nik Lentz

First up are the lightweights – and the results aren’t too shabby. No one seems to want to admit it due to his sometimes snooze-inducing style, but Gray Maynard is a beast who is likely to cause B.J. Penn significant issues if they ever fight. Frankie Edgar deserves to be right up there, though not at number one, and the chronically underrated George Sotiropoulos and Jim Miller round out the pack.

Welterweight (170lbs)
Rank  Overall             PageRank            Unbalanced Pair
1     Georges St-Pierre   Georges St-Pierre   Matt Hughes
2     Matt Hughes         Matt Hughes         Josh Koscheck
3     Josh Koscheck       Matt Serra          Georges St-Pierre
4     Martin Kampmann     Dennis Hallman      Martin Kampmann
5     Dennis Hallman      Martin Kampmann     Rick Story

Georges St-Pierre is the obvious frontrunner at 170. Matt Hughes at number two is a bit more debatable, but a long title reign and consistent quality opposition provide a reasonable rationale. Josh Koscheck at third is perpetually the bridesmaid, never the bride, and Martin Kampmann and Dennis Hallman round out a somewhat thin division.

Middleweight (185lbs)
Rank  Overall           PageRank         Unbalanced Pair
1     Anderson Silva    Anderson Silva   Anderson Silva
2     Jon Fitch         Jon Fitch        Jon Fitch
3     Yushin Okami      Vitor Belfort    Yushin Okami
4     Michael Bisping   Nate Marquardt   Michael Bisping
5     Nate Marquardt    Yushin Okami     Demian Maia

Anderson Silva provides another easy choice for number one at 185lbs. Both Jon Fitch and Yushin Okami deserve their spots with consistent if slightly dull records. Michael Bisping has slowly been grinding his way up the charts, and Nate Marquardt rounds out the top five.

Light Heavyweight (205lbs)
Rank  Overall           PageRank          Unbalanced Pair
1     Rashad Evans      Forrest Griffin   Rashad Evans
2     Lyoto Machida     Lyoto Machida     Jon Jones
3     Forrest Griffin   Rashad Evans      Ryan Bader
4     Quinton Jackson   Quinton Jackson   Lyoto Machida
5     Mauricio Rua      Mauricio Rua      Thiago Silva

Rashad Evans appears to have made a sensible call waiting for his title shot at UFC 128. The hypercompetitive light heavyweight division is always a tough one to call. A split in the consensus between the two algorithms produces a top five that seems to emphasize number of fights in the Octagon, with champion Mauricio “Shogun” Rua a surprising fifth. Too early to call Evans over Rua? Only time will tell.

Heavyweight (265lbs)
Rank  Overall             PageRank                   Unbalanced Pair
1     Frank Mir           Frank Mir                  Frank Mir
2     Cain Velasquez      Brock Lesnar               Junior Dos Santos
3     Junior Dos Santos   Cain Velasquez             Cain Velasquez
4     Brock Lesnar        Antonio Rodrigo Nogueira   Cheick Kongo
5     Shane Carwin        Shane Carwin               Brendan Schaub

I initially disagreed with Frank Mir as number one here – Cain Velasquez seems to be the obvious choice. But the ranking process seems to trust number of fights over new hype, and the rest of the top five is bang on what I would choose. You can’t win them all – or perhaps I’m just being unfair to Frank Mir.

Conclusions

The approach produced excellent rankings from UFC-only data, largely coinciding with established and more complete authorities like FightMatrix. Two ranking algorithms were run over a directed graph of results; their scores were normalized and summed, and the sums were sorted to produce the final rankings. One tunable parameter (the PageRank damping factor) exists in the model, but it was left at the default value of 0.85. Further work will focus on additional ranking algorithms that may be incorporated into the consensus, parametric analysis of the PageRank damping factor, and determining confidence scores.

Acknowledgements

The NetworkX package for Python was critical for this project. I came across Mixed Martial Arts vs Graph Theory shortly after starting work on this. Many thanks to Jack Rusher for highlighting interesting algorithms and providing a respite from entering most of the data by hand.

[Figure: mmagraph-ufc003]

Directed Graphs and MMA

There is underlying mathematical structure everywhere – it’s just a matter of finding the best way to unfold it from the data. I’ve been working on a project to objectively rank mixed martial arts fighters. It’s nowhere near done yet, but I’ve collected a fair bit of data. Here’s what nearly 2,000 MMA fights in the UFC and Pride FC over the last 15 years look like when expressed as a directed graph with a force-directed layout.

You can see the fights clumped into UFC (top) and Pride (bottom) clusters, with a subset of fighters acting as ambassadors between the two organizations.
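
For anyone curious how an image like this comes together, one way to produce a comparable force-directed layout is NetworkX’s spring layout plus matplotlib – a sketch only, with a placeholder edge list standing in for the full fight data:

    import matplotlib.pyplot as plt
    import networkx as nx

    G = nx.DiGraph()
    G.add_edges_from([("Loser 1", "Winner 1"), ("Loser 2", "Winner 1"),
                      ("Loser 2", "Winner 2")])   # placeholder fights

    pos = nx.spring_layout(G, seed=42)   # force-directed (spring) layout
    nx.draw(G, pos, node_size=30, arrows=True, with_labels=False)
    plt.show()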

[Figure: butterflyblue]

Lorenz and the Butterfly Effect

In 1962, Edward Lorenz was studying a simplified model of convective flow in the atmosphere. He imagined a closed chamber of air with a temperature difference between the bottom and the top, modeled using the Navier-Stokes equations of fluid flow. If the difference in temperature was slight, the heated air would slowly rise to the top in a predictable manner. But if the difference in temperature was larger, other behaviors would arise, including convection and turbulence. Turbulence is notoriously difficult to model, so Lorenz focused on convection: somewhat more predictable circular flows.

Focusing only on convection allowed Lorenz to simplify the model sufficiently so that he could run calculations on the primitive (by today’s standards) computer available to him at MIT. And something wonderful happened – instead of settling down into a predictable pattern, Lorenz’s model oscillated within a certain range but never seemed to do the exact same thing twice. It was an excellent approach to the problem, exhibiting the same kind of semi-regular behavior we see in actual atmospheric dynamics. It made quite a stir at MIT at the time, with other faculty betting on just how this strange mathematical object would behave from iteration to iteration.
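
The simplified system Lorenz arrived at is, in its standard modern form:

    \frac{dx}{dt} = \sigma (y - x), \qquad \frac{dy}{dt} = x (\rho - z) - y, \qquad \frac{dz}{dt} = x y - \beta z

with the classic parameter choices σ = 10, ρ = 28, and β = 8/3.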

But Lorenz was about to discover something even more astounding about this model. While running a particularly interesting simulation, he was forced to stop his calculations prematurely as the computer he was using was shared by many people. He was prepared for this, and wrote down three values describing the current state of the system at a certain time. Later when the computer was available again, he entered the three values and let the model run once more. But something very strange happened. It initially appeared to follow the same path as before, but soon it started to diverge from the original slightly, and quickly these small differences became huge variances. What had happened to cause this?

Lorenz soon found that his error was a seemingly trivial one. He had rounded the numbers he wrote down to three decimal places, while the computer stored values to six decimal places. Simply rounding to the nearest thousandth had created a tiny error that propagated throughout the system, creating drastic differences in the final result. This is the origin of the term “butterfly effect”, where an infinitesimal change in initial conditions can produce enormous differences in the result. Lorenz had discovered the first chaotic dynamical system.

We can see this effect in action here, where the red path starts from initial values truncated to three decimal places and the blue path keeps the full-precision values. Instead of graphing the X, Y, and Z values separately, we’ll graph them in 3D space.

Initial value   Red (truncated)   Blue (full precision)
X               5.766             5.766450
Y               -2.456            -2.456460
Z               32.817            32.817251

We can see that at first the two paths closely track each other. The error is magnified as time marches on, and the paths slowly diverge until they are completely different – all as a result of truncating to three decimal places. This contrasts with the sensitivity to error we are typically used to, where small errors are not magnified uncontrollably as a calculation progresses. Imagine if using a measuring tape that was off by a tenth of a percent caused your new deck to collapse rather than stand strong…
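
A minimal sketch of the red/blue comparison, again assuming the classic Lorenz parameters; the initial values are the ones from the table above, and the separation between the two trajectories is printed at the start and the end of the run:

    import numpy as np
    from scipy.integrate import solve_ivp

    def lorenz(t, p, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = p
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    red0 = [5.766, -2.456, 32.817]            # truncated to three decimal places
    blue0 = [5.766450, -2.456460, 32.817251]  # full-precision values

    t_eval = np.linspace(0.0, 25.0, 5000)
    red = solve_ivp(lorenz, (0.0, 25.0), red0, t_eval=t_eval, rtol=1e-9, atol=1e-9)
    blue = solve_ivp(lorenz, (0.0, 25.0), blue0, t_eval=t_eval, rtol=1e-9, atol=1e-9)

    separation = np.linalg.norm(red.y - blue.y, axis=0)
    print(separation[0], separation[-1])      # tiny at first, enormous later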

There is also a hint of a more complex underlying structure, a folded three-dimensional surface that the paths move along. If a longer path is drawn, the semi-regular line graphs from before are revealed to be projections of a structure of undeniable beauty.

This is a rendering of the long journey of a single point in Lorenz’s creation – a path of three million iterations, rendered in Processing, that shows the region of allowable values in the model more clearly. Older values are darker, and the visualization is animated to show the strange dance of the system’s current values as they sweep from arm to arm of this dual-eyed mathematical monster, order generating chaos.

[Figure: river-crossing]

River Crossing Problems and Discrete State Spaces

A brain teaser goes as follows: a farmer is returning from market, where he has bought a wolf, a goat, and a cabbage. On his way home he must cross a river by boat, but the boat is only large enough to carry the farmer and one additional item. The farmer realizes he must shuttle the items across one by one. He must be careful however, as the wolf cannot be left alone on the same side of the river as the goat (since the goat will be eaten), and the goat cannot be left alone on the same side of the river as the cabbage (since the goat will eat the cabbage).

How can the farmer arrange his trips to move the wolf, the goat, and the cabbage across the river while ensuring that no one eats the other while he’s away? The problem usually yields to an analysis by trial and error, where one of the possible solutions is found by iterating and checking the possible decisions of the farmer.

But is there a more rigorous approach that will allow us to easily find all possible solutions without having to resort to guessing and checking?

A Discrete State Space

Let’s create a three-dimensional vector that represents the state of the system. The components of this vector will be 0 or 1 depending on which side of the river the wolf, the goat, and the cabbage are on. Let’s list all the possible combinations without worrying about whether something is going to get eaten – we’ll get to that later.

Vector    Meaning
(0,0,0)   Our initial state, with all three on the starting side of the river.
(1,0,0)   The wolf has crossed the river, but not the goat or cabbage.
(0,1,0)   The goat has crossed the river, but not the wolf or cabbage.
(0,0,1)   The cabbage has crossed the river, but not the wolf or goat.
(1,1,0)   The wolf and the goat have crossed the river, but not the cabbage.
(0,1,1)   The goat and the cabbage have crossed the river, but not the wolf.
(1,0,1)   The wolf and cabbage have crossed the river, but not the goat.
(1,1,1)   Our desired state, where all three have crossed the river.

Now a giant list like this isn’t much help. We’ve listed off all possible states, but we can structure this in a more sensible manner which will make solving the problem much easier. Let’s allow the vectors to represent the corners of a cube, with the edges of the cube representing trips across the river. Movement along the x axis represents the wolf moving, movement along the y axis represents the goat moving, and movement along the z axis represents the cabbage moving.

Alright, that seems a bit better. What we need to do now is find an allowable path along the edges of the cube from (0,0,0) to (1,1,1), and this will represent the solution to our puzzle. First we need to remove the edges where something gets eaten.

Edge removed         Rationale
(0,0,0) to (1,0,0)   This moves the wolf across, leaving the goat with the cabbage.
(0,0,0) to (0,0,1)   This moves the cabbage across, leaving the wolf with the goat.
(0,1,1) to (1,1,1)   This moves the wolf across, leaving the goat with the cabbage.
(1,1,0) to (1,1,1)   This moves the cabbage across, leaving the wolf with the goat.

Our allowable state space now looks like this.

The problem has now simplified drastically. We want to go from (0,0,0) to (1,1,1) by travelling along the black (allowable) edges of this cube. If we ignore solutions where the farmer loops pointlessly, there are exactly two allowable solutions to this puzzle (a small search over the pruned cube, sketched after the lists below, confirms this).

Solution 1:

  1. Move goat to other side.
  2. Move cabbage to other side.
  3. Move goat back.
  4. Move wolf to other side.
  5. Move goat to other side.

Solution 2:

  1. Move goat to other side.
  2. Move wolf to other side.
  3. Move goat back.
  4. Move cabbage to other side.
  5. Move goat to other side.
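
The search itself is tiny. A sketch using NetworkX, which builds the cube, removes the four forbidden edges listed above, and prints every simple path from (0,0,0) to (1,1,1) – it finds exactly the two solutions:

    from itertools import product
    import networkx as nx

    states = list(product((0, 1), repeat=3))   # (wolf, goat, cabbage)
    forbidden = {
        ((0, 0, 0), (1, 0, 0)), ((0, 0, 0), (0, 0, 1)),
        ((0, 1, 1), (1, 1, 1)), ((1, 1, 0), (1, 1, 1)),
    }

    G = nx.Graph()
    for a in states:
        for b in states:
            one_move = sum(x != y for x, y in zip(a, b)) == 1   # one item per trip
            if one_move and (a, b) not in forbidden and (b, a) not in forbidden:
                G.add_edge(a, b)

    for path in nx.all_simple_paths(G, (0, 0, 0), (1, 1, 1)):
        print(path)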

This is a beautiful example of the simple structures that underlie many of these classic riddles.

Our Modular Minds

I believe that ultimately human consciousness can be described by a program. This doesn’t mean we’re all in the Matrix, simply that our mind is a giant, seething logical machine with values that are manipulated by rules. There is no strange new science in the sense of new specialties that must be discovered before the mind can be understood, but rather a progression in “a new kind of science”, as Wolfram dubbed it: the study of how complexity arises.

A List of Rules

When I first heard as a child that you could program a computer, I was amazed. A strange wonder that I could only spend a few shared minutes with at school, something that could draw, add, and write far faster than I could ever dream of – and I could tell it what to do? I wasn’t quite sure how to tell it to bend to my every wish, so I started with Turtle (actually called LOGO, I later found) upon my teacher’s recommendation. I fed Turtle long lists of instructions – move forward, draw a line, turn left – repeated in any and all ways I could think of. He would draw glowing green shapes across my screen, and never tired.

The Need for Modularity

The only problem was that while the Turtle seemed infallible, I certainly could not say the same about myself. I was making the classic beginner’s mistake – writing one giant chunk of code that was supposed to make my turtle dance in precisely the way I wanted. Any little mistake would send it wildly off course, and at the best of times I would end up with a mess that barely resembled the original design. I later learned that a programming concept called “modules” could help me isolate these errors and make code more efficient and reusable. Just as a company can have manufacturing and engineering divisions that communicate through standardized blueprints, a program can have different modules that exchange data in a standardized manner. A modular program is more stable, since mistakes are typically limited in influence to the module they’re contained in, and each module can be modified independently so long as it continues to behave and communicate in the agreed-upon manner.

Damage as Evidence

So is our mind modular? Well, if it weren’t, we would expect a brain injury affecting a certain part of the brain to have a consistent, general impact across all of our consciousness. The problem is that we generally only see that kind of nonspecific mental decline from nonspecific trauma, say impact blows to the head over a long period of time. Injuries to specific areas seem to be correlated with deficits in specific mental abilities – while leaving others totally intact.

A stroke can basically be thought of as an incident where blood flow is drastically affected in a specific area of the brain. That subregion of the brain is unable to function due to the lack of blood flow, and very strange things can occur.

Howard Engel is a Canadian novelist who had a stroke. Upon waking one morning, he found that the morning paper seemed to be written in some strange script, an alphabet he could not understand. Everything else appeared normal, but his visual cortex had been damaged in a specific area, preventing him from visually parsing letters and words. As a writer, he despaired – it seemed that his livelihood had been lost. Soon he realized a critical distinction which gave him hope: he might be unable to read visually, but could he still write? Howard sat down and traced these strange-looking symbols, his pen gliding over the bizarre shapes over and over. And eventually, the concepts came back to him. In a strange sense, he could now read again. Years and years of writing had associated certain movements of his hand with letters and concepts. Instead of words in his head being put to paper by hand movements and a pen, he had to move the concepts in the opposite direction – moving his hand over shapes written previously by others, the concepts echoing back up his motor cortex.

And it worked. There was irreparable damage to his visual cortex, a critical module malfunctioning. So he hacked his brain, redistributing resources from his motor cortex, which his constant writing had trained to recognize these same symbols and concepts necessary for reading. Howard now traces the shapes he sees on the inside of his front teeth with his tongue. His speed has steadily increased, and he says he can now read about half of the subtitles in a foreign film before they flash off the screen.

It doesn’t seem too strange to suggest that there are different localized modules in the brain for motor control, visual interpretation, and other concepts easily identifiable with different aspects of the physical world. But are there modules with finer distinctions, working on different parts of our mental experience rather than different parts of the physical world?

The Wason Selection Task

The Wason selection task is a very interesting experiment in the psychology of reasoning. Before I spoil it for you by talking too much, let’s just do it right now. Look at the following cards.

Assume the cards have a number on one side, and a color on the other. What cards need to be flipped over to make sure that all even numbers have red backs? Make sure you’ve picked a card or cards.

Got it? Now for some unsettling but ego-salvaging news. When this experiment is done with undergraduates, only 10 to 20 percent get it right. The correct answer is to flip over two cards: the 4, to make sure it has a red back, and the blue card, to make sure that it doesn’t have an even number on the other side. Most people suggest flipping over the 4 and the red card – this is wrong, as it doesn’t matter if the red card has an odd number on the other side.
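
The underlying check is a material implication – “if the front is even, then the back is red” – and only a card that might have an even front and a non-red back can break it. A tiny sketch of that logic; the actual card faces aren’t visible in this text version, so the four faces below (7, 4, red, blue) are assumed for illustration:

    cards = ["7", "4", "red", "blue"]   # assumed visible faces (hypothetical)

    def needs_flip(face):
        # A visible even number could hide a non-red back; a visible non-red
        # colour could hide an even number. Only those cards can break the
        # rule "every even number has a red back".
        if face.isdigit():
            return int(face) % 2 == 0
        return face != "red"

    print([c for c in cards if needs_flip(c)])   # ['4', 'blue']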

Now let’s mix it up a bit. Instead of numbers and colors, let’s try people and social activities. Assume the cards have a drink on one side, and a person of a certain age on the other. What cards need to be flipped over to make sure everyone drinking beer is old enough to do so?

Now the answer flows quickly and easily, and almost everyone gets it correct. We need to flip over the beer card to make sure that the person on the other side is old enough, and we need to flip over the card showing the underage drinker to make sure they’re playing by the rules.

The weird thing is that both examples are logically equivalent. Instead of numbers and colors, we’ve just used people and drinks. But something very important has occurred, and it happens time and time again as these tests are administered. It seems that people are fast and accurate at solving this task only if it is described as a test of social obligations. Both versions can be described identically with logic – but that doesn’t appear to matter to our mind. We appear to have a module dedicated to social reasoning and conflict, and we can only solve these problems quickly if they involve determining whether someone is cheating or breaking social conventions. This ancient module would hold significant survival value – a general logic verification module not quite so much.

Modules Upon Modules

There appears to be significant evidence for a modular mind – not just divisions between senses such as sight or hearing and actions like movement, but also more abstract modules that deal with concepts such as social rules. Stroke victims can literally rewire their brains, passing concepts up into consciousness through paths never intended to be used in such a strange manner, duplicating the work of other modules lost to injury. These modules live in a strange world of physical interaction and abstract mental space, a huge interconnected mass with no clear outline behind it. The big question now becomes: is a sufficiently complex system able to understand itself, and are we that system?