The Theis Equation and Flow

Mathematics is remarkably effective in describing the physical world, in part due to isomorphisms: relationships between concepts that reveal a similar underlying structure. In 1935 Charles Vernon Theis was working on groundwater flow, a subject with little mathematical treatment at the time. He thought that perhaps a well tapping a confined aquifer could be described using the same mathematics as a thin wire drawing heat from a large plate, a problem that was already well studied. With a little help from C. I. Lubin, he mapped the parameters describing underground water flow onto those describing heat flow in solid materials and developed the Theis equation, which is used to this day to model the response of a confined aquifer to pumping over time.

Using Processing, I developed a small program that visualizes the potentiometric surface of a confined aquifer subject to pumping. This particular example uses aquifer and pumping parameters from a Geo-Slope whitepaper.

The source code may be downloaded here. All values, including aquifer, pumping, visualization, and numerical parameters, may be varied to apply the program to a wide variety of situations. The exponential integral (or “well function”) is calculated using a numerical approximation accurate to at least 1 part in 10,000,000.
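For readers who want to experiment outside of Processing, here is a minimal Python sketch of the same calculation, using scipy’s exp1 function as the well function W(u); the aquifer and pumping values below are placeholders rather than the Geo-Slope parameters.

    import numpy as np
    from scipy.special import exp1  # E1(u), the exponential integral used as the well function W(u)

    def theis_drawdown(r, t, Q, T, S):
        """Drawdown s = Q/(4*pi*T) * W(u), with u = r^2 * S / (4*T*t).

        r: distance from the well (m), t: time since pumping began (s),
        Q: pumping rate (m^3/s), T: transmissivity (m^2/s), S: storativity (-).
        """
        u = (r ** 2 * S) / (4.0 * T * t)
        return Q / (4.0 * np.pi * T) * exp1(u)

    # Placeholder parameters, not the values from the whitepaper.
    Q, T, S = 0.01, 1.0e-3, 1.0e-4
    for t in (60.0, 600.0, 6000.0):
        print(t, theis_drawdown(r=10.0, t=t, Q=Q, T=T, S=S))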

Convergence in the Lorenz Attractor

Most visualizations of the Lorenz attractor show a long history of a single point after it has converged to the attractor. I was interested in what the surrounding space looked like, so I randomly selected 20,000 starting points from a three-dimensional Gaussian distribution with a standard deviation of 100. Each point was iterated, and a short history was displayed as a trail.

Interestingly enough, the points do not simply fall in from arbitrary directions as they would in a gravity field; instead they show structure, swirling along a clear path up the z axis.
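A rough Python sketch of the experiment (the original was written in Processing and is not reproduced here), assuming the standard Lorenz parameters sigma = 10, rho = 28, beta = 8/3, which are not stated in this post, and using far fewer starting points for brevity:

    import numpy as np
    from scipy.integrate import odeint

    def lorenz(state, t, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = state
        return [sigma * (y - x), x * (rho - z), x * y - beta * z]

    rng = np.random.default_rng(0)
    starts = rng.normal(0.0, 100.0, size=(200, 3))  # the post used 20,000 points
    t = np.linspace(0.0, 2.0, 400)

    # Integrate each starting point and keep only a short trailing history as its "trail".
    trails = [odeint(lorenz, s0, t)[-50:] for s0 in starts]
    print(trails[0][-1])  # final position of the first point's trail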

Graph Theory, Algorithmic Consensus, and MMA Ranking

I’ve been working on an objective ranking system lately that could be applied to groups with large numbers of individual competitors, like the sport of mixed martial arts (MMA). The biggest issue compared to typical ranking systems is that there are so many participants that they cannot all compete against each other in a round-robin tournament or similar within a reasonable time frame. In order to calculate a global ranking, all of the players must be compared against each other through a sort of “six degrees of separation” style comparison, which is vulnerable to bias and calculation error.

This problem has already been solved in the chess world with the Elo rating system, a statistical approach that requires frequent competition in order to generate statistically significant results. Unfortunately, competitors in sports like mixed martial arts or boxing do not compete nearly as frequently as chess players (for obvious reasons), and this approach drowns in a sea of statistical noise. Combat sport rankings are typically done by hand by a knowledgeable observer, through the consensus of many observers, or by models with a large number of tunable parameters. It is very interesting that humans appear to be able to easily determine who should be ranked highly, and that many algorithmic approaches largely match these evaluations but still make some seemingly obvious mistakes. My goal was to find an approach that produced rankings that seemed sensible to a human observer with a minimum of tunable parameters (preferably none).

Data Structure

The initial step is to structure our data in a sensible way. We have a large number of participants connected by individual matches, each of which results in a win or a loss. One way of structuring this data is as a directed graph, where competitors are represented by nodes and matches by edges, with the direction of each edge defined by who won. We seem to treat losses (or win/loss ratio) as the biggest factor – a competitor with 40 wins and zero losses is typically regarded as better than a competitor with 60 wins and 20 losses. Let’s set the direction of each edge from the losing competitor to the winning competitor. A “good” competitor’s node will therefore have many incoming edges and few outgoing edges, and will tend to sit at the center of force-directed graph layouts.
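As a small sketch of this structure (not the actual ranking code), here is how the loser-to-winner graph might be built with NetworkX, which the acknowledgements below credit; the fight results here are made up for illustration.

    import networkx as nx

    # Hypothetical results: each tuple is (winner, loser).
    fights = [
        ("Fighter A", "Fighter B"),
        ("Fighter A", "Fighter C"),
        ("Fighter B", "Fighter C"),
    ]

    G = nx.DiGraph()
    for winner, loser in fights:
        G.add_edge(loser, winner)  # edges point from the loser to the winner

    # A "good" competitor accumulates incoming edges (wins) and has few outgoing edges (losses).
    print(G.in_degree("Fighter A"), G.out_degree("Fighter A"))  # 2 0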

Evaluation Algorithms

There are many possible evaluation algorithms which will produce a ranking from this data structure. After many trials, two appeared to stand above the rest.

  1. The first is the method recommended in the journal article “Ranking from unbalanced paired-comparison data” by H. A. David, published in Volume 74, Issue 2 of Biometrika in 1987.
  2. David’s paper also discusses the Kendall-Wei algorithm, of which Google’s PageRank algorithm is a special case. PageRank ranks webpages, represented as a directed graph, based on the concept of network flow, and it can be applied to other directed graphs, including ours. The PageRank algorithm contains one tunable parameter, a damping factor, which is left at the default value of 0.85.
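As a rough illustration of the second approach only (David’s method is not shown here), PageRank can be run directly on the loser-to-winner graph with NetworkX; the edges below are hypothetical.

    import networkx as nx

    # Hypothetical loser -> winner edges.
    G = nx.DiGraph([("B", "A"), ("C", "A"), ("C", "B")])

    # Rank "flows" along edges, so it accumulates at nodes that beat good opposition.
    scores = nx.pagerank(G, alpha=0.85)  # alpha is the damping factor
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(ranking)  # ['A', 'B', 'C'] for this toy graph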

Both algorithms seemed to emphasize different aspects important to MMA ranking. David’s “Unbalanced Pair Comparison” took a grittier, statistics-based approach, highlighting fighters such as Anderson Silva, Rashad Evans, and Jon Fitch. Google’s PageRank seemed to take a more social approach, emphasizing fighters with a wide range of quality opponents, like Georges St-Pierre, Matt Hughes, and Forrest Griffin. It was very interesting how one algorithm appeared to capture the “hardcore MMA fan” perspective, while the other seemed to be pulled straight from the UFC head office.

It was decided that both scores would be calculated, normalized, and used in combination to generate a consensus ranking, similar to the consensus rankings generated from panels of human experts. This was inspired by IBM’s Watson, which uses a consensus of multiple algorithms to evaluate answers to trivia questions. A sketch of this score combination follows the list below. Two possible improvements have been hypothesized but not yet tested:

  1. Perhaps additional independent ranking algorithms incorporated into this consensus would improve accuracy. The big issue appears to be finding “independent” algorithms that do not simply restate the work of the others, and, of those, finding ones whose ranking behavior is useful for our application.
  2. Unlike Watson, confidence levels are not used. This would be a useful addition in situations like extreme upsets. A newer beta version of this ranking system checks whether highly ranked fighters also score highly on centrality metrics in an attempt to implement this, but it is not complete at the time of this post.
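Here is the score-combination sketch referred to above, assuming a simple min-max normalization (the actual normalization scheme is not specified in this post) and made-up scores for three fighters.

    def normalize(scores):
        """Rescale a {fighter: score} dict to the range [0, 1]."""
        lo, hi = min(scores.values()), max(scores.values())
        return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

    # Hypothetical raw scores from the two algorithms for the same fighters.
    pagerank_scores = {"A": 0.45, "B": 0.35, "C": 0.20}
    pair_scores = {"A": 1.8, "B": 2.1, "C": 0.3}

    pr, up = normalize(pagerank_scores), normalize(pair_scores)
    combined = {k: pr[k] + up[k] for k in pr}  # equal weighting of the two algorithms

    consensus = sorted(combined, key=combined.get, reverse=True)
    print(consensus)  # ['A', 'B', 'C']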

Results

The ranking system was run on every UFC event from UFC 1 (November 12, 1993) to Fight for the Troops 2 (January 22, 2011). Each algorithm’s ranking is also shown on its own for comparison, and the two scores were equally weighted to produce the overall results.

Lightweight (155lbs)
Overall Rank              | PageRank              | Unbalanced Pair
1. Gray Maynard           | 1. B.J. Penn          | 1. Gray Maynard
2. B.J. Penn              | 2. Gray Maynard       | 2. George Sotiropoulos
3. Frankie Edgar          | 3. Frankie Edgar      | 3. Frankie Edgar
4. George Sotiropoulos    | 4. Kenny Florian      | 4. Jim Miller
5. Jim Miller             | 5. Joe Lauzon         | 5. Nik Lentz

First up are the lightweights – and the results aren’t too shabby. No one seems to want to admit it due to his sometimes snooze-inducing style, but Gray Maynard is a beast who would likely cause B.J. Penn significant issues if they ever fought. Frankie Edgar deserves to be right up there but not at number one, and the chronically underrated George Sotiropoulos and Jim Miller round out the pack.

Welterweight (170lbs)
Overall Rank              | PageRank              | Unbalanced Pair
1. Georges St-Pierre      | 1. Georges St-Pierre  | 1. Matt Hughes
2. Matt Hughes            | 2. Matt Hughes        | 2. Josh Koscheck
3. Josh Koscheck          | 3. Matt Serra         | 3. Georges St-Pierre
4. Martin Kampmann        | 4. Dennis Hallman     | 4. Martin Kampmann
5. Dennis Hallman         | 5. Martin Kampmann    | 5. Rick Story

Georges St-Pierre is the obvious frontrunner at 170. Matt Hughes at number two is a bit more debatable, but a long title reign and consistent quality opposition provide a reasonable rationale. Josh Koscheck, at third, is perpetually the bridesmaid and never the bride, and Martin Kampmann and Dennis Hallman round out a somewhat thin division.

Middleweight (185lbs)
Overall Rank              | PageRank                  | Unbalanced Pair
1. Anderson Silva         | 1. Anderson Silva         | 1. Anderson Silva
2. Jon Fitch              | 2. Jon Fitch              | 2. Jon Fitch
3. Yushin Okami           | 3. Vitor Belfort          | 3. Yushin Okami
4. Michael Bisping        | 4. Nate Marquardt         | 4. Michael Bisping
5. Nate Marquardt         | 5. Yushin Okami           | 5. Demian Maia

Anderson Silva provides another easy choice for number one at 185lbs. Both Jon Fitch and Yushin Okami deserve their spots with consistent if slightly dull records. Michael Bisping has slowly been grinding his way up the charts, and Nate Marquardt rounds out the top five.

Light Heavyweight (205lbs)
Overall Rank              | PageRank              | Unbalanced Pair
1. Rashad Evans           | 1. Forrest Griffin    | 1. Rashad Evans
2. Lyoto Machida          | 2. Lyoto Machida      | 2. Jon Jones
3. Forrest Griffin        | 3. Rashad Evans       | 3. Ryan Bader
4. Quinton Jackson        | 4. Quinton Jackson    | 4. Lyoto Machida
5. Mauricio Rua           | 5. Mauricio Rua       | 5. Thiago Silva

Rashad Evans appears to have made a sensible call waiting for his title shot at UFC 128. The hypercompetitive light heavyweight division is always a tough one to call. A split in the consensus between the two algorithms produces a top five that seems to emphasize number of fights in the Octagon, with champion Mauricio “Shogun” Rua a surprising fifth. Too early to call Evans over Rua? Only time will tell.

Heavyweight (265lbs)
Overall Rank              | PageRank                      | Unbalanced Pair
1. Frank Mir              | 1. Frank Mir                  | 1. Frank Mir
2. Cain Velasquez         | 2. Brock Lesnar               | 2. Junior Dos Santos
3. Junior Dos Santos      | 3. Cain Velasquez             | 3. Cain Velasquez
4. Brock Lesnar           | 4. Antonio Rodrigo Nogueira   | 4. Cheick Kongo
5. Shane Carwin           | 5. Shane Carwin               | 5. Brendan Schaub

I initially disagreed with Frank Mir as number one here – Cain Velasquez seems to be the obvious choice. But the ranking process seems to trust number of fights over new hype, and the rest of the top five is bang on what I would choose. You can’t win them all – or perhaps I’m just being unfair to Frank Mir.

Conclusions

The approach produced excellent rankings from UFC-only data, largely coinciding with established and more complete authorities like FightMatrix. Two ranking algorithms were run over a directed graph of results; their scores were normalized and summed, and the totals were sorted to give the final rankings. One tunable parameter (the PageRank damping factor) exists in the model, but was left at the default value of 0.85. Further work will focus on additional ranking algorithms that may be incorporated into the consensus, parametric analysis of the PageRank damping factor, and determining confidence scores.

Acknowledgements

The NetworkX package for Python was critical for this project. I came across Mixed Martial Arts vs Graph Theory shortly after starting work on this. Many thanks to Jack Rusher for highlighting interesting algorithms and providing a respite from entering most of the data by hand.

Directed Graphs and MMA

There is underlying mathematical structure everywhere – it’s just a matter of finding the best way to unfold it from the data. I’ve been working on a project to objectively rank mixed martial arts fighters. It’s nowhere near done yet, but I’ve collected a fair bit of data. Here’s what nearly 2000 MMA fights in the UFC and Pride FC over the last 15 years look like expressed as a directed graph with a force-directed layout.

You can see the fights clumped into UFC (top) and Pride (bottom) clusters, with a subset of fighters acting as ambassadors between the two organizations.
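A minimal sketch of this kind of picture, drawn with NetworkX and matplotlib on a handful of made-up fights rather than the real dataset:

    import matplotlib.pyplot as plt
    import networkx as nx

    # Hypothetical loser -> winner edges forming two loose clusters plus one bridge.
    G = nx.DiGraph([
        ("B", "A"), ("C", "A"), ("D", "C"),  # one organization
        ("F", "E"), ("G", "E"), ("H", "G"),  # the other organization
        ("F", "A"),                          # a fighter who competed in both
    ])

    pos = nx.spring_layout(G, seed=1)  # force-directed layout
    nx.draw_networkx(G, pos, node_size=300, arrowsize=10)
    plt.axis("off")
    plt.show()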

Color Cycled Lorenz Attractor

Another visualization of the Lorenz attractor discussed in Lorenz and the Butterfly Effect, color cycled between green and blue.

A path of one million points; newer values are green, older values are blue. I’ve also produced some higher-resolution stills with better antialiasing.

Built with Processing.

Lorenz and the Butterfly Effect

In 1962, Edward Lorenz was studying a simplified model of convective flow in the atmosphere. He imagined a closed chamber of air with a temperature difference between the bottom and the top, modeled using the Navier-Stokes equations of fluid flow. If the difference in temperature was slight, the heated air would slowly rise to the top in a predictable manner. But if the difference in temperature was larger, other behaviors would arise, including convection and turbulence. Turbulence is notoriously difficult to model, so Lorenz focused on the formation of convection: somewhat more predictable, circular flows.

Focusing only on convection allowed Lorenz to simplify the model sufficiently so that he could run calculations on the primitive (by today’s standards) computer available to him at MIT. And something wonderful happened – instead of settling down into a predictable pattern, Lorenz’s model oscillated within a certain range but never seemed to do the exact same thing twice. It was an excellent approach to the problem, exhibiting the same kind of semi-regular behavior we see in actual atmospheric dynamics. It made quite a stir at MIT at the time, with other faculty betting on just how this strange mathematical object would behave from iteration to iteration.

But Lorenz was about to discover something even more astounding about this model. While running a particularly interesting simulation, he was forced to stop his calculations prematurely as the computer he was using was shared by many people. He was prepared for this, and wrote down three values describing the current state of the system at a certain time. Later when the computer was available again, he entered the three values and let the model run once more. But something very strange happened. It initially appeared to follow the same path as before, but soon it started to diverge from the original slightly, and quickly these small differences became huge variances. What had happened to cause this?

Lorenz soon found that his error was a seemingly trivial one. He had rounded the numbers he wrote down to three decimal places, while the computer stored values to six decimal places. Simply rounding to the nearest thousandth had created a tiny error that propagated throughout the system, creating drastic differences in the final result. This is the origin of the term “butterfly effect”, in which minuscule changes in initial conditions can cause huge differences in outcome. Lorenz had discovered the first chaotic dynamical system.

We can see this effect in action here, where a red and a blue path have initial values truncated in a similar manner. Instead of graphing X, Y, and Z values separately, we’ll graph them in 3D space.

Coordinate | Initial Value (Red) | Initial Value (Blue)
X          | 5.766               | 5.766450
Y          | -2.456              | -2.456460
Z          | 32.817              | 32.817251

We can see that at first the two paths closely track each other. The error is magnified as time marches on, and the paths slowly diverge until they are completely different – all as a result of truncating to three decimal places. This contrasts with the sensitivity to error we are typically used to, where small errors are not magnified uncontrollably as the calculation progresses. Imagine if using a measuring tape that was off by a tenth of a percent caused your new deck to collapse rather than stand strong…
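A minimal Python sketch of the same experiment, using the initial values from the table above and assuming the standard Lorenz parameters sigma = 10, rho = 28, beta = 8/3 (this post does not list them):

    import numpy as np
    from scipy.integrate import odeint

    def lorenz(state, t, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = state
        return [sigma * (y - x), x * (rho - z), x * y - beta * z]

    t = np.linspace(0.0, 40.0, 8000)
    red = odeint(lorenz, [5.766, -2.456, 32.817], t)
    blue = odeint(lorenz, [5.766450, -2.456460, 32.817251], t)

    # The separation starts below 1e-3 and grows to the size of the attractor itself.
    separation = np.linalg.norm(red - blue, axis=1)
    print(separation[0], separation[-1])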

There is also a hint of a more complex underlying structure, a folded three-dimensional surface that the paths move along. If a longer path is drawn, the semi-regular line graphs from before are revealed to be projections of a structure of undeniable beauty.

This is a rendering of the long journey of a single point in Lorenz’s creation, a path consisting of three million iterations developed in Processing that shows the region of allowable values in the model more clearly. Older values are darker, and the visualization is animated to show the strange dance of the current values of the system as they sweep from arm to arm of this dual-eyed mathematical monster, order generating chaos.

River Crossing Problems and Discrete State Spaces

A brain teaser goes as follows: a farmer is returning from market, where he has bought a wolf, a goat, and a cabbage. On his way home he must cross a river by boat, but the boat is only large enough to carry the farmer and one additional item. The farmer realizes he must shuttle the items across one by one. He must be careful however, as the wolf cannot be left alone on the same side of the river as the goat (since the goat will be eaten), and the goat cannot be left alone on the same side of the river as the cabbage (since the goat will eat the cabbage).

How can the farmer arrange his trips to move the wolf, the goat, and the cabbage across the river while ensuring that nothing gets eaten while he’s away? The problem usually yields to analysis by trial and error, where one of the possible solutions is found by iterating over and checking the farmer’s possible decisions.

But is there a more rigorous approach that will allow us to easily find all possible solutions without having to resort to guessing and checking?

A Discrete State Space

Let’s create a three-dimensional vector that represents the state of the system. The components of this vector will be 0 or 1 depending on which side of the river the wolf, the goat, and the cabbage are on. Let’s list off all possible combinations without worrying about whether something is going to get eaten – we’ll get to that later.

Vector  | Location
(0,0,0) | Our initial state, with all three on the starting side of the river.
(1,0,0) | The wolf has crossed the river, but not the goat or cabbage.
(0,1,0) | The goat has crossed the river, but not the wolf or cabbage.
(0,0,1) | The cabbage has crossed the river, but not the wolf or goat.
(1,1,0) | The wolf and the goat have crossed the river, but not the cabbage.
(0,1,1) | The goat and the cabbage have crossed the river, but not the wolf.
(1,0,1) | The wolf and cabbage have crossed the river, but not the goat.
(1,1,1) | Our desired state, where all three have crossed the river.

Now a giant list like this isn’t much help. We’ve listed off all possible states, but we can structure this in a more sensible manner which will make solving the problem much easier. Let’s allow the vectors to represent the corners of a cube, with the edges of the cube representing trips across the river. Movement along the x axis represents the wolf moving, movement along the y axis represents the goat moving, and movement along the z axis represents the cabbage moving.

Alright, that seems a bit better. What we need to do now is find an allowable path along the edges of the cube from (0,0,0) to (1,1,1), and this will represent the solution to our puzzle. First we need to remove the edges where something gets eaten.

Path Removed       | Rationale
(0,0,0) to (1,0,0) | This moves the wolf across, leaving the goat with the cabbage.
(0,0,0) to (0,0,1) | This moves the cabbage across, leaving the wolf with the goat.
(0,1,1) to (1,1,1) | This moves the wolf across, leaving the goat with the cabbage.
(1,1,0) to (1,1,1) | This moves the cabbage across, leaving the wolf with the goat.

Our allowable state space now looks like this.

The problem has simplified drastically. We want to go from (0,0,0) to (1,1,1) by travelling along the black (allowable) edges of this cube. We can see that if we ignore solutions where the farmer loops pointlessly, there are two allowable solutions to this puzzle (a short search sketch after the two solution lists below confirms this).

Solution 1:

  1. Move goat to other side.
  2. Move cabbage to other side.
  3. Move goat back.
  4. Move wolf to other side.
  5. Move goat to other side.

Solution 2:

  1. Move goat to other side.
  2. Move wolf to other side.
  3. Move goat back.
  4. Move cabbage to other side.
  5. Move goat to other side.
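A small NetworkX sketch can confirm this: build the cube, remove the four forbidden edges, and enumerate the loop-free routes from (0,0,0) to (1,1,1).

    import itertools
    import networkx as nx

    # Cube graph: nodes are (wolf, goat, cabbage) positions, edges are single crossings.
    nodes = list(itertools.product((0, 1), repeat=3))
    G = nx.Graph()
    for a, b in itertools.combinations(nodes, 2):
        if sum(x != y for x, y in zip(a, b)) == 1:  # states differing in exactly one item
            G.add_edge(a, b)

    # Remove the crossings where something gets eaten.
    G.remove_edges_from([
        ((0, 0, 0), (1, 0, 0)), ((0, 0, 0), (0, 0, 1)),
        ((0, 1, 1), (1, 1, 1)), ((1, 1, 0), (1, 1, 1)),
    ])

    # Exactly the two solutions listed above are printed.
    for path in nx.all_simple_paths(G, (0, 0, 0), (1, 1, 1)):
        print(path)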

This is a beautiful example of the simple structures that underlie many of these classic riddles.

Emily’s Trees v2.0

I’ve been working on a new website for my dear girlfriend Emily – check it out, and buy lots of things with the new and improved shopping cart system!

The Amen Break

Imagine you’ve just phoned into a radio station as part of a contest. You’re the ninth caller, and you can win a fabulous vacation simply by identifying the song about to be played. The radio DJ will play a 15-second snippet, and you have 30 seconds to identify it. He asks if you’re ready, and then he presses play…


(Embedded audio clip; alternate link if the inline player doesn’t work for you.)

Alright, 30 seconds to think, don’t panic… the first few seconds sound like some random trumpets, but that drum beat in the middle – you know that beat! You’ve heard it everywhere, hundreds of times. That “dum-dum-tish” break has driven itself into your brain, but you’ll be damned if you can remember where it’s originally from.

Is it “Straight Outta Compton” by NWA?
You can definitely hear part of it in the Powerpuff Girls theme song.
It sounds like it’s in the beat of Slipknot’s “Eyeless”.

But it’s originally from none of those. Not David Bowie’s “Little Wonder”, not Oasis’ “D’You Know What I Mean”, not Maestro Fresh Wes’ “Bring it On”, not Orbital’s “Satan”, nor Nas’ “Hip Hop is Dead” – but instead “Amen Brother”, recorded in 1969 by the funk and soul outfit The Winstons.

Hip Hop and Sampling

The song “Amen Brother” was initially recorded in 1969, but it took until the late 1980s and early 1990s for it to begin to be revived as an integral part of pop culture. This was the time when samplers, machines which allowed short bits of music to be recorded and played back at will, finally became affordable to the hip hop artists of the streets. Short bits of drum beats that could be stored in a sampler were called “break beats”, or simply “breaks”. No one is certain who was the first to sample the Amen Break, but its use became ubiquitous in early nineties hip hop. Eventually the beat became overplayed, and the Amen Break seemed to be a flash in the pan, something brought up for nostalgic value, if anything, on new tracks.

Jumping the Pond

But as “Straight Outta Compton” was selling millions in the US, the rave scene was exploding in the UK. This was a scene with little money and little social support, a subculture by any definition. But it was growing rapidly, and it demanded new music. The Amen Break was adopted as a foreign son and became the very basis for an entire genre and hundreds of songs. The break was sliced up into tiny pieces, each hi-hat, each snare, each bass drum hit cut into a tiny piece of audio. These pieces could be saved in a sampler and played back in any permutation or combination. At first the variations were homages, but quickly both the variety and the volume spread. A sampler was cheap, the breaks were freely passed along, and flimsy acetate records of your new track could be cut in the morning and played at the rave in the evening. This organic creativity caught the attention of the art scene, and effectively undanceable constructions were born: a fusion of the fundamentalism of the Amen Break homage with the polymeter and highbrow musical structure of the new creators, who now went to art school. A prime example of this new wave was Tom Jenkinson, known as Squarepusher, who created multiple albums’ worth of songs all based on chopping and mixing the Amen Break.

The Rewards of Sampling

But what of the Winstons, the creators of the oft-sampled beat now so widespread that Fortune 500 companies use it in their ads? Did they blow the royalties on the fleeting rewards of fame, or have they hidden themselves and their fortune away? The staggering fact is that despite a recent US court ruling that even three notes can be considered a sample and subject to royalties, the Winstons haven’t received a single cent. Instead of property, the Amen Break now appears to be a part of the popular collective unconscious, ever-present and unownable.

For more detail, check out this great video by Nate Harrison from 2004.

Rumors of God

The Higgs boson is one of the most important, most abstract, and most interesting ideas in current physics. Dubbed the “god” particle after an editor refused to let a prominent physicist call it “that goddamn particle”, it is the only remaining particle in the Standard Model that has not been directly observed.

The fact that we’ve never seen any evidence of its presence is a big deal. The Standard Model is a shining jewel of modern physics, a mathematical model describing the most basic facets of the world using a symmetry group. The predictions of this model have been confirmed so many times and in so many different ways that it’s almost become more exciting to think of what it would mean if the theory were broken by some strange event than to rejoice at yet another correct prediction.

So this is why the missing Higgs boson fascinates so many. But why is it missing at all? The answer comes from the Standard Model itself and its mathematical foundations. Like the sides of a many-faceted jewel, a symmetry group has many parts, all reflecting some basic property of an underlying structure. Every single observation so far indicates a jewel of a certain shape, and we’ve seen every face except for one. There seems to be no other possible structure that would fit the observations and create a complete crystal, so to speak, so we’re left inevitably with the conclusion that one face remains in the dark.

Broken Silence

But now something very interesting has happened. Evidence of the Higgs boson may have been found! Not at the new and fancy Large Hadron Collider, but at the good old Tevatron at Fermilab. Apparently a rather prominent physicist passed some rumours going around the physics world to a blogger.

On a completely different issue, I’ve heard that there’s a rumor going around Aspen that the Tevatron will be announcing discovery of gluon + b → b + Higgs, which would then require large tan(beta), which would fit the MSSM. I guess we’ll find out in a couple of weeks.

This is big news – the observations indicate a “light” Higgs boson, which is why the older Tevatron was able to find it when everyone thought the higher-powered Large Hadron Collider would be required. What does a light Higgs boson mean? Something very inspiring: that particle physics isn’t done figuring out the world quite yet.

…a light Higgs boson pretty much proves that there has to be new physics beyond the Standard Model well below the Planck scale. The new particles should be bosons and the only natural reasons why the Higgs, or the new bosons, should stay light is supersymmetry.

The best fits based on this assumption imply that the squarks and slepton masses – and more generally, other superpartner masses – should be close to 500 GeV or so and they should be observable by the LHC, possibly by the end of 2011.

But what does Fermilab, the apparent source of this grand discovery, have to say about it?

I find it interesting that there is not an explicit denial, but a statement that these rumblings in the physics community are “just rumours”. Rumours can’t be published in scientific journals – so I suppose we’ll have to wait and see just what this all means.
