Horse Betting vs. Data Science

What happens when one of the oldest traditions of mankind meets today’s most powerful tools? In 2001, Bill Benter, a professional gambler and mathematician, put this question to the test.

Horse racing, the “Sport of Kings”, dates back as far as 4500 BC, to the nomadic tribesmen of Central Asia who first domesticated horses. Most eras of history since then have their own examples of horse racing, from the Ancient Greeks to English knights to Native Americans. Many variables go into a successful racehorse, including ancestry, breed, training, build, and, most importantly, luck. Interestingly, 95 percent of all living Thoroughbreds, the dominant racing breed, can trace their lineage back to three specific horses: the Byerley Turk (1680s), the Darley Arabian (1704), and the Godolphin Arabian (1729).

As racing has evolved through history, horse betting has grown in parallel. While there is evidence of trades and wagers in ancient horse races, the first evidence of organized horse betting dates to the 1600s, under King James I. The first English settlers then brought the sport with them to the New World, and in 1665 opened a racetrack in what is now Nassau County, New York. Over the years horse betting has had a volatile relationship with the law, including brief stints when the practice was outlawed. Today, horse betting is legal in most places around the world, with a structured system of regulation.

Modern horse betting is a complex system that a novice gambler will have a difficult time getting acquainted with. To simplify, all betting at American tracks today uses a pari-mutuel wagering system. Under this system, a fixed percentage (usually 14%-25%) of the total amount wagered is taken out for racing purses, track operating costs, and state and local taxes. The remaining sum is divided by the number of individual correct wagers to determine the payoff on each bet. The two most basic categories of wager are straight bets and exotic bets. A straight bet backs your horse to Win, Place, or Show: Win means finishing first, Place means finishing first or second, and Show means finishing first, second, or third. Exotic bets have four basic categories: Exacta, Quinella, Trifecta, and Superfecta. An Exacta picks the first- and second-place horses in order. A Quinella picks the first- and second-place horses in either order. A Trifecta picks the top three horses in order. Finally, a Superfecta picks the first four finishers in order. Those are the most basic horse wagers, but betting can become a lot more complex.
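To make the pari-mutuel arithmetic concrete, here is a minimal sketch of how a Win payout could be computed. The function name, the pool figures, and the 20% takeout are all assumptions made up for illustration. The sketch divides the net pool by the dollars bet on the winner, which matches dividing by the number of correct wagers when every wager is $1, and it ignores any rounding rules a real track would apply.

```python
def parimutuel_win_payout(pool_by_horse, winner, takeout=0.17):
    """Return the payout per $1 wagered on the winning horse.

    pool_by_horse: dict mapping horse name -> total dollars bet to win on it.
    takeout: fraction removed before the pool is shared out
             (the 0.17 default is an assumed value inside the 14%-25% range).
    """
    total_pool = sum(pool_by_horse.values())
    net_pool = total_pool * (1.0 - takeout)   # what is left after the takeout
    winning_dollars = pool_by_horse[winner]
    if winning_dollars <= 0:
        raise ValueError("no money was wagered on the winner")
    return net_pool / winning_dollars


# Hypothetical win pool for a three-horse race.
pool = {"A": 5000.0, "B": 3000.0, "C": 2000.0}
payout = parimutuel_win_payout(pool, winner="C", takeout=0.20)
print(f"Each $1 on C returns ${payout:.2f}")  # prints: Each $1 on C returns $4.00
```

Notice that the payout depends only on how the public split its money, not on any price set in advance. The less money on the winner, the more each winning dollar collects.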

Now that we have a basic understanding of the history of horse racing and how horse betting works, where does data science come into play? As you can imagine, after millions of races throughout the modern world, horse racing generates a vast amount of data. For example, when you enter the racetrack on any given race day, you are handed a guidebook with data on every race and on every horse’s past races. Below is an example of a racing form for one horse, “Color Chart”, entered in one race.

As you can see, there is a lot of data kept for each individual horse.

As data scientists, our first goal would be to predict with confidence the probability that a horse wins a given race. The Rating Bureau describes seven major factors to consider when analyzing a horse: class, recent form, fitness, in-run position, jockeys, track condition, and weight. I give an example of one below to show how it can work in our statistical model.

1. Class

Summary: Class is the ability of a horse compared to its rivals. In this respect, class can be described as the combination of speed, stamina and determination a horse possesses, qualities that allow it to win or be competitive at a given level of competition. The higher the grade of race, the greater the speed, stamina and determination a horse needs to win. A horse that has won or placed at Group 1 level has more class (ability) than a horse that can only be competitive in restricted grade.

Potential Model Features:

· What is the highest class the horse has won in? How many times?

· What is the highest class the horse has been competitive in (within two lengths of the winner)? How many times?

· Outside of the above, what class has the horse been expected to be competitive in? That is, races where it started up to $7 in betting.

· What class has the horse attempted and failed in?
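The four questions above could be turned into model features along these lines. This is only a sketch under stated assumptions: the `PastRun` record, its field names, and the integer class scale (higher number means stronger grade) are hypothetical, and the two-length and $7 thresholds are taken from the bullets.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PastRun:
    """One past race for a horse (hypothetical record layout)."""
    race_class: int        # assumed scale: higher number = stronger grade
    won: bool
    beaten_lengths: float  # 0.0 for the winner
    starting_price: float  # dollar price at the start of the race


def top_and_count(classes: List[int]) -> Tuple[Optional[int], int]:
    """Highest class in the list and how many times it occurs."""
    if not classes:
        return None, 0
    top = max(classes)
    return top, classes.count(top)


def class_features(runs, competitive_margin=2.0, fancied_price=7.0):
    won = [r.race_class for r in runs if r.won]
    competitive = [r.race_class for r in runs
                   if r.beaten_lengths <= competitive_margin]
    # "Outside of the above": only runs where the horse was not competitive.
    not_competitive = [r for r in runs if r.beaten_lengths > competitive_margin]
    fancied = [r.race_class for r in not_competitive
               if r.starting_price <= fancied_price]
    failed = [r.race_class for r in not_competitive]

    top_won, n_won = top_and_count(won)
    top_comp, n_comp = top_and_count(competitive)
    top_fancied, _ = top_and_count(fancied)
    top_failed, _ = top_and_count(failed)
    return {
        "highest_class_won": top_won,
        "wins_at_highest_class": n_won,
        "highest_class_competitive": top_comp,
        "competitive_runs_at_that_class": n_comp,
        "highest_class_fancied_but_beaten": top_fancied,
        "highest_class_failed": top_failed,
    }


# Made-up form for one horse.
runs = [
    PastRun(3, True, 0.0, 4.0),
    PastRun(3, True, 0.0, 5.0),
    PastRun(4, False, 1.5, 9.0),
    PastRun(5, False, 6.0, 6.5),
    PastRun(6, False, 10.0, 21.0),
]
features = class_features(runs)
print(features)
```

Each value in the returned dictionary is a single number (or `None` when the horse has no qualifying run), so it can be fed directly into a statistical model as one column per feature.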

We are not the first to try to build a successful model for predicting horse races. Ruth N. Bolton and Randall G. Chapman of the University of Alberta describe the primary variables they used to predict race outcomes. Their research suggested that “average amount of money earned per race in the current year” and “average speed rating over the last four races” were the two most important factors. “Lifetime win percentage” was also a significant variable, though less so than the first two.
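Bolton and Chapman's paper is titled as a multinomial logit model, so here is a rough sketch of what that kind of model does with the three variables named above. The coefficients and the horses are invented purely for illustration and are not the values from their paper. Each horse gets a score from its features, and the scores are turned into win probabilities that sum to one across the field.

```python
import math

# Hypothetical coefficients, NOT the published estimates.
COEFFS = {
    "avg_earnings_per_race": 0.00004,   # per dollar earned
    "avg_speed_rating_last4": 0.05,
    "lifetime_win_pct": 1.0,            # win percentage as a fraction
}


def win_probabilities(field):
    """Multinomial logit: softmax of a linear score for each horse in a race.

    field: dict mapping horse name -> dict of feature values.
    """
    scores = {
        name: sum(COEFFS[k] * feats[k] for k in COEFFS)
        for name, feats in field.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {name: math.exp(s - max_score) for name, s in scores.items()}
    total = sum(exp_scores.values())
    return {name: e / total for name, e in exp_scores.items()}


# Made-up three-horse field.
field = {
    "Horse A": {"avg_earnings_per_race": 12000.0,
                "avg_speed_rating_last4": 88.0,
                "lifetime_win_pct": 0.25},
    "Horse B": {"avg_earnings_per_race": 8000.0,
                "avg_speed_rating_last4": 84.0,
                "lifetime_win_pct": 0.15},
    "Horse C": {"avg_earnings_per_race": 5000.0,
                "avg_speed_rating_last4": 80.0,
                "lifetime_win_pct": 0.10},
}
probs = win_probabilities(field)
for name, p in probs.items():
    print(f"{name}: {p:.1%}")
```

In a real project the coefficients would be estimated from historical race results rather than typed in by hand. The point of the sketch is only that a horse's probability depends on how it compares with the rest of the field in that race.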

While predicting the outcome of the race seems like a difficult task, it is only one of two major hurdles. The second is finding the right way to place our bet. The track actually does a pretty good job of estimating the potential winners, as shown by the odds posted next to each horse. The odds on a horse can be converted into a percentage by adding 1 and then dividing 100 by the result. (Example: 3–1 odds give 3 + 1 = 4 and 100 / 4 = 25% chance.) So, the goal of our model would be to go up against the house when predicting which horse has the best chance. When we find an edge, it is important to capitalize on it.
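The odds conversion above, and what an "edge" could look like in numbers, can be sketched as follows. The function names and the 30% model probability are assumptions for illustration, and the expected-profit formula assumes a simple Win bet that pays the posted odds.

```python
def implied_probability(odds_to_one):
    """Convert odds such as 3-1 into an implied win probability."""
    return 1.0 / (odds_to_one + 1.0)


def expected_profit_per_dollar(model_prob, odds_to_one):
    """Expected profit on a $1 Win bet if our model's probability is right.

    A win returns a profit of `odds_to_one` dollars; a loss costs the $1 stake.
    """
    return model_prob * odds_to_one - (1.0 - model_prob)


odds = 3.0                          # the 3-1 example from the text
public = implied_probability(odds)  # 0.25
model = 0.30                        # hypothetical estimate from our own model
edge = expected_profit_per_dollar(model, odds)

print(f"Implied probability: {public:.0%}")        # prints: Implied probability: 25%
print(f"Expected profit per $1: {edge:.2f}")       # prints: Expected profit per $1: 0.20
```

When our model's probability equals the implied probability, the expected profit is zero. A bet is only worth placing when the model puts the horse's chances above what the odds imply.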

But has anyone ever succeeded in going up against the racetrack? The answer, surprisingly, is yes! Bill Benter, a professional gambler, left Vegas in 1984 to take on this challenge at the Hong Kong Jockey Club. Hong Kong was a perfect laboratory for an early data scientist: two major racetracks dominate, with hundreds of races each year. Benter obtained the results of races dating back five years and spent nine months feeding the data into a computer. After years of tweaking, he improved the statistical model to the point where he could predict the outcome of a given race more successfully than the bookmakers. A major breakthrough was a feature that incorporated public odds into the model. Benter made over a billion dollars on horse racing and was eventually restricted from betting in Hong Kong. To prove how successful his model was, he put it to one final test. The Triple Trio in 2001 was a wager on the top three finishers in three consecutive races. The jackpot was $20,000,000, and after placing 50,000 bets worth over $1,000,000, Benter won. This assured him that he had taken on the house and won.


Bolton, R. N., & Chapman, R. G. (1986, August). Searching for Positive Returns at the Track: A Multinomial Logit Model for Handicapping Horse Races.

The Rating Bureau. (2008). The eight most important analysis factors.