
Tags: probability, markov, sports, betting
I was recently tidying up at home when I stumbled on a set of notes from some years ago. The idea I had at the time was to earn some money with sports betting. In the end I abandoned it, mainly because I didn't think it was worth the effort: the betting house holds all the cards (i.e. they will retain 60% of your profits if they are big and can kick you out anytime). Unsurprisingly, whoever controls the market wants to earn all the money.
The first, and only, sport that I considered was tennis. The reason is that there are just two players and a binary outcome for every match, which simplifies things. It is also popular enough, which increases the chances of gathering data and having liquidity while betting.
The bad news about tennis is the scoring. Just to refresh concepts (I may be wrong since it has been years):
We want to know which player wins the match.
A match is made of several sets (3 or 5).
A set is made of a minimum of 6 games.
A game is made of a minimum of 4 points.
The exact rules vary between men/women and also depend on the tournament, but here we will talk about winning a single game.
The following figure contains the possible state transitions of a single game. The state records how many points player \(A\) has scored against how many points player \(B\) has scored, and is therefore represented by a tuple \((a, b)\). Notice that the number of points in a game is unbounded. (The 70-68 record that Wikipedia lists is the fifth set of Isner vs. Mahut at Wimbledon 2010, counted in games, not the points of a single game.)
The above transition graph captures the requirement for winning a game: the first player to reach at least 4 points with a lead of two or more points wins. This latter condition means that if both players reach state \((3, 3)\), play may continue indefinitely until reaching \((n + 2, n)\) or \((n, n + 2)\) for some \(n\). Transitions are probabilistic: we move down with probability \(p\) and right with probability \(1 - p\), where \(p\) is the probability of \(A\) winning the point being played.
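The transition rules can be checked with a small simulation. This is a minimal sketch, assuming points are independent and \(p\) stays constant throughout the game; the function names `simulate_game` and `estimate_game_win` are made up for this illustration.

```python
import random


def simulate_game(p, rng):
    """Play one game point by point; return True if player A wins it."""
    a = b = 0
    while True:
        # A wins the point with probability p, otherwise B does.
        if rng.random() < p:
            a += 1
        else:
            b += 1
        # A game ends once someone has at least 4 points and a 2-point lead.
        if a >= 4 and a - b >= 2:
            return True
        if b >= 4 and b - a >= 2:
            return False


def estimate_game_win(p, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that A wins a game."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    wins = sum(simulate_game(p, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6):
        print(f"p = {p:.1f} -> estimated game win probability = {estimate_game_win(p):.3f}")
```

The estimate is noisy, but it is a handy sanity check for any closed-form expression derived from the graph.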
Now we pose the main question: if player \(A\) has probability \(p\) of winning a single point, what is the probability that \(A\) wins the game?
Let's call \(P_{mn}\) the probability of reaching state \((m, n)\) and \(P^A_{mn}\) the probability of player \(A\) winning after reaching that state. Then:
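Reading the graph as saying that \(A\) wins either directly at \((4,0)\), \((4,1)\) or \((4,2)\), or after passing through \((3,3)\), the decomposition would be as follows, where \(P^A\) (a symbol assumed here) is the probability that \(A\) wins the game:

\[ P^A = P_{40} + P_{41} + P_{42} + P_{33}\, P^A_{33} \]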
The probabilities of reaching the first 3 nonrecursive states are given by the Binomial distribution, and so we have:
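Counting the orderings in which \(A\) takes the last point, and including the level state \((3,3)\) for completeness, a reconstruction of these terms would be:

\[ P_{40} = p^4, \qquad P_{41} = \binom{4}{1} p^4 (1-p) = 4p^4(1-p), \qquad P_{42} = \binom{5}{2} p^4 (1-p)^2 = 10p^4(1-p)^2, \qquad P_{33} = \binom{6}{3} p^3 (1-p)^3 = 20p^3(1-p)^3 \]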
The probability of winning after reaching a recursive state is:
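Looking two points ahead from \((3,3)\), either \(A\) wins both points and the game, or the points are split and the score is level again at \((4,4)\), or \(B\) wins both. A sketch of the resulting recursion:

\[ P^A_{33} = p^2 + 2p(1-p)\, P^A_{44} \]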
But notice that, assuming both players are affected equally by how long they have been playing, which is not the case against Rafael Nadal:
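Under that assumption, every level score from \((3,3)\) onwards is the same situation, which would be written as:

\[ P^A_{44} = P^A_{33} \]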
And so we have that
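Substituting \(P^A_{44} = P^A_{33}\) into \(P^A_{33} = p^2 + 2p(1-p)\, P^A_{44}\) and solving for \(P^A_{33}\) gives, as a reconstruction:

\[ P^A_{33} = \frac{p^2}{1 - 2p(1-p)} \]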
After some boring manipulation we have that:
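One form the result can take, assembled from the direct wins at \((4,0)\), \((4,1)\), \((4,2)\) and the term for winning from \((3,3)\) (a reconstruction, not necessarily the exact arrangement in my notes):

\[ P^A = p^4\left(1 + 4(1-p) + 10(1-p)^2\right) + \frac{20\, p^5 (1-p)^3}{1 - 2p(1-p)} \]

As a quick check, \(p = 1/2\) gives \(P^A = 1/2\), as symmetry demands.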
Which I have named The Tennis Equation (because Tennis Rational Fraction was too long and less cool). If you are curious about how it looks, it looks (surprise...) like a sigmoid.
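To see the sigmoid shape without plotting, here is a minimal sketch that evaluates the game-winning probability numerically. It assumes independent points with constant \(p\), and it builds the result from direct wins at 4-0, 4-1 and 4-2 plus a win from 3-3; the function name `game_win_probability` is made up for this illustration.

```python
def game_win_probability(p):
    """Probability that A wins a game, given point-win probability p.

    Assumes points are independent and p is constant during the game.
    """
    q = 1.0 - p
    # A wins 4-0, 4-1 or 4-2; A always takes the last point.
    direct = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach 3-3 (20 orderings), then win from there: p^2 / (1 - 2pq).
    from_level = 20 * p**3 * q**3 * p**2 / (1 - 2 * p * q)
    return direct + from_level


if __name__ == "__main__":
    for p in (0.3, 0.4, 0.5, 0.6, 0.7):
        print(f"p = {p:.1f} -> game win probability = {game_win_probability(p):.4f}")
```

The curve is steeper than the identity around \(p = 0.5\), so a small edge per point turns into a noticeably larger edge per game.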
This was the main building block of my match prediction model. From here you need to extend it to sets and matches, which is a little tedious but manageable with a CAS (I used sympy), and of course model \(p\) as depending on several features (e.g. surface type), but I give you this quote:
My bet (pun intended) would be trying to change the rules, change the betting market. I looked some time ago at blockchain based solutions but they were not ready.