Evening, sports fans. Hope everyone's having a wonderful weekend.
Before we get to the code, I'm happy to say that 7 out of 10 predictions were correct and the 3 that were wrong were draws!
If we had put £1 single bets on each game, then for our £10 stake, we'd have had £12.86 back. Only time will tell if this 28.6% ROI will continue.
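For anyone who wants to check the arithmetic, here is a minimal sketch of the ROI calculation. The function name `roi` is just an illustration and is not part of soccerprediction.py; the stake and return figures are the ones quoted above.

```python
def roi(total_staked, total_returned):
    """Return on investment as a fraction of the total stake."""
    return (total_returned - total_staked) / total_staked


stake = 10 * 1.00   # ten single bets of £1 each
returned = 12.86    # total paid back, as quoted in the post

print(f"Profit: £{returned - stake:.2f}, ROI: {roi(stake, returned):.1%}")
# prints: Profit: £2.86, ROI: 28.6%
```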
In Part 3, I spoke about limiting how far back the system would look when making its predictions and chose 100 games as a default limit. I've now added a function to backtest different values for this.
The updated code is available on my GitHub.
What I've done is take 60 days of games from a year before the current date and backtest with values from 50 to 500 games, outputting the most successful value.
I've also added a cutoff value for the predicted probability to decide if the game is worth betting on. So the code also sweeps through values for this from 40 to 95.
There was a problem with this approach initially: it would reach 100% accuracy but only suggest betting on 1 game out of 100. In other words, it picked only games that were pretty much foregone conclusions and therefore not worth betting on.
So I've now constrained the scan so that it must advise betting on at least 1 game out of 10. It reports somewhere in the region of 70-90% accuracy during the backtest.
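The scan described above can be sketched roughly as follows. This is not the code from the repository: `backtest` is a hypothetical callable that returns one (predicted probability in percent, was the prediction correct) pair per game, the step sizes of 50 and 5 are my assumptions, and the toy outcomes are made up purely to show the mechanics.

```python
def best_parameters(backtest, histories, cutoffs, min_fraction=0.1):
    """Brute-force scan over history length and probability cutoff.

    Returns (accuracy, history, cutoff, number_of_picks) or None.
    """
    best = None
    for history in histories:
        outcomes = backtest(history)
        for cutoff in cutoffs:
            picks = [ok for prob, ok in outcomes if prob >= cutoff]
            # Skip settings that advise betting on too few games.
            if not picks or len(picks) < min_fraction * len(outcomes):
                continue
            accuracy = sum(picks) / len(picks)
            if best is None or accuracy > best[0]:
                best = (accuracy, history, cutoff, len(picks))
    return best


def toy_backtest(history):
    # Made-up outcomes for illustration only, not real results.
    return [(90, True), (75, True), (60, False), (55, True), (45, False)] * 2


print(best_parameters(toy_backtest, range(50, 501, 50), range(40, 100, 5)))
```

Ties are resolved in favour of the first setting found, which is a design choice of this sketch; preferring the setting with the most picks would be an equally reasonable rule.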
Now this is a pretty naive form of machine learning, basically a brute force scan through what could be called our hyperparameters, so there's likely to be a danger of curve fitting. To rule this out, I also added a function to test the parameters found during the scan on the next 60 days of games. If the reported accuracy still looks good then we're golden.
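As a rough illustration of how the two test periods relate, the sketch below computes a 60-day scan window starting a year before today and the 60-day validation window that follows it. The function names and the assumption that each game is a tuple whose first element is the match date are mine, not taken from the actual code.

```python
from datetime import date, timedelta


def backtest_windows(today, days=60):
    """Return (scan_start, scan_end, validation_end) as dates."""
    scan_start = today - timedelta(days=365)  # a year before the current date
    scan_end = scan_start + timedelta(days=days)
    validation_end = scan_end + timedelta(days=days)
    return scan_start, scan_end, validation_end


def split_games(games, today, days=60):
    """Split games into the scan set and the later validation set.

    `games` is assumed to be a list of tuples whose first item is a date.
    """
    start, mid, end = backtest_windows(today, days)
    scan = [g for g in games if start <= g[0] < mid]
    validation = [g for g in games if mid <= g[0] < end]
    return scan, validation


print(backtest_windows(date.today()))
```

Keeping the validation games strictly after the scan games means the parameters are judged on matches they were never tuned on, which is the point of the check.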
New command line options are "-t" or "--test" to scan through the values, and "-b" or "--cutoff" to have the program print out predictions with predicted probabilities above that value.
Running the following command line will find the best values to use for the Scottish Premiership.
python3 soccerprediction.py -c Scotland -l Premiership -t
This returns with values of 450 for history and 70 for cutoff with 100% accuracy for 7 predictions out of 70 games. Sounds too good to be true, I know! However, it also returns 100% accuracy in the validation test.
Running the tests on the English Premier League returns 400 & 70 with 71% accuracy for 7 games from 70. The validation test returns 93%.
I've only tested with the English Premier League, English Championship and the Scottish Premiership so far but as the predictions the code made in Part 3a show, it appears to be working pretty well.
Hey, I know the code isn't pretty, efficient, elegant or any of the things it would be if a professional programmer had written it but who the hell cares if it works eh? I'll be continuing to test it and hope some of you guys give it a try too. Feel free to use or change the code in any way you want and if you've any ideas for improvements or fixes please share them here.
Maybe we can all stick it to the bookies. hehehehe.
My program has only 2 predictions for today.
Yup, another two, lmao. Fortunately I saw this in time to bet on Lazio :)
Hey, that's great. I'm going to test a little longer before I start betting live.
Did you manage to get the code running btw? (I'm looking into the best way to automate installing the prerequisite packages.)
Hi, no I didn't. I installed a bunch of stuff and it even said pandas was installed, but it still gives me the same error: module not found, pandas. I'm on Windows btw, this is messy.
I've never used Python on Windows, but from what I can see you would change directory to your Python folder, possibly c:\python35, then run the following commands.
python -m pip install pandas
python -m pip install numpy
python -m pip install beautifulsoup4
python -m pip install selenium
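If the error persists after that, one possible cause is that the packages were installed for a different Python than the one running the script. The short check below is only a sketch (the `REQUIRED` mapping and `missing_packages` name are my own); it prints which interpreter is in use and which of the four modules it cannot find.

```python
import importlib.util
import sys

# Import names can differ from pip names: beautifulsoup4 is imported as "bs4".
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "bs4": "beautifulsoup4",
    "selenium": "selenium",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of modules this interpreter cannot find."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


print("Interpreter:", sys.executable)
for name in missing_packages():
    print(f"Missing: {name}  ->  {sys.executable} -m pip install {name}")
```

Running the printed command uses the same interpreter for pip as for the script, so the install should land where the script will look for it.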
Thank you, I'll try this. I just don't understand anything about code and I'm getting ahead of myself trying to run the program :S One of these days I'll start making something simple, I've always wanted to make my own website.
Woohoo. Both predictions correct!
Hi, it's me again :)
What data goes into the prediction? You said the system looks back 100 games to make the predictions. Is it just the results? Bookie odds? What does it use?
Thanks
Hey bud, the only data it uses is the scores from the previous games. We take averages for home and away goals in the competition, then the averages for each team we want to predict, and use these to calculate expected goals for each team. These are then used to pick random samples from the Poisson distribution, which get averaged out to give us our prediction. Here's an excellent description of the process - it's the one I followed to get started.
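To make that a bit more concrete, here is a minimal sketch of the idea using only the standard library. It is not the code from the repository: the attack-strength times defence-strength formula is the common textbook version and is assumed here, the function names are mine, and the numbers in the usage lines are made up.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def expected_goals(league_avg, team_scored_avg, opponent_conceded_avg):
    """Attack strength x defence strength x league average (assumed formula)."""
    attack = team_scored_avg / league_avg
    defence = opponent_conceded_avg / league_avg
    return attack * defence * league_avg


def simulate(home_xg, away_xg, n=10000, seed=1):
    """Return (home win, draw, away win) fractions over n simulated games."""
    rng = random.Random(seed)
    home = draw = away = 0
    for _ in range(n):
        h, a = poisson_sample(home_xg, rng), poisson_sample(away_xg, rng)
        if h > a:
            home += 1
        elif h == a:
            draw += 1
        else:
            away += 1
    return home / n, draw / n, away / n


# Made-up averages for illustration only.
home_xg = expected_goals(1.5, 1.8, 1.2)   # home goals: league, team, opponent conceded
away_xg = expected_goals(1.1, 0.9, 1.0)   # away goals: league, team, opponent conceded
print(simulate(home_xg, away_xg))
```

The three fractions always sum to 1, and the largest one would be the predicted outcome, with its value serving as the predicted probability.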
You are guilty for making me lose a couple of hours already, boy I look dumb staring at all this stuff on my screen, we have a saying in portuguese, "like an ox staring at a palace".
Wouldn't going so far back in games kind of ruin the prediction a bit when calculating averages for each team? (Because you go into past seasons, different players, managers etc.)
Anyway I tried to run your code and got this error: ModuleNotFoundError: No module named 'pandas'
I need the database right? Can I use my own database?
The possibilities with this are immense; unfortunately my knowledge of this isn't.
It looks like going further back helps more than it hinders. The backtest tries everything from 50 to 500 games, and most competitions I've tested seem to do best when considering around 400 games. I believe there are 380 games in a season of the Premier League, so it looks like a full year's worth of history would be a sensible default.
No database required. For the pandas error, you can use your package manager (synaptic probably) to install "python3-pandas".
Or try
pip3 install pandas
from the command line. You'll likely need to install python3-bs4, python3-selenium and python3-numpy as well.
If you have any other issues, let me know which distro you're using and I'll help as best as I can.
Steven
I like what you've done here. I was able to run your code. Do you mind if I use and test it? I can see where improvements can be done. Great job!
By the way, an easier way to install dependencies is to store them in a file, for example requirements.txt, with one package name per line (here that would be pandas, numpy, beautifulsoup4 and selenium).
You can then install them with
pip install -r requirements.txt
You can also pin module versions in requirements.txt, e.g.
pandas==0.20.0
Hi, I don't mind at all if you use the code, in fact I welcome it. It would be great if you could improve it.
There's so much that could be added: making use of the over/under and BTTS predictions, adding over/under predictions for other scores, predictions for handicaps, correct scores, the list goes on.
I've also thought about scraping a list of available countries & competitions and implementing a GUI with some nice graphs comparing each team.
Anyway, feel free to do anything you like with the code. And if you wanted to, you could post your improvements here on Steem.
Thanks as well for the advice on requirements.txt. I'll just go and add that now.
Have fun mate. :-)