Natural language processing of AFL injury data

Problem

While building an AFL fantasy optimisation engine, I grew tired of manually excluding all injured players.

I put together a Python script that reads the official list of AFL player injuries and outputs expected return dates (based on the estimated return listed on the website).

The result is an auto-updating CSV (https://github.com/conradbez/afl_injuries/blob/main/injuries.csv).

Challenges:

  1. Return-from-injury times are given in human-readable form (see image below *)
  2. Return dates should auto-update without costly infrastructure (maintenance time and compute cost) 

* example injury page - not easily machine-readable

Solution

  1. After trialling a few methods (including painful regex), I settled on the dateparser library, converting these times to Python timedelta objects and adding them to the date the list was updated. For the full method, inspect https://github.com/conradbez/afl_injuries/blob/main/injuries.py
  2. GitHub provides handy CI/CD through GitHub Actions, and I decided to solve auto-refreshing by piggybacking off this feature. Triggering a build on a time schedule controls how often the list is refreshed. The triggered build executes the above script and saves the results back into the repo, so the CSV is kept up to date for free, without any underlying infrastructure.
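To make the arithmetic in step 1 concrete, here is a minimal sketch of the idea: turn an estimate into a timedelta, then add it to the date the list was updated. It is not the code in injuries.py. To keep it dependency-free it swaps a small regex in for dateparser, and the function names, the 30-day month, the choice of the upper bound for ranges like "2-3 weeks", and the sample strings are all assumptions for illustration.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Days per unit; treating a month as 30 days is an assumption of this sketch.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30}

# Matches e.g. "3 weeks" or "2-3 weeks"; for a range, the upper bound is captured.
PATTERN = re.compile(r"(?:\d+\s*-\s*)?(\d+)\s*(day|week|month)s?", re.IGNORECASE)


def estimate_to_timedelta(text: str) -> Optional[timedelta]:
    """Convert a human-readable estimate to a timedelta, or None if unparseable."""
    match = PATTERN.search(text)
    if match is None:
        return None  # e.g. open-ended estimates with no number in them
    amount = int(match.group(1))
    unit = match.group(2).lower()
    return timedelta(days=amount * UNIT_DAYS[unit])


def expected_return(text: str, updated: date) -> Optional[date]:
    """Add the parsed estimate to the date the injury list was updated."""
    delta = estimate_to_timedelta(text)
    return None if delta is None else updated + delta


if __name__ == "__main__":
    updated = date(2021, 5, 3)  # hypothetical "date updated" value
    for estimate in ["1 week", "2-3 weeks", "TBC"]:
        print(estimate, "->", expected_return(estimate, updated))
```

Estimates that cannot be parsed come back as None rather than a guessed date, so they can be left blank in the CSV instead of being silently wrong.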

 

Hope this helps automate your AFL spreadsheets.