After thinking about this off and on for probably more than 5 years, I finally made a machine learning application that can predict NCAA tournament basketball games, in time for the 2025 March Madness tournament. Occasionally through the years, before I had done any actual work on the model, I'd find myself fantasizing that my application would predict the perfect bracket. I'd imagine myself in a podcast interview explaining how I was as surprised as anyone at the results, but that I was confident I had not somehow created Skynet. In reviewing the aftermath of my bracket, I definitely did not need to have an answer prepared to explain how I knew I had not accidentally created Skynet.

I entered brackets completed by my machine learning application in four places:

  1. a competition against other machine learning applications
  2. a family bracket pool where every other bracket was completed by family (all humans)
  3. a college friend bracket pool where every other bracket was completed by college friends (all humans)
  4. a work bracket pool where every other bracket was completed by coworkers (all humans)

Competition Results

In the Kaggle March Machine Learning Mania 2025 competition, I finished 931st out of 1,727 competitors (46th percentile).

Against the human picks in my family, friend, and coworker pools, the model finished between the 40th and 75th percentiles.

Bracket Results

My model made 38 of 63 picks correctly (~60%).

Algorithm Factors

The Kaggle competition I entered provided box scores for each game of the NCAA regular season. I wrote a script to calculate season-average statistics for each team based on those box scores, as well as true shooting percentage and a couple of statistics I made up called "blowout wins" and "blowout losses." I used these calculated statistics for each team as factors against the result of each NCAA tournament game (winning versus losing, represented as 1 versus 0) since 2003 to train my model. The ultimate weights calculated and used by my model are shown in the table below.
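
For reference, here is a minimal sketch of that calculation step, assuming the competition's MRegularSeasonDetailedResults.csv file and its W*/L* box score columns; my actual script differs in detail, but the idea is the same.

```python
import pandas as pd

# Minimal sketch: per-team season averages from the Kaggle box scores.
games = pd.read_csv("MRegularSeasonDetailedResults.csv")

def team_rows(df, prefix):
    """Pull one side's box score out of each game into a one-row-per-team frame."""
    cols = {f"{prefix}TeamID": "TeamID", f"{prefix}Score": "Score",
            f"{prefix}FGM": "FGM", f"{prefix}FGA": "FGA",
            f"{prefix}FGM3": "FGM3", f"{prefix}FGA3": "FGA3",
            f"{prefix}FTM": "FTM", f"{prefix}FTA": "FTA",
            f"{prefix}OR": "OR", f"{prefix}DR": "DR",
            f"{prefix}Ast": "Ast", f"{prefix}TO": "TO",
            f"{prefix}Stl": "Stl", f"{prefix}Blk": "Blk", f"{prefix}PF": "PF"}
    out = df[["Season"] + list(cols)].rename(columns=cols)
    out["Won"] = 1 if prefix == "W" else 0
    return out

long = pd.concat([team_rows(games, "W"), team_rows(games, "L")])
season_avg = long.groupby(["Season", "TeamID"]).mean()

# Derived metrics computed from the season averages.
season_avg["WinningPct"] = season_avg["Won"]  # mean of the 1/0 win column
season_avg["TrueShootingPct"] = season_avg["Score"] / (
    2 * (season_avg["FGA"] + 0.44 * season_avg["FTA"]))
season_avg["EffectiveFGPct"] = (
    season_avg["FGM"] + 0.5 * season_avg["FGM3"]) / season_avg["FGA"]
```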

| Rank | Factor | Weight | Weight (Absolute Value) |
| --- | --- | --- | --- |
| 1 | Losing Team Offensive Rebounds Ave | -0.6079 | 0.6079 |
| 2 | Winning Team Offensive Rebounds Ave | 0.5802 | 0.5802 |
| 3 | Winning Team True Shooting Percentage | 0.4817 | 0.4817 |
| 4 | Winning Team Free Throws Attempted Ave | -0.4543 | 0.4543 |
| 5 | Losing Team Turnovers Ave | 0.4396 | 0.4396 |
| 6 | Winning Team Winning Percentage | 0.3323 | 0.3323 |
| 7 | Losing Team True Shooting Percentage | -0.3006 | 0.3006 |
| 8 | Winning Team Free Throws Made Ave | 0.2903 | 0.2903 |
| 9 | Winning Team Turnovers Ave | -0.2874 | 0.2874 |
| 10 | Losing Team Free Throws Attempted Ave | 0.2409 | 0.2409 |
| 11 | Losing Team Effective Field Goal Percentage | -0.2229 | 0.2229 |
| 12 | Winning Team Blocks Ave | 0.2187 | 0.2187 |
| 13 | Losing Team Three Points Made Ave | 0.2082 | 0.2082 |
| 14 | Losing Team Blocks Ave | -0.1939 | 0.1939 |
| 15 | Losing Team Steals Ave | -0.1714 | 0.1714 |
| 16 | Winning Team Steals Ave | 0.1626 | 0.1626 |
| 17 | Losing Team Winning Percentage | -0.1399 | 0.1399 |
| 18 | Winning Team Three Points Made Ave | -0.1369 | 0.1369 |
| 19 | Losing Team Free Throws Made Ave | -0.1317 | 0.1317 |
| 20 | Losing Team Score Ave | 0.1221 | 0.1221 |
| 21 | Winning Team Score Ave | -0.1107 | 0.1107 |
| 22 | Winning Team Personal Fouls Ave | -0.0991 | 0.0991 |
| 23 | Losing Team Blowout Wins | 0.0886 | 0.0886 |
| 24 | Winning Team Three Points Attempted Ave | 0.0885 | 0.0885 |
| 25 | Losing Team Three Points Attempted Ave | -0.0872 | 0.0872 |
| 26 | Losing Team Personal Fouls Ave | 0.0836 | 0.0836 |
| 27 | Winning Team Blowout Losses | 0.0765 | 0.0765 |
| 28 | Winning Team Assists Ave | -0.0617 | 0.0617 |
| 29 | Losing Team Blowout Losses | 0.0495 | 0.0495 |
| 30 | Losing Team Defense Rebounds Ave | -0.0384 | 0.0384 |
| 31 | Winning Team Blowout Wins | 0.0349 | 0.0349 |
| 32 | Winning Team Effective Field Goal Percentage | -0.0342 | 0.0342 |
| 33 | Losing Team Assists Ave | -0.0144 | 0.0144 |
| 34 | Winning Team Defense Rebounds Ave | -0.0073 | 0.0073 |
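
For a sense of where weights like these come from, here is a minimal sketch assuming a logistic regression over standardized features; the tournament_training_data.csv file and its column names are hypothetical stand-ins for my actual training set.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical training file: one row per tournament game since 2003, one column
# per factor in the table above, and an Outcome column of 1s and 0s.
tourney = pd.read_csv("tournament_training_data.csv")

feature_cols = [c for c in tourney.columns if c != "Outcome"]
X = StandardScaler().fit_transform(tourney[feature_cols])  # make weights comparable
y = tourney["Outcome"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Rank the learned weights by absolute value, as in the table.
weights = (pd.DataFrame({"Factor": feature_cols, "Weight": model.coef_[0]})
           .assign(AbsWeight=lambda d: d["Weight"].abs())
           .sort_values("AbsWeight", ascending=False))
print(weights)
```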

Two things drove some deep, incorrect "Cinderella" predictions:

  1. I removed seeds as predictive factors
  2. the model loves offensive rebounding

Seeds, Blowouts, and Offensive Rebounding

The initial version of the model was trained with the seed of each team in a given match-up, but the first bracket I/it/we completed seemed too boring. For the past couple of years, the word "parity" seemed to get thrown around a lot. It felt like the general consensus was that there was no longer an advantage to being a "blue blood" or a high seed or from a major conference.

So I took the seed data out, but I didn't have another metric to account for strength of schedule or opponent. The next iteration of my bracket had Alabama (2) getting upset by Robert Morris (15) in the first round. It also loved Troy (14), having them upset both Kentucky (3) and Illinois (6). It was similarly high on VCU (11) and on a St. Mary's (7) team that my favorite NCAA commentator (Mark Titus) said didn't really pass the eye test.

I quickly looked up Robert Morris. They had a good record and solid statistics, but some bad losses against "name brand" schools. To try to account for a team merely being the best team in a bad conference, I threw together calculations for "blowout wins" and "blowout losses." I defined a blowout loss as a game where a team lost by more than 10 points. My hope was that this would catch the kind of teams from weaker conferences that get invited early in the season to play schools in traditionally better conferences, which schedule them as "cupcakes" to pad their own records. I defined a blowout win as a game where a team won by more than 10 points either as the away team or on a neutral court. I was hoping this would filter out teams that did not play well away from home, reward teams talented enough to blow opponents out on the road, and maybe just catch teams with "grit."
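
Concretely, a minimal sketch of those two counts, again assuming the Kaggle box score columns (WLoc is the winning team's location: "H" home, "A" away, "N" neutral):

```python
import pandas as pd

games = pd.read_csv("MRegularSeasonDetailedResults.csv")
margin = games["WScore"] - games["LScore"]

# Blowout loss: the losing team lost by more than 10 points, anywhere.
blowout_losses = (games.loc[margin > 10, ["Season", "LTeamID"]]
                  .value_counts().rename("BlowoutLosses"))

# Blowout win: the winning team won by more than 10 points away from home
# or on a neutral court.
road_blowouts = games[(margin > 10) & (games["WLoc"].isin(["A", "N"]))]
blowout_wins = (road_blowouts[["Season", "WTeamID"]]
                .value_counts().rename("BlowoutWins"))
```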

After including the blowout game counts, the model did flip to having Alabama beat Robert Morris, but not much else changed. It still had Troy, VCU, and St. Mary's making runs in the East region. It still had McNeese St. (12) making a run and High Point (13) upsetting Purdue (4) in the Midwest region.

I ran a SQL query on my database and realized that Troy, VCU, St. Mary's, and Robert Morris all had high season-average offensive rebound numbers. High Point averaged more than one additional offensive rebound per game compared to Purdue.
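
For illustration, a query along these lines would surface it; the ncaa.db file, team_season_stats table, and column names are hypothetical stand-ins for my actual schema.

```python
import sqlite3

conn = sqlite3.connect("ncaa.db")
rows = conn.execute(
    """
    SELECT team_name, offensive_rebounds_avg
    FROM team_season_stats
    WHERE season = 2025
    ORDER BY offensive_rebounds_avg DESC
    """
).fetchall()

# Print the teams with the highest season-average offensive rebounds.
for team_name, or_avg in rows[:25]:
    print(f"{team_name}: {or_avg:.1f}")
```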

I was surprised and confused by how much emphasis the model put on offensive rebounding. It seemed even stranger given that, of all the statistics, it gives the least weight to defensive rebounding. I don't have an explanation for why it loves offensive rebounding, and I'm not sure what to do about it going forward, if anything.

Conclusion, Tweaks, and the Long-Term

Though I was disappointed with the results, this was a fun project, and one that I intend to tweak and maintain for the rest of my life.

This initial model took a "kitchen sink" approach: I just kind of threw everything I could come up with into the training data. I was curious to see what it would identify as most important.

I'd like to pare down the number of statistics that go into the model, or at least have a clear intent behind why each statistic is included. By next year, I'd like to come up with categories of statistics that reflect different styles of play or strengths, and include just one from each category.

I'm also interested in developing a way to factor in the coach of each team, as well as the "name on the jersey" of the school. I'd like to come up with a way to quantify and reflect what it means to "be Duke" or "a Kentucky" or to be a nobody. Similarly, I'd love to come up with a way to reflect that a coach like Rick Pitino is suddenly behind St. John's. But I view these as long-term goals that will take years of iterating on my code, data sources, and theories.