After thinking about this off and on for probably more than 5 years, I finally made a machine learning application that can predict NCAA tournament basketball games, in time for the 2025 March Madness tournament. Occasionally through the years, before I had done any actual work on the model, I'd find myself fantasizing that my application would predict the perfect bracket. I'd imagine myself in a podcast interview explaining how I was as surprised as anyone at the results, but that I was confident I had not somehow created Skynet. In reviewing the aftermath of my bracket, I definitely did not need to have an answer prepared to explain how I knew I had not accidentally created Skynet.

I entered brackets completed by my machine learning application in four places:

  1. a competition against other machine learning applications
  2. a family bracket pool where every other bracket was completed by family (all humans)
  3. a college friend bracket pool where every other bracket was completed by college friends (all humans)
  4. a work bracket pool where every other bracket was completed by coworkers (all humans)

Competition Results

In the Kaggle March Machine Learning Mania 2025 competition, I finished 931st out of 1,727 competitors (46th percentile).

Against the human picks in my family, friend, and coworker pools, the model finished between the 40th and 75th percentiles.

Bracket Results

My model made 38 of 63 picks correctly (~60%).

Algorithm Factors

The Kaggle competition I entered provided box scores for each game of the NCAA regular season. I wrote a script to calculate season-average statistics for each team based on those box scores, as well as true shooting percentage and a couple of statistics I made up called "blowout wins" and "blowout losses." I used these calculated statistics for each team as factors against the result of each NCAA tournament game (winning versus losing, represented as 1 versus 0) since 2003 to train my model. The ultimate weights calculated and used by my model are shown in the table below.
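
For reference, here is a minimal sketch of that calculation step, assuming the competition's MRegularSeasonDetailedResults.csv file and its W*/L* box score columns; my actual script differs in detail, but the idea is the same.

```python
import pandas as pd

# Minimal sketch: per-team season averages from the Kaggle box scores.
games = pd.read_csv("MRegularSeasonDetailedResults.csv")

def team_rows(df, prefix):
    """Pull one side's box score out of each game into a one-row-per-team frame."""
    cols = {f"{prefix}TeamID": "TeamID", f"{prefix}Score": "Score",
            f"{prefix}FGM": "FGM", f"{prefix}FGA": "FGA",
            f"{prefix}FGM3": "FGM3", f"{prefix}FGA3": "FGA3",
            f"{prefix}FTM": "FTM", f"{prefix}FTA": "FTA",
            f"{prefix}OR": "OR", f"{prefix}DR": "DR",
            f"{prefix}Ast": "Ast", f"{prefix}TO": "TO",
            f"{prefix}Stl": "Stl", f"{prefix}Blk": "Blk", f"{prefix}PF": "PF"}
    out = df[["Season"] + list(cols)].rename(columns=cols)
    out["Won"] = 1 if prefix == "W" else 0
    return out

long = pd.concat([team_rows(games, "W"), team_rows(games, "L")])
season_avg = long.groupby(["Season", "TeamID"]).mean()

# Derived metrics computed from the season averages.
season_avg["WinningPct"] = season_avg["Won"]  # mean of the 1/0 win column
season_avg["TrueShootingPct"] = season_avg["Score"] / (
    2 * (season_avg["FGA"] + 0.44 * season_avg["FTA"]))
season_avg["EffectiveFGPct"] = (
    season_avg["FGM"] + 0.5 * season_avg["FGM3"]) / season_avg["FGA"]
```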

| Rank | Factor | Weight | Weight (Absolute Value) |
| --- | --- | --- | --- |
| 1 | Losing Team Offensive Rebounds Ave | -0.6079 | 0.6079 |
| 2 | Winning Team Offensive Rebounds Ave | 0.5802 | 0.5802 |
| 3 | Winning Team True Shooting Percentage | 0.4817 | 0.4817 |
| 4 | Winning Team Free Throws Attempted Ave | -0.4543 | 0.4543 |
| 5 | Losing Team Turnovers Ave | 0.4396 | 0.4396 |
| 6 | Winning Team Winning Percentage | 0.3323 | 0.3323 |
| 7 | Losing Team True Shooting Percentage | -0.3006 | 0.3006 |
| 8 | Winning Team Free Throws Made Ave | 0.2903 | 0.2903 |
| 9 | Winning Team Turnovers Ave | -0.2874 | 0.2874 |
| 10 | Losing Team Free Throws Attempted Ave | 0.2409 | 0.2409 |
| 11 | Losing Team Effective Field Goal Percentage | -0.2229 | 0.2229 |
| 12 | Winning Team Blocks Ave | 0.2187 | 0.2187 |
| 13 | Losing Team Three Points Made Ave | 0.2082 | 0.2082 |
| 14 | Losing Team Blocks Ave | -0.1939 | 0.1939 |
| 15 | Losing Team Steals Ave | -0.1714 | 0.1714 |
| 16 | Winning Team Steals Ave | 0.1626 | 0.1626 |
| 17 | Losing Team Winning Percentage | -0.1399 | 0.1399 |
| 18 | Winning Team Three Points Made Ave | -0.1369 | 0.1369 |
| 19 | Losing Team Free Throws Made Ave | -0.1317 | 0.1317 |
| 20 | Losing Team Score Ave | 0.1221 | 0.1221 |
| 21 | Winning Team Score Ave | -0.1107 | 0.1107 |
| 22 | Winning Team Personal Fouls Ave | -0.0991 | 0.0991 |
| 23 | Losing Team Blowout Wins | 0.0886 | 0.0886 |
| 24 | Winning Team Three Points Attempted Ave | 0.0885 | 0.0885 |
| 25 | Losing Team Three Points Attempted Ave | -0.0872 | 0.0872 |
| 26 | Losing Team Personal Fouls Ave | 0.0836 | 0.0836 |
| 27 | Winning Team Blowout Losses | 0.0765 | 0.0765 |
| 28 | Winning Team Assists Ave | -0.0617 | 0.0617 |
| 29 | Losing Team Blowout Losses | 0.0495 | 0.0495 |
| 30 | Losing Team Defense Rebounds Ave | -0.0384 | 0.0384 |
| 31 | Winning Team Blowout Wins | 0.0349 | 0.0349 |
| 32 | Winning Team Effective Field Goal Percentage | -0.0342 | 0.0342 |
| 33 | Losing Team Assists Ave | -0.0144 | 0.0144 |
| 34 | Winning Team Defense Rebounds Ave | -0.0073 | 0.0073 |
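
For a sense of where weights like these come from, here is a minimal sketch assuming a logistic regression over standardized features; the tournament_training_data.csv file and its column names are hypothetical stand-ins for my actual training set.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical training file: one row per tournament game since 2003, one column
# per factor in the table above, and an Outcome column of 1s and 0s.
tourney = pd.read_csv("tournament_training_data.csv")

feature_cols = [c for c in tourney.columns if c != "Outcome"]
X = StandardScaler().fit_transform(tourney[feature_cols])  # make weights comparable
y = tourney["Outcome"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Rank the learned weights by absolute value, as in the table.
weights = (pd.DataFrame({"Factor": feature_cols, "Weight": model.coef_[0]})
           .assign(AbsWeight=lambda d: d["Weight"].abs())
           .sort_values("AbsWeight", ascending=False))
print(weights)
```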

Two things drove some deep, incorrect "Cinderella" predictions:

  1. I removed seeds as predictive factors
  2. the model loves offensive rebounding

Seeds, Blowouts, and Offensive Rebounding

The initial version of the model was trained with the seed of each team in a given match-up, but the first bracket I/it/we completed seemed too boring. For the past couple of years, the word "parity" seemed to get thrown around a lot. It felt like the general consensus was that there was no longer an advantage to being a "blue blood" or a high seed or from a major conference.

So I took the seed data out, but I didn't have another metric to account for strength of schedule or opponent. The next iteration of my bracket had Alabama (2) getting upset by Robert Morris (15) in the first round. It also loved Troy (14), having them upset both Kentucky (3) and Illinois (6). It was similarly high on VCU (11) and on a St. Mary's (7) team that my favorite NCAA commentator (Mark Titus) said didn't really pass the eye test.

I quickly looked up Robert Morris. They had a good record and solid statistics, but some bad losses against "name brand" schools. To try to account for a team merely being the best team in a bad conference, I threw together calculations for "blowout wins" and "blowout losses." I defined a blowout loss as a game where a team lost by more than 10 points. My hope was that this would catch the kind of teams from weaker conferences that get invited early in the season to play schools in traditionally better conferences, which schedule them as "cupcakes" to pad their own records. I defined a blowout win as a game where a team won by more than 10 points either as the away team or on a neutral court. I was hoping this would filter out teams that did not play well away from home, reward teams talented enough to blow opponents out on the road, and maybe just catch teams with "grit."
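
Concretely, a minimal sketch of those two counts, again assuming the Kaggle box score columns (WLoc is the winning team's location: "H" home, "A" away, "N" neutral):

```python
import pandas as pd

games = pd.read_csv("MRegularSeasonDetailedResults.csv")
margin = games["WScore"] - games["LScore"]

# Blowout loss: the losing team lost by more than 10 points, anywhere.
blowout_losses = (games.loc[margin > 10, ["Season", "LTeamID"]]
                  .value_counts().rename("BlowoutLosses"))

# Blowout win: the winning team won by more than 10 points away from home
# or on a neutral court.
road_blowouts = games[(margin > 10) & (games["WLoc"].isin(["A", "N"]))]
blowout_wins = (road_blowouts[["Season", "WTeamID"]]
                .value_counts().rename("BlowoutWins"))
```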

After including the blowout game counts, the model did flip to having Alabama beat Robert Morris, but not much else changed. It still had Troy, VCU, and St. Mary's making runs in the East region. It still had McNeese St. (12) making a run and High Point (13) upsetting Purdue (4) in the Midwest region.

I ran a SQL query on my database and realized that Troy, VCU, St. Mary's, and Robert Morris all had high season-average offensive rebound numbers. High Point averaged more than one additional offensive rebound per game compared to Purdue.
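
For illustration, a query along these lines would surface it; the ncaa.db file, team_season_stats table, and column names are hypothetical stand-ins for my actual schema.

```python
import sqlite3

conn = sqlite3.connect("ncaa.db")
rows = conn.execute(
    """
    SELECT team_name, offensive_rebounds_avg
    FROM team_season_stats
    WHERE season = 2025
    ORDER BY offensive_rebounds_avg DESC
    """
).fetchall()

# Print the teams with the highest season-average offensive rebounds.
for team_name, or_avg in rows[:25]:
    print(f"{team_name}: {or_avg:.1f}")
```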

I was surprised and confused by how much emphasis the model put on offensive rebounding. It seemed even stranger given that, of all the statistics, it gives the least weight to defensive rebounding. I don't have an explanation for why it loves offensive rebounding, and I'm not sure what to do about it going forward, if anything.

Conclusion, Tweaks, and the Long-Term

Though I was disappointed with the results, this was a fun project, and one that I intend to tweak and maintain for the rest of my life.

This initial model took a "kitchen sink" approach: I just kind of threw everything I could come up with into the training data. I was curious to see what it would identify as most important.

I'd like to pare down the number of statistics that go into the model, or at least have a clear intent behind why each statistic is included. By next year, I'd like to come up with categories of statistics that reflect different styles of play or strengths, and include just one from each category.

I'm also interested in developing a way to factor in the coach of each team, as well as the "name on the jersey" of the school. I'd like to come up with a way to quantify and reflect what it means to "be Duke" or "a Kentucky" or to be a nobody. Similarly, I'd love to come up with a way to reflect that a coach like Rick Pitino is suddenly behind St. John's. But I view these as long-term goals that will take years of iterating on my code, data sources, and theories.