6 Comments

Your work is always awesome, thanks Evan.

Question about causality - it seems possible to me that teams have a lot of transfers because they are bad, not that they are bad because they have a lot of transfers.

For example - a team that has struggled over the last few seasons fires a coach, lots of players leave, new coach comes in with a new system and has to recruit basically from scratch, leading to a bad next season. They weren't bad BECAUSE of the transfers, it was just another symptom of their poor play.

author

I just updated the blog post, and it now does a much better job of showing the causal effect of returning player minutes on end-of-season ranking. https://blog.evanmiya.com/i/145424996/hitting-at-least-minutes-from-returning-players


Very interesting. Thanks, Evan.


Thanks for this, and some interesting ideas but ...

I'd be interested to see descriptive statistics for the figure of % returning minutes vs. end-of-season rating. TBH, it looks like a pretty random scatterplot of values.

author

I just updated the blog post, and it now does a much better job of showing the causal effect of returning player minutes on end-of-season ranking. https://blog.evanmiya.com/i/145424996/hitting-at-least-minutes-from-returning-players

author

Granted there is a lot of noise there for sure.

This is a crude way to quickly answer the question, but if you fit a linear regression of `team_rating_end_of_year ~ returning_minutes_pct`, the R-squared is about 6% (not big, but not nothing) and the P-value for the coefficient is extremely close to zero.
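For readers who want to try this kind of check themselves, here is a minimal sketch of a one-variable least-squares fit and its R-squared. The data is synthetic and made up purely for illustration (it is not the data behind the article), and the helper name `simple_ols` is hypothetical. The sketch does not compute a P-value.

```python
import random


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Synthetic stand-in data (an assumption, NOT the real team data):
# a weak linear signal buried in a lot of noise.
rng = random.Random(0)
returning_minutes_pct = [rng.uniform(0, 100) for _ in range(240)]
team_rating_end_of_year = [0.05 * p + rng.gauss(0, 6) for p in returning_minutes_pct]

intercept, slope, r_squared = simple_ols(returning_minutes_pct, team_rating_end_of_year)
print(f"slope={slope:.4f}  R-squared={r_squared:.3f}")
```

A low R-squared alongside a clearly nonzero slope is what a noisy scatterplot with a real underlying trend tends to look like.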

Now, returning minutes % is somewhat correlated with preseason roster score, so the even better question is whether returning minutes % adds any predictive value on top of the preseason roster score. If we fit a model of `team_rating_end_of_year ~ roster_score + returning_minutes_pct`, the roster_score has a P-value of ~0 and the returning_minutes_pct has a P-value of 0.112. That's not super low (it wouldn't clear the usual 0.05 threshold), but it suggests the variable adds a little bit of value to the model. And in my out-of-sample cross-validation exercises, the model with both variables performs slightly better at predicting a team's end-of-season rating than the model with just the preseason roster score.
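The out-of-sample comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the fold count and error metric (5-fold RMSE) are my choices rather than anything stated in the comment, and the helpers `solve`, `fit_ols`, `predict`, and `cv_rmse` are hypothetical names.

```python
import random


def solve(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def cv_rmse(rows, y, folds=5, seed=0):
    """K-fold cross-validated RMSE: fit on k-1 folds, score on the held-out fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(folds):
        held_out = set(idx[f::folds])
        train = [i for i in idx if i not in held_out]
        coef = fit_ols([rows[i] for i in train], [y[i] for i in train])
        sq_err += sum((y[i] - predict(coef, rows[i])) ** 2 for i in held_out)
    return (sq_err / len(y)) ** 0.5


# Synthetic stand-in data (an assumption, NOT the real team data):
# the two predictors are correlated, and both feed into the rating.
rng = random.Random(1)
roster_score = [rng.gauss(0, 5) for _ in range(240)]
returning_minutes_pct = [
    min(100.0, max(0.0, 40 + 2 * r + rng.gauss(0, 20))) for r in roster_score
]
team_rating_end_of_year = [
    r + 0.03 * p + rng.gauss(0, 4) for r, p in zip(roster_score, returning_minutes_pct)
]

rmse_roster_only = cv_rmse([(r,) for r in roster_score], team_rating_end_of_year)
rmse_both = cv_rmse(list(zip(roster_score, returning_minutes_pct)), team_rating_end_of_year)
print(f"roster_score only:                    RMSE={rmse_roster_only:.3f}")
print(f"roster_score + returning_minutes_pct: RMSE={rmse_both:.3f}")
```

Both models are scored on the same folds (same `seed`), so any gap in RMSE reflects the extra variable rather than a different split.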

That's all just for high majors. If we expand the analysis to all D1 teams (going from a sample size of roughly 240 over three seasons to over 1000), the `team_rating_end_of_year ~ roster_score + returning_minutes_pct` model gives returning_minutes_pct a P-value < 0.001. I only included the graph with just the high majors in the article to make it a bit easier to understand, but the relationship is even clearer if you include all D1 teams.
