Your work is always awesome, thanks Evan.

Question about causality - it seems possible to me that teams have a lot of transfers because they are bad, not that they are bad because they have a lot of transfers.

For example - a team that has struggled over the last few seasons fires a coach, lots of players leave, new coach comes in with a new system and has to recruit basically from scratch, leading to a bad next season. They weren't bad BECAUSE of the transfers, it was just another symptom of their poor play.

I just made an update to the blog post that does a much better job of showing the causality of returning player minutes on end of season ranking. https://blog.evanmiya.com/i/145424996/hitting-at-least-minutes-from-returning-players

Very interesting. Thanks, Evan.

Thanks for this, and some interesting ideas but ...

Would be interested to see descriptive statistics for the figure of % returning minutes vs. end-of-season rating. TBH, it looks like a pretty random scatterplot of values.

I just made an update to the blog post that does a much better job of showing the causality of returning player minutes on end of season ranking. https://blog.evanmiya.com/i/145424996/hitting-at-least-minutes-from-returning-players

Granted there is a lot of noise there for sure.

Here's a crude way to quickly answer this question: if you fit the linear regression `team_rating_end_of_year ~ returning_minutes_pct`, the R-squared is about 6% (not big but not nothing) and the P-value for the coefficient is extremely close to zero.
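For anyone who wants to try this kind of check themselves, here is a minimal sketch of a one-variable least-squares fit and its R-squared. It uses only the Python standard library, and the data is randomly generated stand-in data (the noise level and slope are assumptions chosen for illustration), not the actual team ratings. The helper name `simple_ols` is hypothetical. A full regression library would also report the P-value for the coefficient, which this sketch does not compute.

```python
import random


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Synthetic stand-in data (NOT the real dataset): 240 team-seasons with a
# weak positive relationship buried in a lot of noise.
rng = random.Random(0)
returning_minutes_pct = [rng.uniform(0, 100) for _ in range(240)]
team_rating_end_of_year = [
    0.05 * pct + rng.gauss(0, 6) for pct in returning_minutes_pct
]

intercept, slope, r_squared = simple_ols(
    returning_minutes_pct, team_rating_end_of_year
)
print(f"slope={slope:.3f}  R-squared={r_squared:.3f}")
```

A low R-squared alongside a clearly nonzero slope is what a noisy but real relationship tends to look like: the scatterplot looks close to random, yet the trend is still detectable with a few hundred points.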

Now, returning minutes % is somewhat correlated with preseason roster score, so the better question is whether returning minutes % adds any predictive value on top of the preseason roster score. If we fit a model of `team_rating_end_of_year ~ roster_score + returning_minutes_pct`, roster_score has a P-value of ~0 and returning_minutes_pct has a P-value of 0.112. That's not below the usual 0.05 cutoff, but it suggests returning minutes % adds a little bit of value to the model. And in my out-of-sample cross-validation exercises, the model with both variables performs slightly better at predicting a team's end-of-season rating than the model with just the preseason roster score.
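The out-of-sample comparison described above can be sketched roughly as follows. This is an illustrative k-fold cross-validation on synthetic data, not the exact procedure or dataset behind the numbers in this thread; the helper names (`fit_ols`, `predict`, `cv_rmse`), the fold count, and the data-generating coefficients are all assumptions. It is written with the standard library only so it runs anywhere, which means the least-squares solve is done by hand.

```python
import random


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [
        [sum(xi[a] * xi[b] for xi in X) for b in range(k)]
        + [sum(xi[a] * yi for xi, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def cv_rmse(rows, y, folds=5, seed=0):
    """Out-of-sample root-mean-squared error from k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(folds):
        test = set(idx[f::folds])  # every folds-th shuffled index is held out
        train = [i for i in idx if i not in test]
        coefs = fit_ols([rows[i] for i in train], [y[i] for i in train])
        sq_err += sum((y[i] - predict(coefs, rows[i])) ** 2 for i in test)
    return (sq_err / len(y)) ** 0.5


# Synthetic stand-in data (NOT the real dataset): the two predictors are
# correlated with each other, and both feed into the end-of-season rating.
rng = random.Random(1)
n = 240
returning_minutes_pct = [rng.uniform(0, 100) for _ in range(n)]
roster_score = [0.03 * p + rng.gauss(0, 2) for p in returning_minutes_pct]
team_rating_end_of_year = [
    2.0 * s + 0.03 * p + rng.gauss(0, 4)
    for s, p in zip(roster_score, returning_minutes_pct)
]

rmse_roster_only = cv_rmse([(s,) for s in roster_score], team_rating_end_of_year)
rmse_both = cv_rmse(
    list(zip(roster_score, returning_minutes_pct)), team_rating_end_of_year
)
print(f"roster_score only:                    RMSE={rmse_roster_only:.3f}")
print(f"roster_score + returning_minutes_pct: RMSE={rmse_both:.3f}")
```

Both models are scored on the same folds, so any difference in RMSE comes from the extra variable rather than from a different split. If the two-variable model has the lower held-out error, that is evidence the second variable carries information the first does not, independent of where its P-value lands.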

That's all just for high majors. If we expand the analysis to all D1 teams (going from a sample size of roughly 240 over three seasons to over 1000), the `team_rating_end_of_year ~ roster_score + returning_minutes_pct` model gives returning_minutes_pct a P-value < 0.001. I only included the graph with just the high majors in the article to make it a bit easier to understand, but the relationship is even clearer if you include all D1 teams.