In previous iterations of my draft combine and defensive performance posts, I’ve assessed the basic distribution of combine measures like height, wingspan, vertical jump, and sprint speed; described how these measures correspond to basic measures of defensive and rebounding prowess; and fit linear models to the combine and performance data, estimating the influence of each combine measure in the presence of all the others and providing for a baseline predictive model.
In this post, I’ll go a step further in the development of a genuinely predictive model, modifying my approach in two ways: 1) using a neural network model in place of the previous hierarchical linear model, since a neural net can more flexibly determine the combinations of draft combine measures that best predict defensive stats, and 2) using a cross-validation method that selects the best-predicting model fit and estimates the likely predictive performance of the model when faced with new data.
The neural net I’ll be fitting here is of the feed-forward variety, with a single hidden layer of nodes connecting the combine measure inputs to a single defensive metric output, in this case block rate. I’ll be using the R package nnet for this task, along with multiple tools in the tidyverse and a few graphing functions in NeuralNetTools to explore the resulting models.
Before fitting any models, I take three preparatory steps:

1) Standardizing each predictor and outcome variable, so that each has mean 0 and standard deviation 1. This allows for easy comparison across predictor variables and models, and also improves the computational performance of the model fits.

2) Pruning the dataset so that each row corresponds to a player-season in which the player has full data for all the combine measures included as predictors in the model and logged at least 1000 minutes in the season.

3) Setting up cross-validation and testing folds, such that the parameters of the neural network model can be tuned to their best-performing values, and the model’s performance on novel data can be assessed.
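The standardization in step 1 is just a z-score transform. A minimal sketch (in Python for illustration; the post’s actual pipeline is in R, and the wingspan values below are made up):

```python
# Z-score standardization: rescale each variable to mean 0 and
# standard deviation 1, so effects are comparable across predictors.
from statistics import mean, stdev

def standardize(xs):
    """Return z-scores of xs (mean 0, sd 1)."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

# Hypothetical wingspan measurements, in inches.
wingspans = [82.0, 85.5, 79.0, 88.0, 84.0]
z = standardize(wingspans)
```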
Step 3) here is the most involved. I use a nested 10-fold cross validation procedure, wherein the dataset is split into 10 roughly even sections. Each of the 10 folds is predicted using models fit to the other 9 folds, with the modeling function using various combinations of parameter values (namely, the number of nodes in the hidden layer and the value of a decay parameter that guards against over-fitting). The parameter combination that does the best job of predicting the observed block rates across each of the 10 left-out folds is then used for a final predictive model fit to the full dataset.
There are a couple of wrinkles here. First, while a single cross-validation procedure is valid for selecting the best performing set of parameter values for the modeling function, it’s not valid for testing the performance of this model in a way that extends to new, unseen data points. This is because we’ve peeked at the testing data – we’ve seen what specific set of parameter values work best for the observed data, which are by definition the best-performing for these data, but which might not be the best predicting for the super-population of data. For instance, we might find that including 5 nodes in the hidden layer works best for the data in hand, but we can’t be sure that it’d be the best choice for 1,000 different player-seasons. By assuming that the error we observed in the validation data would carry over to the new data, we’d be systematically overestimating our performance, since it might very well be the case that the optimal parameters in the observed data would be somewhat suboptimal for novel data. To work around this, I perform a nested validation procedure – within each of the initial 10 training/testing splits, I again split the training data into 10 folds. This allows me to find 10 different optimized models, and to use each of these models to generate and test model predictions on observed data that didn’t inform the model selection. Averaging across each set of predictions, I’m able to estimate the performance of my modeling procedure on new data.
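The nested procedure described above can be sketched as a pair of loops: an outer loop that holds out each test fold, and an inner loop that tunes parameters using only the remaining training data. Here `fit()` and `rmse()` are hypothetical stand-ins for the real model-fitting and scoring steps (the post uses nnet in R):

```python
# Skeleton of nested k-fold cross-validation: inner folds pick the best
# parameter combination; outer folds estimate error on unseen data.
import random

def k_folds(items, k=10, seed=1):
    """Shuffle items and split them into k roughly even folds."""
    items = items[:]
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]

def nested_cv(data, param_grid, fit, rmse, k=10):
    outer_scores = []
    for test in k_folds(data, k):
        train = [d for d in data if d not in test]
        # Inner CV: choose parameters using the training data only,
        # so the test fold never informs model selection.
        best = min(
            param_grid,
            key=lambda p: sum(
                rmse(fit([d for d in train if d not in val], p), val)
                for val in k_folds(train, k)) / k)
        # Refit on all training data with the tuned parameters,
        # then score on the held-out fold.
        outer_scores.append(rmse(fit(train, best), test))
    return sum(outer_scores) / len(outer_scores)
```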
Second, it would be improper to generate training/validation/testing splits by just randomly assigning each row of the initial dataset (which again corresponds to each valid player-season in the data). This is because a given player is likely to have multiple observed seasons. If a training dataset includes, say, Hassan Whiteside’s 2014 through 2016 seasons, and the corresponding test set includes his 2017 season, we’d get a biased sense of the model’s ability to predict the shot-blocking prowess of a guy with Whiteside’s measures, since the outcome to be predicted will be artificially correlated with outcomes used to fit the model. To guard against this, I split the data not according to player-season, but according to player – first assigning players according to the cross-validation scheme described above, and then assigning player-seasons according to those splits.
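The player-level split can be sketched as assigning each player (not each player-season) to a fold, then routing every one of that player’s seasons to his assigned fold (illustrative Python with made-up records):

```python
# Split by player rather than player-season, so that all of a player's
# seasons land in the same fold and no player appears in both a
# training set and its corresponding test set.
import random

def player_folds(rows, k=10, seed=1):
    """rows: list of (player, season, ...) records."""
    players = sorted({r[0] for r in rows})
    random.Random(seed).shuffle(players)
    assignment = {p: i % k for i, p in enumerate(players)}
    folds = [[] for _ in range(k)]
    for r in rows:
        folds[assignment[r[0]]].append(r)
    return folds
```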
Following the data splitting procedure discussed above, I generate predictions for each of 10 (unseen) testing folds, using models fit to the remainder of the data. This gives 10 different sets of predictions for observed data of between 102 and 189 player-seasons each, using models fit to sets of between 1295 and 1382 player-seasons.
For purposes of comparison, I carry out this same predictive exercise for the same hierarchical linear modeling (HLM) approach used in Part Two of my combine-defensive posts (though generating predictions for the HLM approach is a bit easier, since there are no hyperparameters to tune).
Averaging prediction error across all 10 test folds, the hierarchical linear modeling procedure generates an average RMSE of 0.67, while the neural net modeling procedure generates an average RMSE of 0.64. This means that, as expected, the neural net provides a somewhat more accurate set of predictions (for comparison, a baseline HLM procedure that uses just player position and age generates an average RMSE of 0.75). Figure 1 shows the average errors across test folds and the errors for each individual fold.
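The fold-averaged RMSE figures quoted above come from scoring each test fold separately and then averaging; a couple of small helpers make this concrete (Python for illustration, with `fold_results` as a hypothetical container of per-fold predictions and observations):

```python
# Root-mean-square error for one fold, and the average across folds,
# as used to compare the HLM and neural-net procedures.
def rmse(pred, obs):
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

def mean_fold_rmse(fold_results):
    """fold_results: list of (predictions, observations) pairs, one per fold."""
    return sum(rmse(p, o) for p, o in fold_results) / len(fold_results)
```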
Influence of Combine Predictors
After testing the performance of this neural net modeling procedure on data subsets, the next step is to fit a model to the full dataset and examine its properties. To fit the model, I again perform a 10-fold cross-validation procedure to find the best-performing set of model parameters. Doing this, again using the nnet::nnet() function, I arrive at a model with 5 nodes in its hidden layer and a ‘decay’ parameter set to 5. The resulting model, fit to the full set of observations, is represented in Figure 2.
While the above diagram, produced using the NeuralNetTools package, is useful in understanding how exactly input predictor values are translated into an output prediction of blocked shot rate, it’s obviously a bit tricky to read predictor importance or effect sizes directly from it. A few other NeuralNetTools functions are more directly useful for that task. Namely, the package contains tools to graph variable importance using Garson’s and Olden’s algorithms, respectively.
I focus here on Olden’s algorithm (see here for details), with results displayed in Figure 3. The results are intuitive and match with previous findings: standing reach is most important for predicting blocked shots, followed by wingspan and then height; max vertical leap is more informative than standing vertical leap; and older, slower players carrying more body fat and playing on the wing block fewer shots.
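For a single-hidden-layer network, Olden’s connection-weight measure is simple to state: the importance of input i is the sum, over hidden nodes, of the product of the input-to-hidden weight and the hidden-to-output weight. A minimal sketch (Python for illustration; the weights below are made up, not those of the fitted model):

```python
# Olden's connection-weight algorithm for a one-hidden-layer network:
# importance_i = sum over hidden nodes h of w[i][h] * v[h], where w is
# the input-to-hidden weight matrix and v the hidden-to-output weights.
def olden_importance(w_in_hidden, w_hidden_out):
    return [
        sum(w_in_hidden[i][h] * w_hidden_out[h]
            for h in range(len(w_hidden_out)))
        for i in range(len(w_in_hidden))
    ]

# Toy network: 3 inputs, 2 hidden nodes, 1 output.
w = [[0.5, -0.2], [1.0, 0.3], [-0.4, 0.1]]   # input -> hidden
v = [0.8, -0.5]                               # hidden -> output
imp = olden_importance(w, v)
```

Unlike Garson’s algorithm, the products here keep their signs, so a negative importance indicates a predictor that suppresses the outcome.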
While the above provides a useful look at performance, it doesn’t actually show us anything about effect sizes/outcome responses. To examine how much shot blocking improvement we might expect from, say, a one standard deviation increase in wingspan, we need to plot out some marginal responses. This is done in Figure 4. For each position group, the predicted (standardized) block rate is graphed over the entire range of a targeted predictor variable, while all other predictor variables are set to their position group mean.
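The marginal-response calculation behind Figure 4 amounts to sweeping one predictor over a grid while pinning the rest at their group means. A sketch, with `predict()` as a hypothetical stand-in for the fitted network:

```python
# Marginal response curve: vary one predictor over a grid of values
# while holding all other predictors at their (position-group) means,
# recording the model's prediction at each grid point.
def marginal_response(predict, means, target, grid):
    """means: dict of predictor -> mean value; target: predictor to vary."""
    curve = []
    for value in grid:
        x = dict(means, **{target: value})  # overwrite only the target
        curve.append((value, predict(x)))
    return curve
```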
The results here match up with what we see in the variable importance graph – standing reach, wingspan, and height are all positively associated with blocked shots in descending order of importance, max vertical leap has more of an impact than standing vertical leap, etc. What this sort of marginal effects graph offers, however, is not just a direct assessment of effect sizes, but also a view of some of the nuances of variable interactions and other non-linearities. For instance, the effects of age, height, vertical leap, and lane agility all appear fairly dependent on position group: height, age, and vertical jump all matter more for bigs, while lower body fat and quicker lane agility times are more beneficial for wings and lead guards. A speculative explanation here is that blocking shots at the rim requires more size and vertical explosion, while collecting blocks/tips as a guard requires more quickness and defensive savvy. (It also might be the case that bigs with quicker lane agility times are more likely to switch picks and otherwise be deployed on the perimeter, and so are less likely to be in position to block shots at the rim.) Also, while most marginal effect curves within position groups appear fairly linear, there are some notable exceptions, such as for standing reach among wings and bench press among bigs.
Finally, it’s important to note here that these marginal effect curves don’t tell us how predictors might interact with other, non-positional predictors. For instance, all we see here is how max vertical leap predicts blocked shot rates for players with average standing reach, but the marginal effect of vertical leap might look significantly different for exceptionally short or exceptionally long players.
A Look at Predictions
In addition to looking at predictor effects for the final model, we can also look at the predicted values themselves. There are a number of different diagnostic procedures that rely on comparison between predicted and observed outcomes (Q-Q plots, residual plots, etc.). Here, though, I’ll focus on two things: a look at which players in the observed data have over- and under-performed their combine measures with respect to shot blocking, and a look at the projected performance for players in the 2018 draft class.
Figure 5 gives scatterplots of predicted (standardized) block rates versus the actually observed values, split out by position group. Additionally, I’ve overlaid the 6 players in each position group who have played seasons (1000+ minutes) with the greatest over-performance of combine-based projections, as well as the 6 players with greatest under-performance. A lot of these outlying players fit conventional wisdom about defensive prowess. One exception is Kevin Durant as someone who blocks surprisingly few shots; while he’s been a prolific shot blocker the past few years in Golden State, he had several years in OKC where he blocked fewer than one shot per game, despite logging heavy minutes and being in possession of notoriously spidery arms.
Finally, Figure 6 shows predicted (non-normalized) block rates for 2018-19 first-year players who went through the combine and have played in at least one game thus far this year (through Nov. 17). For about a third of these players, full data is available for height, standing reach, wingspan, weight, and body fat percentage, but (usually all) athleticism measurement data is missing. For these players, highlighted in red, this missing data is imputed using data relationships observed in the full combine dataset.
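The post imputes missing athleticism measures from relationships observed in the full combine dataset; as a simplified stand-in for that step, one can fit a least-squares line on the complete cases relating a correlated measure to the missing one, then predict from it (illustrative Python with made-up numbers, not the actual imputation model):

```python
# Simplified regression imputation: fill a missing measure from one
# correlated measure via a least-squares line fit on complete cases.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def impute(x_obs, slope, intercept):
    """Predict the missing value from the observed correlated measure."""
    return slope * x_obs + intercept
```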
While it’s extremely early, and even rotational players have a very limited sample size, the predictive model does a pretty decent job predicting early season block rates – players who are anticipated to block a lot of shots generally do, and players who are anticipated to not generally don’t. Mo Bamba, Jaren Jackson, and Wendell Carter are all blocking a ton of shots (ranking 5th, 7th, and 8th in the NBA in block rate, respectively), and these were the three players the model predicted to block the most (albeit at a lower rate than we actually observe). Likewise, it should come as no surprise that players like Jalen Brunson or Svi Mykhailiuk have yet to block their first NBA shot. In terms of early model misfires, Allonzo Trier has substantially over-performed his expected block rate (largely on the strength of a 3-block performance in a blowout loss to the Magic on the 11th), while Knicks teammate Kevin Knox has not, to date, translated his length into many blocks.
Modeling Next Steps
Moving beyond the predictive modeling procedure discussed here, there are a number of productive next steps that could be taken. Ideally, a model should be able to take account of the repeated measures structure of the data – the observed dataset consists of multiple seasons for a given player – and while the procedure here takes account of this in validation and testing, the neural network model itself doesn’t distinguish between season observations for the same or different players. Additionally, the model doesn’t take advantage of patterns of observed player-seasons with missing predictors – list-wise deletion means that information from predictors that are specified is thrown away, and no information is gleaned from patterns of missingness themselves (if, say, a player takes part in all testing except vertical jumps, this might tell us something useful about that player’s athletic profile). One modeling approach that could potentially address both issues of repeated measures and missingness that I’ll look to explore in future work is using certain forms of recurrent (rather than feed-forward) neural networks (see, e.g., “Recurrent Neural Networks for Multivariate Time Series with Missing Values”, https://www.nature.com/articles/s41598-018-24271-9).
Additionally, the use of other machine learning methods besides neural networks, such as random forests, boosted regressions, support vector machines, and model averaging techniques, could potentially be of use in improving predictive performance. Likewise, the use of Bayesian multilevel models, with variances on priors tuned to maximize predictive performance (akin to penalized regression), could be of additional use by 1) directly incorporating the repeated measures structure of the data, 2) providing probability distributions for predictor effects, and 3) providing probability distributions for predictions.
Finally, looking beyond specific modeling techniques, the predictions discussed here could almost certainly be improved by including pre-NBA defensive statistics. Nearly all the players listed here had accrued statistics in the NCAA or non-NBA professional leagues at the time of their combine measures, and incorporating these statistics into a predictive model would provide for tighter predictions of greater direct use for basketball decision-makers.