Draft Combine Measures & Defense, Part Two - Full Inferential Models

In Part One of my look at draft combine measures and defensive performance, I graphed some basic relationships among the combine measures (e.g. how height and wingspan relate to each other), as well as some basic relationships between these measures and future defensive performance. To summarise briefly: height and wingspan have a very regular, linear relationship, but when either measure is looked at in isolation, wingspan does a bit better at predicting future performance.

The motivating questions behind this whole exercise, though, are along the lines of “Svi Mykhailiuk has tiny arms but is tall and an explosive leaper; is he ever going to block a shot?” Looking at combine measures in isolation obviously can’t answer that satisfactorily. In this post, then, I go further in assessing the importance of different combine measures by specifying full inferential models that account for all the measures taken at the combine simultaneously. By “inferential”, I’m drawing a contrast with “predictive” models, which I’ll focus on in future installments. Rather than trying to optimize predictive performance on new data (e.g. getting as close as possible to Svi’s block rate three years out), the models here are meant to allow the clearest possible inference about the role of each measure in predicting performance (e.g. saying whether Svi’s wingspan or his max vertical jump tells us more about his shot-blocking potential). So instead of models built on complex (and at times inscrutable) combinations of predictors, I fit linear models with a straightforward coefficient for each included combine measure.

The data I used for modeling defensive statistics consist of all player-seasons in which the player has complete combine data (gathered through the stats.nba API; see the end of Part One for details of dataset construction) and played at least 1000 minutes. This amounts to 1484 seasons split among 273 players. The models I fit to these data are hierarchical linear models with a random intercept for each player. These are similar to standard OLS regressions, but they account for the non-independence of different seasons played by the same player (see the Wikipedia page on multilevel models for repeated measures). I fit all the models using the R function lme4::lmer(), turning the coefficient estimates into graphs with a very convenient broom.mixed-to-ggplot2 pipeline.
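Written out, the random-intercept setup described above looks roughly like the following. This is my own notation for a standard random-intercept model, not an equation taken from the original analysis: y_{it} is the scaled defensive stat for player i in season t, x_{it} collects the combine measures (fixed for each player) plus the age and position controls, and u_i is a player-level intercept shared by all of that player's seasons. In lme4 formula syntax this corresponds to adding a `(1 | player)` term, where `player` is a hypothetical name for whatever player ID column the data use.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + u_i + \varepsilon_{it},\\
u_i &\sim \mathcal{N}(0,\ \sigma_u^2), \qquad \varepsilon_{it} \sim \mathcal{N}(0,\ \sigma^2),\\
\operatorname{Corr}(y_{it},\ y_{is} \mid \mathbf{x}_{it}, \mathbf{x}_{is}) &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}, \qquad t \neq s.
\end{aligned}
```

The last line is the non-independence the random intercept accounts for: whenever the player-level variance is positive, two seasons from the same player are positively correlated, so treating all 1484 seasons as independent observations (as plain OLS would) overstates how much information the data contain.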

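The following sections show combine predictor effects for five defensive measures. Effects are expressed in standard deviation units: each estimate is the expected difference in the outcome, in standard deviations, for a one standard deviation difference in the predictor. In addition to the combine predictors in the graphs, each model also contains controls for player age and position group. Full models with goodness-of-fit metrics are shown in the appendix at the end of the post.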
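The age control shows up in the appendix tables as poly(age, 2)1 and poly(age, 2)2, which is how R labels the two columns of an orthogonal quadratic polynomial. As a minimal Python sketch (Python rather than the R used for the actual models), the function below builds the same kind of basis by Gram-Schmidt: a centred linear term, and a quadratic term with its intercept and linear parts removed, each scaled to unit length. To my understanding this matches what R's poly(x, 2) returns, signs included, but treat that as an assumption; `poly2` is a hypothetical helper name and the ages are made up.

```python
import math


def poly2(x):
    """Orthonormal linear and quadratic columns for x, in the style of R's poly(x, 2).

    Both columns are orthogonal to the intercept and to each other and have unit
    Euclidean length. This is a Gram-Schmidt sketch, not R's own QR-based code.
    """
    n = len(x)
    mean_x = sum(x) / n
    # Linear column: centring makes it orthogonal to the intercept.
    lin = [xi - mean_x for xi in x]
    lin_ss = sum(v * v for v in lin)

    # Quadratic column: centred square, minus its projection on the linear column.
    sq = [v * v for v in lin]
    mean_sq = sum(sq) / n
    quad = [s - mean_sq for s in sq]
    proj = sum(q * v for q, v in zip(quad, lin)) / lin_ss
    quad = [q - proj * v for q, v in zip(quad, lin)]
    quad_ss = sum(q * q for q in quad)

    # Scale each column to unit length.
    lin_norm = math.sqrt(lin_ss)
    quad_norm = math.sqrt(quad_ss)
    return [v / lin_norm for v in lin], [q / quad_norm for q in quad]


# Hypothetical ages, purely for illustration
ages = [19, 20, 21, 22, 23, 25, 27, 30]
age1, age2 = poly2(ages)
print([round(a, 3) for a in age1])
print([round(a, 3) for a in age2])
```

Because each column has unit length across all 1,484 rows, a one-unit change in poly(age, 2)1 is far larger than any realistic age difference. Dividing by roughly sqrt(1483) (about 38.5) puts the linear component on a per-standard-deviation scale, so the -6.153 for offensive rebounding corresponds to about -0.16 standard deviations per standard deviation of age, which is comparable in size to the combine effects in the figures.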

Offensive Rebound Percentage

The first statistic I modeled is offensive rebound percentage, an estimate of the percentage of available offensive rebounds a player gets while he’s on the court. Figure 1 shows the estimated relationship between combine measures and offensive rebounding, with the blue lines representing +/- one standard error around each effect estimate. A few things jump out. Most notably, though not surprisingly, size measures are by far the most important predictors of offensive rebounding. What is surprising, however, is that standing reach and wingspan matter much more than height. This makes sense conceptually: getting rebounds requires getting your hands to the ball, not your forehead. It’s also backed up by further models in the appendix, which show that base models with only wingspan (column 3) or only standing reach (column 4) perform substantially better than the model with only height (column 2).


Figure 1: Expected standard deviation difference in offensive rebounding based on one standard deviation difference in predictor; estimates based on hierarchical model of 1484 seasons of at least 1000 minutes played, grouped among 273 players
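The +/- one standard error bars in the figures and the significance stars in the appendix can both be recomputed from an estimate and its standard error. The Python sketch below assumes a normal (Wald z) approximation for the p-values; that is my assumption about how the stars were produced, not something the post states. The numbers are the three length measures from the full offensive rebounding model (column 1 of the appendix table).

```python
import math


def two_sided_p(estimate, se):
    """Two-sided p-value from a Wald z-statistic under a normal approximation."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))


def stars(p):
    """Stars using the thresholds in the appendix notes (*p<0.1; **p<0.05; ***p<0.01)."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


def se_band(estimate, se):
    """The +/- one standard error interval drawn around each estimate in the figures."""
    return estimate - se, estimate + se


# Offensive rebounding, full model (column 1 of the appendix table): (estimate, SE)
orb_full = {
    "height_wo_shoes": (-0.008, 0.099),
    "wingspan": (0.155, 0.090),
    "standing_reach": (0.274, 0.132),
}

for name, (est, se) in orb_full.items():
    lo, hi = se_band(est, se)
    p = two_sided_p(est, se)
    print(f"{name:16} {est:+.3f}  [{lo:+.3f}, {hi:+.3f}]  p={p:.3f} {stars(p)}")
```

For these three rows the recomputed stars agree with the appendix (none for height, one for wingspan, two for standing reach). Under the same normal approximation, a +/- one standard error band covers only about 68% of the sampling distribution, so a bar that stays clear of zero is weaker evidence than a conventional 95% interval that does.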

Defensive Rebound Percentage

Defensive rebounding percentage (specified analogously to offensive rebounding percentage) shows similar predictor behavior (Figure 2). Again, standing reach and, to a lesser extent, wingspan are substantially stronger predictors than height. One notable difference is that max vertical leap (though not standing vertical) is a relatively strong predictor of defensive rebounding. The two vertical jump measures are highly correlated, so it’s worth probing this result a little more closely. Columns 5 and 6 in the defensive rebounding appendix table corroborate it, though: the base model with just the max vertical jump variable outperforms the one with just the standing vertical variable, and the max vertical effect size is substantially larger.


Figure 2: Expected standard deviation difference in defensive rebounding based on one standard deviation difference in predictor; estimates based on hierarchical model of 1484 seasons of at least 1000 minutes played, grouped among 273 players
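The "outperforms" comparisons rest on the fit statistics at the bottom of each appendix table. As a Python sketch, assuming the usual definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, with k counting the fixed effects plus the two variance components, the reported values can be reproduced from the log-likelihoods. The parameter count is my inference (it happens to match the tables), and I read column 5 as the max-vertical-only model and column 6 as the standing-vertical-only model, which is what the better fit of column 5 and the text above imply.

```python
import math

N_OBS = 1484  # player-seasons in every model


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n=N_OBS):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return k * math.log(n) - 2 * loglik


# Assumed parameter count: fixed effects plus the two variance components
# (random-intercept variance and residual variance).
# Full model: intercept + 2 age terms + 2 position dummies + 10 combine measures = 15.
K_FULL = 15 + 2
K_ONE_VERTICAL = K_FULL - 1  # columns (5) and (6) each drop one vertical leap measure

# Defensive rebounding log-likelihoods copied from the appendix table
drb_models = {
    "(1) full model": (-954.601, K_FULL),
    "(5) max vertical only": (-953.009, K_ONE_VERTICAL),
    "(6) standing vertical only": (-955.118, K_ONE_VERTICAL),
}

for name, (loglik, k) in drb_models.items():
    print(f"{name:28} k={k}  AIC={aic(loglik, k):.3f}  BIC={bic(loglik, k):.3f}")
```

Because columns 5 and 6 have the same number of parameters, their AIC and BIC gaps are identical and equal to twice the log-likelihood difference (2 x 2.109 = 4.218), so the comparison comes down to which single vertical measure fits the data better.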

Block Percentage

The next defensive stat is block percentage, an estimate of the percentage of opponent two-point field goal attempts that a player blocks while he’s on the court (Figure 3). The height/wingspan/reach effect estimates here are similar to those for rebounding. Interestingly, speed and agility measures appear to be meaningfully predictive: a quicker 3/4 sprint time corresponds to more blocks, but a quicker lane agility time corresponds to fewer. The former result is intuitive enough, but the latter is hard to explain. Worse agility testing going along with better shot blocking could very well be a statistical artifact, though it’s possible that it says something about players who test well in agility relative to their sprint speed and vertical jump.


Figure 3: Expected standard deviation difference in shot blocking based on one standard deviation difference in predictor; estimates based on hierarchical model of 1484 seasons of at least 1000 minutes played, grouped among 273 players
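The post doesn’t spell out how block percentage is computed. Assuming it follows Basketball-Reference’s glossary definition, BLK% = 100 * (BLK * (Tm MP / 5)) / (MP * (Opp FGA - Opp 3PA)), a Python sketch looks like this. The function name is hypothetical and the season line is invented for illustration.

```python
def block_pct(blk, mp, team_mp, opp_fga, opp_3pa):
    """Block percentage, assuming the Basketball-Reference glossary formula.

    team_mp / 5 converts team minutes into game minutes, so mp / (team_mp / 5) is the
    share of game time the player was on the floor; multiplying that share by opponent
    two-point attempts estimates the attempts he was on the court for.
    """
    game_minutes = team_mp / 5
    opp_2pa = opp_fga - opp_3pa
    return 100 * (blk * game_minutes) / (mp * opp_2pa)


# Hypothetical season line, for illustration only
print(round(block_pct(blk=100, mp=2000, team_mp=19680, opp_fga=7000, opp_3pa=2500), 2))  # prints 4.37
```

If the other rate stats come from the same source, they share the Tm MP / 5 scaling and differ only in the denominator: opponent possessions for steal percentage, and available offensive or defensive rebounds for the rebounding percentages.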

Steal Percentage

Steal percentage is an estimate of the percentage of opponent possessions that end with a steal by the player while he’s on the court (Figure 4). The interesting thing here is the contrast between height and wingspan: wingspan is a strong positive predictor, while height is an even stronger negative predictor (standing reach, something of an amalgam of the two, is relatively neutral). This finding holds up when each of the height/length variables is included without the other two: taller players get fewer steals, while guys with longer arms get more.


Figure 4: Expected standard deviation difference in steals based on one standard deviation difference in predictor; estimates based on hierarchical model of 1484 seasons of at least 1000 minutes played, grouped among 273 players
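To make the height/wingspan contrast concrete, the partial coefficients from the full steal model (column 1 of the appendix) can be combined for a few hypothetical players, described as z-scores on the three length measures with every other predictor held at the same value. This Python sketch uses point estimates only (wingspan’s 0.228 has a standard error of 0.140), and profiles such as “longer arms at average height and reach” are partly artificial because the three measures are strongly correlated; the helper name and profiles are illustrative.

```python
# Length-measure coefficients from the full steal model (appendix column 1),
# in standard deviations of steal percentage per standard deviation of the predictor.
STEAL_LENGTH_COEFS = {
    "height_wo_shoes": -0.324,
    "wingspan": 0.228,
    "standing_reach": 0.110,
}


def length_contribution(profile):
    """Expected SD difference in steal % from z-scores on the length measures.

    Measures missing from the profile are treated as average (z = 0); every other
    predictor in the model is assumed to be held fixed.
    """
    return sum(coef * profile.get(name, 0.0) for name, coef in STEAL_LENGTH_COEFS.items())


# Hypothetical profiles, in standard deviations from the average combine participant
profiles = {
    "longer arms only": {"wingspan": 1.0},
    "taller only": {"height_wo_shoes": 1.0},
    "bigger on all three": {"height_wo_shoes": 1.0, "wingspan": 1.0, "standing_reach": 1.0},
}

for label, profile in profiles.items():
    print(f"{label:20} {length_contribution(profile):+.3f}")
```

A player who is one standard deviation bigger on all three measures nets out to roughly zero (+0.014), which suggests that, on these estimates, it is arm length relative to height, rather than overall size, that goes with more steals.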

Defensive Boxscore +/-

Finally, defensive box score plus-minus (DBPM) is a rate-adjusted defensive metric that accounts for a range of defensive box score stats (see https://www.basketball-reference.com/about/bpm.html). As with the previous stats, size measures matter most, with the effect estimate for wingspan somewhat larger and somewhat more certain than those for standing reach or height. When each is modeled without the other two, wingspan and standing reach produce comparable model fits, both substantially better than height.


Figure 5: Expected standard deviation difference in defensive box score plus-minus based on one standard deviation difference in predictor; estimates based on hierarchical model of 1484 seasons of at least 1000 minutes played, grouped among 273 players
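One way to put “comparable” and “substantially better” on a common scale (my addition, not part of the original analysis) is to convert the AIC values of the single-length-measure DBPM models into Akaike weights: exp(-delta/2) for each model’s AIC gap from the best model, normalised so the weights sum to 1. A short Python sketch, using the AIC values from columns 2 to 4 of the DBPM appendix table:

```python
import math


def akaike_weights(aic_by_model):
    """Convert AIC values into Akaike weights: exp(-delta/2), normalised to sum to 1."""
    best = min(aic_by_model.values())
    relative = {name: math.exp(-(a - best) / 2) for name, a in aic_by_model.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# DBPM models with a single length measure (appendix columns 2-4)
dbpm_aic = {
    "(2) height only": 2846.126,
    "(3) wingspan only": 2831.828,
    "(4) standing reach only": 2830.977,
}

best_aic = min(dbpm_aic.values())
for name, weight in akaike_weights(dbpm_aic).items():
    print(f"{name:24} delta AIC={dbpm_aic[name] - best_aic:6.3f}  weight={weight:.3f}")
```

The height-only model gets essentially zero weight, while the wingspan-only and standing-reach-only models split the rest roughly 40/60, which matches the reading above that the two are comparable and both clearly beat height.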

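Appendix: Regression Tables

Every table follows the same layout. Column (1) is the full model with all combine measures. Columns (2), (3), and (4) keep only one of the three length measures: height, wingspan, and standing reach, respectively. Columns (5) and (6) keep only one of the two vertical leap measures: max vertical and standing vertical, respectively. Outcomes are scaled, and standard errors are shown in parentheses below each estimate.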

Offensive Rebound Models
Dependent variable:
Offensive Rebound Percent (scaled)
                        (1)         (2)         (3)         (4)         (5)         (6)
poly(age, 2)1           -6.153***   -6.178***   -6.158***   -6.152***   -6.150***   -6.156***
                        (0.377)     (0.378)     (0.377)     (0.377)     (0.377)     (0.377)
poly(age, 2)2           1.235***    1.243***    1.238***    1.233***    1.237***    1.235***
                        (0.344)     (0.344)     (0.344)     (0.344)     (0.344)     (0.344)
position_genWing        -0.234*     -0.121      -0.115      -0.250**    -0.233*     -0.231*
                        (0.123)     (0.124)     (0.108)     (0.118)     (0.123)     (0.122)
position_genBig         0.765***    0.950***    0.957***    0.739***    0.768***    0.768***
                        (0.185)     (0.186)     (0.160)     (0.177)     (0.185)     (0.185)
height_wo_shoes         -0.008      0.238***                            -0.004      -0.008
                        (0.099)     (0.078)                             (0.098)     (0.099)
wingspan                0.155*                  0.313***                0.167*      0.156*
                        (0.090)                 (0.063)                 (0.086)     (0.090)
standing_reach          0.274**                             0.406***    0.257**     0.271**
                        (0.132)                             (0.076)     (0.127)     (0.132)
weight                  0.124       0.226***    0.153**     0.147**     0.127*      0.123
                        (0.077)     (0.076)     (0.075)     (0.074)     (0.077)     (0.077)
body_fat_pct            -0.049      -0.090*     -0.067      -0.053      -0.053      -0.049
                        (0.046)     (0.046)     (0.045)     (0.045)     (0.044)     (0.045)
standing_vertical_leap  0.028       0.024       -0.013      0.057                   0.039
                        (0.061)     (0.060)     (0.059)     (0.058)                 (0.044)
max_vertical_leap       0.016       -0.016      0.0001      0.016       0.035
                        (0.060)     (0.062)     (0.060)     (0.060)     (0.044)
lane_agility_time       0.029       0.082**     0.036       0.038       0.031       0.027
                        (0.041)     (0.041)     (0.042)     (0.041)     (0.041)     (0.041)
three_quarter_sprint    -0.049      -0.100**    -0.072*     -0.040      -0.052      -0.051
                        (0.046)     (0.044)     (0.042)     (0.042)     (0.045)     (0.045)
bench_press             0.041       0.005       0.028       0.036       0.044       0.040
                        (0.041)     (0.042)     (0.041)     (0.041)     (0.041)     (0.041)
Constant                -0.160      -0.269**    -0.276***   -0.142      -0.163      -0.161
                        (0.112)     (0.113)     (0.097)     (0.108)     (0.112)     (0.112)
Observations            1,484       1,484       1,484       1,484       1,484       1,484
Log Likelihood          -697.190    -704.754    -697.472    -695.839    -695.419    -695.334
Akaike Inf. Crit.       1,428.381   1,439.507   1,424.944   1,421.678   1,422.837   1,422.667
Bayesian Inf. Crit.     1,518.523   1,519.045   1,504.482   1,501.215   1,507.677   1,507.507
Note: *p<0.1; **p<0.05; ***p<0.01

Defensive Rebound Models
Dependent variable:
Defensive Rebound Percent (scaled)
                        (1)         (2)         (3)         (4)         (5)         (6)
poly(age, 2)1           2.003***    1.999***    2.008***    2.001***    1.996***    1.968***
                        (0.459)     (0.460)     (0.459)     (0.459)     (0.459)     (0.459)
poly(age, 2)2           -1.769***   -1.768***   -1.769***   -1.770***   -1.773***   -1.777***
                        (0.419)     (0.420)     (0.419)     (0.419)     (0.419)     (0.419)
position_genWing        -0.166      -0.017      0.025       -0.181      -0.168      -0.138
                        (0.129)     (0.132)     (0.115)     (0.123)     (0.129)     (0.129)
position_genBig         0.587***    0.830***    0.893***    0.562***    0.583***    0.609***
                        (0.194)     (0.197)     (0.170)     (0.185)     (0.194)     (0.195)
height_wo_shoes         -0.022      0.307***                            -0.029      -0.016
                        (0.104)     (0.083)                             (0.103)     (0.104)
wingspan                0.095                   0.353***                0.077       0.098
                        (0.094)                 (0.067)                 (0.090)     (0.095)
standing_reach          0.451***                            0.521***    0.479***    0.420***
                        (0.139)                             (0.080)     (0.133)     (0.139)
weight                  0.114       0.227***    0.161**     0.126       0.110       0.107
                        (0.080)     (0.080)     (0.080)     (0.078)     (0.080)     (0.081)
body_fat_pct            -0.048      -0.097**    -0.077      -0.050      -0.041      -0.050
                        (0.048)     (0.049)     (0.048)     (0.047)     (0.046)     (0.048)
standing_vertical_leap  -0.045      -0.069      -0.113*     -0.028                  0.050
                        (0.064)     (0.063)     (0.062)     (0.061)                 (0.047)
max_vertical_leap       0.137**     0.096       0.112*      0.138**     0.107**
                        (0.063)     (0.066)     (0.064)     (0.063)     (0.046)
lane_agility_time       -0.034      0.029       -0.021      -0.027      -0.037      -0.047
                        (0.043)     (0.044)     (0.044)     (0.043)     (0.043)     (0.043)
three_quarter_sprint    0.015       -0.059      -0.023      0.017       0.019       0.001
                        (0.048)     (0.047)     (0.045)     (0.044)     (0.047)     (0.048)
bench_press             -0.068      -0.111**    -0.090**    -0.071*     -0.073*     -0.073*
                        (0.043)     (0.044)     (0.044)     (0.043)     (0.042)     (0.043)
Constant                -0.119      -0.265**    -0.305***   -0.103      -0.115      -0.133
                        (0.118)     (0.119)     (0.104)     (0.113)     (0.117)     (0.118)
Observations            1,484       1,484       1,484       1,484       1,484       1,484
Log Likelihood          -954.601    -965.539    -959.012    -952.371    -953.009    -955.118
Akaike Inf. Crit.       1,943.202   1,961.079   1,948.023   1,934.742   1,938.019   1,942.235
Bayesian Inf. Crit.     2,033.345   2,040.616   2,027.561   2,014.280   2,022.859   2,027.075
Note: *p<0.1; **p<0.05; ***p<0.01

Blocked Shot Models
Dependent variable:
Block Percent (scaled)
                        (1)         (2)         (3)         (4)         (5)         (6)
poly(age, 2)1           -1.451***   -1.480***   -1.452***   -1.445***   -1.446***   -1.466***
                        (0.448)     (0.449)     (0.448)     (0.448)     (0.448)     (0.447)
poly(age, 2)2           0.123       0.134       0.124       0.115       0.125       0.121
                        (0.408)     (0.409)     (0.408)     (0.408)     (0.408)     (0.408)
position_genWing        -0.377**    -0.198      -0.139      -0.369***   -0.375**    -0.362**
                        (0.146)     (0.153)     (0.130)     (0.142)     (0.146)     (0.146)
position_genBig         0.382*      0.674***    0.761***    0.390*      0.386*      0.394*
                        (0.221)     (0.229)     (0.193)     (0.213)     (0.220)     (0.220)
height_wo_shoes         0.090       0.474***                            0.097       0.093
                        (0.118)     (0.097)                             (0.117)     (0.118)
wingspan                0.301***                0.554***                0.319***    0.302***
                        (0.107)                 (0.075)                 (0.103)     (0.107)
standing_reach          0.385**                             0.708***    0.358**     0.368**
                        (0.158)                             (0.092)     (0.151)     (0.157)
weight                  -0.066      0.106       -0.002      -0.009      -0.062      -0.069
                        (0.091)     (0.093)     (0.091)     (0.090)     (0.091)     (0.091)
body_fat_pct            -0.050      -0.118**    -0.084      -0.063      -0.057      -0.051
                        (0.054)     (0.057)     (0.054)     (0.055)     (0.053)     (0.054)
standing_vertical_leap  0.044       0.048       -0.020      0.103                   0.097*
                        (0.072)     (0.074)     (0.071)     (0.070)                 (0.053)
max_vertical_leap       0.077       0.026       0.051       0.079       0.106**
                        (0.072)     (0.076)     (0.073)     (0.072)     (0.052)
lane_agility_time       0.083*      0.170***    0.091*      0.096*      0.086*      0.075
                        (0.049)     (0.051)     (0.050)     (0.049)     (0.049)     (0.049)
three_quarter_sprint    -0.091*     -0.167***   -0.112**    -0.057      -0.095*     -0.099*
                        (0.054)     (0.054)     (0.051)     (0.051)     (0.054)     (0.054)
bench_press             -0.066      -0.126**    -0.091*     -0.078      -0.062      -0.069
                        (0.049)     (0.052)     (0.050)     (0.049)     (0.048)     (0.049)
Constant                0.057       -0.115      -0.171      0.057       0.052       0.050
                        (0.134)     (0.139)     (0.117)     (0.130)     (0.133)     (0.134)
Observations            1,484       1,484       1,484       1,484       1,484       1,484
Log Likelihood          -949.365    -966.040    -953.209    -950.803    -947.840    -948.219
Akaike Inf. Crit.       1,932.730   1,962.081   1,936.418   1,931.606   1,927.681   1,928.438
Bayesian Inf. Crit.     2,022.872   2,041.618   2,015.955   2,011.143   2,012.521   2,013.278
Note: *p<0.1; **p<0.05; ***p<0.01

Steal Models
Dependent variable:
Steal Percent (scaled)
                        (1)         (2)         (3)         (4)         (5)         (6)
poly(age, 2)1           -3.527***   -3.542***   -3.549***   -3.549***   -3.540***   -3.545***
                        (0.634)     (0.634)     (0.634)     (0.634)     (0.634)     (0.634)
poly(age, 2)2           -2.288***   -2.288***   -2.275***   -2.281***   -2.296***   -2.290***
                        (0.579)     (0.579)     (0.579)     (0.579)     (0.579)     (0.579)
position_genWing        -0.383**    -0.298      -0.557***   -0.517***   -0.388**    -0.368*
                        (0.191)     (0.188)     (0.167)     (0.185)     (0.191)     (0.190)
position_genBig         -0.849***   -0.710**    -1.118***   -1.057***   -0.859***   -0.837***
                        (0.288)     (0.282)     (0.248)     (0.279)     (0.288)     (0.287)
height_wo_shoes         -0.324**    -0.148                              -0.342**    -0.321**
                        (0.154)     (0.119)                             (0.153)     (0.153)
wingspan                0.228                   0.190*                  0.180       0.229
                        (0.140)                 (0.097)                 (0.134)     (0.140)
standing_reach          0.110                               0.102       0.180       0.093
                        (0.206)                             (0.120)     (0.197)     (0.205)
weight                  0.003       0.101       -0.055      0.001       -0.006      -0.001
                        (0.119)     (0.115)     (0.117)     (0.117)     (0.119)     (0.119)
body_fat_pct            -0.052      -0.088      -0.030      -0.047      -0.034      -0.052
                        (0.071)     (0.070)     (0.070)     (0.071)     (0.069)     (0.071)
standing_vertical_leap  -0.114      -0.097      -0.112      -0.083                  -0.061
                        (0.094)     (0.091)     (0.091)     (0.092)                 (0.069)
max_vertical_leap       0.076       0.052       0.079       0.072       -0.002
                        (0.093)     (0.094)     (0.094)     (0.095)     (0.068)
lane_agility_time       -0.067      -0.020      -0.057      -0.038      -0.075      -0.074
                        (0.064)     (0.063)     (0.064)     (0.064)     (0.064)     (0.064)
three_quarter_sprint    -0.052      -0.082      -0.096      -0.089      -0.040      -0.059
                        (0.071)     (0.067)     (0.065)     (0.066)     (0.070)     (0.070)
bench_press             0.014       -0.018      0.028       0.013       0.001       0.011
                        (0.064)     (0.063)     (0.064)     (0.064)     (0.063)     (0.064)
Constant                0.383**     0.302*      0.541***    0.508***    0.395**     0.376**
                        (0.174)     (0.171)     (0.151)     (0.169)     (0.174)     (0.174)
Observations            1,484       1,484       1,484       1,484       1,484       1,484
Log Likelihood          -1,443.340  -1,444.982  -1,444.058  -1,445.385  -1,442.627  -1,442.218
Akaike Inf. Crit.       2,920.681   2,919.964   2,918.115   2,920.769   2,917.253   2,916.435
Bayesian Inf. Crit.     3,010.823   2,999.502   2,997.653   3,000.307   3,002.093   3,001.275
Note: *p<0.1; **p<0.05; ***p<0.01

DBPM Models
Dependent variable:
Defensive Boxscore +/- (scaled)
                        (1)         (2)         (3)         (4)         (5)         (6)
poly(age, 2)1           -0.019      -0.066      -0.019      -0.009      -0.014      -0.029
                        (0.622)     (0.623)     (0.622)     (0.622)     (0.622)     (0.621)
poly(age, 2)2           -2.281***   -2.270***   -2.283***   -2.295***   -2.279***   -2.283***
                        (0.568)     (0.569)     (0.568)     (0.568)     (0.568)     (0.568)
position_genWing        -0.365**    -0.218      -0.140      -0.336**    -0.363**    -0.357**
                        (0.174)     (0.176)     (0.153)     (0.168)     (0.173)     (0.173)
position_genBig         -0.197      0.043       0.161       -0.155      -0.194      -0.191
                        (0.261)     (0.263)     (0.226)     (0.252)     (0.261)     (0.261)
height_wo_shoes         0.155       0.466***                            0.160       0.157
                        (0.140)     (0.111)                             (0.139)     (0.139)
wingspan                0.313**                 0.514***                0.325***    0.314**
                        (0.127)                 (0.089)                 (0.122)     (0.127)
standing_reach          0.260                               0.637***    0.243       0.251
                        (0.187)                             (0.108)     (0.179)     (0.186)
weight                  -0.057      0.098       0.007       0.009       -0.055      -0.059
                        (0.108)     (0.107)     (0.107)     (0.106)     (0.108)     (0.108)
body_fat_pct            0.026       -0.032      -0.006      0.011       0.022       0.026
                        (0.064)     (0.065)     (0.064)     (0.064)     (0.063)     (0.064)
standing_vertical_leap  0.028       0.044       -0.020      0.091                   0.055
                        (0.086)     (0.085)     (0.083)     (0.083)                 (0.062)
max_vertical_leap       0.039       -0.003      0.019       0.043       0.058
                        (0.085)     (0.088)     (0.085)     (0.086)     (0.062)
lane_agility_time       0.022       0.098*      0.026       0.032       0.024       0.018
                        (0.058)     (0.058)     (0.059)     (0.058)     (0.058)     (0.058)
three_quarter_sprint    -0.078      -0.136**    -0.082      -0.033      -0.081      -0.082
                        (0.065)     (0.062)     (0.060)     (0.060)     (0.064)     (0.064)
bench_press             0.018       -0.035      -0.005      0.004       0.021       0.016
                        (0.058)     (0.059)     (0.058)     (0.058)     (0.057)     (0.058)
Constant                0.170       0.029       -0.044      0.152       0.167       0.166
                        (0.158)     (0.160)     (0.138)     (0.153)     (0.158)     (0.158)
Observations            1,484       1,484       1,484       1,484       1,484       1,484
Log Likelihood          -1,399.362  -1,408.063  -1,400.914  -1,400.489  -1,397.876  -1,397.918
Akaike Inf. Crit.       2,832.724   2,846.126   2,831.828   2,830.977   2,827.752   2,827.837
Bayesian Inf. Crit.     2,922.866   2,925.663   2,911.365   2,910.515   2,912.592   2,912.677
Note: *p<0.1; **p<0.05; ***p<0.01