Where Guys Are From, Part One - Heat Maps

I’ve at some point described every place I’ve ever lived as “a good town/city/region/state for basketball”. As clearly as this judgment is tainted by a sort of home-spun narcissism rootless tribalism goodhearted loyalty, it’s also clear that some places are better than others. Which are those places?

To begin answering this question, I pick from Basketball Reference’s extraordinary stock of information, scraping their Birth Places page to get the country/state and city of birth for every player in NBA history.There’s a lot that one could do with this data to look at how different places stack up with respect to basketball fecundity; I’ll start by making ‘heat maps’ that show where players in the contiguous US are born and how these locations have shifted over time. I’ll also provide some comments on what these maps show. This post begins with the maps/analysis, with an extended walkthrough of the code used to produce them at the end.

Without further ado, the following gif shows the density of player origins for all NBA players born in the lower 48 states from 1928 – the year in which early lights like Bob Cousy and Dolph Schayes were born, 18 years before the NBA came into being as the BAA, and 22 years before it became tentatively integrated. (Small caveat: of the 3360 players who fit this description, one was excluded because his birth city wasn’t given on either Basketball Reference or his Wiki page; sorry George Peeples.) There’s also a set of static images of the same data below the gif, showing samples of the same data, along with a composite map showing cumulative origins from 1928 on.

I’ll be exploring the data displayed here in greater depth in future posts. For now, though, a few tentative observations:

  • New York and northern New Jersey are the clear epicenter of basketball talent over the course of the 20th century, though their dominance has waned considerably – about a quarter of pros born between 1928 and 1937 were from New York, down to about 8% (of US-born players only) between 1989 and 1998.

  • The rest of the I-95 corridor down to DC has also generated a ton of players, with Philadelphia contributing a high density of players starting in the 1930s, and DC/Maryland emerging as a center of talent in the ’40s and ’50s.

  • Appalachia was once a real basketball power. A ton of players came out of eastern Kentucky and neighboring West Virginia during the late 1920s, 1930s, and early 1940s, including Hal Greer, Jerry West, and Rod Thorn over a 5-year span in WV. This torrent of player births slowed considerably by the ’50s, with KY-born players shifted westward and concentrated largely in Louisville.

  • NBA births in the deep South were nearly non-existant before 1938. During the ’40s, ’50s, and ’60s, player births in the South became much more frequent, and were quite spatially dispersed. Starting in the 1970s, southern-born players became increasingly concentrated in a few metro areas, namely Atlanta (and also Houston and Memphis).

  • In the Midwest, Chicago has been a predominant source of talent throughout the century. Detroit started producing a large concentration mid-century, before fading a bit, while Indianapolis has gained in prominence more recently. After an active ’30s and ’40s, St. Louis has faded substantially, while neither Milwaukee nor Mineapolis have ever produced a great number of players.

  • The Los Angeles region started slowly during the first half of the century, but has grown to become the largest single geographical source of players in the world. Since 1990, 29 NBA players have been born in LA County, which is more than the next two counties combined – Cook, IL (Chicago) with 13 and Harris, TX (Houston) with 11. During this timeframe, the entire state of NY produced 27 players.

  • As with Los Angeles but on smaller scales, San Diego, Portland, and Seattle all came into basketball prominence over the course of the Century. Meanwhile, the SF Bay Area maintained its initially strong presence as a producer of basketball talent.


Description of Code

The first task in producing these maps was to gather and process birth place data. I did this by scraping data from Basketball Reference’s Player Origins page using the following R code. (I also used SelectorGadget CSS selector to help identify the HTML nodes I needed to specify in rvest::html_nodes())

library(rvest)
library(dplyr)

# specify main URL to scrape from and base URL to append player sub-pages to
url_main <- "https://www.basketball-reference.com/friv/birthplaces.fcgi"
url_base <- "https://www.basketball-reference.com"

page_main <- read_html(url_main)

# Read the nodes corresponding to US states on main Player Origins page
url_statesVector <- page_main %>%
  html_nodes('#birthplace_1 a') %>%
  html_attr('href')


# Set up list object, write player origin data frames to it iteratively, with a new data frame of players for every state sub-page from above
player_full_foreign_list <- list()

for(i in 1:length(names_statesVector)){
  page_state <- read_html(paste(url_base,url_statesVector[[i]],sep=''))
  
  player_names_births <- page_state %>%
    html_nodes('#stats .left') %>%
    html_text() # read player data giving name and birth place
  
  player_names_births <- cbind.data.frame(
    split(
      player_names_births,
      rep(1:3,times=length(player_names_births)/3)
    ),
    stringsAsFactors = FALSE
  ) %>% rename(
    Name = `1`, Birth_Date = `2`, Birth_City = `3`
  )
  
  player_names_births$Birth_Year <- player_names_births$Birth_Date %>%
    str_sub(-4,-1)
  player_names_births$Birth_State <- names_statesVector[[i]]
  player_names_births$Nationality <- 'Domestic'
  player_names_births$Name <- player_names_births$Name %>%
    str_remove("[*]")
  
  player_names_births$Profile_URL <- page_state %>%
    html_nodes('#stats a') %>%
    html_attr('href') # read Bask Ref player page URL
  
    player_stats <- page_state %>%
    html_nodes('#stats .right') %>%
    html_text() # read selection of 
  
  player_stats <- cbind.data.frame(
    split(player_stats, rep(1:27, times=length(player_stats)/27)),
    stringsAsFactors = FALSE
  ) %>%
    select(
      Years_Played = `2`,
      Year_Start = `3`,
      Year_End = `4`,
      Games_Played = `5`,
      Minutes_Played = `6`,
      Points = `20`
    )
  
  player_full <- cbind.data.frame(
    player_names_births,
    player_stats
  )
  
    player_full_domestic_list[[i]] <- player_full
}

The above code chunk generates a list of 50 state-specific dataframes with basic player attributes, like so:

## Observations: 82
## Variables: 13
## $ Name           <chr> "Michael Ansley", "Keith Askins", "Carl Bailey"...
## $ Birth_Date     <chr> "Feb 8, 1967", "Dec 15, 1967", "Apr 23, 1958", ...
## $ Birth_City     <chr> "Birmingham", "Athens", "Birmingham", "Leeds", ...
## $ Birth_Year     <chr> "1967", "1967", "1958", "1963", "1989", "1953",...
## $ Birth_State    <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Al...
## $ Nationality    <chr> "Domestic", "Domestic", "Domestic", "Domestic",...
## $ Profile_URL    <chr> "/players/a/anslemi01.html", "/players/a/askink...
## $ Years_Played   <chr> "3", "9", "1", "16", "8", "6", "1", "8", "9", "...
## $ Year_Start     <chr> "1990", "1991", "1982", "1985", "2011", "1976",...
## $ Year_End       <chr> "1992", "1999", "1982", "2000", "2018", "1984",...
## $ Games_Played   <chr> "149", "486", "1", "1073", "492", "366", "47", ...
## $ Minutes_Played <chr> "2143", "7983", "7", "39330", "13631", "7046", ...
## $ Points         <chr> "1026", "1852", "2", "23757", "6793", "2821", "...

Next, I loop through these dataframes, creating a new city/state abbreviation column, inputing this column into the ggmaps::mutate_geocode() function for geocoding, gathering and processing the results (a pair of latitude/longitude coordinates for each player’s hometown), and doing a bit of cleanup to fix missing or erroneous data, ending at long last with a (relatively) clean and complete dataframe of domestic player birth locations.

domestic_locations_list <- list() # create list object to store geocode results
domestic_locations_2ndPass_list <- list() # create list object to store geocodes that didn't process the first time around

state_abrevKey <- player_full_domestic_list %>%
  bind_rows() %>%
  select(Birth_State) %>%
  distinct() # get a list of all the distinct states in the dataset

state_abrevKey$Birth_State_Abrev <- c(
  "AL","AK","AZ","AR","CA","CO","CT","DE","DC","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA","ME",
  "MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK","OR","PA",
  "RI","SC","SD","TN","TX","UT","VA","WA","WV","WI","WY"
) # append abbreviation codes to all states

# loop through list of states, creating place name variable, geocoding, merging geocode, and writing to output list
for(i in 1:length(player_full_domestic_list)){
  df <- player_full_domestic_list[[i]] %>%
    left_join(state_abrevKey) %>%
    mutate(
      Birth_Place = paste(Birth_City, ', ', Birth_State_Abrev, sep = '')
    ) # use state abbreviations to create geocode-ready variable

  df_geocoded <- df  %>% # geocode using Data Science Toolkit source
    select(Birth_Place) %>%
    distinct() %>% # filter out duplicate place names -- no need to re-geocode them
    mutate_geocode(Birth_Place, source = "dsk")
  
  domestic_locations <- df %>%
    left_join(df_geocoded) # join geocode results
  
  domestic_locations_list[[i]] <- domestic_locations # write to output list
}

# write full output list to disk (it will likely have taken a while to generate)
save(domestic_locations_list,
     file = here("content", "data", "domestic_locations_list.RData"))


# some places may have failed to geocode due to server error
# identify these places, and run them through a second time
#  (two passes was sufficient for me, but this step could be iterated further)
for(i in 1:length(player_full_domestic_list)){
  df <- domestic_locations_list[[i]] %>%
    filter(
      is.na(lon) # get records which don't have lat/lon data
    )
  
  df2 <- df  %>%
    select(Birth_Place) %>%
    distinct() # again, look only at distinct places
  
  if(nrow(df) > 0){ # re-geocode
    df_geocoded <- mutate_geocode(df2,
                                  Birth_Place, source = "dsk")
    
    domestic_locations <- df %>%
      select(-lat, -lon) %>%
      left_join(df_geocoded)
    
    domestic_locations_2ndPass_list[[i]] <- domestic_locations
  }
}

save(domestic_locations_2ndPass_list,
     file = here("content", "data", "domestic_locations_2ndPass_list.RData"))


# create full dataframe of all domestic players with geocoded hometowns
#   accomplish this by row-binding state first-pass lists, cutting out
#   missing geocodes, and binding in state 2nd-pass lists
player_geocode_final_df <- bind_rows(
  domestic_locations_list,
  foreign_locations_list
) %>%
  filter(!is.na(lat)) %>%
  bind_rows(
    domestic_locations_2ndPass_list,
    foreign_locations_2ndPass_list
  )

# Finally, I hand-correct some missing/erroneous data
# Errors came from misspelled hometowns, incorrect home states,
#   missplaced coordinates (e.g. Queens, NY was placed in western part of state),
#   and missing hometown on Bask Ref website (I used Wikipedia to add where possible)

# Error correction #1: Carrick Felix in Las Vegas, *NV*
player_geocode_final_df$Birth_State[
  player_geocode_final_df$Name == 'Carrick Felix'
  ] <- 'Nevada'
player_geocode_final_df$Birth_State_Abrev[
  player_geocode_final_df$Name == 'Carrick Felix'
  ] <- 'NV'
player_geocode_final_df$Birth_Place[
  player_geocode_final_df$Name == 'Carrick Felix'
  ] <- 'Las Vegas, NV'
player_geocode_final_df$lon[
  player_geocode_final_df$Name == 'Carrick Felix'
  ] <- player_geocode_final_df$lon[
    player_geocode_final_df$Birth_Place == 'Las Vegas, NV'
    ][2]
player_geocode_final_df$lat[
  player_geocode_final_df$Name == 'Carrick Felix'
  ] <- player_geocode_final_df$lat[
    player_geocode_final_df$Birth_Place == 'Las Vegas, NV'
    ][2]


# Error correction #2: Kirk Haston is from Lobelville, not Loblesville, TN
player_geocode_final_df$Birth_City[
  player_geocode_final_df$Name == 'Kirk Haston'
  ] <- 'Lobelville'
player_geocode_final_df$Birth_Place[
  player_geocode_final_df$Name == 'Kirk Haston'
  ] <- 'Lobelville, TN'

geocode_Haston <- geocode('Lobelville, TN', source = 'google')

player_geocode_final_df$lon[
  player_geocode_final_df$Name == 'Kirk Haston'
  ] <- geocode_Haston$lon
player_geocode_final_df$lat[
  player_geocode_final_df$Name == 'Kirk Haston'
  ] <- geocode_Haston$lat



# Error correction #3: Gaylon Nickerson is from Osceola, AR, not Osecola
player_geocode_final_df$Birth_City[
  player_geocode_final_df$Name == 'Gaylon Nickerson'
  ] <- 'Osceola'
player_geocode_final_df$Birth_Place[
  player_geocode_final_df$Name == 'Gaylon Nickerson'
  ] <- 'Osceola, AR'

geocode_Nickerson <- geocode('Osceola, AR', source = 'google')

player_geocode_final_df$lon[
  player_geocode_final_df$Name == 'Gaylon Nickerson'
  ] <- geocode_Nickerson$lon
player_geocode_final_df$lat[
  player_geocode_final_df$Name == 'Gaylon Nickerson'
  ] <- geocode_Nickerson$lat



# Error correction #4: Major Jones is from McGehee, AR, not McGhee
player_geocode_final_df$lon[
  player_geocode_final_df$Name == 'Major Jones'
  ] <- player_geocode_final_df$lon[
    player_geocode_final_df$Birth_Place == 'McGehee, AR'
    ][1]
player_geocode_final_df$lat[
  player_geocode_final_df$Name == 'Major Jones'
  ] <- player_geocode_final_df$lat[
    player_geocode_final_df$Birth_Place == 'McGehee, AR'
    ][1]

player_geocode_final_df$Birth_City[
  player_geocode_final_df$Name == 'Major Jones'
  ] <- 'McGehee'
player_geocode_final_df$Birth_Place[
  player_geocode_final_df$Name == 'Major Jones'
  ] <- 'McGehee, AR'

# Error correction #5: Lowes Moore is from Plymouth, NC, not SC
player_geocode_final_df$lon[
  player_geocode_final_df$Name == 'Lowes Moore'
  ] <- player_geocode_final_df$lon[
    player_geocode_final_df$Birth_Place == 'Plymouth, NC'
    ][1]
player_geocode_final_df$lat[
  player_geocode_final_df$Name == 'Lowes Moore'
  ] <- player_geocode_final_df$lat[
    player_geocode_final_df$Birth_Place == 'Plymouth, NC'
    ][1]

player_geocode_final_df$Birth_State[
  player_geocode_final_df$Name == 'Lowes Moore'
  ] <- 'North Caroline'
player_geocode_final_df$Birth_Place[
  player_geocode_final_df$Name == 'Lowes Moore'
  ] <- 'Plymouth, NC'


# Error correction #6: Queens, NY is placed in western NY; reassign lat/long to Astoria's
player_geocode_final_df$lon[
  player_geocode_final_df$Birth_Place == "Queens, NY"
  ] <- player_geocode_final_df$lon[
    player_geocode_final_df$Birth_Place == 'Astoria, NY'
    ][1]
player_geocode_final_df$lat[
  player_geocode_final_df$Birth_Place == "Queens, NY"
  ] <- player_geocode_final_df$lat[
    player_geocode_final_df$Birth_Place == 'Astoria, NY'
    ][1]


# Error correction #7: Mt Vernon, NY is placed in western NY; reassign lat/long to the Bronx's
player_geocode_final_df$lon[
  player_geocode_final_df$Birth_Place == "Mount Vernon, NY"
  ] <- player_geocode_final_df$lon[
    player_geocode_final_df$Birth_Place == 'Bronx, NY'
    ][1]
player_geocode_final_df$lat[
  player_geocode_final_df$Birth_Place == "Mount Vernon, NY"
  ] <- player_geocode_final_df$lat[
    player_geocode_final_df$Birth_Place == 'Bronx, NY'
    ][1]

# Place Orbie Bowling in Sandy Hook, KY (this information is missing from his Basketbally Reference page
#   but is given on his Wiki page)
player_geocode_final_df$Birth_City[
  player_geocode_final_df$Name == 'Orbie Bowling'
  ] <- 'Sandy Hook'
player_geocode_final_df$Birth_Place[
  player_geocode_final_df$Name == 'Orbie Bowling'
  ] <- 'Sandy Hook, KY'

geocode_Haston <- geocode('Sandy Hook, KY', source = 'dsk')

player_geocode_final_df$lon[
  player_geocode_final_df$Name == 'Orbie Bowling'
  ] <- geocode_Haston$lon
player_geocode_final_df$lat[
  player_geocode_final_df$Name == 'Orbie Bowling'
  ] <- geocode_Haston$lat


# We finally have a pretty good player location file!
save(player_geocode_final_df,
     file = here("content", "data", "player_geocode_final_df.RData"))

With a dataset now in hand, I proceed with the more interesting task of representing the data. The first thing I do is to load a base map from the maps package, transform its coordinate system into something appropriate for continent-wide cartography and compatible with distince-based density mapping, and save the lat/long limits of the map.

library(maps)

states_map <- maps::map(database = "state", fill = TRUE,
                  col = "transparent", plot=FALSE)

# convert to spatial dataframe in order to apply projection to coordinates
states_spdf <- map2SpatialPolygons( 
  map = states_map,
  IDs = states_map[["names"]],
  proj4string = CRS("+proj=longlat +datum=WGS84")
)

states_spdf_proj <- spTransform(
  states_spdf,
  CRSobj = CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs") # project into Lambert Conformal coord sys
)

# convert projected spdf back to regular df (easy to map with ggplot2)
states_spdf_proj_df <- fortify(states_spdf_proj)

states_min_long <- min(states_spdf_proj_df$long)
states_max_long <- max(states_spdf_proj_df$long)
states_min_lat <- min(states_spdf_proj_df$lat)
states_max_lat <- max(states_spdf_proj_df$lat)

Shifting focus to the dataframe of player locations, I take the subset of these in the lower 48, with birth years going back to 1928.

# Take subset of players
playerCityPoints_lower48_allYear <- player_geocode_final_df %>%
  filter(Birth_City != '') %>%
  filter(Nationality == 'Domestic' & !(Birth_State %in% c('Alaska','Hawaii')))

# define initial projection and transform to Lambert Conformal
playerCityPoints_lower48_allYear_sp <- playerCityPoints_lower48_allYear
coordinates(playerCityPoints_lower48_allYear_sp) <- ~lon + lat
proj4string(playerCityPoints_lower48_allYear_sp) <- CRS("+proj=longlat +datum=WGS84")

playerCityPoints_lower48_allYear_sp_proj <- spTransform(
  playerCityPoints_lower48_allYear_sp,
  CRSobj = CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")
)

Next, I specify a function to calculate player birth density using KernSmooth::bkde2D(), and I use this function to create a list of dataframes giving densities for progressive 10-year intervals starting in 1928 and ending in 1998.

library(KernSmooth)

dens_func <- function(start_year){
  df_spatPt <- subset(playerCityPoints_lower48_allYear_sp_proj,
                      Birth_Year >= start_year & Birth_Year < (start_year + 10))
  
  rast <- KernSmooth::bkde2D(
    x = df_spatPt@coords,
    bandwidth = c(40000,40000), #bandwidth of kernel function is in meters
    gridsize = c(
      400, # grid will be 400 units wide, proportional num. tall
      round( 400 * (states_max_lat - states_min_lat + 100000) /
               (states_max_long - states_min_long + 200000) )
    ),
    range.x = list( # extend the density grid a bit beyond US borders
      c(states_min_long - 100000, states_max_long + 100000),
      c(states_min_lat - 50000, states_max_lat + 50000)
    )
  )
  
  # convert output of bkde2d into a simple dataframe
  rast_expandDF <- expand.grid(rast[['x1']],
                               rast[['x2']])
  rast_expandDF$fhat = as.vector(rast[['fhat']])

  # transform density values to scale and shrink extreme values by taking root
  rast_expandDF$fhat <- (rast_expandDF$fhat * 10^12) ^ (1/2)
  
  return(rast_expandDF)
}
  

rast_list <- list() # initialize list to write 10-year rasters
year_seq <- seq(from = 1928, to = 1988, by = 1)

# apply density func to each 10-year interval
for(i in 1:length(year_seq)){
  rast_list[[i]] <- dens_func(year_seq[i])
}

# calculate max density across all 10-year spans (used later for scaling colors)
dens_max <- rast_list %>% bind_rows() %>%
  summarise(max = max(fhat))

Next, I create a density mapping function and a timeline graphing function (both based on ggplot2), and use these functions to produce all of the individual map frames for the density gif.

library(ggplot2)
library(ggthemes)
library(cowplot)

heatMap_func <- function(rast_df){
  p <- ggplot() +
    geom_raster(
      data=rast_df,
      aes(x=Var1, y=Var2, fill=fhat), alpha=1
    ) +
    geom_path(
      data = states_spdf_proj_df,
      aes(x = long, y = lat, group = group),
      color = 'white', alpha = 0.1
    ) +
    scale_x_continuous(breaks = c(states_min_long,states_max_long)) +
    scale_fill_viridis(name = 'Density of\nplayer births',
                       option = 'inferno',
                       limits = c(0, dens_max[1,1])
                       ) +
    labs(title = 'Density of NBA Player Births',
         subtitle = '10-year spans from 1928 to 1998') +
    coord_fixed() +
    theme_map() +
    theme(
      legend.position = "none"
      )
  
  return(p)
  
}


timeline_func <- function(start_year){
  
  timeframe_df <- data.frame(x = start_year,
                             y = '')
  
  t <- ggplot(data = timeframe_df) +
    geom_segment(aes(x = start_year, xend = start_year + 10,
                     y = y, yend = y),
                 size = 3,
                 color = plasma(10)[5]) +
    geom_vline(xintercept = start_year,
               color = plasma(10)[5], linetype = 3) +
    geom_vline(xintercept = start_year + 10,
               color = plasma(10)[5], linetype = 3) +
    
    scale_x_continuous(limits = c(1925,2000),
                       breaks = seq(1925,2000,5)) +
    theme_tufte() +
    theme(
      axis.title = element_blank(),
      axis.ticks.y = element_blank()
    )
}


for(i in 1:length(year_seq)){
  
  p <- heatMap_func(rast_list[[i]])
  t <- timeline_func(year_seq[[i]])
  
  g <- plot_grid(p,t, nrow = 2, ncol = 1,
                 rel_heights = c(6,1))
  
  ggsave(
    filename = here(
      "static", "images", "heatmap_gif_frames",
      paste(
        'heatmap_grid_frame_',
        str_pad(i, 2, pad = "0"),
        '.png', sep='')
    ),
    plot = g,
    width = 7, height = 5,
    units = 'in'
  )
  
}

The above production of map frames shouldn’t take to long. The following operation of combining these frames into a single gif, using the magick package, unfortunately does – about an hour in my case. (I’ve yet to try it, but the recent gifski package here might handle the task more swiftly.). My code snippet here borrows heavily from Ryan Peek.

library(purrr)
library(magick)

list.files(
    path = here("static", "images", "heatmap_gif_frames"),
    pattern = "*.png", full.names = T
  ) %>% 
    purrr::map(image_read) %>% # reads each path file
    image_join() %>% # joins image
    image_animate(fps=4) %>% # animates
    image_write(
      here("content", "images", "heatmap.gif")
    ) # write to current dir

Finally, I produce a static, multipanel map by selected the density data behind a set of 7 10-year spans that went into producing the gif, along with an 8th dataframe giving the density across the entire 1928-1998 span. Since I thought it important to use a different color scale for this all-year dataset, I used gridExtra::arrangeGrob() to manually combined the plots, rather than the more convenient ggplot2::facet_wrap().

library(gridExtra)

# create density dataframes for static/faceted
examp_years <- c(1,11,21,31,41,51,61)
examp_rast_list <- list()

for(i in 1:length(examp_years)){
  
  examp_rast_list[[i]] <- rast_list[[ examp_years[i] ]]
  examp_rast_list[[i]]$years <- paste(
    year_seq[ examp_years[i] ],
    ' to ',
    (year_seq[ examp_years[i] ] + 9),
    sep = ''
  )
}


# Create a final raster dataframe for all players since 1928
df_spatPt <- subset(playerCityPoints_lower48_allYear_sp_proj,
                    Birth_Year >= 1928)

rast_allYear <- bkde2D(
  x = df_spatPt@coords,
  bandwidth = c(40000,40000),
  gridsize = c(
    400,
    round( 400 * (states_max_lat - states_min_lat + 100000) /
             (states_max_long - states_min_long + 200000) )
  ),
  range.x = list(
    c(states_min_long - 100000, states_max_long + 100000),
    c(states_min_lat - 50000, states_max_lat + 50000)
  )
)
rast_expandDF <- expand.grid(rast_allYear[['x1']],
                             rast_allYear[['x2']])
rast_expandDF$fhat = as.vector(rast_allYear[['fhat']])
rast_expandDF$fhat <- (rast_expandDF$fhat * 10^12) ^ (1/2)
rast_expandDF$fhat <- rast_expandDF$fhat * max(examp_rast_df$fhat) / max(rast_expandDF$fhat)
rast_expandDF$years <- 'All Years'

examp_rast_list[[length(examp_years) + 1]] <- rast_expandDF


# specify function to graph each example frame
examp_graph_func <- function(df){
  
  fill_opt <- 'inferno'
  
  if(df$years[1] == 'All Years'){
    fill_opt <- 'viridis'
  }
  
  p <- ggplot() +
  geom_raster(
    data=df,
    aes(x=Var1, y=Var2, fill=fhat)
  ) +
    geom_path(
      data = states_spdf_proj_df,
      aes(x = long, y = lat, group = group),
      color = 'white', alpha = 0.1
    ) +
    scale_x_continuous(breaks = c(states_min_long,states_max_long)) +
    scale_fill_viridis(name = 'Density of\nplayer births',
                       option = fill_opt,
                       limits = c(0, dens_max[1,1])
    ) +
    labs(title = df$years) +
    coord_fixed() +
    theme_map() +
    theme(
      legend.position = "none",
      plot.title = element_text(hjust = 0.5, vjust = 0)
    )
  
  return(p)
}

# apply graphing fuction to all example frames
examp_graph_list <- examp_rast_list %>%
  purrr::map(examp_graph_func)

#combine example graphs using gridExtra
p_examps <- gridExtra::arrangeGrob(grobs = examp_graph_list,
                        ncol = 2)

ggsave(
  filename = here(
    "static", "images", "heatmap_exampleFacets.png"
  ),
  plot = p_examps,
  width = 8, height = 12,
  units = 'in'
)