Mapping the Hometowns of University of Nebraska-Lincoln Football Recruits, 2000-2018
Abstract
The University of Nebraska switched athletic conferences in 2011, moving from the Big XII Conference to the Big Ten Conference. The geography of an athletic conference influences which schools recruits choose, which matters given the financial implications of college athletics. The purpose of this study is to ascertain whether the geography of Nebraska’s football recruits changed with the conference switch. Using the R package rvest, the hometowns of Nebraska’s recruits for the years 2000-2018 were extracted from the University of Nebraska’s football website and then geocoded. The geocoded points and the mean centers calculated from them suggest a possible correlation between the conference switch and the hometowns of Nebraska recruits; however, the results are inconclusive.
Background
The purpose of this study is to understand geographic variability in the University of Nebraska’s football recruiting. The University of Nebraska switched athletic conferences in 2011, from the Big XII Conference to the Big Ten Conference. The main difference between the two conferences is their geography. The Big XII Conference includes schools in states with a strong high school football culture, like Texas, while the Big Ten Conference includes schools in states where high school football may not be as culturally significant (i.e. the Midwest). With Nebraska’s switch into the Big Ten, I hypothesize that the mean center of recruits has shifted north. Dumond, Lynch, and Platania (2008) state that the number one factor in a high school recruit’s choice of college is the geographic distance between hometown and college. Although Nebraska and Texas are not close geographically, Nebraska traveled to Texas at least once a year for an away game, and Texas recruits could invite their family and friends to those games. However, after Nebraska’s switch into the Big Ten Conference, the team stopped traveling to Texas on an annual basis, which may have affected its ability to recruit in Texas.
This study is important because of the enormous financial impact college athletics have on universities. College athletics provide direct financial gain, in that universities profit directly from their football programs, provided the programs are successful (Goff 2000). Collegiate athletics can also increase a university’s exposure, which in turn increases financial contributions and enrollment (Goff 2000). The more successful the program, the greater the financial benefit to the university.
Methods and Data
Nineteen years of recruits’ hometowns, from 2000 to 2018, were compared. Those years were selected because they cover Nebraska’s time in the Big XII Conference, its transition into the Big Ten Conference, and its years as an established Big Ten member. The data came from the University of Nebraska’s football website, which has an extensive database of every player and their hometown throughout the program’s history.
Once the years were determined, the data was extracted using R, specifically the package “rvest.” The rvest package is used to scrape data from web pages; using the code below, I extracted the hometowns of football players.
Code to Extract Hometowns from University of Nebraska Website
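library(rvest)  # read_html(), html_nodes(), html_text(), and the %>% pipe
# the excerpt starts inside a loop over seasons; the loop header and the URL
# construction are reconstructed from the sample URL below (assumption);
# the loop is closed at the end of the second listing
for (year in 2000:2018) {
page <- paste0("http://www.huskers.com/SportSelect.dbml?DB_OEM_ID=100&SPID=22&SPSID=4&KEY=&Q_SEASON=", year)
web.page <- read_html(page)
#web.page <- read_html("http://www.huskers.com/SportSelect.dbml?DB_OEM_ID=100&SPID=22&SPSID=4&KEY=&Q_SEASON=2018") # sample for single year
# CSS selector for the hometown field, found with SelectorGadget
home.town <- web.page %>%
  html_nodes(".hometown .data")
# extract the text of the selected nodes
home.town.text <- html_text(home.town)
# remove new line and tab characters
home.town.text <- gsub("[\n\t]", "", home.town.text)
# remove high school name (in parentheses)
home.town.text <- gsub(" \\(.*","", home.town.text)
# get the city on its own (everything before the first comma)
city <- sub('\\s*,.*','', home.town.text)
Afterward, the hometown data was geocoded using the Nominatim geocoding API, and the results were written to a CSV file for mapping. However, some hometowns did not geocode correctly, for reasons that were not fully determined. For example, hometowns whose state was written as a non-postal abbreviation (e.g. Neb. for Nebraska) were not located correctly. Because of this, the state abbreviations in the players’ hometowns had to be replaced. I created a CSV that maps each state abbreviation to its postal code, imported the CSV into R, and then changed the abbreviations using the code below.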
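As a quick check of the string handling described above, the following sketch applies the same cleaning steps and a lookup table to a few made-up roster strings. The sample strings and the three-row lookup table are hypothetical stand-ins (the real table lives in State_Abbr_Changes.csv); only the column names `bad` and `good` are taken from the study's code.

```r
# Hypothetical roster strings in the format the scraper is assumed to return.
raw <- c("\n\tOmaha, Neb. (Central)\t\n",
         "Houston, Texas (Lamar)",
         "Miami, Fla. (Central)")

# Same cleaning steps as in the extraction listing.
clean <- gsub("[\n\t]", "", raw)       # drop newlines and tabs
clean <- gsub(" \\(.*", "", clean)     # drop the high school in parentheses
city <- sub("\\s*,.*", "", clean)      # text before the first comma
state.old <- sub(".*, ", "", clean)    # text after the last ", "

# Hypothetical lookup table; "Fla." is left out on purpose.
states <- data.frame(bad = c("Neb.", "Texas"),
                     good = c("NE", "TX"),
                     stringsAsFactors = FALSE)

# match() returns NA for abbreviations missing from the table.
state.new <- states$good[match(state.old, states$bad)]
city.state <- paste0(city, ", ", state.new)
print(city.state)

# Flag queries whose state could not be translated before geocoding them.
unmatched <- is.na(state.new)
print(raw[unmatched])
```

One thing this makes visible: `paste0()` turns a missing state into the literal text "NA", which is also the two-letter country code for Namibia. Whether that explains any of the stray points in this study is not known, but checking `unmatched` before geocoding would rule it out.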
Code to Substitute State Abbreviations, Geocode, and Put Results in a Data Frame
# read the lookup table of old ("bad") and new ("good") state abbreviations
states <- read.csv("Q:/StudentCoursework/Haffnerm/GEOG.435.001.2191/MOENAD/Projects/Project_Part_3/State_Abbr_Changes.csv", stringsAsFactors = FALSE)
# get the bad state abbreviation on its own
state.old <- sub('.*, ', '', home.town.text)
# get new state name based on our manually looked up changes
state.new <- states$good[match(state.old, states$bad)]
# build the "city, state" query strings
city.state <- paste0(city, ", ", state.new)
# use osm geocoder to extract the data
# (osm_search() is assumed to come from the nominatim package; key is an API key defined elsewhere)
home.town.loc <- osm_search(query = city.state, key = key)
# put results in a data frame
# note: this pairing assumes osm_search() returns one row per query, in query order
df <- data.frame("hometown" = home.town.text[1:nrow(home.town.loc)],
                 "lon" = home.town.loc$lon,
                 "lat" = home.town.loc$lat)
# save this season's results to a .csv file
write.csv(df, paste0("nb-recruits-", year, ".csv"), row.names = FALSE)
} # end of the loop over seasons
Once the data was geocoded, several kinds of maps were created for analysis. A time change map was created to show how recruits’ hometowns changed over time. A heat map was created to show the density of Nebraska’s football recruits. The geographic mean center and standard deviation were also calculated for each year to show the spatial change and variability in recruits from 2000 to 2018.
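The study does not say which tool was used to compute the mean centers and standard deviations. As a minimal sketch, assuming each season's CSV has the `lon` and `lat` columns written by the listing above, both can be computed in base R. The four rows below are made-up values for illustration, not data from the study.

```r
# Hypothetical geocoded hometowns for one season (decimal degrees).
df <- data.frame(
  hometown = c("Omaha, NE", "Lincoln, NE", "Houston, TX", "Chicago, IL"),
  lon = c(-95.94, -96.70, -95.37, -87.63),
  lat = c(41.26, 40.81, 29.76, 41.88)
)

# Mean center: the arithmetic mean of longitude and of latitude.
mean.center <- c(lon = mean(df$lon), lat = mean(df$lat))

# Spread: sample standard deviation of each coordinate, in degrees.
spread <- c(lon = sd(df$lon), lat = sd(df$lat))

print(round(mean.center, 2))
print(round(spread, 2))
```

For the real data, `df` would instead be read from a season's file, for example with `read.csv(paste0("nb-recruits-", year, ".csv"))`, and the calculation repeated for each year.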
The data used for this study comes from the University of Nebraska football roster page. To retrieve each study year, the season was simply changed from the current year to each year in the study period.
For spatial reference, this is a map of the University of Nebraska within the context of the United States.
Study Area Map
Results
The data produced relatively accurate geocoded results, as the map below shows for the year 2000. A couple of hometowns geocoded inaccurately, producing points in the Czech Republic and Namibia; however, most points geocoded correctly.
2000 Geocoded Result Map of Nebraska Recruits’ Hometowns
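Stray points like these can be flagged automatically before mapping. The sketch below is one possible sanity check, not a step the study performed: it tests each point against a rough bounding box for the contiguous United States. The box limits and the sample rows are assumptions chosen for illustration.

```r
# Hypothetical geocoder output, including two stray points outside the US.
df <- data.frame(
  hometown = c("Omaha, NE", "Dallas, TX", "stray point A", "stray point B"),
  lon = c(-95.94, -96.80, 15.47, 17.08),
  lat = c(41.26, 32.78, 49.82, -22.56)
)

# Rough bounding box for the contiguous United States (assumed limits).
in.us <- df$lon >= -125 & df$lon <= -66 & df$lat >= 24 & df$lat <= 50

flagged <- df[!in.us, ]  # rows to re-check by hand
kept <- df[in.us, ]      # rows that pass the sanity check

print(flagged)
```

Flagged rows are better reviewed than dropped outright, since a recruit from Hawaii, Alaska, or another country would also fall outside this box.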
The geocode map is difficult to analyze based on points alone because the points overlap each other, so I created a cluster map to better analyze the data.
2000 Cluster Map of Nebraska Recruits’ Hometowns
Most of Nebraska’s recruits come from Nebraska and the surrounding states (Missouri, Kansas, Iowa, and Colorado). Texas has the highest number of recruits among states that do not border Nebraska, with eight recruits.
I also created a cluster map for the year 2018 (see below).
2018 Cluster Map of Nebraska Recruits’ Hometowns
For the year 2018, most of Nebraska’s recruits hail from Nebraska and the tri-state area (Missouri, Kansas, and Iowa). Six recruits came from the Texas area, six from Georgia, and five from Colorado. Six recruits also came from Florida, whereas in 2000 no recruits came from Florida.
I also created a mean center map, in which green markers represent the mean centers for years before the conference switch and red markers represent the years after.
Mean Center Map
The green markers generally sit farther west than the red markers, meaning there might be some change in where Nebraska obtained its recruits. The mean center does not move east consistently every year; however, the general trend is eastward.
Standard deviations of the geocoded coordinates were also calculated on a yearly basis (table below). If the coordinates are roughly normally distributed, about 68% of recruits fall within one standard deviation of the mean center on each axis (so for the year 2000, roughly 68% of recruits come from within 17.81 degrees east or west of that year’s mean center, and roughly 68% from within 6.53 degrees north or south of it).
Standard Deviation Table
| Year | Lon SD (degrees) | Lat SD (degrees) |
|---|---|---|
| 2000 | 17.81 | 6.53 |
| 2001 | 18.36 | 7.24 |
| 2002 | 19.19 | 7.19 |
| 2003 | 20.74 | 7.05 |
| 2004 | 19.18 | 5.55 |
| 2005 | 21.70 | 6.85 |
| 2006 | 19.38 | 6.99 |
| 2007 | 16.18 | 6.77 |
| 2008 | 14.56 | 7.09 |
| 2009 | 14.13 | 7.12 |
| 2010 | 11.94 | 4.86 |
| 2011 | 18.28 | 7.65 |
| 2012 | 15.40 | 4.65 |
| 2013 | 17.13 | 4.73 |
| 2014 | 18.01 | 5.10 |
| 2015 | 18.46 | 5.06 |
| 2016 | 12.97 | 5.03 |
| 2017 | 13.19 | 5.24 |
| 2018 | 11.94 | 4.86 |
The standard deviations are generally higher in the years preceding the conference switch in 2011, although the longitude standard deviations decline from 2007 through 2010. For the years 2011-2015, after Nebraska left the Big XII Conference, the longitude standard deviations stay at similar levels to one another (about 15-18 degrees). However, the standard deviations dip to 11-13 degrees in longitude and around 5 degrees in latitude in the years 2016-2018.
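One way to summarize the table is to average the yearly standard deviations on each side of the switch. The sketch below does this with the values from the table above, treating 2011 as the first Big Ten year; the period labels and variable names are my own, and a difference in averages is only descriptive, not a test of significance.

```r
# Yearly standard deviations copied from the table above (degrees).
sd.table <- data.frame(
  year = 2000:2018,
  lon.sd = c(17.81, 18.36, 19.19, 20.74, 19.18, 21.70, 19.38, 16.18, 14.56,
             14.13, 11.94, 18.28, 15.40, 17.13, 18.01, 18.46, 12.97, 13.19,
             11.94),
  lat.sd = c(6.53, 7.24, 7.19, 7.05, 5.55, 6.85, 6.99, 6.77, 7.09,
             7.12, 4.86, 7.65, 4.65, 4.73, 5.10, 5.06, 5.03, 5.24,
             4.86)
)

# Label each year by conference period (2011 counted as Big Ten).
sd.table$period <- ifelse(sd.table$year < 2011,
                          "Big XII (2000-2010)",
                          "Big Ten (2011-2018)")

# Average standard deviation per period, one row per period.
period.means <- aggregate(cbind(lon.sd, lat.sd) ~ period,
                          data = sd.table, FUN = mean)
print(period.means)
```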
Discussion
Nebraska’s conference switch seems to have had some impact on where its recruits come from. The general trend of the mean centers is eastward, away from Texas, which might mean Nebraska is recruiting more players from Big Ten Conference states. However, the mean center does not move eastward on a consistent year-to-year basis. The mean center for 2014 is the farthest east, but the trend moves west in 2015 and 2016. The mean centers move east again in 2017 and 2018, but there does not seem to be a definitive correlation between mean center and year other than a general eastward movement.
Standard deviations also seem to illustrate a general trend after the conference switch. The standard deviations shrink after the conference switch, especially in the years 2016-2018, when Nebraska was firmly rooted in the Big Ten Conference. The standard deviations in 2016-2018 fall between 11 and 13 degrees of longitude and around 5 degrees of latitude, while in most preceding years they fall between roughly 14 and 22 degrees of longitude and 5 and 7 degrees of latitude.
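The longitude and latitude figures are not directly comparable in degrees, because a degree of longitude covers less ground than a degree of latitude away from the equator. As a rough illustration, the year 2000 values can be converted to kilometers on a spherical Earth. The reference latitude of 40.8 degrees (approximately Lincoln, Nebraska) and the 6371 km Earth radius are assumptions of this sketch, not values used in the study.

```r
# Approximate length of one degree of arc on a sphere of radius 6371 km.
km.per.degree <- pi * 6371 / 180

# Year 2000 standard deviations from the table (degrees).
lon.sd <- 17.81
lat.sd <- 6.53

# Assumed reference latitude, roughly Lincoln, Nebraska.
ref.lat <- 40.8

# A degree of latitude has nearly constant length; a degree of
# longitude shrinks by the cosine of the latitude.
lat.sd.km <- lat.sd * km.per.degree
lon.sd.km <- lon.sd * km.per.degree * cos(ref.lat * pi / 180)

print(round(c(lon.km = lon.sd.km, lat.km = lat.sd.km)))
```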
Incorrect geocoding is a limitation of my study. The 2000 Geocoded Result Map showed some points in the Czech Republic and Namibia, but upon further examination, no recruits came from those countries. The inaccurate geocodes might have thrown off the mean center, although only 2-3 points geocoded inaccurately per year. Another limitation is that a loss of Texas recruits may not be directly related to the conference switch. Nebraska has had four head coaches since 2004, and a coach might have relationships with a particular geographic area that have nothing to do with conference membership (e.g. the University of Wisconsin has a recruiting relationship with New Jersey running backs). Finally, mean centers computed from raw coordinates are not perfect averages. The ground distance covered by one degree of longitude shrinks as latitude increases, so while the mean centers are relatively accurate, they are not exact.
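A common way around the longitude issue is to average the points as 3D unit vectors on a sphere and convert the result back to longitude and latitude. The sketch below is an alternative to the arithmetic mean, not the method used in this study, and the function name `spherical.mean.center` is hypothetical.

```r
# Mean center on a sphere: average the points as 3D unit vectors,
# then convert the averaged vector back to degrees.
spherical.mean.center <- function(lon, lat) {
  lon.r <- lon * pi / 180
  lat.r <- lat * pi / 180
  x <- mean(cos(lat.r) * cos(lon.r))
  y <- mean(cos(lat.r) * sin(lon.r))
  z <- mean(sin(lat.r))
  c(lon = atan2(y, x) * 180 / pi,
    lat = atan2(z, sqrt(x^2 + y^2)) * 180 / pi)
}

# Two points at 80 degrees north on opposite sides of the pole.
lon <- c(0, 180)
lat <- c(80, 80)

arithmetic <- c(lon = mean(lon), lat = mean(lat))
spherical <- spherical.mean.center(lon, lat)

print(arithmetic)
print(round(spherical["lat"], 2))
```

In this extreme case the arithmetic mean stays at 80 degrees north, while the spherical version places the center at the pole, which lies between the two points. For recruits concentrated in the central United States the two methods should differ far less, so this mainly matters as a check on the approximation.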
Conclusion
Nebraska might be losing recruits from Texas because of the conference switch; however, the analysis is inconclusive. The general trend of the mean centers is eastward, possibly meaning Nebraska is recruiting from Big Ten Conference states. The conference switch has implications for where Nebraska obtains its recruits, which, if my hypothesis is correct, might determine how well the team performs. Texas is a high school recruiting hotspot; in 2018, it produced the second-most recruits in the ESPN Top 300. Recruits can change the culture of a football program to a winning culture. However, if Nebraska misses out on Texas recruits because of its conference switch, it might never land a big-name recruit from Texas to change its program, which has direct financial implications (Goff 2000). For future studies, I would like to analyze the recruiting characteristics of each coach in Nebraska’s history and where each coach recruited from.
References
Dumond, J. Michael, Allen K. Lynch, and Jennifer Platania. 2008. “An Economic Model of the College Football Recruiting Process.” Journal of Sports Economics 9, no. 1 (February): 67-87.
Goff, Brian. 2000. “Effects of University Athletics on the University: A Review and Extension of Empirical Assessment.” Journal of Sport Management 14, no. 2 (April): 85-104.