The purpose of this project is to determine whether the climate of a National Football League team’s host city affects that team’s performance in extreme weather games. Data from NFLSavant.com detail the outcome of every NFL game from 1970 to 2013, along with the weather for each game. I split the results into the approximately 1000 coldest games and the approximately 1000 warmest games played during this time frame. The cutoffs were 33 degrees Fahrenheit and colder for cold games and 73 degrees Fahrenheit and warmer for hot games, with games between 34 and 72 degrees Fahrenheit treated as normal weather games. To account for teams that are simply historically better or worse, I take the difference between each team’s win percentage in the cold, normal, and hot games and its overall franchise win percentage. I will visualize these results and check whether a host city’s warmest average monthly high and coldest average monthly low during the NFL season correlate with the team’s win percentage differentials in different weather.
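# packages used below: readxl for read_excel(), ggplot2 for the plots,
# ggthemes for theme_economist_white()
library(readxl)
library(ggplot2)
library(ggthemes)
# getting data in from NFL Savant
NFLweather <- read_excel("C:/2022 Fall/STAT 5014 Stat Program Packages/Project/weather_20131231.xls")
attach(NFLweather)
# View(NFLweather)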
From the original data, I summed each franchise’s total wins, losses, and ties, and calculated their win percentage.
# flagging the result of each game from the home and away sides
# (1 = yes, 0 = no); a home win is an away loss and vice versa
homewin = as.numeric(NFLweather$home_score > NFLweather$away_score)
awaywin = as.numeric(NFLweather$home_score < NFLweather$away_score)
homeloss = awaywin
awayloss = homewin
hometie = as.numeric(NFLweather$home_score == NFLweather$away_score)
awaytie = hometie
# adding status of game wins to data
NFLweather1 = data.frame(cbind(NFLweather, homewin, awaywin,
homeloss, awayloss, hometie, awaytie), stringsAsFactors = FALSE)
# adding wins/losses/ties per team
teams = aggregate(NFLweather1$homewin, by = list(Category = NFLweather1$home_team),
FUN = sum)[, 1]
wins = aggregate(NFLweather1$homewin, by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awaywin, by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
losses = aggregate(NFLweather1$homeloss, by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awayloss, by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
ties = aggregate(NFLweather1$hometie, by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awaytie, by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
winpct = wins/(wins + losses) #calculating win percentage for franchise history
# all time team records
records = data.frame(cbind(teams, wins, losses, ties, round(winpct,
3)))
colnames(records) = c("Team", "Wins", "Losses", "Ties", "Win %")
records
## Team Wins Losses Ties Win %
## 1 Arizona Cardinals 119 179 0 0.399
## 2 Atlanta Falcons 295 419 6 0.413
## 3 Baltimore Colts 187 159 6 0.54
## 4 Baltimore Ravens 167 132 1 0.559
## 5 Boston Patriots 2 12 0 0.143
## 6 Buffalo Bills 321 374 2 0.462
## 7 Carolina Panthers 150 154 0 0.493
## 8 Chicago Bears 416 412 7 0.502
## 9 Cincinnati Bengals 310 378 1 0.451
## 10 Cleveland Browns 366 408 8 0.473
## 11 Dallas Cowboys 488 360 6 0.575
## 12 Denver Broncos 408 293 6 0.582
## 13 Detroit Lions 328 456 15 0.418
## 14 Green Bay Packers 469 366 14 0.562
## 15 Houston Oilers 188 235 2 0.444
## 16 Houston Texans 78 114 0 0.406
## 17 Indianapolis Colts 262 234 0 0.528
## 18 Jacksonville Jaguars 145 159 0 0.477
## 19 Kansas City Chiefs 330 355 7 0.482
## 20 Los Angeles Raiders 124 88 0 0.585
## 21 Los Angeles Rams 279 249 11 0.528
## 22 Miami Dolphins 418 295 2 0.586
## 23 Minnesota Vikings 436 375 10 0.538
## 24 New England Patriots 379 304 0 0.555
## 25 New Orleans Saints 314 399 5 0.44
## 26 New York Giants 417 416 9 0.501
## 27 New York Jets 309 382 2 0.447
## 28 Oakland Raiders 252 242 6 0.51
## 29 Philadelphia Eagles 416 416 13 0.5
## 30 Phoenix Cardinals 32 64 0 0.333
## 31 Pittsburgh Steelers 489 370 9 0.569
## 32 San Diego Chargers 331 360 5 0.479
## 33 San Francisco 49ers 465 373 11 0.555
## 34 Seattle Seahawks 289 300 0 0.491
## 35 St. Louis Cardinals 186 205 14 0.476
## 36 St. Louis Rams 123 158 1 0.438
## 37 Tampa Bay Buccaneers 232 366 1 0.388
## 38 Tennessee Oilers 16 16 0 0.5
## 39 Tennessee Titans 136 111 0 0.551
## 40 Washington Redskins 424 408 12 0.51
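The tallies above add the home and away aggregates by position, which relies on every franchise appearing in both the home and away columns in the same order. As a minimal sketch of an alternative, the games can be stacked into one row per team per game before aggregating. The `games` data frame, its teams, and its scores below are hypothetical and are not taken from the NFL Savant file; only the column names mirror it.

```r
# Hypothetical toy data; column names mirror the NFL Savant file
games <- data.frame(
  home_team  = c("A", "B", "C", "A"),
  away_team  = c("B", "C", "A", "C"),
  home_score = c(21, 10, 17, 14),
  away_score = c(14, 10, 20, 3)
)

# One row per team per game, seen from that team's side
long <- rbind(
  data.frame(team = games$home_team,
             pts_for = games$home_score, pts_against = games$away_score),
  data.frame(team = games$away_team,
             pts_for = games$away_score, pts_against = games$home_score)
)
long$win  <- as.numeric(long$pts_for >  long$pts_against)
long$loss <- as.numeric(long$pts_for <  long$pts_against)
long$tie  <- as.numeric(long$pts_for == long$pts_against)

# Sum wins, losses, and ties per team in a single aggregate() call
toy_records <- aggregate(cbind(win, loss, tie) ~ team, data = long, FUN = sum)

# Same definition as winpct above: ties are left out of the denominator
toy_records$win_pct <- round(toy_records$win /
                               (toy_records$win + toy_records$loss), 3)
toy_records
```

Because each team's home and away games sit in the same column, a franchise that only ever appears on one side would still be counted correctly.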
I repeated the process above for the roughly 1000 coldest and 1000 hottest games in the data, along with the roughly 250 most extreme cold and hot games. I compiled each franchise’s record in each of the weather conditions (Extreme Cold, Cold, Normal, Hot, Extreme Hot). Finally, I compiled a table of just the win percentage differentials for each franchise in each weather condition and included the host city’s average annual temperature, its coldest average monthly low from September to February, and its warmest average monthly high from September to February.
# doing the same as above for the coldest 1000ish games...
coldGames = NFLweather1[NFLweather1$temperature < 34, ]
winsC = aggregate(NFLweather1$homewin & NFLweather1$temperature <
34, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaywin & NFLweather1$temperature <
34, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
lossesC = aggregate(NFLweather1$homeloss & NFLweather1$temperature <
34, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awayloss & NFLweather1$temperature <
34, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
tiesC = aggregate(NFLweather1$hometie & NFLweather1$temperature <
34, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaytie & NFLweather1$temperature <
34, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
winpctC = winsC/(winsC + lossesC)
recordsC = data.frame(cbind(teams, winsC, lossesC, tiesC, round(winpctC,
3)))
# and hottest 1000 games...
hotGames = NFLweather1[NFLweather1$temperature > 72, ]
winsH = aggregate(NFLweather1$homewin & NFLweather1$temperature >
72, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaywin & NFLweather1$temperature >
72, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
lossesH = aggregate(NFLweather1$homeloss & NFLweather1$temperature >
72, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awayloss & NFLweather1$temperature >
72, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
tiesH = aggregate(NFLweather1$hometie & NFLweather1$temperature >
72, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaytie & NFLweather1$temperature >
72, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
winpctH = winsH/(winsH + lossesH)
recordsH = data.frame(cbind(teams, winsH, lossesH, tiesH, round(winpctH,
3)))
# and the coldest ~250 games
extremeColdGames = NFLweather1[NFLweather1$temperature < 22,
]
winsXC = aggregate(NFLweather1$homewin & NFLweather1$temperature <
22, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaywin & NFLweather1$temperature <
22, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
lossesXC = aggregate(NFLweather1$homeloss & NFLweather1$temperature <
22, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awayloss & NFLweather1$temperature <
22, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
tiesXC = aggregate(NFLweather1$hometie & NFLweather1$temperature <
22, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaytie & NFLweather1$temperature <
22, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
winpctXC = winsXC/(winsXC + lossesXC)
recordsXC = data.frame(cbind(teams, winsXC, lossesXC, tiesXC,
round(winpctXC, 3)))
# and hottest ~250 games in NFL history
extremeHotGames = NFLweather1[NFLweather1$temperature > 81, ]
winsXH = aggregate(NFLweather1$homewin & NFLweather1$temperature >
81, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaywin & NFLweather1$temperature >
81, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
lossesXH = aggregate(NFLweather1$homeloss & NFLweather1$temperature >
81, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awayloss & NFLweather1$temperature >
81, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
tiesXH = aggregate(NFLweather1$hometie & NFLweather1$temperature >
81, by = list(Category = NFLweather1$home_team), FUN = sum)[,
2] + aggregate(NFLweather1$awaytie & NFLweather1$temperature >
81, by = list(Category = NFLweather1$away_team), FUN = sum)[,
2]
winpctXH = winsXH/(winsXH + lossesXH)
recordsXH = data.frame(cbind(teams, winsXH, lossesXH, tiesXH,
round(winpctXH, 3)))
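# games that I think were played indoors from looking at
# data, although I have no way to figure this out for sure;
# note that rows with temperature == 0 also satisfy the cold
# filters above (temperature < 34 and temperature < 22)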
indoorGames = NFLweather1[NFLweather1$temperature == 0 & NFLweather1$wind_chill ==
0 | NFLweather1$weather == "72 degrees- no wind", ]
winsI = aggregate(NFLweather1$homewin & (NFLweather1$temperature ==
0 & NFLweather1$wind_chill == 0 | NFLweather1$weather ==
"72 degrees- no wind"), by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awaywin & (NFLweather1$temperature ==
0 & NFLweather1$wind_chill == 0 | NFLweather1$weather ==
"72 degrees- no wind"), by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
lossesI = aggregate(NFLweather1$homeloss & (NFLweather1$temperature ==
0 & NFLweather1$wind_chill == 0 | NFLweather1$weather ==
"72 degrees- no wind"), by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awayloss & (NFLweather1$temperature ==
0 & NFLweather1$wind_chill == 0 | NFLweather1$weather ==
"72 degrees- no wind"), by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
tiesI = aggregate(NFLweather1$hometie & (NFLweather1$temperature ==
0 & NFLweather1$wind_chill == 0 | NFLweather1$weather ==
"72 degrees- no wind"), by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awaytie & (NFLweather1$temperature ==
0 & NFLweather1$wind_chill == 0 | NFLweather1$weather ==
"72 degrees- no wind"), by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
winpctI = winsI/(winsI + lossesI)
recordsI = data.frame(cbind(teams, winsI, lossesI, tiesI, round(winpctI,
3)))
# games played in temperatures between 34 and 72 degrees
normalGames = NFLweather1[(NFLweather1$temperature <= 72 & NFLweather1$temperature >=
34), ]
winsN = aggregate(NFLweather1$homewin & (NFLweather1$temperature <=
72 & NFLweather1$temperature >= 34), by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awaywin & (NFLweather1$temperature <=
72 & NFLweather1$temperature >= 34), by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
lossesN = aggregate(NFLweather1$homeloss & (NFLweather1$temperature <=
72 & NFLweather1$temperature >= 34), by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awayloss & (NFLweather1$temperature <=
72 & NFLweather1$temperature >= 34), by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
tiesN = aggregate(NFLweather1$hometie & (NFLweather1$temperature <=
72 & NFLweather1$temperature >= 34), by = list(Category = NFLweather1$home_team),
FUN = sum)[, 2] + aggregate(NFLweather1$awaytie & (NFLweather1$temperature <=
72 & NFLweather1$temperature >= 34), by = list(Category = NFLweather1$away_team),
FUN = sum)[, 2]
winpctN = winsN/(winsN + lossesN)
recordsN = data.frame(cbind(teams, winsN, lossesN, tiesN, round(winpctN,
3)))
# putting together wins/losses for different types of
# weather
recordsTable = data.frame(cbind(records, recordsXC[, 2:5], recordsC[,
2:5], recordsN[, 2:5], recordsH[, 2:5], recordsXH[, 2:5]))
colnames(recordsTable) = c("Team", "Wins", "Losses", "Ties",
"Win%", "XCWins", "XCLosses", "XCTies", "XCWin%", "CWins",
"CLosses", "CTies", "CWin%", "NWins", "NLosses", "NTies",
"NWin%", "HWins", "HLosses", "HTies", "HWin%", "EHWins",
"EHLosses", "EHTies", "EHWin%")
# instances where a team did not play in that particular
# weather in their history
recordsTable[5, 9] = 0
recordsTable[5, 21] = 0
recordsTable[5, 25] = 0
recordsTable[9, 25] = 0
recordsTable[38, 9] = 0
recordsTable[38, 25] = 0
# changing all the numbers from characters to numbers to do
# calculations with (every column except Team)
recordsTable[-1] = lapply(recordsTable[-1], function(x) as.numeric(as.character(x)))
# recordsTable
# finding win% differentials from overall record to records
# in different temperatures
diffTable = data.frame(cbind(records[, 1], round(recordsTable[,
9] - recordsTable[, 5], 3), round(recordsTable[, 13] - recordsTable[,
5], 3), round(recordsTable[, 17] - recordsTable[, 5], 3),
round(recordsTable[, 21] - recordsTable[, 5], 3), round(recordsTable[,
25] - recordsTable[, 5], 3)))
colnames(diffTable) = c("Team", "XC", "C", "N", "H", "EH")
# changing elements to numbers again
diffTable$XC = as.numeric(as.character(diffTable$XC))
diffTable$C = as.numeric(as.character(diffTable$C))
diffTable$N = as.numeric(as.character(diffTable$N))
diffTable$H = as.numeric(as.character(diffTable$H))
diffTable$EH = as.numeric(as.character(diffTable$EH))
# getting rid of teams who played fewer than 100 games in
# franchise history (Boston Patriots, Phoenix Cardinals,
# Tennessee Oilers); row positions shift after each removal
diffTable2 = diffTable[-5, ]
diffTable2 = diffTable2[-29, ]
diffTable2 = diffTable2[-36, ]
# I want temperatures to be a part of this visualization,
# so I found the average annual temperature for each host
# city and added it to the data set
temps = as.numeric(c(87, 73, 67, 67, 57, 72, 61, 64, 61, 78,
66, 59, 46, 80, 80, 63, 80, 65, 75, 75, 84, 55, 59, 79, 63,
63, 64, 65, 61, 62, 64, 61, 57, 57, 83, 72, 68))
tempsH = as.numeric(c(100, 71, 80, 80, 71, 81, 73, 79, 74, 89,
79, 73, 70, 88, 88, 78, 88, 80, 78, 78, 89, 72, 72, 87, 76,
76, 74, 78, 75, 67, 70, 67, 79, 79, 89, 82, 78))
tempsC = as.numeric(c(45, 35, 29, 29, 19, 30, 18, 22, 22, 30,
18, 18, 9, 44, 44, 20, 39, 22, 58, 58, 60, 8, 22, 45, 26,
26, 44, 26, 21, 48, 46, 36, 21, 21, 52, 28, 27))
# the temperature vectors have 37 entries, so they are paired with the
# 37 teams kept in diffTable2 rather than the 40 rows of records
tempsTable = data.frame(cbind(as.character(diffTable2[, 1]), temps, tempsH, tempsC))
tempsTable$temps = as.numeric(as.character(tempsTable$temps))
diffTable3 = data.frame(cbind(diffTable2, temps, tempsH, tempsC))
colnames(diffTable3) = c("Team", "XC", "C", "N", "H", "XH", "Temp",
"TempH", "TempC")
diffTable3
## Team XC C N H XH Temp TempH TempC
## 1 Arizona Cardinals -0.399 -0.399 0.010 0.022 0.045 87 100 45
## 2 Atlanta Falcons -0.097 -0.070 0.006 -0.028 -0.105 73 71 35
## 3 Baltimore Colts 0.174 0.037 0.026 -0.366 -0.540 67 80 29
## 4 Baltimore Ravens 0.441 0.108 -0.013 -0.007 -0.159 67 80 29
## 6 Buffalo Bills 0.153 0.066 -0.003 -0.065 -0.062 57 71 19
## 7 Carolina Panthers 0.007 0.033 0.011 -0.093 -0.160 72 81 30
## 8 Chicago Bears 0.036 0.014 -0.009 0.083 0.298 61 73 18
## 9 Cincinnati Bengals 0.299 0.067 0.004 -0.165 -0.451 64 79 22
## 10 Cleveland Browns 0.027 -0.048 0.008 0.015 -0.273 61 74 22
## 11 Dallas Cowboys -0.218 -0.104 -0.005 0.078 0.131 78 89 30
## 12 Denver Broncos 0.033 0.011 -0.003 0.008 0.118 66 79 18
## 13 Detroit Lions 0.082 -0.006 0.006 -0.085 0.182 59 73 18
## 14 Green Bay Packers 0.046 0.083 -0.015 -0.123 -0.229 46 70 9
## 15 Houston Oilers -0.158 -0.087 0.011 -0.063 0.223 80 88 44
## 16 Houston Texans -0.006 0.094 -0.049 0.227 0.412 80 88 44
## 17 Indianapolis Colts -0.028 -0.122 0.007 0.022 0.108 63 78 20
## 18 Jacksonville Jaguars -0.048 -0.068 -0.010 0.052 -0.124 80 88 39
## 19 Kansas City Chiefs 0.165 0.030 -0.004 -0.005 -0.097 65 80 22
## 20 Los Angeles Raiders -0.385 -0.130 -0.008 0.248 0.415 75 78 58
## 21 Los Angeles Rams -0.255 -0.278 0.011 0.063 0.072 75 78 58
## 22 Miami Dolphins -0.015 -0.162 -0.032 0.063 0.067 84 89 60
## 23 Minnesota Vikings -0.023 0.013 0.000 -0.038 -0.038 55 72 8
## 24 New England Patriots -0.155 0.098 -0.009 -0.032 -0.247 59 72 22
## 25 New Orleans Saints -0.107 -0.040 -0.007 0.092 -0.040 79 87 45
## 26 New York Giants -0.201 -0.008 -0.002 0.028 -0.030 63 76 26
## 27 New York Jets 0.053 -0.064 0.016 -0.062 -0.003 63 76 26
## 28 Oakland Raiders -0.260 -0.125 0.016 -0.150 -0.010 64 74 44
## 29 Philadelphia Eagles 0.071 -0.013 0.004 -0.029 0.136 65 78 26
## 31 Pittsburgh Steelers -0.040 0.071 -0.002 -0.138 -0.236 61 75 21
## 32 San Diego Chargers -0.193 -0.090 -0.002 0.082 0.077 62 67 48
## 33 San Francisco 49ers 0.045 -0.070 0.002 0.012 0.112 64 70 46
## 34 Seattle Seahawks -0.062 -0.146 0.010 -0.036 0.080 61 67 36
## 35 St. Louis Cardinals -0.091 -0.091 0.027 -0.176 -0.476 57 79 21
## 36 St. Louis Rams 0.062 -0.009 0.008 -0.126 -0.188 57 79 21
## 37 Tampa Bay Buccaneers -0.188 -0.240 -0.006 0.048 0.017 83 89 52
## 39 Tennessee Titans -0.107 -0.113 0.018 -0.051 -0.384 72 82 28
## 40 Washington Redskins 0.046 0.018 0.001 -0.024 -0.157 68 78 27
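The weather bands above are built by repeating the same `aggregate()` pattern once per temperature cutoff. As a rough sketch of a more compact route, `cut()` can assign each game to a band and the differential can then be computed in a few lines. The `toy` data frame below is made up for illustration (one row per team per game, no ties), and the sketch assumes temperatures are recorded in whole degrees so that the breaks at 33 and 72 reproduce the `< 34` and `> 72` filters.

```r
# Hypothetical per-team, per-game results; not taken from the real data
toy <- data.frame(
  team        = c("A", "A", "A", "A", "B", "B", "B", "B"),
  temperature = c(20, 30, 50, 80, 10, 40, 75, 90),
  win         = c(1, 1, 0, 0, 0, 1, 1, 1)
)

# Cold is 33 and below, Normal is 34 to 72, Hot is 73 and above
toy$band <- cut(toy$temperature,
                breaks = c(-Inf, 33, 72, Inf),
                labels = c("Cold", "Normal", "Hot"))

# Win percentage per team and band, and overall per team
# (mean of 0/1 wins works here only because the toy data has no ties)
band_pct <- aggregate(win ~ team + band, data = toy, FUN = mean)
overall  <- aggregate(win ~ team, data = toy, FUN = mean)
names(overall)[2] <- "overall"

# Differential: band win % minus overall win %
toy_diff <- merge(band_pct, overall, by = "team")
toy_diff$diff <- round(toy_diff$win - toy_diff$overall, 3)
toy_diff
```

With real data that includes ties, the win percentage would need the same wins/(wins + losses) definition used above rather than a plain mean.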
Because temperatures are on a different scale than the win percentage differentials, I standardized each temperature variable (subtracting its mean and dividing by 50 times its standard deviation) so that it falls within the range of the differentials.
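As a minimal sketch of this scaling, the snippet below applies it to seven of the coldest-monthly-low values from the table above. The variable names are illustrative only. An ordinary z-score has a standard deviation of 1, so dividing by a further 50 leaves values with a standard deviation of 0.02, which is on the order of the differentials.

```r
# Coldest-monthly-low values for seven host cities, copied from the TempC
# column above (Arizona, Atlanta, Baltimore, Buffalo, Green Bay, Miami,
# Minnesota)
temp_c <- c(45, 35, 29, 19, 9, 60, 8)

z      <- (temp_c - mean(temp_c)) / sd(temp_c)  # ordinary z-scores
scaled <- z / 50                                # extra shrink factor of 50

range(z)       # spread of the plain z-scores
range(scaled)  # same shape, 50 times narrower
sd(scaled)     # 1/50 by construction
```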
# temperatures are on a much different scale than win%, so
# I standardized the temperatures and added them to the
# data
s.temp = (diffTable3$Temp - mean(diffTable3$Temp))/(50 * sd(diffTable3$Temp))
s.tempH = (diffTable3$TempH - mean(diffTable3$TempH))/(50 * sd(diffTable3$TempH))
s.tempC = (diffTable3$TempC - mean(diffTable3$TempC))/(50 * sd(diffTable3$TempC))
diffTable4 = data.frame(cbind(diffTable2, as.numeric(s.tempH),
as.numeric(s.tempC)))
colnames(diffTable4) = c("Team", "XC", "C", "N", "H", "EH", "s.tempH",
"s.tempC")
I used ggplot to visualize the data. I drew a bar plot of the win percentage differentials, ordered from the best performers in that weather to the worst relative to each team’s overall record. To show the impact of the host city’s temperature, I overlaid dots for the standardized host city temperatures on the bar plot: the cold weather plot uses the host city’s lowest average monthly low during the NFL season, and the hot weather plot uses its highest average monthly high.
# base R to make sure I am on the right track
# barplot(diffTable2$XC, col = '907909')
# barplot(diffTable2$C) barplot(diffTable2$N)
# barplot(diffTable2$H) barplot(diffTable2$EH)
# vectors of colors that I felt like best represented each
# franchise's colors
colors = c("firebrick2", "firebrick4", "lightskyblue2", "purple4",
"mediumblue", "deepskyblue1", "chocolate1", "darkorange",
"salmon1", "royalblue", "orange", "dodgerblue1", "darkgreen",
"steelblue1", "midnightblue", "blue2", "khaki3", "red2",
"black", "gold", "aquamarine2", "purple3", "blue4", "goldenrod2",
"royalblue3", "seagreen4", "slategray", "palegreen4", "gold1",
"yellow", "darkgoldenrod2", "darkslateblue", "orangered4",
"burlywood1", "indianred4", "cornflowerblue", "tomato4")
# ggplot(data = diffTable2, aes(x = reorder(Team, C, na.rm
# = TRUE), y = C)) + geom_bar(stat = 'identity', fill =
# colors) + coord_flip() + theme_economist_white()
# ggplot(data = diffTable2, aes(x = reorder(Team, H, na.rm
# = TRUE), y = H)) + geom_bar(stat = 'identity', fill =
# colors) + coord_flip() + theme_economist_white()
# ggplot(data = diffTable2, aes(x = reorder(Team, N, na.rm
# = TRUE), y = N)) + geom_bar(stat = 'identity', fill =
# colors) + coord_flip() + theme_economist_white()
# plotting just the temperature data ggplot(data =
# diffTable4, aes(x = reorder(Team, St.Temp, na.rm = TRUE),
# y = St.Temp)) + geom_bar(stat = 'identity', fill =
# colors) + coord_flip() + theme_economist_white() +
# ggtitle('Standardized Temperature by Team's Host City') +
# theme(plot.title = element_text(hjust = 0.5),
# plot.subtitle = element_text(hjust = 0.5), axis.title.x =
# element_text(margin = margin(t = 20)), axis.title.y =
# element_text(margin = margin(r = 20))) + labs(y =
# 'Standardized Host City Temperature', x = 'Team')
# plotting win percentage differential in cold games
# (C is the differential for games colder than 34 degrees; the dots use the
# sign-flipped standardized coldest monthly low, so colder cities sit right)
ggplot(data = diffTable4, aes(x = reorder(Team, C, na.rm = TRUE),
    y = C)) + geom_bar(stat = "identity", fill = colors) + coord_flip() +
    theme_economist_white() + ggtitle("Win Percentage Differential in Cold Weather Games",
    subtitle = "Win % in Games Colder than 34 degrees Fahrenheit minus Overall Franchise Win %") +
    theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5),
        axis.title.x = element_text(margin = margin(t = 20)),
        axis.title.y = element_text(margin = margin(r = 20))) +
    labs(y = "Cold Game Win % - Overall Franchise Win %", x = "Team",
        caption = "Blue dot represents the host city's standardized coldest average monthly low (sign flipped).") +
    geom_point(data = diffTable4, aes(x = reorder(Team, C, na.rm = TRUE),
        y = -s.tempC), col = "blue", alpha = 0.7)
# showing teams who may need to rely on normal/abnormal
# weather to win games ggplot(data = diffTable4, aes(x =
# reorder(Team, N, na.rm = TRUE), y = N)) + geom_bar(stat =
# 'identity', fill = colors) + coord_flip() +
# theme_economist_white() + ggtitle('Win Percentage
# Differential in Typical Weather Games', subtitle = 'Win %
# in Games Between 23 and 72 degrees Fahrenheit minus
# Overall Franchise Win %') + theme(plot.title =
# element_text(hjust = 0.5), plot.subtitle =
# element_text(hjust = 0.5), axis.title.x =
# element_text(margin = margin(t = 20)), axis.title.y =
# element_text(margin = margin(r = 20))) + labs(y =
# 'Typical Game Win % - Overall Franchise Win %', x =
# 'Team', caption = 'Black dot represents standardized host
# city's annual average temperature.') + geom_point(data =
# diffTable4, aes(x = reorder(Team, N, na.rm = TRUE), y =
# s.temp), col = 'black', alpha = 0.5)
# plotting win percentage differential in hot games
# (dots use the standardized warmest monthly high, stretched by 4, and are
# ordered by H so that they line up with the bars)
ggplot(data = diffTable3, aes(x = reorder(Team, H, na.rm = TRUE),
    y = H)) + geom_bar(stat = "identity", fill = colors) + coord_flip() +
    theme_economist_white() + ggtitle("Win Percentage Differential in Hot Weather Games",
    subtitle = "Win % in Games Warmer than 72 degrees Fahrenheit minus Overall Franchise Win %") +
    theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5),
        axis.title.x = element_text(margin = margin(t = 20)),
        axis.title.y = element_text(margin = margin(r = 20))) +
    labs(y = "Hot Game Win % - Overall Franchise Win %", x = "Team",
        caption = "Red dot represents the host city's standardized warmest average monthly high (scaled by 4).") +
    geom_point(data = diffTable4, aes(x = reorder(Team, H, na.rm = TRUE),
        y = 4 * s.tempH), col = "red", alpha = 0.7)
These ggplots do not format well in RMarkdown, so I will attach them separately. However, it can be seen that more teams located in cold cities (represented by the blue dots) have an improved win percentage in cold weather (bars tend to the right for these cities), while more of the warmer cities fall among the worse performers. In the normal temperature graph, the temperature dots appear scattered, as expected. Finally, in the hot weather graph, the dots again appear fairly scattered, hinting at little or no relationship between win percentage in hot games and host city high temperatures.
With these visualizations, I want to quantify the relationship between host city temperature and extreme weather game performance.
cor(diffTable3$C, tempsC)
## [1] -0.6294471
cor(diffTable3$N, temps)
## [1] -0.2402024
cor(diffTable3$H, tempsH)
## [1] 0.2333481
# x is the host city's coldest average monthly low, not the team name
ggplot(diffTable3, aes(TempC, C)) + geom_point() + geom_smooth(method = "lm") +
    theme_economist_white() + ggtitle("Win % Differential vs. Host City Temp",
    subtitle = "Win % in Games Colder than 34 degrees F minus Overall Franchise Win %") +
    theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5),
        axis.title.x = element_text(margin = margin(t = 20)),
        axis.title.y = element_text(margin = margin(r = 20))) +
    labs(y = "Cold Game Win % - Overall Franchise Win %", x = "Host City Coldest Average Monthly Low (degrees F)",
        caption = "Win Percentage Differential in Cold Weather Games vs. Team Host City Temperature")
## `geom_smooth()` using formula 'y ~ x'
# x is the host city's average annual temperature, not the team name
ggplot(diffTable3, aes(Temp, N)) + geom_point() + geom_smooth(method = "lm") +
    theme_economist_white() + ggtitle("Win % Differential vs. Host City Temp",
    subtitle = "Win % in Games between 34 and 72 degrees F minus Overall Franchise Win %") +
    theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5),
        axis.title.x = element_text(margin = margin(t = 20)),
        axis.title.y = element_text(margin = margin(r = 20))) +
    labs(y = "Typical Weather Game Win % - Overall Franchise Win %",
        x = "Host City Average Annual Temperature (degrees F)", caption = "Win Percentage Differential in Typical Weather Games vs. Team Host City Temperature")
## `geom_smooth()` using formula 'y ~ x'
# x is the host city's warmest average monthly high, not the team name
ggplot(diffTable3, aes(TempH, H)) + geom_point() + geom_smooth(method = "lm") +
    theme_economist_white() + ggtitle("Win % Differential vs. Host City Temp",
    subtitle = "Win % in Games Hotter than 72 degrees Fahrenheit minus Overall Franchise Win %") +
    theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5),
        axis.title.x = element_text(margin = margin(t = 20)),
        axis.title.y = element_text(margin = margin(r = 20))) +
    labs(y = "Hot Game Win % - Overall Franchise Win %", x = "Host City Warmest Average Monthly High (degrees F)",
        caption = "Win Percentage Differential in Hot Weather Games vs. Team Host City Temperature")
## `geom_smooth()` using formula 'y ~ x'
There appeared to be a fairly strong inverse relationship between a host
city’s coldest temperatures and the team’s performance in cold weather
games, with a correlation of -0.63 between the two variables: the colder
the city, the better the team did in cold games relative to its overall
record. For games played in temperatures between 34 and 72 degrees, the
correlation with the host city’s average annual temperature was much
weaker, at -0.24. A weaker relationship is expected here, since in
theory teams located in extreme weather cities should have no advantage
in normal weather. Interestingly, the relationship was similarly weak
for warm weather: the correlation between the win percentage
differential in hot games and the host city’s highest average monthly
high during the NFL season is 0.23. These results suggest that cold
weather teams may have an advantage in cold weather games, while teams
from warm cities show much less of an advantage in hot games.
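The correlations above are point estimates. As a hedged sketch of how their uncertainty could be checked, base R's `cor.test()` reports a confidence interval and p-value alongside the estimate. The eight rows below are a hand-picked subset of the table above (Arizona, Green Bay, Miami, Minnesota, Tampa Bay, Buffalo, Chicago, and the Los Angeles Rams), chosen only to keep the example short, so its numbers will not match the full 37-team results.

```r
# Cold-game differential (C) and coldest monthly low (TempC) for eight
# teams, copied from the diffTable3 output above
c_diff <- c(-0.399, 0.083, -0.162, 0.013, -0.240, 0.066, 0.014, -0.278)
temp_c <- c(45, 9, 60, 8, 52, 19, 18, 58)

ct <- cor.test(temp_c, c_diff)  # Pearson correlation with a t-test

ct$estimate   # sample correlation
ct$conf.int   # 95% confidence interval
ct$p.value    # test of zero correlation
```

On the full data the equivalent call would be `cor.test(diffTable3$TempC, diffTable3$C)`.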
I next wanted to quantify the linear relationship between the NFL host city’s winter weather and the team’s win percentage differential in games colder than 34 degrees Fahrenheit. The following shows the results from a simple linear regression model.
# plot(diffTable3$C ~ tempsC)
summary(lm(diffTable3$C ~ tempsC))
##
## Call:
## lm(formula = diffTable3$C ~ tempsC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28410 -0.04885 0.01128 0.05005 0.20387
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.111566 0.036012 3.098 0.00383 **
## tempsC -0.005033 0.001050 -4.792 3e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08639 on 35 degrees of freedom
## Multiple R-squared: 0.3962, Adjusted R-squared: 0.379
## F-statistic: 22.97 on 1 and 35 DF, p-value: 2.999e-05
To predict each team’s cold game win percentage differential, one could take the host city’s coldest average monthly low, multiply it by -0.005, and add the result to 0.112. This model and its coefficients are significant, but only around 40% of the variation in the teams’ cold game differentials is explained by their winter temperatures. Because of this low explanatory power, I would never use this model to place sports bets. I added another term to the model, the team’s normal weather differential, to show that good teams just play better regardless of weather, too. The results are shown in the R output below.
summary(lm(diffTable3$C ~ tempsC + diffTable3$N))
##
## Call:
## lm(formula = diffTable3$C ~ tempsC + diffTable3$N)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.245582 -0.043680 0.001797 0.050000 0.150452
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.133870 0.031730 4.219 0.000172 ***
## tempsC -0.005678 0.000925 -6.138 5.7e-07 ***
## diffTable3$N -3.179690 0.885943 -3.589 0.001033 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07464 on 34 degrees of freedom
## Multiple R-squared: 0.5621, Adjusted R-squared: 0.5363
## F-statistic: 21.82 on 2 and 34 DF, p-value: 8.004e-07
# summary(lm(diffTable3$H ~ tempsH))
# summary(lm(diffTable3$H ~ tempsH + diffTable3$N))
To predict each team’s cold game win percentage differential with this model, one could take the host city’s coldest average monthly low, multiply it by -0.006, add the team’s normal weather differential multiplied by -3.18, and add the result to 0.134. This model and its coefficients are significant, and it now explains around 56% of the variation in the teams’ cold game differentials. This is a little better, but I am still not sure if it is enough to use to gamble with. To determine a better prediction model, more variables and different types of models should be considered.
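As a worked example of the two prediction rules, the sketch below plugs one team's values into the coefficients reported in the two `lm()` summaries above. The Green Bay Packers row of the differential table (TempC = 9, N = -0.015, observed C = 0.083) is used; the variable names are illustrative only.

```r
# Coefficients copied from the two lm() summaries above
b_simple <- c(intercept = 0.111566, tempsC = -0.005033)
b_full   <- c(intercept = 0.133870, tempsC = -0.005678, N = -3.179690)

# Green Bay Packers: coldest monthly low and normal-weather differential
temp_c <- 9
n_diff <- -0.015

# Simple model: intercept + slope * coldest monthly low
pred_simple <- b_simple[["intercept"]] + b_simple[["tempsC"]] * temp_c

# Two-term model: also adds the normal-weather differential term
pred_full <- b_full[["intercept"]] + b_full[["tempsC"]] * temp_c +
  b_full[["N"]] * n_diff

round(c(simple = pred_simple, full = pred_full), 3)
```

With the fitted model objects available, `predict()` on a stored `lm` fit would do the same arithmetic without copying coefficients by hand.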
For this project, I really wanted to focus on data visualization. I
referred to many online resources on how to customize graphs in ggplot,
since I had not customized a plot this heavily before. It was
interesting to see how weather can affect team performance. This project
idea came to me after reading an article about home field advantage for
teams who play in cities with cold winter weather. However, that
analysis showed that the Cleveland Browns were the worst team in cold
weather over the past few years; I immediately thought that this was
just because they were the worst team overall. Because of this, I wanted
to compare each team’s record over this time with its record in extreme
weather. I did end up showing that the Browns play worse in cold games,
but there were many teams that were even worse. A large majority of the
time spent on this project went to getting the data into a form I could
work with. The next most time-consuming part was customizing the ggplots
so that they answered my questions visually. I do believe there are some
limitations in this project. If I were truly trying to predict outcomes
of NFL games, I would need many more variables. Also, since most of the
NFL season is played in the fall and winter, warm weather cities rarely
get the chance to show their advantage in extreme heat. The hottest 1000
NFL games extend all the way down to 73 degrees Fahrenheit, which really
is not that extreme. The coldest 1000 games, again because so much of
the season is played in winter, are all at near-freezing temperatures or
colder. Beyond teams performing better in weather similar to their host
city’s, I did not take into account teams just naturally playing better
in home games (home field advantage). It was also difficult to take
indoor stadiums into account. They were grouped in the normal
temperature group, but I did not investigate what is called “dome-field
advantage”.
Overall, I am happy with how this project went, and I was happy to learn
some more tools, especially in ggplot. I learned how to apply many
different customizations to ggplots to get them to look the way I want.
I put a lot of work into cleaning the data to make it workable. My
favorite part was combining a bar plot with a scatter plot to add deeper
meaning to a graph. Although it was pretty simple, it was also nice to
visualize the linear model and the correlation between different
variables. I always enjoy answering the questions about sports
performance that cross my mind.