The file size had to be reduced in order to upload the data to a GitHub repository.
library(ggplot2)
away_games <- read.csv("https://raw.githubusercontent.com/wco1216/DATA-606/master/awaygames.csv", header = TRUE, sep = ",")
home_games <- read.csv("https://raw.githubusercontent.com/wco1216/DATA-606/master/homegames.csv", header = TRUE, sep = ",")
What conditions allowed for the most rushing yards by NFL running backs in 2018?
There are 247,962 cases.
The data come from the “NFL Big Data Bowl” competition posted on Kaggle (https://www.kaggle.com/c/nfl-big-data-bowl-2020/data).
This is an observational study.
The dependent variable in this study is Yards.
There are multiple independent variables; the ones of most interest are player name, offensive formation, and player height and weight.
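As a first look at how one of these variables relates to yards, the sketch below computes the count, mean, and median of Yards within each level of a grouping column. The helper name yards_by_group is hypothetical, and OffenseFormation is only assumed to be the name of the offensive formation column in these files; adjust it if the column is named differently.

```r
# Hypothetical helper (not part of the original analysis): summarise Yards
# within each level of a grouping column, using base R only.
yards_by_group <- function(df, group_col) {
  stopifnot(group_col %in% names(df), "Yards" %in% names(df))
  # One vector of Yards values per level of the grouping column
  groups <- split(df$Yards, df[[group_col]])
  data.frame(
    group  = names(groups),
    n      = vapply(groups, length, integer(1)),
    mean   = vapply(groups, function(y) as.numeric(mean(y)), numeric(1)),
    median = vapply(groups, function(y) as.numeric(median(y)), numeric(1)),
    row.names = NULL
  )
}

# Usage, assuming the formation column is called OffenseFormation:
# yards_by_group(home_games, "OffenseFormation")
```

The same helper could be pointed at any other categorical column, which keeps the comparison across candidate independent variables consistent.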
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(away_games$Yards)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  -12.00    1.00    3.00    4.42    6.00   99.00
summary(home_games$Yards)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  -12.00    1.00    3.00    4.42    6.00   99.00
ggplot(home_games, aes(Yards)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(limits = c(-12, 99))
ggplot(away_games, aes(Yards)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(limits = c(-12, 99))
summary(away_games$PlayerWeight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   159.0   210.0   245.0   252.7   305.0   380.0
summary(home_games$PlayerWeight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   159.0   210.0   245.0   253.1   305.0   380.0
ggplot(away_games, aes(PlayerWeight)) +
  geom_histogram(binwidth = 4) +
  scale_x_continuous(limits = c(159, 380))
## Warning: Removed 2 rows containing missing values (geom_bar).
ggplot(home_games, aes(PlayerWeight)) +
  geom_histogram(binwidth = 4) +
  scale_x_continuous(limits = c(159, 380))
## Warning: Removed 2 rows containing missing values (geom_bar).
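The histograms describe each variable on its own. To relate them to the research question, one option is a boxplot of Yards for each offensive formation. This is a minimal sketch: the function name plot_yards_by is hypothetical, and OffenseFormation is only assumed to be the formation column name in these files.

```r
library(ggplot2)

# Hypothetical helper (not part of the original analysis): boxplot of Yards
# for each level of a grouping column.
plot_yards_by <- function(df, group_col) {
  stopifnot(group_col %in% names(df), "Yards" %in% names(df))
  # Copy the two needed columns so the grouping column can be passed by name
  plot_df <- data.frame(group = factor(df[[group_col]]), Yards = df$Yards)
  ggplot(plot_df, aes(x = group, y = Yards)) +
    geom_boxplot() +
    labs(x = group_col, y = "Yards")
}

# Usage, assuming the formation column is called OffenseFormation:
# plot_yards_by(home_games, "OffenseFormation")
```

Because Yards is heavily right-skewed (median 3, maximum 99), the boxes will likely be compressed near zero with many points flagged as outliers.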