library(dplyr)
library(ggplot2)
# load data
library(fueleconomy)
vehicles
Per the EPA (Environmental Protection Agency), combined fuel economy is a weighted average of the city and highway MPG values: the city value is weighted by 55% and the highway value by 45%.
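As a quick check of the 55/45 weighting described above, here is a minimal worked example in base R. The city and highway values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the dataset.

```r
# Hypothetical city/highway ratings, used only to illustrate the arithmetic
cty <- 20
hwy <- 30

# 55% city + 45% highway, as described above
combined <- 0.55 * cty + 0.45 * hwy
print(combined)  # 24.5
```

Here 0.55 * 20 = 11 and 0.45 * 30 = 13.5, so the combined value is 24.5.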
vehicles <- na.omit(vehicles)
vehicles
# combined mpg: 55% city, 45% highway
vehicles <- vehicles %>% mutate(mpg = 0.55 * cty + 0.45 * hwy)
vehicles
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
What are the cases, and how many are there?
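Each case is one vehicle record. The fuel economy data covers all cars sold in the US from 1984 to 2015. The vehicles dataset in the fueleconomy package has 33,442 rows and 12 variables; the na.omit() call above removes any rows with missing values before the analysis.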
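Because na.omit() is applied to vehicles before the analysis, the number of cases actually analysed can be smaller than the size of the packaged dataset. The sketch below shows how to compare the two counts. It uses a hypothetical toy data frame (the values are made up; only the column names cyl, displ, cty and hwy mirror the document) so that it runs without the package installed.

```r
# Hypothetical toy rows standing in for `vehicles`; values are invented
toy <- data.frame(
  cyl   = c(4, 6, NA, 8),
  displ = c(2.0, 3.5, NA, 5.0),
  cty   = c(24, 18, 100, 13),
  hwy   = c(32, 26, 90, 19)
)

# Number of cases before and after dropping rows with any missing value
n_before <- nrow(toy)
toy_complete <- na.omit(toy)
n_after <- nrow(toy_complete)

c(before = n_before, after = n_after)
```

Calling nrow(vehicles) after the na.omit() line in the same way would report the number of cases that enter the summaries and plots.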
Describe the method of data collection.
The data was not self-collected. It comes from the R package fueleconomy, which sources its data from the EPA (Environmental Protection Agency). Within the package, the data is stored in the vehicles dataset.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
https://blog.rstudio.com/2014/07/23/new-data-packages/
https://www.fueleconomy.gov/feg/download.shtml
What is the response variable? Is it quantitative or qualitative?
The response variable is combined mpg. It is quantitative.
You should have two independent variables, one quantitative and one qualitative.
The two independent variables are number of cylinders (cyl), treated as qualitative by converting it to a factor, and engine displacement (displ), which is quantitative.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(vehicles$mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.80 16.70 19.70 20.11 22.60 54.40
# standard deviation
sd(vehicles$mpg)
## [1] 4.999651
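The prompt asks for summary statistics for each variable, so the two independent variables can be summarised in the same way as mpg. The following is a minimal base-R sketch, not the analysis itself: it runs on a hypothetical toy data frame whose values are invented, and only the column names (cyl, displ, mpg) come from the document.

```r
# Hypothetical toy rows standing in for `vehicles`; values are invented
toy <- data.frame(
  cyl   = c(4, 4, 6, 6, 8),
  displ = c(1.8, 2.0, 3.0, 3.5, 5.0),
  mpg   = c(30, 28, 22, 20, 15)
)

# Quantitative predictor: five-number summary plus mean
displ_summary <- summary(toy$displ)

# Qualitative predictor: number of cases at each cylinder level
cyl_counts <- table(factor(toy$cyl))

# Response averaged within each cylinder level
mpg_by_cyl <- aggregate(mpg ~ cyl, data = toy, FUN = mean)

displ_summary
cyl_counts
mpg_by_cyl
```

Replacing toy with vehicles would give the corresponding summaries for the real data.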
qqnorm(vehicles$mpg)
qqline(vehicles$mpg)
hist(vehicles$mpg, breaks = 50)
ggplot(vehicles, aes(mpg)) + geom_histogram(bins = 50, aes(fill = factor(class)))
ggplot(vehicles, aes(mpg)) + geom_histogram(bins = 50, aes(fill = factor(cyl)))
ggplot(vehicles, aes(mpg)) + geom_histogram(bins = 50, aes(fill = factor(displ)))
# treat cyl as a factor on the x-axis so each cylinder count gets its own box
ggplot(vehicles, aes(factor(cyl), mpg)) + geom_boxplot(aes(fill = factor(cyl)))