# load data
library(dplyr)
library(fueleconomy)
vehicles <- fueleconomy::vehicles
Is there a correlation between a vehicle's fuel efficiency and the number of cylinders it has?
There are 33,442 cases, covering vehicles from model years 1984 to 2015.
The data were collected by the EPA as a direct observational study.
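The case count and year range can be checked directly from the data. This is a minimal sketch, assuming the `vehicles` data frame from the fueleconomy package with its `year` column:

```r
library(fueleconomy)

vehicles <- fueleconomy::vehicles

# Number of cases (rows) in the data set
n_cases <- nrow(vehicles)
n_cases

# Earliest and latest model year covered by the data
year_range <- range(vehicles$year)
year_range
```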
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
The data set was loaded into RStudio from the fueleconomy R package (https://blog.rstudio.org/2014/07/23/new-data-packages/), but it originally comes from the EPA fuel economy website: http://www.fueleconomy.gov/feg/download.shtml
What is the response variable, and what type is it (numerical/categorical)?
The response variable is highway fuel efficiency in MPG (city MPG is ignored), and it is numerical and continuous.
What is the explanatory variable, and what type is it (numerical/categorical)?
The explanatory variable is the number of cylinders the vehicle has, and it is numerical and discrete. There are other potential explanatory variables, but they are ignored for this study.
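With the response and explanatory variables defined, the research question can be addressed with a correlation coefficient. The sketch below uses Pearson's correlation and a scatter plot; choosing Pearson (rather than another measure) is an assumption, not part of the original plan:

```r
library(fueleconomy)

vehicles <- fueleconomy::vehicles

# Pearson correlation between cylinders and highway MPG;
# "complete.obs" drops vehicles whose cylinder count is missing
r_cyl_hwy <- cor(vehicles$cyl, vehicles$hwy, use = "complete.obs")
r_cyl_hwy

# Test of H0: true correlation = 0 (cor.test drops incomplete pairs itself)
cor.test(vehicles$cyl, vehicles$hwy, method = "pearson")

# Scatter plot; jitter() spreads out the many overlapping integer cylinder values
plot(jitter(vehicles$cyl), vehicles$hwy,
     xlab = 'Number of cylinders', ylab = 'Highway MPG',
     main = 'Highway MPG vs. number of cylinders')
```

Because cylinders are discrete and highway MPG is skewed, a rank-based measure such as `method = "spearman"` may be worth comparing against the Pearson result.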
Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
# Summary statistics for the number of cylinders in each vehicle
summary(vehicles$cyl)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2.000 4.000 6.000 5.772 6.000 16.000 58
sd(vehicles$cyl, na.rm=TRUE)
## [1] 1.740931
var(vehicles$cyl, na.rm=TRUE)
## [1] 3.03084
hist(vehicles$cyl, breaks=30, main = 'Histogram of vehicle cylinders')
# The mean number of cylinders is 5.8 with an SD of 1.7; 58 vehicles have no cylinder count.
# Summary statistics for highway MPG
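Since the number of cylinders is discrete, a frequency table may show its distribution more clearly than a 30-break histogram. A minimal sketch using base R's `table()`:

```r
library(fueleconomy)

vehicles <- fueleconomy::vehicles

# Count how many vehicles have each number of cylinders;
# useNA = "ifany" adds a count for the missing values
cyl_counts <- table(vehicles$cyl, useNA = "ifany")
cyl_counts

# Number of vehicles with a missing cylinder count
sum(is.na(vehicles$cyl))
```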
summary(vehicles$hwy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 19.00 23.00 23.55 27.00 109.00
sd(vehicles$hwy, na.rm=TRUE)
## [1] 6.211417
var(vehicles$hwy, na.rm=TRUE)
## [1] 38.5817
hist(vehicles$hwy, breaks=30, main = 'Histogram of vehicle highway mpg')
# The mean highway MPG is 23.55 with an SD of 6.2 MPG.
# Is the highway MPG data normally distributed?
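The template asks for means, SDs, and sample sizes of each group when comparing groups. Treating each cylinder count as a group, a minimal dplyr sketch could look like this (the column names `n`, `mean_hwy`, and `sd_hwy` are illustrative choices):

```r
library(dplyr)
library(fueleconomy)

vehicles <- fueleconomy::vehicles

# Highway MPG summarised within each cylinder group;
# vehicles with a missing cylinder count are dropped first
hwy_by_cyl <- vehicles %>%
  filter(!is.na(cyl)) %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    mean_hwy = mean(hwy),
    sd_hwy = sd(hwy)
  )

hwy_by_cyl
```

A group containing a single vehicle will show `NA` for its SD, since a standard deviation needs at least two observations.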
qqnorm(vehicles$hwy, main= 'Normal QQ Plot for Highway MPG')
qqline(vehicles$hwy)
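# The QQ plot shows that highway MPG is right-skewed (the maximum is 109 MPG against a median of 23).
# This can be addressed in further analysis by drawing repeated random samples: by the central limit theorem, the sample means will be approximately normal even though the raw data are skewed.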
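The resampling idea above can be illustrated by drawing many random samples of highway MPG and checking whether their means look normal. This is a sketch; the sample size of 100 and the 1,000 repetitions are arbitrary assumptions:

```r
library(fueleconomy)

vehicles <- fueleconomy::vehicles

set.seed(1)  # make the random draws reproducible

# Draw 1,000 random samples of 100 vehicles each and record each sample mean
sample_means <- replicate(1000, mean(sample(vehicles$hwy, size = 100)))

# If the central limit theorem applies, these points should lie close to the line
qqnorm(sample_means, main = 'Normal QQ Plot of Sample Means (Highway MPG)')
qqline(sample_means)
```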
In analyzing the data set, other variables may be introduced to determine whether they also influence the response variable.
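One way to introduce another variable is a linear model that adds a second predictor. In this sketch, engine displacement (`displ`) is a hypothetical choice of additional variable, not one named in the proposal. Both models are fit to the same complete cases so they can be compared with `anova()`:

```r
library(fueleconomy)

vehicles <- fueleconomy::vehicles

# Keep only rows where all three variables are present, so both models use the same cases
complete <- vehicles[complete.cases(vehicles[, c("hwy", "cyl", "displ")]), ]

# Baseline model: highway MPG explained by cylinders alone
fit_cyl <- lm(hwy ~ cyl, data = complete)

# Hypothetical extension: add engine displacement as a second predictor
fit_cyl_displ <- lm(hwy ~ cyl + displ, data = complete)
summary(fit_cyl_displ)

# F-test of whether adding displ improves the fit over cylinders alone
anova(fit_cyl, fit_cyl_displ)
```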