This version is dated August 28, 2014, and supersedes the version of August 24. Always use the most recent version.
This analysis uses a large dataset of vehicles to compare highway and city fuel economy between cars with electric and gas engines.
install.packages("fueleconomy", repos='http://cran.us.r-project.org')
##
## The downloaded binary packages are in
## /var/folders/mp/lz8604y53r940tqkpj4v6qpw0000gn/T//RtmpnDeq53/downloaded_packages
library(fueleconomy)
df <- vehicles
head(df)
## id make model year class
## 1 27550 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5 1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6 1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
## trans drive cyl displ fuel hwy cty
## 1 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 2 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 3 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 4 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 5 Automatic 3-spd Rear-Wheel Drive 4 2.5 Regular 17 16
## 6 Automatic 3-spd Rear-Wheel Drive 6 4.2 Regular 13 13
The factor in these analyses is engine type, with two levels: electric and gas.
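As a minimal sketch, the two-level factor can be built directly from the `fuel` column. The helper `classify_engine()` is hypothetical (it is not part of the fueleconomy package), and the grouping rule is an assumption meant to mirror the electric and gas groupings described in the comments below: any fuel string mentioning "Electricity" counts as electric, the plain grades Midgrade, Premium and Regular count as gas, and everything else (CNG, Diesel, E85 blends) is left out as `NA`.

```r
# Hypothetical helper: map raw `fuel` strings to a two-level engine factor.
# Assumed rule: "Electricity" anywhere in the string -> electric;
# Midgrade / Premium / Regular -> gas; all other fuels -> NA.
classify_engine <- function(fuel) {
  engine <- ifelse(grepl("Electricity", fuel, fixed = TRUE), "electric",
                   ifelse(fuel %in% c("Midgrade", "Premium", "Regular"), "gas", NA))
  factor(engine, levels = c("electric", "gas"))
}

# Toy input using fuel labels that appear in the dataset
fuel <- c("Regular", "Electricity", "Premium Gas or Electricity", "Diesel", "Midgrade")
engine <- classify_engine(fuel)
print(table(engine, useNA = "ifany"))
```

On the full data this would be applied as `df$engine <- classify_engine(df$fuel)`, giving one column that can be passed directly to functions that accept a grouping factor.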
summary(df)
## id make model year
## Min. : 1 Length:33442 Length:33442 Min. :1984
## 1st Qu.: 8361 Class :character Class :character 1st Qu.:1991
## Median :16724 Mode :character Mode :character Median :1999
## Mean :17038 Mean :1999
## 3rd Qu.:25265 3rd Qu.:2008
## Max. :34932 Max. :2015
##
## class trans drive cyl
## Length:33442 Length:33442 Length:33442 Min. : 2.00
## Class :character Class :character Class :character 1st Qu.: 4.00
## Mode :character Mode :character Mode :character Median : 6.00
## Mean : 5.77
## 3rd Qu.: 6.00
## Max. :16.00
## NA's :58
## displ fuel hwy cty
## Min. :0.00 Length:33442 Min. : 9.0 Min. : 6.0
## 1st Qu.:2.30 Class :character 1st Qu.: 19.0 1st Qu.: 15.0
## Median :3.00 Mode :character Median : 23.0 Median : 17.0
## Mean :3.35 Mean : 23.6 Mean : 17.5
## 3rd Qu.:4.30 3rd Qu.: 27.0 3rd Qu.: 20.0
## Max. :8.40 Max. :109.0 Max. :138.0
## NA's :57
#Types of engines
levels(factor(df$fuel))
## [1] "CNG" "Diesel"
## [3] "Electricity" "Gasoline or E85"
## [5] "Gasoline or natural gas" "Gasoline or propane"
## [7] "Midgrade" "Premium"
## [9] "Premium and Electricity" "Premium Gas or Electricity"
## [11] "Premium or E85" "Regular"
## [13] "Regular Gas and Electricity"
#Collection of all engines with an electric component
Electricity <- df[df$fuel %in% c("Electricity", "Premium Gas or Electricity", "Regular Gas and Electricity", "Premium and Electricity"), ]
# Collection of all gas engines without an electric component, from 1998 onward
# (%in% keeps the year filter applied to all three grades; `&` binds tighter than `|`)
Gas <- df[df$fuel %in% c("Midgrade", "Premium", "Regular") & df$year >= 1998, ]
# Determining the range of years in which both electric and gas engines existed
min(Electricity$year, na.rm = TRUE)
## [1] 1998
max(Electricity$year, na.rm = TRUE)
## [1] 2015
min(Gas$year, na.rm = TRUE)
## [1] 1985
max(Gas$year, na.rm = TRUE)
## [1] 2015
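The period in which both engine types exist runs from the later of the two first years to the earlier of the two last years. A minimal sketch with a hypothetical helper `overlap_years()`, fed the year ranges printed above:

```r
# Hypothetical helper: years covered by BOTH groups,
# i.e. [max of the two minimums, min of the two maximums].
overlap_years <- function(a, b) {
  lo <- max(min(a, na.rm = TRUE), min(b, na.rm = TRUE))
  hi <- min(max(a, na.rm = TRUE), max(b, na.rm = TRUE))
  if (lo > hi) return(NULL)  # the two groups never coexisted
  c(lo, hi)
}

# Electric: 1998-2015, gas: 1985-2015 (from the output above)
overlap_years(c(1998, 2015), c(1985, 2015))
```

Because the helper only uses `min()` and `max()`, it also accepts the full year columns, e.g. `yrs <- overlap_years(Electricity$year, Gas$year)`, after which `Gas[Gas$year >= yrs[1] & Gas$year <= yrs[2], ]` would restrict the gas models to the shared period.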
summary(Electricity)
## id make model year
## Min. :16424 Length:64 Length:64 Min. :1998
## 1st Qu.:31043 Class :character Class :character 1st Qu.:2011
## Median :33308 Mode :character Mode :character Median :2013
## Mean :31615 Mean :2010
## 3rd Qu.:33951 3rd Qu.:2014
## Max. :34918 Max. :2015
##
## class trans drive cyl
## Length:64 Length:64 Length:64 Min. :4.00
## Class :character Class :character Class :character 1st Qu.:4.00
## Mode :character Mode :character Mode :character Median :4.00
## Mean :4.22
## 3rd Qu.:4.00
## Max. :6.00
## NA's :55
## displ fuel hwy cty
## Min. :1.80 Length:64 Min. : 28.0 Min. : 23.0
## 1st Qu.:1.80 Class :character 1st Qu.: 53.5 1st Qu.: 57.2
## Median :2.00 Mode :character Median : 74.0 Median : 84.0
## Mean :2.04 Mean : 73.6 Mean : 83.9
## 3rd Qu.:2.00 3rd Qu.: 97.0 3rd Qu.:113.0
## Max. :3.00 Max. :109.0 Max. :138.0
## NA's :55
summary(Gas)
## id make model year
## Min. : 33 Length:18796 Length:18796 Min. :1985
## 1st Qu.:16858 Class :character Class :character 1st Qu.:2001
## Median :21830 Mode :character Mode :character Median :2006
## Mean :22170 Mean :2005
## 3rd Qu.:29220 3rd Qu.:2010
## Max. :34931 Max. :2015
## class trans drive cyl
## Length:18796 Length:18796 Length:18796 Min. : 2.00
## Class :character Class :character Class :character 1st Qu.: 4.00
## Mode :character Mode :character Mode :character Median : 6.00
## Mean : 5.88
## 3rd Qu.: 6.00
## Max. :16.00
## displ fuel hwy cty
## Min. :1.00 Length:18796 Min. :10 Min. : 6.0
## 1st Qu.:2.40 Class :character 1st Qu.:20 1st Qu.:15.0
## Median :3.00 Mode :character Median :24 Median :17.0
## Mean :3.35 Mean :24 Mean :17.4
## 3rd Qu.:4.20 3rd Qu.:27 3rd Qu.:19.0
## Max. :8.40 Max. :61 Max. :53.0
Highway (hwy) and city (cty) fuel economy are both continuous variables in this study.

### Response variables
Highway and city fuel economy are the response variables; they depend on whether a car model has an electric or a gas engine.

### The Data: How is it organized and what does it look like?
For the purposes of this study, the data are organized by engine type, year of production, and the corresponding highway and city fuel economy.

### Randomization
Fuel economy numbers were randomized when obtained.
Data on highway and city fuel economy from the EPA will be compared between electric and gas engines.

### What is the rationale for this design?
To determine whether electric or gas engines have better fuel economy.

### Randomize: What is the Randomization Scheme?
Randomization took place before the fuel economy numbers were generated.

### Replicate: Are there replicates and/or repeated measures?
Replicates were involved before the fuel economy numbers were generated.

### Block: Did you use blocking in the design?
Blocking was not necessary in this study.
hist(Electricity$year,ylim=c(0,25),xlim=c(1998,2015), breaks =20, main = "Histogram of Electric car models per year", xlab = "Year")
hist(Gas$year, ylim = c(0,1500), xlim=c(1998,2015), breaks = 25, main = "Histogram of Gas car models per year", xlab = "Year")
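Because `year` is a whole number, `hist()` bins can blur individual years: by default its bins are right-closed and the first bin also includes its left edge (`right = TRUE`, `include.lowest = TRUE`), so with one-year breaks 1998 and 1999 can end up in the same bar. A sketch that counts models per year exactly with `table()`; the `years` vector is toy data standing in for `Electricity$year`, not values from the dataset:

```r
# Toy model years standing in for Electricity$year (illustration only)
years <- c(1998, 1999, 1999, 2011, 2013, 2013, 2013)

# One count per calendar year; listing every level keeps empty years as 0
counts <- table(factor(years, levels = 1998:2015))
print(counts)

barplot(counts, main = "Models per year", xlab = "Year", ylab = "Count")
```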
boxplot(Gas$hwy,Electricity$hwy, main = "Highway fuel economy", names = c("Gas", "Electric"))
boxplot(Gas$cty,Electricity$cty, main = "City fuel economy", names = c("Gas", "Electric"))
Two-sample t-tests assuming equal variances were first used to compare the groups.
t.test(Gas$cty,Electricity$cty,var.equal=TRUE)
##
## Two Sample t-test
##
## data: Gas$cty and Electricity$cty
## t = -111.3, df = 18858, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -67.66 -65.31
## sample estimates:
## mean of x mean of y
## 17.44 83.92
t.test(Gas$hwy,Electricity$hwy,var.equal=TRUE)
##
## Two Sample t-test
##
## data: Gas$hwy and Electricity$hwy
## t = -72.73, df = 18858, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -50.93 -48.26
## sample estimates:
## mean of x mean of y
## 24.03 73.62
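The equal-variance tests above use the pooled two-sample statistic $t = (\bar x - \bar y) / \sqrt{s_p^2 (1/n_x + 1/n_y)}$, where $s_p^2 = ((n_x - 1)s_x^2 + (n_y - 1)s_y^2)/(n_x + n_y - 2)$, with $n_x + n_y - 2$ degrees of freedom; the reported df of 18858 matches 18796 gas plus 64 electric models minus 2. A minimal sketch that recomputes the statistic by hand on synthetic data (the helper `pooled_t()` is hypothetical) so it can be checked against `t.test()`:

```r
# Hypothetical helper: pooled-variance two-sample t statistic and its degrees of freedom
pooled_t <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # Pooled variance: the two sample variances weighted by their degrees of freedom
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
  c(t = t_stat, df = nx + ny - 2)
}

# Synthetic samples for illustration only (not drawn from the vehicles data)
set.seed(1)
x <- rnorm(30, mean = 17, sd = 4)
y <- rnorm(10, mean = 80, sd = 30)

pooled_t(x, y)
t.test(x, y, var.equal = TRUE)$statistic  # should agree with the "t" entry above
```

With group sizes and spreads as different as 18796 gas versus 64 electric models, Welch's unequal-variance test (`var.equal = FALSE`, the `t.test()` default) is a common alternative to the pooled form.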
The normality of the fuel economy data was then checked with Q-Q plots and Shapiro-Wilk tests.
qqnorm(Electricity$hwy,ylab="Highway Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of electric highway mileage")
qqnorm(Electricity$cty,ylab="City Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of electric city mileage")
qqnorm(Gas$hwy,ylab="Highway Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of gas highway mileage")
qqnorm(Gas$cty,ylab="City Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of gas city mileage")
# Shapiro-Wilk test of normality (H0: the data are normal). Normality is rejected if p < 0.1
shapiro.test(Electricity$cty)
##
## Shapiro-Wilk normality test
##
## data: Electricity$cty
## W = 0.9384, p-value = 0.003205
shapiro.test(Electricity$hwy)
##
## Shapiro-Wilk normality test
##
## data: Electricity$hwy
## W = 0.9202, p-value = 0.0005043
# shapiro.test() accepts at most 5000 values, so test a random sample of 5000
shapiro.test(sample(Gas$cty,5000))
##
## Shapiro-Wilk normality test
##
## data: sample(Gas$cty, 5000)
## W = 0.9042, p-value < 2.2e-16
shapiro.test(sample(Gas$hwy,5000))
##
## Shapiro-Wilk normality test
##
## data: sample(Gas$hwy, 5000)
## W = 0.9745, p-value < 2.2e-16
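Since the Shapiro-Wilk null hypothesis is that the data are normal, a p-value below the chosen threshold (0.1 here) is evidence *against* normality. `shapiro.test()` also accepts only between 3 and 5000 non-missing values, which is why the gas vectors are subsampled. A sketch wrapping both points in a hypothetical helper `normality_rejected()`; calling `set.seed()` first makes the subsample, and hence W, reproducible between knits:

```r
# Hypothetical helper: TRUE when the Shapiro-Wilk test rejects normality at level alpha
normality_rejected <- function(x, alpha = 0.1, max_n = 5000) {
  x <- x[!is.na(x)]
  # shapiro.test() errors on more than 5000 values, so subsample larger vectors
  if (length(x) > max_n) x <- sample(x, max_n)
  shapiro.test(x)$p.value < alpha
}

set.seed(2014)                           # reproducible subsample
normality_rejected(qnorm(ppoints(100)))  # evenly spaced normal quantiles
normality_rejected(rexp(6000))           # skewed, and longer than 5000
```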
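Because all four Shapiro-Wilk tests reject normality at the 0.1 level, nonparametric Wilcoxon rank-sum tests were used to compare the groups.
wilcox.test(Gas$cty,Electricity$cty)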
##
## Wilcoxon rank sum test with continuity correction
##
## data: Gas$cty and Electricity$cty
## W = 2514, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(Gas$hwy,Electricity$hwy)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Gas$hwy and Electricity$hwy
## W = 11260, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
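`wilcox.test(x, y)` reports W as the rank sum of `x` in the pooled sample minus $n_x(n_x+1)/2$, which equals the number of pairs $(x_i, y_j)$ with $x_i > y_j$ (ties counting one half). Dividing by $n_x n_y = 18796 \times 64 = 1{,}202{,}944$ turns the reported W values into rough shares of gas/electric pairs in which the gas model has the higher mileage: about 0.2% for city and 0.9% for highway. A sketch with a hypothetical helper `rank_sum_W()` and toy data:

```r
# Hypothetical helper: rank-sum W as reported by wilcox.test(x, y),
# i.e. the sum of the ranks of x in the pooled sample minus nx(nx+1)/2
rank_sum_W <- function(x, y) {
  nx <- length(x)
  r <- rank(c(x, y))  # tied values receive average ranks
  sum(r[seq_len(nx)]) - nx * (nx + 1) / 2
}

x <- c(1.1, 2.5, 3.7, 8.0)  # toy values, not from the dataset
y <- c(2.0, 4.4, 9.1)
W <- rank_sum_W(x, y)
W                            # 4
W / (length(x) * length(y))  # share of (x, y) pairs with x > y
```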