Recipe 1: Recipes for the Design of Experiments/Chapter 1: One Factor, Two Level Experiments


Recipes for the Design of Experiments: Recipe Outline

As of August 28, 2014, superseding the version of August 24. Always use the most recent version.

Recipes for the Design of Experiments

John Mariani

ISMMS

September 15th, 2015; Version 1

1. Setting

System under test

Using a large dataset of vehicles (the vehicles table from the fueleconomy R package), this recipe compares highway and city fuel economy between cars with electric engines and cars with gas engines.

install.packages("fueleconomy", repos='http://cran.us.r-project.org')
## 
## The downloaded binary packages are in
##  /var/folders/mp/lz8604y53r940tqkpj4v6qpw0000gn/T//RtmpnDeq53/downloaded_packages
library(fueleconomy)  # load from the default library path; provides the vehicles data
df <- vehicles
head(df)
##      id       make               model year                       class
## 1 27550 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5  1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6  1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
##             trans            drive cyl displ    fuel hwy cty
## 1 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 2 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 3 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 4 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 5 Automatic 3-spd Rear-Wheel Drive   4   2.5 Regular  17  16
## 6 Automatic 3-spd Rear-Wheel Drive   6   4.2 Regular  13  13

Factors and Levels

The single factor in this analysis is engine type. It has two levels: electric (any engine with an electric component) and gas (gasoline engines without an electric component).
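A minimal sketch of how the fuel strings could be collapsed into one two-level engine-type factor. The helper `engine_type` and the `fuel_values` vector are hypothetical; the level strings are copied from the `levels(factor(df$fuel))` output in this recipe. This version includes every level whose name contains "Electricity", and any other fuel type (CNG, Diesel, ...) becomes NA.

```r
# Level strings copied from levels(factor(df$fuel)) in this recipe
electric_fuels <- c("Electricity", "Premium and Electricity",
                    "Premium Gas or Electricity", "Regular Gas and Electricity")
gas_fuels <- c("Midgrade", "Premium", "Regular")

# Hypothetical helper: map fuel strings to a two-level factor (Gas, Electric)
engine_type <- function(fuel) {
  type <- rep(NA_character_, length(fuel))  # other fuels (CNG, Diesel, ...) stay NA
  type[fuel %in% electric_fuels] <- "Electric"
  type[fuel %in% gas_fuels] <- "Gas"
  factor(type, levels = c("Gas", "Electric"))
}

# Hypothetical example values
fuel_values <- c("Regular", "Electricity", "Diesel",
                 "Premium Gas or Electricity", "Midgrade")
engine <- engine_type(fuel_values)
print(engine)
print(table(engine, useNA = "ifany"))
```

With the real data this could be applied as `df$engine <- engine_type(df$fuel)`, after which rows with a missing `engine` can be dropped so that only the two levels of interest remain.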

summary(df)
##        id            make              model                year     
##  Min.   :    1   Length:33442       Length:33442       Min.   :1984  
##  1st Qu.: 8361   Class :character   Class :character   1st Qu.:1991  
##  Median :16724   Mode  :character   Mode  :character   Median :1999  
##  Mean   :17038                                         Mean   :1999  
##  3rd Qu.:25265                                         3rd Qu.:2008  
##  Max.   :34932                                         Max.   :2015  
##                                                                      
##     class              trans              drive                cyl       
##  Length:33442       Length:33442       Length:33442       Min.   : 2.00  
##  Class :character   Class :character   Class :character   1st Qu.: 4.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 6.00  
##                                                           Mean   : 5.77  
##                                                           3rd Qu.: 6.00  
##                                                           Max.   :16.00  
##                                                           NA's   :58     
##      displ          fuel                hwy             cty       
##  Min.   :0.00   Length:33442       Min.   :  9.0   Min.   :  6.0  
##  1st Qu.:2.30   Class :character   1st Qu.: 19.0   1st Qu.: 15.0  
##  Median :3.00   Mode  :character   Median : 23.0   Median : 17.0  
##  Mean   :3.35                      Mean   : 23.6   Mean   : 17.5  
##  3rd Qu.:4.30                      3rd Qu.: 27.0   3rd Qu.: 20.0  
##  Max.   :8.40                      Max.   :109.0   Max.   :138.0  
##  NA's   :57
# Fuel types recorded in the dataset (used to define engine type)
levels(factor(df$fuel)) 
##  [1] "CNG"                         "Diesel"                     
##  [3] "Electricity"                 "Gasoline or E85"            
##  [5] "Gasoline or natural gas"     "Gasoline or propane"        
##  [7] "Midgrade"                    "Premium"                    
##  [9] "Premium and Electricity"     "Premium Gas or Electricity" 
## [11] "Premium or E85"              "Regular"                    
## [13] "Regular Gas and Electricity"
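# Collection of all engines with an electric component
# Note: "Premium Gas and Electricity" is not one of the levels listed above (the closest
# level is "Premium Gas or Electricity"), so that condition matches no rows
Electricity <- df[(df$fuel == "Electricity") | (df$fuel == "Premium Gas and Electricity") | (df$fuel == "Regular Gas and Electricity") | (df$fuel == "Premium and Electricity"), ]

# Collection of all gas engines that do not have an electric component
# Note: & binds more tightly than |, so the year >= 1998 condition applies only to "Regular"
Gas <- df[(df$fuel == "Midgrade") | (df$fuel == "Premium") | (df$fuel == "Regular") & (df$year >= 1998), ]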
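The `Gas` filter relies on R's operator precedence: `&` is evaluated before `|`, so `a | b | c & d` means `a | b | (c & d)`. That is consistent with `min(Gas$year)` being 1985 in the output below. The sketch uses small hypothetical vectors to contrast the two groupings; whether the year restriction was meant to cover all three gasoline grades is an assumption.

```r
# Hypothetical toy data: fuel type and model year of three cars
fuel <- c("Premium", "Regular", "Regular")
year <- c(1990, 1990, 2005)

# As written in the recipe: & is evaluated first, so the year test only touches "Regular"
as_written <- (fuel == "Midgrade") | (fuel == "Premium") | (fuel == "Regular") & (year >= 1998)

# Parenthesized: the year test applies to every gasoline grade
grouped <- ((fuel == "Midgrade") | (fuel == "Premium") | (fuel == "Regular")) & (year >= 1998)

# Equivalent compact form: %in% and >= both bind more tightly than &
compact <- fuel %in% c("Midgrade", "Premium", "Regular") & year >= 1998

print(as_written)  # the 1990 Premium car is kept
print(grouped)     # only the 2005 car is kept
```

Applied to `df`, the parenthesized form would drop the pre-1998 Premium and Midgrade rows, so the group sizes reported below would change.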

# Determine the range of years in which both electric and gas engines existed
min(Electricity$year, na.rm = TRUE)
## [1] 1998
max(Electricity$year, na.rm = TRUE)
## [1] 2015
min(Gas$year, na.rm = TRUE)
## [1] 1985
max(Gas$year, na.rm = TRUE)
## [1] 2015
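The years in which both groups exist run from the later of the two minimum years to the earlier of the two maximum years. A minimal sketch with a hypothetical helper `year_overlap`; the example inputs are the minimum and maximum years printed above.

```r
# Hypothetical helper: common year range of two groups (NULL if they never overlap)
year_overlap <- function(years_a, years_b) {
  start <- max(min(years_a, na.rm = TRUE), min(years_b, na.rm = TRUE))
  end   <- min(max(years_a, na.rm = TRUE), max(years_b, na.rm = TRUE))
  if (start > end) return(NULL)
  c(start = start, end = end)
}

# Endpoints from the printed output: electric 1998-2015, gas 1985-2015
electric_years <- c(1998, 2015)
gas_years <- c(1985, 2015)
print(year_overlap(electric_years, gas_years))
```

With the real data the call would be `year_overlap(Electricity$year, Gas$year)`, and the result could be used to restrict both groups to the same window.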
summary(Electricity)
##        id            make              model                year     
##  Min.   :16424   Length:64          Length:64          Min.   :1998  
##  1st Qu.:31043   Class :character   Class :character   1st Qu.:2011  
##  Median :33308   Mode  :character   Mode  :character   Median :2013  
##  Mean   :31615                                         Mean   :2010  
##  3rd Qu.:33951                                         3rd Qu.:2014  
##  Max.   :34918                                         Max.   :2015  
##                                                                      
##     class              trans              drive                cyl      
##  Length:64          Length:64          Length:64          Min.   :4.00  
##  Class :character   Class :character   Class :character   1st Qu.:4.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :4.00  
##                                                           Mean   :4.22  
##                                                           3rd Qu.:4.00  
##                                                           Max.   :6.00  
##                                                           NA's   :55    
##      displ          fuel                hwy             cty       
##  Min.   :1.80   Length:64          Min.   : 28.0   Min.   : 23.0  
##  1st Qu.:1.80   Class :character   1st Qu.: 53.5   1st Qu.: 57.2  
##  Median :2.00   Mode  :character   Median : 74.0   Median : 84.0  
##  Mean   :2.04                      Mean   : 73.6   Mean   : 83.9  
##  3rd Qu.:2.00                      3rd Qu.: 97.0   3rd Qu.:113.0  
##  Max.   :3.00                      Max.   :109.0   Max.   :138.0  
##  NA's   :55
summary(Gas)
##        id            make              model                year     
##  Min.   :   33   Length:18796       Length:18796       Min.   :1985  
##  1st Qu.:16858   Class :character   Class :character   1st Qu.:2001  
##  Median :21830   Mode  :character   Mode  :character   Median :2006  
##  Mean   :22170                                         Mean   :2005  
##  3rd Qu.:29220                                         3rd Qu.:2010  
##  Max.   :34931                                         Max.   :2015  
##     class              trans              drive                cyl       
##  Length:18796       Length:18796       Length:18796       Min.   : 2.00  
##  Class :character   Class :character   Class :character   1st Qu.: 4.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 6.00  
##                                                           Mean   : 5.88  
##                                                           3rd Qu.: 6.00  
##                                                           Max.   :16.00  
##      displ          fuel                hwy          cty      
##  Min.   :1.00   Length:18796       Min.   :10   Min.   : 6.0  
##  1st Qu.:2.40   Class :character   1st Qu.:20   1st Qu.:15.0  
##  Median :3.00   Mode  :character   Median :24   Median :17.0  
##  Mean   :3.35                      Mean   :24   Mean   :17.4  
##  3rd Qu.:4.20                      3rd Qu.:27   3rd Qu.:19.0  
##  Max.   :8.40                      Max.   :61   Max.   :53.0

Continuous variables (if any)

Highway (hwy) and city (cty) fuel economy are both continuous variables in this study.

Response variables

Highway and city fuel economy are the response variables; their values are expected to depend on whether a car model has an electric or a gas engine.

The Data: How is it organized and what does it look like?

For the purposes of this study, the data are organized by engine type, model year, and highway and city fuel economy.

Randomization

The fuel economy numbers were randomized when they were obtained; no further randomization was performed in this analysis.

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

Data on highway and city fuel economy from the EPA will be compared between electric and gas engines.

What is the rationale for this design?

To determine whether electric and gas engines differ in fuel economy.

Randomize: What is the Randomization Scheme?

Randomization took place before the fuel economy numbers were generated.

Replicate: Are there replicates and/or repeated measures?

Replicates were involved before the fuel economy numbers were generated.

Block: Did you use blocking in the design?

Blocking was not necessary in this study.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

hist(Electricity$year,ylim=c(0,25),xlim=c(1998,2015), breaks =20, main = "Histogram of Electric car models per year", xlab = "Year")

[Figure: Histogram of Electric car models per year]

hist(Gas$year, ylim = c(0,1500), xlim=c(1998,2015), breaks = 25, main = "Histogram of Gas car models per year", xlab = "Year")

[Figure: Histogram of Gas car models per year]

boxplot(Gas$hwy,Electricity$hwy, main = "Highway fuel economy", names = c("Gas", "Electric"))

[Figure: Highway fuel economy, Gas vs. Electric]

boxplot(Gas$cty, Electricity$cty, main = "City fuel economy", names = c("Gas", "Electric"))

[Figure: City fuel economy, Gas vs. Electric]
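The boxplots can be complemented by a numeric summary of each response per group. A minimal sketch using base R's `aggregate()` on a small hypothetical long-format data frame `cars` (one row per car, with an `engine` column holding the factor level); with the real data the same calls would work on a data frame that stacks the `Gas` and `Electricity` rows with such a column, for example via `rbind()`.

```r
# Hypothetical long-format data: one row per car model (not the recipe's data)
cars <- data.frame(
  engine = factor(c("Gas", "Gas", "Gas", "Electric", "Electric"),
                  levels = c("Gas", "Electric")),
  hwy = c(20, 24, 28, 70, 90),
  cty = c(15, 17, 19, 80, 100)
)

# Mean and standard deviation of each response for each level of the factor
means <- aggregate(cbind(hwy, cty) ~ engine, data = cars, FUN = mean)
sds   <- aggregate(cbind(hwy, cty) ~ engine, data = cars, FUN = sd)
print(means)
print(sds)
```

Reporting the standard deviations next to the means makes any difference in spread between the groups explicit, which matters for the equal-variance assumption used in the t-tests.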

Testing

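Two-sample t-tests assuming equal variances (var.equal = TRUE) were used to compare mean city and highway fuel economy between gas and electric engines.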

t.test(Gas$cty,Electricity$cty,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  Gas$cty and Electricity$cty
## t = -111.3, df = 18858, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -67.66 -65.31
## sample estimates:
## mean of x mean of y 
##     17.44     83.92
t.test(Gas$hwy,Electricity$hwy,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  Gas$hwy and Electricity$hwy
## t = -72.73, df = 18858, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -50.93 -48.26
## sample estimates:
## mean of x mean of y 
##     24.03     73.62
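With `var.equal = TRUE`, `t.test()` uses the pooled-variance statistic t = (mean1 - mean2) / sqrt(sp2 * (1/n1 + 1/n2)), where sp2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2), with n1 + n2 - 2 degrees of freedom (here 18796 + 64 - 2 = 18858, matching the output above). The sketch computes this by hand on hypothetical vectors and checks it against `t.test()`. It also runs Welch's test (`var.equal = FALSE`, R's default), which drops the equal-variance assumption; given how much wider the electric spread is in the summaries above, comparing the two may be worthwhile.

```r
# Hypothetical toy samples (not the recipe's data)
x <- c(15, 17, 19, 16, 18)  # e.g. gas city fuel economy
y <- c(80, 100, 90)         # e.g. electric city fuel economy

n1 <- length(x)
n2 <- length(y)
# Pooled variance: sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2

pooled <- t.test(x, y, var.equal = TRUE)  # same form of test as in the recipe
welch  <- t.test(x, y)                    # Welch test, var.equal = FALSE

print(c(manual = t_manual, t.test = unname(pooled$statistic)))
print(c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
```

The Welch degrees of freedom come out much smaller than n1 + n2 - 2 when one group is both smaller and far more variable, which is the situation in this recipe's data.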

Diagnostics/Model Adequacy Checking

The normality of fuel economy in each group was assessed with normal Q-Q plots.

qqnorm(Electricity$hwy,ylab="Highway Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of electric highway mileage")

[Figure: Normal Q-Q Plot of electric highway mileage]

qqnorm(Electricity$cty,ylab="City Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of electric city mileage")

[Figure: Normal Q-Q Plot of electric city mileage]

qqnorm(Gas$hwy,ylab="Highway Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of gas highway mileage")

[Figure: Normal Q-Q Plot of gas highway mileage]

qqnorm(Gas$cty,ylab="City Mileage",ylim=c(0,150), main = "Normal Q-Q Plot of gas city mileage")
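The Q-Q plots can be supplemented with a reference line and a formal test. A minimal sketch on simulated data (the `mileage` vector is hypothetical): `qqline()` adds a line through the first and third quartiles, and `shapiro.test()` runs the Shapiro-Wilk test. `shapiro.test()` only accepts between 3 and 5000 non-missing values, so it could be applied to the 64 electric rows but not directly to the 18796 gas rows; for a sample that large, the plots remain the main check.

```r
set.seed(1)
# Hypothetical right-skewed sample standing in for one group's mileage
mileage <- rexp(64, rate = 1 / 80)

qqnorm(mileage, ylab = "Mileage", main = "Normal Q-Q Plot (simulated data)")
qqline(mileage, col = "red")  # reference line through the 1st and 3rd quartiles

# Shapiro-Wilk test of normality (valid for 3 to 5000 observations)
sw <- shapiro.test(mileage)
print(sw)
```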