Recipe 2: Two or more Factor Experiments

Recipes for the Design of Experiments

Zoe Konrad

Rensselaer Polytechnic Institute

Fall 2014 v1

1. Setting

System under test

library("nasaweather")
x<-storms
attach(x)
head(x)
##      name year month day hour  lat  long pressure wind                type
## 1 Allison 1995     6   3    0 17.4 -84.3     1005   30 Tropical Depression
## 2 Allison 1995     6   3    6 18.3 -84.9     1004   30 Tropical Depression
## 3 Allison 1995     6   3   12 19.3 -85.7     1003   35      Tropical Storm
## 4 Allison 1995     6   3   18 20.6 -85.8     1001   40      Tropical Storm
## 5 Allison 1995     6   4    0 22.0 -86.0      997   50      Tropical Storm
## 6 Allison 1995     6   4    6 23.3 -86.3      995   60      Tropical Storm
##   seasday
## 1       3
## 2       3
## 3       3
## 4       3
## 5       4
## 6       4

Factors and Levels

There are three factors of interest, each with multiple levels.

Latitude: is a continuous variable which we will manually categorize into 6 ranges.

plot(pressure~lat)

plot of chunk unnamed-chunk-2

lat[lat<15] = "0-15"
lat[lat<20 & lat>=15] = "15-20"
lat[lat<30 & lat>=20] = "20-30"
lat[lat<40 & lat>=30] = "30-40"
lat[lat<50 & lat>=40] = "40-50"
lat[lat>=50] = "50-80"

lat <- as.factor(lat)
summary(lat)
##  0-15 15-20 20-30 30-40 40-50 50-80 
##   400   573   772   663   235   104

Type: of storm has 4 levels (Extratropical, Hurricane, Tropical Depression, Tropical Storm)

type <- as.factor(type)
summary(type)
##       Extratropical           Hurricane Tropical Depression 
##                 412                 896                 513 
##      Tropical Storm 
##                 926

Month is a discrete continuous variable from 6-12, June-Dec (because there are no storms in the other months.)

month <- as.factor(month)
summary(month)
##   6   7   8   9  10  11  12 
##  83 251 720 947 569 170   7

Response variables

The response variable we are interested in is pressure.

summary(pressure)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     905     980     995     990    1000    1020

2. (Experimental) Design

Three-factor design.

3. (Statistical) Analysis

Exploratory Data Analysis Graphics

plot(pressure~type+month+lat)

plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6

There does appear to be at least slight differences in means accross all the factors and their levels.

par(mfrow=c(1,2))
interaction.plot(type, lat, pressure)
interaction.plot(type, month, pressure)

plot of chunk unnamed-chunk-7

interaction.plot(month, lat, pressure)

plot of chunk unnamed-chunk-7

There are no clearly significant interactions between type/lat nor type/month but there is obvious interaction between month/lat.

Testing

model <- aov(pressure~type+month+lat+month*lat)
anova(model)
## Analysis of Variance Table
## 
## Response: pressure
##             Df Sum Sq Mean Sq F value Pr(>F)    
## type         3 538001  179334 1410.52 <2e-16 ***
## month        6  15228    2538   19.96 <2e-16 ***
## lat          5  37405    7481   58.84 <2e-16 ***
## month:lat   26  25029     963    7.57 <2e-16 ***
## Residuals 2706 344041     127                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA results we see that each of our three factors are found to have a significant difference in means, thus we can reject the null hypothesis that randomization alone can account for the variation in pressure of the storms.

Storm type, month, and latitude are all explanatory factors in the variation in pressure of the storms.

Estimation (of Parameters)

Find following comparison of means accross factor levels:

TukeyHSD(aov(pressure~as.factor(type)))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = pressure ~ as.factor(type))
## 
## $`as.factor(type)`
##                                       diff     lwr     upr p adj
## Hurricane-Extratropical            -23.516 -25.414 -21.619     0
## Tropical Depression-Extratropical   12.202  10.093  14.311     0
## Tropical Storm-Extratropical         3.743   1.855   5.630     0
## Tropical Depression-Hurricane       35.718  33.954  37.483     0
## Tropical Storm-Hurricane            27.259  25.765  28.753     0
## Tropical Storm-Tropical Depression  -8.459 -10.214  -6.705     0
plot(aggregate(pressure, by=list(type), FUN=mean), xlab="type", ylab="pressure")

plot of chunk unnamed-chunk-9

aggregate(pressure, by=list(type), FUN=mean)
##               Group.1      x
## 1       Extratropical  994.0
## 2           Hurricane  970.4
## 3 Tropical Depression 1006.2
## 4      Tropical Storm  997.7

Tropical Depressions have the highest pressure (around 1005) while Hurricanes have the lowest pressure (around 970). Extratropical and Tropical storms have similar, but still statistically distinct pressure values (around 995).

par(mfrow=c(1,2))
plot(TukeyHSD(aov(pressure~as.factor(month)))) ; plot(aggregate(pressure, by=list(month), FUN=mean), xlab="month", ylab="pressure")

plot of chunk unnamed-chunk-10

aggregate(pressure, by=list(month), FUN=mean)
##   Group.1      x
## 1       6 1001.4
## 2       7  999.9
## 3       8  989.8
## 4       9  984.9
## 5      10  991.0
## 6      11  993.0
## 7      12  984.9

Pressure is highest in storms in Jule and July (around 1000), dipping down to be lowest in September and again in December (around 984). Not all groups have a statistically significant difference in means.

par(mfrow=c(1,2))
plot(TukeyHSD(aov(pressure~as.factor(lat)))) ; plot(aggregate(pressure, by=list(lat), FUN=mean), xlab="lat", ylab="pressure")

plot of chunk unnamed-chunk-11

aggregate(pressure, by=list(lat), FUN=mean)
##   Group.1      x
## 1    0-15 1000.2
## 2   15-20  987.9
## 3   20-30  986.3
## 4   30-40  990.5
## 5   40-50  990.0
## 6   50-80  982.5

With data-snooping, we can see that the latitude could eaisily have been categorized into fewer groups with similar results (0-15, 15-30, 30-50, 50+). Pressure is highest in storms at lower latitudes, less than 15, (around 1000) and pressure is lowest in storms at higher latitudes, greater than 50, (around 980). Not all groups have a statistically significant difference in means.

Diagnostics/Model Adequacy Checking

qqnorm(residuals(model)) ; qqline(residuals(model))

plot of chunk unnamed-chunk-12

Unfortunately, the residuals of the model are not perfectly normally distributed on the low tail because the response variable pressure is not normally distributed. This indicates that the model may not be the most appropriate fit.

4. References to the literature