library("nasaweather")
x<-storms
attach(x)
head(x)
## name year month day hour lat long pressure wind type
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30 Tropical Depression
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30 Tropical Depression
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical Storm
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40 Tropical Storm
## 5 Allison 1995 6 4 0 22.0 -86.0 997 50 Tropical Storm
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60 Tropical Storm
## seasday
## 1 3
## 2 3
## 3 3
## 4 3
## 5 4
## 6 4
There are three factors of interest, each with multiple levels.
Latitude: is a continuous variable which we will manually categorize into 6 ranges.
plot(pressure~lat)
lat[lat<15] = "0-15"
lat[lat<20 & lat>=15] = "15-20"
lat[lat<30 & lat>=20] = "20-30"
lat[lat<40 & lat>=30] = "30-40"
lat[lat<50 & lat>=40] = "40-50"
lat[lat>=50] = "50-80"
lat <- as.factor(lat)
summary(lat)
## 0-15 15-20 20-30 30-40 40-50 50-80
## 400 573 772 663 235 104
Type: of storm has 4 levels (Extratropical, Hurricane, Tropical Depression, Tropical Storm)
type <- as.factor(type)
summary(type)
## Extratropical Hurricane Tropical Depression
## 412 896 513
## Tropical Storm
## 926
Month is a discrete continuous variable from 6-12, June-Dec (because there are no storms in the other months.)
month <- as.factor(month)
summary(month)
## 6 7 8 9 10 11 12
## 83 251 720 947 569 170 7
The response variable we are interested in is pressure.
summary(pressure)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 905 980 995 990 1000 1020
Three-factor design.
plot(pressure~type+month+lat)
There does appear to be at least slight differences in means accross all the factors and their levels.
par(mfrow=c(1,2))
interaction.plot(type, lat, pressure)
interaction.plot(type, month, pressure)
interaction.plot(month, lat, pressure)
There are no clearly significant interactions between type/lat nor type/month but there is obvious interaction between month/lat.
model <- aov(pressure~type+month+lat+month*lat)
anova(model)
## Analysis of Variance Table
##
## Response: pressure
## Df Sum Sq Mean Sq F value Pr(>F)
## type 3 538001 179334 1410.52 <2e-16 ***
## month 6 15228 2538 19.96 <2e-16 ***
## lat 5 37405 7481 58.84 <2e-16 ***
## month:lat 26 25029 963 7.57 <2e-16 ***
## Residuals 2706 344041 127
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA results we see that each of our three factors are found to have a significant difference in means, thus we can reject the null hypothesis that randomization alone can account for the variation in pressure of the storms.
Storm type, month, and latitude are all explanatory factors in the variation in pressure of the storms.
Find following comparison of means accross factor levels:
TukeyHSD(aov(pressure~as.factor(type)))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = pressure ~ as.factor(type))
##
## $`as.factor(type)`
## diff lwr upr p adj
## Hurricane-Extratropical -23.516 -25.414 -21.619 0
## Tropical Depression-Extratropical 12.202 10.093 14.311 0
## Tropical Storm-Extratropical 3.743 1.855 5.630 0
## Tropical Depression-Hurricane 35.718 33.954 37.483 0
## Tropical Storm-Hurricane 27.259 25.765 28.753 0
## Tropical Storm-Tropical Depression -8.459 -10.214 -6.705 0
plot(aggregate(pressure, by=list(type), FUN=mean), xlab="type", ylab="pressure")
aggregate(pressure, by=list(type), FUN=mean)
## Group.1 x
## 1 Extratropical 994.0
## 2 Hurricane 970.4
## 3 Tropical Depression 1006.2
## 4 Tropical Storm 997.7
Tropical Depressions have the highest pressure (around 1005) while Hurricanes have the lowest pressure (around 970). Extratropical and Tropical storms have similar, but still statistically distinct pressure values (around 995).
par(mfrow=c(1,2))
plot(TukeyHSD(aov(pressure~as.factor(month)))) ; plot(aggregate(pressure, by=list(month), FUN=mean), xlab="month", ylab="pressure")
aggregate(pressure, by=list(month), FUN=mean)
## Group.1 x
## 1 6 1001.4
## 2 7 999.9
## 3 8 989.8
## 4 9 984.9
## 5 10 991.0
## 6 11 993.0
## 7 12 984.9
Pressure is highest in storms in Jule and July (around 1000), dipping down to be lowest in September and again in December (around 984). Not all groups have a statistically significant difference in means.
par(mfrow=c(1,2))
plot(TukeyHSD(aov(pressure~as.factor(lat)))) ; plot(aggregate(pressure, by=list(lat), FUN=mean), xlab="lat", ylab="pressure")
aggregate(pressure, by=list(lat), FUN=mean)
## Group.1 x
## 1 0-15 1000.2
## 2 15-20 987.9
## 3 20-30 986.3
## 4 30-40 990.5
## 5 40-50 990.0
## 6 50-80 982.5
With data-snooping, we can see that the latitude could eaisily have been categorized into fewer groups with similar results (0-15, 15-30, 30-50, 50+). Pressure is highest in storms at lower latitudes, less than 15, (around 1000) and pressure is lowest in storms at higher latitudes, greater than 50, (around 980). Not all groups have a statistically significant difference in means.
qqnorm(residuals(model)) ; qqline(residuals(model))
Unfortunately, the residuals of the model are not perfectly normally distributed on the low tail because the response variable pressure is not normally distributed. This indicates that the model may not be the most appropriate fit.