As of August 28, 2014, superseding the version of August 24. Always use the most recent version.
Choose one of the large datasets listed on the Realtime Board (e.g., babynames or nasaweather)
Make sure you have more than 1000 observations
What is the problem that you were given?
In this study, we explore the effects of year of occurrence and storm type on the air pressure at the storm's center. To do so, a two-factor, multi-level approach is used to conduct the analysis of variance.
# Install nasaweather only if it is missing. Naming a CRAN mirror explicitly
# avoids the "trying to use CRAN without setting a mirror" error when knitting.
if (!requireNamespace("nasaweather", quietly = TRUE)) {
  install.packages("nasaweather", repos = "https://cloud.r-project.org")
}
# Load from the default library path instead of a machine-specific lib.loc.
library(nasaweather)
# Copy the storms dataset into a plain data frame and summarize every column.
data1 <- data.frame(storms)
summary(data1)
## name year month day
## Length:2747 Min. :1995 Min. : 6.0 Min. : 1
## Class :character 1st Qu.:1995 1st Qu.: 8.0 1st Qu.: 9
## Mode :character Median :1997 Median : 9.0 Median :18
## Mean :1997 Mean : 8.8 Mean :17
## 3rd Qu.:1999 3rd Qu.:10.0 3rd Qu.:25
## Max. :2000 Max. :12.0 Max. :31
## hour lat long pressure
## Min. : 0.00 Min. : 8.3 Min. :-107.3 Min. : 905
## 1st Qu.: 3.50 1st Qu.:17.2 1st Qu.: -77.6 1st Qu.: 980
## Median :12.00 Median :25.0 Median : -60.9 Median : 995
## Mean : 9.06 Mean :26.7 Mean : -60.9 Mean : 990
## 3rd Qu.:18.00 3rd Qu.:33.9 3rd Qu.: -45.8 3rd Qu.:1004
## Max. :18.00 Max. :70.7 Max. : 1.0 Max. :1019
## wind type seasday
## Min. : 15.0 Length:2747 Min. : 3
## 1st Qu.: 35.0 Class :character 1st Qu.: 84
## Median : 50.0 Mode :character Median :103
## Mean : 54.7 Mean :103
## 3rd Qu.: 70.0 3rd Qu.:125
## Max. :155.0 Max. :185
There are two factors in this analysis. “Year of occurrence” has 6 levels, one for each year from 1995 to 2000. “Storm type” has 4 levels: Tropical Depression, Tropical Storm, Hurricane, and Extratropical.
data1$year <- as.factor(data1$year)
nlevels(data1$year)
## [1] 6
data1$type <- as.factor(data1$type)
nlevels(data1$type)
## [1] 4
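As a small illustration of why the conversion to a factor matters, the sketch below uses a made-up data frame (`toy` and its values are hypothetical, not the storms data). A numeric year enters a model as a single slope with 1 degree of freedom, whereas a factor is treated as categorical and uses one degree of freedom fewer than its number of levels.

```r
# Hypothetical toy data, NOT the storms data: three years, two observations each.
toy <- data.frame(
  year     = c(1995, 1995, 1996, 1996, 1997, 1997),
  pressure = c(1002, 998, 990, 985, 1005, 1001)
)

# As a numeric column, year is fitted as one slope (1 degree of freedom).
df_numeric <- anova(lm(pressure ~ year, data = toy))["year", "Df"]

# As a factor, year is categorical: one level per distinct year.
toy$year  <- as.factor(toy$year)
n_levels  <- nlevels(toy$year)
df_factor <- anova(lm(pressure ~ year, data = toy))["year", "Df"]

print(levels(toy$year))
print(c(numeric = df_numeric, factor = df_factor))
```

This matches the ANOVA tables in this report, where `year` with 6 levels has 5 degrees of freedom and `type` with 4 levels has 3.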
The dataset contains several continuous variables, for example “lat”, “long”, and “pressure”.
The response variable in this study is “pressure”, which is the air pressure at the center of the storm.
The data originate from the National Hurricane Center's archive of Tropical Cyclone Reports. They include the date and name of each storm, the hour and location of each observation, the air pressure, and the storm type.
The data record every storm that occurred during the given period, so they describe the whole population rather than a sample; on that basis we treat the observations as random.
In this study, we explore whether the year of occurrence and the storm type affect the air pressure at the center of the storm. To do so, we conduct a two-factor, multi-level analysis of variance and estimate a linear regression model to capture the quantitative effects.
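To make the link between the analysis of variance and the regression estimates concrete, here is a minimal sketch on simulated data (the data frame `sim`, its group means of 1000 and 970, and the noise level are assumptions chosen for illustration, not values from the storms data). With R's default treatment contrasts, the intercept of `lm()` is the mean of the baseline level and each remaining coefficient is that level's difference from the baseline.

```r
set.seed(3)  # simulated stand-in data, NOT the storms data

sim <- data.frame(type = factor(rep(c("Extratropical", "Hurricane"), each = 20)))
sim$pressure <- ifelse(sim$type == "Hurricane", 970, 1000) + rnorm(40, sd = 5)

# Linear model with one categorical predictor.
fit_lm <- lm(pressure ~ type, data = sim)
est    <- coef(fit_lm)

# Group means, for comparison with the coefficients.
means <- tapply(sim$pressure, sim$type, mean)

print(est)              # intercept = baseline mean, typeHurricane = difference
print(means)
print(confint(fit_lm))  # 95% confidence intervals for the effects
```

The coefficients put the effect on the scale of the response (pressure units), which the F-tests alone do not show.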
We are interested in finding out whether there is a trend in air pressure according to the year of occurrence or the storm type, so that we can predict the effects of future storms more accurately.
The data are intended to capture the characteristics of the entire population. No experiment was designed, so there is no randomization scheme.
No, there are no replicates/repeated measures.
This study used all the samples to conduct the analysis, so there is no blocking in the design.
In the first boxplot (by year of occurrence), the median air pressure at the storm center varies little across years, and every year has a number of outliers, especially 1998. In the second boxplot (by storm type), the median air pressure at the storm center differs substantially between storm types, suggesting that much of the variation in air pressure is related to storm type.
boxplot(pressure ~ year, data = data1)
boxplot(pressure ~ type, data = data1)
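When reading these plots, note that the center line of each box marks the group median rather than the mean. The sketch below, on a made-up data frame (`toy` and its values are hypothetical, not the storms data), shows one way to pull out the numbers behind a boxplot and compare them with the group means.

```r
# Hypothetical toy data, NOT the storms data.
toy <- data.frame(
  type     = factor(c("A", "A", "A", "B", "B", "B")),
  pressure = c(1000, 1004, 1020, 960, 970, 975)
)

# Group medians and means computed directly.
group_medians <- tapply(toy$pressure, toy$type, median)
group_means   <- tapply(toy$pressure, toy$type, mean)

# boxplot() returns the five-number summaries it would draw;
# row 3 of $stats holds the medians. plot = FALSE skips the drawing.
bp <- boxplot(pressure ~ type, data = toy, plot = FALSE)

print(rbind(median = group_medians, mean = group_means))
print(bp$stats[3, ])
```

A single large value (1020 in group A) pulls the mean above the median, which is why the two summaries can tell slightly different stories when a group has outliers.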
We fit three analyses of variance: one for year of occurrence, one for storm type, and one that includes both factors and their interaction. In every case the p-values are very small, so the variation in air pressure at the storm center is unlikely to be due to chance alone. We conclude that year of occurrence, storm type, and their interaction each have an effect on the air pressure at the storm center.
model1 <- aov(pressure ~ year, data = data1)
anova(model1)
## Analysis of Variance Table
##
## Response: pressure
## Df Sum Sq Mean Sq F value Pr(>F)
## year 5 22560 4512 13.2 1e-12 ***
## Residuals 2741 937143 342
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model2 <- aov(pressure ~ type, data = data1)
anova(model2)
## Analysis of Variance Table
##
## Response: pressure
## Df Sum Sq Mean Sq F value Pr(>F)
## type 3 538001 179334 1166 <2e-16 ***
## Residuals 2743 421702 154
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model3 <- aov(pressure ~ year * type, data = data1)
anova(model3)
## Analysis of Variance Table
##
## Response: pressure
## Df Sum Sq Mean Sq F value Pr(>F)
## year 5 22560 4512 30.75 <2e-16 ***
## type 3 528623 176208 1200.88 <2e-16 ***
## year:type 15 8967 598 4.07 2e-07 ***
## Residuals 2723 399553 147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
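Two details of these tables are worth spelling out. First, `year * type` is shorthand for `year + type + year:type`, so the third model contains both main effects as well as the interaction. Second, `anova()` reports sequential sums of squares: each term is adjusted only for the terms listed before it. With unequal cell counts this makes the rows depend on the order of the terms, which is why `year` keeps the same sum of squares (22560) in the first and third tables, while `type` drops from 538001 on its own to 528623 after `year`. The sketch below reproduces this behavior on simulated data (the data frame `sim`, its cell counts, and its effect sizes are assumptions for illustration, not the storms data).

```r
set.seed(1)  # simulated stand-in data, NOT the storms data

# Unbalanced two-factor layout: cell counts differ across year/type cells.
sim <- data.frame(
  year = factor(rep(c("1995", "1996", "1997"), times = c(30, 20, 10))),
  type = factor(rep(
    c("Hurricane", "Tropical Storm", "Hurricane",
      "Tropical Storm", "Hurricane", "Tropical Storm"),
    times = c(20, 10, 5, 15, 5, 5)
  ))
)
sim$pressure <- 1000 - 25 * (sim$type == "Hurricane") + rnorm(nrow(sim), sd = 5)

# The * operator expands to both main effects plus the interaction.
tab_star <- anova(aov(pressure ~ year * type, data = sim))
tab_long <- anova(aov(pressure ~ year + type + year:type, data = sim))

# Same model with the factors entered in the opposite order.
tab_type_first <- anova(aov(pressure ~ type * year, data = sim))

print(tab_star)
print(tab_type_first)
```

The residual row is identical in both orderings; only the split of the explained variation between `year` and `type` changes.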
We use three checks to verify model adequacy. For model1, the normal Q-Q plot shows that the residuals generally follow a normal distribution. For model2, the plot of residuals against fitted values shows no obvious trend, which supports the assumption of random errors. For model3, the interaction plot shows that an interaction effect exists, which is consistent with the model results.
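# Normal Q-Q plot of the model1 residuals.
qqnorm(residuals(model1))
# Residuals against fitted values for model2.
plot(fitted(model2), residuals(model2))
# Mean pressure by year, with one line per storm type.
interaction.plot(data1$year, data1$type, data1$pressure)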
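One possible way to run all three checks on a single two-factor model is sketched below. It uses simulated data (the data frame `sim` and its built-in interaction are assumptions for illustration, not the storms data), adds a reference line to each plot, and pairs the Q-Q plot with a Shapiro-Wilk test as a numeric complement. This is a sketch of a common workflow, not the procedure used above.

```r
set.seed(2)  # simulated stand-in data, NOT the storms data

# Balanced 2 x 2 layout with 10 observations per cell.
sim <- expand.grid(
  rep  = 1:10,
  year = c("1995", "1996"),
  type = c("Hurricane", "Tropical Storm")
)
# Built-in interaction: the hurricane effect is larger in 1996.
sim$pressure <- 1000 -
  20 * (sim$type == "Hurricane") -
  10 * (sim$type == "Hurricane" & sim$year == "1996") +
  rnorm(nrow(sim), sd = 4)

fit <- aov(pressure ~ year * type, data = sim)
res <- residuals(fit)

# Normality: Q-Q plot with a reference line, plus a Shapiro-Wilk test.
qqnorm(res)
qqline(res)
sw <- shapiro.test(res)

# No trend / constant spread: residuals against fitted values.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Cell means are what interaction.plot() draws; non-parallel lines
# suggest an interaction between the two factors.
cell_means <- with(sim, tapply(pressure, list(year, type), mean))
with(sim, interaction.plot(year, type, pressure))

print(sw$p.value)
print(cell_means)
```

Printing `cell_means` next to the interaction plot makes it easier to judge how far the lines depart from parallel.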
http://cran.r-project.org/web/packages/nasaweather/nasaweather.pdf