As of August 28, 2014, superseding the version of August 24. Always use the most recent version.
This study examines storm data from the National Hurricane Center. The data track tropical cyclones through the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico; the dataset is described as covering 1995 to 2005, although the records summarized below span 1995 to 2000. Each record includes the storm’s name, year, month, day, hour, latitude, longitude, type, air pressure, maximum wind speed, and day of the hurricane season. This recipe looks specifically at the air pressure at the storm’s center (in millibars) for different types of storms.
remove(list=ls())
install.packages("nasaweather", repos='http://cran.us.r-project.org')
## package 'nasaweather' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Caroline\AppData\Local\Temp\RtmpwjOvpn\downloaded_packages
require(nasaweather)
## Loading required package: nasaweather
#library("nasaweather", lib.loc="~/R/win-library/3.0/")
x<-storms
attach(storms)
## The following object is masked from package:datasets:
##
## pressure
head(storms)
## name year month day hour lat long pressure wind type
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30 Tropical Depression
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30 Tropical Depression
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical Storm
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40 Tropical Storm
## 5 Allison 1995 6 4 0 22.0 -86.0 997 50 Tropical Storm
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60 Tropical Storm
## seasday
## 1 3
## 2 3
## 3 3
## 4 3
## 5 4
## 6 4
str(storms)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2747 obs. of 11 variables:
## $ name : chr "Allison" "Allison" "Allison" "Allison" ...
## $ year : int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
## $ month : int 6 6 6 6 6 6 6 6 6 6 ...
## $ day : int 3 3 3 3 4 4 4 4 5 5 ...
## $ hour : int 0 6 12 18 0 6 12 18 0 6 ...
## $ lat : num 17.4 18.3 19.3 20.6 22 23.3 24.7 26.2 27.6 28.5 ...
## $ long : num -84.3 -84.9 -85.7 -85.8 -86 -86.3 -86.2 -86.2 -86.1 -85.6 ...
## $ pressure: int 1005 1004 1003 1001 997 995 987 988 988 990 ...
## $ wind : int 30 30 35 40 50 60 65 65 65 60 ...
## $ type : chr "Tropical Depression" "Tropical Depression" "Tropical Storm" "Tropical Storm" ...
## $ seasday : int 3 3 3 3 4 4 4 4 5 5 ...
The factor used in this analysis is storm type. The levels analyzed are Tropical Storm and Hurricane; the remaining levels of this factor are Extratropical and Tropical Depression.
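A quick way to confirm the levels of the storm type factor is to tabulate the column. The sketch below rebuilds a tiny data frame from nine rows printed in this document (the first six Allison rows and the last three Nadine rows); the name `mini` is hypothetical, and with the full data one would simply call `table(x$type)`.

```r
# Nine rows copied from the head() and tail() output of `storms`
mini <- data.frame(
  name     = c(rep("Allison", 6), rep("Nadine", 3)),
  pressure = c(1005, 1004, 1003, 1001, 997, 995, 1004, 1005, 1005),
  type     = c("Tropical Depression", "Tropical Depression",
               "Tropical Storm", "Tropical Storm",
               "Tropical Storm", "Tropical Storm",
               "Extratropical", "Extratropical", "Extratropical"),
  stringsAsFactors = FALSE
)

# `type` is stored as character; convert it to a factor to list its levels
mini$type <- factor(mini$type)
levels(mini$type)

# Count the observations in each level
type_counts <- table(mini$type)
type_counts
```

Levels are ordered alphabetically by default, which is also why Extratropical acts as the baseline level when `type` is used in a model.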
head(x)
## name year month day hour lat long pressure wind type
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30 Tropical Depression
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30 Tropical Depression
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical Storm
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40 Tropical Storm
## 5 Allison 1995 6 4 0 22.0 -86.0 997 50 Tropical Storm
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60 Tropical Storm
## seasday
## 1 3
## 2 3
## 3 3
## 4 3
## 5 4
## 6 4
tail(x)
## name year month day hour lat long pressure wind type
## 2742 Nadine 2000 10 21 6 33.3 -53.5 1000 50 Tropical Storm
## 2743 Nadine 2000 10 21 12 34.1 -52.3 1000 50 Tropical Storm
## 2744 Nadine 2000 10 21 18 34.8 -51.3 1000 45 Tropical Storm
## 2745 Nadine 2000 10 22 0 35.7 -50.5 1004 40 Extratropical
## 2746 Nadine 2000 10 22 6 37.0 -49.0 1005 40 Extratropical
## 2747 Nadine 2000 10 22 12 39.0 -47.0 1005 35 Extratropical
## seasday
## 2742 143
## 2743 143
## 2744 143
## 2745 144
## 2746 144
## 2747 144
summary(x)
## name year month day
## Length:2747 Min. :1995 Min. : 6.0 Min. : 1
## Class :character 1st Qu.:1995 1st Qu.: 8.0 1st Qu.: 9
## Mode :character Median :1997 Median : 9.0 Median :18
## Mean :1997 Mean : 8.8 Mean :17
## 3rd Qu.:1999 3rd Qu.:10.0 3rd Qu.:25
## Max. :2000 Max. :12.0 Max. :31
## hour lat long pressure
## Min. : 0.00 Min. : 8.3 Min. :-107.3 Min. : 905
## 1st Qu.: 3.50 1st Qu.:17.2 1st Qu.: -77.6 1st Qu.: 980
## Median :12.00 Median :25.0 Median : -60.9 Median : 995
## Mean : 9.06 Mean :26.7 Mean : -60.9 Mean : 990
## 3rd Qu.:18.00 3rd Qu.:33.9 3rd Qu.: -45.8 3rd Qu.:1004
## Max. :18.00 Max. :70.7 Max. : 1.0 Max. :1019
## wind type seasday
## Min. : 15.0 Length:2747 Min. : 3
## 1st Qu.: 35.0 Class :character 1st Qu.: 84
## Median : 50.0 Mode :character Median :103
## Mean : 54.7 Mean :103
## 3rd Qu.: 70.0 3rd Qu.:125
## Max. :155.0 Max. :185
The continuous variables in this dataset are longitude, latitude, air pressure, and wind speed.
The response variables in this dataset are air pressure and wind speed.
The ‘storms’ data describe tropical cyclones tracked through the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico. The dataset is described as covering 1995 to 2005, although the summary above shows years from 1995 to 2000. Each record includes the storm’s name, year, month, day, hour, latitude, longitude, type, air pressure, maximum wind speed, and day of the hurricane season. The type factor has four levels: Extratropical, Tropical Depression, Hurricane, and Tropical Storm.
These data originated from the National Hurricane Center’s archive of Tropical Cyclone Reports and were hand-scraped from the track tables of the individual reports. They are observational records rather than the result of a designed experiment, so proper randomization is assumed here rather than verified.
The data are publicly available for anyone to use and analyze. This analysis tests whether the air pressure at the center of the storm differs between two types of storms: Tropical Storms and Hurricanes. A two-sample t-test is performed to determine whether the two means differ.
The null hypothesis that will be tested is:
The mean air pressure for Tropical Storms is equal to the mean air pressure for Hurricanes.
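In symbols, writing $\mu_{TS}$ and $\mu_{H}$ for the mean central pressures of Tropical Storms and Hurricanes (notation introduced here for convenience, not used in the original), the test is two-sided, which matches the alternative that `t.test` reports by default:

```latex
H_0:\ \mu_{TS} = \mu_{H}
\qquad \text{versus} \qquad
H_1:\ \mu_{TS} \neq \mu_{H}
```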
The National Hurricane Center collected these data to document the tropical cyclones that traveled through the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico from 1995 to 2005.
The data were not collected to test any particular hypothesis; they are a routine record of storm observations.
There were no replicates or repeated measures.
The original dataset was organized without experimental groups, with measurements recorded for each variable at each observation. For this analysis, the data from two types of tropical cyclones were used: Hurricanes and Tropical Storms.
# Subset of rows for all Tropical Storms (a data frame, not a logical vector)
sub1 <- subset(x, type == "Tropical Storm")
is.data.frame(sub1)
## [1] TRUE
summary (sub1)
## name year month day
## Length:926 Min. :1995 Min. : 6.00 Min. : 1.0
## Class :character 1st Qu.:1995 1st Qu.: 8.00 1st Qu.: 9.0
## Mode :character Median :1997 Median : 9.00 Median :18.0
## Mean :1997 Mean : 8.72 Mean :16.9
## 3rd Qu.:1999 3rd Qu.:10.00 3rd Qu.:24.0
## Max. :2000 Max. :12.00 Max. :31.0
## hour lat long pressure
## Min. : 0.00 Min. :10.2 Min. :-99.3 Min. : 935
## 1st Qu.: 6.00 1st Qu.:16.0 1st Qu.:-76.1 1st Qu.: 994
## Median : 6.00 Median :22.0 Median :-60.8 Median :1000
## Mean : 8.97 Mean :23.9 Mean :-61.7 Mean : 998
## 3rd Qu.:12.00 3rd Qu.:30.5 3rd Qu.:-46.1 3rd Qu.:1004
## Max. :18.00 Max. :50.7 Max. :-19.4 Max. :1013
## wind type seasday
## Min. : 35.0 Length:926 Min. : 3
## 1st Qu.: 40.0 Class :character 1st Qu.: 81
## Median : 45.0 Mode :character Median :101
## Mean : 47.3 Mean :100
## 3rd Qu.: 55.0 3rd Qu.:127
## Max. :120.0 Max. :184
# histogram of the pressure of all Tropical Storms
hist(sub1$pressure,xlim=c(900,1020),ylim=c(0,400))
# Subset of rows for all Hurricanes (a data frame, not a logical vector)
sub2 <- subset(x, type == "Hurricane")
is.data.frame(sub2)
## [1] TRUE
summary (sub2)
## name year month day
## Length:896 Min. :1995 Min. : 6.00 Min. : 1.0
## Class :character 1st Qu.:1995 1st Qu.: 8.00 1st Qu.:12.0
## Mode :character Median :1998 Median : 9.00 Median :20.0
## Mean :1997 Mean : 8.82 Mean :18.6
## 3rd Qu.:1999 3rd Qu.: 9.00 3rd Qu.:26.0
## Max. :2000 Max. :12.00 Max. :31.0
## hour lat long pressure
## Min. : 0.00 Min. :10.5 Min. :-97.9 Min. : 905
## 1st Qu.: 6.00 1st Qu.:18.0 1st Qu.:-73.1 1st Qu.: 961
## Median :12.00 Median :24.2 Median :-61.4 Median : 974
## Mean : 9.05 Mean :25.1 Mean :-61.9 Mean : 970
## 3rd Qu.:18.00 3rd Qu.:31.3 3rd Qu.:-49.1 3rd Qu.: 984
## Max. :18.00 Max. :48.3 Max. :-25.2 Max. :1005
## wind type seasday
## Min. : 65.0 Length:896 Min. : 4
## 1st Qu.: 70.0 Class :character 1st Qu.: 88
## Median : 80.0 Mode :character Median :102
## Mean : 84.7 Mean :105
## 3rd Qu.: 95.0 3rd Qu.:120
## Max. :155.0 Max. :184
# histogram of the pressure of all Hurricanes
hist(sub2$pressure,xlim=c(900,1020),ylim=c(0,400))
par(mfrow=c(2,1))
hist(sub1$pressure,xlim=c(900,1020),ylim=c(0,400))
hist(sub2$pressure,xlim=c(900,1020),ylim=c(0,400))
#Here we will just look at the boxplots to see the data from a different view.
par(mfrow=c(1,2))
boxplot(sub1$pressure, ylim = c(900,1020), main = "Tropical Storm pressure", ylab = "Pressure (in millibars)")
boxplot(sub2$pressure, ylim = c(900,1020), main = "Hurricane pressure", ylab = "Pressure (in millibars)")
A two-sample t-test is performed to test the hypothesis stated above.
t.test(sub1$pressure, sub2$pressure,var.equal=TRUE)
##
## Two Sample t-test
##
## data: sub1$pressure and sub2$pressure
## t = 43.12, df = 1820, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 26.02 28.50
## sample estimates:
## mean of x mean of y
## 997.7 970.4
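The t-test gives t = 43.12 on 1820 degrees of freedom with p < 2.2e-16, so we reject the null hypothesis that the mean air pressure of Tropical Storms equals the mean air pressure of Hurricanes. Tropical Storms have the higher mean pressure (997.7 vs. 970.4 millibars), and the 95 percent confidence interval for the difference runs from 26.02 to 28.50 millibars.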
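For reference, the pooled statistic that `t.test(..., var.equal=TRUE)` reports can be reproduced by hand. The sketch below uses two small made-up pressure vectors (`ts_p` and `hu_p` are hypothetical and not taken from `storms`) purely to show the arithmetic; it also runs Welch's version, which drops the equal-variance assumption.

```r
# Hypothetical pressures in millibars, for illustration only
ts_p <- c(1005, 1003, 1001, 997, 995, 1000)
hu_p <- c(987, 975, 960, 970, 984, 965, 950)

n1 <- length(ts_p)
n2 <- length(hu_p)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(ts_p) + (n2 - 1) * var(hu_p)) / (n1 + n2 - 2)

# Pooled two-sample t statistic and its degrees of freedom
t_manual  <- (mean(ts_p) - mean(hu_p)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2

# Same test through t.test(), plus Welch's unequal-variance version
pooled <- t.test(ts_p, hu_p, var.equal = TRUE)
welch  <- t.test(ts_p, hu_p)  # var.equal = FALSE is the default

# The hand-computed and built-in statistics should agree
c(manual = t_manual, builtin = unname(pooled$statistic))
```

The same bookkeeping explains the output above: the two subsets hold 926 and 896 observations, and 926 + 896 - 2 = 1820 degrees of freedom.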
Next, parameters are estimated with a linear model. Regressing pressure on storm type estimates the mean air pressure at the center of the storm for each type, expressed relative to a baseline type.
fit <- lm(pressure~type, x)
summary(fit)
##
## Call:
## lm(formula = pressure ~ type, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.44 -4.70 1.56 7.30 34.56
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 993.954 0.611 1627.1 < 2e-16 ***
## typeHurricane -23.516 0.738 -31.9 < 2e-16 ***
## typeTropical Depression 12.202 0.820 14.9 < 2e-16 ***
## typeTropical Storm 3.743 0.734 5.1 3.7e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.4 on 2743 degrees of freedom
## Multiple R-squared: 0.561, Adjusted R-squared: 0.56
## F-statistic: 1.17e+03 on 3 and 2743 DF, p-value: <2e-16
summary(sub1$pressure)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 935 994 1000 998 1000 1010
summary(sub2$pressure)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 905 961 974 970 984 1000
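Because `type` is categorical, `lm` fits one mean per level: the intercept is the mean of the reference level (the alphabetically first one, Extratropical in the output above) and each remaining coefficient is a difference from it. A minimal sketch with a made-up data frame (`toy` and its values are hypothetical) illustrates this, along with using `relevel` to change the baseline so that one coefficient gives the Hurricane vs. Tropical Storm difference directly.

```r
# Hypothetical data, for illustration only
toy <- data.frame(
  pressure = c(1004, 1005, 1006, 970, 960, 980, 1000, 998, 996),
  type     = factor(rep(c("Extratropical", "Hurricane", "Tropical Storm"),
                        each = 3))
)

fit_toy     <- lm(pressure ~ type, data = toy)
group_means <- tapply(toy$pressure, toy$type, mean)

# Intercept = mean of the reference level; other coefficients = differences
coef(fit_toy)
group_means

# Make Tropical Storm the baseline so type2Hurricane is the direct contrast
toy$type2 <- relevel(toy$type, ref = "Tropical Storm")
fit_ref   <- lm(pressure ~ type2, data = toy)
coef(fit_ref)[["type2Hurricane"]]
```

Read this way, the fitted model above agrees with the t-test: 993.954 + 3.743 is about 997.7 for Tropical Storms, and 993.954 - 23.516 is about 970.4 for Hurricanes.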
A Q-Q plot is used to check the normality of the data. In the normal Q-Q plots below, points that fall along a straight line indicate agreement between the observed air pressures and their theoretical quantiles; the plots here were read as roughly linear.
par(mfrow=c(1,1))
qqnorm(sub1$pressure, ylab="Tropical Storms Pressure", ylim=c(900,1020))
qqnorm(sub2$pressure, ylab="Hurricane Pressure", ylim=c(900,1020))
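A reference line makes departures from normality easier to judge than the points alone. The sketch below adds `qqline` to a normal Q-Q plot; the vector `p` is a made-up left-skewed sample (hypothetical, not from `storms`), chosen because pressures with a long low tail would bend away from the line in a similar way.

```r
# Hypothetical left-skewed sample, for illustration only
p <- 1010 - 8 * exp(qnorm(ppoints(60)))

# Sample vs. theoretical quantiles, with a line through the quartiles
qq <- qqnorm(p, ylab = "Pressure (in millibars)")
qqline(p)

# Correlation between the two sets of quantiles; values well below 1
# suggest the points do not follow a straight line
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

The same two calls, `qqnorm` followed by `qqline`, could be applied to `sub1$pressure` and `sub2$pressure`.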
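The Shapiro-Wilk test takes normality as its null hypothesis, so a small p-value is evidence against normality, not for it. The p-values below are less than 2.2e-16 and W = 0.8482 is well below 1, so normality is rejected for the Tropical Storm pressures. Note that both calls below use sub1, so the Hurricane sample (sub2) was not actually tested. With samples this large (926 and 896 observations), the comparison of means is generally less sensitive to non-normality, but the normality assumption itself is not supported.
# Shapiro-Wilk test of normality. Normality is plausible only if p > 0.1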
shapiro.test(sub1$pressure)
##
## Shapiro-Wilk normality test
##
## data: sub1$pressure
## W = 0.8482, p-value < 2.2e-16
shapiro.test(sub1$pressure)
##
## Shapiro-Wilk normality test
##
## data: sub1$pressure
## W = 0.8482, p-value < 2.2e-16
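To make the direction of the decision concrete, the sketch below runs `shapiro.test` on a near-normal sample and on a skewed one. Both samples are made up from `qnorm(ppoints(50))` (hypothetical, not from `storms`): the first should give a large p-value, the second a small one.

```r
# Hypothetical samples, for illustration only
z <- qnorm(ppoints(50))            # evenly spaced normal quantiles
near_normal <- 1000 + 5 * z
skewed      <- 1000 - 5 * exp(z)   # long tail toward low values

sw_normal <- shapiro.test(near_normal)
sw_skewed <- shapiro.test(skewed)

# Large p: no evidence against normality; small p: reject normality
c(near_normal = sw_normal$p.value, skewed = sw_skewed$p.value)

# W close to 1 goes with the large p-value
c(near_normal = unname(sw_normal$statistic),
  skewed      = unname(sw_skewed$statistic))
```

To test the Hurricane sample as well, the second call in the analysis would need to be `shapiro.test(sub2$pressure)`.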
No literature was used.
The data originated from the National Hurricane Center’s archive of Tropical Cyclone Reports (http://www.nhc.noaa.gov/). This dataset was hand-scraped from best track tables in the individual tropical cyclone reports (PDF, HTML and Microsoft Word) by Jon Hobbs and is publicly available at: https://github.com/hadley/nasaweather.
The Tropical Cyclone Reports had a variety of storm type designations and there appeared to be no consistent naming convention for cyclones that were not hurricanes, tropical depressions, or tropical storms. Many of these designations have been combined into the “Extratropical” category in this dataset.