Recipe 1: Example of Descriptive Statistics

Recipes for the Design of Experiments: Recipe Outline

as of August 28, 2014, superseding the version of August 24. Always use the most recent version.

Tropical Storm and Hurricane Pressure Analysis

Caroline Hsia

Rensselaer Polytechnic Institute

September 18, 2014 V1

1. Setting

System under test

This study looks at storm data from the National Hurricane Center. The dataset tracks tropical cyclones through the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico, nominally from 1995 to 2005, although the summary output below shows years only from 1995 through 2000. It includes metadata about each storm: name, year, month, day, hour, latitude, longitude, type, air pressure, maximum wind speed, and day of the hurricane season. This recipe examines the air pressure at the storm’s center (in millibars) for different types of storms.

remove(list=ls())
install.packages("nasaweather", repos='http://cran.us.r-project.org')
## package 'nasaweather' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Caroline\AppData\Local\Temp\RtmpwjOvpn\downloaded_packages
require(nasaweather)
## Loading required package: nasaweather
#library("nasaweather", lib.loc="~/R/win-library/3.0/")
x<-storms
attach(storms)
## The following object is masked from package:datasets:
## 
##     pressure
head(storms)
##      name year month day hour  lat  long pressure wind                type
## 1 Allison 1995     6   3    0 17.4 -84.3     1005   30 Tropical Depression
## 2 Allison 1995     6   3    6 18.3 -84.9     1004   30 Tropical Depression
## 3 Allison 1995     6   3   12 19.3 -85.7     1003   35      Tropical Storm
## 4 Allison 1995     6   3   18 20.6 -85.8     1001   40      Tropical Storm
## 5 Allison 1995     6   4    0 22.0 -86.0      997   50      Tropical Storm
## 6 Allison 1995     6   4    6 23.3 -86.3      995   60      Tropical Storm
##   seasday
## 1       3
## 2       3
## 3       3
## 4       3
## 5       4
## 6       4
str(storms)
## Classes 'tbl_df', 'tbl' and 'data.frame':    2747 obs. of  11 variables:
##  $ name    : chr  "Allison" "Allison" "Allison" "Allison" ...
##  $ year    : int  1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
##  $ month   : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ day     : int  3 3 3 3 4 4 4 4 5 5 ...
##  $ hour    : int  0 6 12 18 0 6 12 18 0 6 ...
##  $ lat     : num  17.4 18.3 19.3 20.6 22 23.3 24.7 26.2 27.6 28.5 ...
##  $ long    : num  -84.3 -84.9 -85.7 -85.8 -86 -86.3 -86.2 -86.2 -86.1 -85.6 ...
##  $ pressure: int  1005 1004 1003 1001 997 995 987 988 988 990 ...
##  $ wind    : int  30 30 35 40 50 60 65 65 65 60 ...
##  $ type    : chr  "Tropical Depression" "Tropical Depression" "Tropical Storm" "Tropical Storm" ...
##  $ seasday : int  3 3 3 3 4 4 4 4 5 5 ...

Factors and Levels

The factor that was used in this analysis was Storm Type. The levels analyzed were Tropical Storms and Hurricanes. The other levels in this factor were Extratropical and Tropical Depression.
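
As a minimal sketch of how the levels of the type factor can be inspected, the snippet below rebuilds a small data frame from the twelve rows printed by head() and tail() in this recipe and tabulates the storm types. The names `demo` and `type_counts` are hypothetical, and the twelve rows are only an excerpt, so the counts do not describe the full dataset (no Hurricane rows happen to appear in the excerpt).

```r
# Excerpt only: the 12 rows shown by head(x) and tail(x), not the full data.
demo <- data.frame(
  name = rep(c("Allison", "Nadine"), each = 6),
  pressure = c(1005, 1004, 1003, 1001, 997, 995,
               1000, 1000, 1000, 1004, 1005, 1005),
  type = c("Tropical Depression", "Tropical Depression",
           "Tropical Storm", "Tropical Storm", "Tropical Storm", "Tropical Storm",
           "Tropical Storm", "Tropical Storm", "Tropical Storm",
           "Extratropical", "Extratropical", "Extratropical"),
  stringsAsFactors = FALSE
)

# Treat storm type as a factor and list its levels (alphabetical by default).
demo$type <- factor(demo$type)
levels(demo$type)

# Count observations per level.
type_counts <- table(demo$type)
type_counts
```

Applied to the full data, the same two calls would be expected to list all four levels, including Hurricane.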

head(x)
##      name year month day hour  lat  long pressure wind                type
## 1 Allison 1995     6   3    0 17.4 -84.3     1005   30 Tropical Depression
## 2 Allison 1995     6   3    6 18.3 -84.9     1004   30 Tropical Depression
## 3 Allison 1995     6   3   12 19.3 -85.7     1003   35      Tropical Storm
## 4 Allison 1995     6   3   18 20.6 -85.8     1001   40      Tropical Storm
## 5 Allison 1995     6   4    0 22.0 -86.0      997   50      Tropical Storm
## 6 Allison 1995     6   4    6 23.3 -86.3      995   60      Tropical Storm
##   seasday
## 1       3
## 2       3
## 3       3
## 4       3
## 5       4
## 6       4
tail(x)
##        name year month day hour  lat  long pressure wind           type
## 2742 Nadine 2000    10  21    6 33.3 -53.5     1000   50 Tropical Storm
## 2743 Nadine 2000    10  21   12 34.1 -52.3     1000   50 Tropical Storm
## 2744 Nadine 2000    10  21   18 34.8 -51.3     1000   45 Tropical Storm
## 2745 Nadine 2000    10  22    0 35.7 -50.5     1004   40  Extratropical
## 2746 Nadine 2000    10  22    6 37.0 -49.0     1005   40  Extratropical
## 2747 Nadine 2000    10  22   12 39.0 -47.0     1005   35  Extratropical
##      seasday
## 2742     143
## 2743     143
## 2744     143
## 2745     144
## 2746     144
## 2747     144
summary(x)
##      name                year          month           day    
##  Length:2747        Min.   :1995   Min.   : 6.0   Min.   : 1  
##  Class :character   1st Qu.:1995   1st Qu.: 8.0   1st Qu.: 9  
##  Mode  :character   Median :1997   Median : 9.0   Median :18  
##                     Mean   :1997   Mean   : 8.8   Mean   :17  
##                     3rd Qu.:1999   3rd Qu.:10.0   3rd Qu.:25  
##                     Max.   :2000   Max.   :12.0   Max.   :31  
##       hour            lat            long           pressure   
##  Min.   : 0.00   Min.   : 8.3   Min.   :-107.3   Min.   : 905  
##  1st Qu.: 3.50   1st Qu.:17.2   1st Qu.: -77.6   1st Qu.: 980  
##  Median :12.00   Median :25.0   Median : -60.9   Median : 995  
##  Mean   : 9.06   Mean   :26.7   Mean   : -60.9   Mean   : 990  
##  3rd Qu.:18.00   3rd Qu.:33.9   3rd Qu.: -45.8   3rd Qu.:1004  
##  Max.   :18.00   Max.   :70.7   Max.   :   1.0   Max.   :1019  
##       wind           type              seasday   
##  Min.   : 15.0   Length:2747        Min.   :  3  
##  1st Qu.: 35.0   Class :character   1st Qu.: 84  
##  Median : 50.0   Mode  :character   Median :103  
##  Mean   : 54.7                      Mean   :103  
##  3rd Qu.: 70.0                      3rd Qu.:125  
##  Max.   :155.0                      Max.   :185

Continuous variables (if any)

The continuous variables in this dataset are longitude, latitude, air pressure, and wind speed.

Response variables

The response variables in this dataset are air pressure and wind speed.

The Data: How is it organized and what does it look like?

The ‘storms’ dataset describes tropical cyclones tracked through the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico, nominally from 1995 to 2005 (the summary above shows years 1995 through 2000). Each row records metadata about a storm: name, year, month, day, hour, latitude, longitude, type, air pressure, maximum wind speed, and day of the hurricane season. The type factor has four levels: Extratropical, Tropical Depression, Hurricane, and Tropical Storm.

Randomization

This data originated from the National Hurricane Center’s archive of Tropical Cyclone Reports and was hand-scraped from the track tables of individual reports. We assume that the data were collected using proper randomization techniques.

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

The data are publicly available for anyone to analyze. This analysis tests whether the air pressure at the center of the storm differs between two types of storms: Tropical Storms and Hurricanes. A two-sample t-test will be performed to determine whether the two means differ.

The null hypothesis that will be tested is:

The mean air pressure for Tropical Storms is equal to the mean air pressure for Hurricanes.
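
Stated formally, and assuming equal variances as in the pooled test run later in this recipe, the hypotheses and test statistic can be written as follows. The subscripts TS and H are labels introduced here for Tropical Storms and Hurricanes.

```latex
\begin{aligned}
H_0 &: \mu_{TS} = \mu_{H}, \qquad H_1 : \mu_{TS} \neq \mu_{H} \\
t &= \frac{\bar{x}_{TS} - \bar{x}_{H}}{s_p \sqrt{\frac{1}{n_{TS}} + \frac{1}{n_{H}}}}, \qquad
s_p^2 = \frac{(n_{TS}-1)\,s_{TS}^2 + (n_{H}-1)\,s_{H}^2}{n_{TS} + n_{H} - 2}
\end{aligned}
```

Under the null hypothesis the statistic follows a t distribution with n_TS + n_H - 2 degrees of freedom. With the 926 Tropical Storm and 896 Hurricane observations summarized in this recipe, that is 1820.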

What is the rationale for this design?

The data were collected simply so that the National Hurricane Center could gather information on the tropical cyclones that traveled through the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico from 1995 to 2005.

Randomize: What is the Randomization Scheme?

The data were collected for record-keeping only, not for a designed experiment, so there is no randomization scheme.

Replicate: Are there replicates and/or repeated measures?

No, there were no replicates or repeated measures.

Block: Did you use blocking in the design?

The original dataset was organized without experimental groups, with measurements recorded based on certain variables. For this analysis, the data from two types of tropical cyclones were used: Hurricanes and Tropical Storms.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

# Data frame containing only the Tropical Storm observations
sub1 <- subset(x, type == 'Tropical Storm')
is.data.frame(sub1)
## [1] TRUE
summary (sub1)
##      name                year          month            day      
##  Length:926         Min.   :1995   Min.   : 6.00   Min.   : 1.0  
##  Class :character   1st Qu.:1995   1st Qu.: 8.00   1st Qu.: 9.0  
##  Mode  :character   Median :1997   Median : 9.00   Median :18.0  
##                     Mean   :1997   Mean   : 8.72   Mean   :16.9  
##                     3rd Qu.:1999   3rd Qu.:10.00   3rd Qu.:24.0  
##                     Max.   :2000   Max.   :12.00   Max.   :31.0  
##       hour            lat            long          pressure   
##  Min.   : 0.00   Min.   :10.2   Min.   :-99.3   Min.   : 935  
##  1st Qu.: 6.00   1st Qu.:16.0   1st Qu.:-76.1   1st Qu.: 994  
##  Median : 6.00   Median :22.0   Median :-60.8   Median :1000  
##  Mean   : 8.97   Mean   :23.9   Mean   :-61.7   Mean   : 998  
##  3rd Qu.:12.00   3rd Qu.:30.5   3rd Qu.:-46.1   3rd Qu.:1004  
##  Max.   :18.00   Max.   :50.7   Max.   :-19.4   Max.   :1013  
##       wind           type              seasday   
##  Min.   : 35.0   Length:926         Min.   :  3  
##  1st Qu.: 40.0   Class :character   1st Qu.: 81  
##  Median : 45.0   Mode  :character   Median :101  
##  Mean   : 47.3                      Mean   :100  
##  3rd Qu.: 55.0                      3rd Qu.:127  
##  Max.   :120.0                      Max.   :184
# histogram of the pressure of all Tropical Storms
hist(sub1$pressure,xlim=c(900,1020),ylim=c(0,400))

Figure: histogram of Tropical Storm pressure.

# Data frame containing only the Hurricane observations
sub2 <- subset(x, type == 'Hurricane')
is.data.frame(sub2)
## [1] TRUE
summary (sub2)
##      name                year          month            day      
##  Length:896         Min.   :1995   Min.   : 6.00   Min.   : 1.0  
##  Class :character   1st Qu.:1995   1st Qu.: 8.00   1st Qu.:12.0  
##  Mode  :character   Median :1998   Median : 9.00   Median :20.0  
##                     Mean   :1997   Mean   : 8.82   Mean   :18.6  
##                     3rd Qu.:1999   3rd Qu.: 9.00   3rd Qu.:26.0  
##                     Max.   :2000   Max.   :12.00   Max.   :31.0  
##       hour            lat            long          pressure   
##  Min.   : 0.00   Min.   :10.5   Min.   :-97.9   Min.   : 905  
##  1st Qu.: 6.00   1st Qu.:18.0   1st Qu.:-73.1   1st Qu.: 961  
##  Median :12.00   Median :24.2   Median :-61.4   Median : 974  
##  Mean   : 9.05   Mean   :25.1   Mean   :-61.9   Mean   : 970  
##  3rd Qu.:18.00   3rd Qu.:31.3   3rd Qu.:-49.1   3rd Qu.: 984  
##  Max.   :18.00   Max.   :48.3   Max.   :-25.2   Max.   :1005  
##       wind           type              seasday   
##  Min.   : 65.0   Length:896         Min.   :  4  
##  1st Qu.: 70.0   Class :character   1st Qu.: 88  
##  Median : 80.0   Mode  :character   Median :102  
##  Mean   : 84.7                      Mean   :105  
##  3rd Qu.: 95.0                      3rd Qu.:120  
##  Max.   :155.0                      Max.   :184
# histogram of the pressure of all Hurricanes
hist(sub2$pressure,xlim=c(900,1020),ylim=c(0,400))

Figure: histogram of Hurricane pressure.

par(mfrow=c(2,1))
hist(sub1$pressure,xlim=c(900,1020),ylim=c(0,400))
hist(sub2$pressure,xlim=c(900,1020),ylim=c(0,400))

Figure: stacked histograms of Tropical Storm pressure (top) and Hurricane pressure (bottom).

# Boxplots show the same data from a different view.
par(mfrow=c(1,2))
boxplot(sub1$pressure, ylim = c(900,1020), main = "Tropical Storm pressure", ylab = "Pressure (in millibars)")
boxplot(sub2$pressure, ylim = c(900,1020), main = "Hurricane pressure", ylab = "Pressure (in millibars)")

Figure: boxplots of Tropical Storm pressure and Hurricane pressure.
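
As an alternative sketch, the formula interface of boxplot() can draw both groups in one panel on a shared axis, which makes the comparison more direct. The data below are simulated, not the real storms measurements: the group sizes (926 and 896) and approximate means (998 and 970) come from the summaries above, while the standard deviations and the names `sim` and `bp` are placeholder assumptions.

```r
set.seed(1)  # reproducible simulated data; NOT the real storms measurements
sim <- data.frame(
  type = rep(c("Tropical Storm", "Hurricane"), times = c(926, 896)),
  pressure = c(rnorm(926, mean = 998, sd = 10),   # sd is a placeholder
               rnorm(896, mean = 970, sd = 15))   # sd is a placeholder
)

# One call draws both groups on a shared axis, which makes them comparable.
bp <- boxplot(pressure ~ type, data = sim,
              main = "Pressure by storm type (simulated)",
              ylab = "Pressure (in millibars)")

# The returned list holds the group names and the five statistics per box.
bp$names
bp$stats
```

With the real data, the equivalent call would use the rows of x whose type is Tropical Storm or Hurricane in place of `sim`.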

Testing

A two sample t-test will be performed to test the above hypothesis.

t.test(sub1$pressure, sub2$pressure,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  sub1$pressure and sub2$pressure
## t = 43.12, df = 1820, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  26.02 28.50
## sample estimates:
## mean of x mean of y 
##     997.7     970.4

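From the t-test (t = 43.12, df = 1820, p-value < 2.2e-16), we reject the null hypothesis that the mean air pressure of Tropical Storms is equal to the mean air pressure of Hurricanes. The sample means are 997.7 and 970.4 millibars, and the 95 percent confidence interval for the difference is 26.02 to 28.50 millibars, so Hurricanes have the lower mean central pressure.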
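
To make the pooled test less of a black box, the sketch below computes the t statistic by hand and compares it with t.test(), then runs Welch's test (which drops the equal-variance assumption) as a robustness check. The vectors `ts_p` and `hur_p` are simulated stand-ins for sub1$pressure and sub2$pressure: the sizes and means follow the output above, but the standard deviations are placeholder assumptions, so the numbers will not match the real results.

```r
set.seed(42)  # simulated stand-ins for sub1$pressure and sub2$pressure
ts_p  <- rnorm(926, mean = 997.7, sd = 10)  # sd is a placeholder assumption
hur_p <- rnorm(896, mean = 970.4, sd = 15)  # sd is a placeholder assumption

n1 <- length(ts_p)
n2 <- length(hur_p)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(ts_p) + (n2 - 1) * var(hur_p)) / (n1 + n2 - 2)

# Pooled two-sample t statistic and its degrees of freedom.
t_manual  <- (mean(ts_p) - mean(hur_p)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2

# Built-in equivalents: pooled test, and Welch's test as a robustness check.
pooled <- t.test(ts_p, hur_p, var.equal = TRUE)
welch  <- t.test(ts_p, hur_p, var.equal = FALSE)

c(manual = t_manual, builtin = unname(pooled$statistic))
c(pooled_df = df_manual, welch_df = unname(welch$parameter))
```

If the pooled and Welch results lead to the same conclusion, the equal-variance assumption is not driving the result.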

Estimation (of Parameters)

Next, the model parameters are estimated. A linear regression of air pressure on storm type is fitted to estimate how the air pressure at the center of the storm differs by type. Because type is categorical, each coefficient is a difference in mean pressure relative to the baseline level, Extratropical.

fit <- lm(pressure~type, x)
summary(fit)
## 
## Call:
## lm(formula = pressure ~ type, data = x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -65.44  -4.70   1.56   7.30  34.56 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              993.954      0.611  1627.1  < 2e-16 ***
## typeHurricane            -23.516      0.738   -31.9  < 2e-16 ***
## typeTropical Depression   12.202      0.820    14.9  < 2e-16 ***
## typeTropical Storm         3.743      0.734     5.1  3.7e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.4 on 2743 degrees of freedom
## Multiple R-squared:  0.561,  Adjusted R-squared:  0.56 
## F-statistic: 1.17e+03 on 3 and 2743 DF,  p-value: <2e-16
summary(sub1$pressure)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     935     994    1000     998    1000    1010
summary(sub2$pressure)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     905     961     974     970     984    1000
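
The regression coefficients above can be read as group means: the intercept is the mean of the baseline level, and intercept plus coefficient gives the mean of each other level. In the output above, 993.954 - 23.516 is about 970.4 and 993.954 + 3.743 is about 997.7, matching the two means reported by the t-test. The sketch below checks this identity on simulated data. The group means and standard deviation are placeholders loosely based on the fitted values above, and `sim`, `fit_sim`, and the other names are hypothetical.

```r
set.seed(7)  # simulated data with the four storm types; values are placeholders
types <- c("Extratropical", "Hurricane", "Tropical Depression", "Tropical Storm")
sim <- data.frame(
  type = rep(types, each = 50),
  pressure = rnorm(200, mean = rep(c(994, 970, 1006, 998), each = 50), sd = 12)
)

fit_sim <- lm(pressure ~ type, data = sim)
b <- coef(fit_sim)

# The intercept is the mean of the baseline level (first alphabetically).
baseline_mean <- unname(b["(Intercept)"])

# Intercept + coefficient reproduces another group's sample mean.
hurricane_mean <- unname(b["(Intercept)"] + b["typeHurricane"])

# Compare with the sample means computed directly from the data.
group_means <- tapply(sim$pressure, sim$type, mean)
c(from_lm = hurricane_mean, from_data = unname(group_means["Hurricane"]))
```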

Diagnostics/Model Adequacy Checking

A Q-Q plot is used to check the normality of the data. In the plots below, the normal Q-Q plots show a roughly linear relationship between the observed air pressures and their theoretical quantiles.

par(mfrow=c(1,1))
qqnorm(sub1$pressure, ylab="Tropical Storms Pressure", ylim=c(900,1020))

Figure: normal Q-Q plot of Tropical Storm pressure.

qqnorm(sub2$pressure, ylab="Hurricane Pressure", ylim=c(900,1020))

Figure: normal Q-Q plot of Hurricane pressure.

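The Shapiro-Wilk test takes normality as its null hypothesis, so a small p-value is evidence against normality. The p-value returned below is less than 2.2e-16, so the hypothesis that Tropical Storm pressure is normally distributed is rejected. Note that both calls below test sub1$pressure, so the Hurricane sample (sub2$pressure) is not tested in this output. With samples this large the t-test is generally robust to moderate departures from normality, but its result should be read with this caveat in mind.

# Shapiro-Wilk test of normality.  Normality is plausible only if p > 0.1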
shapiro.test(sub1$pressure)
## 
##  Shapiro-Wilk normality test
## 
## data:  sub1$pressure
## W = 0.8482, p-value < 2.2e-16
shapiro.test(sub1$pressure)
## 
##  Shapiro-Wilk normality test
## 
## data:  sub1$pressure
## W = 0.8482, p-value < 2.2e-16
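
One way to avoid testing the same sample twice is to loop over a named list of groups. The sketch below does this on simulated stand-ins, not the real storms measurements: the left-skewed Tropical Storm sample and the roughly normal Hurricane sample are placeholder assumptions chosen only to show both outcomes, and `groups`, `results`, and `normality` are hypothetical names.

```r
set.seed(3)  # simulated stand-ins; NOT the real storms measurements
groups <- list(
  "Tropical Storm" = 1013 - rexp(926, rate = 1 / 15),  # left-skewed placeholder
  "Hurricane"      = rnorm(896, mean = 970, sd = 15)   # roughly normal placeholder
)

# Run the test once per group so that neither sample is skipped.
results <- lapply(groups, shapiro.test)

# Collect W and the p-value; a small p-value is evidence AGAINST normality.
normality <- data.frame(
  group   = names(results),
  W       = sapply(results, function(r) unname(r$statistic)),
  p_value = sapply(results, function(r) r$p.value),
  row.names = NULL
)
normality
```

With the real data, the list would hold sub1$pressure and sub2$pressure instead of the simulated vectors.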

4. References to the literature

No literature was used.

5. Appendices

A summary of, or pointer to, the raw data

Complete and documented R code

The data originated from the National Hurricane Center’s archive of Tropical Cyclone Reports (http://www.nhc.noaa.gov/). This dataset was hand-scraped from best track tables in the individual tropical cyclone reports (PDF, HTML and Microsoft Word) by Jon Hobbs and is publicly available at: https://github.com/hadley/nasaweather.

The Tropical Cyclone Reports had a variety of storm type designations and there appeared to be no consistent naming convention for cyclones that were not hurricanes, tropical depressions, or tropical storms. Many of these designations have been combined into the “Extratropical” category in this dataset.