For this recipe, the storms data from the nasaweather package will be examined. Specifically, the data will be subsetted to only the midnight (hour=0) and noon (hour=12) observations, and the wind speed at these two times will be compared.
To access the package and nasaweather datasets:
library(nasaweather)
Save storms data to workspace and create subsets:
storm<-storms
View(storm)  # capital V: view() does not exist in base R
AM<-subset(storm, hour==0)
NOON<-subset(storm, hour==12)
Note: throughout this recipe, the AM subset refers to midnight (hour 0) and the NOON subset refers to noon (hour 12).
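The subsetting step can be tried without the package installed. The following is a minimal sketch that rebuilds the first six rows shown by head(storm) in a hypothetical data frame called storm_demo (the names storm_demo, AM_demo, NOON_demo and both_demo are illustrative, not part of the original analysis) and then keeps only the midnight and noon rows:

```r
# First six rows of the storms data, copied from the head(storm) output.
# storm_demo is a stand-in for the full dataset, used only for illustration.
storm_demo <- data.frame(
  name = "Allison",
  day  = c(3, 3, 3, 3, 4, 4),
  hour = c(0, 6, 12, 18, 0, 6),
  wind = c(30, 30, 35, 40, 50, 60)
)

# hour is numeric, so compare against numbers rather than strings
AM_demo   <- subset(storm_demo, hour == 0)
NOON_demo <- subset(storm_demo, hour == 12)

# Alternative: keep both levels in one data frame with a labelled factor
both_demo <- subset(storm_demo, hour %in% c(0, 12))
both_demo$time <- factor(both_demo$hour, levels = c(0, 12),
                         labels = c("AM", "NOON"))
print(both_demo)
```

Keeping both levels in one data frame is a design choice that makes grouped summaries and formula-based tests easier later on; the two separate subsets used in this recipe work equally well.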
The factor in this experiment is hour, with 0 and 12 as the levels being examined. Other factors in the storms dataset include year (levels = 1995-2000), month (levels = 6-12), name (levels = Allison-Nadine), and type (levels = Tropical Depression, Tropical Storm, Hurricane, or Extratropical).
First and last 6 observations for the midnight and noon observations
head(AM)
## name year month day hour lat long pressure wind
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30
## 5 Allison 1995 6 4 0 22.0 -86.0 997 50
## 9 Allison 1995 6 5 0 27.6 -86.1 988 65
## 13 Allison 1995 6 6 0 31.8 -82.8 993 30
## 17 Allison 1995 6 7 0 35.6 -75.9 992 40
## 21 Allison 1995 6 8 0 41.0 -67.7 982 50
## type seasday
## 1 Tropical Depression 3
## 5 Tropical Storm 4
## 9 Hurricane 5
## 13 Tropical Depression 6
## 17 Extratropical 7
## 21 Extratropical 8
tail(AM)
## name year month day hour lat long pressure wind
## 2723 Michael 2000 10 18 0 30.4 -70.9 988 65
## 2727 Michael 2000 10 19 0 34.2 -67.8 983 75
## 2731 Michael 2000 10 20 0 48.0 -56.5 966 75
## 2737 Nadine 2000 10 20 0 28.7 -58.8 1008 30
## 2741 Nadine 2000 10 21 0 32.4 -55.2 999 50
## 2745 Nadine 2000 10 22 0 35.7 -50.5 1004 40
## type seasday
## 2723 Hurricane 140
## 2727 Hurricane 141
## 2731 Extratropical 142
## 2737 Tropical Depression 142
## 2741 Tropical Storm 143
## 2745 Extratropical 144
head(NOON)
## name year month day hour lat long pressure wind type
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical Storm
## 7 Allison 1995 6 4 12 24.7 -86.2 987 65 Hurricane
## 11 Allison 1995 6 5 12 29.6 -84.7 990 60 Tropical Storm
## 15 Allison 1995 6 6 12 33.6 -80.0 995 35 Extratropical
## 19 Allison 1995 6 7 12 38.5 -71.0 988 45 Extratropical
## 23 Allison 1995 6 8 12 43.8 -63.7 989 50 Extratropical
## seasday
## 3 3
## 7 4
## 11 5
## 15 6
## 19 7
## 23 8
tail(NOON)
## name year month day hour lat long pressure wind
## 2729 Michael 2000 10 19 12 39.8 -61.6 979 75
## 2733 Michael 2000 10 20 12 51.0 -53.5 968 65
## 2735 Nadine 2000 10 19 12 26.2 -59.9 1009 25
## 2739 Nadine 2000 10 20 12 30.4 -57.2 1003 35
## 2743 Nadine 2000 10 21 12 34.1 -52.3 1000 50
## 2747 Nadine 2000 10 22 12 39.0 -47.0 1005 35
## type seasday
## 2729 Hurricane 141
## 2733 Extratropical 142
## 2735 Tropical Depression 141
## 2739 Tropical Storm 142
## 2743 Tropical Storm 143
## 2747 Extratropical 144
The continuous variable that will be examined is wind, the storm's maximum sustained wind speed measured in knots. Other continuous variables in the dataset are day, latitude and longitude, pressure, and day of the hurricane season.
The response variable for this recipe will be the wind speed. If further analysis were done, one of the other continuous variables could be considered the response.
The storms dataset contains 2747 observations of 11 variables/factors. The data is organized chronologically. The subsets (AM and NOON) each have at most one observation for every day a storm existed (686 midnight and 691 noon observations). After a storm ends, the next row of data corresponds to the first day of the next storm.
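The layout described here can be checked by counting rows per storm and day. The following is a rough sketch on a hypothetical data frame storm_demo that copies the six rows shown by head(storm); the object names are assumptions made for illustration only:

```r
# Six rows copied from the head(storm) output (illustrative stand-in)
storm_demo <- data.frame(
  name = "Allison",
  day  = c(3, 3, 3, 3, 4, 4),
  hour = c(0, 6, 12, 18, 0, 6),
  wind = c(30, 30, 35, 40, 50, 60)
)

# Number of observations recorded for each storm on each day
obs_per_day <- aggregate(hour ~ name + day, data = storm_demo, FUN = length)
names(obs_per_day)[3] <- "n_obs"
print(obs_per_day)

# In the midnight subset, each storm-day should appear at most once
am_per_day <- aggregate(hour ~ name + day,
                        data = subset(storm_demo, hour == 0), FUN = length)
all(am_per_day$hour <= 1)
```

On the full dataset the same calls could be run with storm in place of storm_demo, adding year and month to the grouping formula so that days from different storms are not merged.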
Structure of the storms dataset:
str(storm)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2747 obs. of 11 variables:
## $ name : chr "Allison" "Allison" "Allison" "Allison" ...
## $ year : int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
## $ month : int 6 6 6 6 6 6 6 6 6 6 ...
## $ day : int 3 3 3 3 4 4 4 4 5 5 ...
## $ hour : int 0 6 12 18 0 6 12 18 0 6 ...
## $ lat : num 17.4 18.3 19.3 20.6 22 23.3 24.7 26.2 27.6 28.5 ...
## $ long : num -84.3 -84.9 -85.7 -85.8 -86 -86.3 -86.2 -86.2 -86.1 -85.6 ...
## $ pressure: int 1005 1004 1003 1001 997 995 987 988 988 990 ...
## $ wind : int 30 30 35 40 50 60 65 65 65 60 ...
## $ type : chr "Tropical Depression" "Tropical Depression" "Tropical Storm" "Tropical Storm" ...
## $ seasday : int 3 3 3 3 4 4 4 4 5 5 ...
head(storm)
## name year month day hour lat long pressure wind type
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30 Tropical Depression
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30 Tropical Depression
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical Storm
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40 Tropical Storm
## 5 Allison 1995 6 4 0 22.0 -86.0 997 50 Tropical Storm
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60 Tropical Storm
## seasday
## 1 3
## 2 3
## 3 3
## 4 3
## 5 4
## 6 4
tail(storm)
## name year month day hour lat long pressure wind type
## 2742 Nadine 2000 10 21 6 33.3 -53.5 1000 50 Tropical Storm
## 2743 Nadine 2000 10 21 12 34.1 -52.3 1000 50 Tropical Storm
## 2744 Nadine 2000 10 21 18 34.8 -51.3 1000 45 Tropical Storm
## 2745 Nadine 2000 10 22 0 35.7 -50.5 1004 40 Extratropical
## 2746 Nadine 2000 10 22 6 37.0 -49.0 1005 40 Extratropical
## 2747 Nadine 2000 10 22 12 39.0 -47.0 1005 35 Extratropical
## seasday
## 2742 143
## 2743 143
## 2744 143
## 2745 144
## 2746 144
## 2747 144
For more information about the storms dataset:
?storms
## starting httpd help server ... done
The null hypothesis is that there is no significant difference in mean wind speed between the midnight and noon observations of the storms. The alternative hypothesis is that there is a significant difference in mean wind speed between the midnight and noon observations.
This will be tested by comparing the two subsets of data that were created in section 1 (AM and NOON). Once the necessary tests are performed, the assumptions will be checked to ensure a valid conclusion is reached.
The midnight (hour=0) and noon (hour=12) observations were chosen as the subsets because they fall at opposite times of the day, and the goal is to determine whether the wind speed is significantly different between these two times; in other words, does the time of day cause the wind speed of storms to change?
The data collected in the storms data set is not random, but collected every 6 hours that the particular storm was at tropical storm status or higher up until it fell back down below this status.
There are no replicates, as the observations are of natural occurrences that cannot be replicated by the experimenters. There are repeated measures, as all storms that were "named" from 1995 until 2000 were recorded at the same time intervals.
Blocking was not required in creating the storms dataset.
Summary Statistics for the entire storms dataset (all hours):
summary(storm)
## name year month day
## Length:2747 Min. :1995 Min. : 6.0 Min. : 1
## Class :character 1st Qu.:1995 1st Qu.: 8.0 1st Qu.: 9
## Mode :character Median :1997 Median : 9.0 Median :18
## Mean :1997 Mean : 8.8 Mean :17
## 3rd Qu.:1999 3rd Qu.:10.0 3rd Qu.:25
## Max. :2000 Max. :12.0 Max. :31
## hour lat long pressure
## Min. : 0.00 Min. : 8.3 Min. :-107.3 Min. : 905
## 1st Qu.: 3.50 1st Qu.:17.2 1st Qu.: -77.6 1st Qu.: 980
## Median :12.00 Median :25.0 Median : -60.9 Median : 995
## Mean : 9.06 Mean :26.7 Mean : -60.9 Mean : 990
## 3rd Qu.:18.00 3rd Qu.:33.9 3rd Qu.: -45.8 3rd Qu.:1004
## Max. :18.00 Max. :70.7 Max. : 1.0 Max. :1019
## wind type seasday
## Min. : 15.0 Length:2747 Min. : 3
## 1st Qu.: 35.0 Class :character 1st Qu.: 84
## Median : 50.0 Mode :character Median :103
## Mean : 54.7 Mean :103
## 3rd Qu.: 70.0 3rd Qu.:125
## Max. :155.0 Max. :185
Summary Statistics for each time subset (midnight and noon):
summary(AM)
## name year month day hour
## Length:686 Min. :1995 Min. : 6.0 Min. : 1 Min. :0
## Class :character 1st Qu.:1995 1st Qu.: 8.0 1st Qu.: 9 1st Qu.:0
## Mode :character Median :1997 Median : 9.0 Median :18 Median :0
## Mean :1997 Mean : 8.8 Mean :17 Mean :0
## 3rd Qu.:1999 3rd Qu.:10.0 3rd Qu.:25 3rd Qu.:0
## Max. :2000 Max. :12.0 Max. :31 Max. :0
## lat long pressure wind
## Min. : 8.4 Min. :-107.3 Min. : 910 Min. : 15.0
## 1st Qu.:17.1 1st Qu.: -77.2 1st Qu.: 980 1st Qu.: 35.0
## Median :24.7 Median : -60.7 Median : 995 Median : 50.0
## Mean :26.6 Mean : -60.8 Mean : 990 Mean : 54.6
## 3rd Qu.:34.0 3rd Qu.: -45.8 3rd Qu.:1004 3rd Qu.: 70.0
## Max. :69.0 Max. : 1.0 Max. :1017 Max. :155.0
## type seasday
## Length:686 Min. : 3
## Class :character 1st Qu.: 85
## Mode :character Median :103
## Mean :103
## 3rd Qu.:125
## Max. :185
summary(NOON)
## name year month day
## Length:691 Min. :1995 Min. : 6.00 Min. : 1
## Class :character 1st Qu.:1995 1st Qu.: 8.00 1st Qu.: 9
## Mode :character Median :1997 Median : 9.00 Median :18
## Mean :1997 Mean : 8.81 Mean :17
## 3rd Qu.:1999 3rd Qu.:10.00 3rd Qu.:25
## Max. :2000 Max. :12.00 Max. :31
## hour lat long pressure
## Min. :12 Min. : 8.6 Min. :-104.0 Min. : 914
## 1st Qu.:12 1st Qu.:17.3 1st Qu.: -77.5 1st Qu.: 980
## Median :12 Median :25.0 Median : -60.9 Median : 995
## Mean :12 Mean :26.7 Mean : -60.8 Mean : 990
## 3rd Qu.:12 3rd Qu.:33.9 3rd Qu.: -45.8 3rd Qu.:1004
## Max. :12 Max. :65.5 Max. : -4.4 Max. :1019
## wind type seasday
## Min. : 15.0 Length:691 Min. : 3
## 1st Qu.: 35.0 Class :character 1st Qu.: 85
## Median : 50.0 Mode :character Median :103
## Mean : 54.5 Mean :103
## 3rd Qu.: 70.0 3rd Qu.:125
## Max. :150.0 Max. :185
Mean wind speed for each time:
mean(AM$wind)
## [1] 54.63
mean(NOON$wind)
## [1] 54.54
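The two group means can also be computed in a single call. The sketch below uses only the 12 midnight and 12 noon wind speeds that are visible in the head() and tail() output above, so its numbers differ from the full-data means; the names am_wind, noon_wind and wind_demo are hypothetical:

```r
# Wind speeds (knots) copied from the head()/tail() output of AM and NOON.
# These 12 values per group are a small illustrative sample, not the full data.
am_wind   <- c(30, 50, 65, 30, 40, 50, 65, 75, 75, 30, 50, 40)
noon_wind <- c(35, 65, 60, 35, 45, 50, 75, 65, 25, 35, 50, 35)

wind_demo <- data.frame(
  wind = c(am_wind, noon_wind),
  time = factor(rep(c("AM", "NOON"),
                    times = c(length(am_wind), length(noon_wind))))
)

# Mean and standard deviation of wind speed for each time of day
group_means <- tapply(wind_demo$wind, wind_demo$time, mean)
group_sds   <- tapply(wind_demo$wind, wind_demo$time, sd)
print(group_means)
print(group_sds)
```

With the full data, tapply(storm$wind, storm$hour, mean) would give the mean for all four observation hours at once.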
Boxplots:
par(mfrow=c(1,2))
boxplot(AM$wind, main="AM Wind Speed")
boxplot(NOON$wind, main="NOON Wind Speed")
Initial Analysis: From the graphs and descriptive summary, the wind speeds of the two groups (AM and NOON) appear very similar but tests should be done to see if the differences are significant.
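Drawing both groups on a shared axis can make the comparison easier to read than two separate panels. The following sketch assumes a hypothetical data frame wind_demo built from the 12 midnight and 12 noon wind speeds visible in the head() and tail() output above, so it illustrates the call rather than reproducing the plots for the full data:

```r
# Wind speeds (knots) copied from the head()/tail() output of AM and NOON
am_wind   <- c(30, 50, 65, 30, 40, 50, 65, 75, 75, 30, 50, 40)
noon_wind <- c(35, 65, 60, 35, 45, 50, 75, 65, 25, 35, 50, 35)
wind_demo <- data.frame(
  wind = c(am_wind, noon_wind),
  time = factor(rep(c("AM", "NOON"), each = 12))
)

# Side-by-side boxplots on one axis using the formula interface
boxplot(wind ~ time, data = wind_demo,
        ylab = "Wind speed (knots)", main = "Wind Speed by Time of Day")

# The same call with plot = FALSE returns the five-number summaries
bp <- boxplot(wind ~ time, data = wind_demo, plot = FALSE)
print(bp$stats)  # rows: lower whisker, Q1, median, Q3, upper whisker
```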
The hypothesis will first be tested with a t-test. The null hypothesis for this test is that there will be no difference in means between the two groups being tested.
T-Test:
t.test(AM$wind, NOON$wind)
##
## Welch Two Sample t-test
##
## data: AM$wind and NOON$wind
## t = 0.0656, df = 1375, p-value = 0.9477
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.642 2.825
## sample estimates:
## mean of x mean of y
## 54.63 54.54
Since the p-value is large and greater than the significance level (.05), we fail to reject the null hypothesis.
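To see where the numbers in the Welch output come from, the statistic, degrees of freedom and p-value can be computed by hand and compared with t.test(). The sketch below uses only the 12 midnight and 12 noon wind speeds visible in the head() and tail() output above (hypothetical vectors am_wind and noon_wind), so its values will not match the full-data result:

```r
# Wind speeds (knots) copied from the head()/tail() output of AM and NOON
am_wind   <- c(30, 50, 65, 30, 40, 50, 65, 75, 75, 30, 50, 40)
noon_wind <- c(35, 65, 60, 35, 45, 50, 75, 65, 25, 35, 50, 35)

nx <- length(am_wind)
ny <- length(noon_wind)

# Squared standard errors of the two sample means
vx <- var(am_wind) / nx
vy <- var(noon_wind) / ny

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(am_wind) - mean(noon_wind)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare with the built-in test (Welch is the default, var.equal = FALSE)
fit <- t.test(am_wind, noon_wind)
c(manual = t_stat, builtin = unname(fit$statistic))
```

Because t.test() does not assume equal variances by default, the degrees of freedom are generally not a whole number.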
Visually inspect normality of data:
par(mfrow=c(1,1))
qqnorm(storm$wind)
qqline(storm$wind)
Check that the data follows the normality assumption of the t-test with a Shapiro-Wilk normality test:
shapiro.test(AM$wind)
##
## Shapiro-Wilk normality test
##
## data: AM$wind
## W = 0.9269, p-value < 2.2e-16
shapiro.test(NOON$wind)
##
## Shapiro-Wilk normality test
##
## data: NOON$wind
## W = 0.9263, p-value < 2.2e-16
We reject the null hypothesis that the data is normal since the p-values are less than the significance level. This violates the assumptions of the t-test used above. See section 5: Contingencies.
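The normality check can be applied to both groups in one step. This is a minimal sketch on a hypothetical data frame wind_demo holding the 12 midnight and 12 noon wind speeds visible in the head() and tail() output above; with so few values the p-values will not resemble those from the full subsets:

```r
# Wind speeds (knots) copied from the head()/tail() output of AM and NOON
am_wind   <- c(30, 50, 65, 30, 40, 50, 65, 75, 75, 30, 50, 40)
noon_wind <- c(35, 65, 60, 35, 45, 50, 75, 65, 25, 35, 50, 35)
wind_demo <- data.frame(
  wind = c(am_wind, noon_wind),
  time = factor(rep(c("AM", "NOON"), each = 12))
)

# Run shapiro.test() on each group and collect the p-values
shapiro_p <- sapply(split(wind_demo$wind, wind_demo$time),
                    function(w) shapiro.test(w)$p.value)
print(shapiro_p)

# Flag groups where normality is rejected at the 0.05 level
shapiro_p < 0.05
```

Note that shapiro.test() accepts between 3 and 5000 observations, which covers both the illustrative sample and the full AM and NOON subsets.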
None used.
Since the normality assumption of the t-test run in section 3 was violated, a different test must be performed to truly test the difference in wind speeds at the two times. The Wilcoxon rank-sum test does not assume normality and therefore can be used:
wilcox.test(AM$wind, NOON$wind)
##
## Wilcoxon rank sum test with continuity correction
##
## data: AM$wind and NOON$wind
## W = 237709, p-value = 0.9247
## alternative hypothesis: true location shift is not equal to 0
The p-value is greater than the significance level so we fail to reject the null hypothesis that there is no true difference between the two groups.
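The W statistic reported by wilcox.test() is the sum of the ranks of the first sample in the pooled data, minus its smallest possible value. The sketch below computes it by hand on the 12 midnight and 12 noon wind speeds visible in the head() and tail() output above (hypothetical vectors am_wind and noon_wind), so the value differs from the full-data W:

```r
# Wind speeds (knots) copied from the head()/tail() output of AM and NOON
am_wind   <- c(30, 50, 65, 30, 40, 50, 65, 75, 75, 30, 50, 40)
noon_wind <- c(35, 65, 60, 35, 45, 50, 75, 65, 25, 35, 50, 35)

nx <- length(am_wind)

# Rank all observations together; tied values receive their average rank
pooled_ranks <- rank(c(am_wind, noon_wind))

# W = rank sum of the first sample minus its minimum possible value
w_stat <- sum(pooled_ranks[seq_len(nx)]) - nx * (nx + 1) / 2

# exact = FALSE requests the normal approximation, which is what R falls
# back to when the data contain ties (as wind speeds in 5-knot steps do)
fit <- wilcox.test(am_wind, noon_wind, exact = FALSE)
c(manual = w_stat, builtin = unname(fit$statistic))
```

If the two groups were identically distributed, W would be expected to fall near half the product of the two sample sizes.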
We cannot conclude that the time of day causes a difference in the wind speed observed for storms.
https://github.com/hadley/nasaweather
All included above.