- Yue Wu
- YinChia Huang
- Francesco Ignazio Re
The purpose of this project is to exercise, extend, and demonstrate our ability to use the ideas and tools of the Tidyverse. We were assigned data from NOAA National Data Buoy Center station 46035, located at 57.026 N 177.738 W. Our goal is to obtain, clean, organize, and explore the data.
We focused only on Air Temperature and Sea Temperature from 1988 to 2017. Data for 2012 and 2013 are missing, so we substituted data from the nearby buoy 46070, assuming that temperatures at the two stations are close. A few values are also missing in other years. Every missing value is coded as 99.0 or 999.0; we deleted these records, because otherwise they would be treated as outliers.
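The cleaning step described above can be sketched as follows. This is a minimal illustration in base R on a made-up four-row table, not the code used for the report; the `hh` column and the object names `buoy` and `buoy_clean` are assumptions, while `ATMP`, `WTMP`, and the codes 99.0 and 999.0 come from the text.

```r
# Hypothetical miniature of the buoy table (values invented for illustration).
buoy <- data.frame(
  YYYY = c(1988, 1988, 1988, 1988),
  MM   = c(1, 1, 1, 1),
  DD   = c(1, 2, 3, 4),
  hh   = c(12, 12, 12, 12),
  ATMP = c(-1.2, 999.0, 0.4, 1.1),
  WTMP = c(3.1, 3.0, 999.0, 2.9)
)

# Sentinel codes that mark a missing reading, as stated in the text.
missing_codes <- c(99.0, 999.0)

# Turn sentinel codes into NA so they cannot distort means or plots.
for (col in c("ATMP", "WTMP")) {
  buoy[[col]][buoy[[col]] %in% missing_codes] <- NA
}

# Keep only rows where both temperatures were actually recorded.
buoy_clean <- buoy[complete.cases(buoy[, c("ATMP", "WTMP")]), ]
```

Converting to `NA` first, rather than filtering on the raw codes, keeps the missing readings visible in `summary()` output before any rows are dropped.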
##
##  Welch Two Sample t-test
##
## data:  MR_1988VS2017_DailyNoon["ATMP1988"] and MR_1988VS2017_DailyNoon["ATMP2017"]
## t = -7.4843, df = 701.29, p-value = 2.162e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.012741 -1.760561
## sample estimates:
## mean of x mean of y
##  2.338997  4.725648
Since p = 2.162e-13 < 0.05, we reject the null hypothesis that daily Air Temperature recorded at noon in 1988 and in 2017 has the same mean.
##
##  Welch Two Sample t-test
##
## data:  MR_1988VS2017_DailyNoon["WTMP1988"] and MR_1988VS2017_DailyNoon["WTMP2017"]
## t = -7.2687, df = 695.59, p-value = 9.78e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.999515 -1.149048
## sample estimates:
## mean of x mean of y
##  4.572981  6.147262
Since p = 9.78e-13 < 0.05, we reject the null hypothesis that daily Water Temperature recorded at noon in 1988 and in 2017 has the same mean.
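For readers who want to reproduce this kind of comparison, here is a minimal sketch of a Welch two-sample t-test in R. The two vectors are synthetic stand-ins drawn with `rnorm()`, not the buoy readings, and the names `temp_a`, `temp_b`, and `res` are illustrative only.

```r
set.seed(1)

# Synthetic stand-ins for two years of daily noon readings (not the buoy data).
temp_a <- rnorm(365, mean = 2.3, sd = 4)
temp_b <- rnorm(365, mean = 4.7, sd = 4)

# t.test() defaults to var.equal = FALSE, which is the Welch test.
res <- t.test(temp_a, temp_b)

res$method          # name of the test that was run
res$p.value         # exact p-value; small, but never literally zero
res$conf.int        # 95% confidence interval for mean(temp_a) - mean(temp_b)
res$p.value < 0.05  # decision at the 5% level
```

Reading `res$p.value` directly avoids rounding a very small p-value to 0 when reporting the result.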
Method 1: Test the difference based on the choice of time of day. We randomly chose 20:00 to compare against noon.
##
##  Welch Two Sample t-test
##
## data:  MR_NoonVS20["ATMP12"] and MR_NoonVS20["ATMP20"]
## t = -0.98177, df = 17132, p-value = 0.3262
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.19661040  0.06538341
## sample estimates:
## mean of x mean of y
##  3.107325  3.172939
Since p = 0.3262 > 0.05, we cannot reject the null hypothesis that the daily noon and 20:00 air temperatures have the same mean.
##
##  Welch Two Sample t-test
##
## data:  MR_NoonVS20["WTMP12"] and MR_NoonVS20["WTMP20"]
## t = -0.28812, df = 17142, p-value = 0.7733
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.10051075  0.07474858
## sample estimates:
## mean of x mean of y
##  5.037255  5.050136
Since p = 0.7733 > 0.05, we cannot reject the null hypothesis that the daily noon and 20:00 water temperatures have the same mean.
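A wide table like `MR_NoonVS20`, with one row per day and one temperature column per hour, could be assembled roughly as sketched below. The code that built the original table is not shown in this report, so this is only an assumed reconstruction in base R: the helper `pick_hour()`, the `hh` column, and the toy values are hypothetical, while the join keys `YYYY`, `MM`, `DD` and the column naming pattern follow the output above.

```r
# Hypothetical hourly table: two days, readings at 12:00 and 20:00.
hourly <- data.frame(
  YYYY = 1988, MM = 1, DD = c(1, 1, 2, 2),
  hh   = c(12, 20, 12, 20),
  ATMP = c(0.5, 0.7, -1.0, -0.8),
  WTMP = c(3.0, 3.0, 2.9, 3.0)
)

# Pull one hour and tag the temperature columns with that hour,
# e.g. ATMP -> ATMP12 and WTMP -> WTMP12 for hour = 12.
pick_hour <- function(df, hour) {
  out <- df[df$hh == hour, c("YYYY", "MM", "DD", "ATMP", "WTMP")]
  names(out)[4:5] <- sprintf(c("ATMP%02d", "WTMP%02d"), hour)
  out
}

# One row per calendar day, with noon and 20:00 readings side by side.
noon_vs_20 <- merge(pick_hour(hourly, 12), pick_hour(hourly, 20),
                    by = c("YYYY", "MM", "DD"))
```

With the full data in place of the toy table, `t.test(noon_vs_20$ATMP12, noon_vs_20$ATMP20)` would run the same kind of comparison as the one reported above.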
Method 2: Show that the choice of hour does not affect the test result
## Joining, by = c("YYYY", "MM", "DD")
##      ATMP00            WTMP00           ATMP02            WTMP02
##  Min.   :-11.100   Min.   : 0.000   Min.   :-11.100   Min.   : 0.900
##  1st Qu.:  7.700   1st Qu.: 8.600   1st Qu.:  7.800   1st Qu.: 8.700
##  Median :  7.700   Median : 8.600   Median :  7.800   Median : 8.700
##  Mean   :  7.624   Mean   : 8.537   Mean   :  7.726   Mean   : 8.638
##  3rd Qu.:  7.700   3rd Qu.: 8.600   3rd Qu.:  7.800   3rd Qu.: 8.700
##  Max.   : 13.100   Max.   :12.900   Max.   : 12.800   Max.   :12.900
##                                     NA's   :324       NA's   :324
##      ATMP04            WTMP04           ATMP06            WTMP06
##  Min.   :-11.100   Min.   :-0.700   Min.   :-11.700   Min.   : 0.300
##  1st Qu.:  7.900   1st Qu.: 8.700   1st Qu.:  7.900   1st Qu.: 8.700
##  Median :  7.900   Median : 8.700   Median :  7.900   Median : 8.700
##  Mean   :  7.823   Mean   : 8.638   Mean   :  7.821   Mean   : 8.637
##  3rd Qu.:  7.900   3rd Qu.: 8.700   3rd Qu.:  7.900   3rd Qu.: 8.700
##  Max.   : 12.800   Max.   :13.000   Max.   : 12.800   Max.   :12.900
##  NA's   :302       NA's   :302      NA's   :230       NA's   :230
##      ATMP08            WTMP08           ATMP10            WTMP10
##  Min.   :-11.700   Min.   : 0.400   Min.   :-11.500   Min.   : 0.500
##  1st Qu.:  7.900   1st Qu.: 8.600   1st Qu.:  8.300   1st Qu.: 8.600
##  Median :  7.900   Median : 8.600   Median :  8.300   Median : 8.600
##  Mean   :  7.821   Mean   : 8.539   Mean   :  8.214   Mean   : 8.539
##  3rd Qu.:  7.900   3rd Qu.: 8.600   3rd Qu.:  8.300   3rd Qu.: 8.600
##  Max.   : 12.600   Max.   :12.800   Max.   : 12.400   Max.   :12.700
##  NA's   :339       NA's   :339      NA's   :333       NA's   :333
##      ATMP12            WTMP12           ATMP14            WTMP14
##  Min.   :-11.300   Min.   : 0.400   Min.   :-11.400   Min.   : 0.10
##  1st Qu.:  8.300   1st Qu.: 8.600   1st Qu.:  8.200   1st Qu.: 7.90
##  Median :  8.300   Median : 8.600   Median :  8.200   Median : 7.90
##  Mean   :  8.213   Mean   : 8.538   Mean   :  8.115   Mean   : 7.85
##  3rd Qu.:  8.300   3rd Qu.: 8.600   3rd Qu.:  8.200   3rd Qu.: 7.90
##  Max.   : 12.400   Max.   :12.700   Max.   : 12.400   Max.   :12.60
##  NA's   :240       NA's   :240      NA's   :320       NA's   :320
##      ATMP16            WTMP16           ATMP18            WTMP18
##  Min.   :-11.40    Min.   : 0.90    Min.   :-11.300   Min.   : 0.800
##  1st Qu.:  7.90    1st Qu.: 7.90    1st Qu.:  8.200   1st Qu.: 7.500
##  Median :  7.90    Median : 7.90    Median :  8.200   Median : 7.500
##  Mean   :  7.82    Mean   : 7.85    Mean   :  8.114   Mean   : 7.456
##  3rd Qu.:  7.90    3rd Qu.: 7.90    3rd Qu.:  8.200   3rd Qu.: 7.500
##  Max.   : 12.60    Max.   :12.60    Max.   : 12.300   Max.   :12.700
##  NA's   :332       NA's   :332      NA's   :256       NA's   :256
##      ATMP20            WTMP20           ATMP22            WTMP22
##  Min.   :-11.200   Min.   : 0.700   Min.   :-11.300   Min.   : 0.600
##  1st Qu.:  7.900   1st Qu.: 7.500   1st Qu.:  8.200   1st Qu.: 7.500
##  Median :  7.900   Median : 7.500   Median :  8.200   Median : 7.500
##  Mean   :  7.821   Mean   : 7.457   Mean   :  8.117   Mean   : 7.457
##  3rd Qu.:  7.900   3rd Qu.: 7.500   3rd Qu.:  8.200   3rd Qu.: 7.500
##  Max.   : 12.500   Max.   :12.600   Max.   : 12.600   Max.   :12.600
##  NA's   :325       NA's   :325      NA's   :338       NA's   :338
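Beyond comparing summaries, one way to check whether the choice of hour matters is to repeat the 1988 versus 2017 Welch test once for every hour and compare the decisions. The sketch below does this on synthetic data; it is not the procedure used in this report, and `make_year()`, `obs`, `p_by_hour`, and the `hh` column are hypothetical names.

```r
set.seed(42)
hours <- seq(0, 22, by = 2)  # the even hours that appear in the summary above

# Synthetic stand-in: 60 readings per hour for one year (not the buoy data).
make_year <- function(year, centre) {
  data.frame(
    YYYY = year,
    hh   = rep(hours, each = 60),
    ATMP = rnorm(length(hours) * 60, mean = centre, sd = 4)
  )
}
obs <- rbind(make_year(1988, 2.4), make_year(2017, 4.8))

# Welch test of 1988 vs 2017, run separately for each hour of the day.
p_by_hour <- sapply(hours, function(h) {
  x <- obs$ATMP[obs$YYYY == 1988 & obs$hh == h]
  y <- obs$ATMP[obs$YYYY == 2017 & obs$hh == h]
  t.test(x, y)$p.value
})
names(p_by_hour) <- sprintf("%02d:00", hours)

# TRUE where the null of equal means is rejected at the 5% level.
p_by_hour < 0.05
```

If the decision is the same for every hour, the conclusion does not depend on which hour is sampled.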
Method 3: Test the difference using all of the data, rather than one specific hour per day
##
##  Welch Two Sample t-test
##
## data:  MR_1988VS2017["ATMP1988"] and MR_1988VS2017["ATMP2017"]
## t = -36.513, df = 16842, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.519929 -2.263162
## sample estimates:
## mean of x mean of y
##  2.450989  4.842534
Since p < 2.2e-16 < 0.05, we reject the null hypothesis that Air Temperature in 1988 and in 2017 has the same mean.
##
##  Welch Two Sample t-test
##
## data:  MR_1988VS2017["WTMP1988"] and MR_1988VS2017["WTMP2017"]
## t = -35.351, df = 16708, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.657566 -1.483408
## sample estimates:
## mean of x mean of y
##  4.611994  6.182481
Since p < 2.2e-16 < 0.05, we reject the null hypothesis that Water Temperature in 1988 and in 2017 has the same mean.
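For reference, the statistic and degrees of freedom reported by the Welch test are given below, where $\bar{x}_i$, $s_i^2$, and $n_i$ denote the sample mean, sample variance, and sample size of group $i$ (this notation is introduced here and does not appear in the output).

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
$$

Because the denominator of $t$ shrinks as $n_1$ and $n_2$ grow, using every reading instead of one per day gives a much larger statistic for a similar difference in means: for air temperature, $|t|$ rises from about 7.5 (df = 701.29) with daily noon data to about 36.5 (df = 16842) with all data.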
(1) ATMP values at different times of day have similar distributions.
(2) WTMP values at different times of day have similar distributions.
(3) The choice of hour within a day does not change the test result.