Team members

  • Yue Wu
  • YinChia Huang
  • Francesco Ignazio Re

Outline

  • Introduction
  • Data cleaning
  • Data visualization
  • Statistical analysis

Introduction

The purpose of this project is to exercise, extend, and demonstrate our ability to use the ideas and tools of the Tidyverse. We were assigned data from NOAA weather station buoy 46035, located at 57.026 N 177.738 W, from the NOAA National Data Buoy Center. Our goal is to obtain the data, then clean, organize, and explore it.

Data cleaning

We focused only on air temperature and sea temperature from 1988 to 2017. Data for 2012 and 2013 are missing, so we substituted data from the nearby buoy 46070, assuming its temperatures are close to those at buoy 46035. Other years also contain a few missing values. Missing readings are coded as 99.0 or 999.0; we removed them, since they would otherwise be treated as extreme outliers.
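As a rough illustration of the cleaning step described above, the following base R sketch recodes the sentinel values to NA and then drops the affected rows. The data frame `buoy` and its values are made up for the example, and the column names `ATMP` and `WTMP` are assumed to match the buoy files; this is not necessarily the code used in the project.

```r
# Toy rows standing in for buoy records; the values are made up for illustration
buoy <- data.frame(
  YYYY = c(1988, 1988, 1988, 1988),
  MM   = c(1, 1, 1, 1),
  DD   = c(1, 2, 3, 4),
  ATMP = c(2.1, 999.0, 1.8, 2.4),
  WTMP = c(4.5, 4.6, 999.0, 4.4)
)

missing_codes <- c(99.0, 999.0)   # sentinel values named in the text

# Recode sentinels to NA in the two temperature columns only
temp_cols <- c("ATMP", "WTMP")
buoy[temp_cols] <- lapply(buoy[temp_cols], function(x) {
  x[x %in% missing_codes] <- NA
  x
})

# Drop rows where either temperature is missing
buoy_clean <- buoy[complete.cases(buoy[temp_cols]), ]
print(buoy_clean)
```

Recoding to NA first, rather than filtering on the raw codes, keeps the option of counting missing readings per year before they are removed.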

Data visualization

Has the mean temperature changed over the past 30 years?


Statistical test - air temperature: 1988 noon vs. 2017 noon

## 
##  Welch Two Sample t-test
## 
## data:  MR_1988VS2017_DailyNoon["ATMP1988"] and MR_1988VS2017_DailyNoon["ATMP2017"]
## t = -7.4843, df = 701.29, p-value = 2.162e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.012741 -1.760561
## sample estimates:
## mean of x mean of y 
##  2.338997  4.725648

p = 2.162e-13 < 0.05, so we reject the null hypothesis that the daily air temperatures recorded at noon in 1988 and in 2017 have the same mean.
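To make the printed output easier to read, the sketch below recomputes the confidence interval and p-value from the reported numbers alone. It assumes only the standard relationship t = (difference in means) / (standard error); the variable names are chosen for the example.

```r
# Values copied from the printed Welch test output for noon air temperature
mean_1988 <- 2.338997
mean_2017 <- 4.725648
t_stat    <- -7.4843
df        <- 701.29

diff_means <- mean_1988 - mean_2017     # x minus y, as t.test reports it
std_error  <- diff_means / t_stat       # t = difference / standard error
t_crit     <- qt(0.975, df)             # two-sided 95% critical value
ci         <- diff_means + c(-1, 1) * t_crit * std_error
p_value    <- 2 * pt(-abs(t_stat), df)  # two-sided p-value

print(round(c(difference = diff_means, std_error = std_error), 4))
print(round(ci, 4))
print(signif(p_value, 3))
```

The recomputed interval should agree with the printed one to about three decimal places; small discrepancies come from the rounding of t and df in the output.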

Statistical test - water temperature: 1988 noon vs. 2017 noon

## 
##  Welch Two Sample t-test
## 
## data:  MR_1988VS2017_DailyNoon["WTMP1988"] and MR_1988VS2017_DailyNoon["WTMP2017"]
## t = -7.2687, df = 695.59, p-value = 9.78e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.999515 -1.149048
## sample estimates:
## mean of x mean of y 
##  4.572981  6.147262

p = 9.78e-13 < 0.05, so we reject the null hypothesis that the daily water temperatures recorded at noon in 1988 and in 2017 have the same mean.

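Conclusion: Mean noon temperatures differ significantly between 1988 and 2017. The 2017 sample means are higher by about 2.4 °C for air and about 1.6 °C for water.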

We used only one sample per day out of the 24 hourly temperature readings. Has this sampling affected the evaluation of temperature change? In what way? Explain and demonstrate.

Statistical test - air temperature

Method 1: Test the difference based on the choice of time of day. We randomly chose 20:00 to compare against noon.

## 
##  Welch Two Sample t-test
## 
## data:  MR_NoonVS20["ATMP12"] and MR_NoonVS20["ATMP20"]
## t = -0.98177, df = 17132, p-value = 0.3262
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.19661040  0.06538341
## sample estimates:
## mean of x mean of y 
##  3.107325  3.172939

p = 0.3262 > 0.05, so we cannot reject the null hypothesis that the daily noon and 20:00 air temperatures have the same mean.

Statistical test - water temperature

## 
##  Welch Two Sample t-test
## 
## data:  MR_NoonVS20["WTMP12"] and MR_NoonVS20["WTMP20"]
## t = -0.28812, df = 17142, p-value = 0.7733
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.10051075  0.07474858
## sample estimates:
## mean of x mean of y 
##  5.037255  5.050136

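p = 0.7733 > 0.05, so we cannot reject the null hypothesis that the daily noon and 20:00 water temperatures have the same mean.

Statistical test

Method 2: Show that the choice of hour does not affect the test result. The summary below covers the readings at every second hour (00 to 22), joined by date.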

## Joining, by = c("YYYY", "MM", "DD")
##      ATMP00            WTMP00           ATMP02            WTMP02      
##  Min.   :-11.100   Min.   : 0.000   Min.   :-11.100   Min.   : 0.900  
##  1st Qu.:  7.700   1st Qu.: 8.600   1st Qu.:  7.800   1st Qu.: 8.700  
##  Median :  7.700   Median : 8.600   Median :  7.800   Median : 8.700  
##  Mean   :  7.624   Mean   : 8.537   Mean   :  7.726   Mean   : 8.638  
##  3rd Qu.:  7.700   3rd Qu.: 8.600   3rd Qu.:  7.800   3rd Qu.: 8.700  
##  Max.   : 13.100   Max.   :12.900   Max.   : 12.800   Max.   :12.900  
##                                     NA's   :324       NA's   :324     
##      ATMP04            WTMP04           ATMP06            WTMP06      
##  Min.   :-11.100   Min.   :-0.700   Min.   :-11.700   Min.   : 0.300  
##  1st Qu.:  7.900   1st Qu.: 8.700   1st Qu.:  7.900   1st Qu.: 8.700  
##  Median :  7.900   Median : 8.700   Median :  7.900   Median : 8.700  
##  Mean   :  7.823   Mean   : 8.638   Mean   :  7.821   Mean   : 8.637  
##  3rd Qu.:  7.900   3rd Qu.: 8.700   3rd Qu.:  7.900   3rd Qu.: 8.700  
##  Max.   : 12.800   Max.   :13.000   Max.   : 12.800   Max.   :12.900  
##  NA's   :302       NA's   :302      NA's   :230       NA's   :230     
##      ATMP08            WTMP08           ATMP10            WTMP10      
##  Min.   :-11.700   Min.   : 0.400   Min.   :-11.500   Min.   : 0.500  
##  1st Qu.:  7.900   1st Qu.: 8.600   1st Qu.:  8.300   1st Qu.: 8.600  
##  Median :  7.900   Median : 8.600   Median :  8.300   Median : 8.600  
##  Mean   :  7.821   Mean   : 8.539   Mean   :  8.214   Mean   : 8.539  
##  3rd Qu.:  7.900   3rd Qu.: 8.600   3rd Qu.:  8.300   3rd Qu.: 8.600  
##  Max.   : 12.600   Max.   :12.800   Max.   : 12.400   Max.   :12.700  
##  NA's   :339       NA's   :339      NA's   :333       NA's   :333     
##      ATMP12            WTMP12           ATMP14            WTMP14     
##  Min.   :-11.300   Min.   : 0.400   Min.   :-11.400   Min.   : 0.10  
##  1st Qu.:  8.300   1st Qu.: 8.600   1st Qu.:  8.200   1st Qu.: 7.90  
##  Median :  8.300   Median : 8.600   Median :  8.200   Median : 7.90  
##  Mean   :  8.213   Mean   : 8.538   Mean   :  8.115   Mean   : 7.85  
##  3rd Qu.:  8.300   3rd Qu.: 8.600   3rd Qu.:  8.200   3rd Qu.: 7.90  
##  Max.   : 12.400   Max.   :12.700   Max.   : 12.400   Max.   :12.60  
##  NA's   :240       NA's   :240      NA's   :320       NA's   :320    
##      ATMP16           WTMP16          ATMP18            WTMP18      
##  Min.   :-11.40   Min.   : 0.90   Min.   :-11.300   Min.   : 0.800  
##  1st Qu.:  7.90   1st Qu.: 7.90   1st Qu.:  8.200   1st Qu.: 7.500  
##  Median :  7.90   Median : 7.90   Median :  8.200   Median : 7.500  
##  Mean   :  7.82   Mean   : 7.85   Mean   :  8.114   Mean   : 7.456  
##  3rd Qu.:  7.90   3rd Qu.: 7.90   3rd Qu.:  8.200   3rd Qu.: 7.500  
##  Max.   : 12.60   Max.   :12.60   Max.   : 12.300   Max.   :12.700  
##  NA's   :332      NA's   :332     NA's   :256       NA's   :256     
##      ATMP20            WTMP20           ATMP22            WTMP22      
##  Min.   :-11.200   Min.   : 0.700   Min.   :-11.300   Min.   : 0.600  
##  1st Qu.:  7.900   1st Qu.: 7.500   1st Qu.:  8.200   1st Qu.: 7.500  
##  Median :  7.900   Median : 7.500   Median :  8.200   Median : 7.500  
##  Mean   :  7.821   Mean   : 7.457   Mean   :  8.117   Mean   : 7.457  
##  3rd Qu.:  7.900   3rd Qu.: 7.500   3rd Qu.:  8.200   3rd Qu.: 7.500  
##  Max.   : 12.500   Max.   :12.600   Max.   : 12.600   Max.   :12.600  
##  NA's   :325       NA's   :325      NA's   :338       NA's   :338
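A wide table like the one summarized above can be built by splitting the hourly records by hour and joining the pieces on the date keys. The sketch below shows one way to do this in base R with `merge`, under the assumption that the hour is stored in a column named `hh`; the data are invented, and the project itself appears to have used Tidyverse joins on `YYYY`, `MM`, and `DD`.

```r
# Hypothetical hourly records in long form: one row per date and hour
hourly <- data.frame(
  YYYY = 1988, MM = 1,
  DD   = rep(1:3, each = 3),
  hh   = rep(c(0L, 12L, 20L), times = 3),
  ATMP = c(1.0, 1.4, 1.2, 0.8, 1.1, 0.9, 1.5, 1.9, 1.6),
  WTMP = c(4.0, 4.1, 4.0, 3.9, 4.0, 3.9, 4.2, 4.3, 4.2)
)

keys <- c("YYYY", "MM", "DD")

# One data frame per hour, with the hour appended to the column names
per_hour <- lapply(split(hourly, hourly$hh), function(d) {
  suffix <- sprintf("%02d", as.integer(d$hh[1]))
  out <- d[c(keys, "ATMP", "WTMP")]
  names(out)[4:5] <- paste0(c("ATMP", "WTMP"), suffix)
  out
})

# Join the per-hour tables on the date keys, keeping every date
wide <- Reduce(function(a, b) merge(a, b, by = keys, all = TRUE), per_hour)
print(summary(wide[setdiff(names(wide), keys)]))
```

Using `all = TRUE` keeps dates on which some hours are missing, which is where NA counts in a per-hour summary would come from.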

Statistical test - air temperature

Method 3: Test the difference using all hourly readings, instead of one specific hour per day.

## 
##  Welch Two Sample t-test
## 
## data:  MR_1988VS2017["ATMP1988"] and MR_1988VS2017["ATMP2017"]
## t = -36.513, df = 16842, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.519929 -2.263162
## sample estimates:
## mean of x mean of y 
##  2.450989  4.842534

p < 2.2e-16 < 0.05, so we reject the null hypothesis that air temperature in 1988 and in 2017 has the same mean.

Statistical test - water temperature

## 
##  Welch Two Sample t-test
## 
## data:  MR_1988VS2017["WTMP1988"] and MR_1988VS2017["WTMP2017"]
## t = -35.351, df = 16708, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.657566 -1.483408
## sample estimates:
## mean of x mean of y 
##  4.611994  6.182481

p < 2.2e-16 < 0.05, so we reject the null hypothesis that water temperature in 1988 and in 2017 has the same mean.

Conclusion:

(1) ATMP at different times of day has a similar distribution.

(2) WTMP at different times of day has a similar distribution.

(3) The choice of hour does not make a difference: the 1988 vs. 2017 difference is significant whether we use only the noon readings or all hourly readings.
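One way to check point (3) directly would be to repeat the 1988 vs. 2017 Welch test once for each hour of the day. The sketch below does this on simulated data with a built-in shift between the two years, so it only illustrates the procedure; the helper `simulate_year`, the column `hh`, and all numbers are assumptions made for the example, not buoy measurements.

```r
set.seed(1)  # simulated data only; not the buoy measurements

# Simulate hourly readings for one year around a chosen mean
simulate_year <- function(year, base_mean) {
  data.frame(
    YYYY = year,
    hh   = rep(0:23, times = 365),
    ATMP = rnorm(24 * 365, mean = base_mean, sd = 4)
  )
}
temps <- rbind(simulate_year(1988, 2.5), simulate_year(2017, 4.5))

# Run the same Welch test separately for each hour of the day
p_by_hour <- sapply(0:23, function(h) {
  one_hour <- temps[temps$hh == h, ]
  t.test(ATMP ~ YYYY, data = one_hour)$p.value
})

# Count the hours at which the year-to-year difference is significant
print(sum(p_by_hour < 0.05))
```

If the count equals 24 on the real data as well, the conclusion does not depend on which hour is sampled.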