Lab 2 Instructions


This lab is an analysis of weather in Olympia, Washington. It uses daily data from July 1, 1877 to the middle of July 2017.

Problem 1

The first step is to load the data. The load command in the following chunk works on my computer, but you will have to modify it to match the file location on yours. Before you can do this, you need to download the data file for the course onto your computer. After that, you can bring it into RStudio with the following steps.

Click on “File”
Click on “Open File”
Navigate to the file and double-click it.
Say yes.
Copy the command that this process placed in your console.
Paste this on top of the command from my system. Leave this command in your first chunk so that it runs every time you knit.

Run the command str() on your file to verify that your import was successful.

# The load command below must be run every time you run knitr.

load("C:/Users/Dairy/Desktop/Baby Mo Files/Pierce/MATH146 Intro to Stats/olywthr.rdata")
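As a rough illustration of what load() and str() do together, the sketch below saves a small made-up data frame to a temporary file and loads it back. The object name toy and the file name toy.rdata are hypothetical stand-ins, not part of the course data.

```r
# Hypothetical toy data frame standing in for the course data.
toy <- data.frame(TMAX = c(55, 61, 48), TMIN = c(40, 44, 35))

# Save it to a temporary .rdata file, then remove it from the workspace.
toy_path <- file.path(tempdir(), "toy.rdata")
save(toy, file = toy_path)
rm(toy)

# load() restores the object under its original name and
# invisibly returns the names of the objects it restored.
loaded_names <- load(toy_path)
print(loaded_names)

# str() lists the number of observations, the variables, and their types.
str(toy)
```

If str() reports the expected number of observations and variables, the import most likely worked.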

Be careful to keep the R code you need within chunks. Always include the R code and output needed to answer the questions.

Problem 2

Run a summary of the dataframe to look for anomalies. Do all of the minimum and maximum values make sense? Note the two specific values that you consider closest to being suspicious. Are there any NA values?

summary(olywthr)

##  STATION_NAME            DATE                 PRCP       
##  Length:49316       Min.   :1877-07-01   Min.   :0.0000  
##  Class :character   1st Qu.:1913-05-10   1st Qu.:0.0000  
##  Mode  :character   Median :1949-12-24   Median :0.0000  
##                     Mean   :1948-11-12   Mean   :0.1409  
##                     3rd Qu.:1983-09-26   3rd Qu.:0.1400  
##                     Max.   :2017-07-11   Max.   :4.8200  
##       SNOW               TMAX             TMIN             yr      
##  Min.   : 0.00000   Min.   : 15.00   Min.   :-8.00   Min.   :1877  
##  1st Qu.: 0.00000   1st Qu.: 50.00   1st Qu.:34.00   1st Qu.:1913  
##  Median : 0.00000   Median : 59.00   Median :41.00   Median :1949  
##  Mean   : 0.02647   Mean   : 60.64   Mean   :40.42   Mean   :1948  
##  3rd Qu.: 0.00000   3rd Qu.: 71.00   3rd Qu.:47.00   3rd Qu.:1983  
##  Max.   :14.20000   Max.   :104.00   Max.   :76.00   Max.   :2017  
##        mo               dy       
##  Min.   : 1.000   Min.   : 1.00  
##  1st Qu.: 4.000   1st Qu.: 8.00  
##  Median : 7.000   Median :16.00  
##  Mean   : 6.516   Mean   :15.74  
##  3rd Qu.:10.000   3rd Qu.:23.00  
##  Max.   :12.000   Max.   :31.00

# Place your R code here.
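One possible way to check for missing and implausible values is sketched below on made-up numbers. The data frame toy and its values are hypothetical and are not taken from olywthr.

```r
# Hypothetical toy data with one missing value and one implausible value.
toy <- data.frame(
  TMAX = c(55, 61, NA, 48, 250),
  TMIN = c(40, 44, 35, 30, 41)
)

# summary() adds an "NA's" row for any column with missing values.
summary(toy)

# Count the missing values in each column directly.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Pull out the row holding the largest TMAX so it can be inspected.
toy[which.max(toy$TMAX), ]
```

A column with no missing values shows a count of 0 and has no NA's row in the summary.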

Problem 3

Provide the basic descriptive statistics, a histogram, and a boxplot for minimum daily temperature (TMIN). You can use summary(), but you need to add the interquartile range and the standard deviation. Is this distribution symmetric? Which of these two measures (interquartile range or standard deviation) is the better measure of variation for this variable? Make two additional correct statements about TMIN.

# Place the R code you need to answer this question in this chunk.
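The functions involved can be tried out on simulated values first. This is only a sketch: tmin_toy is a hypothetical vector of random numbers, not the real TMIN column.

```r
# Hypothetical simulated temperatures (not the real TMIN values).
set.seed(1)
tmin_toy <- round(rnorm(200, mean = 40, sd = 9))

# Five-number summary plus the mean.
summary(tmin_toy)

# summary() does not report these two, so compute them separately.
iqr_val <- IQR(tmin_toy)
sd_val <- sd(tmin_toy)
print(c(IQR = iqr_val, SD = sd_val))

# A histogram and a boxplot show the shape and any outliers.
hist(tmin_toy, main = "Simulated minimum temperature", xlab = "Degrees")
boxplot(tmin_toy, horizontal = TRUE, main = "Simulated minimum temperature")
```

IQR() returns the distance between the third and first quartiles, so it can be checked against the summary() output.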

Problem 4

I have supplied code to create a smaller dataframe (janmar) containing only observations from the months of January and March. Use tapply() with summary() to compare the TMAX values from these two months. Also produce a side-by-side boxplot. Make two correct statements to describe your results.

janmar = olywthr[olywthr$mo == 1 | olywthr$mo == 3,]

# Place your R code here.
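The pattern of tapply() with summary(), followed by a side-by-side boxplot, might look like the following on a small made-up data frame. The data frame toy and its values are hypothetical.

```r
# Hypothetical toy data: four days each from month 1 and month 3.
toy <- data.frame(
  mo   = c(1, 1, 1, 1, 3, 3, 3, 3),
  TMAX = c(40, 44, 47, 51, 50, 54, 57, 63)
)

# tapply(values, groups, function) applies the function to each group.
by_month <- tapply(toy$TMAX, toy$mo, summary)
print(by_month)

# The formula TMAX ~ mo draws one box per month on a shared axis.
boxplot(TMAX ~ mo, data = toy, xlab = "Month", ylab = "Maximum temperature")
```

The result of tapply() has one entry per group, named after the group value.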

Problem 5

I have supplied code to create a new boolean variable QRain in the dataframe janmar. If PRCP is greater than 0, the variable will be set to TRUE. Otherwise it will be set to FALSE. Produce a table and a barplot of QRain. Describe the results verbally.

janmar$QRain = janmar$PRCP > 0
# Place the R code you need to answer this question in this chunk.
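A table and barplot of a TRUE/FALSE variable can be sketched as follows. The vector prcp_toy holds made-up precipitation amounts and is only a stand-in for PRCP.

```r
# Hypothetical precipitation amounts for eight days.
prcp_toy <- c(0, 0.12, 0, 0.40, 0.05, 0, 1.10, 0.02)

# The comparison yields TRUE on days with any precipitation.
rain_toy <- prcp_toy > 0

# table() counts how many days fall in each category.
counts <- table(rain_toy)
print(counts)

# prop.table() converts the counts to proportions.
print(prop.table(counts))

# barplot() accepts the table directly.
barplot(counts, xlab = "Any precipitation?", ylab = "Number of days")
```

Proportions are often easier to describe in words than raw counts.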

Problem 6

Produce a table and a mosaicplot to describe the relationship between the variables QRain and mo in the dataframe janmar. Describe what you see.

# Place the R code you need to answer this question in this chunk.
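For two categorical variables, a two-way table and a mosaic plot could be built along these lines. The data frame toy below is made up for illustration only.

```r
# Hypothetical toy data: month and a rain indicator for eight days.
toy <- data.frame(
  mo    = c(1, 1, 1, 1, 3, 3, 3, 3),
  QRain = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# A two-way table: rows are months, columns are the rain indicator.
two_way <- table(toy$mo, toy$QRain, dnn = c("mo", "QRain"))
print(two_way)

# Row proportions give the share of rainy days within each month.
row_props <- prop.table(two_way, margin = 1)
print(row_props)

# Tile areas in the mosaic plot are proportional to the cell counts.
mosaicplot(two_way, main = "Rain by month (toy data)",
           xlab = "Month", ylab = "Any precipitation?")
```

Comparing the row proportions is one way to put the pattern in the mosaic plot into words.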

Problem 7

I have provided code to create an even smaller dataframe (recent) which contains only data from 2014 and later, but uses all days of the year.

Produce a scatterplot to describe the relationship between TMAX and TMIN using the data in recent. Put TMAX in the role of the explanatory variable.
Compute the correlation coefficient.
Explain the meaning of these results.

recent = olywthr[olywthr$yr >= 2014,]

# Place the R code you need to answer this question in this chunk.
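A scatterplot and correlation coefficient can be sketched on simulated values as shown below. The vectors tmax_toy and tmin_toy are hypothetical, and the relationship between them is invented for the example.

```r
# Hypothetical simulated temperatures with a built-in positive relationship.
set.seed(2)
tmax_toy <- runif(100, min = 40, max = 90)
tmin_toy <- 10 + 0.6 * tmax_toy + rnorm(100, sd = 4)

# The explanatory variable goes on the horizontal axis (first argument).
plot(tmax_toy, tmin_toy,
     xlab = "Maximum temperature", ylab = "Minimum temperature")

# cor() returns the correlation coefficient, a value between -1 and 1.
r <- cor(tmax_toy, tmin_toy)
print(r)
```

The sign of r gives the direction of the linear relationship, and its distance from 0 gives the strength.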

Problem 8

Produce a linear model using the dataframe recent which could be used to predict the value of TMIN from a given value of TMAX. There are detailed instructions on how to do this at the very end of the Module 2 notes.

Display the summary results of the linear model.
Use the results of the model to predict the value of TMIN if the value of TMAX is 100. Show the R code you used to make this prediction.

# Place the R code you need to answer this question in this chunk.
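The general lm(), summary(), and predict() workflow might look like the sketch below. It uses a made-up five-row data frame named toy, so the coefficients have nothing to do with the Olympia data, and the Module 2 notes remain the reference for the intended method.

```r
# Hypothetical toy data (not from olywthr).
toy <- data.frame(
  TMAX = c(50, 60, 70, 80, 90),
  TMIN = c(36, 39, 46, 49, 55)
)

# Response on the left of ~, explanatory variable on the right.
fit <- lm(TMIN ~ TMAX, data = toy)

# summary() shows the intercept, slope, and R-squared.
summary(fit)

# predict() needs a data frame whose column name matches the explanatory variable.
pred <- predict(fit, newdata = data.frame(TMAX = 100))
print(pred)
```

In this toy data, 100 lies outside the observed TMAX range of 50 to 90, so the prediction is an extrapolation and should be read with caution.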