This lab is an analysis of weather in Olympia, Washington. It uses daily data from July 1, 1877 to the middle of July 2017.
The first step is to load the data. The load command in the following chunk works on my computer, but you will have to modify the file path so that it points to the location of the file on your computer. Before you can do this, you need to download the data from the course onto your computer. After that, you can bring it into RStudio with the following steps.
Run the command str() on your file to verify that your import was successful.
# The load command below must be run every time you run knitr.
load("C:/Users/Dairy/Desktop/Baby Mo Files/Pierce/MATH146 Intro to Stats/olywthr.rdata")
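If the load() and str() pattern is new to you, here is a minimal sketch using a tiny made-up dataframe saved to a temporary file. The names toy and tmp_path are placeholders for illustration only; they are not part of the course data.

```r
# Build a tiny made-up dataframe and save it, standing in for the course file.
toy = data.frame(TMAX = c(45, 52, 60), TMIN = c(33, 38, 41))
tmp_path = tempfile(fileext = ".rdata")  # hypothetical file location
save(toy, file = tmp_path)

# Remove the object, then restore it from disk with load().
rm(toy)
loaded_names = load(tmp_path)  # load() returns the names of the restored objects
print(loaded_names)

# str() lists the number of observations and the type of each variable.
str(toy)
```

Note that load() restores the object under the name it was saved with, so you do not assign its result to a new dataframe.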
Be careful to keep the R code you need within chunks. Always include the R code and output needed to answer the questions.
Run a summary of the dataframe to look for anomalies. Do all of the minimum and maximum values make sense? Note two specific values that you would consider the closest to being suspicious. Are there any NA values?
summary(olywthr)
##  STATION_NAME            DATE                 PRCP
##  Length:49316       Min.   :1877-07-01   Min.   :0.0000
##  Class :character   1st Qu.:1913-05-10   1st Qu.:0.0000
##  Mode  :character   Median :1949-12-24   Median :0.0000
##                     Mean   :1948-11-12   Mean   :0.1409
##                     3rd Qu.:1983-09-26   3rd Qu.:0.1400
##                     Max.   :2017-07-11   Max.   :4.8200
##       SNOW               TMAX             TMIN             yr
##  Min.   : 0.00000   Min.   : 15.00   Min.   :-8.00   Min.   :1877
##  1st Qu.: 0.00000   1st Qu.: 50.00   1st Qu.:34.00   1st Qu.:1913
##  Median : 0.00000   Median : 59.00   Median :41.00   Median :1949
##  Mean   : 0.02647   Mean   : 60.64   Mean   :40.42   Mean   :1948
##  3rd Qu.: 0.00000   3rd Qu.: 71.00   3rd Qu.:47.00   3rd Qu.:1983
##  Max.   :14.20000   Max.   :104.00   Max.   :76.00   Max.   :2017
##        mo               dy
##  Min.   : 1.000   Min.   : 1.00
##  1st Qu.: 4.000   1st Qu.: 8.00
##  Median : 7.000   Median :16.00
##  Mean   : 6.516   Mean   :15.74
##  3rd Qu.:10.000   3rd Qu.:23.00
##  Max.   :12.000   Max.   :31.00
# Place your R code here.
Provide the basic descriptive statistics, a histogram and a boxplot for minimum daily temperature (TMIN). You can use summary() but you need to add the interquartile range and the standard deviation. Is this distribution symmetric? Which of these two measures (the interquartile range or the standard deviation) is the better measure of variation for this variable? Make two additional correct statements about TMIN.
# Place the R code you need to answer this question in this chunk.
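As a reminder of the functions involved, here is a minimal sketch on a short made-up vector x (not the Olympia data). The na.rm = TRUE argument is included on the assumption that a variable might contain missing values; it is harmless when there are none.

```r
# Made-up values for illustration only; not the Olympia data.
x = c(28, 31, 35, 36, 40, 41, 41, 44, 47, 52, NA)

summary(x)                     # quartiles, mean, and a count of NA values
x_iqr = IQR(x, na.rm = TRUE)   # interquartile range
x_sd = sd(x, na.rm = TRUE)     # standard deviation
print(c(IQR = x_iqr, SD = x_sd))

# The two plots give complementary views of shape and outliers.
hist(x, main = "Histogram of x", xlab = "x")
boxplot(x, horizontal = TRUE, main = "Boxplot of x")
```

To apply the same calls to a variable in a dataframe, refer to it with the $ notation, as in the supplied code elsewhere in this lab.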
I have supplied code to create a smaller dataframe (janmar) containing only observations from the months of January and March. Use tapply() with summary() to compare the TMAX values from these two months. Also produce a side-by-side boxplot. Make two correct statements to describe your results.
janmar = olywthr[olywthr$mo == 1 | olywthr$mo == 3,]
# Place your R code here.
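The tapply() pattern can be sketched as follows on a small made-up dataframe. The names toy, grp and value are hypothetical stand-ins; you will need to substitute the dataframe and variables from this question.

```r
# Made-up data: a grouping variable with two levels and a numeric variable.
toy = data.frame(
  grp = c(1, 1, 1, 3, 3, 3),
  value = c(40, 45, 50, 50, 55, 63)
)

# tapply(values, groups, function) applies the function within each group.
by_grp = tapply(toy$value, toy$grp, summary)
print(by_grp)

# The formula value ~ grp draws one box per group, side by side.
boxplot(value ~ grp, data = toy, xlab = "grp", ylab = "value")
```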
I have supplied code to create a new boolean variable QRain in the dataframe janmar. If PRCP is greater than 0, the variable will be set to TRUE. Otherwise it will be set to FALSE. Produce a table and a barplot of QRain. Describe the results verbally.
janmar$QRain = janmar$PRCP > 0
# Place the R code you need to answer this question in this chunk.
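Here is a minimal sketch of tabulating and plotting a TRUE/FALSE variable, using a made-up vector named flag rather than QRain. Adding prop.table() is my suggestion for reporting proportions; the question does not require it.

```r
# Made-up amounts; the comparison turns them into TRUE/FALSE values.
flag = c(1.2, 0, 0.3, 0, 0.8, 0.1, 0) > 0

counts = table(flag)       # counts of FALSE and TRUE
print(counts)
print(prop.table(counts))  # the same table expressed as proportions

# barplot() accepts the table directly and draws one bar per category.
barplot(counts, xlab = "flag", ylab = "Count")
```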
Produce a table and a mosaicplot to describe the relationship between the variables QRain and mo in the dataframe janmar. Describe what you see.
# Place the R code you need to answer this question in this chunk.
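A two-way table and mosaicplot can be sketched as follows on made-up data. The names toy, grp and flag are hypothetical, and the row proportions from prop.table() are an optional extra that may make the comparison easier to describe.

```r
# Made-up data: a two-level grouping variable and a TRUE/FALSE variable.
toy = data.frame(
  grp = c(1, 1, 1, 1, 3, 3, 3, 3),
  flag = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Rows are levels of grp, columns are levels of flag.
two_way = table(toy$grp, toy$flag, dnn = c("grp", "flag"))
print(two_way)
print(prop.table(two_way, margin = 1))  # proportions within each row

# mosaicplot() draws tiles whose areas are proportional to the counts.
mosaicplot(two_way, main = "flag by grp")
```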
I have provided code to create an even smaller dataframe (recent) which contains only data from 2014 and later, but uses all days of the year.
* Produce a scatterplot to describe the relationship between TMAX and TMIN using the data in recent. Put TMAX in the role of the explanatory variable.
* Compute the correlation coefficient.
* Explain the meaning of these results.
recent = olywthr[olywthr$yr >= 2014,]
# Place the R code you need to answer this question in this chunk.
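The scatterplot and correlation steps can be sketched on two short made-up vectors, x and y (placeholders, not the Olympia data). The explanatory variable goes first in plot() so that it appears on the horizontal axis.

```r
# Made-up paired values for illustration only.
x = c(50, 55, 61, 68, 74, 80)  # explanatory variable
y = c(35, 38, 41, 47, 50, 55)  # response variable

# plot(x, y) puts the first argument on the horizontal axis.
plot(x, y, xlab = "x (explanatory)", ylab = "y (response)")

# cor() returns the correlation coefficient, a value between -1 and 1.
r = cor(x, y)
print(r)
```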
Produce a linear model using the dataframe recent which could be used to predict the value of TMIN from a given value of TMAX. There are detailed instructions on how to do this at the very end of the Module 2 notes.
# Place the R code you need to answer this question in this chunk.
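For reference, here is a minimal sketch of fitting and using a linear model with lm() on made-up data. The names toy, x, y and fit are hypothetical, and the steps in the Module 2 notes may differ in detail; follow the notes where they disagree with this sketch.

```r
# Made-up data: x is the explanatory variable, y is the response.
toy = data.frame(
  x = c(50, 55, 61, 68, 74, 80),
  y = c(35, 38, 41, 47, 50, 55)
)

# The formula y ~ x reads "y is modeled as a function of x".
fit = lm(y ~ x, data = toy)
print(coef(fit))  # intercept and slope

# Add the fitted line to a scatterplot of the data.
plot(toy$x, toy$y, xlab = "x", ylab = "y")
abline(fit)

# Predict the response for a new value of the explanatory variable.
pred = predict(fit, newdata = data.frame(x = 65))
print(pred)
```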