This assignment is based on the Feb 10 Weather notes I discussed on that day.

Problem 1

Using as much of my code as you need, replicate my graph showing the mean value of the daily maximum temperature for every calendar day, with the following changes.

  1. Use the median, the minimum, and the maximum instead of the mean. Give them different colors, but don’t bother with a legend.

  2. Fix the date display so that no specific year appears.

  3. Use an alternative to the default theme.
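One way to handle item 2 is to attach every (month, day) pair to a single placeholder year and then hide that year in the axis labels. The sketch below assumes lubridate is installed and shows why the placeholder should be a leap year such as 2020: in a non-leap year Feb 29 has no valid date, so `make_date()` returns `NA` for it.

```r
library(lubridate)

# Build all 366 (month, day) pairs that can appear in the weather data
days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
md <- data.frame(mo = rep(1:12, times = days_in_month))
md$dy <- sequence(days_in_month)  # 1..31, 1..29, 1..31, ...

# Attach each pair to a placeholder year
leap    <- make_date(2020, md$mo, md$dy)  # 2020 is a leap year
nonleap <- make_date(2021, md$mo, md$dy)  # 2021 is not

sum(is.na(leap))     # every pair is a real date in 2020
sum(is.na(nonleap))  # Feb 29, 2021 does not exist, so one NA appears

# Format without the year; %b is the locale's abbreviated month name
format(leap[60], "%b %d")
```

In ggplot2 the same idea is expressed with `scale_x_date(date_labels = "%b")`, which prints only month abbreviations on the axis.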

# Put your code here.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggplot2)
library(dplyr)
library(ggthemes)

load("C:/Users/Ken/Documents/olyw1018.Rdata")

str(olyw1018)
## tibble [28,686 x 9] (S3: tbl_df/tbl/data.frame)
##  $ STATION_NAME: chr [1:28686] "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" ...
##  $ DATE        : Date[1:28686], format: "1948-01-01" "1948-01-02" ...
##  $ PRCP        : num [1:28686] 0.82 0.15 0.62 0.53 0 0.39 0.51 0.68 0 1.32 ...
##  $ SNOW        : num [1:28686] 0 0 0 0 0 0 0 0 0 0 ...
##  $ TMAX        : num [1:28686] 50 43 48 45 46 45 52 49 48 43 ...
##  $ TMIN        : num [1:28686] 40 40 35 35 31 33 40 36 29 31 ...
##  $ yr          : num [1:28686] 1948 1948 1948 1948 1948 ...
##  $ mo          : num [1:28686] 1 1 1 1 1 1 1 1 1 1 ...
##  $ dy          : int [1:28686] 1 2 3 4 5 6 7 8 9 10 ...
summary(olyw1018)
##  STATION_NAME            DATE                 PRCP             SNOW         
##  Length:28686       Min.   :1948-01-01   Min.   :0.0000   Min.   : 0.00000  
##  Class :character   1st Qu.:1959-10-07   1st Qu.:0.0000   1st Qu.: 0.00000  
##  Mode  :character   Median :1979-05-26   Median :0.0000   Median : 0.00000  
##                     Mean   :1980-03-14   Mean   :0.1405   Mean   : 0.03355  
##                     3rd Qu.:1999-01-25   3rd Qu.:0.1400   3rd Qu.: 0.00000  
##                     Max.   :2018-09-21   Max.   :4.8200   Max.   :14.20000  
##       TMAX             TMIN             yr             mo        
##  Min.   : 18.00   Min.   :-8.00   Min.   :1948   Min.   : 1.000  
##  1st Qu.: 50.00   1st Qu.:33.00   1st Qu.:1959   1st Qu.: 4.000  
##  Median : 59.00   Median :40.00   Median :1979   Median : 7.000  
##  Mean   : 60.46   Mean   :39.85   Mean   :1980   Mean   : 6.502  
##  3rd Qu.: 71.00   3rd Qu.:47.00   3rd Qu.:1999   3rd Qu.: 9.000  
##  Max.   :104.00   Max.   :69.00   Max.   :2018   Max.   :12.000  
##        dy       
##  Min.   : 1.00  
##  1st Qu.: 8.00  
##  Median :16.00  
##  Mean   :15.72  
##  3rd Qu.:23.00  
##  Max.   :31.00
olyw1018 %>% 
  group_by(mo, dy) %>% 
  # One row per calendar day: median, minimum, and maximum of TMAX across all years
  summarize(medmax = median(TMAX, na.rm = TRUE),
            minmax = min(TMAX, na.rm = TRUE),
            maxmax = max(TMAX, na.rm = TRUE),
            .groups = "drop") %>% 
  # Placeholder leap year so Feb 29 is a valid date; the year is hidden on the axis
  mutate(date = make_date(2020, mo, dy)) -> cal20

# Colors are set outside aes(), so ggplot2 draws no legend
cal20 %>% ggplot(aes(x = date)) +
  geom_point(aes(y = medmax), color = "black", size = .5) +
  geom_point(aes(y = minmax), color = "steelblue", size = .5) +
  geom_point(aes(y = maxmax), color = "firebrick", size = .5) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +  # month names, no year
  labs(x = NULL, y = "TMAX") +
  ggtitle("Median, Minimum, and Maximum Daily High Temperature by Calendar Day") +
  theme_linedraw() +
  theme(plot.title = element_text(hjust = 0.5)) -> v0
ggplotly(v0)
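An alternative to drawing one `geom_point()` layer per statistic is to compute the three summaries, reshape them to long form with tidyr's `pivot_longer()`, map the statistic to color, and switch the legend off with `theme(legend.position = "none")`. The sketch uses a small invented data frame in place of `olyw1018` (the column names match; the values are made up) so that it runs on its own.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)

# Invented stand-in for olyw1018: two calendar days observed in three years
toy <- tibble(
  yr   = rep(2001:2003, each = 2),
  mo   = 1,
  dy   = rep(1:2, times = 3),
  TMAX = c(40, 45, 50, 41, 44, 47)
)

# One row per calendar day with the median, minimum, and maximum daily high
daily <- toy %>%
  group_by(mo, dy) %>%
  summarize(median = median(TMAX), minimum = min(TMAX), maximum = max(TMAX),
            .groups = "drop") %>%
  mutate(date = make_date(2020, mo, dy))

# Long form: one row per (day, statistic), so color can be mapped to the statistic
long <- daily %>%
  pivot_longer(c(median, minimum, maximum), names_to = "stat", values_to = "TMAX")

p <- ggplot(long, aes(x = date, y = TMAX, color = stat)) +
  geom_point() +
  scale_x_date(date_labels = "%b %d") +  # no year in the labels
  theme_bw() +
  theme(legend.position = "none")        # colors differ, but no legend is drawn
```

The long-form version scales more easily if another statistic is added later, at the cost of an extra reshaping step.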

Problem 2

We have continuous variables TMAX and PRCP to describe the weather for a day. Add the following discrete variables to the dataframe.

  1. The new variable warmth has the value “Cold” if the value of TMAX is below 40. It has the value “Warm” if TMAX is between 40 and 75. It is “Hot” otherwise.

  2. The new variable wetness has the value “Dry” if PRCP is 0. It has the value “Damp” if PRCP is positive but below .2. Otherwise it has the value “Wet”.

Create these variables using the dplyr function case_when().
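`case_when()` evaluates its conditions in order and keeps the first one that is `TRUE`, with `TRUE ~` acting as the catch-all. A quick way to check the cut points is to run the same rules on a handful of boundary values. The values below are invented for the check, and the sketch assumes "between 40 and 75" means 40 up to but not including 75 (so 75 is Hot) and that PRCP is never negative.

```r
library(dplyr)

# Invented boundary values around each cut point
checks <- tibble(
  TMAX = c(39, 40, 74, 75, 76),
  PRCP = c(0, 0.01, 0.19, 0.2, 1)
)

checks <- checks %>%
  mutate(
    warmth = case_when(
      TMAX < 40 ~ "Cold",
      TMAX < 75 ~ "Warm",    # reached only when TMAX >= 40
      TRUE      ~ "Hot"
    ),
    wetness = case_when(
      PRCP == 0  ~ "Dry",
      PRCP < 0.2 ~ "Damp",   # reached only when PRCP != 0
      TRUE       ~ "Wet"
    )
  )
checks
```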

Use appropriate graphics to describe the relationships between the continuous variables and the discrete variables on which they were based to verify that your code worked. Pick a different theme from Problem 1.

# Place your code here.
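olyw1018 %>%
  mutate(
    # case_when() checks conditions in order and keeps the first match
    Warmth = case_when(
      TMAX < 40 ~ "Cold",
      TMAX >= 40 & TMAX < 75 ~ "Warm",
      TRUE ~ "Hot"),
    Wetness = case_when(
      PRCP == 0 ~ "Dry",
      PRCP > 0 & PRCP < .2 ~ "Damp",
      TRUE ~ "Wet"),
    # Store as factors so plots list the levels in a natural order
    Warmth  = factor(Warmth,  levels = c("Cold", "Warm", "Hot")),
    Wetness = factor(Wetness, levels = c("Dry", "Damp", "Wet"))
  ) -> v2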



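# Jitter only horizontally (height = 0) so the PRCP values themselves are not shifted
ggplot(v2, aes(x = Wetness, y = PRCP)) + geom_jitter(width = 0.3, height = 0, alpha = 0.2) + theme_minimal()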

ggplot(v2, aes(x = Warmth, y = TMAX))+ geom_boxplot()+ theme_minimal()

Problem 3

We saw a few different graphs useful in examining the relationship between two categorical variables. Try two of them here. Use different themes.

# Place your code here.


ggplot(v2, aes(x = Warmth, y = Wetness)) + geom_jitter(alpha = 0.5, size = 0.8) + theme_bw()

ggplot(v2, aes(x = Warmth, y = Wetness, size = PRCP)) + geom_point() + theme_classic()

Problem 4

There is no required code. Describe the relationship between warmth and wetness. Also comment on the relative effectiveness of the graphs you used.

The first graph in Problem 3 suggests that both hot and cold days lean toward dry, and the pattern is most pronounced for hot days. In the same graph, warm days also lean toward dry more than toward wet or damp. The second graph, however, suggests that warm days have more precipitation than cold days, which agrees with the first graph's indication that cold days tend to be dry.
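The graphs can be backed up with numbers: a two-way table of counts, converted to the share of each Wetness level within each Warmth level, states the relationship directly. The helper `wetness_by_warmth()` is a hypothetical name introduced for this sketch; it assumes a data frame with `Warmth` and `Wetness` columns like `v2`, and the tiny data frame here is invented only to show the shape of the output.

```r
library(dplyr)

# Hypothetical helper: share of each Wetness level within each Warmth level
wetness_by_warmth <- function(df) {
  df %>%
    count(Warmth, Wetness) %>%        # number of days in each Warmth x Wetness cell
    group_by(Warmth) %>%
    mutate(share = n / sum(n)) %>%    # shares sum to 1 within each Warmth level
    ungroup()
}

# Invented example data (not the Olympia records)
toy <- tibble(
  Warmth  = c("Cold", "Cold", "Warm", "Warm", "Warm", "Hot"),
  Wetness = c("Dry",  "Wet",  "Dry",  "Dry",  "Damp", "Dry")
)
wetness_by_warmth(toy)
```

On the real data the call would be `wetness_by_warmth(v2)`. Because each share is conditional on Warmth, the Cold, Warm, and Hot rows can be compared directly even though the three groups contain very different numbers of days.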

The graphs I used for Problem 3 are simple comparisons of two categorical variables. The second graph also encodes a third variable, PRCP, through point size, and adding variables in this way seems to provide additional information and insight when the graphs are compared.