This assignment is based on the Feb 10 Weather notes I discussed on that day.
Using as much of my code as you need, replicate my graph showing the mean value of the daily maximum temperature for every calendar day, with the following changes.
Use the median, the minimum, and the maximum instead of the mean. Give them different colors, but don’t bother with a legend.
Fix the date display so that no specific year appears.
Use an alternative to the default theme.
# Put your code here.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(ggplot2)
library(dplyr)
library(ggthemes)
load("C:/Users/Ken/Documents/olyw1018.Rdata")
str(olyw1018)
## tibble [28,686 x 9] (S3: tbl_df/tbl/data.frame)
## $ STATION_NAME: chr [1:28686] "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" ...
## $ DATE : Date[1:28686], format: "1948-01-01" "1948-01-02" ...
## $ PRCP : num [1:28686] 0.82 0.15 0.62 0.53 0 0.39 0.51 0.68 0 1.32 ...
## $ SNOW : num [1:28686] 0 0 0 0 0 0 0 0 0 0 ...
## $ TMAX : num [1:28686] 50 43 48 45 46 45 52 49 48 43 ...
## $ TMIN : num [1:28686] 40 40 35 35 31 33 40 36 29 31 ...
## $ yr : num [1:28686] 1948 1948 1948 1948 1948 ...
## $ mo : num [1:28686] 1 1 1 1 1 1 1 1 1 1 ...
## $ dy : int [1:28686] 1 2 3 4 5 6 7 8 9 10 ...
summary(olyw1018)
## STATION_NAME DATE PRCP SNOW
## Length:28686 Min. :1948-01-01 Min. :0.0000 Min. : 0.00000
## Class :character 1st Qu.:1959-10-07 1st Qu.:0.0000 1st Qu.: 0.00000
## Mode :character Median :1979-05-26 Median :0.0000 Median : 0.00000
## Mean :1980-03-14 Mean :0.1405 Mean : 0.03355
## 3rd Qu.:1999-01-25 3rd Qu.:0.1400 3rd Qu.: 0.00000
## Max. :2018-09-21 Max. :4.8200 Max. :14.20000
## TMAX TMIN yr mo
## Min. : 18.00 Min. :-8.00 Min. :1948 Min. : 1.000
## 1st Qu.: 50.00 1st Qu.:33.00 1st Qu.:1959 1st Qu.: 4.000
## Median : 59.00 Median :40.00 Median :1979 Median : 7.000
## Mean : 60.46 Mean :39.85 Mean :1980 Mean : 6.502
## 3rd Qu.: 71.00 3rd Qu.:47.00 3rd Qu.:1999 3rd Qu.: 9.000
## Max. :104.00 Max. :69.00 Max. :2018 Max. :12.000
## dy
## Min. : 1.00
## 1st Qu.: 8.00
## Median :16.00
## Mean :15.72
## 3rd Qu.:23.00
## Max. :31.00
# Summarize TMAX for each calendar day across all years.
olyw1018 %>%
  group_by(mo, dy) %>%
  summarize(medmax = median(TMAX),
            minmax = min(TMAX),
            maxmax = max(TMAX)) %>%
  ungroup() %>%
  # Use one fixed leap year so that Feb 29 is a valid date.
  mutate(date = make_date(2020, mo, dy)) -> cal20
## `summarise()` has grouped output by 'mo'. You can override using the `.groups` argument.
cal20 %>% ggplot(aes(x = date)) +
  geom_point(aes(y = medmax, color = "Median"), size = .5) +
  geom_point(aes(y = minmax, color = "Minimum"), size = .5) +
  geom_point(aes(y = maxmax, color = "Maximum"), size = .5) +
  # Label the axis with month names only, so no year appears.
  scale_x_date(date_labels = "%b") +
  ggtitle("Median, Minimum, and Maximum of Daily High Temperature by Calendar Day") +
  theme_linedraw() +
  theme(plot.title = element_text(hjust = 0.5)) -> v0
ggplotly(v0)
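A minimal sketch of one way to handle the date display, using made-up data rather than the Olympia file. Every calendar day is placed in a single leap year (2020 is my choice here, so that February 29 is a valid date), and `scale_x_date()` is asked to label the axis with month names only. The tibble `toy` and its values are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical summary table: one row per calendar day, including Feb 29.
toy <- tibble(date = seq(make_date(2020, 1, 1), make_date(2020, 12, 31), by = "day")) %>%
  mutate(mo = month(date),
         dy = day(date),
         # Made-up seasonal curve standing in for a real summary of TMAX.
         medmax = 60 - 18 * cos(2 * pi * (yday(date) - 20) / 366)) %>%
  select(mo, dy, medmax)

# Rebuild the date from mo and dy using one fixed leap year.
toy <- toy %>% mutate(date = make_date(2020, mo, dy))

p <- ggplot(toy, aes(x = date, y = medmax)) +
  geom_point(size = .5) +
  # "%b" prints the abbreviated month name, so no year shows on the axis.
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  theme_linedraw()
p
```

Passing a vector of years to `make_date()` instead of a single year would recycle it across the rows and scatter the calendar days over many different years, which is why a single fixed year is used in this sketch.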
We have continuous variables TMAX and PRCP to describe the weather for a day. Add the following discrete variables to the dataframe.
The new variable warmth has the value “Cold” if the value of TMAX is below 40. It has the value “Warm” if TMAX is between 40 and 75. It is “Hot” otherwise.
The new variable wetness has the value “Dry” if PRCP is 0. It has the value “Damp” if PRCP is positive but below .2. Otherwise it has the value “Wet”.
Create these variables using the dplyr function case_when().
Use appropriate graphics to describe the relationships between the continuous variables and the discrete variables on which they were based to verify that your code worked. Pick a different theme from Problem 1.
# Place your code here.
olyw1018 %>%
  mutate(
    Warmth = case_when(
      TMAX < 40 ~ "Cold",
      TMAX >= 40 & TMAX < 75.0 ~ "Warm",
      TRUE ~ "Hot"),
    Wetness = case_when(
      PRCP == 0 ~ "Dry",
      PRCP > 0 & PRCP < .2 ~ "Damp",
      TRUE ~ "Wet")
  ) -> v2
ggplot(v2, aes(x = Wetness, y = PRCP)) + geom_jitter() + theme_minimal()
ggplot(v2, aes(x = Warmth, y = TMAX)) + geom_boxplot() + theme_minimal()
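A numeric check can complement the plots. The sketch below runs on a small made-up tibble (`toy`, not the Olympia data) and tabulates the range of each continuous variable within each category, so that the boundary values 40, 75, 0, and .2 can be checked directly. Treating exactly 75 as "Hot" is an assumption taken from the code above; the problem statement does not say which side that boundary belongs to.

```r
library(dplyr)

# Hypothetical values chosen to sit on and around the cut points.
toy <- tibble(TMAX = c(18, 39, 40, 74, 75, 104),
              PRCP = c(0, 0, 0.01, 0.19, 0.2, 4.82))

toy <- toy %>%
  mutate(
    Warmth = case_when(
      TMAX < 40 ~ "Cold",
      TMAX < 75 ~ "Warm",   # values below 40 were already handled above
      TRUE      ~ "Hot"),
    Wetness = case_when(
      PRCP == 0 ~ "Dry",
      PRCP < .2 ~ "Damp",   # PRCP == 0 was already handled above
      TRUE      ~ "Wet"))

# Range of TMAX within each Warmth level.
warmth_check <- toy %>%
  group_by(Warmth) %>%
  summarize(lo = min(TMAX), hi = max(TMAX), n = n())

# Range of PRCP within each Wetness level.
wetness_check <- toy %>%
  group_by(Wetness) %>%
  summarize(lo = min(PRCP), hi = max(PRCP), n = n())

warmth_check
wetness_check
```

If the ranges overlap or cross a cut point, the conditions are wrong. Note that the `TRUE` fallback also catches missing values, so an `NA` in `TMAX` would be labeled "Hot". The `summary()` output for `olyw1018` reports no missing values, so that does not matter here.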
We saw a few different graphs useful in examining the relationship between two categorical variables. Try two of them here. Use different themes.
# Place your code here.
#ggplot(v2, aes(x= warmth, y = wetness, color = STATION_NAME))+geom_jitter(size = 0.8)
ggplot(v2, aes(x = Warmth, y = Wetness)) + geom_jitter(alpha = 0.5, size = 0.8) + theme_bw()
ggplot(v2, aes(x = Warmth, y = Wetness, size = PRCP)) + geom_point() + theme_classic()
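Because both variables are categorical, counts carry most of the information. As a sketch of one alternative, on made-up data (`toy` is hypothetical, and so are its values), the code below tabulates the combinations with `count()`, converts them to proportions within each `Warmth` level, and draws a stacked bar chart of those proportions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical days, already labeled with the two categories.
toy <- tibble(
  Warmth  = c("Cold", "Cold", "Warm", "Warm", "Warm", "Warm", "Hot", "Hot"),
  Wetness = c("Dry",  "Wet",  "Dry",  "Damp", "Wet",  "Wet",  "Dry", "Dry"))

# Number of days in each Warmth/Wetness combination,
# then the share of each Wetness level within a Warmth level.
shares <- toy %>%
  count(Warmth, Wetness) %>%
  group_by(Warmth) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

shares

# Each bar has height 1, so the bars compare proportions, not raw counts.
ggplot(shares, aes(x = Warmth, y = prop, fill = Wetness)) +
  geom_col() +
  theme_bw()
```

Proportions make the three `Warmth` levels comparable even when one of them has far more days than the others, which a jittered scatter of raw points does not do as directly.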
There is no required code. Describe the relationship between warmth and wetness. Also comment on the relative effectiveness of the graphs you used.
In the first graph from Problem 3, both the hot and the cold conditions point toward dryness, and this is especially obvious for hot days. Warm days in the first graph also lean toward dry more than toward wet or damp. However, the second graph shows that warm days have more precipitation than cold days, which agrees with the first graph, where cold days tend to be drier.
The graphs I used for Problem 3 are effective as simple comparisons of two categorical variables. Including more categorical variables in a graph seems to provide additional information and interesting insight when comparing graphs.