Chapter 2 Notes

Harold Nelson

2023-01-23

Setup

Before you do anything else, create a new project in the existing folder Chapter 2.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(readxl)

Task 1

Use read_csv() to import the file mesodata_small.csv into a data frame named mesosm. Then use str(), glimpse(), and head() to examine the contents.

Solution

mesosm <- read_csv("mesodata_small.csv")
## Rows: 240 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): STID
## dbl  (7): MONTH, YEAR, TMAX, TMIN, HMAX, HMIN, RAIN
## date (1): DATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(mesosm)
## spec_tbl_df [240 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ MONTH: num [1:240] 1 2 3 4 5 6 7 8 9 10 ...
##  $ YEAR : num [1:240] 2014 2014 2014 2014 2014 ...
##  $ STID : chr [1:240] "HOOK" "HOOK" "HOOK" "HOOK" ...
##  $ TMAX : num [1:240] 49.5 47.2 60.7 72.4 84.4 ...
##  $ TMIN : num [1:240] 17.9 17.1 26.1 39.3 48.3 ...
##  $ HMAX : num [1:240] 83 88.3 79 81.8 75.4 ...
##  $ HMIN : num [1:240] 29 39.9 25.4 21 18.8 ...
##  $ RAIN : num [1:240] 0.17 0.3 0.31 0.4 1.25 3.18 2.58 0.95 1.48 1.72 ...
##  $ DATE : Date[1:240], format: "2014-01-01" "2014-02-01" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   MONTH = col_double(),
##   ..   YEAR = col_double(),
##   ..   STID = col_character(),
##   ..   TMAX = col_double(),
##   ..   TMIN = col_double(),
##   ..   HMAX = col_double(),
##   ..   HMIN = col_double(),
##   ..   RAIN = col_double(),
##   ..   DATE = col_date(format = "")
##   .. )
##  - attr(*, "problems")=<externalptr>
glimpse(mesosm)
## Rows: 240
## Columns: 9
## $ MONTH <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9…
## $ YEAR  <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014…
## $ STID  <chr> "HOOK", "HOOK", "HOOK", "HOOK", "HOOK", "HOOK", "HOOK", "HOOK", …
## $ TMAX  <dbl> 49.48032, 47.18071, 60.70613, 72.36483, 84.35065, 90.65433, 90.6…
## $ TMIN  <dbl> 17.92903, 17.05357, 26.06806, 39.32103, 48.31516, 61.89567, 64.7…
## $ HMAX  <dbl> 83.04161, 88.28857, 78.96613, 81.82690, 75.37419, 90.86300, 88.2…
## $ HMIN  <dbl> 29.00226, 39.94393, 25.39871, 21.02207, 18.83000, 28.83267, 33.1…
## $ RAIN  <dbl> 0.17, 0.30, 0.31, 0.40, 1.25, 3.18, 2.58, 0.95, 1.48, 1.72, 0.08…
## $ DATE  <date> 2014-01-01, 2014-02-01, 2014-03-01, 2014-04-01, 2014-05-01, 201…
head(mesosm)
## # A tibble: 6 × 9
##   MONTH  YEAR STID   TMAX  TMIN  HMAX  HMIN  RAIN DATE      
##   <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <date>    
## 1     1  2014 HOOK   49.5  17.9  83.0  29.0  0.17 2014-01-01
## 2     2  2014 HOOK   47.2  17.1  88.3  39.9  0.3  2014-02-01
## 3     3  2014 HOOK   60.7  26.1  79.0  25.4  0.31 2014-03-01
## 4     4  2014 HOOK   72.4  39.3  81.8  21.0  0.4  2014-04-01
## 5     5  2014 HOOK   84.4  48.3  75.4  18.8  1.25 2014-05-01
## 6     6  2014 HOOK   90.7  61.9  90.9  28.8  3.18 2014-06-01
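
The column specification message printed by read_csv() above appears only because readr had to guess the column types. One way to silence it is to declare the types up front. The sketch below assumes readr 2.0 or later. So that it runs on its own, it writes a hypothetical two-row CSV (the first two rows shown by head(), rounded) to a temporary file. With the real data you would pass "mesodata_small.csv" with the same col_types.

```r
library(readr)

# Hypothetical two-row file built from the rounded head() output above
toy_path <- tempfile(fileext = ".csv")
writeLines(c("MONTH,YEAR,STID,TMAX,TMIN,HMAX,HMIN,RAIN,DATE",
             "1,2014,HOOK,49.5,17.9,83.0,29.0,0.17,2014-01-01",
             "2,2014,HOOK,47.2,17.1,88.3,39.9,0.30,2014-02-01"),
           toy_path)

# The same types read_csv() reported guessing; because col_types is
# supplied, no column specification message is printed
mesospec <- cols(
  MONTH = col_double(),
  YEAR  = col_double(),
  STID  = col_character(),
  TMAX  = col_double(),
  TMIN  = col_double(),
  HMAX  = col_double(),
  HMIN  = col_double(),
  RAIN  = col_double(),
  DATE  = col_date(format = "")
)

toy <- read_csv(toy_path, col_types = mesospec)
str(toy)
```

Declaring the types also makes the import fail loudly if the file ever changes shape, instead of silently guessing something different.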

Task 2

Recreate the line plot of mesomthe.

Then add a geom_point() layer.

Play with the linetype, color, and linewidth parameters of geom_line() (linewidth replaced size for lines in ggplot2 3.4.0) and the color and size parameters of geom_point() until the graph looks good to you. If you don’t understand how to do this, ask your best friend.

Note: you need to move the aes() up into the call to ggplot() so that both geom_line() and geom_point() use the same x and y mappings.

Solution

# Keep only the rows for the MTHE site
mesomthe <- filter(mesosm, STID == "MTHE")
ggplot(data = mesomthe) +
  geom_line(mapping = aes(x = DATE, y = RAIN))

# aes() moved into ggplot() so both layers share the mapping
ggplot(data = mesomthe, mapping = aes(x = DATE, y = RAIN)) +
  geom_line(linetype = "dashed",
            color = "blue",
            linewidth = 0.2) +
  geom_point(size = 1, color = "red")
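
Moving aes() into ggplot() works because a mapping given to ggplot() becomes the default for every layer, so geom_line() and geom_point() both receive x = DATE and y = RAIN. The sketch below checks this with ggplot_build(). It assumes ggplot2 3.4.0 or later (for linewidth) and uses a hypothetical three-row data frame with invented values in place of mesomthe.

```r
library(ggplot2)

# Hypothetical stand-in for mesomthe (dates and rainfall values are invented)
toy_mthe <- data.frame(
  DATE = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01")),
  RAIN = c(0.2, 0.5, 1.1)
)

# The mapping in ggplot() is inherited by both layers
p <- ggplot(data = toy_mthe, mapping = aes(x = DATE, y = RAIN)) +
  geom_line(linetype = "dashed", color = "blue", linewidth = 0.2) +
  geom_point(size = 1, color = "red")

# ggplot_build() computes the data each layer will draw;
# check that every layer received both an x and a y column
built <- ggplot_build(p)
layer_has_xy <- sapply(built$data, function(d) all(c("x", "y") %in% names(d)))
layer_has_xy
```

If aes() stayed inside geom_line() instead, geom_point() would have no x or y to draw and ggplot2 would stop with an error about missing aesthetics.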

Task 3

In the facet_wrap() graph with ncol = 1, replace the geom_line() with a geom_point(). Also map color to STID.

Solution

ggplot(data = mesosm) +
  geom_point(mapping = aes(x = DATE,
                          y = RAIN,
                          color = STID)) +
  facet_wrap(facets = vars(STID), ncol = 1)

Task 4

In the first graph in 2.8.3, make the following changes.

  1. Move the aes() to the ggplot() call.
  2. Replace the geom_histogram() with geom_density().
  3. Instead of bins, use the parameter adjust with different values until you get something you like. Try at least 0.5, 1, and 2.

Solution

ggplot(data = mesomthe, aes(x = RAIN)) +
  geom_density(adjust = 1) +
  labs(x = "Precipitation (in)",
       y = "Density") +
  theme_linedraw() +
  theme(axis.text.x = element_text(size = 12, face = "bold"),
        axis.text.y = element_text(size = 12, face = "bold"),
        axis.title.x = element_text(size = 14, face = "bold"),
        axis.title.y = element_text(size = 14, face = "bold"))
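
The task asks you to try adjust = 0.5, 1, and 2. Instead of editing the value by hand, you can build one plot per value with lapply(). In geom_density(), adjust multiplies the default bandwidth: values below 1 follow the data more closely and values above 1 give a smoother curve. The sketch below uses the first ten HOOK rainfall values printed by str(mesosm) as a stand-in for mesomthe, so it runs without the CSV file. The names toy_rain and density_plots are illustrative only.

```r
library(ggplot2)

# First ten HOOK rainfall values from the str(mesosm) output above
toy_rain <- data.frame(RAIN = c(0.17, 0.30, 0.31, 0.40, 1.25,
                                3.18, 2.58, 0.95, 1.48, 1.72))

# One density plot per bandwidth multiplier
adjust_values <- c(0.5, 1, 2)
density_plots <- lapply(adjust_values, function(a) {
  ggplot(data = toy_rain, aes(x = RAIN)) +
    geom_density(adjust = a) +
    labs(title = paste("adjust =", a),
         x = "Precipitation (in)", y = "Density")
})

# Print the plots one after another to compare them
for (p in density_plots) print(p)
```

With the real data, replace toy_rain with mesomthe.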

Problem 2.9.1

Create a graph that displays four scatterplots of TMIN versus TMAX, one for each site.

Solution

ggplot(data = mesosm,
       aes(x = TMIN, y = TMAX)) +
  geom_point() +
  facet_wrap(~STID)
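
Task 3 passes the faceting variable as facets = vars(STID), but this solution uses the formula ~STID. facet_wrap() accepts both forms. The sketch below builds the same plot both ways and counts the panels with layer_data(). The HOOK values are taken from the head() output. The MTHE values are invented for illustration.

```r
library(ggplot2)

# HOOK values from head(mesosm); MTHE values are invented
toy_sites <- data.frame(
  STID = c("HOOK", "HOOK", "MTHE", "MTHE"),
  TMIN = c(17.9, 26.1, 20.0, 30.0),
  TMAX = c(49.5, 60.7, 52.0, 65.0)
)

p_formula <- ggplot(toy_sites, aes(x = TMIN, y = TMAX)) +
  geom_point() +
  facet_wrap(~STID)

p_vars <- ggplot(toy_sites, aes(x = TMIN, y = TMAX)) +
  geom_point() +
  facet_wrap(vars(STID))

# Number of distinct panels each version draws (one per site)
n_panels <- function(p) length(unique(layer_data(p)$PANEL))
c(formula = n_panels(p_formula), vars = n_panels(p_vars))
```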

Problem 2.9.2

Create a boxplot that compares the distribution of TMAX for each site. Make the axis text and labels bold so that they are easier to see.

Solution

ggplot(data = mesosm, aes(x = STID, y = TMAX)) +
  geom_boxplot() +
  theme(axis.text = element_text(face = "bold"),
        axis.title = element_text(face = "bold"))
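
Task 4 and Problem 2.9.3 both repeat the same four-line theme() call to make the axis text and titles bold. A theme() result is an ordinary R object, so one option is to store it once and add it with +. Setting axis.text and axis.title covers both axes, because axis.text.x and axis.text.y (and the matching title elements) inherit from them. In the sketch below, the name bold_axes is hypothetical. The HOOK TMAX values come from head(mesosm) and the MTHE values are invented.

```r
library(ggplot2)

# Bold axis settings stored once, using the sizes from these notes
bold_axes <- theme(
  axis.text  = element_text(size = 12, face = "bold"),
  axis.title = element_text(size = 14, face = "bold")
)

# HOOK values from head(mesosm); MTHE values are invented
toy_tmax <- data.frame(
  STID = rep(c("HOOK", "MTHE"), each = 3),
  TMAX = c(49.5, 60.7, 84.4, 52.0, 63.0, 88.0)
)

# Add the stored theme like any other component
p <- ggplot(toy_tmax, aes(x = STID, y = TMAX)) +
  geom_boxplot() +
  bold_axes
print(p)
```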

Problem 2.9.3

Create a graph that displays four histograms of RAIN, one for each site. Experiment with changing the number of bins in the histograms to see how this affects the visualization.

Solution

ggplot(data = mesosm, aes(x = RAIN)) +
  geom_histogram(bins = 20) +
  facet_wrap(~STID) +
  theme(axis.text.x = element_text(size = 12, face = "bold"),
        axis.text.y = element_text(size = 12, face = "bold"),
        axis.title.x = element_text(size = 14, face = "bold"),
        axis.title.y = element_text(size = 14, face = "bold"))
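
To compare several bin counts side by side, you can wrap the histogram in a small function and call it once per value. In the sketch below, the helper rain_hist() is a hypothetical name. The ten HOOK rainfall values come from the str(mesosm) output and stand in for the full data. With the real data, pass mesosm instead of toy.

```r
library(ggplot2)

# Hypothetical helper: faceted RAIN histogram with a chosen number of bins
rain_hist <- function(data, bins) {
  ggplot(data = data, aes(x = RAIN)) +
    geom_histogram(bins = bins) +
    facet_wrap(~STID) +
    labs(title = paste(bins, "bins"))
}

# First ten HOOK rainfall values from the str(mesosm) output above
toy <- data.frame(STID = "HOOK",
                  RAIN = c(0.17, 0.30, 0.31, 0.40, 1.25,
                           3.18, 2.58, 0.95, 1.48, 1.72))

# One plot per bin count; print them in turn to compare
hist_plots <- lapply(c(5, 10, 30), function(b) rain_hist(toy, b))
for (p in hist_plots) print(p)
```

Too few bins hide the shape of the distribution, while too many leave most bins with only zero or one observation.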