Normal Distribution Lab Guide

Harold Nelson

2024-07-14

Intro and Setup

I will walk through the exercises in the lab using a different dataset, OAW2309. It contains weather records from the Olympia airport.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
load("OAW2309.Rdata")
glimpse(OAW2309)
## Rows: 30,075
## Columns: 7
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1941…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59, 6…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46, 4…
## $ mo   <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
## $ dy   <int> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 2…
## $ yr   <dbl> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941,…

Exercise 1 (TMAX)

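Use the daily maximum temperature, TMAX. The plot overlays the density curve of the data with a normal curve (in tomato red) that has the same mean and standard deviation.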

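# Sample mean and standard deviation of TMAX (na.rm guards against missing days)
TMAXmean <- mean(OAW2309$TMAX, na.rm = TRUE)
TMAXsd <- sd(OAW2309$TMAX, na.rm = TRUE)

ggplot(data = OAW2309, mapping = aes(x = TMAX)) +
  geom_density() +                                   # empirical density
  # geom_histogram(aes(y = after_stat(density))) +   # histogram alternative
  stat_function(fun = dnorm, args = list(mean = TMAXmean, sd = TMAXsd),
                col = "tomato")                      # normal curve with matching mean and sd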
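The overlay is a visual check. A numeric companion is to count how much of the data falls within 1, 2, and 3 standard deviations of the mean, and then compare those shares with the normal values (about 68%, 95%, and 99.7%). The helper `within_k_sd()` below is a hypothetical name used only for this sketch. The vector it is called on is simulated stand-in data, so the block runs without `OAW2309.Rdata`.

```r
# Hypothetical helper (not part of the lab): share of observations within
# k standard deviations of the mean, next to the share a normal
# distribution would put there.
within_k_sd <- function(x, k = 1:3) {
  x <- x[!is.na(x)]                      # drop missing values, if any
  m <- mean(x)
  s <- sd(x)
  data.frame(
    k = k,
    observed = sapply(k, function(kk) mean(abs(x - m) <= kk * s)),
    normal   = pnorm(k) - pnorm(-k)      # 0.683, 0.954, 0.997
  )
}

# Stand-in data so the sketch runs on its own; with the lab data
# loaded, call within_k_sd(OAW2309$TMAX) instead.
set.seed(1)
within_k_sd(rnorm(1000, mean = 60, sd = 12))
```

If TMAX is close to normal, the `observed` column should sit near the `normal` column.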

Exercise 2 (TMAX)

ggplot(data = OAW2309, aes(sample = TMAX)) +
  geom_blank() +
  geom_line(stat = "qq")

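Base R offers a quicker route. qqnorm() draws the normal probability plot, and qqline() adds a reference line through the first and third quartiles, which makes departures from normality easier to see.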

qqnorm(OAW2309$TMAX)
qqline(OAW2309$TMAX)
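It helps to know what these two functions compute. qqnorm() plots the sorted data against normal quantiles at the plotting positions given by `ppoints()`. qqline() draws the straight line through the points defined by the first and third quartiles of the data and of the standard normal. The helper names `qq_points()` and `qq_line_coef()` are hypothetical, and the data are a simulated stand-in.

```r
# Hypothetical helper: the points qqnorm() plots, as a data frame
qq_points <- function(x) {
  x <- sort(x[!is.na(x)])
  data.frame(
    theoretical = qnorm(ppoints(length(x))),  # standard normal quantiles
    sample      = x                           # ordered observations
  )
}

# Hypothetical helper: intercept and slope of the line qqline() draws
qq_line_coef <- function(x, probs = c(0.25, 0.75)) {
  y <- quantile(x, probs, names = FALSE, na.rm = TRUE)  # data quartiles
  z <- qnorm(probs)                                     # normal quartiles
  slope <- diff(y) / diff(z)
  c(intercept = y[1] - slope * z[1], slope = slope)
}

# Stand-in data; with the lab data use OAW2309$TMAX
set.seed(2)
x <- rnorm(500, mean = 60, sd = 12)
head(qq_points(x))
qq_line_coef(x)  # for normal data, roughly the mean and the sd
```

Suppose the data were exactly normal with mean mu and standard deviation sigma. Then every quantile would equal mu + sigma * z, so the reference line for TMAX should have an intercept near TMAXmean and a slope near TMAXsd.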

Exercise 1 (PRCP)

Repeat the comparison with daily precipitation, PRCP.

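# Sample mean and standard deviation of PRCP (na.rm guards against missing days)
PRCPmean <- mean(OAW2309$PRCP, na.rm = TRUE)
PRCPsd <- sd(OAW2309$PRCP, na.rm = TRUE)

ggplot(data = OAW2309, mapping = aes(x = PRCP)) +
  geom_density() +                                   # empirical density
  # geom_histogram(aes(y = after_stat(density))) +   # histogram alternative
  stat_function(fun = dnorm, args = list(mean = PRCPmean, sd = PRCPsd),
                col = "tomato")                      # normal curve with matching mean and sd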
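Two numbers summarize why a symmetric normal curve fits precipitation poorly: the share of days with zero recorded precipitation, and the skewness. The sketch below uses the simple moment-based skewness (one of several definitions), which is 0 for a sample that is symmetric about its mean. `describe_shape()` is a hypothetical helper, and the input is simulated stand-in data with many zeros.

```r
# Hypothetical helper: share of zeros and moment-based skewness
describe_shape <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)
  c(
    share_zero = mean(x == 0),          # fraction of exact zeros
    skewness   = mean((x - m)^3) / s^3  # 0 for a sample symmetric about its mean
  )
}

# Stand-in for PRCP: many zero days plus a right-skewed tail.
# With the lab data, compare describe_shape(OAW2309$PRCP)
# with describe_shape(OAW2309$TMAX).
set.seed(4)
describe_shape(c(rep(0, 600), rexp(400, rate = 3)))
```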

Exercise 2 (PRCP)

ggplot(data = OAW2309, aes(sample = PRCP)) +
  geom_blank() +
  geom_line(stat = "qq")

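Base R offers a quicker route. qqnorm() draws the normal probability plot, and qqline() adds a reference line through the first and third quartiles, which makes departures from normality easier to see.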

qqnorm(OAW2309$PRCP)
qqline(OAW2309$PRCP)
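ggplot2 can add the same kind of reference line. `geom_qq()` draws the points, and `geom_qq_line()` by default draws a line through the first and third quartiles, as qqline() does. The data frame `demo` below is a simulated stand-in for OAW2309.

```r
library(ggplot2)

# Stand-in data with many zeros, loosely shaped like PRCP;
# with the lab data, use ggplot(OAW2309, aes(sample = PRCP)).
set.seed(6)
demo <- data.frame(PRCP = c(rep(0, 600), rexp(400, rate = 3)))

p <- ggplot(demo, aes(sample = PRCP)) +
  geom_qq() +                          # sample vs. normal quantiles
  geom_qq_line(colour = "tomato")      # reference line through the quartiles
p
```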

Exercise 3

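We’ll use TMAX. The idea is to simulate a normal sample with the same size, mean, and standard deviation as TMAX. Its normal probability plot then shows what a QQ plot of genuinely normal data looks like.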

sim_norm <- rnorm(n = nrow(OAW2309), mean = TMAXmean, sd = TMAXsd)

df <- tibble(sim_norm)   # tibble() replaces the deprecated data_frame()
str(df)
## tibble [30,075 × 1] (S3: tbl_df/tbl/data.frame)
##  $ sim_norm: num [1:30075] 73.7 54.2 54.6 74.6 79.1 ...
ggplot(df, aes(sample = sim_norm)) +
  geom_line(stat = "qq")
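The comparison is easier when the observed values and the simulated sample sit side by side in one faceted plot. In this sketch `observed` is a rounded simulated stand-in for OAW2309$TMAX, so the block runs without the data file.

```r
library(ggplot2)

# Stand-in for OAW2309$TMAX (whole degrees, roughly bell-shaped)
set.seed(7)
observed <- round(rnorm(2000, mean = 60, sd = 12))

# Normal sample with the same size, mean and sd as the observed data
simulated <- rnorm(length(observed), mean = mean(observed), sd = sd(observed))

both <- data.frame(
  source = rep(c("observed", "simulated"), each = length(observed)),
  value  = c(observed, simulated)
)

p <- ggplot(both, aes(sample = value)) +
  geom_qq() +
  geom_qq_line(colour = "tomato") +
  facet_wrap(~ source)                 # one panel per source
p
```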

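Look at this technique with PRCP, which is far from normal. qqnormsim() from openintro shows the normal probability plot of the data next to plots of simulated normal samples with the same mean, standard deviation, and size. If the data's plot looks unlike the simulated ones, the data are not close to normal.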

qqnormsim(sample = PRCP, data = OAW2309)

Look at this technique with TMAX, which is close to normal.

qqnormsim(sample = TMAX, data = OAW2309)
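For readers who want to see the idea behind qqnormsim() in code, here is a rough ggplot2 version. It is not the openintro source. `qq_sim_grid()` is a hypothetical helper, and the default of eight simulated panels is an arbitrary choice for this sketch.

```r
library(ggplot2)

# Hypothetical stand-in for openintro::qqnormsim(): the "data" panel holds
# the observations, the other panels hold normal samples with the same
# size, mean and standard deviation.
qq_sim_grid <- function(x, nsim = 8) {
  x <- x[!is.na(x)]
  n <- length(x)
  sims <- replicate(nsim, rnorm(n, mean = mean(x), sd = sd(x)))  # n x nsim
  panel_names <- c("data", paste("sim", seq_len(nsim)))
  grid <- data.frame(
    panel = factor(rep(panel_names, each = n), levels = panel_names),
    value = c(x, as.vector(sims))      # columns stacked in panel order
  )
  ggplot(grid, aes(sample = value)) +
    geom_qq(size = 0.5) +
    geom_qq_line(colour = "tomato") +
    facet_wrap(~ panel)
}

# Stand-in skewed data; with the lab data try qq_sim_grid(OAW2309$PRCP)
# and qq_sim_grid(OAW2309$TMAX).
set.seed(8)
qq_sim_grid(rexp(500))
```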