Load the tidyverse library to access dplyr and ggplot2
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The data come from the New York State Department of Conservation and the National Weather Service. They are daily measurements recorded over five months, May through September, in 1973.
Load the data set into your global environment
Because airquality is a pre-built dataset, we can write it to our data directory to store it for later use.
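The plots that follow pass Month to a discrete fill scale, which suggests Month was converted from its stored integer codes (5 to 9) to a factor before plotting. That step is not shown here, so the following is a minimal sketch under that assumption, not necessarily the original code:

```r
# Load the built-in dataset into the global environment
data("airquality")

# Assumption: recode Month (stored as integers 5-9) into month names,
# with levels given in calendar order so they do not sort alphabetically
airquality$Month <- factor(
  airquality$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

# Inspect the result
str(airquality$Month)
levels(airquality$Month)
```

Setting `levels` and `labels` together keeps the legend and axis in calendar order rather than alphabetical order.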
# Month is assumed to be a factor here; the raw airquality data stores it as integers 5-9
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(
    x = "Monthly Temperatures from May - Sept",
    y = "Frequency of Temps",
    title = "Histogram of Monthly Temperatures from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service" # provide the data source
  )
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
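The message above means ggplot2 fell back to its default of 30 bins. As a rough sketch of how a chosen bin width relates to a bin count, the base R arithmetic below estimates the number of bins for Temp. The names `candidate_width` and `approx_bins` are illustrative, and ggplot2 aligns bin edges itself, so the drawn count can differ by one.

```r
# Spread of the temperature values being binned
temp_span <- diff(range(airquality$Temp, na.rm = TRUE))

# Illustrative bin width, in the same units as Temp
candidate_width <- 7

# Approximate number of bins needed to cover the span
approx_bins <- ceiling(temp_span / candidate_width)
approx_bins
```

A wider bin gives fewer, smoother bars. A narrower bin shows more detail but also more noise.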
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  # alpha keeps overlapping months visible; binwidth replaces the default of 30 bins
  geom_histogram(position = "identity", alpha = 0.45, binwidth = 7, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(
    x = "Monthly Temperatures from May - Sept",
    y = "Frequency of Temps",
    title = "Histogram of Monthly Temperatures from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
p2
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(
    x = "Months from May through September",
    y = "Temperatures",
    title = "Side-by-Side Boxplot of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p3
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(
    x = "Months from May through September", # the x-axis shows months, not temperatures
    y = "Temperatures",
    title = "Side-by-Side Boxplot of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot() +
  # same boxplot as p3, filled with a greyscale palette
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
p4
p5 <- airquality |>
  ggplot(aes(x = Wind, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.45, binwidth = 1.5, color = "#FFFFFF") +
  scale_fill_discrete(name = "Months", labels = c("May", "June", "July", "August", "September")) +
  labs(
    x = "Monthly Winds from May - Sept",
    y = "Frequency of Winds",
    title = "Histogram of Monthly Winds from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
p5
Essay
I used a histogram because I felt it was the most digestible way to show how often particular wind speeds occurred from May through September 1973. The x-axis shows what I assume are wind speeds in mph. The distribution looks roughly bell-shaped, with July, August, and September overlapping one another and having the highest frequencies. The main things I changed in the code were the aesthetic (aes), to map x = Wind instead of Temp, and the binwidth, to spread the data out more readably so the overlapping wasn’t overwhelming. I also changed the opacity (alpha), and I tried using an HTML hex color code instead of typing the color name.
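On the color change mentioned above: "white" and "#FFFFFF" name the same color, so the bar outlines in p5 should look the same as those in p2. A quick check using base R's grDevices functions (the variable names are illustrative):

```r
# Convert the named color to its red/green/blue components (0-255)
white_rgb <- col2rgb("white")

# Rebuild the hex string from those components
white_hex <- rgb(white_rgb[1], white_rgb[2], white_rgb[3], maxColorValue = 255)

white_hex
identical(white_hex, "#FFFFFF")
```

Hex codes are most useful for colors that have no built-in name.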