Load library tidyverse in order to access dplyr and ggplot2
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load the dataset into your global environment
Because airquality is a built-in dataset that ships with R, we do not need to read it from a file. Calling data() loads a copy into the global environment, where we can work with it.
data("airquality")
View the data using the “head” function
By default, the head function displays only the first 6 rows of the data set. Notice in the global environment pane to the right that the full data set has 153 observations (rows).
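A minimal sketch of this step, assuming the built-in airquality data frame has not been modified yet:

```r
# Load the built-in dataset (it ships with R's datasets package)
data("airquality")

# Show the first 6 rows (the default)
head(airquality)

# Pass n to show a different number of rows
head(airquality, n = 10)

# Rows and columns: 153 observations of 6 variables
dim(airquality)
```

The n argument is optional; leaving it out gives the default of 6 rows.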
Calculate Median, Standard Deviation, and Variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality[,4])
[1] 89.59133
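The last call above selects a column by position rather than by name. As a short sketch of what that index refers to, the following checks that column 4 is Temp and that the variance equals the squared standard deviation:

```r
data("airquality")

# Column 4 of airquality is Temp, so these two calls are equivalent
names(airquality)[4]
var(airquality[, 4])
var(airquality$Temp)

# Variance is the square of the standard deviation
sd(airquality$Temp)^2
```

Selecting by name (airquality$Temp) is usually easier to read than a numeric index, and it keeps working if the column order changes.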
Rename the months from number to names
Sometimes we prefer the months as numbers, but here we need them as month names. There are many ways to do this; one way is to convert the numbers 5 through 9 to May through September.
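One possible way to do the conversion, offered as a sketch and not necessarily the exact method intended here, is to turn Month into a factor with labelled levels. The plots that follow map Month to a discrete fill scale, which expects a categorical variable and not a number.

```r
# Reload the dataset so Month starts out as the numeric codes 5-9
data("airquality")

# Convert the month codes into labelled factor levels.
# Listing the levels as 5:9 keeps calendar order (May before June, and so on).
airquality$Month <- factor(
  airquality$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

# Check the result: number of daily observations per month
table(airquality$Month)
```

Reloading the data first makes the block safe to re-run; applying factor() a second time to an already converted column would otherwise produce NA values.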
Plot 1: Create a histogram categorized by Month
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(
    x = "Monthly Temperatures from May - Sept",
    y = "Frequency of Temps",
    title = "Histogram of Monthly Temperatures from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Plot 2: Improve the histogram of average temperature by month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.6, binwidth = 5, color = "black") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(
    x = "Monthly Temperatures from May - Sept",
    y = "Frequency of Temps",
    title = "Histogram of Monthly Temperatures from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
p2
Here July stands out for having a high frequency of 85-degree temperatures. The dark purple color shows where months overlap, which is visible because the bars are semi-transparent (alpha = 0.6).
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(
    x = "Months from May through September",
    y = "Temperatures",
    title = "Side-by-Side Boxplot of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot()
p3
Plot 4: Side by Side Boxplots in Gray Scale
Use the scale_fill_grey command for the grey-scale legend, and again, use fill=Month in the aesthetics.
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(
    x = "Months from May through September",
    y = "Temperatures",
    title = "Side-by-Side Boxplot of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
p4
Plot 5: Side-by-Side Boxplots of Wind Speed by Month
p5 <- airquality |>
  ggplot(aes(Month, Wind, fill = Month)) +
  labs(
    x = "Wind speeds from May through September",
    y = "Wind Speeds",
    title = "Side-by-Side Boxplot of Wind speeds",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p5
This data visualization is a box plot of wind speeds for the months May through September. I did not need any special code to make it; I reused the earlier boxplot code and replaced the variable “Temp” with “Wind”. The plot shows two outliers for wind speed in June, one low and one high, so June had both the lowest and the highest recorded wind speed. May had a greater median wind speed than any other month. The median wind speeds for July and August are the same.
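The readings taken from the boxplot can be checked numerically. The following is a sketch using dplyr on a fresh copy of the data; the names aq and wind_by_month are illustrative only and do not appear elsewhere in this walkthrough.

```r
library(dplyr)

# Work on a fresh copy so Month is still the numeric code 5-9
aq <- datasets::airquality

# Median, minimum, and maximum wind speed for each month
wind_by_month <- aq |>
  group_by(Month) |>
  summarise(
    median_wind = median(Wind),
    min_wind = min(Wind),
    max_wind = max(Wind)
  )

wind_by_month
```

Comparing the median_wind column across rows shows which month has the highest median and whether July and August match, while min_wind and max_wind show which month holds the extreme values.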