Air Quality Tutorial and Homework Assignment

install.packages("tidyverse")
library(tidyverse)

Load the dataset into your global environment

airquality <- airquality

Look at the structure of the data

str(airquality)

Calculating Summary Statistics

mean(airquality$Temp)

mean(airquality[,4])

Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)

sd(airquality$Wind)

var(airquality$Wind)
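Some columns in airquality contain missing values, so the same functions return NA for them unless told to drop the missing entries. The lines below are a minimal sketch of that behavior, assuming the built-in dataset has not been modified; aq is only an illustrative name for a fresh copy.

```r
# Work on a fresh copy of the built-in dataset (aq is an illustrative name)
aq <- datasets::airquality

# Count the missing values in each column
na_counts <- colSums(is.na(aq))
print(na_counts)

# Ozone contains NAs, so the plain mean comes back as NA
ozone_mean_raw <- mean(aq$Ozone)
print(ozone_mean_raw)

# na.rm = TRUE drops the missing values before the calculation
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
ozone_sd <- sd(aq$Ozone, na.rm = TRUE)
print(ozone_mean)
print(ozone_sd)
```

Temp and Wind have no missing values, which is why the calls above work on them without na.rm.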

Change the Months from 5-9 to May through September

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
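The same recoding can also be written as a single lookup. This is only a sketch of an alternative, not the method the assignment asks for: it relies on R's built-in month.name vector and must run while Month is still numeric, so it uses a fresh copy called aq (an illustrative name).

```r
# Fresh copy with Month still stored as the numbers 5 to 9
aq <- datasets::airquality

# month.name is a built-in character vector: "January", ..., "December".
# Indexing it with the month numbers returns the matching names.
aq$Month <- month.name[aq$Month]

# Month is now a character column holding the five month names
print(class(aq$Month))
print(unique(aq$Month))
```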

Look at the structure and summary statistics of the dataset, and see how Month now holds character strings instead of numbers

str(airquality)

summary(airquality)

Month is a categorical variable. Convert it to a factor, listing its levels (the distinct categories) in calendar order.

airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
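The levels argument matters because factor() otherwise sorts the labels alphabetically, which would put August first on every plot axis and legend. A small sketch on a stand-alone vector (months_chr is an illustrative name) shows the difference:

```r
# A stand-alone character vector of the five month names
months_chr <- c("May", "June", "July", "August", "September")

# Without explicit levels, factor() sorts the labels alphabetically
default_factor <- factor(months_chr)
print(levels(default_factor))

# With explicit levels, the calendar order is preserved
calendar_factor <- factor(months_chr, levels = months_chr)
print(levels(calendar_factor))
```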

Plot 1: Create a histogram categorized by Month with qplot

p1 <- qplot(data = airquality, Temp, fill = Month, geom = "histogram", bins = 20)
p1

Plot 2: Make a histogram using ggplot

Histogram of Average Temperature by Month

p2 <- airquality %>%
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p2
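The position argument controls how the bars for different months share a bin. The sketch below compares the default stacking with position = "identity", assuming ggplot2 is installed; aq, p_stack, and p_identity are illustrative names, and the plots are built on a fresh copy so the block runs on its own.

```r
library(ggplot2)

# Fresh copy with Month as a factor in calendar order
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Default position ("stack"): each month's bar sits on top of the others,
# so the total bar height is the count across all months in that bin
p_stack <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(binwidth = 5, color = "white")

# position = "identity": every month's bar starts at zero and the bars
# overlap, which is why alpha below 1 is needed to see through them
p_identity <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white")

p_stack
p_identity
```

Both versions count the same observations; only the way the bars are drawn differs.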

Plot 3: Create side-by-side boxplots categorized by Month

Side by Side Boxplots of Average Temperature by Month

p3 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) +
  ggtitle("Temperatures") +
  xlab("Monthly Temperatures") +
  ylab("Frequency") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p3
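Each box summarizes one month's temperatures by its quartiles. As a rough numeric companion to the plot, the sketch below computes the minimum, quartiles, and maximum of Temp for each month with base R; aq and temp_summary are illustrative names.

```r
# Fresh copy with Month as a factor in calendar order
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Split Temp by month, then compute the 0%, 25%, 50%, 75%, and 100%
# quantiles within each month (one column per month)
temp_summary <- sapply(
  split(aq$Temp, aq$Month),
  quantile,
  probs = c(0, 0.25, 0.5, 0.75, 1)
)
print(temp_summary)
```

The 25%, 50%, and 75% rows correspond to the bottom edge, middle line, and top edge of each box.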

Plot 4: Make the same side-by-side boxplots, but in grey-scale

p4 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) +
  ggtitle("Monthly Temperature Variations") +
  xlab("Monthly Temperatures") +
  ylab("Frequency") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
p4

Plot 5: Now make one plot on your own of any of the variables in this dataset.

p5 <- airquality %>%
  ggplot(aes(Month, Wind, fill = Month)) +
  ggtitle("Wind Readings") +
  xlab("Wind Readings") +
  ylab("Frequency") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p5

This is my plot of wind readings for the months of May through September, using data from the "airquality" dataset. I believe that this plot is the most effective way of demonstrating differences in wind readings between months. You can see the maximum, minimum, and median readings (although I do understand that "Frequency" may be redundant here). The code I used is very similar to that of Plot 3. Embedded in the code are directions to pick up values from "Wind" and draw boxplots. The code also labels the x- and y-axes and provides a title for the visualization.
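The values read off the boxplots can be checked numerically. The following is a minimal sketch, assuming a fresh copy of the dataset; aq and wind_stats are illustrative names. Note that geom_boxplot draws readings lying more than 1.5 times the interquartile range beyond the box as separate points, so a month's extreme reading may appear as a dot rather than as the end of a whisker.

```r
# Fresh copy with Month as a factor in calendar order
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Minimum, median, and maximum wind reading within each month
# (one row per month, one column per statistic)
wind_stats <- t(sapply(
  split(aq$Wind, aq$Month),
  function(w) c(min = min(w), median = median(w), max = max(w))
))
print(wind_stats)
```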