Airquality HW

Author

Mike Alfaro

Loading in the dataset

Because airquality is a built-in R dataset, we can copy it into our global environment and work with it directly.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Loading dataset into my global environment

airquality <- airquality

Looking at the first 6 rows of my dataset (airquality) using the head function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Learning how to Calculate Summary Statistics

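There is often more than one way to write the code for a given statistic. Here are two equivalent ways to calculate the mean of Temp (by column name and by column position), followed by the mean of Month while it is still stored as a number.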

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
mean(airquality$Month)
[1] 6.993464
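As a small sketch of the same idea, the block below pulls out the Temp column in three equivalent ways and also shows what happens with a column that has missing values. The name `aq` is just an illustrative copy of the built-in dataset, not something used elsewhere in this assignment.

```r
# Work on a fresh copy so earlier edits to `airquality` do not matter.
aq <- datasets::airquality

# Three equivalent ways to select the Temp column and average it.
m_dollar   <- mean(aq$Temp)        # by name with $
m_position <- mean(aq[, 4])        # by column position
m_bracket  <- mean(aq[["Temp"]])   # by name with [[ ]]

# Ozone contains NA values, so mean() returns NA unless they are dropped.
ozone_raw   <- mean(aq$Ozone)                # NA
ozone_clean <- mean(aq$Ozone, na.rm = TRUE)  # mean of the non-missing values
```

Selecting by name is usually safer than selecting by position, because the code keeps working if the column order changes.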

Next, I'll calculate the median, standard deviation, and variance using the median(), sd(), and var() functions.

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
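The standard deviation and variance above are closely related: the variance is the square of the standard deviation. A minimal sketch that checks this, and also recomputes the sample variance by hand, might look like the following (`aq` and `manual_var` are illustrative names):

```r
aq <- datasets::airquality

wind_sd  <- sd(aq$Wind)
wind_var <- var(aq$Wind)

# The variance should equal the standard deviation squared.
all.equal(wind_sd^2, wind_var)

# Sample variance by hand: squared deviations from the mean, divided by n - 1.
n <- length(aq$Wind)
manual_var <- sum((aq$Wind - mean(aq$Wind))^2) / (n - 1)
all.equal(manual_var, wind_var)
```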

Learning how to change months from numbers to names

Change the numbers 5 through 9 to May through September

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
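A shorter alternative, shown here only as a sketch on a fresh copy named `aq` (an illustrative name), is to use the month number as an index into R's built-in `month.name` vector:

```r
aq <- datasets::airquality

# month.name is a built-in character vector: "January", ..., "December".
# Indexing it with the month number returns the matching month name.
aq$Month <- month.name[aq$Month]

unique(aq$Month)
```

This gives the same names as the five assignments above, without repeating a line for each month.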

See how Month has changed to have characters instead of numbers

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

Month is a categorical variable. In R, categorical variables are stored as factors, and their distinct categories are called levels.

Reorder the Months so they do not default to alphabetical

airquality$Month <- factor(airquality$Month, levels = c("May", "June","July", "August", "September"))
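To see why the levels argument matters, this small sketch compares the default level order with the order given explicitly. The vector `months_chr` is an illustrative stand-in for the Month column.

```r
months_chr <- c("May", "June", "July", "August", "September")

# Without `levels`, factor() sorts the unique values alphabetically.
default_order <- levels(factor(months_chr))
default_order
# "August" "July" "June" "May" "September"

# With `levels`, the order is exactly the one supplied.
custom_order <- levels(factor(months_chr, levels = months_chr))
custom_order
# "May" "June" "July" "August" "September"
```

Plots and legends follow the level order, which is why the months appear in calendar order in the plots that follow.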

Plot 1: Create a histogram categorized by Month with qplot

qplot() is short for “quick plot” and is part of the ggplot2 package.

p1 <- qplot(data = airquality, Temp, fill = Month, geom = "histogram", bins = 20)
Warning: `qplot()` was deprecated in ggplot2 3.4.0.
p1
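Since the warning says qplot() is deprecated, here is a sketch of roughly the same histogram written with ggplot() instead. It rebuilds the Month factor on a fresh copy named `aq` so that it runs on its own; `aq` and `p1_gg` are illustrative names.

```r
library(ggplot2)

# Fresh copy with Month as an ordered-by-calendar factor.
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Same mapping as the qplot() call: Temp on x, fill by Month, 20 bins.
p1_gg <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(bins = 20)
p1_gg
```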

Plot 2: Make a histogram using ggplot

ggplot() is more flexible than qplot() but is still part of the ggplot2 package (within the tidyverse). Reorder the legend so that it follows calendar order rather than the default alphabetical order. Outline the bars in white using the color = “white” argument.

Histogram of Temperature by Month

p2 <- airquality %>%
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  xlab("Monthly Temperatures") +
  ylab("Frequency") +
  ggtitle("Histogram of Monthly Temperatures")
p2

Plot 3: Create side-by-side boxplots categorized by Month

Setting fill = Month in the aesthetics fills each boxplot with a different color. scale_fill_discrete sets up the legend on the side for the discrete color values. Use labs() to include the title, axis labels, and a caption for the data source.

Side by Side Boxplots of Temperature by Month

p3 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Monthly Temperatures", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p3
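Each box in a boxplot summarizes a month's first quartile, median, and third quartile. As a rough sketch, the same numbers can be computed with dplyr. This uses a fresh copy named `aq` (an illustrative name), so Month is still numeric here, and the quartiles from quantile() are assumed to match what the plot draws.

```r
library(dplyr)

aq <- datasets::airquality

# One row per month with the three values that define each box.
temp_by_month <- aq %>%
  group_by(Month) %>%
  summarise(
    q1     = quantile(Temp, 0.25),
    median = median(Temp),
    q3     = quantile(Temp, 0.75),
    .groups = "drop"
  )
temp_by_month
```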

Make the same side-by-side boxplots, but in grey-scale

Use the scale_fill_grey() function for the grey-scale legend, and again use fill = Month in the aesthetics.

Side by Side Boxplots in Gray Scale

p4 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Monthly Temperatures", y = "Temperatures",
  title = "Side-by-Side Boxplot of Monthly Temperatures",
  caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

Plot 5: Show monthly wind ranges

p5 <- airquality %>%
  ggplot(aes(Month, Wind, fill = Month)) +
  labs(x = "Monthly Wind Range", y = "Wind",
       title = "Side-by-Side Boxplot of Monthly Wind Range",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p5

The plot above shows a side-by-side monthly comparison of wind ranges. Starting from the boxplot in Plot 3, I changed the line “ggplot(aes(Month, Temp, fill = Month))” to “ggplot(aes(Month, Wind, fill = Month))” so that the plot shows the Wind variable instead of Temp. I also changed the presentation to match this plot: I updated the title and the x and y axis labels so they describe it more accurately. I like this plot because it is very pleasing to look at. The colored boxplots and the legend are easy to identify, and the story the graph is telling is easy to understand. There appear to be some outliers in June, which is important to note and is highlighted by the graph.
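To follow up on the June outliers, here is a minimal sketch that lists the June wind values falling outside the usual boxplot whiskers. It assumes the common rule of 1.5 times the interquartile range beyond the quartiles; `aq`, `wind_june`, and `june_outliers` are illustrative names.

```r
aq <- datasets::airquality

# Wind readings for June (Month is still numeric in the fresh copy).
wind_june <- aq$Wind[aq$Month == 6]

# Quartiles and interquartile range.
q   <- unname(quantile(wind_june, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
june_outliers <- wind_june[wind_june < lower | wind_june > upper]
june_outliers
```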