Airquality HW

Author

T Konara

Load the library

library(tidyverse)

Load the dataset into your global environment

data("airquality")

Temp Chunk

plotX <- airquality |>
  ggplot(aes(x=Ozone , y= Temp))+
  geom_point()
plotX
Warning: Removed 37 rows containing missing values or values outside the scale range
(`geom_point()`).
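
The warning above comes from missing values in the plotted columns. As a minimal sketch, the check below counts the NA values per column and the rows that cannot be drawn; it works on a separate copy named aq (a name introduced here only for illustration) so the airquality object used in the rest of the report is left alone.

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# Number of missing values in each column
na_counts <- colSums(is.na(aq))
na_counts

# Rows geom_point() has to drop: Ozone or Temp is missing
n_dropped <- sum(is.na(aq$Ozone) | is.na(aq$Temp))
n_dropped
```

The count of dropped rows should match the number reported in the warning.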

View the data using the “head” function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Calculate summary statistics

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
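
Both calls return the same value because Temp is the fourth column of the data frame. A short sketch that confirms this, again on an illustrative copy named aq:

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# Name of the fourth column
fourth_name <- names(aq)[4]
fourth_name

# Selecting by position and by name gives the same vector
same_by_index <- identical(aq[, 4], aq$Temp)
same_by_index
```

Selecting by name is usually the safer habit, since it keeps working if the column order changes.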

Calculate median, standard deviation, and variance

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
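
The standard deviation is the square root of the variance, so the two Wind results above can be checked against each other. The sketch below does that on an illustrative copy named aq:

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

wind_sd  <- sd(aq$Wind)
wind_var <- var(aq$Wind)

# all.equal() compares with a small numeric tolerance
sd_matches_var <- isTRUE(all.equal(wind_sd^2, wind_var))
sd_matches_var
```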

Rename the months from numbers to names

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
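
As an alternative sketch (not the approach used in this report), the same renaming can be done in one step with base R's built-in month.name vector, indexing it by the month number. It is shown on an illustrative copy named aq so the report's airquality object is not changed twice.

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# month.name[5] is "May", month.name[9] is "September"
aq$Month <- month.name[aq$Month]

# Number of daily records per month
month_counts <- table(aq$Month)
month_counts
```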

Now look at the summary statistics of the data set

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

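Month is a categorical variable. R stores categorical variables as factors, whose distinct values are called levels. Setting the levels explicitly keeps the months in calendar order.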

airquality$Month<-factor(airquality$Month, levels=c("May","June","July","August","September"))
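
To see why the levels argument matters, the sketch below compares the default level order with the explicit one. The vector months_chr is a small stand-in introduced here for illustration, not part of the report's data.

```r
# Illustrative character vector of month names
months_chr <- c("May", "June", "July", "August", "September")

# Without 'levels', factor() sorts the levels alphabetically
default_fct <- factor(months_chr)
levels(default_fct)

# With 'levels', the order given is kept, here calendar order
ordered_fct <- factor(months_chr, levels = months_chr)
levels(ordered_fct)
```

ggplot2 draws legend entries and discrete axes in level order, so the explicit levels are what keep the plots below in calendar order.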

Plot 1: Create a histogram categorized by Month

P1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position= "identity")+
  scale_fill_discrete(name= "Month", labels = c("May","June","July","August","September"))+
  labs(x = "Monthly Temperatures from May - Sept", y= "Frequency of Temps", title = "Histogram of Monthly Temperatures from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service")

Plot 1 output

P1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Plot 2: Improve the histogram of Average Temperature by Month

p2 <-airquality |>
  ggplot(aes(x= Temp, fill=Month))+
  geom_histogram(position= "identity", alpha=0.5, binwidth=5, color= "white")+
  scale_fill_discrete(name= "Month", labels = c("May","June","July","August","September"))+
  labs(x = "Monthly Temperatures from May - Sept", y= "Frequency of Temps", title = "Histogram of Monthly Temperatures from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service")

Plot 2 output

p2

Plot 3: Create side-by-side boxplots categorized by Month

p3 <-airquality |>
  ggplot(aes(Month,Temp, fill = Month))+
  labs(x= "Months from May through September", y= "Temperatures", caption = "New York State Department of Conservation and the National Weather Service")+
  geom_boxplot()+
  scale_fill_discrete(name = "Month",labels = c("May","June","July","August","September"))

Plot 3 output

p3

Plot 4: Side-by-side boxplots in gray scale

p4 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))

Plot 4 output

p4

Plot 5: Create a scatter plot categorized by Month

p5 <- airquality |>
  ggplot(aes(x= Temp, y = Month))+
  labs(x="Monthly Temperatures from May - Sept", y= "Months", title= "Scatter plot of Monthly Temperatures",caption= "New York State Department of Conservation and the National Weather Service")+
  geom_point()

Plot 5 output

p5

Essay

For Plot 5, I created a scatter plot of the temperatures recorded from May through September. I think this type of plot is a good choice because it shows how temperatures vary within each month and makes the months easy to compare side by side.

From this plot, I noticed that temperatures were lowest in May and rose through June and July, with the highest temperatures appearing in July and August. In September, temperatures began to drop again as the season shifted toward fall. This matches the expected pattern of summer weather.

For the code, I used geom_point() to create the scatter plot. At first, I set it up with ggplot(aes(x = Month, y = Temp, fill = Month)). However, this stacked all the data points for each month in a single vertical line, which looked messy and made the plot harder to read. To fix this, I switched the axes so that x = Temp and y = Month. This spreads the temperature values along the x-axis and places the months on the y-axis, which I found cleaner and easier to read.
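
Another common way to deal with points stacking on a categorical axis is to add a small random horizontal offset with geom_jitter(). The sketch below is an alternative to the axis swap described above, not the code used for Plot 5. The names aq and p5_jitter, the jitter width, and the transparency value are illustrative choices.

```r
library(ggplot2)

# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# Month names as a factor in calendar order (May through September)
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Jitter only along the Month axis so the Temp values stay exact
p5_jitter <- ggplot(aq, aes(x = Month, y = Temp, color = Month)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.7) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Jittered scatter plot of Monthly Temperatures")

p5_jitter
```

Because the offsets are random, the exact horizontal positions change between runs unless a seed is set with set.seed().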