Airquality Assignment

Author

Jorge Pineda

Load Library tidyverse

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the Dataset into the Global Environment

data("airquality")

The data come from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data). They are daily measurements recorded over five months, from May through September 1973.

View Data

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Calculate Summary Statistics

Mean, Median, Standard Deviation, and Variance

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
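
Temp and Wind have no missing values, but the head() output above shows NA entries in Ozone and Solar.R. The sketch below shows how the same summary functions behave on a column with missing values. It works on a fresh copy called aq (a name chosen here for illustration) so that the airquality object used in the rest of the document is left untouched.

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# Ozone contains missing values, so the default mean() returns NA
ozone_mean_default <- mean(aq$Ozone)
ozone_mean_default

# na.rm = TRUE drops the missing values before computing the statistic
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
ozone_sd   <- sd(aq$Ozone, na.rm = TRUE)
ozone_mean
ozone_sd

# Consistency check: the variance is the square of the standard deviation
all.equal(var(aq$Wind), sd(aq$Wind)^2)
```

The same na.rm argument is accepted by median(), sd(), and var().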

Rename the Months from Numbers to Names

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
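
As an alternative sketch, the renaming and the ordering of the months can be combined in a single factor() call. This is not the approach used in this assignment. It is shown on a separate copy called aq (an illustrative name) and relies on the built-in constant month.name.

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# Map month numbers 5..9 to their names and fix the level order in one step.
# month.name is a built-in vector holding the twelve English month names.
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

levels(aq$Month)  # levels appear in calendar order, not alphabetical order
table(aq$Month)   # number of daily observations per month
```

Because the levels are given in calendar order, plots that use Month will show May first and September last.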

Reordering Months

airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))

Plot 1

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source

p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Plot 2

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

p2

Plot 3

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

p3
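
The boxes in a boxplot summarize the quartiles of each group. As a rough check on what the plot shows, the sketch below computes the quartiles of Temp for each month in base R. It uses a separate copy called aq (an illustrative name). The values should correspond closely to the box edges and center lines, assuming ggplot2 uses the default quantile definition.

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Split temperatures by month, then compute the 25th, 50th, and 75th
# percentiles for each group; the result is a 3 x 5 matrix
temp_quartiles <- sapply(split(aq$Temp, aq$Month), quantile,
                         probs = c(0.25, 0.50, 0.75))
temp_quartiles
```

Reading across the 50% row gives the monthly medians, which makes it easy to compare months numerically.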

Plot 4

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))

p4

Plot 5: Does Solar Radiation Affect Wind Speed?

?airquality
p5 <- airquality |>
  ggplot(aes(x = Solar.R, y = Wind)) +
  geom_point(color = "seagreen", alpha = 0.7, size = 2) +
  labs(title = "Relationship Between Solar Radiation and Wind Speed",
       x = "Solar Radiation (Langley)",
       y = "Wind Speed (mph)",
       caption = "New York State Department of Conservation and the National Weather Service")

p5
Warning: Removed 7 rows containing missing values or values outside the scale range
(`geom_point()`).
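
The warning means that seven rows could not be drawn because one of the plotted variables is missing. A minimal sketch, using a separate copy called aq (an illustrative name), confirms the count and builds a subset with complete values for both variables.

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# TRUE for rows where both plotted variables are present
has_both <- complete.cases(aq[, c("Solar.R", "Wind")])

# Number of rows that geom_point() has to drop
n_missing <- sum(!has_both)
n_missing

# Subset containing only the rows that can be plotted
aq_complete <- aq[has_both, ]
nrow(aq_complete)
```

Plotting a subset like aq_complete instead of the full data would avoid the warning without changing the points that appear.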

Brief Essay

This scatter plot shows the relationship between solar radiation and wind speed in the airquality data set. The x-axis displays solar radiation in Langleys, and the y-axis shows wind speed in miles per hour (mph).

Most observations cluster around wind speeds of roughly 10 mph at nearly every level of solar radiation. Wind speeds stay widely spread across the whole range, including both the low (<100 Langley) and high (>250 Langley) ends, with no clear increasing or decreasing trend. This suggests that, in this data set, solar radiation and wind speed are not strongly correlated.

One might expect days with lower solar radiation (presumably cloudier days) to show different wind speeds, but the data here neither clearly support nor refute that idea. Although it seems intuitive that changes in sunlight could influence wind, given the role sunlight plays in climate and weather dynamics, these data show no meaningful relationship.

The plot is a scatter plot, which is useful for visualizing the relationship between two continuous variables. It shows whether there is any correlation or pattern between solar radiation and wind speed in the data set.

The main insight from this plot is that there appears to be little to no relationship between the two variables. The spread of wind speeds stays mostly consistent across the full range of solar radiation values, so changes in sunlight do not clearly affect wind speed within this data set.
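
The visual impression can be checked numerically. The sketch below is an addition, not part of the original analysis. It computes the Pearson correlation between the two variables on a separate copy called aq (an illustrative name), skipping the rows with missing solar radiation. A coefficient near zero would be consistent with the reading of the plot above.

```r
# Fresh copy of the built-in data; 'aq' is an illustrative name
aq <- datasets::airquality

# Pearson correlation using only rows where both values are present
r <- cor(aq$Solar.R, aq$Wind, use = "complete.obs")
r

# cor.test() also drops incomplete pairs and reports a p-value for
# the null hypothesis that the true correlation is zero
test_result <- cor.test(aq$Solar.R, aq$Wind)
test_result$p.value
```

Pearson correlation only measures linear association, so it complements the scatter plot rather than replacing it.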

For the code, I ran ?airquality to confirm the units of each variable. I used geom_point() to create the scatter plot and labs() to add custom axis labels, a title, and a caption crediting the source of the data.