AirqualityHW

Author

Ashley R



Air Quality Assignment Ashley Ramirez

Load the library tidyverse

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the data into the global environment

data("airquality")

View the data using the “head” function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Calculate Summary Statistics

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4]) 
[1] 77.88235
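Temp has no missing values, so mean() works on it directly. Columns that do contain NA, such as Ozone (row 5 in the head() output), return NA unless missing values are dropped. A minimal sketch of that behavior, working on a separate copy named aq (a hypothetical name, used so the airquality object in this assignment is left untouched):

```r
# Work on a fresh copy of the built-in dataset (aq is an illustrative name)
aq <- datasets::airquality

# Ozone contains missing values, so the plain mean is NA
ozone_mean_raw <- mean(aq$Ozone)

# na.rm = TRUE drops the missing values before averaging
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)

print(ozone_mean_raw)
print(ozone_mean)
```

The same na.rm argument is accepted by median(), sd(), and var().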

Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
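The variance is the square of the standard deviation, which is why 3.523001 squared gives roughly 12.41154. A short check of that relationship, again on a separate copy (aq, wind_sd, and wind_var are illustrative names):

```r
# Fresh copy of the built-in dataset
aq <- datasets::airquality

wind_sd  <- sd(aq$Wind)   # sample standard deviation
wind_var <- var(aq$Wind)  # sample variance

# all.equal() compares with a small numeric tolerance
same <- isTRUE(all.equal(wind_sd^2, wind_var))
print(same)
```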

Rename the months from numbers to names

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
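As an alternative to five separate assignments, the built-in month.name vector can translate month numbers to names in one step. This is only a sketch of a different approach, not the method used in this assignment, and it runs on a separate copy (aq is an illustrative name):

```r
# Fresh copy so the Month column is still numeric (5 through 9)
aq <- datasets::airquality

# month.name is a built-in character vector: "January", ..., "December".
# Indexing it by the month number returns the matching name.
aq$Month <- month.name[aq$Month]

renamed <- unique(aq$Month)
print(renamed)
```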

Now look at the summary statistics of the dataset

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

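Month is a categorical variable. R stores categorical variables as factors, and the distinct categories of a factor are called levels. The code below converts Month to a factor with its levels in calendar order.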

airquality$Month<-factor(airquality$Month, 
                         levels=c("May", "June","July", "August",
                                  "September"))
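Setting levels explicitly matters because factor() otherwise sorts the levels alphabetically, which would put August first in every plot legend and axis. A minimal sketch comparing the two orderings on a separate copy (aq, default_levels, and month_counts are illustrative names):

```r
# Fresh copy with month names as character strings
aq <- datasets::airquality
month_chr <- month.name[aq$Month]

# Without levels=, the levels are sorted alphabetically
default_levels <- levels(factor(month_chr))

# With levels=, the calendar order is kept
aq$Month <- factor(month_chr,
                   levels = c("May", "June", "July", "August", "September"))
ordered_levels <- levels(aq$Month)

# Number of daily observations per month
month_counts <- table(aq$Month)

print(default_levels)
print(ordered_levels)
print(month_counts)
```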

Plot 1

p1 <- airquality %>%
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  xlab("Monthly Temperatures from May - Sept") +
  ylab("Frequency") +
  ggtitle("Histogram of Monthly Temperatures from May - Sept")
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Plot 2

p2 <- airquality %>%
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  xlab("Monthly Temperatures") +
  ylab("Frequency") +
  ggtitle("Histogram of Monthly Temperatures")
p2

Plot 3

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3
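The line inside each box is the monthly median, and the box edges are approximately the first and third quartiles. Those values can be computed directly to read alongside the plot. A sketch in base R on a separate copy (aq, temp_medians, and temp_iqr are illustrative names):

```r
# Fresh copy with Month as an ordered-by-calendar factor
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month],
                   levels = c("May", "June", "July", "August", "September"))

# tapply() applies a function to Temp within each Month level
temp_medians <- tapply(aq$Temp, aq$Month, median)  # line inside each box
temp_iqr     <- tapply(aq$Temp, aq$Month, IQR)     # height of each box

print(temp_medians)
print(temp_iqr)
```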

Plot 4

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

Plot 5

p5 <- airquality %>%
  ggplot(aes(x = Solar.R, y = Temp )) +
  geom_point(aes(color = factor(Month)), size = 2, alpha = 1) +
  scale_color_manual(name = "Month", 
                     values = c("May" = "#1f77b4", "June" = "#ff7f0e", 
                                "July" = "#2ca02c", "August" = "#d62728", 
                                "September" = "#9467bd")) +
  xlab("Solar Radiation") +
  ylab("Temperature") +
  ggtitle("Scatter Plot of Solar Radiation vs. Temperature")
p5
Warning: Removed 7 rows containing missing values or values outside the scale range
(`geom_point()`).
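The warning appears because geom_point() cannot draw a point when either coordinate is missing. Counting the missing values shows where the 7 dropped rows come from. A short check on a separate copy (aq and the other variable names are illustrative):

```r
# Fresh copy of the built-in dataset
aq <- datasets::airquality

# Missing values in each plotted variable
n_missing_solar <- sum(is.na(aq$Solar.R))
n_missing_temp  <- sum(is.na(aq$Temp))

# Rows with both Solar.R and Temp present, i.e. the points actually drawn
n_plotted <- sum(complete.cases(aq[, c("Solar.R", "Temp")]))

print(c(solar = n_missing_solar, temp = n_missing_temp, plotted = n_plotted))
```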

Brief Essay

Describe the plot type you have created

I created a scatter plot that shows the relationship between solar radiation and temperature, with each point colored by month from May through September. The y-axis shows temperature and the x-axis shows solar radiation.

Any insights that the plot shows

The visualization suggests that during the summer months (June, July, and August), higher temperatures tend to go together with higher solar radiation. The graph also suggests that May has cooler temperatures and correspondingly lower radiation, and that both temperature and solar radiation drop again in September.
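One way to check these visual impressions is to compute the correlation between the two variables, overall and within each month. This is only a rough sketch: Pearson correlation captures linear association only, and the names aq, overall_cor, and cor_by_month are illustrative.

```r
# Fresh copy of the built-in dataset (Month is numeric here, 5 through 9)
aq <- datasets::airquality

# use = "complete.obs" drops rows where Solar.R is missing
overall_cor <- cor(aq$Solar.R, aq$Temp, use = "complete.obs")

# Split the data by month and compute the same correlation in each piece
cor_by_month <- sapply(split(aq, aq$Month), function(d) {
  cor(d$Solar.R, d$Temp, use = "complete.obs")
})

print(overall_cor)
print(cor_by_month)
```

Values near 1 would support a strong positive relationship, while values near 0 would indicate that the pattern seen in the plot is weak.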

Describe any special code you used to make this plot

The only special code I used was scale_color_manual() with hex color codes, which assigns a specific color to each month. I got help from GitHub and YouTube, through this video: https://youtu.be/VmOlVFXBsyY?si=WLrQHZ4EV6Ch3z0g