airquality hw

Load the tidyverse library

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

##Load the dataset into your global environment

data("airquality")

##View the data using the “head” function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

##Calculate Summary Statistics

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235

##Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
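
The `head()` output above shows `NA` values in the Ozone and Solar.R columns, and the summary functions used here return `NA` when a column contains missing values. A minimal sketch of how those columns could be handled, assuming a fresh copy of the built-in dataset (the names `aq`, `na_counts`, `ozone_mean`, and `ozone_median` are illustrative and not part of the assignment):

```r
# Work on a fresh copy of the built-in dataset; `datasets` ships with base R.
aq <- datasets::airquality

# Count missing values per column to see which variables are affected.
na_counts <- colSums(is.na(aq))
print(na_counts)

# Without na.rm, a single NA makes the whole result NA.
print(mean(aq$Ozone))

# With na.rm = TRUE the missing values are dropped before the calculation.
ozone_mean   <- mean(aq$Ozone, na.rm = TRUE)
ozone_median <- median(aq$Ozone, na.rm = TRUE)
print(c(mean = ozone_mean, median = ozone_median))
```

Temp and Wind have no missing values, which is why the calls above work without `na.rm`.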

##Rename the Months from numbers to names

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"

##Now look at the summary statistics of the dataset

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

##Month is a categorical variable, so store it as a factor with its levels in calendar order.

airquality$Month<-factor(airquality$Month, levels=c("May", "June","July", "August", "September"))
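
As an alternative sketch, the renaming and the factor conversion can be done in one step by passing both `levels` and `labels` to `factor()`. This assumes a fresh copy of the dataset in which Month is still numeric; `aq` and `month_names` are illustrative names only.

```r
# Fresh copy of the built-in dataset; Month is still the numeric codes 5 to 9.
aq <- datasets::airquality

# Map each numeric code to its name and fix the calendar order of the levels.
month_names <- c("May", "June", "July", "August", "September")
aq$Month <- factor(aq$Month, levels = 5:9, labels = month_names)

# Levels come out in calendar order rather than alphabetical order.
print(levels(aq$Month))

# summary() on a factor reports a count per level instead of Length/Class/Mode.
print(summary(aq$Month))
```

Setting the order explicitly matters for the plots: without it, the months would be ordered alphabetically in the legend and on the axis.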

##Plot 1: Create a histogram categorized by Month

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

##Plot 2: Improve the histogram using ggplot

##Histogram of Temperature by Month

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2

##Plot 3: Create side-by-side boxplots categorized by Month

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3 
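
Each box in `geom_boxplot()` is drawn from the first quartile, the median, and the third quartile of the group. To read those values as numbers rather than off the plot, a short sketch along these lines could be used (it rebuilds the Month factor on a fresh copy; `aq` and `temp_quartiles` are illustrative names):

```r
# Fresh copy with Month converted to an ordered-by-calendar factor.
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# Split Temp by month, then compute the quartiles for each group.
# The result is a matrix with one column per month and one row per quartile.
temp_quartiles <- sapply(split(aq$Temp, aq$Month),
                         quantile, probs = c(0.25, 0.5, 0.75))
print(temp_quartiles)
```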

##Plot 4: Make the same side-by-side boxplots, but in grayscale

##Side-by-Side Boxplots in Grayscale

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

##Plot 5: Improve the histogram using ggplot

Outline the bars in white using the color = "white" argument

Use alpha to add some transparency (values between 0 and 1)

Change the binwidth

##Histogram of Wind Speed by Month

This plot shows the distribution of wind speeds recorded from May through September 1973. Each bar in the histogram represents the number of days whose wind speed fell within a given range, and the fill color identifies the month. The plot gives insight into the monthly variation in wind speed: for example, you can look for patterns or trends across these months. Because the bars are semi-transparent, overlapping data stays visible, which shows where different months had similar wind speeds.

In the code, “ggplot” sets up the basic graph and defines my chosen aesthetics: Wind is plotted on the x-axis and the bars are filled by month. I then used “geom_histogram” to add the customized histogram. Setting “alpha = 0.5” makes the bars partly transparent, “binwidth = 5” sets the width of each bin, and “color = "white"” outlines the bars in white. These are small details, but I feel they add a lot to the graph. Lastly, the main difference from the earlier histograms is the variable being plotted: this plot uses Wind instead of Temp, so clear variable names were crucial to the readability of my code.

p5 <- airquality |>
  ggplot(aes(x=Wind, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Wind Speeds from May - Sept", 
       y = "Frequency of Wind Speeds",
       title = "Histogram of Monthly Wind Speeds from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p5
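
To back up the visual impression of monthly variation with numbers, the wind speeds can also be summarised per month. The following is only a sketch in base R, working on a fresh copy of the dataset; `aq` and `wind_summary` are illustrative names, and the choice of mean, standard deviation, and maximum is an assumption about which statistics are useful here.

```r
# Fresh copy with Month converted to a factor in calendar order.
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# For each month, compute the mean, standard deviation, and maximum wind speed.
# sapply() returns one column per month, so transpose to get one row per month.
wind_summary <- t(sapply(split(aq$Wind, aq$Month), function(w) {
  c(mean = mean(w), sd = sd(w), max = max(w))
}))
print(round(wind_summary, 2))
```

Months with similar means and spreads are the ones whose bars overlap most in the histogram.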