── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load the data in the global environment
This data set compiles daily air quality measurements taken in New York from May to September 1973, sourced from the New York State Department of Conservation and the National Weather Service.
data("airquality")
The head() function displays the first six rows of the data set.
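As a minimal sketch of that step, the call below prints the first six rows and checks how Month is stored. The names first_rows, month_labels and month_factor are illustrative and not part of the original analysis; the conversion to a labelled factor is only an assumption about how one might prepare Month for a discrete color scale.

```r
# Load the built-in data set and show its first six rows
data("airquality")
first_rows <- head(airquality)   # head() returns 6 rows by default
print(first_rows)

# Month is stored as a number (5 to 9) rather than as a category
print(class(airquality$Month))

# Illustrative only: a labelled factor version of Month, kept in a separate
# vector so the original data frame is left untouched
month_labels <- month.name[5:9]  # "May" ... "September"
month_factor <- factor(airquality$Month, levels = 5:9, labels = month_labels)
print(table(month_factor))
```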
This code creates our first plot, a histogram of temperature with the bars colored by month.
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +  # factor() ensures Month is treated as a discrete variable
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(
    x = "Monthly Temperatures from May - Sept",
    y = "Frequency of Temps",
    title = "Histogram of Monthly Temperatures from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"  # provide the data source
  )
Output for plot 1.
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This code creates our second plot, which improves on the first histogram by making the bars semi-transparent (alpha = 0.5), setting the bin width to 5, and outlining each bar in white.
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +  # factor() ensures Month is treated as a discrete variable
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(
    x = "Monthly Temperatures from May - Sept",
    y = "Frequency of Temps",
    title = "Histogram of Monthly Temperatures from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
Output for plot 2.
p2
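To get a feel for what a bin width of 5 means for these temperatures, the sketch below counts observations in 5-degree intervals using base R. The break points (55 to 100) are an assumption chosen to cover the observed range; ggplot2 positions its bins by its own default rule, so the counts in the plot may not match these exactly.

```r
data("airquality")

# Observed temperature range (degrees Fahrenheit)
temp_range <- range(airquality$Temp)
print(temp_range)

# Assumed break points: 5-degree intervals wide enough to cover the range
breaks <- seq(55, 100, by = 5)
temp_bins <- cut(airquality$Temp, breaks = breaks)

# Counts per interval, split by month (columns 5 to 9)
bin_counts <- table(temp_bins, airquality$Month)
print(bin_counts)
```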
This code creates our third plot: side-by-side boxplots of temperature, one box per month.
p3 <- airquality |>
  ggplot(aes(factor(Month), Temp, fill = factor(Month))) +  # factor() gives one box per month
  labs(
    x = "Months from May through September",
    y = "Temperatures",
    title = "Side-by-Side Boxplot of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Output for plot 3.
p3
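The boxes in plot 3 are drawn from per-month quartiles. A rough way to inspect those numbers directly, using base R rather than ggplot2, is sketched below; temp_by_month and temp_summary are illustrative names, not part of the original analysis.

```r
data("airquality")

# Split temperatures by month (5 = May, ..., 9 = September)
temp_by_month <- split(airquality$Temp, airquality$Month)

# Quartiles per month: rows are the 25%, 50% and 75% quantiles,
# columns are the months
temp_summary <- sapply(temp_by_month, quantile, probs = c(0.25, 0.5, 0.75))
print(temp_summary)
```

The middle row holds the medians, which correspond to the horizontal line inside each box.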
This code creates our fourth plot: the same side-by-side boxplots, filled in grayscale.
p4 <- airquality |>
  ggplot(aes(factor(Month), Temp, fill = factor(Month))) +  # factor() gives one box per month
  labs(
    x = "Months from May through September",  # the x-axis shows months, not temperatures
    y = "Temperatures",
    title = "Side-by-Side Boxplot of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
Output for plot 4.
p4
This code creates our fifth plot, a scatter plot of temperature against solar radiation.
p5 <- airquality |>
  ggplot(aes(Solar.R, Temp)) +
  geom_point(color = "green", alpha = 0.9) +
  labs(
    x = "Solar Radiation",
    y = "Temperatures",
    title = "Scatterplot Comparing How Solar Radiation Affects Temperature",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
Output for plot 5.
p5
Warning: Removed 7 rows containing missing values or values outside the scale range
(`geom_point()`).
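The warning comes from rows where Solar.R is missing. One possible way to make that explicit, sketched here with base R, is to count the missing values and drop them before plotting; airquality_complete is a hypothetical name introduced for illustration.

```r
data("airquality")

# Count missing values in each of the two plotted columns
n_missing_solar <- sum(is.na(airquality$Solar.R))
n_missing_temp  <- sum(is.na(airquality$Temp))
print(c(Solar.R = n_missing_solar, Temp = n_missing_temp))

# Keep only the rows where solar radiation was recorded
airquality_complete <- subset(airquality, !is.na(Solar.R))
print(nrow(airquality_complete))
```

Passing airquality_complete to ggplot() instead of airquality should draw the same points without the warning.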
For plot 5, I created a scatter plot that visualizes the relationship between solar radiation and temperature. The plot does not show a strong relationship between the two variables; however, temperatures tend to be higher when solar radiation is higher. Using the geom_point() function, I was able to draw the points and set their color (green) and transparency (alpha = 0.9).
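A correlation coefficient gives a rough numeric check on that visual impression. The sketch below uses Pearson's correlation, which is an assumption about how to quantify the relationship; it only measures linear association, and solar_temp_cor is an illustrative name.

```r
data("airquality")

# Pearson correlation between solar radiation and temperature,
# using only the rows where both values are present
solar_temp_cor <- cor(airquality$Solar.R, airquality$Temp, use = "complete.obs")
print(round(solar_temp_cor, 2))
```

A value close to 0 would support the reading that the relationship is not strong, while a positive sign would match the general upward trend in the plot.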