The head() function displays only the first 6 rows of the data set. Notice in the Global Environment pane to the right that there are 153 observations (rows).
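As a quick check of the two numbers mentioned above, here is a minimal sketch using only base R. The variable names (first_rows, n_obs, first_ten) are made up for illustration and are not part of the original assignment.

```r
# airquality ships with base R (datasets package), so nothing needs installing
data("airquality")

first_rows <- head(airquality)          # head() returns 6 rows by default
n_obs      <- nrow(airquality)          # total number of observations (rows)
first_ten  <- head(airquality, n = 10)  # ask for a different number of rows

print(first_rows)
print(n_obs)
```

Calling head() with the n argument is handy when six rows are not enough to get a feel for the data.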
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") # provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
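The message above is ggplot2 saying it fell back to 30 bins because no bin width was given. To get a feel for what a 5-degree bin width would mean for this data, here is a rough base R sketch. The 5-degree width and the names temps, bin_width, breaks, and counts are assumptions for illustration, and ggplot2 may align its bin edges differently, so the counts are only an approximation of what geom_histogram() draws.

```r
temps <- airquality$Temp               # daily temperatures in degrees Fahrenheit

bin_width <- 5                         # assumed choice: 5-degree bins
temp_span <- diff(range(temps))        # distance between coldest and hottest day

# Build bin edges that start and end on multiples of the bin width
breaks <- seq(floor(min(temps) / bin_width) * bin_width,
              ceiling(max(temps) / bin_width) * bin_width,
              by = bin_width)

# Count how many days fall into each interval [lower, upper)
counts <- table(cut(temps, breaks = breaks, right = FALSE))
print(counts)
```

If the table shows many nearly empty intervals, the bin width is probably too small; if almost everything lands in two or three intervals, it is probably too large.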
Plot 2: Improve the histogram of Average Temperature by Month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(x = factor(Month), y = Temp, fill = factor(Month))) +
  labs(x = "Months from May through September",
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
print(p3)
Plot 4: Side-by-Side Boxplots in Gray Scale
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", # the x-axis shows months, not temperatures
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
p4
Plot 5: Scatterplot of Ozone vs. Solar Radiation
p5 <- airquality |>
  ggplot(aes(x = Solar.R, y = Ozone)) +
  geom_point(aes(color = Month), alpha = 0.7) +
  labs(x = "Solar Radiation (langley)",
       y = "Ozone Concentration (ppb)",
       title = "Scatterplot of Ozone vs. Solar Radiation",
       caption = "Data source: New York State Department of Conservation and the National Weather Service") +
  scale_fill_grey(name = "Month")
p5
Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).
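The warning above comes from missing values: a point can only be drawn when both Ozone and Solar.R are present. Here is a small base R sketch that counts those rows directly. The variable names are made up for illustration.

```r
# Count missing values in each of the two plotted columns
ozone_missing <- sum(is.na(airquality$Ozone))
solar_missing <- sum(is.na(airquality$Solar.R))

# A row is dropped from the scatterplot if either value is missing
either_missing <- sum(is.na(airquality$Ozone) | is.na(airquality$Solar.R))

# Rows that can actually be plotted
plottable <- airquality[!is.na(airquality$Ozone) & !is.na(airquality$Solar.R), ]

print(c(ozone = ozone_missing, solar = solar_missing, either = either_missing))
print(nrow(plottable))
```

The count of rows missing either value should match the number of rows reported in the warning, which confirms that nothing else is being dropped.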
brief essay here
I decided to create a scatter plot to show the correlation between solar radiation and ozone concentration. This scatter plot gives a visual summary of the correlation and shows how it differs by month. One insight that this plot provides is the seasonal variation in the relationship between solar radiation and ozone concentration: in areas of the plot with more distinct patterns and colors, we can clearly see how this relationship behaves and changes at different times of the year. This can allow us to predict and prepare for certain kinds of weather, such as heat waves. One noticeable feature of this plot is that I finally learned how to use the themes. I'm still not sure if I did it correctly, but since it seems to be running properly, I'm not going to touch it, since I messed with it for about 2 hours. I picked this theme specifically because of the variation of blue in it: it's not so overbearing that your eyes are overloaded with colors, nor are the shades so close together that it's difficult to differentiate data points.
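To put a number on the relationship described above, one option is to compute a correlation overall and then separately for each month. This is only a sketch: using the Pearson correlation (the default of cor()) is my assumption, not something the plot itself does, and the names complete_rows, overall_cor, and cor_by_month are made up for illustration.

```r
# Keep only rows where both plotted variables are present
complete_rows <- airquality[complete.cases(airquality[, c("Ozone", "Solar.R")]), ]

# Correlation across the whole May - September period
overall_cor <- cor(complete_rows$Solar.R, complete_rows$Ozone)

# Correlation within each month; split() groups rows by Month
# whether Month is stored as a number or as a factor
cor_by_month <- sapply(split(complete_rows, complete_rows$Month),
                       function(d) cor(d$Solar.R, d$Ozone))

print(round(overall_cor, 2))
print(round(cor_by_month, 2))
```

Some months have far fewer complete observations than others, so the monthly values should be read as rough indications rather than firm estimates.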
Load the tidyverse library
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ lubridate 1.9.3 ✔ tibble 3.2.1
✔ purrr 1.0.2 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors