The first step is always to load the tools and data you are working with.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data(airquality)
The dataset comes from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data). It contains daily readings for the five months from May to September 1973.
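In the raw dataset, Month is stored as the integers 5 through 9, while the plots in this deck label the months by name. One way to make that mapping explicit is sketched below. The copy `aq` and the column `MonthName` are hypothetical names used only for illustration, and the sketch works on a copy so it does not alter the `airquality` object used elsewhere.

```r
# Work on a copy so the original airquality data frame is left untouched.
aq <- datasets::airquality

# Month is stored as integers 5..9; map them to named factor levels in calendar order.
month_names <- c("May", "June", "July", "August", "September")
aq$MonthName <- factor(aq$Month, levels = 5:9, labels = month_names)

# Count the daily readings recorded for each month.
table(aq$MonthName)
```

A factor with explicit levels keeps the months in calendar order in legends and on axes, rather than alphabetical order.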
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +  # factor() makes Month discrete so each month gets its own fill
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  # provide the data source
NINTH SLIDE
Plot 1 Output
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
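The message above means ggplot2 fell back to 30 bins because no bin width was given. A rough way to judge a candidate width is to divide the temperature range by it, as in the sketch below. The width of 5 degrees is only an illustrative choice, and the count is approximate because ggplot2 aligns bin boundaries itself.

```r
# Temperature readings from the built-in dataset.
temps <- datasets::airquality$Temp

# Candidate bin width in degrees Fahrenheit (an illustrative assumption).
bin_width <- 5

# Span of the data divided by the width gives the approximate number of bins.
temp_span <- diff(range(temps))
approx_bins <- ceiling(temp_span / bin_width)
approx_bins
```

Fewer, wider bins usually make overlapping groups easier to compare than the 30-bin default.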
TENTH SLIDE
Plot 2: Improve the histogram
The improvements are semi-transparent bars (alpha), a wider bin width (binwidth), and a white border around each bar (color).
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +  # factor() makes Month discrete so each month gets its own fill
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
ELEVENTH SLIDE
Plot 2 Output
p2
With semi-transparent bars, wider bins, and white borders, the overlapping months are much easier to tell apart.
TWELFTH SLIDE
Plot 3: Side by Side Box Plot Organized by Month
p3 <- airquality |>
  ggplot(aes(factor(Month), Temp, fill = factor(Month))) +  # factor() gives one box per month
  labs(x = "Months from May through September",
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
THIRTEENTH SLIDE
Plot 3 Output
p3
The boxplot shows each month's outliers clearly as individual points.
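The outliers drawn as points can also be listed numerically. The sketch below uses base R's `boxplot.stats()`, which applies the same 1.5 × IQR rule but computes its hinges slightly differently from ggplot2, so the two may occasionally disagree on a borderline point. The name `outliers_by_month` is hypothetical.

```r
# Built-in dataset; Month is coded 5 (May) through 9 (September).
aq <- datasets::airquality

# For each month, keep the temperatures that fall beyond the boxplot whiskers.
outliers_by_month <- tapply(
  aq$Temp,
  aq$Month,
  function(x) boxplot.stats(x)$out
)

outliers_by_month
```

Months with no outliers appear with an empty numeric vector.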
FOURTEENTH SLIDE
Plot 4: Side by Side Boxplot in Greyscale
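p4 <- airquality |>
  ggplot(aes(factor(Month), Temp, fill = factor(Month))) +  # factor() gives one box per month
  labs(x = "Months from May through September",  # the x-axis shows months, not temperatures
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
The only change from Plot 3 is scale_fill_grey(), which replaces the default color palette with shades of grey.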
FIFTEENTH SLIDE
Plot 4 Output
p4
SIXTEENTH SLIDE
Plot 5: A Scatterplot Connecting Ozone Concentration and Solar Radiation
clean_data <- na.omit(airquality)  # drop rows with missing values
p5 <- clean_data |>
  ggplot(aes(Ozone, Solar.R)) +
  geom_point(color = "red", alpha = 0.6) +
  labs(x = "Ozone Concentration (Parts per Billion)",
       y = "Solar Radiation",
       title = "Scatterplot Connecting Ozone Concentration with Solar Radiation",
       caption = "New York State Department of Conservation and the National Weather Service") +
  theme_minimal() +
  theme(plot.title = element_text(size = 15, face = "bold"),
        axis.title = element_text(size = 13, face = "bold"),
        axis.text = element_text(size = 9),
        plot.caption = element_text(size = 7))
SEVENTEENTH SLIDE
Plot 5 Output
p5
EIGHTEENTH SLIDE
Plot 5 Analysis
Plot 5 is a scatter plot illustrating the relationship between ozone concentration and solar radiation. The plot suggests a direct (positive) relationship between the two variables: higher ozone concentrations tend to occur alongside higher solar radiation levels. However, the data points are widely scattered, indicating that other factors also influence ozone levels.
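The visual impression can be checked numerically. The sketch below computes the Pearson correlation on the same complete-case data used for the plot. Pearson is an assumption here, since the deck does not name a measure, and `ozone_solar_cor` is a hypothetical name.

```r
# Same cleaning step as the plot: drop every row that has a missing value.
clean_data <- na.omit(datasets::airquality)

# Pearson correlation between ozone concentration and solar radiation.
ozone_solar_cor <- cor(clean_data$Ozone, clean_data$Solar.R)
ozone_solar_cor
```

A value above zero indicates a direct relationship, and a value well below 1 matches the wide scatter seen in the plot.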
The plot was created using the ggplot2 package in R, with the na.omit() function used to remove rows with missing data. The geom_point() layer added the scatter plot points, and the theme_minimal() function applied a minimalist theme. The plot's labels and aesthetics were customized using the labs() and theme() functions, which set the font sizes and made the title and axis titles bold.