p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") # provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
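This plot gives an overview of the temperature distribution in the dataset. However, it has limitations that make it less useful: because the bars are drawn with position = "identity" at full opacity, bars from different months are stacked directly on top of each other and hide one another.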
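To see why the opaque bars hide each other, a minimal base R sketch can count how many histogram bins contain temperatures from more than one month. It assumes 30 equal-width bins produced by `cut()`, which is similar in spirit to ggplot2's default `bins = 30` but will not reproduce ggplot2's exact bin edges; the names `temps`, `months`, `counts`, and `shared_bins` are illustrative.

```r
# Daily temperatures and a labeled month factor (built here from the
# original integer Month column 5-9 of the built-in airquality data)
temps  <- airquality$Temp
months <- factor(airquality$Month, levels = 5:9,
                 labels = c("May", "June", "July", "August", "September"))

# 30 equal-width bins; cut() widens the outer limits slightly so that
# the minimum and maximum temperatures both fall inside a bin
bins <- cut(temps, breaks = 30)

# Rows = bins, columns = months, cells = number of days
counts <- table(bins, months)

# A bin is "shared" when at least two months have days in it; with
# position = "identity" those bars are drawn on top of each other
months_per_bin <- rowSums(counts > 0)
shared_bins    <- sum(months_per_bin >= 2)
shared_bins
```

Any bin counted in `shared_bins` is a spot where one month's bar can cover another's in the first plot.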
Plot 2: Improve the histogram of Average Temperature by Month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
Adding transparency (alpha = 0.5), wider bins (binwidth = 5), and white bar outlines improved the readability of the plot: overlapping bars stay visible, so the colors show more clearly how the temperature distribution shifts from one month to the next.
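The `stat_bin()` message from the first plot and the `binwidth = 5` fix are two sides of the same arithmetic: a fixed number of bins determines the width, while a fixed width determines the number of bins. A minimal base R sketch, assuming bin edges on multiples of 5 (ggplot2 positions its edges with its own `boundary`/`center` rules, so its bins may be aligned differently); `default_width`, `breaks_5`, and `n_bins_5` are illustrative names.

```r
temps      <- airquality$Temp
temp_range <- diff(range(temps))   # max - min, in degrees Fahrenheit

# With 30 bins spread over the range, each bin is only this wide
default_width <- temp_range / 30

# With a fixed width of 5 degrees, the number of bins follows from the range;
# edges are snapped to multiples of 5 so every temperature is covered
breaks_5 <- seq(floor(min(temps) / 5) * 5, ceiling(max(temps) / 5) * 5, by = 5)
n_bins_5 <- length(breaks_5) - 1

# Day counts per 5-degree bin (no plot drawn)
counts_5 <- hist(temps, breaks = breaks_5, plot = FALSE)$counts

c(default_width = default_width, n_bins_5 = n_bins_5)
```

Fewer, wider bins mean each month contributes taller, smoother bars, which is part of why the second histogram is easier to read.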
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September",
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p3
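Each box summarizes one month's temperatures. A rough numeric counterpart is Tukey's five-number summary per month, sketched here in base R with `fivenum()`. This is an approximation of what the plot draws: `geom_boxplot()` builds the box from quartiles and stops the whiskers at the most extreme point within 1.5 × IQR, so its whisker ends can differ from the min/max below when a month has outliers. `months` and `summary_by_month` are illustrative names.

```r
months <- factor(airquality$Month, levels = 5:9,
                 labels = c("May", "June", "July", "August", "September"))

# fivenum() returns min, lower hinge, median, upper hinge, max;
# sapply() over the per-month split gives one column per month
summary_by_month <- sapply(split(airquality$Temp, months), fivenum)
rownames(summary_by_month) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

summary_by_month
```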
Plot 4: Side-by-side Boxplots in Gray Scale
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September",
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month",
                  labels = c("May", "June", "July", "August", "September"))
p4
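The gray-scale version still separates the months because each one gets a different shade. A minimal sketch of the idea, assuming that `scale_fill_grey()` uses its default `start = 0.2` and `end = 0.8` and that these map onto evenly spaced levels from base R's `grey.colors()`; treat the exact hex values as an approximation rather than what ggplot2 is guaranteed to produce.

```r
# Five gray levels from dark (0.2) to light (0.8), one per month;
# start/end values assumed to match scale_fill_grey()'s defaults
greys <- grey.colors(5, start = 0.2, end = 0.8)
names(greys) <- c("May", "June", "July", "August", "September")

greys
```

Because the shades run from dark to light in month order, the fill also hints at the progression from May to September, which can help when the plot is printed without color.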
Plot 5: Scatter Plot of Solar Radiation vs. Ozone Levels
# Keep only the rows where both Solar.R and Ozone are recorded
airquality <- airquality |>
  filter(!is.na(Solar.R) & !is.na(Ozone))

p5 <- airquality |>
  ggplot(aes(x = Solar.R, y = Ozone)) +
  geom_point(aes(color = factor(Month)), size = 3, alpha = 0.8) +
  labs(x = "Solar Radiation (Langleys)",
       y = "Ozone Concentration (ppb)",
       color = "Month",
       title = "Scatter Plot of Solar Radiation vs. Ozone Levels",
       caption = "New York State Department of Conservation and the National Weather Service") +
  scale_color_manual(values = c("red", "orange", "yellow", "green", "blue"),
                     labels = c("May", "June", "July", "August", "September"))
p5
Brief essay on my scatter plot
Plot 5 is a scatter plot of the relationship between solar radiation and ozone concentration from May to September. Higher solar radiation is associated with higher ozone levels, especially in June, July, and August, while ozone levels are lower in May and September. This suggests that ozone levels may also be affected by the change of seasons.
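The association described above can be put into a single number with Pearson's correlation coefficient. A minimal base R sketch, assuming we use only the days where both variables are recorded; `complete` and `r` are illustrative names. A positive `r` supports "higher radiation goes with higher ozone", but it measures association only, not cause.

```r
# Days with both Solar.R and Ozone present (starting from the full
# built-in airquality data)
complete <- airquality[!is.na(airquality$Solar.R) & !is.na(airquality$Ozone), ]

# Pearson correlation: +1 = perfect increasing line, 0 = no linear relation
r <- cor(complete$Solar.R, complete$Ozone)
r
```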
Before running the code for my plot, I used filter(!is.na(Solar.R) & !is.na(Ozone)) to remove the rows with missing values in my two chosen variables, Solar.R and Ozone.
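To see how much data the `filter()` step drops, a short base R sketch can count the missing values in each variable and the rows that survive the combined condition. The logical vector `keep` mirrors the `!is.na(Solar.R) & !is.na(Ozone)` condition; the other names are illustrative.

```r
n_total  <- nrow(airquality)
n_na_oz  <- sum(is.na(airquality$Ozone))     # days with no ozone reading
n_na_sol <- sum(is.na(airquality$Solar.R))   # days with no radiation reading

# Same condition as the filter() call: both values must be present
keep   <- !is.na(airquality$Solar.R) & !is.na(airquality$Ozone)
n_kept <- sum(keep)

c(total = n_total, missing_ozone = n_na_oz,
  missing_solar = n_na_sol, kept = n_kept)
```

The number of dropped rows can be smaller than the two missing counts added together, because a day missing both values is removed only once.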
Overall, I think the scatter plot shows the influence solar radiation can have on ozone levels. It also suggests a seasonal factor, since radiation levels change as the seasons change.
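One simple way to check the seasonal idea is to compare monthly averages of both variables. A minimal base R sketch, assuming the complete-case rows (both Solar.R and Ozone present) and the integer Month codes 5 to 9 from the original data; `complete` and `monthly` are illustrative names.

```r
# Complete cases for the two variables of interest
complete <- airquality[complete.cases(airquality[, c("Solar.R", "Ozone")]), ]

# Mean radiation and mean ozone for each month (5 = May ... 9 = September)
monthly <- aggregate(cbind(Solar.R, Ozone) ~ Month, data = complete, FUN = mean)
monthly
```

If the midsummer rows show higher average ozone than May and September, that is consistent with the seasonal pattern visible in the scatter plot, though monthly means alone cannot separate the effect of radiation from other seasonal factors such as temperature.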