Several lightning fires began in California after August 15, 2020. How did they affect air quality?
One way to measure air quality after a fire is by the particulate matter in the air. According to the Environmental Protection Agency (EPA), “PM stands for particulate matter (also called particle pollution): the term for a mixture of solid particles and liquid droplets found in the air. Some particles, such as dust, dirt, soot, or smoke, are large or dark enough to be seen with the naked eye. … PM2.5: fine inhalable particles, with diameters that are generally 2.5 micrometers and smaller.” In this assignment, we’ll analyze daily outdoor air quality data provided by the EPA.
Useful functions: as.Date, plot, boxplot, order
Load the data.
air = read.csv("http://webpages.csus.edu/fitzgerald/files/stat128/fall20/ad_viz_plotval_data.csv")
Pick a site from the column Site.Name in the data that you find personally interesting, and select the rows in the data frame that correspond to that site. Use this subset for the remainder of the analysis. Mention why this site interests you.
# TRUE for rows recorded at the Davis-UCD Campus monitor
davis = air[ , "Site.Name"] == "Davis-UCD Campus"
air2 = air[davis, ]
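The subset above is ordinary logical indexing: `==` gives one TRUE/FALSE per row, and `air[davis, ]` keeps the TRUE rows. Below is a minimal sketch of the same pattern on a small made-up data frame (the site names and values are hypothetical, not EPA data), with `unique()` to browse the available site names and `table()` to see how many rows each site contributes before choosing one.

```r
# Hypothetical toy data standing in for the EPA download (not real measurements)
toy = data.frame(
  Site.Name = c("Davis-UCD Campus", "Site B", "Davis-UCD Campus", "Site C"),
  Daily.Mean.PM2.5.Concentration = c(5.1, 7.3, 6.0, 4.2),
  stringsAsFactors = FALSE
)

# Distinct site names to choose from
sites = unique(toy[, "Site.Name"])

# Number of rows contributed by each site
site_counts = table(toy[, "Site.Name"])

# Logical vector: TRUE for rows from the chosen site
keep = toy[, "Site.Name"] == "Davis-UCD Campus"
toy_davis = toy[keep, ]
```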
We are going to take a closer look at the PM2.5 concentration in Davis, California, after the Hennessey Fire in Napa Valley. Davis is approximately 40 miles from the fire. For more information about the Hennessey Fire, Fire.CA provides the most up-to-date status.
Make a line plot of the column Daily.Mean.PM2.5.Concentration as a function of Date. Start one month before the fire and go to the end of the data set. Comment on what the graph shows.
# Convert the "mm/dd/yyyy" text dates to Date objects
d = air2[, "Date"]
d2 = as.Date(d, format = "%m/%d/%Y")
# Keep observations from one month before the fire (July 17, 2020) to the end of the data
from_Jul17 = as.Date("2020-07-17") <= d2
air3 = air2[from_Jul17, ]
d3 = air3[, "Date"]
d3_dates = as.Date(d3, format = "%m/%d/%Y")
pm2 = air3[, "Daily.Mean.PM2.5.Concentration"]
# Sort by date so the line is drawn in time order
o = order(d3_dates)
plot(d3_dates[o], pm2[o], type = "l", xlab = "Date", ylab = "Daily Mean PM2.5 Concentration")
Note that the Hennessey Fire started on August 17. The graph shows a marked increase in the daily mean PM2.5 concentration after August 17.
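Because the `Date` column is read as text in month/day/year form, it has to go through `as.Date(..., format = "%m/%d/%Y")` before it can be compared with `as.Date("2020-07-17")` or sorted. A small sketch with illustrative date strings (not taken from the data file) shows the difference: as text, "12/31/2019" sorts last, while as `Date` objects the order is chronological and the window test keeps exactly the dates on or after July 17, 2020.

```r
# Illustrative date strings in the same "mm/dd/yyyy" layout as the EPA file
raw = c("08/01/2020", "12/31/2019", "07/17/2020", "09/10/2020")

# As character strings, the order is alphabetical, not chronological
text_order = sort(raw)

# Parsed into Date objects, the order is chronological
dates = as.Date(raw, format = "%m/%d/%Y")
date_order = format(sort(dates), "%m/%d/%Y")

# Window test: on or after one month before the fire (July 17, 2020)
in_window = dates >= as.Date("2020-07-17")
```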
Create a comparative boxplot of “Daily.Mean.PM2.5.Concentration” in the month before the fire and the month after the fire. Comment on what the boxplots show. Hint: create a new column that indicates if the observation happened before or after the fire.
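One way to build the before/after column from the hint is `cut()`, which bins dates into labeled intervals in one step and gives `NA` to dates outside both windows. This sketch runs on made-up observations (hypothetical dates and concentrations); the break points assume the August 17, 2020 fire date used throughout, with one month on each side.

```r
# Hypothetical observations; not EPA data
toy = data.frame(
  Date = as.Date(c("2020-07-10", "2020-07-20", "2020-08-16",
                   "2020-08-17", "2020-09-01", "2020-09-20")),
  Daily.Mean.PM2.5.Concentration = c(4, 6, 8, 40, 55, 12)
)

# Interval boundaries: one month before, fire start, one month after
breaks = as.Date(c("2020-07-17", "2020-08-17", "2020-09-17"))

# right = FALSE gives [start, end) intervals, so August 17 falls in "After"
toy[, "period"] = cut(toy[, "Date"], breaks = breaks,
                      labels = c("Before", "After"), right = FALSE)

# With R's default na.action, the formula interface leaves out rows whose period is NA
# boxplot(Daily.Mean.PM2.5.Concentration ~ period, data = toy)
```

With `right = FALSE` each interval includes its start date and excludes its end date, so August 16 lands in "Before" and August 17 in "After".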
# Month before the fire: July 17 to August 16, 2020
d_beforefire = as.Date("2020-07-17") <= d2 & d2 <= as.Date("2020-08-16")
pm2_beforefire = air2[d_beforefire, "Daily.Mean.PM2.5.Concentration"]
# Month after the fire: August 17 to September 16, 2020
d_afterfire = as.Date("2020-08-17") <= d2 & d2 <= as.Date("2020-09-16")
pm2_afterfire = air2[d_afterfire, "Daily.Mean.PM2.5.Concentration"]
## I'm leaving this here because it took me a long time to come up with, and as a new R user I was proud of it. I did not use it in my final solution to #3.
boxplot(pm2_beforefire, pm2_afterfire, names = c("Before Fire", "After Fire"))
# Final solution: a new column that is TRUE on or after the fire date (August 17)
d3_dates = as.Date(air3[, "Date"], format = "%m/%d/%Y")
air3[, "after"] = d3_dates >= as.Date("2020-08-17")
# Limit the after-fire group to one month so both groups cover about a month
month_window = d3_dates <= as.Date("2020-09-16")
boxplot(Daily.Mean.PM2.5.Concentration ~ after, data = air3[month_window, ],
        xlab = "TRUE = After Fire 8/17/2020", ylab = "Daily Mean PM2.5 Concentration")
FALSE indicates the PM2.5 concentration in the month before the fire started on August 17. The daily mean PM2.5 concentration has a larger range after the fire, with most values falling between about 30 and 55. The values spread roughly equally above and below the box, but the spread is still wider than we would like. This variability is possibly caused by the uneven growth of the fire.
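Statements such as "most values fall between 30 and 55" read off the box, whose edges are Tukey's lower and upper hinges (close to the first and third quartiles); the whiskers reach the most extreme points within 1.5 box lengths of the box. Below is a sketch of checking those numbers directly with `tapply()` and `fivenum()`, shown on hypothetical values rather than the Davis measurements; on the real data, the same call would take the concentration column of `air3` and the before/after indicator column.

```r
# Hypothetical concentrations; not the Davis measurements
pm = c(4, 5, 6, 7, 8, 30, 35, 45, 55, 70)
after = rep(c(FALSE, TRUE), each = 5)

# Tukey five-number summary per group: min, lower hinge, median, upper hinge, max
# (the hinges are what boxplot() uses for the box edges)
summaries = tapply(pm, after, fivenum)

# Box height (upper hinge minus lower hinge) for each group
box_height = sapply(summaries, function(s) s[4] - s[2])
```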
Use your work above to implement the function below.
#' Plot the n most recent PM2.5 particle measurements on the y axis, with date as the x axis.
#'
#' @param d data frame containing columns Daily.Mean.PM2.5.Concentration and Date for a single site
#' @param n number of observations to include
plot_pm2.5 = function(d, n = 100)
{
    # Use the data frame passed in as d, not a global variable
    # Convert the "mm/dd/yyyy" text dates to Date objects
    dates = as.Date(d[, "Date"], format = "%m/%d/%Y")
    pm2 = d[, "Daily.Mean.PM2.5.Concentration"]
    # Row indices from oldest to newest; keep the last n (the most recent)
    recent = tail(order(dates), n)
    plot(dates[recent], pm2[recent], type = "l", xlab = "Date",
         ylab = "Daily Mean PM2.5 Concentration")
}
Verify that plot_pm2.5 works for n = 100 and n = 50.
plot_pm2.5(air2, n = 100)
plot_pm2.5(air2, n = 50)
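Comparing the two plots by eye is one check. A more direct one, sketched here under the assumption that the row-selection step is pulled into its own helper (`recent_rows` is a hypothetical name, not part of the assignment), confirms that exactly the `n` most recent dates are kept, even when rows arrive out of order or `n` exceeds the number of rows.

```r
# Hypothetical helper: return the n most recent rows of d, oldest first
recent_rows = function(d, n = 100)
{
    dates = as.Date(d[, "Date"], format = "%m/%d/%Y")
    o = order(dates)     # row indices from oldest to newest
    d[tail(o, n), ]      # keep the last n of them
}

# Made-up rows, deliberately out of chronological order
toy = data.frame(
  Date = c("08/20/2020", "07/30/2020", "09/01/2020", "08/05/2020"),
  Daily.Mean.PM2.5.Concentration = c(50, 5, 30, 7),
  stringsAsFactors = FALSE
)

last2 = recent_rows(toy, n = 2)
last10 = recent_rows(toy, n = 10)   # n larger than the data: all rows kept
```

Because `tail()` returns the whole vector when `n` is larger than its length, no special case is needed for short data sets.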