Note: If you haven’t installed LaTeX, change the output mode in the above YAML to html_document for ease of knitting and homework submission.

This R Markdown (.Rmd) file is a template for your Chapter 4 Homework. Do everything within this file. Make it your own, but be careful not to change the code-figure-text integration I set up with the code appendix and the global options. If you have used R Markdown before and are comfortable with the extra options, feel free to customize to your heart’s desire. In the end, we will grade the knitted PDF or HTML document from within your private GitHub repository. Remember to make regular, small commits (e.g., at least one commit per question) to save your work. We will grade the latest knit, as long as it occurs before the start of the class in which we advance to the next chapter. As always, reach out with questions via GitHub Issues or during office hours.

Ozone data

The corresponding data file (.csv) contains hourly ozone data from two sites in Fort Collins. You should already have this file in your /data folder.

Preparation

You completed the following steps in your Chapter 3 Homework. If correct, you should copy-paste the code into this R Markdown document. FYI, you cannot source() R Markdown files to use their output in another file because they are intended to be self-contained and reproducible. Therefore, you need to copy-paste parts of your Chapter 3 Homework into this document and adjust the pathnames, if needed.

Load R packages

Import, select, and clean data

Recreate the pipe of dplyr functions that you used to import the data, select and rename the variables listed below, drop missing observations, and assign the output as a tibble (not in that particular order).

  • sample_measurement renamed as ozone_ppm (ozone measurement in ppm)
  • datetime (date in YYYY-MM-DD format and time of measurement in HH:MM:SS)

Examine Data

Examine the structure and contents of the dataframe to confirm the file imported and was manipulated properly.

## # A tibble: 10 × 2
##    ozone_ppm datetime           
##        <dbl> <dttm>             
##  1     0.017 2019-01-01 07:00:00
##  2     0.017 2019-01-01 08:00:00
##  3     0.017 2019-01-01 09:00:00
##  4     0.017 2019-01-01 10:00:00
##  5     0.015 2019-01-01 11:00:00
##  6     0.017 2019-01-01 12:00:00
##  7     0.028 2019-01-01 13:00:00
##  8     0.03  2019-01-01 14:00:00
##  9     0.036 2019-01-01 15:00:00
## 10     0.036 2019-01-01 16:00:00

Question 1: ggplot2 time series

Using ggplot and the corresponding geom, create a time series of ozone measurement across time. Warning: This plot will have a very poor ink-to-information ratio. Ugly plots are okay when you are just exploring data.

Question 2: Base R equivalent

For comparison, what function could you use to create a time series of these data in base R? How does the syntax of this function compare to that of ggplot()?

To create a time series of these data in base R, the plot() function can be used, for example with type = "l" to draw a line. plot() is a simple yet versatile function that takes the x and y data directly as vectors (or as a formula) and controls the figure through arguments within a single call, such as xlab, ylab, and main for the axis labels and title. The ggplot() function from the ggplot2 package works differently: it takes a data frame, maps columns to aesthetics with aes(), and builds the figure by adding layers such as geoms, labels, and themes with the + operator. This layered syntax is more verbose, but it makes extensive customization more consistent and easier to manage. Overall, plot() is easy to use and creates fast and simple plots, whereas ggplot() is more complex and offers far more flexible customization.
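
As a rough side-by-side sketch of the two syntaxes, assuming a small made-up tibble named toy_ozone (hypothetical values, not the Fort Collins data):

```r
library(ggplot2)
library(tibble)

# hypothetical toy data for illustration only
toy_ozone <- tibble(
  datetime = as.POSIXct("2019-01-01 07:00:00", tz = "UTC") + 3600 * 0:4,
  ozone_ppm = c(0.017, 0.017, 0.015, 0.028, 0.030)
)

# base R: pass the x and y vectors directly; options are arguments of one call
plot(x = toy_ozone$datetime, y = toy_ozone$ozone_ppm, type = "l",
     xlab = "Time of Measurement", ylab = "Ozone Concentration (ppm)",
     main = "Base R time series")

# ggplot2: pass a data frame, map columns with aes(), then add layers with +
toy_plot <- ggplot(data = toy_ozone, aes(x = datetime, y = ozone_ppm)) +
  geom_line() +
  labs(x = "Time of Measurement", y = "Ozone Concentration (ppm)",
       title = "ggplot2 time series")
print(toy_plot)
```

The base R call draws immediately, while the ggplot2 version returns an object that can be stored, extended, and printed later.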

Question 3: ggplot object

Excluding the geom, assign the plot from Question 1 as a ggplot object with a descriptive name.

Question 4: geom

Now, add geom_point() to the ggplot object using the following syntax: object_name + geom_point(). Remember, you have already defined the aes() in the ggplot object in Question 3.

Question 5: ggplot layers

Call and examine the object within the R Markdown, Console, or using View. How many layers does this ggplot object contain? Why?
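
One way to check the answer is to count the layers stored on the object. This is a minimal sketch with hypothetical toy data, and it assumes the layers element of a ggplot object can be read with $ in your installed ggplot2 version:

```r
library(ggplot2)

# hypothetical toy data for illustration only
toy_ozone <- data.frame(hour = 1:5,
                        ozone_ppm = c(0.017, 0.017, 0.015, 0.028, 0.030))

# ggplot() and aes() define data and mappings but draw nothing
base_plot <- ggplot(data = toy_ozone, aes(x = hour, y = ozone_ppm))
n_base <- length(base_plot$layers)

# each geom_*() or stat_*() call adds one layer
point_plot <- base_plot + geom_point()
n_point <- length(point_plot$layers)

# themes and labels style the plot without adding layers
themed_plot <- point_plot + theme_minimal()
n_themed <- length(themed_plot$layers)

c(base = n_base, point = n_point, themed = n_themed)
```

Under these assumptions, only the geom contributes a layer; the theme changes the appearance of the plot but not its layer count.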

Question 6: theme

Next, add the ggplot2 theme of your choice to the ggplot object using a function with the theme_*() prefix.

Question 7: additions

In addition to assigning a plot as a ggplot object, one can also assign aspects of the figure such as axis labels and titles to an object for later use. For example:
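
A minimal sketch of this technique with hypothetical toy data (the object names toy_ozone, toy_labels, and toy_plot are illustrative only):

```r
library(ggplot2)

# hypothetical toy data for illustration only
toy_ozone <- data.frame(hour = 1:5,
                        ozone_ppm = c(0.017, 0.017, 0.015, 0.028, 0.030))

# store the axis labels and title in their own object for later reuse
toy_labels <- labs(x = "Hour",
                   y = "Ozone Concentration (ppm)",
                   title = "Toy ozone example")

# add the stored labels to a plot with +, like any other component
toy_plot <- ggplot(data = toy_ozone, aes(x = hour, y = ozone_ppm)) +
  geom_point() +
  toy_labels
print(toy_plot)
```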

Using this technique and the same additive approach (ggplot_object + ... + title) from Questions 5 and 6, add a title and revise the axis labels.

Question 7: ways to see more granularity

The time series from the previous questions does not look nice. It is hard to discern granular patterns in the data because of their sheer density; there are too many hourly measurements over time. We could look at the data on different time scales, but, because we have not discussed how to manipulate dates and times, we will instead focus first on adding transparency to the data points using the alpha = aesthetic. Try recreating the time-series plot with alpha = 0.2.

Alternatively, we could examine just the ozone measurements that exceed the threshold (0.070 ppm) set by the Environmental Protection Agency. Filter the dataset to the ozone levels exceeding 0.070 ppm, and use these data to construct a time series plot with time of measurement on the x-axis and ozone concentration measurement on the y-axis. Remember to add the relevant geom, title, subtitle, axis labels, and your choice of theme.

Question 8: seasonality

Based on the time series, do you see any seasonal pattern for higher levels of ozone? Describe what you see.

Based on the time series, it looks like the ozone concentration peaks in July and then decreases as the temperature gets colder. This indicates that the ozone concentration is at its greatest during the summer.

Question 9: proportion

What proportion of ozone measurements exceed the EPA guidelines? Instead of plugging in the actual values, make R figure out the length of each vector and the corresponding proportion in one line of code.

## 0.503 %
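
An alternative one-liner, sketched on a made-up vector rather than the real data: because R treats TRUE as 1 and FALSE as 0, the mean of a logical comparison is the proportion of values that satisfy it. The vector-length version that the question describes gives the same number.

```r
# hypothetical ozone values in ppm, for illustration only
toy_ppm <- c(0.050, 0.072, 0.065, 0.081)

# length of the exceeding subset divided by length of the full vector
prop_by_length <- length(toy_ppm[toy_ppm > 0.070]) / length(toy_ppm)

# equivalent shortcut: mean of the logical vector
prop_by_mean <- mean(toy_ppm > 0.070)

prop_by_length  # 0.5
prop_by_mean    # 0.5
```

Multiplying either result by 100 converts the proportion to a percentage.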

Question 10: ggplot2 extensions

Navigate to this website and browse the ggplot2 extensions. These themes can be very useful, so it’s good to be aware of them. Which theme would be appropriate for your own research or senior project, and why? How would you use it? Briefly describe your data and why the extension would improve the data visualization and communication.

I think that the hrbrthemes extension would be a great pick for my senior design project. My senior design project is creating a low-cost, durable, and easily deployable anemometer to measure wind speed and direction on power lines. The anemometer will be deployed primarily in mountainous and suburban regions. I think that this theme would be perfect because it is colorful yet simple and clearly displays the desired data.
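
A rough sketch of how such a theme might be applied, assuming the hrbrthemes package is installed and using made-up wind-speed readings (toy_wind and its values are hypothetical, not project data). The theme is applied only if the package is available, so the code still runs without it:

```r
library(ggplot2)

# hypothetical wind-speed readings, for illustration only
toy_wind <- data.frame(minute = 1:6,
                       wind_speed_ms = c(2.1, 2.8, 3.5, 3.0, 4.2, 3.9))

wind_plot <- ggplot(data = toy_wind, aes(x = minute, y = wind_speed_ms)) +
  geom_line() +
  labs(x = "Time (min)", y = "Wind speed (m/s)",
       title = "Toy anemometer readings")

# apply the hrbrthemes theme only if the package is installed
if (requireNamespace("hrbrthemes", quietly = TRUE)) {
  wind_plot <- wind_plot + hrbrthemes::theme_ipsum()
}
print(wind_plot)
```

Depending on the fonts installed on your system, theme_ipsum() may print font warnings; the plot should still render with a fallback font.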

Appendix

# set global options for figures, code, warnings, and messages
knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.path = "../figs/",
                      echo = FALSE, warning = FALSE, message = FALSE)
# load packages for current R session
library(tidyverse)
library(dplyr)
library(ggplot2)
# ozone: import, select, drop missing observations, rename
fc_ozone_data <- readr::read_csv(file = "ftc_o3.csv") %>%
  dplyr::select("sample_measurement","datetime") %>%
  drop_na() %>%
  dplyr::rename(ozone_ppm = sample_measurement)
# examine dataframe object 
head(fc_ozone_data, n=10)
# create basic time series using ggplot2 package
ggplot(data = fc_ozone_data,
       aes(x = datetime,
           y = ozone_ppm)) +
  geom_line(linewidth = 0.1) +
  xlab("Date and Time (YYYY-MM-DD, HH:MM:SS)") +
  ylab("Ozone Measurement (ppm)") +
  ggtitle("Time Series of Ozone Measurements in Fort Collins")

# create base layer of ozone time series (no geom) and save to object
ozone_plot <- ggplot(data = fc_ozone_data, 
                     aes(x = datetime,
                         y = ozone_ppm)) 
print(ozone_plot)
# add layer to ggplot object
ozone_plot <- ozone_plot + geom_point()
print(ozone_plot)
# add theme to ggplot object
ozone_plot <- ozone_plot + theme_minimal()
print(ozone_plot)
# create new object with ggplot labels 
ozone_labels <- labs(x = "Time of Measurement (YYYY-MM-DD HH:MM:SS)",
                     y = "Ozone Concentration (ppm)",
                     title = "Hourly Ozone Measurements in Fort Collins, CO")
# add title and axis labels to time series
ozone_plot <- ozone_plot + ozone_labels
print(ozone_plot)
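# set alpha as a fixed parameter inside geom_point(); adding aes(alpha = 0.2)
# would instead map it through a scale and add a legend
ozone_plot_alpha <- ggplot(data = fc_ozone_data,
                           aes(x = datetime, y = ozone_ppm)) +
  geom_point(alpha = 0.2) + theme_minimal() + ozone_labels
print(ozone_plot_alpha)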

# filter data to ozone concentration measurements exceeding 0.070 ppm 
ozone_filtered <- filter(fc_ozone_data, ozone_ppm > 0.07)
# time series of high ozone measurements
filtered_plot <- ggplot(data = ozone_filtered,
                        aes(x = datetime,
                            y = ozone_ppm)) + 
  geom_point() +
  theme_minimal() +
  labs(x = "Time of Measurement (YYYY-MM-DD HH:MM:SS)",
       y = "Ozone Concentration (ppm)",
       title = "Hourly Ozone Measurements in Fort Collins, CO",
       subtitle = "Measurements exceeding the EPA threshold of 0.070 ppm")
print(filtered_plot)

# calculate percentage of ozone measurements that exceed 0.070 ppm
high_ozone <- (nrow(ozone_filtered)/nrow(fc_ozone_data))*100

cat(paste(round(high_ozone, 3),"%"),sep = "")