Note: If you haven’t installed LaTeX, change the output mode in the
YAML header above to html_document for ease of knitting and
homework submission.
This R Markdown (.Rmd) file is a template for your Chapter 4 Homework. Do everything within this file. Make it your own, but be careful not to change the code-figure-text integration I set up with the code appendix and the global options. If you have used R Markdown before and are comfortable with the extra options, feel free to customize to your heart’s desire. In the end, we will grade the knitted PDF or HTML document from within your private GitHub repository. Remember to make regular, small commits (e.g., at least one commit per question) to save your work. We will grade the latest knit, as long as it occurs before the start of the class in which we advance to the next chapter. As always, reach out with questions via GitHub Issues or during office hours.
The corresponding data file (.csv) contains hourly ozone
data from two sites in Fort Collins. You should already have this file
in your /data folder.
You completed the following steps in your Chapter 3 Homework. If your code
was correct, copy-paste it into this R Markdown document.
FYI, you cannot use source() on R Markdown files to reuse their
output in another file because they are intended to be self-contained
and reproducible. Therefore, you need to copy-paste parts of your
Chapter 3 Homework into this document and adjust the pathnames, if
needed.
Recreate the pipe of dplyr functions that you used to
import the data, select and rename the variables listed below, drop
missing observations, and assign the output as a tibble
(not in that particular order).
- sample_measurement renamed as ozone_ppm (ozone measurement in ppm)
- datetime (date in YYYY-MM-DD format and time of measurement in HH:MM:SS)

Examine the structure and contents of the dataframe to confirm that the file was imported and manipulated properly.
## # A tibble: 10 × 2
## ozone_ppm datetime
## <dbl> <dttm>
## 1 0.017 2019-01-01 07:00:00
## 2 0.017 2019-01-01 08:00:00
## 3 0.017 2019-01-01 09:00:00
## 4 0.017 2019-01-01 10:00:00
## 5 0.015 2019-01-01 11:00:00
## 6 0.017 2019-01-01 12:00:00
## 7 0.028 2019-01-01 13:00:00
## 8 0.03 2019-01-01 14:00:00
## 9 0.036 2019-01-01 15:00:00
## 10 0.036 2019-01-01 16:00:00
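The pipe in the code appendix reads ftc_o3.csv from disk. As a self-contained sketch of the same select(), drop_na(), and rename() steps, the snippet below builds a small hypothetical tibble in memory instead (raw_ozone, clean_ozone, the site column, and the NA value are all invented for illustration); with the real file, readr::read_csv() would supply the starting tibble.

```r
library(dplyr)  # select(), rename(), %>%, and the re-exported tibble()
library(tidyr)  # drop_na()

# hypothetical raw data; only the column names sample_measurement and
# datetime come from the assignment
raw_ozone <- tibble(
  site = c("A", "A", "B", "B"),
  sample_measurement = c(0.017, NA, 0.015, 0.028),
  datetime = as.POSIXct(c("2019-01-01 07:00:00", "2019-01-01 08:00:00",
                          "2019-01-01 09:00:00", "2019-01-01 10:00:00"),
                        tz = "UTC")
)

# select first so that drop_na() only checks the two columns we keep
clean_ozone <- raw_ozone %>%
  select(sample_measurement, datetime) %>%
  drop_na() %>%
  rename(ozone_ppm = sample_measurement)

# str() reports column types; head() shows the first rows
str(clean_ozone)
head(clean_ozone)
```

Because select() runs before drop_na(), only missing values in the two retained columns remove rows; reversing that order would also drop rows that are missing values only in columns you later discard.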
ggplot2 time series
Using ggplot and the corresponding geom,
create a time series of ozone measurements over time. Warning: This
plot will have a very poor ink-to-information ratio. Ugly plots are okay
when you are just exploring data.
For comparison, what function could you use to create a time series
of these data in base R? How does the syntax of this function compare to
that of ggplot()?
To create a time series in base R, you can use the plot() function (for example, with type = "l" to connect the points with a line). Its syntax differs from that of ggplot(), which comes from the ggplot2 package rather than from RStudio itself. plot() is a single function call that takes the x and y values as vectors (e.g., fc_ozone_data$datetime and fc_ozone_data$ozone_ppm), and basic customizations such as the axis labels (xlab, ylab) and the title (main) are passed as arguments to that same call; further elements are drawn onto the existing plot with separate calls such as points() or lines(). ggplot(), in contrast, takes a data frame and maps its columns to aesthetics with aes(); the plot is then built up by adding layers (geoms), labels, and themes with the + operator, and the result can be stored as an object and modified later. Overall, plot() is quick to use for fast, simple plots, whereas ggplot() has a steeper learning curve but offers a more consistent and flexible system for building and customizing complex figures.
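As a sketch of the base R syntax, the snippet below uses the ten rows printed above as a stand-in for fc_ozone_data (toy_ozone is an illustrative name, and the UTC time zone is an assumption). Everything goes into one plot() call as vectors and arguments, and points() then draws onto the existing plot instead of returning an object.

```r
# illustrative stand-in: the ten rows printed by head(fc_ozone_data); UTC assumed
toy_ozone <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01 07:00:00", tz = "UTC"),
                 by = "hour", length.out = 10),
  ozone_ppm = c(0.017, 0.017, 0.017, 0.017, 0.015,
                0.017, 0.028, 0.030, 0.036, 0.036)
)

# base R: x and y are passed as vectors; type = "l" draws a line
plot(toy_ozone$datetime, toy_ozone$ozone_ppm, type = "l",
     xlab = "Date and Time", ylab = "Ozone Measurement (ppm)",
     main = "Time Series of Ozone Measurements in Fort Collins")

# extra elements are drawn onto the current plot with separate calls
points(toy_ozone$datetime, toy_ozone$ozone_ppm, pch = 16)
```

The ggplot2 counterpart would be ggplot(toy_ozone, aes(x = datetime, y = ozone_ppm)) + geom_line(), where the data frame and the column mappings replace the explicit vectors.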
ggplot object
Excluding the geom, assign the plot from Question 1 as a
ggplot object with a descriptive name.
geom
Now, add geom_point() to the ggplot object
using the following syntax: object_name + geom_point().
Remember, you have already defined the aes() in the
ggplot object in Question 3.
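A minimal sketch of these two steps, again using the ten printed rows as stand-in data (toy_ozone and toy_plot are illustrative names): the base object holds only the data and the aes() mapping, and geom_point() inherits that mapping when it is added with +.

```r
library(ggplot2)

# illustrative stand-in: the ten rows printed by head(fc_ozone_data)
toy_ozone <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01 07:00:00", tz = "UTC"),
                 by = "hour", length.out = 10),
  ozone_ppm = c(0.017, 0.017, 0.017, 0.017, 0.015,
                0.017, 0.028, 0.030, 0.036, 0.036)
)

# base object: data and aesthetic mappings only, no geom yet
toy_plot <- ggplot(data = toy_ozone, aes(x = datetime, y = ozone_ppm))

# add a point layer; it reuses the x and y mappings defined above
toy_plot <- toy_plot + geom_point()
print(toy_plot)
```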
ggplot layers
Call and examine the object within the R Markdown document, in the Console, or using
View(). How many layers does this ggplot object contain?
Why?
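One way to check is to inspect the layers element of the ggplot object. In the sketch below (same illustrative stand-in data; base_plot, point_plot, and themed_plot are hypothetical names), the object has no layers until a geom is added, and themes and labels change the plot's appearance without being stored as layers.

```r
library(ggplot2)

# illustrative stand-in: the ten rows printed by head(fc_ozone_data)
toy_ozone <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01 07:00:00", tz = "UTC"),
                 by = "hour", length.out = 10),
  ozone_ppm = c(0.017, 0.017, 0.017, 0.017, 0.015,
                0.017, 0.028, 0.030, 0.036, 0.036)
)

# data and mappings only: no geom has been added yet
base_plot <- ggplot(toy_ozone, aes(x = datetime, y = ozone_ppm))
length(base_plot$layers)  # 0

# each geom adds one layer
point_plot <- base_plot + geom_point()
length(point_plot$layers)  # 1

# themes and labels modify the plot but are not layers
themed_plot <- point_plot + theme_minimal() + labs(title = "Ozone")
length(themed_plot$layers)  # 1
```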
Next, add the ggplot2 theme of your choice to the
ggplot object using a function with the theme_*()
prefix.
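Like labels, a theme can be stored in an object and reused across plots. The sketch below starts from theme_minimal() and removes the minor grid lines with theme(); my_theme is an illustrative name, and the particular tweaks (base_size = 11, element_blank() for panel.grid.minor) are example choices, not requirements of the assignment.

```r
library(ggplot2)

# illustrative stand-in: the ten rows printed by head(fc_ozone_data)
toy_ozone <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01 07:00:00", tz = "UTC"),
                 by = "hour", length.out = 10),
  ozone_ppm = c(0.017, 0.017, 0.017, 0.017, 0.015,
                0.017, 0.028, 0.030, 0.036, 0.036)
)

# a complete theme plus a small modification, saved for reuse
my_theme <- theme_minimal(base_size = 11) +
  theme(panel.grid.minor = element_blank())

themed_plot <- ggplot(toy_ozone, aes(x = datetime, y = ozone_ppm)) +
  geom_point() +
  my_theme
print(themed_plot)
```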
In addition to assigning a plot as a ggplot object, one
can also assign aspects of the figure such as axis labels and titles to
an object for later use. For example:
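A minimal sketch of this idea, using the ten printed rows as stand-in data: labs() returns an object that can be saved once and added to any plot with +. The label text mirrors the code appendix; labelled_plot is an illustrative name.

```r
library(ggplot2)

# illustrative stand-in: the ten rows printed by head(fc_ozone_data)
toy_ozone <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01 07:00:00", tz = "UTC"),
                 by = "hour", length.out = 10),
  ozone_ppm = c(0.017, 0.017, 0.017, 0.017, 0.015,
                0.017, 0.028, 0.030, 0.036, 0.036)
)

# store axis labels and title in one object
ozone_labels <- labs(x = "Time of Measurement (YYYY-MM-DD HH:MM:SS)",
                     y = "Ozone Concentration (ppm)",
                     title = "Hourly Ozone Measurements in Fort Collins, CO")

# add the stored labels like any other plot component
labelled_plot <- ggplot(toy_ozone, aes(x = datetime, y = ozone_ppm)) +
  geom_point() +
  theme_minimal() +
  ozone_labels
print(labelled_plot)
```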
Using this technique and the same additive approach
(ggplot_object + ... + title) from Questions 5 and 6, add a
title and revise the axis labels.
The time series from the previous questions does not look nice. It is
hard to discern granular patterns in the data because of their sheer
density; there are too many hourly measurements over time. We could look
at the data on different time scales, but, because we have not discussed
how to manipulate dates and times, we will instead focus first on adding
transparency to the data points using the alpha =
aesthetic. Try recreating the time-series plot with
alpha = 0.2.
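Where alpha is written matters. Passing alpha = 0.2 to geom_point() sets a fixed transparency for every point, whereas aes(alpha = 0.2) maps the constant 0.2 as if it were data, so it passes through the alpha scale (the drawn transparency is not necessarily 0.2) and a legend appears. The sketch below builds both versions on the illustrative stand-in data (set_alpha and mapped_alpha are hypothetical names); the set version is usually what is wanted here.

```r
library(ggplot2)

# illustrative stand-in: the ten rows printed by head(fc_ozone_data)
toy_ozone <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01 07:00:00", tz = "UTC"),
                 by = "hour", length.out = 10),
  ozone_ppm = c(0.017, 0.017, 0.017, 0.017, 0.015,
                0.017, 0.028, 0.030, 0.036, 0.036)
)

# setting: a constant passed to the geom applies to every point
set_alpha <- ggplot(toy_ozone, aes(x = datetime, y = ozone_ppm)) +
  geom_point(alpha = 0.2)

# mapping: 0.2 is treated as a data value, scaled, and given a legend
mapped_alpha <- ggplot(toy_ozone, aes(x = datetime, y = ozone_ppm, alpha = 0.2)) +
  geom_point()

print(set_alpha)
```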
Alternatively, we could examine just the ozone measurements that
exceed the threshold (0.070 ppm) set by the Environmental Protection
Agency. Filter the dataset to the ozone levels exceeding 0.070 ppm, and
use these data to construct a time series plot with time of measurement
on the x-axis and ozone concentration measurement on the y-axis.
Remember to add the relevant geom, title, subtitle, axis
labels, and your choice of theme.
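A sketch of the filter-then-plot workflow. The six readings below are hypothetical, chosen so that some exceed the threshold (none of the ten printed rows do); toy_ozone, high_ozone, and epa_threshold are illustrative names.

```r
library(dplyr)   # filter() and the re-exported tibble()
library(ggplot2)

epa_threshold <- 0.070  # ppm, the EPA threshold named in the question

# hypothetical readings, not taken from the Fort Collins data
toy_ozone <- tibble(
  datetime = seq(as.POSIXct("2019-07-01 10:00:00", tz = "UTC"),
                 by = "hour", length.out = 6),
  ozone_ppm = c(0.045, 0.072, 0.068, 0.081, 0.075, 0.050)
)

# keep only readings strictly above the threshold
high_ozone <- filter(toy_ozone, ozone_ppm > epa_threshold)

high_plot <- ggplot(high_ozone, aes(x = datetime, y = ozone_ppm)) +
  geom_point() +
  theme_minimal() +
  labs(x = "Time of Measurement (YYYY-MM-DD HH:MM:SS)",
       y = "Ozone Concentration (ppm)",
       title = "Hourly Ozone Measurements Above the EPA Threshold",
       subtitle = "Hypothetical data; threshold = 0.070 ppm")
print(high_plot)
```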
Based on the time series, do you see any seasonal pattern for higher levels of ozone? Describe what you see.
Based on the time series, the highest ozone concentrations appear to occur in July, and exceedances become less frequent as temperatures drop in the fall. This suggests that ozone concentrations are greatest during the summer.
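To check a seasonal pattern numerically rather than by eye, one option is to count exceedances per calendar month. The timestamps below are hypothetical (exceed_times and monthly_counts are illustrative names); with the real data, the same idea would be applied to ozone_filtered$datetime.

```r
# hypothetical timestamps of readings above 0.070 ppm
exceed_times <- as.POSIXct(c("2019-06-12 14:00:00", "2019-07-03 15:00:00",
                             "2019-07-03 16:00:00", "2019-07-21 13:00:00",
                             "2019-08-09 14:00:00"), tz = "UTC")

# count exceedances per month, using the two-digit month as the label
monthly_counts <- table(format(exceed_times, "%m"))
monthly_counts

# month with the most exceedances
names(which.max(monthly_counts))
```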
What proportion of ozone measurements exceed the EPA guidelines? Instead of plugging in the actual values, make R figure out the length of each vector and the corresponding proportion in one line of code.
## 0.503 %
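A one-line alternative relies on R treating TRUE as 1 and FALSE as 0, so the mean of a logical comparison is the proportion of TRUE values. The readings below are hypothetical; with the real data the call would be mean(fc_ozone_data$ozone_ppm > 0.070), multiplied by 100 to report a percentage.

```r
# hypothetical ozone readings in ppm
ozone_ppm <- c(0.050, 0.081, 0.062, 0.075)

# proportion of readings above the 0.070 ppm threshold
prop_exceed <- mean(ozone_ppm > 0.070)
prop_exceed  # 0.5

# same result from the two vector lengths
length(ozone_ppm[ozone_ppm > 0.070]) / length(ozone_ppm)  # 0.5
```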
ggplot2 extensions
Navigate to this website and
browse the ggplot2 extensions. These themes can be very
useful, so it’s good to be aware of them. Which theme would be
appropriate for your own research or senior project, and why? How would
you use it? Briefly describe your data and why the extension would
improve the data visualization and communication.
I think that the hrbrthemes package would be a great choice for my senior design project. My senior design project is to create a low-cost, durable, and easily deployable anemometer that measures wind speed and direction on power lines. The anemometer will be deployed primarily in mountainous and suburban regions. I think this theme would be a good fit because it is colorful yet simple and displays the data clearly.
# set global options for figures, code, warnings, and messages
knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.path = "../figs/",
echo = FALSE, warning = FALSE, message = FALSE)
# load packages for current R session
# (tidyverse attaches dplyr, tidyr, readr, and ggplot2, among others)
library(tidyverse)
# ozone: import, select, drop missing observations, rename
fc_ozone_data <- readr::read_csv(file = "ftc_o3.csv") %>%
  dplyr::select("sample_measurement", "datetime") %>%
  tidyr::drop_na() %>%
  dplyr::rename(ozone_ppm = sample_measurement)
# examine dataframe object
head(fc_ozone_data, n=10)
# create basic time series using ggplot2 package
# (linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4.0)
ggplot(data = fc_ozone_data,
       aes(x = datetime,
           y = ozone_ppm)) +
  geom_line(linewidth = 0.1) +
  xlab("Date and Time (YYYY-MM-DD, HH:MM:SS)") +
  ylab("Ozone Measurement (ppm)") +
  ggtitle("Time Series of Ozone Measurements in Fort Collins")
# create base layer of ozone time series (no geom) and save to object
ozone_plot <- ggplot(data = fc_ozone_data,
                     aes(x = datetime,
                         y = ozone_ppm))
print(ozone_plot)
# add layer to ggplot object
ozone_plot <- ozone_plot + geom_point()
print(ozone_plot)
# add theme to ggplot object
ozone_plot <- ozone_plot + theme_minimal()
print(ozone_plot)
# create new object with ggplot labels
ozone_labels <- labs(x = "Time of Measurement (YYYY-MM-DD HH:MM:SS)",
y = "Ozone Concentration (ppm)",
title = "Hourly Ozone Measurements in Fort Collins, CO")
# add title and axis labels to time series
ozone_plot <- ozone_plot + ozone_labels
print(ozone_plot)
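# set alpha = 0.2 as a fixed geom_point() parameter rather than mapping it in aes()
ozone_plot_alpha <- ggplot(fc_ozone_data, aes(x = datetime, y = ozone_ppm)) +
  geom_point(alpha = 0.2) + theme_minimal() + ozone_labels
print(ozone_plot_alpha)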
# filter data to ozone concentration measurements exceeding 0.070 ppm
ozone_filtered <- filter(fc_ozone_data, ozone_ppm > 0.07)
# time series of high ozone measurements (with the subtitle the question asks for)
filtered_plot <- ggplot(data = ozone_filtered,
                        aes(x = datetime, y = ozone_ppm)) +
  geom_point() +
  theme_minimal() +
  labs(x = "Time of Measurement (YYYY-MM-DD HH:MM:SS)",
       y = "Ozone Concentration (ppm)",
       title = "Hourly Ozone Measurements in Fort Collins, CO",
       subtitle = "Measurements exceeding the EPA threshold of 0.070 ppm")
print(filtered_plot)
# calculate percentage of ozone measurements that exceed 0.070 ppm
high_ozone <- (nrow(ozone_filtered)/nrow(fc_ozone_data))*100
cat(paste(round(high_ozone, 3),"%"),sep = "")