---
title: "Weeks 7 and 8"
subtitle: "Time Series, Data Formats, Output Formats, Project Introduction"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
## Housekeeping
```{r include=F}
#|label: setup
knitr::opts_chunk$set(echo=T, highlight=T) # specifies default options for all chunks
options(scipen=100) # suppress scientific notation
# install pacman if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
pacman::p_load(pacman, tidyverse, gridExtra, magrittr,
kableExtra, tidyquant, highcharter, dygraphs,
htmlwidgets, widgetframe, js) # install and load required packages
p_loaded() # verify loaded packages
```
- Final grading in this course:
- adheres to Whitman grading policy, but is fairly gentle.
- takes into account assignments, course project, and class participation.
- Quiz 2 will be during Week 11 and will combine previous skills with material from weeks 6 through 10
- It will be similar to Quiz 1 but may have more questions and more steps in multi-step tasks.
- If you have questions about your quiz, please let me know.
::: fragment
**HW 4 is posted and is due on Wednesday, 10/15/25.**
:::
- **HW 4 - Part 1** is due on Wednesday 10/8/25 and is required in order for you to complete this course.
- **There are no office hours on Thursday this week, 10/9/25.**
## BUA 455 Group Dashboard Project
::: fragment
**Group Assignments**
:::
- Complete HW 4 - Part 1 TODAY, 10/8! (This should only take 5 min.)
- **Note:** If you do not complete this survey, I will not put you in a project group and you cannot pass this class.
- Groups of 5 or 6 will be determined and posted (Hopefully by Monday)
- If you have a request to work with someone, include that information in your survey (Not required).
- **Friday, 10/10, is the last day I will accept any group requests.**
- I cannot guarantee that requests will be honored, but I will try.
- I control group assignments to maintain some balance in skill level among groups.
##
### BUA 455 Group Dashboard Project Information
- [Project Description](https://docs.google.com/document/d/1U-DJ3yeHPpxcg1o12Cg2qc2Besb6Jw6UiyAo6gR4S2I/edit?usp=sharing){target="_blank"}
- [Interesting Data](https://penelope2040.quarto.pub/bua-455-semester/#interesting-data){target="_blank"}
- Students are also required to use AI tools to find data.
- I will provide a short demo of going from an obscure idea to a good, semi-related dataset using AI.
- Last year, I adapted the course to use the `Quarto Dashboard` because it became available in the late spring of 2024.
- I have posted examples from last year to give you ideas.
- Quarto provides a lot of flexibility BUT requires a little patience and iterative editing.
- [Preview of HW 5 - Part 1 Example Using Quarto Dashboard](https://rpubs.com/PeneLope_PE/1229274)
## Upcoming Dates
- **Groups assigned by Wednesday, 10/15 at the latest.**
- **Thu. 10/30 at 5:00 PM:** Draft Proposals Due - NO GRACE PERIOD
- Proposals should be in bullet point format and include links to data sources
- It should take me 5 minutes to read your proposed ideas and check your data.
- **Proposal Meetings:**
- Recommended but not required: Come with questions and be prepared to answer my questions (5-15 min. per group)
- Meetings will take place outside of class. See sign-up sheet when it is posted.
- **Wed. 10/29:** HW 5 - Part 1 Due
- **Thu. 11/6:** Quiz 2
- **Tue. 11/11:** Final Proposals Due
- Not much longer than the draft proposal and also in bullet point format.
- Questions and issues discussed during meeting should be addressed.
## Reminders about HW 4
- In Chunk 6 (Part 5), the chunk header in the template includes the option `eval=F`.
- The `eval=F` prevents this chunk from being evaluated when it is knit.
- `eval=F` was included in the template because original code was incomplete.
- Remember to remove the text `eval=F`
- Other helpful chunk header options for dashboard: `echo=F`, `include=F`
- Chunk options can also be included inside the chunk as `#|` (hash-pipe) comment lines:
- e.g. `#|label: import data` and `#|echo: false`. See [Quarto Cheat Sheet](https://rstudio.github.io/cheatsheets/quarto.pdf){target="_blank"}
- **NOTE:** If two chunks have the SAME name or label, the file will not render.
## Quarto Output Formats
- So far, all Quarto files in this course have been rendered as HTML (.html) files or slides
- All slides for this course are created in Quarto.
- Other common formats are Word documents, PDF documents, PowerPoint slides, and **dashboards**
- This [Quarto Reference site](https://quarto.org/docs/reference/) shows all the possible formats and provides details.
- We will use the dashboard (next slide) format in HW 5 and in your projects.
- Groups will also write their two project memos in Quarto and publish them as Word documents.
- Writing the memos in Quarto files simplifies formatting citations for R, RStudio, and packages.
## Quarto Dashboards
- REQUIRED: [Download the latest version of Quarto here](https://quarto.org/docs/get-started/)
- You will not be able to complete HW 5 without having Quarto installed on your computer.
- [Quarto Dashboard](https://quarto.org/docs/dashboards/) is a new feature of Quarto that is extremely flexible and straightforward to use.
- The [Quarto Dashboard Gallery](https://quarto.org/docs/gallery/#dashboards) includes example dashboards made with R, Python, and other languages.
- In this course I will provide a simple template for HW 5 that can be used to build your dashboard.
- Once you understand how to add pages, rows, columns, and tabsets, and modify them as needed, you are welcome to tailor the template to your project.
- **A Quarto dashboard is a flexible blank canvas that you can tailor to your project and future endeavors.**
## Types of Time Series Data in R
- In recent weeks, we have worked with Box Office Mojo and Bureau of Labor Statistics Data
- These datasets are time series data.
- They all include a date variable and another quantitative variable that changes at each time period.
- So far we have worked with data in an R format called a `tibble`.
- Two common data formats in R, `tibble` and `data.frame`, are the formats needed for creating ggplots of time series.
- `tibble` is the more modern format and is more compatible with `tidyverse` commands to manage data.
- Today, we'll discuss a third data format, `xts`, that is used specifically for time series data.
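
As a quick illustration of how the first two formats relate, here is a minimal sketch with made-up values (`demo_df` and `demo_tbl` are hypothetical names, not course data). It shows that a `tibble` is a `data.frame` with extra classes added.

```r
library(tibble)

# hypothetical two-row time series, for illustration only
demo_df <- data.frame(date = as.Date(c("2025-01-02", "2025-01-03")),
                      value = c(10, 12))
demo_tbl <- as_tibble(demo_df)   # convert the data.frame to a tibble

class(demo_df)    # "data.frame"
class(demo_tbl)   # "tbl_df" "tbl" "data.frame": a tibble is also a data.frame
```

Because a tibble is still a `data.frame`, code written for a `data.frame` generally accepts a tibble as well.
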
##
### Importing Stock Data as `xts` using `tidyquant` Package
- [Yahoo Finance](https://finance.yahoo.com/), the Federal Reserve Bank, the Wall Street Journal, and others are excellent data sources that can be directly imported into R.
- The default source for `getSymbols` (a `quantmod` function that is loaded with the `tidyquant` package) is Yahoo Finance.
- Data format is `xts` which we will cover today
::: fragment
```{r}
#|label: importing data from yahoo finance
#|output: false
# download data for Netflix, Amazon, Disney
# time series starts the day after the `from` date specified
# time series ends the day before the `to` date specified
getSymbols("NFLX", from = "2016-01-01", to = "2025-10-01")
getSymbols("AMZN", from = "2016-01-01", to = "2025-10-01")
getSymbols("DIS", from = "2016-01-01", to = "2025-10-01")
```
:::
## Example of `hchart` for One Stock
`hchart` in the `highcharter` package is one way to plot `xts` data
This chunk is not compatible with published slides or a published HTML file, but this code will work in a published dashboard (see posted examples).
```{r hchart of 1 stock, fig.dim=c(15,4.5), echo=T, eval=F}
(hc_nflx <- hchart(NFLX$NFLX.Adjusted, name="Adjusted", color="green") |> # plot adj. close
hc_add_series(NFLX$NFLX.High, name="High" , color="blue") |> # add daily high
hc_add_series(NFLX$NFLX.Low, name="Low" , color="red")) # add daily low
```
## R code for Multi-Panel `hcharts` display
- Stocks can be shown in separate plots that can be shown side by side or in one stacked column
- The command `hw_grid` is used to display them and `ncol` indicates how many columns.
::: fragment
```{r separate stock plots, echo=T, eval=F}
nflx_plt <- hchart(NFLX$NFLX.Adjusted, name="Adjusted", color="green") |>
hc_add_series(NFLX$NFLX.High, name="High" , color="darkgreen") |>
hc_add_series(NFLX$NFLX.Low, name="Low" , color="lightgreen")
amzn_plt <- hchart(AMZN$AMZN.Adjusted, name="Adjusted", color="blue") |>
hc_add_series(AMZN$AMZN.High, name="High" , color="darkblue") |>
hc_add_series(AMZN$AMZN.Low, name="Low" , color="lightblue")
dis_plt <- hchart(DIS$DIS.Adjusted, name="Adjusted", color="mediumpurple") |>
hc_add_series(DIS$DIS.High, name="High" , color="purple4") |>
hc_add_series(DIS$DIS.Low, name="Low" , color="plum")
```
:::
## Multi-Panel `hcharts` Display
This chunk is not compatible with published slides or a published HTML file, but this code will work in a published dashboard (see posted examples).
```{r fig.dim=c(15,6), echo=T, eval=F}
#|label: display of hcharts
hw_grid(nflx_plt, amzn_plt, dis_plt, ncol=3)
```
## Week 7 In-class Exercises - Q1
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
In the example above, we use the `hw_grid` command to create a multi-plot composition of hcharts.
Previously, we covered another command to create a composition of non-interactive ggplots of `tibble` data.
<br>
**What is that other command?**
**Hints:**
This very useful command is in the `gridExtra` package which is loaded.
If `gridExtra` is loaded in R, start typing `grid` in the console, and the command and others will appear.
## Week 7 In-class Exercises - Q2
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
1. Use the provided example of `getSymbols` code to write code to import the stock time series for Apple (`AAPL`)
- Use these dates: from = "2017-01-01", to = "2025-10-06"
2. Open the imported `xts` file by clicking on it in the `Global Environment`
3. Sort the `AAPL.Adjusted` column by clicking on it.
4. Answer Question:
- On what recent date did Apple (AAPL) report its highest adjusted closing value?
::: fragment
```{r}
#|label: import aapl data
```
:::
## More Information about `xts`
- When these stock datasets are imported, they are in `xts` format.
- `xts` stands for **Extensible Time Series** which means they are self-aware.
- The key feature is that `date` is NOT a variable, but instead the dates become row IDs.
- Any dataset with a `date` variable can be converted to an `xts` dataset.
- Any `xts` dataset can be converted to a tibble or data.frame (two common R data formats).
::: fragment
```{r}
#|label: examine xts data
head(NFLX)
```
:::
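
To make the "dates become row IDs" idea concrete, here is a minimal sketch using a small made-up series (the values and the names `demo_dates` and `demo_xts` are hypothetical, not course data). `index()` returns the dates and `coredata()` returns the values without the dates.

```r
library(xts)  # also attaches zoo, which provides index() and coredata()

# hypothetical three-day series, for illustration only
demo_dates <- as.Date(c("2025-01-02", "2025-01-03", "2025-01-06"))
demo_xts <- xts(x = data.frame(price = c(10.5, 11.0, 10.8)),
                order.by = demo_dates)

names(demo_xts)      # "price": date is not one of the columns
index(demo_xts)      # the dates are stored as the row index
coredata(demo_xts)   # the values as a plain numeric matrix
demo_xts["2025-01"]  # rows can be selected with a date string
```

The last line is one practical benefit of the date index: a string such as `"2025-01"` selects every row in that month.
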
## Merging `xts` datasets using merge
- Converting xts to a tibble or dataframe (R data formats) is required if you want to create a ggplot or use other methods covered previously
- A good first step is to create a merged `xts` dataset of the desired variables.
::: fragment
```{r}
#|label: merge xts stock data
# data are merged by matching dates
nflx_amzn_dis <- merge(NFLX$NFLX.Adjusted,
AMZN$AMZN.Adjusted,
DIS$DIS.Adjusted)
head(nflx_amzn_dis)
```
:::
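
Because `merge` matches rows by date, any date that is missing from one series is filled with `NA` under the default (outer) join. The sketch below uses two hypothetical series (`a_xts` and `b_xts`, with made-up values) that share only one date.

```r
library(xts)

# two hypothetical series that share only one date (2025-01-03)
a_xts <- xts(matrix(c(1, 2), ncol = 1, dimnames = list(NULL, "a")),
             order.by = as.Date(c("2025-01-02", "2025-01-03")))
b_xts <- xts(matrix(c(20, 30), ncol = 1, dimnames = list(NULL, "b")),
             order.by = as.Date(c("2025-01-03", "2025-01-06")))

ab_outer <- merge(a_xts, b_xts)                  # default: keep all dates, fill gaps with NA
ab_inner <- merge(a_xts, b_xts, join = "inner")  # keep only dates present in both series
```

If a merged dataset shows unexpected `NA` values, comparing the date ranges of the original series is a reasonable first check.
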
## Converting `xts` datasets to tibble format
- There are a few ways to convert an xts to a tibble.
- In the code below, I show the conversion and then rename the new date variable as `date`
::: fragment
```{r convert xts to tibble}
# converting data to a tibble requires a couple lines of code
# I prefer to rename the index as date
nflx_amzn_dis_tibble <- nflx_amzn_dis |>
fortify.zoo() |> as_tibble(.name_repair = "minimal") |>
rename("date" = "Index")
head(nflx_amzn_dis_tibble)
```
:::
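
As one alternative (a sketch, not the only approach), the values and the date index can be pulled apart with `coredata()` and `index()` and then reassembled as a tibble. The object `small_xts` and its values are hypothetical.

```r
library(xts)
library(tibble)
library(dplyr)

# hypothetical two-day, two-column series
small_xts <- xts(matrix(c(1.5, 2.5, 10, 20), ncol = 2,
                        dimnames = list(NULL, c("a", "b"))),
                 order.by = as.Date(c("2025-01-02", "2025-01-03")))

small_tibble <- as_tibble(coredata(small_xts)) |>   # values only, without dates
  mutate(date = index(small_xts), .before = 1)      # add the dates as the first column
```

Either route ends with an ordinary `date` column, which is what `ggplot` and the `tidyverse` commands expect.
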
## Converting tibble datasets to `xts`
- Any dataset with a date formatted variable can be converted to an `xts` dataset
- This means that we can create a `hchart` or `dygraph` (next topic) for any dataset with a `date` variable.
::: fragment
```{r}
#|label: convert tibble to xts
exp_imp <- read_csv("data/export_import_tidy.csv", show_col_types=F)
exp_imp_xts <- xts(x=exp_imp[,2:3], order.by=exp_imp$date) # order.by must be a date variable
```
:::
::: fragment
```{r}
#|label: hchart code export import xts
exp_imp_hchart <- hchart(exp_imp_xts$exp_indx,
name="Export Price Index", color="blue") |>
hc_add_series(exp_imp_xts$imp_indx,
name="Import Price Index" , color="red")
```
:::
## Export Import HighChart (`hchart`)
```{r fig.dim=c(15,4)}
#|label: display of hchart
exp_imp_hchart
```
## Dygraphs - An Alternative to `hchart`
:::: panel-tabset
### [Background]{style="color:blue;"}
- `dygraph` is a more flexible alternative to `hchart`.
- Straightforward to modify, add reference lines and shaded regions
- Both `dygraph` and `hchart` allow viewer to interactively select date range
::: fragment
Here is the dataset we will use:
```{r}
#|label: dataset for dygraphs example
three_stocks <- merge(AMZN$AMZN.Adjusted, DIS$DIS.Adjusted, NFLX$NFLX.Adjusted)
names(three_stocks) <- c("AMZN.adj", "DIS.adj", "NFLX.adj")
head(three_stocks, 3) # print first three rows only
```
:::
### [Unformatted]{style="color:blue;"}
Basic unformatted plot of three stocks with the range selector option
```{r fig.dim=c(15,4)}
#|label: dygraph with range selector
(dy3 <- dygraph(three_stocks, main="Streaming Company Stock Trends") |>
dySeries("AMZN.adj", label="AMZN", color= "green") |>
dySeries("DIS.adj", label="DIS", color= "red") |>
dySeries("NFLX.adj", label="NFLX", color= "blue") |>
dyRangeSelector())
```
### [Grid & Axes]{style="color:blue;"}
Two useful formatting options (shown below) to make the plot more readable are:

- Removing the grid lines
- Formatting the axis labels

```{r fig.dim=c(15,3.5)}
#|label: dygraph with axes labeled and gridlines removed
(dy3 <- dy3 |>
dyAxis("y", label = "Adjusted Close", drawGrid = FALSE) |>
dyAxis("x", label = "Date", drawGrid = FALSE))
```
### [Event Lines]{style="color:blue;"}
Vertical lines can be added at specific dates and can be labeled and formatted.
```{r fig.dim=c(15,4)}
#|label: dygraph with event lines
(dy3 <- dy3 |>
dyEvent("2020-3-12", label = "Theaters Closed", labelLoc = "bottom") |>
dyEvent("2021-6-15", label = "Restrictions End", labelLoc = "bottom", strokePattern = "solid"))
```
### [Shading]{style="color:blue;"}
Alternatively, it may be helpful to shade the plot for a specific time range.
```{r fig.dim=c(15,4)}
#|label: dygraph with shaded region
(dy3 <- dy3 |>
dyShading(from = "2020-3-12", to = "2021-6-15", axis = "x", color = "lightgrey"))
```
::::
## Review: `bls_tidy` Function - Labor Data
- Before using our function on new data, we **ALWAYS** examine the .csv files
- The number of rows to skip for these three labor datasets is **11**.
::: fragment
```{r run bls_tidy and import labor data}
bls_tidy <- function(data_file, skip_num, var_name){
read_csv(data_file, skip = skip_num, show_col_types = F) |>
pivot_longer(cols = Jan:Dec,
names_to = "month",
values_to = "value") |>
filter(!is.na(value)) |>
rename({{var_name}} := "value")
}
labor_force <- bls_tidy("data/bls_civ_lf.csv", skip_num=11, var_name="lf")
unemp <- bls_tidy("data/bls_civ_unemp.csv", skip_num=11, var_name="unemp")
emp <- bls_tidy("data/bls_civ_emp.csv", skip_num=11, var_name="emp")
head(unemp)
```
:::
## Joining More than Two Datasets
- Last week and in HW 4, we covered joining TWO datasets.
- The commands we covered (there are 4) all have the same limitation: **datasets must be joined two at a time.**
:::::::: columns
:::: {.column width="48%"}
::: fragment
**Joining with Piping**
```{r}
#|label: joining 3 datasets with pipes
# with piping
lf_all <- labor_force |>
full_join(emp) |>
full_join(unemp) |>
write_csv("data/labor_tidy.csv") #export
head(lf_all)
```
:::
::::
::: {.column width="4%"}
:::
:::: {.column width="48%"}
::: fragment
**Joining without Piping**
```{r}
#|label: joining 3 datasets without pipes
lf_all <- full_join(labor_force, emp)
lf_all <- full_join(lf_all, unemp)
head(lf_all)
```
:::
::::
::::::::
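
When there are several datasets, the two-at-a-time joins can also be chained automatically. The sketch below is one possible shortcut using `reduce` from `purrr` (part of `tidyverse`); the three small tibbles `d1`, `d2`, and `d3` are hypothetical stand-ins for the labor datasets.

```r
library(dplyr)
library(purrr)
library(tibble)

# hypothetical datasets that share the key columns Year and month
d1 <- tibble(Year = c(2024, 2024), month = c("Jan", "Feb"), lf = c(100, 101))
d2 <- tibble(Year = c(2024, 2024), month = c("Jan", "Feb"), emp = c(95, 96))
d3 <- tibble(Year = c(2024, 2024), month = c("Jan", "Feb"), unemp = c(5, 5))

# reduce() joins the first two tibbles, then joins that result to the third
d_all <- list(d1, d2, d3) |>
  reduce(full_join, by = c("Year", "month"))
```

The joins still happen two at a time; `reduce` only saves writing each `full_join` by hand.
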
## Review: Dates and Plot Data
- Chunk below includes code that is similar to Parts 3 and 4 of HW 4.
- BONUS: Code modified to show how to get 'End of Month' (eom) date.
- [**Useful Link**](https://www.statology.org/lubridate-first-last-day-of-month/)
::: fragment
```{r}
#|label: dates and data mod for plot
lf_plt <- lf_all |>
mutate(date_som = ym(paste(Year, month)), # create som date var
date = ceiling_date(date_som, "month")-1, # create eom month date var
empM = (emp/1000) |> round(2), # convert counts to millions
unempM = (unemp/1000) |> round(2)) |>
select(date, empM, unempM) |> # select vars and reshape
pivot_longer(cols=empM:unempM, names_to = "type", values_to = "count") |>
mutate(type = factor(type, # create factor var for plot
levels = c("unempM", "empM"),
labels = c("Unemployed", "Employed")))
head(lf_plt, 4) # examine first 4 rows
```
:::
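
The end-of-month calculation can be checked on a single value. This is a minimal sketch using one hypothetical month (February 2024); `som` and `eom` are illustrative names.

```r
library(lubridate)

som <- ym("2024 Feb")                    # start of month: 2024-02-01
eom <- ceiling_date(som, "month") - 1    # first day of the next month, minus one day
eom                                      # 2024-02-29 (2024 is a leap year)
```

Rounding up to the next month and subtracting one day avoids having to know how many days each month has.
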
## Code for Polished Area Plot for Slides
- Useful for data that sum to a whole: **Employed + Unemployed = Total Labor Force**
::: fragment
```{r plot code for lf area plot}
lf_area_plt_slides <- lf_plt |>
ggplot() +
geom_area(aes(x=date, y=count, fill=type)) +
theme_classic() +
theme(legend.position="bottom") +
scale_fill_manual(values=c("red", "blue")) +
scale_x_date(date_breaks = "year", date_labels = "%Y") +
labs(x="Date", y = "Number of People (Millions)", fill="",
title="Total Labor Force: Employed and Unemployed",
subtitle="Jan. 2014 - June 2024",
caption="Data Source: www.bls.gov") +
theme(plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
panel.border = element_rect(colour = "lightgrey", fill=NA, linewidth=2),
plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
```
:::
## Area Plot Formatted for Slides
```{r echo=F, fig.dim=c(15,7)}
#|label: display of final area plot
lf_area_plt_slides
```
##
### Area Plot for HTML, Documents and Export
- Additional formatting in previous slides can always be added
- Plot exported using `ggsave` which by default exports last plot created
::: fragment
```{r}
#|label: simpler plot code with ggsave export
lf_area_plt <- lf_plt |>
ggplot() +
geom_area(aes(x=date, y=count, fill=type)) +
theme_classic() +
theme(legend.position="bottom") +
scale_fill_manual(values=c("red", "blue")) +
scale_x_date(date_breaks = "year", date_labels = "%Y") +
labs(x="Date", y = "Number of People (Millions)", fill="",
title="Total Labor Force: Employed and Unemployed",
subtitle="Jan. 2014 - Jun. 2024",
caption="Data Source: www.bls.gov") +
theme(plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12))
ggsave("img/labor_force_area_plot.png", width=6,height=4)
```
:::
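
Since `ggsave` exports the last plot created by default, it can be safer to name the plot explicitly with the `plot` argument. The sketch below uses a hypothetical plot (`demo_plt`) built from the `mtcars` data that ships with R, and writes to a temporary folder purely for illustration.

```r
library(ggplot2)

# hypothetical plot built from a dataset that ships with R
demo_plt <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# naming the plot avoids depending on which plot was created last
demo_file <- file.path(tempdir(), "demo_plot.png")   # temporary location for illustration
ggsave(demo_file, plot = demo_plt, width = 6, height = 4)
```

In a project, the file path would point to a project folder such as `img/` instead of a temporary folder.
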
## Exported Plot
- Looks fine in HTML notes but not slides
- May be fine in Word Document or Dashboard
- If not, previous code shows additional options for formatting
::: fragment
```{r fig.dim=c(15,6), echo=F}
#|label: display of exported plot
lf_area_plt
```
:::
## Week 8 In-class Exercise
In this exercise we will:
1. Import `labor_tidy.csv`, convert variables to millions, round to 2 decimal places, and select two variables. (Review)
- OPTIONAL: use provided example to create an END of Month (eom) date variable and use that.
::: fragment
```{r}
#|label: import labor_tidy and modify variables
labor_new <- read_csv("data/labor_tidy.csv", show_col_types=F) |>
mutate(date = ym(paste(Year,month)),
lfM = (lf/1000) |> round(2),
empM = (emp/1000) |> round(2))|>
select(date, lfM, empM)
```
:::
2. Convert `labor_new` to an `xts` format, `labor_xts`
::: fragment
```{r}
#|label: create labor_xts
```
```{r solution, echo=F}
#|label: create labor_xts sol'n
labor_xts <- xts(x=labor_new[,2:3], order.by=labor_new$date) # order.by must be a date variable
```
:::
##
### In-class Exercise Cont'd
4. Create an unformatted `hchart` OR a `dygraph` with two variables
- Plot `lfM` and `empM` and save it as `labor_hc` or `labor_dy`
::: fragment
```{r}
#|label: create and display labor hchart
# (labor_hc <- hchart()) or (labor_dy <- dygraph())
```
:::
## Basic `hchart`
```{r echo=F, fig.dim=c(15,5)}
#|label: create and display labor hchart sol'n
# create labor and emp plot and print to screen
(labor_hc <- hchart(labor_xts$lfM, name="Tot. Labor Force (mill.)", color="red") |>
hc_add_series(labor_xts$empM, name="Employed (mill.)", color="blue"))
```
## Basic `dygraph`
```{r echo=F, fig.dim=c(15,5)}
#|label: create and display basic dygraph sol'n
(labor_dy <- dygraph(labor_xts, main="Total Labor and Employed") |>
dySeries("lfM", label="Total Labor", color= "red") |>
dySeries("empM", label="Employed", color= "blue") |>
dyRangeSelector())
```
##
### In-class Exercise - Final Steps
5. Submit screenshots of plot from `Viewer` pane.
6. Save R code as an R Script. In the R project folder I have saved an R Script for your work (Updated October 2025).
- Copy and paste code into the provided R Script and use `save as` to save the file with your name, e.g. `Week_8_In_Class_Penelope_Pooler.R`
- **R Script should include:**
- **code I provided** to import and modify data
- **tibble to xts conversion of labor dataset**
- **hchart OR dygraph plot** code with comments
- Submit final script on Blackboard (counts towards class participation for Week 8)
- Due by Friday 10/17. No late submissions accepted for In-class Exercises.
## Quarto, R Markdown files and R Scripts
- Quarto and Markdown files are 'smart', i.e. aware of where they are located.
- R Scripts (older common file type) are useful BUT not aware of file location.
- User must specify working directory
- The script I provided is saved to your working directory
- To check working directory: `getwd()`
- To set working directory to code_data_output folder: (for working in an R Script)
- Click Session \> Set Working Directory \> To Source File Location
::: fragment
**NOTES:**
:::
- R users and developers do not recommend setting working directories within code which would have to be changed for each laptop.
- Whenever possible, use R Projects and 'smart' files such as `.qmd` and `.Rmd` files.
##
### Key Points from Weeks 7 and 8
::: fragment
**Time Series Data**
:::
- Importing stock data from Yahoo Finance as `xts`
- Converting between `xts` and `tibble`
- Plotting options include area plots, hcharts and dygraphs
- `dygraphs` and `hcharts` are useful tools for understanding, managing, and curating time series data.
- HW 4 due Wednesday, 10/15.
- Grace period in effect.
- TAs and I are available to assist if you have questions.
<br>
::: fragment
You may submit an 'Engagement Question' about each lecture until midnight on the day of the lecture. **A minimum of four submissions is required during the semester.**
:::