Week 7
Time Series, Data Formats, Output Formats, Project Introduction
Housekeeping
Quiz 1 is now graded.
10% (Submitted Quarto File) + 90% (Blackboard Answers, .csv files, and .png file)
Please don’t worry if you are not happy with your score.
Final grading in this course:
adheres to Whitman grading policy, but is fairly gentle.
takes into account assignments, course project, and class participation.
Quiz 2 will be during Week 11 and will combine previous skills with material from weeks 6 through 10
- It will be similar to Quiz 1 but may have more questions and more steps in multi-step tasks.
If you have questions about your quiz, please let me know.
HW 4 is due on Friday, 10/11.
BUA 455 Group Dashboard Project
Group Assignments
Complete HW 4 - Part 1 TODAY! (This should only take 5 min.)
Note: If you do not complete this survey, I will not put you in a project group and you cannot pass this class.
Groups of 5 or 6 will be determined and posted (Hopefully by Monday)
If you have a request to work with someone, include that information in your survey (Not required).
Friday, 10/11, is the last day I will accept any group requests.
I cannot guarantee that requests will be honored, but I will try.
I control assignments to maintain some balance in skill level among groups.
BUA 455 Group Dashboard Project Information
New in Fall 2024: Students are also required to use AI tools to find data.
- I will provide a short demo of going from an obscure idea to good semi-related dataset using AI
This Fall is the first semester that the `Quarto Dashboard` option was fully functional and usable for this class project.
In previous semesters, students used `flexdashboard` in RStudio and the storyboard template.
- I will post one or two examples from previous semesters next week but they are not directly comparable.
Quarto provides a lot more flexibility BUT requires more patience (iterative editing)
Upcoming Dates
Groups assigned by Monday 10/21 at the latest
Thu. 10/31 at 5:00 PM: Draft Proposals Due - NO GRACE PERIOD
Proposals should be in bullet point format and include links to data sources
It should take me 5 minutes to read your proposed ideas and check your data.
Proposal Meetings:
Come with questions and be prepared to answer my questions (10-15 min. per group)
Meetings will take place in and outside of class. See sign-up sheet.
Wed. 10/31: HW 5 - Part 1 Due
Thu. 11/7: Quiz 2
Tue. 11/12: Final Proposals Due
Not much longer than draft proposal and also in bullet point
Questions and issues discussed during meeting should be addressed
Reminders about HW 4
Chunk Headers
In Chunk 6 (Part 5), the chunk header in the template appears as follows:
The `eval=F` prevents this chunk from being evaluated when it is knit. It was included in the template because the original code provided was incomplete and incorrect and would cause errors when rendered.
You are asked to remove the text `eval=F`.
There are many other chunk header options, such as `echo=F` and `include=F`.
- Some options can also be included as fences, e.g. `#|label: import data` and `#|echo: false`
NOTE: If two chunks are given the EXACT SAME name, e.g. `#|label: importing data`, the file will not render.
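As a minimal sketch of the fence style of chunk options, the chunk below sets a label and two options at the top of the chunk body. The label `import-data-demo` and the toy vector `x` are hypothetical and are not part of the HW 4 template.

```{r}
#| label: import-data-demo
#| echo: false
#| eval: true
# echo: false hides this code in the rendered file
# eval: true (the default) still runs the code
x <- c(2, 4, 6)
mean(x)
```

Changing `eval: true` to `eval: false` would skip the chunk entirely when the file is rendered, which is the same effect as `eval=F` in the chunk header.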
Quarto Output Formats
So far, all Quarto files in this course have been rendered as HTML (.html) files or slides
- All slides for this course are created in Quarto.
Other common formats are Word documents, PDF documents, PowerPoint slides, and dashboards
- This Quarto Reference site shows all the possible formats and provides details.
We will use the dashboard (next slide) format in HW 5 and in your projects.
Groups will also write their two project memos in Quarto and publish them as Word documents.
- Writing the memos in Quarto files simplifies formatting citations for R, RStudio, and packages.
Quarto Dashboards
REQUIRED: Download the latest version of Quarto here
- You will not be able to complete HW 5 without having Quarto installed on your computer.
Quarto Dashboard is a new feature of Quarto that is extremely flexible and straightforward to use.
The Quarto Dashboard Gallery includes example dashboards made with R, Python, and other languages.
In this course I will provide a simple template for HW 5 that can be used to build your dashboard.
Once you understand how to add pages, rows, columns, and tabsets, and modify them as needed, you are welcome to tailor the template to your project.
A Quarto dashboard is a flexible blank canvas that you can tailor to your project and future endeavors.
Types of Time Series Data in R
In recent weeks, we have worked with Box Office Mojo and Bureau of Labor Statistics Data
These datasets are time series data.
They all include a date variable and another quantitative variable that changes at each time period.
So far we have worked with data in an R format called a `tibble`.
Two common data formats in R, `tibble` and `data.frame`, are needed for creating ggplots of time series.
`tibble` is the more modern format and is more compatible with `tidyverse` commands to manage data.
Today, we’ll discuss a third data format, `xts`, that can be used specifically for time series data.
Importing Stock Data as xts using tidyquant Package
Yahoo Finance, the Federal Reserve Bank, the Wall Street Journal, and others are excellent data sources that can be directly imported into R.
The default for `getSymbols` in the `tidyquant` package is Yahoo Finance.
Data format is `xts` which we will cover today
```{r}
#|label: importing data from yahoo finance
#|output: false
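library(tidyquant) # makes getSymbols() available (assumes tidyquant is installed)
# download data for Netflix, Amazon, Disney
# the time series starts the day after the `from` date specified
# the time series ends the day before the `to` date specified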
getSymbols("NFLX", from = "2015-01-01", to = "2024-09-30")
getSymbols("AMZN", from = "2015-01-01", to = "2024-09-30")
getSymbols("DIS", from = "2015-01-01", to = "2024-09-30")
```
[1] "NFLX"
[1] "AMZN"
[1] "DIS"
Example of hchart for One Stock
`hchart` in the `highcharter` package is one way to plot `xts` data
R code for Multi-Panel hcharts display
Stocks can be shown in separate plots that can be shown side by side or in one stacked column
The command `hw_grid` is used to display them and `ncol` indicates how many columns.
```{r separate stock plots, eval=F}
nflx_plt <- hchart(NFLX$NFLX.Adjusted, name="Adjusted", color="green") |>
  hc_add_series(NFLX$NFLX.High, name="High", color="darkgreen") |>
  hc_add_series(NFLX$NFLX.Low, name="Low", color="lightgreen")

amzn_plt <- hchart(AMZN$AMZN.Adjusted, name="Adjusted", color="blue") |>
  hc_add_series(AMZN$AMZN.High, name="High", color="darkblue") |>
  hc_add_series(AMZN$AMZN.Low, name="Low", color="lightblue")

dis_plt <- hchart(DIS$DIS.Adjusted, name="Adjusted", color="mediumpurple") |>
  hc_add_series(DIS$DIS.High, name="High", color="purple4") |>
  hc_add_series(DIS$DIS.Low, name="Low", color="plum")
```
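The chunk above builds the three chart objects, but the `hw_grid` display call is not shown in these notes. The sketch below illustrates that step under stated assumptions: the two small series `demo_a` and `demo_b` are made-up values (not real stock data) so the example runs without downloading anything.

```{r}
library(xts)
library(highcharter)

# hypothetical prices for illustration only (not real stock data)
dates  <- as.Date("2024-01-01") + 0:4
demo_a <- xts(cbind(A.Adjusted = c(10, 11, 12, 11, 13)), order.by = dates)
demo_b <- xts(cbind(B.Adjusted = c(20, 19, 21, 22, 21)), order.by = dates)

a_plt <- hchart(demo_a, name = "Stock A", color = "green")
b_plt <- hchart(demo_b, name = "Stock B", color = "blue")

# ncol = 2 places the charts side by side; ncol = 1 stacks them in one column
hw_grid(a_plt, b_plt, ncol = 2)
```

With the stock plots created above, the analogous call would presumably be `hw_grid(nflx_plt, amzn_plt, dis_plt, ncol = 3)` for three side-by-side panels.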
Multi-Panel hcharts Display
Week 7 In-class Exercises - Q1
Session ID: bua455f24
In the example above, we use the `hw_grid` command to create a multi-plot composition of hcharts.
Previously, we covered another command to create a composition of non-interactive ggplots of `tibble` data.
What is that other command?
Hints:
This very useful command is in the `gridExtra` package which is loaded.
If `gridExtra` is loaded in R, start typing `grid` in the console, and the command and others will appear.
Week 7 In-class Exercises - Q2
Session ID: bua455f24
Use the provided example of `getSymbols` code to write code to import the stock time series for Apple (`AAPL`)
- Use these dates: from = “2015-01-01”, to = “2024-10-01”
Open the imported `xts` file by clicking on it in the `Global Environment`
Sort the `AAPL.Adjusted` column by clicking on it.
Answer Question:
- On what recent date was AAPL at its highest value?
More Information about xts
When these stock datasets are imported, they are in `xts` format.
`xts` stands for eXtensible Time Series, which means the data are self-aware: each row knows its own date.
The key feature is that `date` is NOT a variable, but instead the dates become row IDs.
Any dataset with a `date` variable can be converted to an `xts` dataset.
Any `xts` dataset can be converted to a tibble or data.frame (two common R data formats).
NFLX.Open NFLX.High NFLX.Low NFLX.Close NFLX.Volume NFLX.Adjusted
2015-01-02 49.15143 50.33143 48.73143 49.84857 13475000 49.84857
2015-01-05 49.25857 49.25857 47.14714 47.31143 18165000 47.31143
2015-01-06 47.34714 47.64000 45.66143 46.50143 16037700 46.50143
2015-01-07 47.34714 47.42143 46.27143 46.74286 9849700 46.74286
2015-01-08 47.12000 47.83571 46.47857 47.78000 9601900 47.78000
2015-01-09 47.63143 48.02000 46.89857 47.04143 9578100 47.04143
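To make the "dates are row IDs" idea concrete, here is a minimal sketch that builds a tiny `xts` object by hand. The three prices are copied from the `NFLX.Adjusted` column printed above; the object name `demo_xts` is hypothetical.

```{r}
library(xts)

# three adjusted prices copied from the printed NFLX output
dates    <- as.Date(c("2015-01-02", "2015-01-05", "2015-01-06"))
demo_xts <- xts(cbind(NFLX.Adjusted = c(49.84857, 47.31143, 46.50143)),
                order.by = dates)

colnames(demo_xts)       # only the price column; there is no date column
index(demo_xts)          # the dates are stored as the row index
coredata(demo_xts)       # the numeric data without the dates
demo_xts["2015-01-05/"]  # rows can be selected by date range
```

Because the dates live in the index, date-based subsetting like the last line works without any filtering code.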
Merging xts datasets using merge
Converting xts to a tibble or dataframe (R data formats) is required if you want to create a ggplot or use other methods covered previously
A good first step is to create a merged `xts` dataset of the desired variables.
NFLX.Adjusted AMZN.Adjusted DIS.Adjusted
2015-01-02 49.84857 15.4260 86.69247
2015-01-05 47.31143 15.1095 85.42560
2015-01-06 46.50143 14.7645 84.97248
2015-01-07 46.74286 14.9210 85.84173
2015-01-08 47.78000 15.0230 86.72945
2015-01-09 47.04143 14.8465 87.15482
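The merge code itself is not shown in these notes, so the following is a sketch under stated assumptions rather than the original chunk. It rebuilds the first three rows of each series from the printed output and merges them; the names `nflx`, `amzn`, `dis`, and `stocks_xts` are hypothetical.

```{r}
library(xts)

# first three adjusted prices copied from the printed merged output
dates <- as.Date(c("2015-01-02", "2015-01-05", "2015-01-06"))
nflx  <- xts(cbind(NFLX.Adjusted = c(49.84857, 47.31143, 46.50143)), order.by = dates)
amzn  <- xts(cbind(AMZN.Adjusted = c(15.4260, 15.1095, 14.7645)), order.by = dates)
dis   <- xts(cbind(DIS.Adjusted  = c(86.69247, 85.42560, 84.97248)), order.by = dates)

# merge() lines the series up by their shared date index
stocks_xts <- merge(nflx, amzn, dis)
head(stocks_xts)
```

With the imported stock data, the same idea would likely be written as `merge(NFLX$NFLX.Adjusted, AMZN$AMZN.Adjusted, DIS$DIS.Adjusted)`.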
Converting xts datasets to tibble format
There are a few ways to convert an xts to a tibble.
In the code below I show the conversion and then I rename the new date variable as `date`
# A tibble: 6 × 4
date NFLX.Adjusted AMZN.Adjusted DIS.Adjusted
<date> <dbl> <dbl> <dbl>
1 2015-01-02 49.8 15.4 86.7
2 2015-01-05 47.3 15.1 85.4
3 2015-01-06 46.5 14.8 85.0
4 2015-01-07 46.7 14.9 85.8
5 2015-01-08 47.8 15.0 86.7
6 2015-01-09 47.0 14.8 87.2
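The conversion code is not shown in these notes. The sketch below uses one of the several possible approaches, `zoo::fortify.zoo()`, which may differ from the method used in class. The object names `stocks_xts` and `stocks_tbl` are hypothetical and the values are copied from the printed output.

```{r}
library(xts)
library(dplyr)

dates <- as.Date(c("2015-01-02", "2015-01-05", "2015-01-06"))
stocks_xts <- xts(cbind(NFLX.Adjusted = c(49.84857, 47.31143, 46.50143),
                        AMZN.Adjusted = c(15.4260, 15.1095, 14.7645)),
                  order.by = dates)

# fortify.zoo() moves the date index into a regular column named "Index"
stocks_tbl <- zoo::fortify.zoo(stocks_xts) |>
  as_tibble() |>
  rename(date = Index)   # rename the new date variable as date

stocks_tbl
```

After this step the data can be used with `ggplot` and the other `tidyverse` commands covered previously.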
Converting tibble datasets to xts
- Any dataset with a date formatted variable can be converted to an `xts` dataset
- This means that we can create a `hchart` or `dygraph` (next topic) for any dataset with a `date` variable.
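A minimal sketch of the reverse conversion, assuming a tibble whose date column is named `date`: the non-date columns become the data and the date column becomes the index. The tibble `stocks_tbl` is a hypothetical example with values copied from earlier output.

```{r}
library(xts)
library(dplyr)

stocks_tbl <- tibble(
  date          = as.Date(c("2015-01-02", "2015-01-05", "2015-01-06")),
  NFLX.Adjusted = c(49.84857, 47.31143, 46.50143),
  AMZN.Adjusted = c(15.4260, 15.1095, 14.7645)
)

# everything except date becomes the data; date becomes the row index
stocks_xts <- xts(x = stocks_tbl |> select(-date) |> as.matrix(),
                  order.by = stocks_tbl$date)

stocks_xts
```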
Export Import HighChart (hchart)
Dygraphs - An Alternative to hchart
`dygraph` is a more flexible alternative to `hchart`.
- Straightforward to modify, add reference lines and shaded regions
- Both `dygraph` and `hchart` allow viewer to interactively select date range
Here is the dataset we will use:
AMZN.adj DIS.adj NFLX.adj
2015-01-02 15.4260 86.69247 49.84857
2015-01-05 15.1095 85.42560 47.31143
2015-01-06 14.7645 84.97248 46.50143
Basic unformatted plot of three stocks with the range selector option
Two useful formatting options (shown below) to make the plot more readable are:
- Removing the grid lines
- Formatting the axis labels
Vertical lines can be added at specific dates and can be labeled and formatted.
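The dygraph code is not shown in these notes, so the following is a sketch of the options described above, not the original chunk. It uses the three rows printed in the dataset preview; the title, axis labels, event date, and event label are placeholders chosen for illustration.

```{r}
library(xts)
library(dygraphs)

# three rows copied from the dataset preview above
dates <- as.Date(c("2015-01-02", "2015-01-05", "2015-01-06"))
stocks_xts <- xts(cbind(AMZN.adj = c(15.4260, 15.1095, 14.7645),
                        DIS.adj  = c(86.69247, 85.42560, 84.97248),
                        NFLX.adj = c(49.84857, 47.31143, 46.50143)),
                  order.by = dates)

stocks_dy <- dygraph(stocks_xts, main = "Adjusted Prices") |>
  dyRangeSelector() |>                                        # interactive date range selector
  dyAxis("x", label = "Date", drawGrid = FALSE) |>            # label axis, remove grid lines
  dyAxis("y", label = "Adjusted Price", drawGrid = FALSE) |>
  dyEvent("2015-01-05", label = "Example date",               # labeled vertical line
          labelLoc = "bottom", color = "red")

stocks_dy
```

Each formatting step is a separate piped call, so options can be added or removed one at a time while iterating on the plot.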
Review: bls_tidy Function - Labor Data
Before using our function on new data, we ALWAYS examine the .csv files
The number of rows to skip for these three labor datasets is 11.
```{r run bls_tidy and import labor data}
bls_tidy <- function(data_file, skip_num, var_name){
  read_csv(data_file, skip = skip_num, show_col_types = F) |>
    pivot_longer(cols = Jan:Dec,
                 names_to = "month",
                 values_to = "value") |>
    filter(!is.na(value)) |>
    rename({{var_name}} := "value")
}
labor_force <- bls_tidy("data/bls_civ_lf.csv", skip_num=11, var_name="lf")
unemp <- bls_tidy("data/bls_civ_unemp.csv", skip_num=11, var_name="unemp")
emp <- bls_tidy("data/bls_civ_emp.csv", skip_num=11, var_name="emp")
head(unemp)
```
# A tibble: 6 × 3
Year month unemp
<dbl> <chr> <dbl>
1 2014 Jan 10202
2 2014 Feb 10349
3 2014 Mar 10380
4 2014 Apr 9702
5 2014 May 9859
6 2014 Jun 9460
Joining More than Two Datasets
Last week and in HW 4 we covered joining TWO datasets.
The commands we covered (there are 4) all have the same limitation: datasets must be joined two at a time.
Joining with Piping
Joining with `by = join_by(Year, month)`
Joining with `by = join_by(Year, month)`
# A tibble: 6 × 5
Year month lf emp unemp
<dbl> <chr> <dbl> <dbl> <dbl>
1 2014 Jan 155352 145150 10202
2 2014 Feb 155483 145134 10349
3 2014 Mar 156028 145648 10380
4 2014 Apr 155369 145667 9702
5 2014 May 155684 145825 9859
6 2014 Jun 155707 146247 9460
Joining without Piping
Joining with `by = join_by(Year, month)`
Joining with `by = join_by(Year, month)`
# A tibble: 6 × 5
Year month lf emp unemp
<dbl> <chr> <dbl> <dbl> <dbl>
1 2014 Jan 155352 145150 10202
2 2014 Feb 155483 145134 10349
3 2014 Mar 156028 145648 10380
4 2014 Apr 155369 145667 9702
5 2014 May 155684 145825 9859
6 2014 Jun 155707 146247 9460
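The join code for these two slides is not shown in the notes. The sketch below illustrates both styles on the first two rows of each dataset (values copied from the printed output). The choice of `full_join` is an assumption; any of the four join commands chains the same way.

```{r}
library(dplyr)

# first two rows of each labor dataset, copied from the printed output
labor_force <- tibble(Year = 2014, month = c("Jan", "Feb"), lf = c(155352, 155483))
emp         <- tibble(Year = 2014, month = c("Jan", "Feb"), emp = c(145150, 145134))
unemp       <- tibble(Year = 2014, month = c("Jan", "Feb"), unemp = c(10202, 10349))

# with piping: each join adds one more dataset to the running result
lf_all <- labor_force |>
  full_join(emp) |>
  full_join(unemp)

# without piping: the inner join runs first, then the outer join
lf_all_nested <- full_join(full_join(labor_force, emp), unemp)

lf_all
```

Each version performs two joins, which is why the joining message appears twice in the output above. Adding `by = join_by(Year, month)` to each call would make the join keys explicit and silence the message.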
Review: Dates and Plot Data
Chunk below includes code that is similar to Parts 3 and 4 of HW 4.
BONUS: Code modified to show how to get ‘End of Month’ (eom) date.
```{r}
#|label: dates and data mod for plot
lf_plt <- lf_all |>
mutate(date_som = ym(paste(Year, month)), # create som date var
date = ceiling_date(date_som, "month")-1, # create eom month date var
empM = (emp/1000) |> round(2), # convert counts to millions
unempM = (unemp/1000) |> round(2)) |>
select(date, empM, unempM) |> # select vars and reshape
pivot_longer(cols=empM:unempM, names_to = "type", values_to = "count") |>
mutate(type = factor(type, # create factor var for plot
levels = c("unempM", "empM"),
labels = c("Unemployed", "Employed")))
head(lf_plt, 4) # examine first 4 rows
```
# A tibble: 4 × 3
date type count
<date> <fct> <dbl>
1 2014-01-31 Employed 145.
2 2014-01-31 Unemployed 10.2
3 2014-02-28 Employed 145.
4 2014-02-28 Unemployed 10.4
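The end-of-month step can be checked in isolation. This small sketch repeats the two date lines from the chunk above for January and February 2014 only; the names `date_som` and `date_eom` are used here just for illustration.

```{r}
library(lubridate)

# start-of-month dates built from year and month text
date_som <- ym(paste(2014, c("Jan", "Feb")))

# round up to the first day of the next month, then step back one day
date_eom <- ceiling_date(date_som, "month") - 1

date_som   # 2014-01-01 and 2014-02-01
date_eom   # 2014-01-31 and 2014-02-28
```

Rounding up and subtracting one day avoids having to know how many days each month has.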
Code for Polished Area Plot for Slides
- Useful for data that sum to a whole: Employed + Unemployed = Total Labor Force
```{r plot code for lf area plot}
lf_area_plt_slides <- lf_plt |>
ggplot() +
geom_area(aes(x=date, y=count, fill=type)) +
theme_classic() +
theme(legend.position="bottom") +
scale_fill_manual(values=c("red", "blue")) +
scale_x_date(date_breaks = "year", date_labels = "%Y") +
labs(x="Date", y = "Number of People (Millions)", fill="",
title="Total Labor Force: Employed and Unemployed ",
subtitle="Jan. 2014 - June 2024",
caption="Data Source: www.bls.gov") +
theme(plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
panel.border = element_rect(colour = "lightgrey", fill=NA, linewidth=2),
plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
```
Area Plot Formatted for Slides
Area Plot for HTML, Documents and Export
Additional formatting in previous slides can always be added
Plot exported using `ggsave` which by default exports last plot created
```{r}
#|label: simpler plot code with ggsave export
lf_area_plt <- lf_plt |>
ggplot() +
geom_area(aes(x=date, y=count, fill=type)) +
theme_classic() +
theme(legend.position="bottom") +
scale_fill_manual(values=c("red", "blue")) +
scale_x_date(date_breaks = "year", date_labels = "%Y") +
labs(x="Date", y = "Number of People (Millions)", fill="",
title="Total Labor Force: Employed and Unemployed ",
subtitle="Jan. 2014 - Jun. 2024",
caption="Data Source: www.bls.gov") +
theme(plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12))
ggsave("img/labor_force_area_plot.png", width=6,height=4)
```
Exported Plot
- Looks fine in HTML notes but not slides
- May be fine in Word Document or Dashboard
- If not, previous code shows additional options for formatting
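Because `ggsave` exports the last plot created unless told otherwise, it can be safer to name the plot explicitly. The sketch below shows this with a small made-up dataset (`demo_df` is hypothetical) and writes to a temporary folder rather than the `img` folder used above.

```{r}
library(ggplot2)

# hypothetical data for illustration only
demo_df  <- data.frame(date = as.Date("2024-01-01") + 0:3, count = c(1, 3, 2, 4))
demo_plt <- ggplot(demo_df) +
  geom_area(aes(x = date, y = count)) +
  theme_classic()

# plot = names the plot to export instead of relying on the last plot created
out_file <- file.path(tempdir(), "demo_area_plot.png")
ggsave(out_file, plot = demo_plt, width = 6, height = 4)
```

The `width` and `height` values are in inches by default, so changing them is one way to adjust how the exported image looks in a Word document or dashboard.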
Week 7 In-class Exercise
In this exercise we will:
- Import `labor_tidy.csv` and convert variables to millions and round to 2 decimal places and select two variables. (Review)
- OPTIONAL: use provided example to create an END of Month (eom) date variable and use that.
- Convert `labor_new` to an `xts` format, `labor_xts`
In-class Exercise Cont’d
- Create an unformatted `hchart` OR a `dygraph` with two variables
- Plot `lfM` and `empM` and save it as `labor_hc` or `labor_dy`
Basic hchart
Basic dygraph
In-class Exercise - Final Steps
Submit screenshots of plot from `Viewer` pane.
Save R code as an R Script. In the R project folder I have saved an R Script for your work (Updated October 2024).
Copy and paste code into provided R Script and use `save as` to save the file with your name, e.g. `Week_7_In_Class_Penelope_Pooler.R`
R Script should include:
code I provided to import and modify data
tibble to xts conversion of labor dataset
hchart OR dygraph plot code with comments
Submit final script on Blackboard (counts towards class participation for Week 7)
Due by Friday 10/11. No late submission accepted for In-class Exercises.
Quarto, R Markdown files and R Scripts
Quarto and Markdown files are ‘smart’, i.e. aware of where they are located.
R Scripts (older common file type) are useful BUT not aware of file location.
User must specify working directory
The script I provided is saved to your working directory
To check working directory:
getwd()
To set working directory to code_data_output folder: (for working in an R Script)
- Click Session > Set Working Directory > To Source File Location
NOTES:
R users and developers do not recommend setting working directories within code which would have to be changed for each laptop.
Whenever possible, use R Projects and ‘smart’ files such as `.qmd` and `.Rmd` files.
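When working in an R Script, a quick check like the sketch below can confirm the working directory before importing data. The relative path is the one used for the labor data earlier in these notes; whether the file exists depends on where the script is being run.

```{r}
# folder that all relative paths start from
getwd()

# relative path to one of the labor data files used earlier in these notes
csv_path <- file.path("data", "bls_civ_lf.csv")

# TRUE only if the working directory contains the data folder with this file
file.exists(csv_path)
```

If the last line returns `FALSE`, use the Session menu option described above instead of hard-coding a folder path in the script.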
Key Points from This Week
Time Series Data
Importing stock data from Yahoo Finance as `xts`
Converting between `xts` and `tibble`
Plotting options include area plots, hcharts and dygraphs
`dygraphs` and `hcharts` are useful tools for understanding, managing, and curating time series data.
HW 4 due Friday, 10/11
Grace period in effect.
TAs and I are available to assist if you have questions.
You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions are required during the semester.