Time Series, Data Formats, Output Formats, Project Introduction
2026-02-23
Final grading in this course:
adheres to Whitman grading policy, but is fairly gentle.
takes into account assignments, course project, and class participation.
Quiz 2 will be during Week 11 and will combine previous skills with material from Weeks 6 through 10.
If you have questions about your quiz, please let me know.
HW 4 is posted and is due on Wednesday, 3/4/26.
HW 4 - Part 1 was due already, but the grace period is extended until tonight, 2/24. It is required in order for you to complete this course.
There are no office hours today, Tuesday, 2/24.
Group Assignments
Complete HW 4 - Part 1 TODAY, 2/24! (This should only take 5 min.)
Note: If you do not complete this survey, I will not put you in a project group and you cannot pass this class.
Groups of 3 or 4 will be determined and posted (hopefully by Monday).
If you have a request to work with someone, include that information in your survey (Not required).
Wednesday, 2/25, is the last day I will accept any group requests.
I cannot guarantee that requests will be honored, but I will try.
I control group assignments to maintain some balance in skill level among groups.
Students are also required to use AI tools to find data.
I will provide a short demo of going from an obscure idea to a good, semi-related dataset using AI.
I recently adapted the course to use the Quarto Dashboard because it became available in the late spring of 2024.
I have posted examples from last year to give you ideas.
Quarto provides a lot of flexibility BUT requires a little patience and iterative editing.
Groups assigned by Monday, 3/2 at the latest.
Thu. 3/26 at 5:00 PM: Draft Proposals Due - NO GRACE PERIOD
Proposals should be in bullet point format and include links to data sources
It should take me 5 minutes to read your proposed ideas and check your data.
Proposal Meetings:
Recommended but not required: Come with questions and be prepared to answer my questions (5-15 min. per group).
Meetings will take place outside of class. See sign-up sheet when it is posted.
Wed. 3/25/26: HW 5 - Part 1 Due
Tue. 4/7: Quiz 2
Thu. 4/9: Final Proposals Due
Not much longer than the draft proposal, and also in bullet point format.
Questions and issues discussed during meeting should be addressed.
The chunk option eval=F prevents a chunk from being evaluated when the file is rendered.
eval=F was included in the template because the original code was incomplete.
Remember to remove the text eval=F once your code is complete.
Other helpful chunk header options for dashboards: echo=F, include=F
Chunk options can also be written as #| comment lines at the top of the chunk:
#|label: import data and #|echo: false. See Quarto Cheat Sheet
NOTE: If two chunks have the SAME name or label, the file will not render.
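As a minimal sketch, the top of a chunk that uses these option lines might look like the following. The label and the object name are illustrative assumptions. The label is written with a hyphen instead of a space, which is a common convention, not a requirement stated in these slides.

```r
#| label: import-data
#| echo: false
#| eval: true

# Ordinary R code follows the option lines.
# stock_symbols is a hypothetical object used only for illustration.
stock_symbols <- c("NFLX", "AMZN", "DIS")
```

Each option goes on its own line, and every chunk in the file needs a different label.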
So far, all Quarto files in this course have been rendered as HTML (.html) files or slides.
Other common formats are Word documents, PDF documents, PowerPoint slides, and dashboards.
We will use the dashboard format (next slide) in HW 5 and in your projects.
Groups will also write their two project memos in Quarto and publish them as Word documents.
REQUIRED: Download the latest version of Quarto here
Quarto Dashboard is a new feature of Quarto that is extremely flexible and straightforward to use.
The Quarto Dashboard Gallery includes example dashboards made with R, Python, and other languages.
In this course I will provide a simple template for HW 5 that can be used to build your dashboard.
Once you understand how to add pages, rows, columns, and tabsets, and how to modify them as needed, you are welcome to tailor the template to your project.
A Quarto dashboard is a flexible blank canvas that you can tailor to your project and future endeavors.
In recent weeks, we have worked with Box Office Mojo and Bureau of Labor Statistics data.
These datasets are time series data.
They all include a date variable and another quantitative variable that changes at each time period.
So far we have worked with data in an R format called a tibble.
Two common data formats in R, tibble and data.frame, are needed for creating ggplots of time series.
tibble is the more modern format and is more compatible with tidyverse commands to manage data.
Today, we'll discuss a third data format, xts, that can be used specifically for time series data.
xts using tidyquant Package
Yahoo Finance, the Federal Reserve Bank, the Wall Street Journal, and others are excellent data sources that can be directly imported into R.
The default source for getSymbols, which is available through the tidyquant package, is Yahoo Finance.
The imported data format is xts, which we will cover today.
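The import step described above might look like the following sketch. It assumes the tidyquant package is installed and that an internet connection is available. The start date is an assumption chosen to match the first rows shown later in these slides, and no end date is set.

```r
library(tidyquant)  # attaches quantmod, which provides getSymbols()

# Assumed start date (not stated in the slides); Yahoo Finance is the default source
getSymbols("NFLX", from = "2016-01-01")
getSymbols("AMZN", from = "2016-01-01")
getSymbols("DIS",  from = "2016-01-01")

# Each call creates an object named after the ticker in the Global Environment
class(NFLX)
```

Each call also prints the ticker symbol it imported.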
[1] "NFLX"
[1] "AMZN"
[1] "DIS"
hchart for One Stock
hchart in the highcharter package is one way to plot xts data.
This chunk is not compatible with published slides or a published html file, but this code will work in a published dashboard (see posted examples).
hcharts display
Stocks can be shown in separate plots that are displayed side by side or in one stacked column.
The command hw_grid is used to display them, and ncol indicates how many columns to use.
# Requires library(highcharter) and the NFLX, AMZN, and DIS xts objects imported earlier
# Each plot shows the adjusted, high, and low prices for one stock
nflx_plt <- hchart(NFLX$NFLX.Adjusted, name="Adjusted", color="green") |>
  hc_add_series(NFLX$NFLX.High, name="High", color="darkgreen") |>
  hc_add_series(NFLX$NFLX.Low, name="Low", color="lightgreen")
amzn_plt <- hchart(AMZN$AMZN.Adjusted, name="Adjusted", color="blue") |>
  hc_add_series(AMZN$AMZN.High, name="High", color="darkblue") |>
  hc_add_series(AMZN$AMZN.Low, name="Low", color="lightblue")
dis_plt <- hchart(DIS$DIS.Adjusted, name="Adjusted", color="mediumpurple") |>
  hc_add_series(DIS$DIS.High, name="High", color="purple4") |>
  hc_add_series(DIS$DIS.Low, name="Low", color="plum")
hcharts Display
This chunk is not compatible with published slides or a published html file, but this code will work in a published dashboard (see posted examples).
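The hw_grid step described above could be sketched as follows. To keep the example self-contained, it builds two tiny xts series by hand from values printed elsewhere in these slides instead of importing them. The object names nflx_adj, amzn_adj, and stock_grid are assumptions made for illustration.

```r
library(highcharter)
library(xts)

# Hand-built stand-ins for the imported stock series (three days only)
dates <- as.Date(c("2016-01-04", "2016-01-05", "2016-01-06"))
nflx_adj <- xts(c(10.996, 10.766, 11.768), order.by = dates)
amzn_adj <- xts(c(31.8495, 31.6895, 31.6325), order.by = dates)

nflx_plt <- hchart(nflx_adj, name = "Adjusted", color = "green")
amzn_plt <- hchart(amzn_adj, name = "Adjusted", color = "blue")

# ncol = 2 places the charts side by side; ncol = 1 stacks them in one column
stock_grid <- hw_grid(nflx_plt, amzn_plt, ncol = 2)
stock_grid
```

With the three plots built from the imported data, the same call would take nflx_plt, amzn_plt, and dis_plt with ncol = 3 or ncol = 1.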
Poll Everywhere - My User Name: penelopepoolereisenbies685
In the example above, we use the hw_grid command to create a multi-plot composition of hcharts.
Previously, we covered another command to create a composition of non-interactive ggplots of tibble data.
What is that other command?
Hints:
This very useful command is in the gridExtra package which is loaded.
If gridExtra is loaded in R, start typing grid in the console, and the command and others will appear.
Poll Everywhere - My User Name: penelopepoolereisenbies685
Use the provided example of getSymbols code to write code that imports the stock time series for Apple (AAPL).
Open the imported xts file by clicking on it in the Global Environment.
Sort the AAPL.Adjusted column by clicking on it.
Answer Question:
xts
When these stock datasets are imported, they are in xts format.
xts stands for eXtensible Time Series. xts datasets are 'self-aware' in the sense that every row is tied to its own date.
The key feature is that date is NOT a variable, but instead the dates become row IDs.
Any dataset with a date variable can be converted to an xts dataset.
Any xts dataset can be converted to a tibble or data.frame (two common R data formats).
           NFLX.Open NFLX.High NFLX.Low NFLX.Close NFLX.Volume NFLX.Adjusted
2016-01-04    10.900    11.000   10.521     10.996   207948000        10.996
2016-01-05    11.045    11.058   10.585     10.766   176646000        10.766
2016-01-06    10.529    11.791   10.496     11.768   330457000        11.768
2016-01-07    11.636    12.218   11.229     11.456   336367000        11.456
2016-01-08    11.633    11.772   11.110     11.139   180671000        11.139
2016-01-11    11.213    11.679   11.120     11.497   219204000        11.497
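The point that dates are row IDs, not a variable, can be illustrated with a small hand-built example. This is a sketch that uses three adjusted prices from the output above. The names prices and nflx_small are assumptions made for illustration.

```r
library(xts)

# An ordinary data frame: date is a regular column (variable)
prices <- data.frame(
  date = as.Date(c("2016-01-04", "2016-01-05", "2016-01-06")),
  NFLX.Adjusted = c(10.996, 10.766, 11.768)
)

# Convert to xts: the data columns go in, and the dates become the index
nflx_small <- xts(prices[, "NFLX.Adjusted", drop = FALSE], order.by = prices$date)

colnames(nflx_small)      # date is no longer listed as a variable
index(nflx_small)         # the dates are stored as the row index
nflx_small["2016-01-05"]  # rows can be selected by date
```

The same pattern applies to any dataset with a date variable: pass the data columns to xts() and the date column to order.by.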
xts datasets using merge
Converting xts to a tibble or data.frame (R data formats) is required if you want to create a ggplot or use other methods covered previously.
A good first step is to create a merged xts dataset of the desired variables.
           NFLX.Adjusted AMZN.Adjusted DIS.Adjusted
2016-01-04        10.996       31.8495     94.92046
2016-01-05        10.766       31.6895     93.00326
2016-01-06        11.768       31.6325     92.50550
2016-01-07        11.456       30.3970     91.71281
2016-01-08        11.139       30.3525     91.48238
2016-01-11        11.497       30.8870     92.09995
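A merge like the one that produced the output above might be written as follows. This sketch builds three one-column xts objects by hand from the first three rows shown, so it runs without an import step. The name stocks_adj is an assumption.

```r
library(xts)

dates <- as.Date(c("2016-01-04", "2016-01-05", "2016-01-06"))

# Hand-built stand-ins for the imported stock datasets
NFLX <- xts(cbind(NFLX.Adjusted = c(10.996, 10.766, 11.768)), order.by = dates)
AMZN <- xts(cbind(AMZN.Adjusted = c(31.8495, 31.6895, 31.6325)), order.by = dates)
DIS  <- xts(cbind(DIS.Adjusted  = c(94.92046, 93.00326, 92.50550)), order.by = dates)

# merge lines the series up by date and keeps one column per series
stocks_adj <- merge(NFLX$NFLX.Adjusted, AMZN$AMZN.Adjusted, DIS$DIS.Adjusted)
head(stocks_adj)
```

Unlike the join commands, merge accepts more than two xts datasets in a single call.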
xts datasets to tibble format
There are a few ways to convert an xts to a tibble.
In the code below I show the conversion and then I rename the new date variable as date.
# A tibble: 6 × 4
  date       NFLX.Adjusted AMZN.Adjusted DIS.Adjusted
  <date>             <dbl>         <dbl>        <dbl>
1 2016-01-04          11.0          31.8         94.9
2 2016-01-05          10.8          31.7         93.0
3 2016-01-06          11.8          31.6         92.5
4 2016-01-07          11.5          30.4         91.7
5 2016-01-08          11.1          30.4         91.5
6 2016-01-11          11.5          30.9         92.1
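One possible route for this conversion is sketched below. It is not necessarily the code used in class. It assumes fortify.zoo from the zoo package (installed with xts), which places the dates in a column named Index that is then renamed. The names stocks_adj and stocks_tbl are assumptions.

```r
library(xts)
library(dplyr)
library(tibble)

dates <- as.Date(c("2016-01-04", "2016-01-05", "2016-01-06"))
stocks_adj <- xts(
  cbind(NFLX.Adjusted = c(10.996, 10.766, 11.768),
        AMZN.Adjusted = c(31.8495, 31.6895, 31.6325),
        DIS.Adjusted  = c(94.92046, 93.00326, 92.50550)),
  order.by = dates
)

stocks_tbl <- zoo::fortify.zoo(stocks_adj) |>  # dates move into a column named Index
  as_tibble() |>
  rename(date = Index)                         # rename the new date variable
head(stocks_tbl)
```

After the conversion, date is an ordinary variable again, so the result can be used with ggplot and other tidyverse commands.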
xts
An xts dataset can be plotted with hchart or dygraph (next topic), so these plots are available for any dataset with a date variable.
dygraph
dygraph is a more flexible alternative to hchart.
dygraph and hchart allow the viewer to interactively select a date range.
Here is the dataset we will use:
           AMZN.adj  DIS.adj NFLX.adj
2016-01-04  31.8495 94.92046   10.996
2016-01-05  31.6895 93.00326   10.766
2016-01-06  31.6325 92.50550   11.768
Basic unformatted plot of three stocks with the range selector option.
Two useful formatting options (shown below) to make the plot more readable are:
Removing the grid lines
Formatting the axis labels
Vertical lines can be added at specific dates and can be labeled and formatted.
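The dygraph options listed above could be combined as in the following sketch. It uses the three rows of the dataset printed earlier, built by hand. The title, axis labels, and event date are placeholders chosen for illustration, not values from the lecture.

```r
library(dygraphs)
library(xts)

dates <- as.Date(c("2016-01-04", "2016-01-05", "2016-01-06"))
stocks_adj <- xts(
  cbind(AMZN.adj = c(31.8495, 31.6895, 31.6325),
        DIS.adj  = c(94.92046, 93.00326, 92.50550),
        NFLX.adj = c(10.996, 10.766, 11.768)),
  order.by = dates
)

stocks_dy <- dygraph(stocks_adj, main = "Adjusted Closing Prices") |>
  dyRangeSelector() |>                      # interactive date range selector
  dyOptions(drawGrid = FALSE) |>            # remove the grid lines
  dyAxis("y", label = "Adjusted Close") |>  # format the axis labels
  dyAxis("x", label = "Date") |>
  dyEvent("2016-01-05", label = "Example event",  # hypothetical date and label
          labelLoc = "top", color = "red")
stocks_dy
```

Dropping everything after dyRangeSelector() gives the basic unformatted plot.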
bls_tidy Function - Labor Data
Before using our function on new data, we ALWAYS examine the .csv files.
The number of rows to skip for these three labor datasets is 11.
bls_tidy <- function(data_file, skip_num, var_name){
  read_csv(data_file, skip = skip_num, show_col_types = F) |>
    pivot_longer(cols = Jan:Dec,
                 names_to = "month",
                 values_to = "value") |>
    filter(!is.na(value)) |>
    rename({{var_name}} := "value")
}
labor_force <- bls_tidy("data/bls_civ_lf.csv", skip_num=11, var_name="lf")
unemp <- bls_tidy("data/bls_civ_unemp.csv", skip_num=11, var_name="unemp")
emp <- bls_tidy("data/bls_civ_emp.csv", skip_num=11, var_name="emp")
head(unemp)
# A tibble: 6 × 3
   Year month unemp
  <dbl> <chr> <dbl>
1  2014 Jan   10202
2  2014 Feb   10349
3  2014 Mar   10380
4  2014 Apr    9702
5  2014 May    9859
6  2014 Jun    9460
Last week and in HW 4, we covered joining TWO datasets.
The commands we covered (there are 4) all have the same limitation: datasets must be joined two at a time.
Joining with Piping
# A tibble: 6 × 5
   Year month     lf    emp unemp
  <dbl> <chr>  <dbl>  <dbl> <dbl>
1  2014 Jan   155352 145150 10202
2  2014 Feb   155483 145134 10349
3  2014 Mar   156028 145648 10380
4  2014 Apr   155369 145667  9702
5  2014 May   155684 145825  9859
6  2014 Jun   155707 146247  9460
Joining without Piping
# A tibble: 6 × 5
   Year month     lf    emp unemp
  <dbl> <chr>  <dbl>  <dbl> <dbl>
1  2014 Jan   155352 145150 10202
2  2014 Feb   155483 145134 10349
3  2014 Mar   156028 145648 10380
4  2014 Apr   155369 145667  9702
5  2014 May   155684 145825  9859
6  2014 Jun   155707 146247  9460
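The two joining approaches above could be sketched as follows. The slides do not show which of the four join commands was used, so full_join is an assumption here. The three small tibbles are built by hand from the first rows of the output above so the example runs on its own. The name lf_all matches the dataset used in the next chunk, while lf_emp and lf_all_steps are illustrative.

```r
library(dplyr)
library(tibble)

# Hand-built stand-ins for the three tidied labor datasets
months <- c("Jan", "Feb", "Mar")
labor_force <- tibble(Year = 2014, month = months, lf = c(155352, 155483, 156028))
emp   <- tibble(Year = 2014, month = months, emp = c(145150, 145134, 145648))
unemp <- tibble(Year = 2014, month = months, unemp = c(10202, 10349, 10380))

# Joining with piping: each join adds one more dataset
lf_all <- labor_force |>
  full_join(emp, by = c("Year", "month")) |>
  full_join(unemp, by = c("Year", "month"))

# Joining without piping: the same two joins, saved one step at a time
lf_emp <- full_join(labor_force, emp, by = c("Year", "month"))
lf_all_steps <- full_join(lf_emp, unemp, by = c("Year", "month"))

head(lf_all)
```

Either way, three datasets require two joins because each join command combines only two datasets.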
Chunk below includes code that is similar to Parts 3 and 4 of HW 4.
BONUS: Code modified to show how to get ‘End of Month’ (eom) date.
#|label: dates and data mod for plot
lf_plt <- lf_all |>
  mutate(date_som = ym(paste(Year, month)),         # create start of month (som) date var
         date = ceiling_date(date_som, "month")-1,  # create end of month (eom) date var
         empM = (emp/1000) |> round(2),             # convert counts to millions
         unempM = (unemp/1000) |> round(2)) |>
  select(date, empM, unempM) |>                     # select vars and reshape
  pivot_longer(cols=empM:unempM, names_to = "type", values_to = "count") |>
  mutate(type = factor(type,                        # create factor var for plot
                       levels = c("unempM", "empM"),
                       labels = c("Unemployed", "Employed")))
head(lf_plt, 4) # examine first 4 rows
# A tibble: 4 × 3
  date       type       count
  <date>     <fct>      <dbl>
1 2014-01-31 Employed   145.
2 2014-01-31 Unemployed  10.2
3 2014-02-28 Employed   145.
4 2014-02-28 Unemployed  10.4
lf_area_plt_slides <- lf_plt |>
  ggplot() +
  geom_area(aes(x=date, y=count, fill=type)) +
  theme_classic() +
  theme(legend.position="bottom") +
  scale_fill_manual(values=c("red", "blue")) +
  scale_x_date(date_breaks = "year", date_labels = "%Y") +
  labs(x="Date", y = "Number of People (Millions)", fill="",
       title="Total Labor Force: Employed and Unemployed",
       subtitle="Jan. 2014 - Jun. 2024",
       caption="Data Source: www.bls.gov") +
  theme(plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 15),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        legend.text = element_text(size = 12),
        panel.border = element_rect(colour = "lightgrey", fill=NA, linewidth=2),
        plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
Additional formatting in previous slides can always be added.
Plot exported using ggsave which by default exports last plot created
#|label: simpler plot code with ggsave export
lf_area_plt <- lf_plt |>
  ggplot() +
  geom_area(aes(x=date, y=count, fill=type)) +
  theme_classic() +
  theme(legend.position="bottom") +
  scale_fill_manual(values=c("red", "blue")) +
  scale_x_date(date_breaks = "year", date_labels = "%Y") +
  labs(x="Date", y = "Number of People (Millions)", fill="",
       title="Total Labor Force: Employed and Unemployed",
       subtitle="Jan. 2014 - Jun. 2024",
       caption="Data Source: www.bls.gov") +
  theme(plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 15),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        legend.text = element_text(size = 12))
ggsave("img/labor_force_area_plot.png", width=6, height=4)
In this exercise we will:
Import labor_tidy.csv, convert variables to millions, round to 2 decimal places, and select two variables. (Review)
Convert labor_new to an xts format, labor_xts.
Create an hchart OR a dygraph with two variables, lfM and empM, and save it as labor_hc or labor_dy.
Submit screenshots of the plot from the Viewer pane.
Save R code in the provided text file. In the R project folder I have saved a text file for your work (Updated February 2026).
Copy and paste code into the provided text file and rename the file with your name, e.g. Week_7_In_Class_Penelope_Pooler.txt
The text file should include:
code I provided to import and modify data
tibble to xts conversion of labor dataset
hchart OR dygraph plot code with comments
Submit final text file on Blackboard (counts towards class participation for Week 7)
Due by Friday, 2/27. No late submissions accepted for In-class Exercises.
Quarto, Markdown files, and R Scripts are ‘smart’, i.e. aware of where they are located.
Code from text files can be run in console.
If code from any ‘smart’ file does not run because an import file cannot be found, check your working directory.
To check working directory: getwd()
To set working directory (not recommended for data management):
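A short sketch of both commands follows. The folder path in the commented-out line is a hypothetical placeholder. The runnable part switches to R's temporary folder and back only to demonstrate the commands.

```r
# Check the current working directory
project_dir <- getwd()
project_dir

# Setting the directory by hand ties the code to one computer's folder layout.
# The path below is a hypothetical placeholder, so the call is commented out.
# setwd("C:/Users/your_name/Documents/course_folder")

# setwd() invisibly returns the previous directory, which makes it easy to switch back
previous_dir <- setwd(tempdir())
getwd()
setwd(previous_dir)
```
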
NOTES:
R users and developers do not recommend setting working directories within code which would have to be changed for each laptop.
Whenever possible, use R Projects and ‘smart’ files that are aware of their respective file location.
Time Series Data
Importing stock data from Yahoo Finance as xts
Converting between xts and tibble
Plotting options include area plots, hcharts and dygraphs
dygraphs and hcharts are useful tools for understanding, managing, and curating time series data.
HW 4 due Wednesday, 3/4.
Grace period in effect.
TAs and I are available to assist if you have questions.
You may submit an 'Engagement Question' about each lecture until midnight on the day of the lecture. A minimum of four submissions is required during the semester.