---
title: "Weeks 6 and 7"
subtitle: "Reshaping. Cleaning, and Joining Data"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
html:
code-line-numbers: true
code-fold: true
code-tools: true
execute:
echo: fenced
---
##
### Housekeeping
```{r include=F}
#|label: setup
knitr::opts_chunk$set(echo=T, highlight=T) # specifies default options for all chunks
options(scipen=100) # suppress scientific notation
# install pacman if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
pacman::p_load(pacman, tidyverse, gridExtra, magrittr,
kableExtra) # install and load required packages
p_loaded() # verify loaded packages
```
**Comments About Quiz 1**
- Grading is in progress, but can not be discussed until all grades are posted.
- Checking the submitted work of all students takes time but should be completed by this weekend.
::: fragment
**Comments about HWs 1 - 3**
:::
- HW 1, HW 2, HW 3 scores and the current HW average is now posted.
- If you did not receive full credit or some of your work is missing, I added a note.
- I am willing to accept resubmitted work for partial credit.
::: fragment
**HW 4 is posted and is due on Wednesday, 10/15/25.**
:::
- **HW 4 - Part 1** is due on Wednesday 10/8/25 and is required in order for you to complete this course.
- Demo Videos are available.
## Recall from Lecture 9:
:::::::: columns
::::: {.column width="48%"}
::: fragment
#### R functions:
:::
- Useful for automating repetitive tasks
::: fragment
#### Best Practices for Writing Functions
:::
- Plan what you want to do
- Write out task steps
- Develop and refine the code to complete tasks
- Subdivide multi-step tasks into short sets of tasks.
- Convert each set of tasks to a function
- Make the function more general so it is more versatile.
:::::
::: {.column width="4%"}
:::
::: {.column width="48%"}

:::
::::::::
##
### Common Function Structure Options
::: fragment
#### Anatomy of a Function (Option 1):
:::
- Commands within function **are not saved** to an object.
- Function results are automatically 'kicked out' as output.
::: fragment
```
Function_Name <- function(input_1, input_2, etc){
command 1 to do "stuff" to inputs |>
command 2 to do "stuff" to inputs |>
command 3 to do "stuff" to inputs |> etc. # output is autmatically "kicked out"
}
```
:::
::: fragment
#### Anatomy of a Function (Option 2 - Shown in Week 5 Lecture):
:::
- Commands within function **are saved** to an object.
- Function ends with name of object so that results are 'kicked out' as output.
::: fragment
```
Function_Name <- function(input_1, input_2, etc){
output <- command 1 to do "stuff" to inputs |>
command 2 to do "stuff" to inputs |>
command 3 to do "stuff" to inputs |> etc.
output # end with name of output so that it is "kicked out" of function
}
```
:::
##
### Box Office Mojo Data Cleaning Steps
1. Examine .csv file to determine number of rows to skip
2. Import raw .csv file and skip header rows above variable names
3. Select useful columns (Columns 1, 4, 7, and 9):
- 1: `Date`
- 4: `Top 10 Gross`- `<chr>` variable with nuisance characters to be removed (`$`, `,`)
- 7: `Releases`
- 9: `Gross` - `<chr>` variable with nuisance characters to be removed (`$`, `,`)
4. Rename these four variables to easier names to work with in R
- New names should not have spaces.
- Lower case names with underscores (`_`) work well in code.
- Variable names and labels can be reformatted for plots and tables.
##
### Box Office Mojo Data Cleaning Steps Cont'd
5 . Remove non-data rows (Holidays, etc.) with a `filter` command.
6. Use `mutate` to convert variables to use-able formats:
- Use `paste` to add year text to `date` character variable and convert to a date variable.
- Convert `Releases` (`num_releases`) to an integer variable
- Use `gsub` and `as.numeric` to convert each gross variable to numeric:
- `gsub` is used to remove nuisance characters, `$` and `,`
- NOTE: multiple adjacent columns with same nuisance characters can be cleaned at the same time with `mutate(across...)`
- `as.numeric` is used to convert character to numeric decimal value (`<dbl>`) once nusaince characters are removed.
## Cleaning One Data Set
```{r}
#|label: cleaning 1 box office mojo dataset
bom2022 <- read_csv("data/box_office_mojo_2022.csv", skip=11, show_col_types = F) |>
select(1,4,7,9) |> # select columns by number (use with care)
rename("date" = "Date",
"top10gross" = "Top 10 Gross",
"num_releases" = "Releases",
"num1gross" = "Gross") |>
filter(!is.na(top10gross)) |> # filters out empty holiday rows
mutate(date = dmy(paste(date,2022)),
num_releases = as.integer(num_releases),
# gross variables cleaned one at a time
top10gross = gsub(pattern="$", replacement="", x=top10gross, fixed=T),
top10gross = gsub(pattern=",", replacement="", x=top10gross, fixed=T) |>
as.numeric(),
num1gross = gsub(pattern="$", replacement="", x=num1gross, fixed=T),
num1gross = gsub(pattern=",", replacement="", x=num1gross, fixed=T) |>
as.numeric()) |>
glimpse()
```
##
### Converting B.O.M. Cleaning Code to a Function
```{r}
#|label: bom_cln_function
bom_cln <- function(data_file, yr, skip_num){ # inputs: data_file is file name
# yr is year of data
# skip_num is number of header rows to skip
read_csv(data_file, show_col_types = F, skip=skip_num) |> # data_file and skip_num used here
select(1,4,7,9) |> # columns needed are always the same for bom
rename("date" = "Date", # column renaming always the same
"top10gross" = "Top 10 Gross",
"num_releases" = "Releases",
"num1gross" = "Gross") |>
filter(!is.na(top10gross)) |> # filter out non-data rows
mutate(date = dmy(paste(date,yr)), # paste yr input to date text and convert to date
num_releases = as.integer(num_releases),
top10gross = gsub(pattern="$", replacement="", x=top10gross, fixed=T),
top10gross = gsub(pattern=",", replacement="", x=top10gross, fixed=T) |>
as.numeric(),
num1gross = gsub(pattern="$", replacement="", x=num1gross, fixed=T),
num1gross = gsub(pattern=",", replacement="", x=num1gross, fixed=T) |>
as.numeric())
}
```
## Working with Dates using `lubridate`
- `lubridate` can convert a wide variety of text information to a date.
- User must specify order of information, e.g. `ymd` indicates year, then month, the day
- Text must include year, but day is not required (See HW 4)
:::: fragment
::: r-fit-text
```{r}
#|label: examples of lubridate
"31st of October, 2040" # text date
dmy("31st of October, 2040") # example of lubridate command dmy
paste("Hello", "Goodbye") # example of paste concatenating two text strings
"March 15" # example of month and day only
# lubridate commands require a year
```
:::
::::
## Lubridate Examples Continued
::: r-fit-text
```{r}
#|label: more lubridate examples
paste("February", 2040) # using paste to add year to month and day
my(paste("February", "2040")) # using lubridate my command with paste
paste("February", "2040") |> my() # same command using piping
"March 15th, 2039" # different form of text date (Q1 next slide)
ymd(20390315) # example of lubridate command ymd
Sys.Date() # demo of Sys.date command
```
:::
##
### Week 6 In-class Exercises - Q1
:::::: columns
::: {.column width="48%"}
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
What is the correct `lubridate` command to convert the following date text to a date value in R.
**"March 15th, 2039"**
<br>
**Hint:** Examine the [Lubridate Cheat Sheet](https://rstudio.github.io/cheatsheets/lubridate.pdf){target="_blank"} and test out commands in console to see what the output is.
**NOTE:** `lubridate` commands will only work if you have run the setup for this lecture to load the `tidyverse` suite of packages.
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}

:::
::::::
## Function Format Options
::: panel-tabset
### [Option 1]{style="color:blue;"}
No objects are saved within function and result is `kicked out`
Ideal for straightforward functions and plot functions
```{r}
#|label: bom_cln function option 1
bom_cln <- function(data_file, yr, skip_num){
read_csv(data_file, show_col_types = F, skip=skip_num) |> # data_file and skip_num used here
select(1,4,7,9) |> # columns needed are always the same for these datasets
rename("date" = "Date", # column renaming always the same
"top10gross" = "Top 10 Gross",
"num_releases" = "Releases",
"num1gross" = "Gross") |>
filter(!is.na(top10gross)) |> # filter out non-data rows
mutate(date = dmy(paste(date,yr)), # paste yr input to date text and convert to date
num_releases = as.integer(num_releases),
top10gross = gsub(pattern="$", replacement="", x=top10gross, fixed=T),
top10gross = gsub(pattern=",", replacement="", x=top10gross, fixed=T) |>
as.numeric(),
num1gross = gsub(pattern="$", replacement="", x=num1gross, fixed=T),
num1gross = gsub(pattern=",", replacement="", x=num1gross, fixed=T) |>
as.numeric())
}
bom2021_Op1 <- bom_cln(data_file = "data/box_office_mojo_2021.csv",
yr = 2021, skip_num=11) # use function
```
### [Option 2]{style="color:blue;"}
Commands within function are saved to an object, `d_out`
Function ends with `d_out` so that result gets `kicked-out` of function
```{r}
#|label: bom_cln function option 2
bom_cln <- function(data_file, yr, skip_num){
d_out <- read_csv(data_file, show_col_types = F, skip=skip_num) |> # data_file and skip_num used here
select(1,4,7,9) |> # columns needed are always the same for these datasets
rename("date" = "Date", # column renaming always the same
"top10gross" = "Top 10 Gross",
"num_releases" = "Releases",
"num1gross" = "Gross") |>
filter(!is.na(top10gross)) |> # filter out non-data rows
mutate(date = dmy(paste(date,yr)), # paste yr input to date text and convert to date
num_releases = as.integer(num_releases),
top10gross = gsub(pattern="$", replacement="", x=top10gross, fixed=T),
top10gross = gsub(pattern=",", replacement="", x=top10gross, fixed=T) |>
as.numeric(),
num1gross = gsub(pattern="$", replacement="", x=num1gross, fixed=T),
num1gross = gsub(pattern=",", replacement="", x=num1gross, fixed=T) |>
as.numeric())
d_out
}
bom2021_Op2 <- bom_cln(data_file = "data/box_office_mojo_2021.csv",
yr = 2021, skip_num=11) # use function
```
### [Option 3]{style="color:blue;"}
This option has 1 more input, `out_file`, to export clean data using `write_csv`
Useful for providing `clean` data to a client or colleague who doesn't use R
```{r}
#|label: bom_cln function option 3
bom_cln_out <- function(data_file, yr, skip_num, out_file){
read_csv(data_file, show_col_types = F, skip=skip_num) |> # data_file and skip_num used here
select(1,4,7,9) |> # columns needed are always the same for these datasets
rename("date" = "Date", # column renaming always the same
"top10gross" = "Top 10 Gross",
"num_releases" = "Releases",
"num1gross" = "Gross") |>
filter(!is.na(top10gross)) |> # filter out non-data rows
mutate(date = dmy(paste(date,yr)), # paste yr input to date text and convert to date
num_releases = as.integer(num_releases),
top10gross = gsub(pattern="$", replacement="", x=top10gross, fixed=T),
top10gross = gsub(pattern=",", replacement="", x=top10gross, fixed=T) |>
as.numeric(),
num1gross = gsub(pattern="$", replacement="", x=num1gross, fixed=T),
num1gross = gsub(pattern=",", replacement="", x=num1gross, fixed=T) |>
as.numeric()) |>
write_csv(out_file)
}
bom2021_Op3 <- bom_cln_out(data_file = "data/box_office_mojo_2021.csv",
yr = 2021, skip_num=11,
out_file = "data/box_office_mojo_2021_tidy.csv") # use function
```
:::
##
### Week 6 In-class Exercises - Q2
:::::: columns
::: {.column width="48%"}
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
When using the 'Option 3' Function, what output(s) are created?
A. An exported .csv file
B. A clean tibble dataset saved to the Global Environment
C. Both an exported .csv file and a clean tibble dataset in R
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}

:::
::::::
## More Comments on Functions
- Even if a function works for one situation, it may not work for every situation.
- Published R functions are very long because they are tested and edited and retested to account for most situations.
- For example, you can examine the text for a function we have used
- In the console, type `read_csv` without the parentheses and push `Enter`.
- You can see the text of the function which is complex.
## Using our Function
We will run the Option 3 version, which cleans the data AND exports a cleaned (tidied) dataset.
Note that there will be one error for 2020 due to the leap day (2/29/2020).
```{r}
#|label: testing our function
bom2018<- bom_cln_out(data_file = "data/box_office_mojo_2018.csv", yr = 2018, skip_num=11,
out_file = "data/box_office_mojo_2018_tidy.csv")
bom2019<- bom_cln_out(data_file = "data/box_office_mojo_2019.csv", yr = 2019, skip_num=11,
out_file = "data/box_office_mojo_2019_tidy.csv")
bom2020<- bom_cln_out(data_file = "data/box_office_mojo_2020.csv", yr = 2020, skip_num=11,
out_file = "data/box_office_mojo_2020_tidy.csv")
bom2021<- bom_cln_out(data_file = "data/box_office_mojo_2021.csv", yr = 2021, skip_num=11,
out_file = "data/box_office_mojo_2021_tidy.csv")
bom2022<- bom_cln_out(data_file = "data/box_office_mojo_2022.csv", yr = 2022, skip_num=11,
out_file = "data/box_office_mojo_2022_tidy.csv")
# fix NA due to leap day manually
# not easily fixed within function
bom2020$date[is.na(bom2020$date)] <- dmy(29022020)
write_csv(bom2020, "data/box_office_mojo_2020_tidy.csv")
```
## Join datasets vertically (stacking)
- If you have multiple identical datasets with the same columns, you can 'stack' them using the `bind_rows` command.
- **Recommended:** Import and format datasets to all have same column names so that this is seamless.
- Functions are useful for creating identically formatted datasets.
::: fragment
```{r}
#|label: create one 5 year BOMdataset
bom_2018_2022 <- bind_rows(bom2018, bom2019, bom2020, bom2021, bom2022) # join datasets
# Modify gross variables (could also be done in function)
bom_2018_2022 <- bom_2018_2022 |>
mutate(top10grossM = (top10gross/1000000) |> round(2),
num1grossM = (num1gross/1000000) |> round(2)) |>
filter(date <= "2022-12-30") |>
glimpse()
```
:::
## HW 4 Preview
We will create a time series line plot from multiple years of data in class
In HW 4 you will use data from the [Bureau of Labor Statistics](https://www.bls.gov/){target="_blank"} that is easier to clean
::::::: columns
::: {.column width="60%"}
- Much of this code is review
- Useful plot formatting code:
- Move legend to bottom: `theme(legend.position="bottom")`
- Format scale of dates: `scale_x_date(date_breaks = "3 months", date_labels = "%b")`
- HW 4 date format: `scale_x_date(date_breaks = "year", date_labels = "%Y)"`
- Format all plot labels using a `labs` command
:::
::: {.column width="2%"}
:::
:::: {.column width="38%"}
- `date_labels` formats are not intuitive.
- Helpful [reference website](https://www.statology.org/scale_x_date-in-r/){target="_blank"}
::: fragment
{fig-align="center"}
:::
::::
:::::::
##
:::: panel-tabset
### [Modify Data]{style="color:blue;"}
Reshape data and create a factor variable with plot label text (Similar to Hw 4 - Part 4)
::: r-fit-text
```{r modify data for lineplot}
bom_plot_data <- bom_2018_2022 |>
select(date, top10grossM, num1grossM) |>
pivot_longer(cols=top10grossM:num1grossM, names_to = "type", values_to="grossM") |>
mutate(type=factor(type,
levels=c("top10grossM", "num1grossM"),
labels=c("Top 10", "No. 1")))
head(bom_plot_data)
```
:::
### [Basic Plot]{style="color:blue;"}
```{r fig.dim=c(15,5)}
#|label: basic line plot
(bom_ln_plot <- bom_plot_data |>
ggplot() +
geom_line(aes(x=date, y=grossM, color=type)) +
theme_classic())
```
### [Legend & Lines]{style="color:blue;"}
```{r fig.dim=c(15,5)}
#|label: modify legend and lines
(bom_ln_plot <- bom_plot_data |>
ggplot() +
geom_line(aes(x=date, y=grossM, color=type), linewidth=0.75) +
theme_classic() +
theme(legend.position="bottom") +
scale_color_manual(values=c("lightblue","blue")))
```
### [X-axis & Labels]{style="color:blue;"}
```{r fig.dim=c(15,5)}
#|label: modify labels and axis breaks
(bom_ln_plot <- bom_ln_plot +
scale_x_date(date_breaks = "3 months", # OG code for date_labels
date_labels = "%b") +
labs(x="Date", y = "Gross ($Mill)", color="", # modify plot labels
title="Top 10 and No. 1 Movie Gross by Date",
subtitle="Jan. 1, 2018 - Dec. 30, 2022",
caption="Data Source:www.boxoffice.mojo.com"))
```
### [Text & Border]{style="color:blue;"}
```{r fig.dim=c(15,5)}
#|label: modify text and borders
(bom_ln_plot <- bom_ln_plot +
theme(plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
panel.border = element_rect(colour = "lightgrey", fill=NA, size=2),
plot.background = element_rect(colour = "darkgrey", fill=NA, size=2)))
```
::::
##
::: panel-tabset
### [Complete Plot Code]{style="color:blue;"}
Similar but more complex than complete plot code for HW 4.
Additional formatting is required for slides that is not needed in a HTML display.
HW 4 plot code will end with `labs` command
`theme` options may be useful for group project.
```{r complete code}
bom_ln_plot <- bom_plot_data |>
ggplot() +
geom_line(aes(x=date, y=grossM, color=type), size=0.75) +
theme_classic() +
theme(legend.position="bottom") + # position legend at bootom
scale_color_manual(values=c("lightblue","blue")) + # modify line colors
scale_x_date(date_breaks = "3 months", date_labels = "%b") + # OG code for date_labels
labs(x="Date", y = "Gross ($Mill)", color="", # modify plot labels
title="Top 10 and No. 1 Movie Gross by Date",
subtitle="Jan. 1, 2018 - Dec. 30, 2022",
caption="Data Source:www.boxoffice.mojo.com") +
theme(plot.title = element_text(size = 20), # text size and borders
plot.subtitle = element_text(size = 15),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
panel.border = element_rect(colour = "lightgrey", fill=NA, size=2),
plot.background = element_rect(colour = "darkgrey", fill=NA, size=2))
```
### [Final Formatted Plot]{style="color:blue;"}
```{r final plot, echo = F, fig.dim=c(16,6), fig.align='center'}
bom_ln_plot
```
:::
## Bureau of Labor Statistics (BLS) Data
- Bureau of Labor Statistics is an excellent resource for economic data.
- Many (not all) BLS datasets have a similar format.
- In HW 4 you will:
- create a function from code used to clean the monthly unemployment Rate data.
- use this function to import and clean two data sets:
- **Export Price Index** and **Import Price Index**.
- join these two data-sets using a `full_join` and create a `date` variable.
- modify the dataset for a plot and follow instructions to format plot.
- Next lecture(s): We will adapt this function to examine and plot labor force variables.
- Recall: Before importing ANY data, **ALWAYS** examine the `.csv` file.
## HW 4 - Bureau of Labor Statistics Data
[Bureau of Labor Statistics](https://www.bls.gov/){target="_blank"} is a great source for economic data.
In HW 4, we will look at three data sets:
- Unemployment Rate, the percentage of people unemployed (`bls_unemp_rate.csv`)
- Import Price Index (`bls_import_index.csv`)
- Export Price Index (`bls_export_index.csv`)
::: fragment
The Import and Export Price Indices contain data on changes in the prices of nonmilitary goods and services traded between the U.S. and the rest of the world.
:::
- [Documentation on Import and Export Price Indixes](https://www.bls.gov/mxp/){target="_blank"}
- [Additional Documention provided by Investopedia](https://www.investopedia.com/terms/i/import-export-prices.asp){target="_blank"}
- Before going further, examine these three .csv files to determine number of header rows to skip when cleaning the data.
## Introduction to Joins
::::::: columns
:::: {.column width="48%"}
Earlier we discussed stacking datasets vertically with `bind_rows`.
We used `date` to stack the data in the correct order.
Alternatively, we can join datasets horizontally (side-by-side)
- This in only possible if the data sets have variables in common with matching values.
- We will cover four types of common joins in this course.
::: fragment
The [`dplyr` Cheat Sheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf){target="_blank"} is a comprehensive reference.
:::
::::
::: {.column width="4%"}
:::
::: {.column width="48%"}

:::
:::::::
## Types of Joins
- Datasets are joined two at a time, then a third can be joined to first two.
- In BUA 455, we will cover four types of joins (but others exist):
- `left_join`: include all observations in **FIRST** dataset in join command (dataset on left)
- `right_join`: include all observations in **SECOND** dataset in join command (dataset on right)
- `inner_join`: only include observations that are in **BOTH** datasets
- `full_join`: include all observations that are in **EITHER** dataset
- All of these join commands are useful, but `left_join` and `right_join` are redundant.
- I usually (not always) prefer using `full_join` to help reveal data errors or discrepancies between multiple datasets.
## A note about R colors
- There are [657 Colors in R](https://docs.google.com/spreadsheets/d/1m-UfG96DB4ZAOntVyzcjuYrX564AdLGB2f-MA6rvc2s/edit?usp=sharing){target="_blank"}
- We'll talk more about color options and palettes, but it's good to know they exist.
- We don't emphasize data visualization theory in this course, BUT managing data entails visualizations
- The goal is to manage and make sense of the data for a client, colleague, or supervisor.
- Choosing colors, symbols, size of text, size of lines, etc. wisely is an important part of data management and data curation.
- As we transition to discussing the group projects (next Week), we will talk more about how to communicate with data effectively.
- Color choices will be part of that discussion.
- In HW 4 and today's demo, I manually choose colors to provide examples.
## Outline of HW 4
- First Steps
- Create R project for HW 4.
- Create `data` and `img` folders.
- Save HW4 Quarto (`.qmd`) Template to project folder.
- Save data files to `data` folder within project.
- Part 1
- Create `bls_tidy` function from code used to import and clean `bls_unemp_rate.csv`
- Run the function and then use it to import and clean `bls_unemp_rate.csv`
- Part 2
- Use `bls_tidy` function to
- import and clean `bls_import_index.csv` and save it as `import_index`
- import and clean `bls_export_index.csv` and save it as `export_index`
## Outline of HW 4 Cont'd
- Part 3
- Use full `full_join` to join `import_index` and `export_index` into one dataset, `export_import`.
- Use correct `lubridate` command and `paste` command to create a date variable.
- Part 4
- Reshape `export_import` dataset using `pivot_longer` to create `exp_imp_plt`
- `exp_imp_plt` will be used to plot the data.
- Type of index, `type` will be converted to a `factor` with levels and labels for the plot.
## Outline of HW 4 Cont'd
- Part 5
- Create a formatted line plot with:
- one line for the Export Price Index (light blue line).
- one line for Import Price Index (blue line)
- Final Steps
- Answer all Blackboard Questions.
- Knit Quarto (.qmd) file.
- Create README FOR HW 4.
- Save all files correctly within HW 4 R Project folder.
- Zip and submit HW 4 R project folder.
## Example of HW 4 Final Plot
**NOTES:** - Final HW 4 plot should be almost identical to the one below **EXCEPT** for second line of caption.
- Slides version has more formatting. See version in `.html` lecture notes and HW 4 Instructions.
::: fragment
{fig-align="center" height="4.5in"}
:::
## Next Lectures
:::::: columns
::: {.column width="48%"}
- More Examples from [BLS](https://www.bls.gov/){target="_blank"}
- line and area plots,
- Interactive time series plots
- hcharts and dygraphs
- Converting between data formats (`xts` and `tibble`)
- Importing stock data from Yahoo Finance to R
- **Introducing the Group Project**
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
{fig-align="center"}
:::
::::::
##
### Key Points from Weeks 6 and 7
::: fragment
**Developing Functions:**
:::
- Develop code, convert to a function, test it
::: fragment
**Vertical and Horizontal Data Joins**
:::
- Stack datasets vertically using `bind_rows`
- Join datasets horizontally by matching variables and values (**HW 4:** `full_join`)
::: fragment
**Working with date variables using `lubridate`**
:::
- Command(s) to create date variables
- using `paste` to add year to text
::: fragment
You may submit an 'Engagement Question' about each lecture until midnight on the day of the lecture. **A minimum of four submissions are required during the semester.**
:::