Week 5

Cheat Sheets, AI, Functions, and Review

Author

Penelope Pooler Eisenbies

Published

September 23, 2025

Housekeeping

Quiz 1 on Thursday 9/25

Weeks 1 - 4 (Lectures 1 - 8)
- Quiz questions will be similar (but not identical) to Practice Questions
- Mix of R datasets and imported datasets
  - I will provide R code to import data
  - Quiz Template and data files will be provided in Zipped project
  - Review Practice Questions, HW assignments, and Demo Videos

You will be required to download, unzip, and save a project to your computer (not in Downloads), as part of Quiz 1.

R Online Resources

Some of what we have covered (Week 4 has a more complete review.):
- R projects, file structure and Quarto files
- Working with ‘clean’ data using the dplyr package
  - common commands: read_csv, filter, select, slice, factor
  - Augmenting these commands with operators such as !, %in%, ==
  - Using pipes, |> to make data management more efficient
Reference links for R operators:
- tutorialspoint
- Quick-R
- Or google R Operators
For R Markdown and dplyr commands there are R Cheat Sheets
- Curated List of Text Resources for BUA 455
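
As a quick refresher on how these operators combine with dplyr commands and the pipe, here is a minimal sketch. It assumes the built-in `mtcars` dataset, chosen only for illustration (it is not one of the course data files), and the object names `small_manual` and `not_4_or_6` are hypothetical.

```r
library(dplyr)

# keep manual-transmission cars (am == 1) that have 4 or 6 cylinders
small_manual <- mtcars |>
  filter(cyl %in% c(4, 6), am == 1) |>   # %in% matches any listed value; == tests equality
  select(mpg, cyl, am) |>                # keep three columns
  slice(1:3)                             # keep the first three rows

# ! negates a condition: keep cars that do NOT have 4 or 6 cylinders
not_4_or_6 <- mtcars |>
  filter(!(cyl %in% c(4, 6)))

small_manual
not_4_or_6 |> count(cyl)
```

Each step is piped into the next, so the code reads top to bottom in the order the data are modified.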

Using AI to help you write R code

AI tools became usable in the classroom in 2023.
My current AI of choice is Copilot for Windows.
ChatGPT and Gemini on the Google platform are also good.
On the next slide I show the result of using Copilot for Question 12.
Note that in this example I had to:
- Let Copilot know what R dataset this is.

AI Prompt for Practice Question 12

Note that I added the second line of the prompt, which names the R dataset.
In Quizzes I will let you know the R dataset if that information is needed.
Students should also know which R datasets are being used from doing the practice questions

AI Response for Practice Question 12

Recommendations for using AI

DO: use AI as a search engine to find code or correct code when you are stuck.
DO: use AI iteratively to build code by asking it one question at a time
- Add suggested code to your file, test the code and then either modify question or ask a subsequent question.
DON’T: use AI in place of studying for the exam. Plugging exam questions into an AI application will not work unless you understand the question.
- AI can be used during the tests, but it won’t help you if you don’t know what you are looking for or how to phrase the queries correctly.
I use AI to test my quiz questions to ensure that it will not provide fully correct code.
AI can be helpful, but only if you understand the code provided and can modify it correctly.

Creating a Function

Any task in R can be converted to a function.
If you are only doing something once or twice, this is not needed.
If you are doing the same tasks 4 or more times, this is very useful.
Best Practice:
- Develop and refine the code to complete your tasks
- Subdivide the larger tasks into smaller shorter tasks

Anatomy of a Function:

Function_Name <- function(input_1, input_2, etc){
   output <- command 1 to do "stuff" to inputs |>
             command 2 to do "stuff" to inputs |>
             command 3 to do "stuff" to inputs |> etc.
   output  # end with name of output so that it is "kicked out" of function
}
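
To make the template above concrete, here is a minimal runnable sketch. The function name `to_millions` and its `digits` argument are hypothetical. They simply wrap the divide-and-round step that appears in the box office code in this lecture.

```r
# hypothetical helper: converts a dollar amount to millions of dollars
to_millions <- function(dollars, digits = 2) {
  output <- (dollars / 1000000) |>   # command 1: rescale to millions
    round(digits)                    # command 2: round to the requested digits
  output                             # end with name of output so it is returned
}

to_millions(27601787)   # first top10gross value shown in the 2021 data
```

The input value is the first `top10gross` value in the 2021 data, so the result should match the first `top10grossM` value.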

Example and Review:

Code below includes preview of lubridate functions to create date, month, day, and quarter variables.

```{r}
#|label: bom_import
bom21_orig <- read_csv("data/box_office_mojo_2021_tidy.csv", show_col_types = F) |>
  mutate(date = ymd(date),                              # converts ymd date text to date var
         month = month(date, label = T, abbr = T),      # creates month var from date var
         day = wday(date, label=T, abbr = T),           # creates wkday var from date var
         qtr = quarter(date),                           # creates quarter var from date var
         num_releases = as.integer(num_releases),
         top10grossM = (top10gross/1000000) |> round(2),
         num1grossM = (num1gross/1000000) |> round(2))
```
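
The lubridate commands in the chunk above can be tried on their own before using them inside `mutate`. This is a small sketch that assumes two hand-typed dates. The name `demo_dates` is hypothetical and is not part of the course data.

```r
library(lubridate)

# text in year-month-day order is converted to Date values
demo_dates <- ymd(c("2021-12-31", "2021-07-04"))

month(demo_dates, label = TRUE, abbr = TRUE)   # abbreviated month names
wday(demo_dates, label = TRUE, abbr = TRUE)    # abbreviated weekday names
quarter(demo_dates)                            # quarter of the year (1 to 4)
```

Running each line separately in the Console is an easy way to see what each command returns before it is added to a longer piped statement.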

Below, bom_basic is a function that completes the tasks above:

```{r bom_import basic function}
bom_basic <- function(data_file) {
  d_out <- read_csv(data_file, show_col_types = F) |>
  mutate(date = ymd(date),
         month = month(date, label = T, abbr = T),
         day = wday(date, label=T, abbr = T),
         qtr = quarter(date),
         num_releases = as.integer(num_releases),
         top10grossM = (top10gross/1000000) |> round(2),
         num1grossM = (num1gross/1000000) |> round(2))
  d_out # outputs function results to screen or saved object name
}
```

What does the `bom_basic` function do?

```{r}
#|label: import with read_csv

b21 <- read_csv("data/box_office_mojo_2021_tidy.csv",
                show_col_types = F) |>
  glimpse(width=40)
```

Rows: 365
Columns: 4
$ date         <date> 2021-12-31, 2021…
$ top10gross   <dbl> 27601787, 3502147…
$ num_releases <dbl> 25, 26, 26, 25, 2…
$ num1gross    <dbl> 15407695, 2071790…

```{r}
#|label: import with bom_basic function

bom21 <- bom_basic("data/box_office_mojo_2021_tidy.csv") |>
  glimpse(width=40)
```

Rows: 365
Columns: 9
$ date         <date> 2021-12-31, 2021…
$ top10gross   <dbl> 27601787, 3502147…
$ num_releases <int> 25, 26, 26, 25, 2…
$ num1gross    <dbl> 15407695, 2071790…
$ month        <ord> Dec, Dec, Dec, De…
$ day          <ord> Fri, Thu, Wed, Tu…
$ qtr          <int> 4, 4, 4, 4, 4, 4,…
$ top10grossM  <dbl> 27.60, 35.02, 34.…
$ num1grossM   <dbl> 15.41, 20.72, 20.…

Week 5 In-class Exercises - Q1

Poll Everywhere - My User Name: penelopepoolereisenbies685

Using lubridate commands, we converted date to date format (if needed) and created month, day, and qtr variables from date.

By default, month and day are ordinal factor variables (<ord>).
What is the default data type for qtr (quarter)?

A. character <chr>

B. decimal (double precision) <dbl>

C. factor <fct>

D. integer <int>

Week 5 In-class Exercises - Q2

Poll Everywhere - My User Name: penelopepoolereisenbies685

Here is the line that creates qtr within the mutate statement.

The quarter command is part of the lubridate package:

qtr = quarter(date)

Fill in the blank to convert this variable to a factor variable as you create it:

qtr = _____(quarter(date))

Function Demonstration - Multiple Years

Once function code is developed and tested, we can import 2, or 5, or even 10 data sets very efficiently.

```{r}
#|label: import all 5 datasets
bom22 <- bom_basic("data/box_office_mojo_2022_tidy.csv")
bom21 <- bom_basic("data/box_office_mojo_2021_tidy.csv")
bom20 <- bom_basic("data/box_office_mojo_2020_tidy.csv")
bom19 <- bom_basic("data/box_office_mojo_2019_tidy.csv")
bom18 <- bom_basic("data/box_office_mojo_2018_tidy.csv") |> glimpse( width=60)
```

Rows: 365
Columns: 9
$ date         <date> 2018-12-31, 2018-12-30, 2018-12-29, …
$ top10gross   <dbl> 36240441, 50932176, 58118460, 5666776…
$ num_releases <int> 53, 51, 51, 51, 53, 52, 53, 49, 53, 5…
$ num1gross    <dbl> 10011638, 16440551, 18632907, 1704111…
$ month        <ord> Dec, Dec, Dec, Dec, Dec, Dec, Dec, De…
$ day          <ord> Mon, Sun, Sat, Fri, Thu, Wed, Tue, Mo…
$ qtr          <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
$ top10grossM  <dbl> 36.24, 50.93, 58.12, 56.67, 51.67, 55…
$ num1grossM   <dbl> 10.01, 16.44, 18.63, 17.04, 14.62, 16…
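
When the number of files grows, the same import function can also be applied to a vector of file names in one step. The sketch below is one possible approach using `map` from the purrr package, not the method used in this lecture. To keep it self-contained, it writes two tiny made-up CSV files to a temporary folder, and `demo_import`, `demo_files`, and `demo_list` are hypothetical names.

```r
library(tidyverse)
library(lubridate)

# made-up stand-in data: two tiny CSV files written to a temporary folder
tmp <- tempdir()
years <- c(2021, 2022)
for (yr in years) {
  tibble(date = paste0(yr, c("-12-31", "-12-30")),
         top10gross = c(1000000, 2500000)) |>
    write_csv(file.path(tmp, paste0("demo_", yr, ".csv")))
}

# small import function with the same structure as bom_basic
demo_import <- function(data_file) {
  d_out <- read_csv(data_file, show_col_types = FALSE) |>
    mutate(date = ymd(date),
           top10grossM = (top10gross / 1000000) |> round(2))
  d_out
}

# named vector of file paths, one per year
demo_files <- file.path(tmp, paste0("demo_", years, ".csv")) |>
  set_names(as.character(years))

# map() applies demo_import to each file and returns a named list of datasets
demo_list <- map(demo_files, demo_import)
demo_list[["2022"]]
```

Each dataset is then available by year from the list, e.g. `demo_list[["2021"]]`, instead of as a separately named object.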

Function to Make Repeatable Plots

A good practice is to subdivide tasks to make short functions.
Recall the area plot we discussed in Week 3.
The code below modifies the data for the plot, first as a stand-alone chunk and then as a function:

```{r}
#|label: data mgmt for area plot
bom22_line_area_orig <- bom22 |>
  select(date, top10grossM, num1grossM) |>                  # select variables
  rename(`Top 10` = top10grossM, `No. 1` = num1grossM) |>   # rename for plot
  pivot_longer(cols=`Top 10`:`No. 1`,                       # reshape data  
               names_to = "type", values_to = "grossM") |>
  mutate(type=factor(type, levels=c("Top 10", "No. 1")))    # convert type of gross to a factor
```

```{r}
#|label: data mgmt function for area plot
bom_line_area <- function(data_in){
  d_out <- data_in |>
  select(date, top10grossM, num1grossM) |>                  
  rename(`Top 10` = top10grossM, `No. 1` = num1grossM) |>   
  pivot_longer(cols=`Top 10`:`No. 1`,                       
               names_to = "type", values_to = "grossM") |>
  mutate(type=factor(type, levels=c("Top 10", "No. 1"))) 
  d_out
}

bom22_line_area <- bom_line_area(bom22)   # creates plot dataset for 2022
bom21_line_area <- bom_line_area(bom21)   # creates plot dataset for 2021
```
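
To see what the reshaping step does, it can help to run the same commands on a dataset small enough to read in full. This sketch assumes a two-row dataset built from the first two 2021 values shown in the `glimpse` output earlier. The names `demo_wide` and `demo_long` are hypothetical.

```r
library(tidyverse)

# two days of 2021 values copied from the glimpse output of bom21
demo_wide <- tibble(date = as.Date(c("2021-12-31", "2021-12-30")),
                    top10grossM = c(27.60, 35.02),
                    num1grossM = c(15.41, 20.72))

demo_long <- demo_wide |>
  rename(`Top 10` = top10grossM, `No. 1` = num1grossM) |>     # rename for plot
  pivot_longer(cols = `Top 10`:`No. 1`,                       # reshape data
               names_to = "type", values_to = "grossM") |>
  mutate(type = factor(type, levels = c("Top 10", "No. 1")))  # set factor order

demo_long
```

The two rows become four: each date now appears once for `Top 10` and once for `No. 1`, which is the layout that `fill=type` and `color=type` need in the plots.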

Function for Area Plot

Functions are very useful for plots so that you don’t have to keep recreating the code for the same data.
The only text that changes from year to year is the subtitle.

```{r bom area plot code}
area_plt22_orig <- bom22_line_area |>                                  
  ggplot() +                                                           
  geom_area(aes(x=date, y=grossM, fill=type), linewidth=1) +           
  theme_classic() + 
  scale_fill_manual(values=c("blue", "lightblue")) +   
  labs(x="Date", y = "Gross ($Mill)", fill="",
       title="Top 10 and No. 1 Movie Gross by Date", 
       subtitle="Jan. 1, 2022 - Dec. 31, 2022",
       caption="Data Source: www.boxofficemojo.com") + 
  theme(legend.position="bottom",
        legend.text = element_text(size = 12),
        plot.title = element_text(size = 20),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
```

Display of saved plot, `area_plt22_orig`

Area Plot Function

```{r}
#|label: area plot function

area_plt<- function(data_in, yr){
  data_in |>                                                
  ggplot() +                                                
  geom_area(aes(x=date, y=grossM, fill=type), linewidth=1) +     
  theme_classic() + 
  scale_fill_manual(values=c("blue", "lightblue")) +   
  labs(x="Date", y = "Gross ($Mill)", fill="",
       title="Top 10 and No. 1 Movie Gross by Date", 
       subtitle=paste("Jan. 1,", yr,"- Dec. 31,", yr),
       caption="Data Source: www.boxofficemojo.com") + 
  theme(legend.position="bottom",
        legend.text = element_text(size = 12),
        plot.title = element_text(size = 20),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
}
```
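
The only part of the plot that depends on the `yr` input is the subtitle, which is built with `paste`. As a small sketch, the hypothetical helper `make_subtitle` below isolates that one step so it can be checked on its own.

```r
# hypothetical helper that isolates the subtitle text used in the plot function
make_subtitle <- function(yr) {
  paste("Jan. 1,", yr, "- Dec. 31,", yr)   # paste() joins the pieces with single spaces
}

make_subtitle("2022")   # yr supplied as text
make_subtitle(2021)     # yr supplied as a number also works
```

With `"2022"` as the input, the result is the same text that was typed by hand in the original 2022 plot code.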

Line Plot Function

Almost identical to Area Plot Function

```{r}
#|label: line plot function

line_plt<- function(data_in, yr){
  data_in |>                                                    
  ggplot() +                                                    
  geom_line(aes(x=date, y=grossM, color=type), linewidth=1) +   
  theme_classic() + 
  scale_color_manual(values=c("blue", "lightblue")) +   
  labs(x="Date", y = "Gross ($Mill)", color="",
       title="Top 10 and No. 1 Movie Gross by Date", 
       subtitle=paste("Jan. 1,", yr,"- Dec. 31,", yr),
       caption="Data Source: www.boxofficemojo.com") + 
  theme(legend.position="bottom",
        legend.text = element_text(size = 12),
        plot.title = element_text(size = 20),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
}
```
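
Because the two plot functions are almost identical, one option is a single function with an extra input that chooses the plot type. The sketch below is one possible way to do this, not the approach used in this lecture. The function name `bom_plt`, its `plot_type` argument, and the small `demo_long` dataset (built from 2021 values shown earlier) are hypothetical, and the theme is shortened to keep the example brief.

```r
library(tidyverse)

# hypothetical combined function: plot_type picks the geom, the rest is shared
bom_plt <- function(data_in, yr, plot_type = "area") {
  plt <- data_in |> ggplot()
  if (plot_type == "area") {
    plt <- plt +
      geom_area(aes(x = date, y = grossM, fill = type)) +
      scale_fill_manual(values = c("blue", "lightblue")) +
      labs(fill = "")
  } else {
    plt <- plt +
      geom_line(aes(x = date, y = grossM, color = type), linewidth = 1) +
      scale_color_manual(values = c("blue", "lightblue")) +
      labs(color = "")
  }
  plt <- plt +
    theme_classic() +
    labs(x = "Date", y = "Gross ($Mill)",
         title = "Top 10 and No. 1 Movie Gross by Date",
         subtitle = paste("Jan. 1,", yr, "- Dec. 31,", yr)) +
    theme(legend.position = "bottom")
  plt   # end with the plot so that it is returned
}

# small demo dataset in the same long format that bom_line_area creates
demo_long <- tibble(
  date = as.Date(c("2021-12-31", "2021-12-31", "2021-12-30", "2021-12-30")),
  type = factor(c("Top 10", "No. 1", "Top 10", "No. 1"),
                levels = c("Top 10", "No. 1")),
  grossM = c(27.60, 15.41, 35.02, 20.72))

demo_area <- bom_plt(demo_long, "2021")                      # area plot (default)
demo_line <- bom_plt(demo_long, "2021", plot_type = "line")  # line plot
```

Keeping two separate functions, as in the lecture, is shorter to read. A combined function means that shared details such as the title only need to be changed in one place.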

Box Office Mojo 2022 - Area Plot

```{r, fig.dim=c(14,8), fig.align='center'}
#|label: data and area plot 2022
bom22_line_area <- bom_line_area(bom22) # data formatting function
area_plt(bom22_line_area, "2022")       # area plot function
```

Box Office Mojo 2022 - Line Plot

```{r, fig.dim=c(14,8), fig.align='center'}
#|label: line plot 2022
line_plt(bom22_line_area, "2022") # line plot function (data formatted in chunk above)
```

Box Office Mojo 2021 - Line Plot

```{r, fig.dim=c(14,8), fig.align='center'}
#|label: data and line plot 2021
bom21_line_area <- bom_line_area(bom21)  # data formatting function
line_plt(bom21_line_area, "2021")        # line plot function
```

Box Office Mojo 2021 - Area Plot

```{r, fig.dim=c(14,8), fig.align='center'}
#|label: area plot 2021
area_plt(bom21_line_area, "2021") # area plot function (data formatted in previous chunk)
```

Preview of Next week after Quiz 1

Cleaning Messy Data from Box Office Mojo Website
Examining/Cleaning Bureau of Labor Statistics data
Writing functions to automate data cleaning
Joining data from multiple datasets
HW 4 will be introduced

Key Points from This Week

Review for Quiz 1

Review Practice Questions
Drop into Office Hours if you have additional questions.

Automating Data Management and Plots with Functions

Anatomy of a Function is always consistent
Functions are useful for repetitive tasks, e.g., importing data from the same source for multiple years
Divide task into smaller tasks and create a function for each task
Fully develop and check code to complete tasks, then convert to function.

You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions is required during the semester.
