paraphrase the question to use the R dataset name.
provide additional information after the first AI response to specify using select in the tidyverse package.
Recommendations for using AI
DO: use AI as a search engine to find code or correct code when you are stuck.
DO: use AI iteratively to build code by asking it one question at a time
Add suggested code to your file, test the code and then either modify question or ask a subsequent question.
DON’T: use AI in place of studing for the exam and plug exam questions into an AI application and expect it to work without your understanding of the question.
AI can be used during the tests, but it won’t help you if you don’t know what you are looking for or how to phrase the queries correctly.
I use AI to test my quiz questions to insure that they will not provide fully correct code.
AI can be helpful, but only if you understand the code provided and can modify it correctly.
Creating a Function
Any task in R can be converted to a function.
If you are only doing something once or twice, this is not needed.
If you are doing the same tasks 4 or more times, this is very useful
Best Practice:
Develop and refine the code to complete your tasks
Subdivide the larger tasks into smaller shorter tasks
Aanatomy of a Function:
Function_Name <- function(input_1, input_2, etc){
output <- command 1 to do "stuff" to inputs |>
command 2 to do "stuff" to inputs |>
command 3 to do "stuff" to inputs |> etc.
output # end with name of output so that it is "kicked out" of function
}
Example and Review:
Code below includes preview of lubridate functions to create date, month, day, and quarter variables.
Code
```{r}#|label: bom_importbom21_orig <- read_csv("data/box_office_mojo_2021_tidy.csv", show_col_types = F) |> mutate(date = ymd(date), # converts ymd date text to date var month = month(date, label = T, abbr = T), # creates month var from date var day = wday(date, label=T, abbr = T), # creates wkday var from date var qtr = quarter(date), # creates quarter var from date var num_releases = as.integer(num_releases), top10grossM = (top10gross/1000000) |> round(2), num1grossM = (num1gross/1000000) |> round(2))```
Below, bom_basic is a function that completes the tasks above:
```{r, fig.dim=c(14,8), fig.align='center'}#|label: data and area plot 2022bom22_line_area <- bom_line_area(bom22) # data formating functionarea_plt(bom22_line_area, "2022") # area plot function```
Box Office Mojo 2022 - Line Plot
Code
```{r, fig.dim=c(14,8), fig.align='center'}#|label: line plot 2022line_plt(bom22_line_area, "2022") # line plot function (data formatted in chunk above)```
Box Office Mojo 2021 - Line Plot
Code
```{r, fig.dim=c(14,8), fig.align='center'}#|label: data and line plot 2021bom21_line_area <- bom_line_area(bom21) # data formatting functionline_plt(bom21_line_area, "2021") # line plot function```
Box Office Mojo 2021 - Area Plot
Code
```{r, fig.dim=c(14,8), fig.align='center'}#|label: area plot 2021area_plt(bom21_line_area, "2021") # area plot function (data formatted in previous chunk)```
Preview of Next week after Quiz 1
Cleaning Messy Data from Box Office Mojo Website
Examining/Cleaning Bureau of Labor Statistics data
Writing functions to automate data cleaning
Joining data from multiple datasets
HW 4 will be introduced
Key Points from This Week
Review for Quiz 1
Review Practice Questions
Drop into Office Hours if you have additional questions.
Automating Data Management and Plots with Functions
Anatomy of a Function is always consistent
Functions are useful for repetitive tasks e.g. data from the same data source, but multiple years
Divide task into smaller tasks and create a function for each task
Fully develop and check code to complete tasks, then convert to function.
You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions are required during the semester.
Source Code
---title: "Week 5"subtitle: "Cheat Sheets, AI, Functions, and Review"author: "Penelope Pooler Eisenbies"date: last-modifiedtoc: truetoc-depth: 3toc-location: lefttoc-title: "Table of Contents"toc-expand: 1format: html: code-line-numbers: true code-fold: true code-tools: trueexecute: echo: fenced---## Housekeeping```{r include=F}#|label: setupknitr::opts_chunk$set(echo=T, highlight=T) # specifies default options for all chunksoptions(scipen=100) # suppress scientific notation # install pacman if neededif (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")pacman::p_load(pacman, tidyverse, gridExtra, magrittr, kableExtra) # install and load required packages p_loaded() # verify loaded packages```***Quiz 1 on Thursday 2/13***- Weeks 1 - 4 (Lectures 1 - 8) - Quiz questions will be similar (but not identical) to Practice Questions - Mix of R datasets and imported datasets - I will provide R code to import data - Quiz Template and data files will be provided in Zipped project - Review Practice Questions, HW assignments, and Demo Videos<br>- **You will be required to download, unzip, and and save a project to your computer (not in Downloads), as part of Quiz 1.**## R Online Resources- Some of what we have covered (Week 4 has a more complete review.): - R projects, file structure and Quarto files - Working with 'clean' data using the `dplyr` package - common commands: `read_csv`, `filter`, `select`, `slice`, `factor` - Augmenting these commands with operators such as `!`, `%in%`, `==` - Using pipes, `|>` to make data management more efficient- Reference links for R operators: - [**tutorialspoint**](https://www.tutorialspoint.com/r/r_operators.htm)\ - [**Quick-R**](https://www.statmethods.net/management/operators.html) - Or google `R Operators`- For R Markdown and `dplyr` commands there are R Cheat Sheets - [**Curated List of Text Resources for BUA 455**](https://docs.google.com/document/d/1qdqO7MTq7scYhFydkJuhA7JIUVQNldNXqMBOspXlNZk/edit?usp=sharing)## Using AI to help you write R code- AI tools became use-able in the classroom in 2023.- My current AI of choice in **Co-Pilot** for Windows.- **Chat-GPT** and **Gemini** on the Google platform are also good. - Here is an example of [using **Co-Pilot** to answer Practice Question 12.](https://copilot.microsoft.com/sl/emLAL7jU2zA) - [Here's a PDF of this same example](https://drive.google.com/file/d/16R7CSJ8pWWqw7Dh5zfM77jDEOPfzB6Ku/view?usp=sharing)- Note that in this example I had to: - paraphrase the question to use the R dataset name. - provide additional information after the first AI response to specify using `select` in the `tidyverse` package.## Recommendations for using AI- DO: use AI as a search engine to find code or correct code when you are stuck.- DO: use AI iteratively to build code by asking it one question at a time - Add suggested code to your file, test the code and then either modify question or ask a subsequent question.- DON'T: use AI in place of studing for the exam and plug exam questions into an AI application and expect it to work without your understanding of the question. - AI can be used during the tests, but it won't help you if you don't know what you are looking for or how to phrase the queries correctly.- I use AI to `test` my quiz questions to insure that they will not provide fully correct code.- AI can be helpful, but only if you understand the code provided and can modify it correctly.## Creating a Function- Any task in R can be converted to a function.- If you are only doing something once or twice, this is not needed.- If you are doing the same tasks 4 or more times, this is very useful- Best Practice: - Develop and refine the code to complete your tasks - Subdivide the larger tasks into smaller shorter tasks::: fragment#### Aanatomy of a Function::::::: fragment``` Function_Name <- function(input_1, input_2, etc){ output <- command 1 to do "stuff" to inputs |> command 2 to do "stuff" to inputs |> command 3 to do "stuff" to inputs |> etc. output # end with name of output so that it is "kicked out" of function}```:::```{r echo=F, eval=F, include=F}#|label: bom_cln_functionbom_cln <- function(data_file, yr, out_file){ d <- read_csv(data_file, show_col_types = F, skip=11) |> select(1,4,7,9) |> rename("date" = "Date", "top10gross" = "Top 10 Gross", "num_releases" = "Releases", "num1gross" = "Gross") |> filter(!is.na(top10gross)) |> mutate(date = dmy(paste(date,yr)), top10gross = gsub(pattern="$", replacement="", x=top10gross, fixed=T), top10gross = gsub(pattern=",", replacement="", x=top10gross, fixed=T) |> as.numeric(), num1gross = gsub(pattern="$", replacement="", x=num1gross, fixed=T), num1gross = gsub(pattern=",", replacement="", x=num1gross, fixed=T) |> as.numeric()) |> write_csv(out_file)}bom_cln("data/box_office_mojo_2022.csv", 2022, "data/box_office_mojo_2022_tidy.csv")bom_cln("data/box_office_mojo_2021.csv", 2021, "data/box_office_mojo_2021_tidy.csv")bom_cln("data/box_office_mojo_2020.csv", 2020, "data/box_office_mojo_2020_tidy.csv")bom_cln("data/box_office_mojo_2019.csv", 2019, "data/box_office_mojo_2019_tidy.csv")bom_cln("data/box_office_mojo_2018.csv", 2018, "data/box_office_mojo_2018_tidy.csv")```## Example and Review:- Code below includes preview of `lubridate` functions to create date, month, day, and quarter variables.::: fragment```{r}#|label: bom_importbom21_orig <-read_csv("data/box_office_mojo_2021_tidy.csv", show_col_types = F) |>mutate(date =ymd(date), # converts ymd date text to date varmonth =month(date, label = T, abbr = T), # creates month var from date varday =wday(date, label=T, abbr = T), # creates wkday var from date varqtr =quarter(date), # creates quarter var from date varnum_releases =as.integer(num_releases),top10grossM = (top10gross/1000000) |>round(2),num1grossM = (num1gross/1000000) |>round(2))```:::- Below, `bom_basic` is a function that completes the tasks above:::: fragment```{r bom_import basic function}bom_basic <- function(data_file) { d_out <- read_csv(data_file, show_col_types = F) |> mutate(date = ymd(date), month = month(date, label = T, abbr = T), day = wday(date, label=T, abbr = T), qtr = quarter(date), num_releases = as.integer(num_releases), top10grossM = (top10gross/1000000) |> round(2), num1grossM = (num1gross/1000000) |> round(2)) d_out # outputs function results to screen or saved object name}```:::## What does `bom_basic` function do?:::::: columns::: {.column width="48%"}```{r}#|label: import with read_csvb21 <-read_csv("data/box_office_mojo_2021_tidy.csv",show_col_types = F) |>glimpse(width=40)```:::::: {.column width="4%"}:::::: {.column width="48%"}```{r}#|label: import with bom_basic functionbom21 <-bom_basic("data/box_office_mojo_2021_tidy.csv") |>glimpse(width=40)```:::::::::## bua455s25 Week 5 In-class Exercises - Q1 bua455s25***Session ID: bua455s25***Using `lubridate` commands we converted `date` to date format (if needed) and created `month``day` and `qtr` variables from `date`.- By default, `month` and `day` are ordinal factor variables (`<ord>`).- What is the default data type for `qtr` (quarter)?::: fragmentA. character `<chr>`B. decimal (double precision) `<dbl>`C. factor `<fct>`D. integer `<int>`:::## bua455s25 Week 5 In-class Exercises - Q2 bua455s25***Session ID: bua455s25***Here is the line that creates `qtr` within the mutate statement.The `quarter` command is part of the `lubridate` package:- `qtr = quarter(date)`::: fragmentFill in the blank to convert this variable to a factor variable as you create it::::- `qtr = _____(quarter(date))`## Function Demonstration - Multiple Years- Once function code is developed and tested, we can import 2, or 5, or even 10 data sets very efficiently.::: fragment```{r}#|label: import all 5 datasetsbom22 <-bom_basic("data/box_office_mojo_2022_tidy.csv")bom21 <-bom_basic("data/box_office_mojo_2021_tidy.csv")bom20 <-bom_basic("data/box_office_mojo_2020_tidy.csv")bom19 <-bom_basic("data/box_office_mojo_2019_tidy.csv")bom18 <-bom_basic("data/box_office_mojo_2018_tidy.csv") |>glimpse( width=60)```:::## Function to Make Repeatable Plots- A good practice is to subdivide tasks to make short functions- Recall the area plot we discussed in Week 3- This Function modifies the data for the plot:::: fragment```{r}#|label: data mgmt for area plotbom22_line_area_orig <- bom22 |>select(date, top10grossM, num1grossM) |># select variablesrename(`Top 10`= top10grossM, `No. 1`= num1grossM) |># rename for plotpivot_longer(cols=`Top 10`:`No. 1`, # reshape data names_to ="type", values_to ="grossM") |>mutate(type=factor(type, levels=c("Top 10", "No. 1"))) # convert type of gross to a factor```<br>```{r}#|label: data mgmt function for area plotbom_line_area <-function(data_in){ d_out <- data_in |>select(date, top10grossM, num1grossM) |>rename(`Top 10`= top10grossM, `No. 1`= num1grossM) |>pivot_longer(cols=`Top 10`:`No. 1`, names_to ="type", values_to ="grossM") |>mutate(type=factor(type, levels=c("Top 10", "No. 1"))) d_out}bom22_line_area <-bom_line_area(bom22) # creates plot dataset for 2022bom21_line_area <-bom_line_area(bom21) # creates plot dataset for 2021```:::## Function for Area Plot- Functions are very useful for plots so that you don't have to keep recreating the code for the same data.- The only text that changes from year to year is the subtitle.::: fragment```{r bom area plot code}area_plt22_orig <- bom22_line_area |> ggplot() + geom_area(aes(x=date, y=grossM, fill=type), size=1) + theme_classic() + scale_fill_manual(values=c("blue", "lightblue")) + labs(x="Date", y = "Gross ($Mill)", fill="", title="Top 10 and No. 1 Movie Gross by Date", subtitle="Jan. 1, 2022 - Dec. 31, 2022", caption="Data Source:www.boxoffice.mojo.com") + theme(legend.position="bottom", legend.text = element_text(size = 12), plot.title = element_text(size = 20), axis.title = element_text(size=18), axis.text = element_text(size=15), plot.caption = element_text(size = 10), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```:::## ### Display of saved plot, `area_plt22_orig````{r display of area plot, echo=F, fig.dim=c(14,8), fig.align='center'}area_plt22_orig```## Area Plot Function```{r}#|label: area plot functionarea_plt<-function(data_in, yr){ data_in |>ggplot() +geom_area(aes(x=date, y=grossM, fill=type), size=1) +theme_classic() +scale_fill_manual(values=c("blue", "lightblue")) +labs(x="Date", y ="Gross ($Mill)", fill="",title="Top 10 and No. 1 Movie Gross by Date", subtitle=paste("Jan. 1,", yr,"- Dec. 31,", yr),caption="Data Source:www.boxoffice.mojo.com") +theme(legend.position="bottom",legend.text =element_text(size =12),plot.title =element_text(size =20),axis.title =element_text(size=18),axis.text =element_text(size=15),plot.caption =element_text(size =10),plot.background =element_rect(colour ="darkgrey", fill=NA, linewidth=2))}```## Line Plot FunctionAlmost identical to Area Plot Function```{r}#|label: line plot functionline_plt<-function(data_in, yr){ data_in |>ggplot() +geom_line(aes(x=date, y=grossM, color=type), linewidth=1) +theme_classic() +scale_color_manual(values=c("blue", "lightblue")) +labs(x="Date", y ="Gross ($Mill)", color="",title="Top 10 and No. 1 Movie Gross by Date", subtitle=paste("Jan. 1,", yr,"- Dec. 31,", yr),caption="Data Source:www.boxoffice.mojo.com") +theme(legend.position="bottom",legend.text =element_text(size =12),plot.title =element_text(size =20),axis.title =element_text(size=18),axis.text =element_text(size=15),plot.caption =element_text(size =10),plot.background =element_rect(colour ="darkgrey", fill=NA, linewidth=2))}```## Box Office Mojo 2022 - Area Plot```{r, fig.dim=c(14,8), fig.align='center'}#|label: data and area plot 2022bom22_line_area <- bom_line_area(bom22) # data formating functionarea_plt(bom22_line_area, "2022") # area plot function```## Box Office Mojo 2022 - Line Plot```{r, fig.dim=c(14,8), fig.align='center'}#|label: line plot 2022line_plt(bom22_line_area, "2022") # line plot function (data formatted in chunk above)```## Box Office Mojo 2021 - Line Plot```{r, fig.dim=c(14,8), fig.align='center'}#|label: data and line plot 2021bom21_line_area <- bom_line_area(bom21) # data formatting functionline_plt(bom21_line_area, "2021") # line plot function```## Box Office Mojo 2021 - Area Plot```{r, fig.dim=c(14,8), fig.align='center'}#|label: area plot 2021area_plt(bom21_line_area, "2021") # area plot function (data formatted in previous chunk)```## Preview of Next week after Quiz 1:::::: columns::: {.column width="48%"}- Cleaning Messy Data from Box Office Mojo Website- Examining/Cleaning Bureau of Labor Statistics data- Writing functions to automate data cleaning- Joining data from multiple datasets- HW 4 will be introduced:::::: {.column width="4%"}:::::: {.column width="48%"}{fig-align="center"}:::::::::## ### Key Points from This Week::: fragment**Review for Quiz 1**:::- Review Practice Questions- Drop into Office Hours if you have additional questions.::: fragment**Automating Data Management and Plots with Functions**:::- Anatomy of a Function is always consistent- Functions are useful for repetitive tasks e.g. data from the same data source, but multiple years- Divide task into smaller tasks and create a function for each task- Fully develop and check code to complete tasks, then convert to function.::: fragmentYou may submit an 'Engagement Question' about each lecture until midnight on the day of the lecture. **A minimum of four submissions are required during the semester.**:::