Working with dates was much more difficult before the lubridate package was developed.
Dates are still troublesome in other software environments.
Below we create a date variable from the provided character variable, create other variables, examine data, and export the dataset with write_csv.
```{r}
#|label: date example with lubridate
bom23 <- bom23 |>
  mutate(date = dmy(paste(Date, "2023")),            # year is required
                                                      # we paste it (add it as text) to each date
         month = month(date, label=T, abbr=T),       # month shown as 3 letter abbr.
         day = wday(date, label=T, abbr=T),          # weekday shown as 3 letter abbr.
         quart = quarter(date)) |>                   # quarter shown as number
  select(date, month, day, quart, top10gross:num1) |> # select and reorder variables
  glimpse() |>                                       # examine data
  write_csv("data/Box_Office_Mojo_Week3_HW3.csv")    # export using write_csv
```
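The lubridate steps in the chunk above can be tried on a few values in isolation. The sketch below is illustrative only: the `day_month` strings are hypothetical stand-ins for the `Date` column (its exact text format is an assumption here), and month-name parsing assumes an English locale.

```r
library(lubridate)

# Hypothetical day-month strings standing in for the Date column (assumed format)
day_month <- c("31 Dec", "1 Jul", "14 Feb")

# The year is missing, so it is pasted on as text before parsing with dmy()
dates <- dmy(paste(day_month, "2023"))
dates

month(dates, label = TRUE, abbr = TRUE)  # month as an abbreviated, ordered factor
wday(dates, label = TRUE, abbr = TRUE)   # weekday as an abbreviated, ordered factor
quarter(dates)                           # quarter as a number (1 to 4)
```

With `label = TRUE`, `month` and `wday` return ordered factors rather than numbers, which is why the output lists levels.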
Notice that we use the command read_csv to import data in this class.
True or False:
read_csv and read.csv are the same and can be used interchangeably to import data.
Hint: Here are three ways to determine this:
R help: In console type ?read_csv and/or type ?read.csv and look through documentation
Google R read_csv and read.csv
Ask ‘ChatGPT’, ‘Copilot’, or another AI search engine.
Note: R help files are sometimes hard to decipher, and Googling often requires time and effort, but both are excellent resources. AI search engines are getting better, but are not always 100% accurate.
[1] Dec Nov Oct Sep Aug Jul Jun May Apr Mar Feb Jan
Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
[1] Su Sa F Th W T M
Levels: M T W Th F Sa Su
    month              monthF        day               wkdayF
 Length:365         Jan    : 31   Length:365         M :52
 Class :character   Mar    : 31   Class :character   T :52
 Mode  :character   May    : 31   Mode  :character   W :52
                    Jul    : 31                      Th:52
                    Aug    : 31                      F :52
                    Oct    : 31                      Sa:52
                    (Other):179                      Su:53
# A tibble: 6 × 10
  date       monthF wkdayF quart num_releases num1gross num1grossM top10gross
  <date>     <fct>  <fct>  <dbl>        <int>     <dbl>      <dbl>      <dbl>
1 2023-12-31 Dec    Su         4           43   5208897       5.21   23078184
2 2023-12-30 Dec    Sa         4           44   8637841       8.64   40050370
3 2023-12-29 Dec    F          4           44   8630268       8.63   37348409
4 2023-12-28 Dec    Th         4           46   7988504       7.99   33261609
5 2023-12-27 Dec    W          4           45   8135639       8.14   33892628
6 2023-12-26 Dec    T          4           45   8970413       8.97   41788862
# ℹ 2 more variables: top10grossM <dbl>, num1pct <dbl>
Week 3 In-class Exercises - Q2
Session ID: bua455s25
This is BB Question 2 in HW 3
The correct command used to convert a numeric variable to an integer variable is
____().
When you glimpse the data after Part 2 (Chunk 3) in HW 3, the type for the num_releases variable is shown as
<____> instead of <dbl>.
Grouping and Filtering Data
We can filter data by value within each group.
R command group_by allows us to group data before we filter.
Data are filtered by value WITHIN each specified group
Ungrouping data afterwards using ungroup is not required, but often helpful.
The example below is not used in the subsequent summary but can be very useful.
```{r}
#|label: filter to last day of month
mojo_23_mnth_end <- mojo_23_mod |>
  select(date, monthF, top10grossM) |>
  group_by(monthF) |>                # doesn't change data appearance
  filter(date == max(date)) |>
  ungroup() |>                       # ungroup not required but helpful
  glimpse()
```
Rows: 12
Columns: 3
$ date <date> 2023-12-31, 2023-11-30, 2023-10-31, 2023-09-30, 2023-08-3…
$ monthF <fct> Dec, Nov, Oct, Sep, Aug, Jul, Jun, May, Apr, Mar, Feb, Jan
$ top10grossM <dbl> 23.08, 5.28, 9.82, 30.32, 5.27, 30.83, 41.92, 14.13, 27.13…
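To see what grouping changes, it can help to run the same `filter` with and without `group_by`. The sketch below uses a small made-up dataset (`toy`, with hypothetical values) rather than the course data, and filters on the largest gross instead of the latest date.

```r
library(dplyr)

# Hypothetical toy data: three days in each of two months
toy <- tibble(
  date  = as.Date(c("2023-01-05", "2023-01-20", "2023-01-31",
                    "2023-02-03", "2023-02-14", "2023-02-28")),
  month = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  gross = c(12.1, 44.6, 9.8, 21.2, 6.5, 12.4)
)

# Without grouping: max(gross) is the overall maximum, so one row is kept
overall_top <- toy |>
  filter(gross == max(gross))
overall_top

# With grouping: max(gross) is computed within each month, so one row per month
top_days <- toy |>
  group_by(month) |>
  filter(gross == max(gross)) |>
  ungroup()
top_days
```

The only difference between the two pipelines is the `group_by` step, which changes the set of rows that `max` is evaluated over.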
Grouping and Summarizing Data
We will summarize data and then reshape it for a summary table.
R commands group_by and summarize allow us to summarize the data by category
When summarizing data, it is easier to select the variables you want first.
Plan what you want to do
```{r group and summarize}
mojo_23_smry <- mojo_23_mod |>
  select(monthF, wkdayF, top10grossM) |>
  group_by(monthF, wkdayF) |>          # doesn't change data appearance
  summarize(avg_top10gross = mean(top10grossM, na.rm=T),
            mdn_top10gross = median(top10grossM, na.rm=T),
            max_top10gross = max(top10grossM, na.rm=T)) |>
  ungroup() |>                         # ungroup not required but helpful
  glimpse()
```
`summarise()` has grouped output by 'monthF'. You can override using the
`.groups` argument.
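The message above is informational, not an error: after `summarize`, the result is still grouped by the first grouping variable. One way to avoid both the message and a separate `ungroup` step is the `.groups` argument of `summarize`. The sketch below uses a small made-up dataset (`toy`, hypothetical values), not the course data.

```r
library(dplyr)

# Hypothetical toy data: gross by month and weekday
toy <- tibble(
  month = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb"),
  day   = c("F",   "F",   "Sa",  "Sa",  "F",   "Sa"),
  gross = c(10, 20, 30, 50, 5, 15)
)

smry <- toy |>
  group_by(month, day) |>
  summarize(avg_gross = mean(gross, na.rm = TRUE),
            max_gross = max(gross, na.rm = TRUE),
            .groups = "drop")   # return an ungrouped tibble, no message
smry
```

Using `.groups = "drop"` is a matter of preference; calling `ungroup()` after `summarize`, as in the chunk above, gives the same ungrouped result.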
# A tibble: 10 × 3
Month Day max_top10gross
<fct> <chr> <dbl>
1 Jan M 32.6
2 Jan T 17.0
3 Jan W 12.1
4 Jan Th 10.9
5 Jan F 31.0
6 Jan Sa 44.6
7 Jan Su 36.2
8 Feb M 21.2
9 Feb T 12.4
10 Feb W 6.49
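The table above is in long format: one row per month and day combination. As a rough sketch of how the wide and long shapes relate, the example below reshapes a small made-up dataset (`long`, hypothetical values) with `pivot_wider` and then back with `pivot_longer`.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per Month/Day combination
long <- tibble(
  Month          = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  Day            = c("F", "Sa", "Su", "F", "Sa", "Su"),
  max_top10gross = c(30, 45, 35, 20, 25, 15)
)

# Wide format: one row per Month, one column per Day (compact, good for display)
wide <- long |>
  pivot_wider(id_cols = Month, names_from = Day, values_from = max_top10gross)
wide

# Back to long format: one row per Month/Day (convenient for plotting)
back <- wide |>
  pivot_longer(cols = !Month, names_to = "Day", values_to = "max_top10gross")
back
```

Here `cols = !Month` means "every column except `Month`"; a column range could be used instead.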
```{r}
#|label: area plot code
area_plt <- mojo_23_line_area |>
  ggplot() +
  # changed to geom_area; changed color to fill
  geom_area(aes(x=date, y=grossM, fill=type), linewidth=1) +
  theme_classic() +
  theme(legend.position="bottom") +
  scale_fill_manual(values=c("blue", "lightblue")) +  # changed color to fill
  labs(x="Date", y = "Gross ($Mill)", fill="",        # changed color to fill
       title="Top 10 and No. 1 Movie Gross by Date",
       subtitle="Jan. 1, 2023 - Dec. 31, 2023",
       caption="Data Source: www.boxofficemojo.com") +
  theme(plot.title = element_text(size = 20),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        legend.text = element_text(size = 12),
        plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
```
Week 3 In-class Exercises
Lecture 6 - Q1 - NOT ON PointSolutions
In class we will practice:
Running chunks and exporting a table.
Preview for 1 Question in Quiz 1 where you will:
Select variables from a provided dataset
Group and summarize data
Export a summary table as a .csv file and submit it.
Instructions for In-class Exercise
Save Week 3 R project to your computer.
Open this project by clicking on .Rproj file.
Open .Rmd file within open R project.
Run all chunks above this exercise.
Modify the chunk below to:
Round all values in columns 2-4 of mojo_23_fall_wknd to 1 decimal place using round.
Export mojo_23_fall_wknd as a .csv file with your name.
Submit this .csv file with your name in the Week 3 In-class Exercise in the In-class Exercises folder on Blackboard.
NOTE: This counts as part of your in-class participation for the Week 3 lectures (due Fri. at midnight).
R Code Chunk for In-class Exercise
Remove eval = F from the chunk header. This allows the code in the chunk to run when the file is rendered.
Remove the # and complete round command to round numeric columns (columns 2 - 4) to 1 decimal place.
Choose EITHER of the write_csv commands and edit it so dataset will be exported to the data folder with your name.
Delete write_csv command you don’t edit or put # symbols in front of it.
Submit the .csv file with your name in the filename.
```{r eval = F}
#|label: round and export summary dataset
mojo_23_fall_wknd |> glimpse() # examine data with glimpse

# round columns 2, 3 and 4 only

# export summary dataset using write_csv without piping
write_csv(mojo_23_fall_wknd, "data/Movie_Gross_Fall_2023_Weekends_FirstName_Last_Name.csv")

# export summary dataset using write_csv with piping
mojo_23_fall_wknd |>
  write_csv("data/Movie_Gross_Fall_2023_Weekends_FirstName_Last_Name.csv")
```
Week 3 In-class Exercises
Lecture 6 - Q2 - NOT ON PointSolutions
Practice:
If all the columns in a dataset are numeric, you can round the whole dataset at once with the command round(<name of dataset>).
Why wouldn’t that work for the dataset in the previous exercise, mojo_23_fall_wknd?
Hint: To answer this question, you are encouraged to
try running the command round(mojo_23_fall_wknd).
examine the data using glimpse.
Week 3 In-class Exercises - Q5
Session ID: bua455s25
Which of the following commands should NOT be used within a mutate command or a summarize command?
as.integer
factor
mean
filter
HW 3 Introduction
Purpose
This assignment will give you experience with:
Creating an R Project Directory folder with data and img folders (Review)
Creating, saving, and using a Quarto file (Review)
Importing data
Rendering a Quarto file to create an HTML file (Review)
Creating a README file (Review)
Using the dplyr commands along with commands to reshape and summarize data
Creating plots with some formatting
Week 3 In-class Exercises - Q6
Session ID: bua455s25
In HW 3, you will group the data by quarter and week day. This is Part 4 of HW 3 and is very similar to the group_by and summarize code covered in Lecture 5.
This is BB Question 3 in HW 3
Your grouped and summarized dataset, mojo_qtr_smry, has
____ rows and
____ columns
____ summary numeric variables
Key Points from This Week
Summarizing Data by Group
Use group_by to specify grouping variables followed by summarize
Within summarize, specify the type of summary, e.g., mean, median, max, etc.
Reshaping Data for Different Purposes
pivot_wider is useful for display tables
pivot_longer is useful for plots
Plotting Data
grouped barplots (stacked and side-by-side)
line plots and area plots
You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions is required during the semester.