Week 10 - Part 1

HW 5 - Part 1 and Related Skills

Author

Penelope Pooler Eisenbies

Published

October 27, 2025

Dashboard Project - Data Exploration

Groups have been assigned. Please meet and explore data resources this week.
- Proposal Rough draft due Thursday, 10/30 at 6:00 PM
- Course website has many data sources and I recommend using AI for data searches.

You are building a dashboard to achieve TWO GOALS:

Answer questions proposed.
Display your data management skills that include, but are not limited to:

acquiring data.
combining data from different sources.
managing data for different goals and curating data in visual and tabular formats.
documenting your work product.

Keep these goals in mind as you look for data and develop your proposals.

Collaborating on an R Project using Posit Cloud

Groups have been created on Posit Cloud.
Group membership invitations expire within 7 days.
- Create your account and accept invitation ASAP
- $5 Student subscription required before you can contribute to online group workspace.
Create multiple projects with your Workspace
- Two laptops cannot easily edit same file simultaneously.
  - Usually, Posit Cloud will make a ‘forked’ copy so you can both keep working.
- Collaboration is still doable but it’s not as seamless as Google Docs.
Course TAs and I will be members of all groups so we can help you as needed.
Demo Using HW 5.1 was done during Lecture 18 (see recording)

Tour of Dashboard Template (Review)

For HW 5, I have provided you with a partially complete dashboard.
I am also providing you with an empty template that you can upload to Posit Cloud and use for your project.
- This format is extremely flexible.
- You also can make multiple temporary projects with simple ’.qmd` files within your group workspace.
Note that the course websites for all of my courses were created using Quarto Dashboard, because the format is so flexible and easy to use.
When working with your data, you can start out working in an unformatted .qmd file and then transfer work to your dashbard page.

New Skills Covered in Previous Lecture

Using ifelse to create a categorical variable
- Can be combined with factor to create a factor variable
- Factor variables are useful for plots, tables, and analyses
  - Character (text variable) are easier to work with when managing and manipulating data.
  - Factor variables are preferred when plotting or presenting data in tables.
Modifying and cleaning text variables
- tolower and toupper
- str_trim (as well as str_squish and str_pad)
- gsub and ifelse

Data Mgmt for HW 5 - Part 1 Dashboard

Review of Previous Data Mgmt skills used

filter and select
group_by and summarize
pivot_wider and pivot_longer

Note: This week we will cover how to use pivot_wider and pivot_longer to complete and incomplete dataset.

In HW 5 - Part 1:
- Not needed for Netflix Movie data even though these data are not fully complete.
- Only the first year is incomplete and there are relatively little data in the first year.

HW 5 - Part 1 - Page 1

Value Boxes and Interactive Time Series Plots
Review of importing, managing, and plotting xts data.
- xts stands for extensible time series
Value Boxes do not have to be edited.
Multiple xts plot objects can be displayed using hw_grid
- in HW 5 plots are shown seprately without hw_grid.
You should also review (Not in Page 1):
- Converting a tibble to an xts
- Converting an xts to a tibble
New Skills
- Using links to cite data sources

HW 5 - Part 1 - Page 2 - Data Mgmt.

Managing data for Grouped Stacked Barplots with Facets and Summary Table
Previously Covered Skills
- pivot_longerto reshape nflx_mv data
- ifelse is used to simplify release period data
- mutate and factor is used to create two factor variables with specified levels
- filter is used to subset the data to release periods with enough data.
New Skills - Page 2 Plot is a grouped stack bar plot using facet_grid.
- facet_grid(~release_period) specifies side-by-side plots for each release period.
- scale_fill_brewer(palette = "Spectral") specifies the color palette.
- Theme and palette options (Part of Optional Extra Credit)
  - ggtheme options
  - Palette options and another link with a comprehensive list of palettes

HW 5 - Part 1 - Page 2 - Bar Plot and Table

Page 2 - Bar Plot

Be aware of how the barplot is specified in the geom_barplot statement.
Stacked bars were used to save space.

Page 2 Summary Table

The skills used to create to select and summarize data are review.
The table is presented using kable (also review).
Note that for projects, you can refine your table further, if needed, using commands from kableExtra package.
An alternative option for publishing tables is the gt package.

HW 5 - Part 1 - Page 2 - Area Plot

Useful new Skill:
- summing across rows: mutate(total = sum(c_across(comedies:kids)))
- sometimes used with rowwise() and ungroup but that is not needed here
Area Plot (Review)
- Similar to line plot but uses
  - geom_area(aes(x = , y = , fill= ))
- scale_x_continuous(breaks = ) used to show all years in data
- scale_fill_brewer used to specify same palette as Page 2

Week 10 In-class Exercises

This short exercise demonstrates why a complete dataset with all combination levels is often needed for area plots.

Save R gtcars dataset to Global Environment and examine the data.
- Note: This dataset comes from the gt package.

Code

```{r}
#|label: save and examine data
my_gtcars <- gtcars |> glimpse()
```

Rows: 47
Columns: 15
$ mfr         <chr> "Ford", "Ferrari", "Ferrari", "Ferrari", "Ferrari", "Ferra…
$ model       <chr> "GT", "458 Speciale", "458 Spider", "458 Italia", "488 GTB…
$ year        <dbl> 2017, 2015, 2015, 2014, 2016, 2015, 2017, 2015, 2015, 2015…
$ trim        <chr> "Base Coupe", "Base Coupe", "Base", "Base Coupe", "Base Co…
$ bdy_style   <chr> "coupe", "coupe", "convertible", "coupe", "coupe", "conver…
$ hp          <dbl> 647, 597, 562, 562, 661, 553, 680, 652, 731, 949, 573, 545…
$ hp_rpm      <dbl> 6250, 9000, 9000, 9000, 8000, 7500, 8250, 8000, 8250, 9000…
$ trq         <dbl> 550, 398, 398, 398, 561, 557, 514, 504, 509, 664, 476, 436…
$ trq_rpm     <dbl> 5900, 6000, 6000, 6000, 3000, 4750, 5750, 6000, 6000, 6750…
$ mpg_c       <dbl> 11, 13, 13, 13, 15, 16, 12, 11, 11, 12, 21, 16, 11, 16, 12…
$ mpg_h       <dbl> 18, 17, 17, 17, 22, 23, 17, 16, 16, 16, 22, 22, 18, 20, 20…
$ drivetrain  <chr> "rwd", "rwd", "rwd", "rwd", "rwd", "rwd", "awd", "awd", "r…
$ trsmn       <chr> "7a", "7a", "7a", "7a", "7a", "7a", "7a", "7a", "7a", "7a"…
$ ctry_origin <chr> "United States", "Italy", "Italy", "Italy", "Italy", "Ital…
$ msrp        <dbl> 447000, 291744, 263553, 233509, 245400, 198973, 298000, 29…

Week 10 In-class Exercises Cont’d

Select the following variables
- year, bdy_style (body style), drivetrain (drive train)
Group data by year, body style, and drive train and count number of observations
- command to summarize number of observations is n()

Code

```{r}
#|label: select vars then group data and summarize
my_gtcars <- my_gtcars |>
  select(year, bdy_style, drivetrain) |>
  group_by(year, bdy_style, drivetrain) |>
  summarize(n = n())                              # new command n()
```

`summarise()` has grouped output by 'year', 'bdy_style'. You can override using
the `.groups` argument.

Code

```{r}
head(my_gtcars)                                   # incomplete dataset
```

# A tibble: 6 × 4
# Groups:   year, bdy_style [5]
   year bdy_style   drivetrain     n
  <dbl> <chr>       <chr>      <int>
1  2014 coupe       rwd            2
2  2015 convertible rwd            2
3  2015 coupe       awd            3
4  2015 coupe       rwd            4
5  2016 convertible rwd            2
6  2016 coupe       awd            3

Week 10 In-class Exercises Con’t

Use pivot_wider to create a column for each drivetrain
Replace missing values with 0

Code

```{r reshape and replace NAs with 0s}
my_gtcars <- my_gtcars |> 
  pivot_wider(id_cols= c("year", "bdy_style"), names_from = drivetrain, values_from = n)
my_gtcars[is.na(my_gtcars)] <- 0  # convert missing values to 0
head(my_gtcars)
```

# A tibble: 6 × 4
# Groups:   year, bdy_style [6]
   year bdy_style     rwd   awd
  <dbl> <chr>       <int> <int>
1  2014 coupe           2     0
2  2015 convertible     2     0
3  2015 coupe           4     3
4  2016 convertible     2     0
5  2016 coupe          13     3
6  2016 hatchback       0     2

Week 10 In-class Exercises - Q1

Poll Everywhere - My User Name: penelopepoolereisenbies685

Sum across rows to total data from both drive trains

Q1: What is the command to sum across rows?

Code

```{r}
#|label: sum across rows to create total
# simple version
my_gtcars <- my_gtcars |>                      
   mutate(total = sum(c_across(rwd:awd))) 

# more complete version with rowwise and ungroup
my_gtcars <- my_gtcars |>                      
   rowwise() |>
   mutate(total = sum(c_across(rwd:awd))) |>   
   ungroup()
```

Data with total column

# A tibble: 10 × 5
    year bdy_style     rwd   awd total
   <dbl> <chr>       <int> <int> <int>
 1  2014 coupe           2     0     2
 2  2015 convertible     2     0     2
 3  2015 coupe           4     3     7
 4  2016 convertible     2     0     2
 5  2016 coupe          13     3    16
 6  2016 hatchback       0     2     2
 7  2016 sedan           5     2     7
 8  2017 convertible     1     0     1
 9  2017 coupe           5     2     7
10  2017 sedan           0     1     1

Week 10 In-class Exercises - Q2-Q3

Poll Everywhere - My User Name: penelopepoolereisenbies685

Select year, body style and total

Q2: How many observations do you have?

Q3: If every body style was made in every year, how many observations would you have?

There are 4 body styles and 4 years represented in these data.
Also in this chunk: Command to capitalize first letter of words in a string: str_to_title

Code

```{r}
#|label: select vars of interest and examine

my_gtcars <- my_gtcars |> select(year, bdy_style, total) |>
  mutate(bdy_style = str_to_title(bdy_style)) |>    # Capitalize first letters
  glimpse()
```

Rows: 10
Columns: 3
$ year      <dbl> 2014, 2015, 2015, 2016, 2016, 2016, 2016, 2017, 2017, 2017
$ bdy_style <chr> "Coupe", "Convertible", "Coupe", "Convertible", "Coupe", "Ha…
$ total     <int> 2, 2, 7, 2, 16, 2, 7, 1, 7, 1

Fill in incomplete dataset using

# A tibble: 10 × 3
    year bdy_style   total
   <dbl> <chr>       <int>
 1  2014 Coupe           2
 2  2015 Convertible     2
 3  2015 Coupe           7
 4  2016 Convertible     2
 5  2016 Coupe          16
 6  2016 Hatchback       2
 7  2016 Sedan           7
 8  2017 Convertible     1
 9  2017 Coupe           7
10  2017 Sedan           1

Reshaping Data creates a 4x4 matrix with NAs

Code

```{r}
#|label:  pivot wider to reshape and include NAs
(my_gtcars1 <- my_gtcars |>
   pivot_wider(id_cols= year, names_from = bdy_style, values_from = total))
```

# A tibble: 4 × 5
   year Coupe Convertible Hatchback Sedan
  <dbl> <int>       <int>     <int> <int>
1  2014     2          NA        NA    NA
2  2015     7           2        NA    NA
3  2016    16           2         2     7
4  2017     7           1        NA     1

Reshaping data back to long format keeps NAs, categories with no observations

Code

```{r}
#|label: long form with NAs
my_gtcars1 <- my_gtcars1 |>
   pivot_longer(cols=Coupe:Sedan, 
                names_to = "bdy_style", 
                values_to = "total")
```

# A tibble: 16 × 3
    year bdy_style   total
   <dbl> <chr>       <int>
 1  2014 Coupe           2
 2  2014 Convertible    NA
 3  2014 Hatchback      NA
 4  2014 Sedan          NA
 5  2015 Coupe           7
 6  2015 Convertible     2
 7  2015 Hatchback      NA
 8  2015 Sedan          NA
 9  2016 Coupe          16
10  2016 Convertible     2
11  2016 Hatchback       2
12  2016 Sedan           7
13  2017 Coupe           7
14  2017 Convertible     1
15  2017 Hatchback      NA
16  2017 Sedan           1

These data values are number of observations so NAs can be replaced with 0s.

Code

```{r}
#|label:  convert NAs to 0s
my_gtcars1[is.na(my_gtcars1)] <- 0
```

# A tibble: 16 × 3
    year bdy_style   total
   <dbl> <chr>       <int>
 1  2014 Coupe           2
 2  2014 Convertible     0
 3  2014 Hatchback       0
 4  2014 Sedan           0
 5  2015 Coupe           7
 6  2015 Convertible     2
 7  2015 Hatchback       0
 8  2015 Sedan           0
 9  2016 Coupe          16
10  2016 Convertible     2
11  2016 Hatchback       2
12  2016 Sedan           7
13  2017 Coupe           7
14  2017 Convertible     1
15  2017 Hatchback       0
16  2017 Sedan           1

Complete code is relatively short and efficient.

Factors should be created AFTER this process, if needed.

Code

```{r complete conversion code, eval=FALSE}
my_gtcars1 <- my_gtcars |>
  
  pivot_wider(id_cols= year, 
              names_from = bdy_style, 
              values_from = total) |>
  
  pivot_longer(cols=Coupe:Sedan, 
               names_to = "bdy_style", 
               values_to = "total") |>
my_gtcars1[is.na(my_gtcars1)] <- 0
```

# A tibble: 16 × 3
    year bdy_style   total
   <dbl> <chr>       <int>
 1  2014 Coupe           2
 2  2014 Convertible     0
 3  2014 Hatchback       0
 4  2014 Sedan           0
 5  2015 Coupe           7
 6  2015 Convertible     2
 7  2015 Hatchback       0
 8  2015 Sedan           0
 9  2016 Coupe          16
10  2016 Convertible     2
11  2016 Hatchback       2
12  2016 Sedan           7
13  2017 Coupe           7
14  2017 Convertible     1
15  2017 Hatchback       0
16  2017 Sedan           1

Comparison of Area Plots

The plot on the left was made with the incomplete data.
The plot on the right was made with complete data where all levels of the variables shown are in the dataset.

Key Points from Today

Data Management Skills - Some Review and Some New

summarizing number of observations by category using n()
Using sum(c_across()) to sum across rows
Knowing how to complete data that is incomplete

Reminders:

Rough Draft Proposals due by 10/30 at 6:00 PM
HW 5 - Part 1 is due Wednesday, 10/29.
Quiz is on Thu. 11/6. Practice questions for Quiz 2 will be posted by 10/30.
Managing and presenting geographic data in choropleth maps will be covered on Thursday and is not on Quiz 2.

You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions are required during the semester.

--- title: "Week 10 - Part 1" subtitle: "HW 5 - Part 1 and Related Skills" author: "Penelope Pooler Eisenbies" date: last-modified lightbox: true toc: true toc-depth: 3 toc-location: left toc-title: "Table of Contents" toc-expand: 1 format: html: code-line-numbers: true code-fold: true code-tools: true execute: echo: fenced --- ## Dashboard Project - Data Exploration ```{r include=F} #|label: setup knitr::opts_chunk$set(echo=T, highlight=T) # specifies default options for all chunks options(scipen=100) # suppress scientific notation # install pacman if needed if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/") pacman::p_load(pacman, tidyverse, gridExtra, magrittr, gt, kableExtra, tidyquant, highcharter, dygraphs) # install and load required packages p_loaded() # verify loaded packages ``` - Groups have been assigned. Please meet and explore data resources this week. - Proposal Rough draft due Thursday, 10/30 at 6:00 PM - Course website has many data sources and I recommend using AI for data searches. ::: fragment **You are building a dashboard to achieve TWO GOALS:** ::: 1. Answer questions proposed. 2. Display your data management skills that include, but are not limited to: - acquiring data. - combining data from different sources. - managing data for different goals and curating data in visual and tabular formats. - documenting your work product. ::: fragment **Keep these goals in mind as you look for data and develop your proposals.** ::: ## ### Collaborating on an R Project using Posit Cloud - Groups have been created on Posit Cloud. - **Group membership invitations expire within 7 days.** - Create your account and accept invitation ASAP - \$5 Student subscription required before you can contribute to online group workspace. - Create multiple projects with your Workspace - Two laptops cannot easily edit same file simultaneously. - Usually, Posit Cloud will make a 'forked' copy so you can both keep working. - Collaboration is still doable but it's not as seamless as Google Docs. - Course TAs and I will be members of all groups so we can help you as needed. - Demo Using HW 5.1 was done during Lecture 18 (see recording) ## Tour of Dashboard Template (Review) - For HW 5, I have provided you with a partially complete dashboard. - I am also providing you with an empty template that you can upload to Posit Cloud and use for your project. - This format is extremely flexible. - You also can make multiple temporary projects with simple '.qmd\` files within your group workspace. - Note that the course websites for all of my courses were created using `Quarto Dashboard`, because the format is so flexible and easy to use. - When working with your data, you can start out working in an unformatted `.qmd` file and then transfer work to your dashbard page. ## ### New Skills Covered in Previous Lecture - Using `ifelse` to create a categorical variable - Can be combined with `factor` to create a factor variable - Factor variables are useful for plots, tables, and analyses - Character (text variable) are easier to work with when managing and manipulating data. - Factor variables are preferred when plotting or presenting data in tables. - Modifying and cleaning text variables - `tolower` and `toupper` - `str_trim` (as well as `str_squish` and `str_pad`) - `gsub` and `ifelse` ## Data Mgmt for HW 5 - Part 1 Dashboard **Review of Previous Data Mgmt skills used** - `filter` and `select` - `group_by` and `summarize` - `pivot_wider` and `pivot_longer` ::: fragment **Note:** This week we will cover how to use `pivot_wider` and `pivot_longer` to ***complete*** and incomplete dataset. ::: - In HW 5 - Part 1: - Not needed for Netflix Movie data even though these data are not fully complete. - Only the first year is incomplete and there are relatively little data in the first year. ## HW 5 - Part 1 - Page 1 :::::: columns ::: {.column width="68%"} - **Value Boxes and Interactive Time Series Plots** - Review of importing, managing, and plotting `xts` data. - `xts` stands for extensible time series - Value Boxes do not have to be edited. - Multiple `xts` plot objects can be displayed using `hw_grid` - in HW 5 plots are shown seprately without `hw_grid`. - You should also review (Not in Page 1): - Converting a `tibble` to an `xts` - Converting an `xts` to a `tibble` - New Skills - [Using links to cite data sources](https://quarto.org/docs/authoring/markdown-basics.html#links-images) ::: ::: {.column width="4%"} ::: ::: {.column width="28%"} ![](img/owl.png) ::: :::::: ## HW 5 - Part 1 - Page 2 - Data Mgmt. - **Managing data for Grouped Stacked Barplots with Facets and Summary Table** - **Previously Covered Skills** - `pivot_longer`to reshape `nflx_mv` data - `ifelse` is used to simplify `release period` data - `mutate` and `factor` is used to create two factor variables with specified levels - `filter` is used to subset the data to release periods with enough data. - **New Skills** - Page 2 Plot is a grouped stack bar plot using `facet_grid`. - `facet_grid(~release_period)` specifies side-by-side plots for each release period. - `scale_fill_brewer(palette = "Spectral")` specifies the color palette. - Theme and palette options (Part of Optional Extra Credit) - [ggtheme options](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/){target="_blank"} - [Palette options](http://www.sthda.com/english/wiki/colors-in-r#use-wes-anderson-color-palettes){target="_blank"} and another link with a [comprehensive list of palettes](https://github.com/EmilHvitfeldt/r-color-palettes){target="_blank"} ## ### HW 5 - Part 1 - Page 2 - Bar Plot and Table :::::::: columns ::::: {.column width="68%"} ::: fragment **Page 2 - Bar Plot** ::: - Be aware of how the barplot is specified in the `geom_barplot` statement. - Stacked bars were used to save space. ::: fragment **Page 2 Summary Table** ::: - The skills used to create to select and summarize data are review. - The table is presented using `kable` (also review). - Note that for projects, you can refine your table further, if needed, using commands from `kableExtra` package. - An alternative option for publishing tables is the `gt` package. ::::: ::: {.column width="4%"} ::: ::: {.column width="28%"} ![](img/beaver.png) ::: :::::::: ## ### HW 5 - Part 1 - Page 2 - Area Plot - **Useful new Skill:** - summing across rows: `mutate(total = sum(c_across(comedies:kids)))` - sometimes used with `rowwise()` and `ungroup` but that is not needed here - **Area Plot (Review)** - Similar to line plot but uses - `geom_area(aes(x = , y = , fill= ))` - `scale_x_continuous(breaks = )` used to show all years in data - `scale_fill_brewer` used to specify same palette as Page 2 ## Week 10 In-class Exercises - This short exercise demonstrates why a **complete** dataset with all combination levels is often needed for area plots. 1. Save R `gtcars` dataset to Global Environment and examine the data. - Note: This dataset comes from the [**gt**](https://gt.rstudio.com/){target="_blank"} package. ::: fragment ```{r} #|label: save and examine data my_gtcars <- gtcars |> glimpse() ``` ::: ## Week 10 In-class Exercises Cont'd 2. Select the following variables - year, bdy_style (body style), drivetrain (drive train) 3. Group data by year, body style, and drive train and count number of observations - command to summarize number of observations is `n()` ::: fragment ```{r} #|label: select vars then group data and summarize my_gtcars <- my_gtcars |> select(year, bdy_style, drivetrain) |> group_by(year, bdy_style, drivetrain) |> summarize(n = n()) # new command n() head(my_gtcars) # incomplete dataset ``` ::: ## Week 10 In-class Exercises Con't 4. Use pivot_wider to create a column for each drivetrain 5. Replace missing values with 0 ::: fragment ```{r reshape and replace NAs with 0s} my_gtcars <- my_gtcars |> pivot_wider(id_cols= c("year", "bdy_style"), names_from = drivetrain, values_from = n) my_gtcars[is.na(my_gtcars)] <- 0 # convert missing values to 0 head(my_gtcars) ``` ::: ## ### Week 10 In-class Exercises - Q1 [***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685** :::::::: columns :::: {.column width="48%"} 6. Sum across rows to total data from both drive trains ::: fragment **Q1:** What is the command to sum across rows? ```{r} #|label: sum across rows to create total # simple version my_gtcars <- my_gtcars |> mutate(total = sum(c_across(rwd:awd))) # more complete version with rowwise and ungroup my_gtcars <- my_gtcars |> rowwise() |> mutate(total = sum(c_across(rwd:awd))) |> ungroup() ``` ::: :::: ::: {.column width="4%"} ::: :::: {.column width="48%"} ::: fragment **Data with `total` column** ```{r echo=F} #|label: show data with total my_gtcars ``` ::: :::: :::::::: ## ### Week 10 In-class Exercises - Q2-Q3 [***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685** 7. Select year, body style and total ::: fragment **Q2:** How many observations do you have? ::: ::: fragment **Q3:** If every body style was made in every year, how many observations would you have? ::: - There are 4 body styles and 4 years represented in these data. - **Also in this chunk:** Command to capitalize first letter of words in a string: `str_to_title` ::: fragment ```{r} #|label: select vars of interest and examine my_gtcars <- my_gtcars |> select(year, bdy_style, total) |> mutate(bdy_style = str_to_title(bdy_style)) |> # Capitalize first letters glimpse() ``` ::: ## Fill in incomplete dataset using ::::::::::::::::::: panel-tabset ### [Incomplete Data]{style="color:blue;"} :::::: columns ::: {.column width="48%"} ```{r incomplete data, echo=F} my_gtcars ``` ::: ::: {.column width="4%"} ::: ::: {.column width="48%"} ![](img/beaver.png) ::: :::::: ### [`pivot_wider`]{style="color:blue;"} Reshaping Data creates a 4x4 matrix with `NA`s ```{r} #|label: pivot wider to reshape and include NAs (my_gtcars1 <- my_gtcars |> pivot_wider(id_cols= year, names_from = bdy_style, values_from = total)) ``` ### [`pivot_longer`]{style="color:blue;"} :::::: columns ::: {.column width="48%"} Reshaping data back to long format keeps `NA`s, categories with no observations ```{r} #|label: long form with NAs my_gtcars1 <- my_gtcars1 |> pivot_longer(cols=Coupe:Sedan, names_to = "bdy_style", values_to = "total") ``` ::: ::: {.column width="4%"} ::: ::: {.column width="48%"} ```{r long format with NAs, echo=F} my_gtcars1 ``` ::: :::::: ### [Replace `NA`s]{style="color:blue;"} :::::: columns ::: {.column width="48%"} These data values are number of observations so `NA`s can be replaced with `0`s. ```{r} #|label: convert NAs to 0s my_gtcars1[is.na(my_gtcars1)] <- 0 ``` ::: ::: {.column width="4%"} ::: ::: {.column width="48%"} ```{r show final dataset, echo=FALSE} my_gtcars1 ``` ::: :::::: ### [All Code]{style="color:blue;"} :::::: columns ::: {.column width="48%"} Complete code is relatively short and efficient. Factors should be created **AFTER** this process, if needed. ```{r complete conversion code, eval=FALSE} my_gtcars1 <- my_gtcars |> pivot_wider(id_cols= year, names_from = bdy_style, values_from = total) |> pivot_longer(cols=Coupe:Sedan, names_to = "bdy_style", values_to = "total") |> my_gtcars1[is.na(my_gtcars1)] <- 0 ``` ::: ::: {.column width="4%"} ::: ::: {.column width="48%"} ```{r reshow final dataset, echo=FALSE} my_gtcars1 ``` ::: :::::: ::::::::::::::::::: ## Comparison of Area Plots - The plot on the **left** was made with the **incomplete** data. - The plot on the **right** was made with **complete** data where all levels of the variables shown are in the dataset. ::: fragment ```{r area plot code, echo=F, fig.dim=c(15,7), fig.align='center'} gt_car_area_plot <- my_gtcars |> ggplot() + geom_area(aes(x=year, y=total, fill=bdy_style)) + theme_classic() + theme(legend.position="bottom") + scale_x_continuous(breaks=seq(2014,2017,1)) + scale_fill_brewer(palette = "Spectral") + labs(x="Year", y="Number of cars", fill="", title="Number of Cars of Each Body Type", subtitle="Incomplete Data for 2014 - 2017", caption="Data Source: https://CRAN.R-project.org/package=gt") + theme(plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), axis.title = element_text(size=18), axis.text = element_text(size=15), plot.caption = element_text(size = 10), legend.text = element_text(size = 12), panel.border = element_rect(colour = "lightgrey", fill=NA, linewidth=2), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2)) gt_car_area_plot1 <- my_gtcars1 |> ggplot() + geom_area(aes(x=year, y=total, fill=bdy_style)) + theme_classic() + theme(legend.position="bottom") + scale_x_continuous(breaks=seq(2014,2017,1)) + scale_fill_brewer(palette = "Spectral") + labs(x="Year", y="Number of cars", fill="", title="Number of Cars of Each Body Type", subtitle="Complete Data for 2014 - 2017", caption="Data Source: https://CRAN.R-project.org/package=gt") + theme(plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), axis.title = element_text(size=18), axis.text = element_text(size=15), plot.caption = element_text(size = 10), legend.text = element_text(size = 12), panel.border = element_rect(colour = "lightgrey", fill=NA, linewidth=2), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2)) grid.arrange(gt_car_area_plot, gt_car_area_plot1, ncol=2) ``` ::: ## ### Key Points from Today ::: fragment **Data Management Skills - Some Review and Some New** ::: - summarizing number of observations by category using `n()` - Using `sum(c_across())` to sum across rows - Knowing how to ***complete*** data that is incomplete ::: fragment **Reminders:** ::: - **Rough Draft Proposals due by 10/30 at 6:00 PM** - HW 5 - Part 1 is due Wednesday, 10/29. - **Quiz is on Thu. 11/6. Practice questions for Quiz 2 will be posted by 10/30.** - **Managing and presenting geographic data** in choropleth maps will be covered on Thursday and **is not on Quiz 2**. ::: fragment You may submit an 'Engagement Question' about each lecture until midnight on the day of the lecture. **A minimum of four submissions are required during the semester.** :::