Rows: 47
Columns: 15
$ mfr <chr> "Ford", "Ferrari", "Ferrari", "Ferrari", "Ferrari", "Ferra…
$ model <chr> "GT", "458 Speciale", "458 Spider", "458 Italia", "488 GTB…
$ year <dbl> 2017, 2015, 2015, 2014, 2016, 2015, 2017, 2015, 2015, 2015…
$ trim <chr> "Base Coupe", "Base Coupe", "Base", "Base Coupe", "Base Co…
$ bdy_style <chr> "coupe", "coupe", "convertible", "coupe", "coupe", "conver…
$ hp <dbl> 647, 597, 562, 562, 661, 553, 680, 652, 731, 949, 573, 545…
$ hp_rpm <dbl> 6250, 9000, 9000, 9000, 8000, 7500, 8250, 8000, 8250, 9000…
$ trq <dbl> 550, 398, 398, 398, 561, 557, 514, 504, 509, 664, 476, 436…
$ trq_rpm <dbl> 5900, 6000, 6000, 6000, 3000, 4750, 5750, 6000, 6000, 6750…
$ mpg_c <dbl> 11, 13, 13, 13, 15, 16, 12, 11, 11, 12, 21, 16, 11, 16, 12…
$ mpg_h <dbl> 18, 17, 17, 17, 22, 23, 17, 16, 16, 16, 22, 22, 18, 20, 20…
$ drivetrain <chr> "rwd", "rwd", "rwd", "rwd", "rwd", "rwd", "awd", "awd", "r…
$ trsmn <chr> "7a", "7a", "7a", "7a", "7a", "7a", "7a", "7a", "7a", "7a"…
$ ctry_origin <chr> "United States", "Italy", "Italy", "Italy", "Italy", "Ital…
$ msrp <dbl> 447000, 291744, 263553, 233509, 245400, 198973, 298000, 29…
Week 9
HW 5 - Part 1 - Quarto Dashboard and Data Mgmt
Dashboard Project - Data Exploration
Groups have been assigned so it is time to meet and explore data resources.
The course website has many data sources and I recommend search for data with AI
When deciding on a topic and data, keep the project parameters in mind.
You are building a dashboard to achieve TWO GOALS:
Answer questions proposed.
Display your data management skills that include, but are not limited to:
acquiring data.
combining data from different sources.
managing data for different goals and curating data in visual and tabular formats.
documenting your work product.
Keep these goals in mind as you look for data and develop your proposals.
Collaborating on an R Project using Posit Cloud
Groups have been created on Posit Cloud.
Group membership invitations expire within 7 days.
Accept Invitation ASAP
Create multiple projects with your Workspace
Two laptops cannot easily edit same project simultaneously.
Collaboration is still doable but it’s not as seamless as Google Docs.
Course TAs and I will be members of all groups so we can help you as needed.
Demo Using HW 5.1
Tour of Dashboard Template
For HW 5, I have provided you with a partially complete dashboard.
I am also providing you with an empty template that you can upload to Posit Cloud and use for your project.
This format is extremely flexible.
You also can make multiple temporary projects with simple ’.qmd` files within your group workspace.
Note that the course websites for all of my courses were created using
Quarto Dashboard
, because the format is so flexible and easy to use.When working with your data, you can start out working in an unformatted
.qmd
file and then transfer work to your dashbard page.
New Skills Covered in Previous Lecture
Using
ifelse
to create a categorical variableCan be combined with
factor
to create a factor variableFactor variables are useful for plots, tables, and analyses
- Character (text variable) are easier to work with when managing and manipulating data.
- Factor variables are preferred when plotting or presenting data in tables.
Modifying and cleaning text variables
tolower
andtoupper
str_trim
(as well asstr_squish
andstr_pad
)gsub
andifelse
Data Mgmt for HW 5 - Part 1 Dashboard
Review of Previous Data Mgmt skills used
filter
andselect
group_by
andsummarize
pivot_wider
andpivot_longer
Note: This week we will cover how to use pivot_wider
and pivot_longer
to complete and incomplete dataset.
In HW 5 - Part 1:
Not needed for Netflix Movie data even though these data are not fully complete.
Only the first year is incomplete and there are relatively little data in the first year.
HW 5 - Part 1 - Page 1
Value Boxes and Interactive Time Series Plots
Review of importing, managing, and plotting
xts
data.xts
stands for extensible time series
Value Boxes do not have to be edited.
Multiple
xts
plot objects can be displayed usinghw_grid
- in HW 5 plots are shown seprately without
hw_grid
.
- in HW 5 plots are shown seprately without
You should also review (Not in Page 1):
Converting a
tibble
to anxts
Converting an
xts
to atibble
New Skills
HW 5 - Part 1 - Page 2 - Data Mgmt.
Managing data for Grouped Stacked Barplots with Facets and Summary Table
Previously Covered Skills
pivot_longer
to reshapenflx_mv
dataifelse
is used to simplifyrelease period
datamutate
andfactor
is used to create two factor variables with specified levelsfilter
is used to subset the data to release periods with enough data.
New Skills - Page 2 Plot is a grouped stack bar plot using
facet_grid
.facet_grid(~release_period)
specifies side-by-side plots for each release period.scale_fill_brewer(palette = "Spectral")
specifies the color palette.Theme and palette options (Part of Optional Extra Credit)
Palette options and another link with a comprehensive list of palettes
HW 5 - Part 1 - Page 2 - Bar Plot and Table
Page 2 - Bar Plot
Be aware of how the barplot is specified in the
geom_barplot
statement.Stacked bars were used to save space.
Page 2 Summary Table
The skills used to create to select and summarize data are review.
The table is presented using
kable
(also review).Note that for projects, you can refine your table further, if needed, using commands from
kableExtra
package.An alternative option for publishing tables is the
gt
package.
HW 5 - Part 1 - Page 2 - Area Plot
Useful new Skill:
summing across rows:
mutate(total = sum(c_across(comedies:kids)))
sometimes used with
rowwise()
andungroup
but that is not needed here
Area Plot (Review)
Similar to line plot but uses
geom_area(aes(x = , y = , fill= ))
scale_x_continuous(breaks = )
used to show all years in datascale_fill_brewer
used to specify same palette as Page 2
Week 9 In-class Exercises
- This short exercise demonstrates why a complete dataset with all combination levels is often needed for area plots.
Save R
gtcars
dataset to Global Environment and examine the data.- Note: This dataset comes from the gt package.
Week 9 In-class Exercises Cont’d
Select the following variables
- year, bdy_style (body style), drivetrain (drive train)
Group data by year, body style, and drive train and count number of observations
- command to summarize number of observations is
n()
- command to summarize number of observations is
Code
`summarise()` has grouped output by 'year', 'bdy_style'. You can override using
the `.groups` argument.
# A tibble: 6 × 4
# Groups: year, bdy_style [5]
year bdy_style drivetrain n
<dbl> <chr> <chr> <int>
1 2014 coupe rwd 2
2 2015 convertible rwd 2
3 2015 coupe awd 3
4 2015 coupe rwd 4
5 2016 convertible rwd 2
6 2016 coupe awd 3
Week 9 In-class Exercises Con’t
Use pivot_wider to create a column for each drivetrain
Replace missing values with 0
Code
# A tibble: 6 × 4
# Groups: year, bdy_style [6]
year bdy_style rwd awd
<dbl> <chr> <int> <int>
1 2014 coupe 2 0
2 2015 convertible 2 0
3 2015 coupe 4 3
4 2016 convertible 2 0
5 2016 coupe 13 3
6 2016 hatchback 0 2
Week 9 In-class Exercises - Q1
Session ID: bua455s25
- Sum across rows to total data from both drive trains
Q1: What is the command to sum across rows?
Data with total
column
# A tibble: 10 × 5
year bdy_style rwd awd total
<dbl> <chr> <int> <int> <int>
1 2014 coupe 2 0 2
2 2015 convertible 2 0 2
3 2015 coupe 4 3 7
4 2016 convertible 2 0 2
5 2016 coupe 13 3 16
6 2016 hatchback 0 2 2
7 2016 sedan 5 2 7
8 2017 convertible 1 0 1
9 2017 coupe 5 2 7
10 2017 sedan 0 1 1
Week 9 In-class Exercises - Q2
Session ID: bua455s25
- Select year, body style and total
Q2: How many observations do you have?
Q3: If every body style was made in every year, how many observations would you have?
There are 4 body styles and 4 years represented in these data.
Also in this chunk: Command to capitalize first letter of words in a string:
str_to_title
Code
Rows: 10
Columns: 3
$ year <dbl> 2014, 2015, 2015, 2016, 2016, 2016, 2016, 2017, 2017, 2017
$ bdy_style <chr> "Coupe", "Convertible", "Coupe", "Convertible", "Coupe", "Ha…
$ total <int> 2, 2, 7, 2, 16, 2, 7, 1, 7, 1
Fill in incomplete dataset using
# A tibble: 10 × 3
year bdy_style total
<dbl> <chr> <int>
1 2014 Coupe 2
2 2015 Convertible 2
3 2015 Coupe 7
4 2016 Convertible 2
5 2016 Coupe 16
6 2016 Hatchback 2
7 2016 Sedan 7
8 2017 Convertible 1
9 2017 Coupe 7
10 2017 Sedan 1
Reshaping Data creates a 4x4 matrix with NA
s
Code
# A tibble: 4 × 5
year Coupe Convertible Hatchback Sedan
<dbl> <int> <int> <int> <int>
1 2014 2 NA NA NA
2 2015 7 2 NA NA
3 2016 16 2 2 7
4 2017 7 1 NA 1
Reshaping data back to long format keeps NA
s, categories with no observations
# A tibble: 16 × 3
year bdy_style total
<dbl> <chr> <int>
1 2014 Coupe 2
2 2014 Convertible NA
3 2014 Hatchback NA
4 2014 Sedan NA
5 2015 Coupe 7
6 2015 Convertible 2
7 2015 Hatchback NA
8 2015 Sedan NA
9 2016 Coupe 16
10 2016 Convertible 2
11 2016 Hatchback 2
12 2016 Sedan 7
13 2017 Coupe 7
14 2017 Convertible 1
15 2017 Hatchback NA
16 2017 Sedan 1
These data values are number of observations so NA
s can be replaced with 0
s.
# A tibble: 16 × 3
year bdy_style total
<dbl> <chr> <int>
1 2014 Coupe 2
2 2014 Convertible 0
3 2014 Hatchback 0
4 2014 Sedan 0
5 2015 Coupe 7
6 2015 Convertible 2
7 2015 Hatchback 0
8 2015 Sedan 0
9 2016 Coupe 16
10 2016 Convertible 2
11 2016 Hatchback 2
12 2016 Sedan 7
13 2017 Coupe 7
14 2017 Convertible 1
15 2017 Hatchback 0
16 2017 Sedan 1
Complete code is relatively short and efficient.
Factors should be created AFTER this process, if needed.
# A tibble: 16 × 3
year bdy_style total
<dbl> <chr> <int>
1 2014 Coupe 2
2 2014 Convertible 0
3 2014 Hatchback 0
4 2014 Sedan 0
5 2015 Coupe 7
6 2015 Convertible 2
7 2015 Hatchback 0
8 2015 Sedan 0
9 2016 Coupe 16
10 2016 Convertible 2
11 2016 Hatchback 2
12 2016 Sedan 7
13 2017 Coupe 7
14 2017 Convertible 1
15 2017 Hatchback 0
16 2017 Sedan 1
Comparison of Area Plots
The plot on the left was made with the incomplete data.
The plot on the right was made with complete data where all levels of the variables shown are in the dataset.
Key Points from This Week
Data Management Skills - Some Review and Some New
summarizing number of observations by category using
n()
Using
sum(c_across())
to sum across rowsKnowing how to complete data that is incomplete
Next Week:
HW 5 - Part 1 is due Wednesday, 3/26.
Practice questions for Quiz 2 are posted.
- Quiz is on Thu. 4/3.
Managing and presenting geographic data in choropleth maps (Not on Quiz 2).
You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions are required during the semester.