---
title: "Week 12"
subtitle: "More on Geographic Data, Project Management, and Publishing a Dashboard"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
html:
code-line-numbers: true
code-fold: true
code-tools: true
execute:
echo: fenced
---
## Housekeeping
```{r include=F}
#|label: setup
knitr::opts_chunk$set(echo=T, highlight=T) # specifies default options for all chunks
options(scipen=100) # suppress scientific notation
# install pacman if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
pacman::p_load(pacman, tidyverse, ggthemes, gridExtra, magrittr,
kableExtra, RColorBrewer, maps, usdata, countrycode,
mapproj, shadowtext, grid)
# install and load required packages
p_loaded() # verify loaded packages
```
- Final Proposals and Group Projects
- Your participation in the final proposal and plan
will affect your final grade on the project.
- Group members that don't contribute will not get
credit for the work done by others in the group.
- If you have data management questions, reach out to
myself or a course TA.
- We are here to help with tasks where you might be
stymied, but don't wait until the last day.
- Presentations will be on Thursday 12/4 and Tuesday 12/9.
- Only presentation dashboards are due on 12/4.
- All students are required to attend and provide
feedback.
## More Housekeeping
- HW 5 - Part 2 is now posted. This assignment is done on
Posit Cloud to give you experience with this platform.
- There is a 2 day grace period, if needed.
- This is a short assignment that covers some final
essential skills for your dashboard project.
- Quiz 2 grading is almost done. There are a couple
students taking a make-up this evening.
## Plans for this week
- Two lectures on Geographic Data have been streamlined so
that students also have time for group work this week.
- I will not cover them all in detail
- Rather than deleting notes and code that might be useful
to some students all notes are provided.
::: fragment
**Topics Covered**
:::
- Geographic Data: world data, state data, and filtering
map data to a region
- Publishing work: More tips for good project management
- Posting HTML files for free using
[**Rpubs**](https://rpubs.com/)
- Note: [**Rpubs**](https://rpubs.com/) is the
most common option for presenting and submitting
your dashboard.
- HW 5 - Part 2 Demo on Posit Cloud
- **There will be time to work on projects in class this
week and next week.**
## Importing and Joining World Datasets
**World Data**
```{r world data prep}
world <- map_data("world") |> select(!subregion) |> # world geo info
mutate(region=ifelse(region=="UK", "United Kingdom", region))
intbxo <- read_csv("data/intl_bxo.csv", show_col_types = F, skip=6) |> # import/tidy bxo
select(1,6) |>
rename("region" = "Area", "wknd_gross" = "Weekend Gross") |>
filter(!is.na(wknd_gross)) |>
mutate(wknd_gross = gsub("$", "", wknd_gross, fixed = T),
wknd_gross = gsub(",", "", wknd_gross, fixed = T) |> as.numeric())
world_bxo_data <- full_join(intbxo, world) |> # join datasets
filter(!is.na(wknd_gross))
world_bxo_data$continent = countrycode(sourcevar = world_bxo_data$region, # retrieve continents
origin = "country.name",
destination = "continent")
head(world_bxo_data, 3)
```
## Choropleth Country Plot w/ Labels
::: fragment
**Example - Asia**
:::
- **Most** of the plot code that follows is review
- There are a few new details:
- `shadowtext` labels (see below)
- modifying size of text elements (mentioned but
not emphasized)
- **NOTES:**
- The R package `shadowtext` includes the command
`geom_shadowtext`
- `shadowtext` is useful for creating visible labels
for all countries regardless of map fill color
- Deciding on units (\$1000) and transformation
(`log`) took some trial and error.
## Managing Data for Asia Chropleth Map
**This R code creates the Asia Map dataset.**
```{r asia data for map}
asia_bxo_data <- world_bxo_data |> # create asia box office dataset
filter(continent=="Asia") |>
mutate(Gross = as.integer(wknd_gross),
wknd_gross = wknd_gross/1000)
asia_nms <- asia_bxo_data |> # create dataset of country names
select(region, long, lat, group, continent) |> # median lat and long
# used for label positions
group_by(continent, region) |>
summarize(nm_x=median(long, na.rm=T),
nm_y=median(lat, na.rm=T)) |>
filter(!is.na(nm_x) | !is.na(nm_y))
asia_bxo_data <- full_join(asia_bxo_data, asia_nms) # merge datasets using an inner_join
```
## R code for Asia Choropleth Map
Data are shown on log scale to improve interpretability.
```{r asia static map code}
asia_bxo_map <- asia_bxo_data |> # Creates the map that follows
ggplot(aes(x=long, y=lat, group=group, fill=wknd_gross)) +
geom_polygon(color="darkgrey") +
theme_map() +
coord_map("albers", lat0 = 39, lat1 = 45) +
labs(fill= "Gross ($1K)",
title="Weekend Gross ($ Thousands) in Asian Countries",
subtitle="Weekend Data Updated 11/10/25 - Data are Log-transformed",
caption="Data Source: https://www.boxofficemojo.com") +
scale_fill_continuous(type = "viridis", trans="log",
breaks =c(1,10,100,1000,10000)) +
geom_shadowtext(aes(x=nm_x, y=nm_y,label=region),
color="white",check_overlap = T,
show.legend = F, size=4) +
theme(plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
legend.title = element_text(size = 15),
plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
```
##
### Asia Map with Log (LN) Transformation
```{r fig.dim=c(15,7), echo=F, warning=F}
asia_bxo_map
```
## Europe Map Data
Creates data for Europe Map
```{r europe data for map}
euro_bxo_data <- world_bxo_data |> # create Europe box office dataset
filter(continent=="Europe" & region != "Russia") |>
mutate(Gross = as.integer(wknd_gross),
wknd_gross = wknd_gross/1000)
euro_nms <- euro_bxo_data |> # create dataset of country names
select(region, long, lat, group, continent) |> # median lat and long used for position
group_by(continent, region) |>
summarize(nm_x=median(long, na.rm=T),
nm_y=median(lat, na.rm=T)) |>
filter(!is.na(nm_x) | !is.na(nm_y))
euro_bxo_data <- full_join(euro_bxo_data, euro_nms) # merge datasets using an inner_join
```
## R code for Europe Choropleth Map
Data are shown on log scale to improve interpretability.
```{r europe static map code}
euro_bxo_map <- euro_bxo_data |>
ggplot(aes(x=long, y=lat,
group=group,
fill=wknd_gross)) +
geom_polygon(color="darkgrey") +
theme_map() +
coord_map("albers", lat0 = 39, lat1 = 45) +
labs(fill= "Gross ($1K)",
title="Weekend Gross ($ Thousands) in European Countries",
subtitle="Weekend Data Updated 11/10/25 - Data are Log-transformed",
caption="Data Source: https://www.boxofficemojo.com") +
scale_fill_continuous(type = "viridis", trans="log",
breaks =c(1,10,100,1000,10000)) +
geom_shadowtext(aes(x=nm_x, y=nm_y,label=region),
color="white",check_overlap = T,
show.legend = F, size=4) +
theme(plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
legend.title = element_text(size = 15))
```
##
### Europe Map with Log (LN) Transformation
```{r fig.dim=c(15,7), echo=F, warning=F, fig.align='center', warning=F}
euro_bxo_map
```
##
### Week 12 In-class Exercises - Q1-Q3
[***Poll
Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} -
My User Name: **penelopepoolereisenbies685**
<br>
**Question 1.** What option is used in `geom_polygon()` to
create the outlines of each country?
<br>
**Question 2.** How many different geometries (`geom_...`)
are used to create these multi-layer maps?
<br>
**Question 3.** When using multiple geometry layers, where
do you place the aesthetic, (`aes`) so that it will apply to
all of the geometries (all of the map layers)?
## US State Data Example
- Examples of Data that can be plotted by state
- Average costs and expenditures by state of specific
goods or services
- Demographic data
- Voting and tex information
- Sports/Arts/Entertainment/Education investments and
expenditures
- Will also show a map of data filtered by region
## US State Map Data
```{r combine state polygons with state population data from R}
us_states <- map_data("state") |> # state polygons (from R)
select(long:region) |>
rename("state" = "region")
state_abbr <- state_stats |> # many useful variables in this dataset
select(state, abbr) |>
mutate(state = tolower(state))
state_pop <- county_2019 |> # data by county (aggregated by state)
select(state, pop) |>
mutate(state=tolower(state),
popM = pop/1000000) |>
group_by(state) |>
summarize(st_popM = sum(popM, na.rm=T)) |>
full_join(state_abbr)
statepop_map <- left_join(us_states, state_pop) # used left join to filter to lower 48 states
# lat/long not available for Hi and AK
```
##
### Adding State Midpoint (centroid) Lat and Long
- In the previous maps (by country) country labels were
added to the static map using each polygon's (country)
median latitude and longitude
- Medians don't work well for U.S. because many states are
oddly shaped and small.
- Alternative: [use centroid for each state
polygon](https://www.latlong.net/category/states-236-14.html)
- Centroid is another term for midpoint
- Saved data as .csv file named `state_coords.csv`
(included)
- Data did not include D.C. but those coordinates
were found elsewhere
- D.C. data is appended to other states using
`bind_rows`
- `state_coords` (centroids) were joined with
state demographics data, `statepop_map`.
- Final dataset for plot created: `statepop_map`
## Code for Addings Centroids to data
```{r add lat and long of state midpoints (centroid)}
state_coords <- read_csv("data/state_coords.csv", show_col_types = F,
col_names = c("state", "m_lat", "m_long")) |>
mutate(state = gsub(", USA", "", state, fixed=T),
state = gsub(", the USA", "", state, fixed=T),
state = gsub(", the US", "", state, fixed=T),
state = tolower(state))
state <- "district of columbia" # save values for dc
m_lat <- 38.9072
m_long <- -77.0369
dc <- tibble(state, m_lat, m_long) # create dataset of dc data ( 1 obs)
state_coords <- bind_rows(state_coords, dc) # add dc to state_coords
rm(dc, state, m_lat, m_long) # remove temporary values from global
statepop_map <- left_join(statepop_map, state_coords) # centroids to data
```
## State Population Plot
- Similar to previous plots with a few changes
- Added borders to states by adding `color="darkgrey"`
to `geom_polygon` command.
- Used State abbreviations for state labels.
- Made State text labels smaller (Size = 2)
- Changed breaks for log scaled population legend
- These details seem minor but they take time and trial
and error.
##
### R Code for US State Pop. Map (no transformation)
```{r code for us states pop map no transformation}
st_pop <- statepop_map |>
ggplot(aes(x=long, y=lat, group=group, fill=st_popM)) +
geom_polygon(color="darkgrey") +
theme_map() +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_continuous(type = "viridis") +
geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr),
color="white", check_overlap = T,
show.legend = F, size=4) +
labs(fill= "Pop. in Millions", title="Population by State",
subtitle="Unit is 1 Million People",
caption= "Not Shown: HI: 1.42 Million AK: 0.74 Million
Data Source: https://CRAN.R-project.org/package=usdata") +
theme(legend.position = "bottom",
legend.key.width = unit(1, "cm"),
plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
plot.caption = element_text(size = 15),
legend.text = element_text(size = 15),
legend.title = element_text(size = 15))
```
##
### US State Pop. Map (no transformation)
```{r us states pop map no transformation, echo=F, fig.dim=c(15,7), fig.align='center'}
st_pop
```
##
### R Code for US State Pop. Map (log transformed)
```{r code for us states pop map with log transformation}
st_lpop <- statepop_map |>
ggplot(aes(x=long, y=lat, group=group, fill=st_popM)) +
geom_polygon(color="darkgrey") +
theme_map() +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_continuous(type = "viridis", trans="log",
breaks=c(0,1,2,3,5,10,20,35)) +
geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr),
color="white", check_overlap = T,
show.legend = F, size=4) +
labs(fill= "Pop. in Millions", title="Population by State",
subtitle="Unit is 1 Million People - Log Transformed",
caption= "Not Shown: HI: 1.42 Million AK: 0.74 Million
Data Source: https://CRAN.R-project.org/package=usdata") +
theme(legend.position = "bottom",
legend.key.width = unit(1, "cm"),
plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
plot.caption = element_text(size = 15),
legend.text = element_text(size = 15),
legend.title = element_text(size = 15))
```
##
### US State Pop. Map (log transfomed)
```{r us states pop map log transformation, echo=F, fig.dim=c(15,7), fig.align='center'}
st_lpop
```
## To log or not to log
:::::: columns
::: {.column width="48%"}
In this course, we visualize data using `ggplot`, `hchart`,
`dygraph`
If you want to explore but (not present) data, you can also
use base graphics for quick plots
- Base graphics could also be used to make polished
visualizations but the code is much longer and more
tedious than `ggplot`
:::
::: {.column width="4%"}
:::
::: {.column width="48%"}
```{r base graphics plots, fig.dim=c(5, 6), fig.align='center', out.extra='style="background-color: #3D3D3D; padding:1px;"'}
par(mfrow=c(2,1)) # stacks base graph plots
hist(statepop_map$st_popM, main="")
hist(log(statepop_map$st_popM), main="")
par(mfrow=c(1,1)) # resets base graph options
```
:::
::::::
## Filtering a Map to a Region
- Map techniques above can also be used for a region
- Demo that follows uses an education dataset with data
filtered to 10 Northeastern states
::: fragment
```{r import modify filter education data}
edu <- read_csv("data/education by state.csv", skip=3, show_col_types = F, # import data
col_names = c("state", "pop_over_25", "pop_hs", "pct_hs",
"pop_bachelor", "pct_bachelor",
"pop_advanced","pct_advanced"))
edu1 <- edu |>
select(state, pop_bachelor, pct_bachelor) |>
mutate(state = str_trim(state) |> tolower(),
pop_bachelor1K = pop_bachelor/1000,
pct_bachelor = gsub("%","", pct_bachelor, fixed = T) |> as.numeric()) |>
filter(state %in% c("maine", "massachusetts", "connecticut" , "rhode island",
"vermont", "new hampshire", "new york", "new jersey", "pennsylvania",
"delaware")) |> glimpse()
```
:::
## Exploratory Bachelor Degree Data Plots
::::::: columns
:::: {.column width="45%"}
::: r-fit-text
**Base R Plot Code:**
```
# par command for two rows no margins
par(mar = c(0, 0, 0, 0), mfrow=c(2,1))
hist(edu1$pop_bachelor1K, main="")
plot(edu1$pop_bachelor1K, edu1$pct_bachelor, main="")
```
<br>
- `hist` is used to create an exploratory histogram.
- `plot` is used to create an exploratory scatterplot.
- `main=""` suppressed the default title
- Formatting base graphics plots is much more tedious than
using ggplot.
:::
::::
::: {.column width="2%"}
:::
::: {.column width="53%"}
```{r base R scatterplot and histogram, echo=FALSE, fig.dim=c(6,6)}
par(mar = c(0, 0, 0, 0), mfrow=c(2,1)) #two rows
hist(edu1$pop_bachelor1K, main="")
plot(edu1$pop_bachelor1K, edu1$pct_bachelor, main="")
```
:::
:::::::
##
### Week 12 In-class Exercises - Q4-Q5
[***Poll
Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} -
My User Name: **penelopepoolereisenbies685**
<br>
**Question 4.** What exploratory plot command (base R code
shown) is good for checking if the variable you want to plot
is right skewed and might need to be log transformed?
<br>
**Question 5.** Based on the histogram for the northeastern
area of the U.S, which includes only 10 states, do these
data appear skewed?
## Add Education Data to Map Data
In the chunk below we start from scratch with state data.
This chunk does not depend on the data being imported and
managed in a previous chunk.
```{r join edu data with state map and state abbr data}
us_states <- map_data("state") |> # state polygons (from R)
select(long:region) |> rename("state" = "region")
state_abbr <- state_stats |> # state abbreviations
select(state, abbr) |> mutate(state = tolower(state))
edu1 <- left_join(edu1, state_abbr) # left join to maintain filter to NE states
edu_NE_map <- left_join(edu1, us_states) # left join to maintain filter to NE states
state_coords <- read_csv("data/state_coords.csv", show_col_types = F, # add in state midpoints (centroids)
col_names = c("state", "m_lat", "m_long")) |>
mutate(state = gsub(", USA", "", state, fixed=T),
state = gsub(", the USA", "", state, fixed=T),
state = gsub(", the US", "", state, fixed=T),
state = tolower(state))
edu_NE_map <- left_join(edu_NE_map, state_coords) # left join to maintain filter to NE states
```
## Code for Regional Map 1
**Population with Bachelor's Degree**\
[Data Source -
Wikipedia](https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_educational_attainment)
```{r NE edu map pop}
ne_edu_pop <- edu_NE_map |>
ggplot(aes(x=long, y=lat, group=group, fill=pop_bachelor1K)) + # pop in 1000s
geom_polygon(color="darkgrey") +
theme_map() +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_continuous(type = "viridis", trans="log", # log transformation
breaks = c(100, 500, 1000, 5000)) +
geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr),
color="white", check_overlap = T, show.legend = F, size=4) +
labs(fill= "Unit: 1000 People",
title="NE States: Pop. with a Bachelor's Degree") +
theme(legend.position = "bottom",
legend.key.width = unit(1, "cm"),
plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
plot.caption = element_text(size = 15),
legend.text = element_text(size = 15),
legend.title = element_text(size = 15))
```
## Code for Regional Map 2
Percentage of People with Bachelor's Degree [Data Source -
Wikipedia](https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_educational_attainment)
```{r NE edu map pct}
ne_edu_pct <- edu_NE_map |>
ggplot(aes(x=long, y=lat, group=group, fill=pct_bachelor)) + # percent data
geom_polygon(color="darkgrey") +
theme_map() +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_continuous(type = "viridis", # no transformation needed
breaks = c(32, 34, 36, 38, 40, 42, 44)) +
geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr),
color="white", check_overlap = T, show.legend = F, size=4) +
labs(fill= "Unit: %", title="NE States: Percent with a Bachelor's Degree") +
theme(legend.position = "bottom",
legend.key.width = unit(1, "cm"),
plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 15),
plot.caption = element_text(size = 15),
legend.text = element_text(size = 15),
legend.title = element_text(size = 15))
```
## Pop. and Pcnt. Plots Side by Side
```{r display of NE pop and pct maps, fig.dim=c(15,7), fig.align='center', echo=FALSE}
grid.arrange(ne_edu_pop, ne_edu_pct, ncol=2)
grid.rect(width = .98, height = .98, gp = gpar(lwd = 2, col = "darkgrey", fill = NA))
```
## Managing Projects
- Some of this should be review
- next week, we will talk about managing a long term
consulting project
- Managing files over time
- Segmenting and rejoining poorly formatted data
- Documenting steps as you progress
- Addressing client needs as they eveolve and update
requests
- Documentation is key
- Take good notes and keep README file updated
- I use Markdown or Quarto files for everything, even work
I don't present to client.
- Ideal format for writing notes between code chunks
## BUA 455 Project File Conventions
- **Main Project Folder:**
- Dashboard `qmd` file (Quarto file)
- Dashboard .html file (Dashboard presentation)
- Project `.rproj` file that makes folder into an R
project.
- `README.txt` file that includes an organized of all
files you created or saved.
- Other files created when when .qmd file is rendered.
- These 'byproduct' files do not need to be listed
in the README.
- **Data (`data`) Folder:**
- All raw .csv files needed (No data management should be
done in Excel!)
- **Images (`img`) Folder:**
- Any .png or other graphics files needed
- **OPTIONAL:** Extraneous **useful** code can be saved in
a separate folder within the project.
## Rpubs Exercise
- **RPubs** (mentioned earlier in this set of slides)
- If you want to publish your dashboard or any HTML file
you create in R, you can do so for free.
- R has a public online repository called
[**RPubs**](https://rpubs.com/).
- **Rpubs** is very useful if you want post an html file
online and provide the link to it.
- I Use **RPubs** for slides in this course and it is
useful if for work like the project dashboards.
- As an in class exercise, I will ask you each to create
an account and publish your HW 5 - Part 1 dashboard html
file.
- This exercise will be useful because it allows you
to see how this publication process works.
- You will see how publishing changes the
appearance of your panels and text.
- Once you post your final dashboard you may want
to include it as a link in your resume and/or
LinkedIn profile.
## In-class Exercise
1. Open your HW 5 - Part 1.Rmd file and render it to create
your dashboard.
- Make sure this file has your name in the header.
- If you don't have HW 5 - Part 1 done, you can use the
[Posit Cloud version of `HW 5 - Part 1` provided for HW
5 - Part
2](https://posit.cloud/content/10125821){target="_blank"}
2. Click the **Publish Icon** ,
create a free account, and publish your html file.
- If RStudio asks to install additional packages to
complete the publishing process, click `Yes`.
3. Submit the link to your published file on Blackboard.
- A Link to your published file must be submitted by
Friday at midnight to count for class participation
for today's lecture.
## Next Week - Additional Topics
- Ask me questions about your project (Others may benefit)
- I have some short essential and some optional topics
including:
- details and recommendations for writing both project
memos.
- Memos will be written as word documents in
Quarto (`.qmd`).
- managing a consulting project from beginning to end.
- formatting complex tables using the `gt` package.
- Rendering Quarto files to different formats: word,
Powerpoint, etc.
## Additional Topics Continued
**Review of Skillset Terminology**
::: fragment
Now that you are (almost) done with BUA 455, and more so
when you graduate, you have a very useful set of skills.
:::
- Explaining these skills to others is a challenge.
- I will spend a little time talking about how to explain
those skills to other people
- Preview: It took me decades to figure out how to talk
about what I do, in part, because this discipline was
more obscure.
- Increased interest in Data Science and Analytics has
resulted in better terminology.
- [White Paper from DataCamp provides an excellent
blueprint](https://drive.google.com/file/d/1_VoM3D6tPftjZpXCnTL8SKYBlOM_4KjG/view?usp=sharing)
##
### Key Points from This Week
::: fragment
**More with Geographic Data**
:::
- Adding Shadow Text
- Filtering Map Data and Comparing Variables
::: fragment
**Project Management**
:::
- Review of skills covered throughout course
- Managing data projects this way is beneficial
::: fragment
**Publishing Work on RPubs**
:::
- Useful for publishing and linking to work
::: fragment
You may submit an 'Engagement Question' about each lecture
until midnight on the day of the lecture. **A minimum of
four submissions are required during the semester.**
:::