Warning: There was 1 warning in `mutate()`.
ℹ In argument: `wknd_gross = as.numeric(gsub(",", "", wknd_gross, fixed = T))`.
Caused by warning:
! NAs introduced by coercion
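The warning above means at least one `wknd_gross` value could not be converted to a number. A minimal sketch for finding the offending values, assuming the raw column contains placeholders such as "-" (the vector `raw` is made up for illustration and is not the course data):

```r
# Made-up example values; NOT taken from intl_bxo.csv
raw <- c("$19,000,000", "$1,250", "-", "n/a")

# Strip "$" and "," as literal strings, as in the mutate() call
clean <- gsub("$", "", raw, fixed = TRUE)
clean <- gsub(",", "", clean, fixed = TRUE)

# Convert; the coercion warning is suppressed because the NAs are inspected directly
num <- suppressWarnings(as.numeric(clean))

# Values that were present in the raw data but did not convert
bad <- raw[is.na(num) & !is.na(raw)]
print(bad)
```

If the values in `bad` are only placeholders for missing data, the warning is harmless because those rows are dropped by the later `filter(!is.na(wknd_gross))`.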
```{r world data prep}
world_bxo_data <- full_join(intbxo, world) |>   # join datasets
  filter(!is.na(wknd_gross))
```
Joining with `by = join_by(region)`
```{r world data prep}
world_bxo_data$continent <- countrycode(sourcevar = world_bxo_data$region,   # retrieve continents
                                        origin = "country.name",
                                        destination = "continent")
```
Warning: Some values were not matched unambiguously: Central America, Middle East Other, Serbia and Montenegro
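Regions listed in this warning get `NA` for `continent`, so they drop out of any map filtered by continent. One way to handle them is a small manual override table. The sketch below uses base R and made-up vectors (`region`, `continent`, and `overrides` are illustrative, and the continent chosen for each region is an assumption, not part of the course code). The `countrycode()` function also has a `custom_match` argument intended for this purpose.

```r
# Illustrative stand-ins for world_bxo_data$region and the countrycode() result
region    <- c("United Kingdom", "Central America", "Serbia and Montenegro",
               "Middle East Other", "Japan")
continent <- c("Europe", NA, NA, NA, "Asia")

# Hypothetical manual assignments for regions that were not matched
overrides <- c("Central America"       = "Americas",
               "Serbia and Montenegro" = "Europe")

# Fill in only the rows that are NA and have an override
fill <- is.na(continent) & region %in% names(overrides)
continent[fill] <- unname(overrides[region[fill]])

print(data.frame(region, continent))
```

"Middle East Other" is left as `NA` here because it does not correspond to a single country polygon.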
```{r world data prep}
head(world_bxo_data, 3)
```
# A tibble: 3 × 7
  region         wknd_gross  long   lat group order continent
  <chr>               <dbl> <dbl> <dbl> <dbl> <int> <chr>
1 United Kingdom   19000000 -1.07  50.7   570 40057 Europe
2 United Kingdom   19000000 -1.15  50.7   570 40058 Europe
3 United Kingdom   19000000 -1.18  50.6   570 40059 Europe
Choropleth Country Plot w/ Labels
Example - Asia
Most of the plot code that follows is review
There are a few new details:
shadowtext labels (see below)
modifying size of text elements (mentioned but not emphasized)
NOTES:
The R package shadowtext includes the command geom_shadowtext
shadowtext is useful for creating visible labels for all countries regardless of map fill color
Deciding on units ($1000) and transformation (log) took some trial and error.
Managing Data for Asia Choropleth Map
This R code creates the Asia Map dataset.
```{r asia data for map}
asia_bxo_data <- world_bxo_data |>               # create Asia box office dataset
  filter(continent == "Asia") |>
  mutate(Gross = as.integer(wknd_gross),
         wknd_gross = wknd_gross / 1000)

asia_nms <- asia_bxo_data |>                     # create dataset of country names
  select(region, long, lat, group, continent) |>
  group_by(continent, region) |>                 # median lat and long used for label positions
  summarize(nm_x = median(long, na.rm = T),
            nm_y = median(lat, na.rm = T)) |>
  filter(!is.na(nm_x) | !is.na(nm_y))
```
`summarise()` has grouped output by 'continent'. You can override using the
`.groups` argument.
```{r asia data for map}
asia_bxo_data <- full_join(asia_bxo_data, asia_nms)   # merge datasets using a full_join
```
Joining with `by = join_by(region, continent)`
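The choice of join matters when one table has regions the other lacks: `full_join()` keeps unmatched rows from both tables, while `inner_join()` keeps only rows that match. A minimal sketch using base R `merge()` so it runs without extra packages (the data frames `bxo` and `nms` and their values are made up for illustration):

```r
# Made-up box office data; one region has no label coordinates
bxo <- data.frame(region = c("Japan", "India", "Macau"),
                  wknd_gross = c(5200, 3100, 40))
nms <- data.frame(region = c("Japan", "India"),
                  nm_x = c(138, 79),
                  nm_y = c(36, 22))

# all = TRUE keeps unmatched rows from both tables, like dplyr::full_join()
full <- merge(bxo, nms, by = "region", all = TRUE)

# the default keeps only matching rows, like dplyr::inner_join()
inner <- merge(bxo, nms, by = "region")

print(full)    # Macau is kept, with NA label coordinates
print(inner)   # Macau is dropped
```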
R code for Asia Choropleth Map
Data are shown on log scale to improve interpretability.
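A small numeric sketch of why the log scale helps, assuming grosses (in $ thousands) that span several orders of magnitude. The `gross_1k` values are made up; the `breaks` are the legend breaks used for the box office maps.

```r
gross_1k <- c(19000, 4200, 350, 12, 3)    # made-up weekend grosses ($ thousands)
breaks   <- c(1, 10, 100, 1000, 10000)    # legend breaks on the maps

# On the raw scale the breaks are very unevenly spaced ...
print(diff(breaks))
# ... on the log scale every step is the same size, log(10)
print(diff(log(breaks)))

# Position of each value along the color scale (0 = lowest, 1 = highest)
raw_share <- (gross_1k - min(gross_1k)) / diff(range(gross_1k))
log_share <- (log(gross_1k) - min(log(gross_1k))) / diff(range(log(gross_1k)))

print(round(raw_share, 2))   # small markets are crowded near 0
print(round(log_share, 2))   # values are spread across the scale
```

On the raw scale most countries would be shaded with nearly the same color; after the log transformation the differences among smaller markets become visible.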
Europe Map Data
Creates data for the Europe map.
```{r europe data for map}
euro_bxo_data <- world_bxo_data |>               # create Europe box office dataset
  filter(continent == "Europe" & region != "Russia") |>
  mutate(Gross = as.integer(wknd_gross),
         wknd_gross = wknd_gross / 1000)

euro_nms <- euro_bxo_data |>                     # create dataset of country names
  select(region, long, lat, group, continent) |>
  group_by(continent, region) |>                 # median lat and long used for label positions
  summarize(nm_x = median(long, na.rm = T),
            nm_y = median(lat, na.rm = T)) |>
  filter(!is.na(nm_x) | !is.na(nm_y))
```
`summarise()` has grouped output by 'continent'. You can override using the
`.groups` argument.
```{r europe data for map}
euro_bxo_data <- full_join(euro_bxo_data, euro_nms)   # merge datasets using a full_join
```
Joining with `by = join_by(region, continent)`
R code for Europe Choropleth Map
Data are shown on log scale to improve interpretability.
Question 1. What option is used in geom_polygon() to create the outlines of each country?
Question 2. How many different geometries (geom_...) are used to create these multi-layer maps?
Question 3. When using multiple geometry layers, where do you place the aesthetic (aes) so that it applies to all of the geometries (all of the map layers)?
US State Data Example
Examples of Data that can be plotted by state
Average costs and expenditures by state of specific goods or services
Demographic data
Voting and tax information
Sports/Arts/Entertainment/Education investments and expenditures
Will also show a map of data filtered by region
US State Map Data
```{r combine state polygons with state population data from R}
us_states <- map_data("state") |>       # state polygons (from R)
  select(long:region) |>
  rename("state" = "region")

state_abbr <- state_stats |>            # many useful variables in this dataset
  select(state, abbr) |>
  mutate(state = tolower(state))

state_pop <- county_2019 |>             # data by county (aggregated by state)
  select(state, pop) |>
  mutate(state = tolower(state), popM = pop / 1000000) |>
  group_by(state) |>
  summarize(st_popM = sum(popM, na.rm = T)) |>
  full_join(state_abbr)
```
Joining with `by = join_by(state)`
```{r combine state polygons with state population data from R}
statepop_map <- left_join(us_states, state_pop)   # used left join to filter to lower 48 states
```
Joining with `by = join_by(state)`
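A left join acts as a filter here because the result keeps only the states present in the left (polygon) table. A minimal sketch with base R `merge()` and made-up data frames (`polys` and `pop` are illustrative stand-ins for `us_states` and `state_pop`; the coordinates and populations are approximate):

```r
# Two polygon vertices for Ohio, one for Utah; no Alaska polygons
polys <- data.frame(state = c("ohio", "ohio", "utah"),
                    long  = c(-84.8, -80.5, -111.0),
                    lat   = c(39.1, 41.9, 40.0))

# Population table includes a state with no polygons
pop <- data.frame(state   = c("ohio", "utah", "alaska"),
                  st_popM = c(11.7, 3.2, 0.74))

# all.x = TRUE keeps every row of the left table only, like dplyr::left_join()
left <- merge(polys, pop, by = "state", all.x = TRUE)
print(left)   # Alaska does not appear
```

One caution about this sketch: `merge()` can reorder rows, while `left_join()` keeps the row order of the left table. Row order matters for polygons, so the course code's `left_join()` is the safer choice for map data.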
```{r combine state polygons with state population data from R}
# lat/long not available for HI and AK
```
Adding State Midpoint (centroid) Lat and Long
In the previous maps (by country), country labels were added to the static map using the median latitude and longitude of each country's polygon.
Medians don't work well for the U.S. because many states are small and oddly shaped.
Alternative: use the centroid of each state polygon (https://www.latlong.net/category/states-236-14.html)
Centroid is another term for midpoint
Saved the data as a .csv file named state_coords.csv (included)
Data did not include D.C., but those coordinates were found elsewhere
D.C. data are appended to the other states using bind_rows
state_coords (centroids) were joined with the state demographics data, statepop_map.
Final dataset created for the plot: statepop_map
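A small sketch of why the median of polygon vertices can be a poor label position: the median is pulled toward wherever the boundary has the most vertices (for example, a detailed coastline). The polygon below is made up, and the bounding-box midpoint is used only as a simple stand-in for a centroid; it is not how the coordinates in `state_coords.csv` were computed.

```r
# Made-up "state": a rectangle with several extra vertices clustered on its east edge
state <- data.frame(long = c(0, 10, 10, 9.9, 9.8, 9.7, 0),
                    lat  = c(0,  0,  5, 5.1, 5.2, 5.3, 5))

# Median of the vertices (the approach used for the country labels)
med_long <- median(state$long)

# Midpoint of the bounding box (simple stand-in for a centroid)
mid_long <- mean(range(state$long))

print(c(median = med_long, midpoint = mid_long))
```

The median lands near the east edge of this shape, while the midpoint sits in the middle, which is where a label would be expected.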
Code for Adding Centroids to Data
```{r add lat and long of state midpoints (centroid)}
state_coords <- read_csv("data/state_coords.csv", show_col_types = F,
                         col_names = c("state", "m_lat", "m_long")) |>
  mutate(state = gsub(", USA", "", state, fixed = T),
         state = gsub(", the USA", "", state, fixed = T),
         state = gsub(", the US", "", state, fixed = T),
         state = tolower(state))

state <- "district of columbia"                 # save values for dc
m_lat <- 38.9072
m_long <- -77.0369
dc <- tibble(state, m_lat, m_long)              # create dataset of dc data (1 obs)

state_coords <- bind_rows(state_coords, dc)     # add dc to state_coords
rm(dc, state, m_lat, m_long)                    # remove temporary values from global environment

statepop_map <- left_join(statepop_map, state_coords)   # add centroids to data
```
Joining with `by = join_by(state)`
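The three `gsub()` calls above depend on their order: removing ", the US" before ", the USA" would leave a stray "A". As an alternative sketch, a single regular expression can handle all three suffixes at once. The vector `raw` is made up to mimic the suffixes handled above, and the pattern is an assumption about how the names in the file are written.

```r
# Made-up names showing the three suffix styles handled by the gsub() calls
raw <- c("Ohio, USA", "Utah, the USA", "Texas, the US")

# ", " then optional "the ", then "US" with an optional "A", at the end of the string
clean <- tolower(sub(", (the )?USA?$", "", raw))

print(clean)
```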
State Population Plot
Similar to previous plots with a few changes
Added borders to states by adding color="darkgrey" to geom_polygon command.
Used State abbreviations for state labels.
Adjusted the size of the state text labels (size = 4 in the code that follows)
Changed breaks for log scaled population legend
These details seem minor but they take time and trial and error.
R Code for US State Pop. Map (no transformation)
```{r code for us states pop map no transformation}
st_pop <- statepop_map |>
  ggplot(aes(x = long, y = lat, group = group, fill = st_popM)) +
  geom_polygon(color = "darkgrey") +
  theme_map() +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  scale_fill_continuous(type = "viridis") +
  geom_shadowtext(aes(x = m_long, y = m_lat, label = abbr), color = "white",
                  check_overlap = T, show.legend = F, size = 4) +
  labs(fill = "Pop. in Millions", title = "Population by State",
       subtitle = "Unit is 1 Million People",
       caption = "Not Shown: HI: 1.42 Million AK: 0.74 Million Data Source: https://CRAN.R-project.org/package=usdata") +
  theme(legend.position = "bottom",
        legend.key.width = unit(1, "cm"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 15),
        plot.caption = element_text(size = 15),
        legend.text = element_text(size = 15),
        legend.title = element_text(size = 15))
```
US State Pop. Map (no transformation)
R Code for US State Pop. Map (log transformed)
```{r code for us states pop map with log transformation}
st_lpop <- statepop_map |>
  ggplot(aes(x = long, y = lat, group = group, fill = st_popM)) +
  geom_polygon(color = "darkgrey") +
  theme_map() +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  scale_fill_continuous(type = "viridis", trans = "log",
                        breaks = c(1, 2, 3, 5, 10, 20, 35)) +   # 0 omitted: log(0) is -Inf, so it cannot be a break on a log scale
  geom_shadowtext(aes(x = m_long, y = m_lat, label = abbr), color = "white",
                  check_overlap = T, show.legend = F, size = 4) +
  labs(fill = "Pop. in Millions", title = "Population by State",
       subtitle = "Unit is 1 Million People - Log Transformed",
       caption = "Not Shown: HI: 1.42 Million AK: 0.74 Million Data Source: https://CRAN.R-project.org/package=usdata") +
  theme(legend.position = "bottom",
        legend.key.width = unit(1, "cm"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 15),
        plot.caption = element_text(size = 15),
        legend.text = element_text(size = 15),
        legend.title = element_text(size = 15))
```
US State Pop. Map (log transformed)
To log or not to log
In this course, we visualize data using ggplot, hchart, and dygraph
If you want to explore (but not present) data, you can also use base graphics for quick plots
Base graphics could also be used to make polished visualizations but the code is much longer and more tedious than ggplot
```{r base graphics plots, fig.dim=c(5, 6), fig.align='center', out.extra='style="background-color: #3D3D3D; padding:1px;"'}
par(mfrow = c(2, 1))                         # stacks base graph plots
hist(statepop_map$st_popM, main = "")
hist(log(statepop_map$st_popM), main = "")
par(mfrow = c(1, 1))                         # resets base graph options
```
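A quick numeric check can back up what the histograms show. In right-skewed data the mean sits well above the median; after a log transformation the two should be much closer. A minimal sketch with made-up state populations in millions (`pop_m` is illustrative, not the `usdata` values):

```r
# Made-up state populations (millions): many small states, a few very large ones
pop_m <- c(0.58, 0.62, 1.3, 3.2, 5.8, 11.7, 21.5, 29, 39.5)

# Raw scale: the mean is pulled upward by the largest states
raw_gap <- mean(pop_m) - median(pop_m)

# Log scale: mean and median are much closer together
log_gap <- mean(log(pop_m)) - median(log(pop_m))

print(c(raw_gap = raw_gap, log_gap = log_gap))
```

This is only a rough check; the histogram is still the better tool for seeing the overall shape.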
Filtering a Map to a Region
Map techniques above can also be used for a region
Demo that follows uses an education dataset with data filtered to 10 Northeastern states
Question 4. What exploratory plot command (base R code shown) is good for checking if the variable you want to plot is right skewed and might need to be log transformed?
Question 5. Based on the histogram for the northeastern area of the U.S., which includes only 10 states, do these data appear skewed?
Add Education Data to Map Data
In the chunk below we start from scratch with state data. This chunk does not depend on the data being imported and managed in a previous chunk.
```{r join edu data with state map and state abbr data}
us_states <- map_data("state") |>       # state polygons (from R)
  select(long:region) |>
  rename("state" = "region")

state_abbr <- state_stats |>            # state abbreviations
  select(state, abbr) |>
  mutate(state = tolower(state))

edu1 <- left_join(edu1, state_abbr)     # left join to maintain filter to NE states
```
Joining with `by = join_by(state)`
```{r join edu data with state map and state abbr data}
edu_NE_map <- left_join(edu1, us_states)   # left join to maintain filter to NE states
```
Joining with `by = join_by(state)`
```{r join edu data with state map and state abbr data}
state_coords <- read_csv("data/state_coords.csv", show_col_types = F,   # add in state midpoints (centroids)
                         col_names = c("state", "m_lat", "m_long")) |>
  mutate(state = gsub(", USA", "", state, fixed = T),
         state = gsub(", the USA", "", state, fixed = T),
         state = gsub(", the US", "", state, fixed = T),
         state = tolower(state))

edu_NE_map <- left_join(edu_NE_map, state_coords)   # left join to maintain filter to NE states
```
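Joins on state names fail silently when the keys differ by case or stray whitespace: the unmatched states simply get `NA` coordinates. A minimal sketch for checking keys before joining, using made-up vectors (`ne_states` and `coord_states` are illustrative; the trailing space is planted to show the problem):

```r
# Made-up join keys; "vermont " has a trailing space
ne_states    <- c("maine", "new hampshire", "vermont")
coord_states <- c("maine", "new hampshire", "vermont ", "ohio")

# States in the left table with no exact match in the right table
unmatched <- setdiff(ne_states, coord_states)
print(unmatched)

# After trimming whitespace, every state has a match
unmatched_fixed <- setdiff(ne_states, trimws(coord_states))
print(length(unmatched_fixed))
```

An empty result from `setdiff()` before the join is a quick sign that every state in the filtered data will pick up its centroid.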
You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions is required during the semester.
Source Code
---title: "Week 12"subtitle: "More on Geographic Data, Project Management, and Publishing a Dashboard"author: "Penelope Pooler Eisenbies"date: last-modifiedlightbox: truetoc: truetoc-depth: 3toc-location: lefttoc-title: "Table of Contents"toc-expand: 1format: html: code-line-numbers: true code-fold: true code-tools: trueexecute: echo: fenced---## Housekeeping```{r include=F}#|label: setupknitr::opts_chunk$set(echo=T, highlight=T) # specifies default options for all chunksoptions(scipen=100) # suppress scientific notation # install pacman if neededif (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")pacman::p_load(pacman, tidyverse, ggthemes, gridExtra, magrittr, kableExtra, RColorBrewer, maps, usdata, countrycode, mapproj, shadowtext, grid) # install and load required packages p_loaded() # verify loaded packages```## Housekeeping- Final Proposals and Group Projects - Your participation in the final proposal and plan will affect your final grade on the project. - Group members that don't contribute will not get credit for the work done by others in the group. - If you have data management questions, reach out to myself or a course TA. - We are here to help with tasks where you might be stymied, but don't wait until the last day.- Presentations will be on 4/28. - Only presentation dashboards are due on 4/28. - All students are required to attend and provide feedback.## More Housekeeping- HW 5 - Part 2 is now posted and I am updating the demo videos because this assignment will be done on Posit Cloud. - There is a 2 day grace period, if needed. - This a short assignment that covers some final essential skills for your dashboard project.- Quiz 2 grading is almost done. 
There are a couple students taking a make-up this evening.## Plans for this week- Two lectures on Geographic Data have been streamlined so that students also have time for group work this week.- I will not cover them all in detail- Rather than deleting notes and code that might be useful to some students all notes are provided.::: fragment**Topics Covered**:::- Geographic Data: world data, state data, and filtering map data to a region- Publishing work: More tips for good project management - Posting HTML files for free using [**Rpubs**](https://rpubs.com/) - Note: [**Rpubs**](https://rpubs.com/) is the recommended option for presenting and submitting your dashboard- HW 5 - Part 2 Demo on Posit Cloud- **There will be time to work on projects in class this week and next week.**## Importing and Joining World Datasets**World Data**```{r world data prep}world <- map_data("world") |> select(!subregion) |> # world geo info mutate(region=ifelse(region=="UK", "United Kingdom", region))intbxo <- read_csv("data/intl_bxo.csv", show_col_types = F, skip=7) |> # import/tidy bxo select(1,6) |> rename("region" = "Area", "wknd_gross" = "Weekend Gross") |> filter(!is.na(wknd_gross)) |> mutate(wknd_gross = gsub("$", "", wknd_gross, fixed = T), wknd_gross = gsub(",", "", wknd_gross, fixed = T) |> as.numeric())world_bxo_data <- full_join(intbxo, world) |> # join datasets filter(!is.na(wknd_gross))world_bxo_data$continent = countrycode(sourcevar = world_bxo_data$region, # retrieve continents origin = "country.name", destination = "continent") head(world_bxo_data, 3)```## Choropleth Country Plot w/ Labels::: fragment**Example - Asia**:::- **Most** of the plot code that follows is review - There are a few new details: - `shadowtext` labels (see below) - modifying size of text elements (mentioned but not emphasized)- **NOTES:** - The R package `shadowtext` includes the command `geom_shadowtext` - `shadowtext` is useful for creating visible labels for all countries regardless of map fill 
color - Deciding on units (\$1000) and transformation (`log`) took some trial and error.## Managing Data for Asia Chropleth Map**This R code creates the Asia Map dataset.**```{r asia data for map}asia_bxo_data <- world_bxo_data |> # create asia box office dataset filter(continent=="Asia") |> mutate(Gross = as.integer(wknd_gross), wknd_gross = wknd_gross/1000) asia_nms <- asia_bxo_data |> # create dataset of country names select(region, long, lat, group, continent) |> # median lat and long # used for label positions group_by(continent, region) |> summarize(nm_x=median(long, na.rm=T), nm_y=median(lat, na.rm=T)) |> filter(!is.na(nm_x) | !is.na(nm_y))asia_bxo_data <- full_join(asia_bxo_data, asia_nms) # merge datasets using an inner_join```## R code for Asia Choropleth MapData are shown on log scale to improve interpretability.```{r asia static map code}asia_bxo_map <- asia_bxo_data |> # Creates the map that follows ggplot(aes(x=long, y=lat, group=group, fill=wknd_gross)) + geom_polygon(color="darkgrey") + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + labs(fill= "Gross ($1K)", title="Weekend Gross ($ Thousands) in Asian Countries", subtitle="Weekend Data Updated 4/7/25 - Data are Log-transformed", caption="Data Source: https://www.boxofficemojo.com") + scale_fill_continuous(type = "viridis", trans="log", breaks =c(1,10,100,1000,10000)) + geom_shadowtext(aes(x=nm_x, y=nm_y,label=region), color="white",check_overlap = T, show.legend = F, size=4) + theme(plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), plot.caption = element_text(size = 10), legend.text = element_text(size = 12), legend.title = element_text(size = 15), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2)) ```## ### Asia Map with Log (LN) Transformation```{r fig.dim=c(15,7), echo=F, warning=F}asia_bxo_map```## Europe Map DataCreates data for Europe Map```{r europe data for map}euro_bxo_data <- world_bxo_data |> # create Europe box office 
dataset filter(continent=="Europe" & region != "Russia") |> mutate(Gross = as.integer(wknd_gross), wknd_gross = wknd_gross/1000) euro_nms <- euro_bxo_data |> # create dataset of country names select(region, long, lat, group, continent) |> # median lat and long used for position group_by(continent, region) |> summarize(nm_x=median(long, na.rm=T), nm_y=median(lat, na.rm=T)) |> filter(!is.na(nm_x) | !is.na(nm_y))euro_bxo_data <- full_join(euro_bxo_data, euro_nms) # merge datasets using an inner_join```## R code for Europe Choropleth MapData are shown on log scale to improve interpretability.```{r europe static map code}euro_bxo_map <- euro_bxo_data |> ggplot(aes(x=long, y=lat, group=group, fill=wknd_gross)) + geom_polygon(color="darkgrey") + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + labs(fill= "Gross ($1K)", title="Weekend Gross ($ Thousands) in European Countries", subtitle="Weekend Ending 11/10/24 - Data are Log-transformed", caption="Data Source: https://www.boxofficemojo.com") + scale_fill_continuous(type = "viridis", trans="log", breaks =c(1,10,100,1000,10000)) + geom_shadowtext(aes(x=nm_x, y=nm_y,label=region), color="white",check_overlap = T, show.legend = F, size=4) + theme(plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), plot.caption = element_text(size = 10), legend.text = element_text(size = 12), legend.title = element_text(size = 15))```## ### Europe Map with Log (LN) Transformation```{r fig.dim=c(15,7), echo=F, warning=F, fig.align='center', warning=F}euro_bxo_map```## ### Week 12 In-class Exercises - Q1-Q3***Session ID: bua455f24***<br>**Question 1.** What option is used in `geom_polygon()` to create the outlines of each country?<br>**Question 2.** How many different geometries (`geom_...`) are used to create these multi-layer maps?<br>**Question 3.** When using multiple geometry layers, where do you place the aesthetic, (`aes`) so that it will apply to all of the geometries (all of the map layers)?## US 
State Data Example- Examples of Data that can be plotted by state - Average costs and expenditures by state of specific goods or services - Demographic data - Voting and tex information - Sports/Arts/Entertainment/Education investments and expenditures- Will also show a map of data filtered by region## US State Map Data```{r combine state polygons with state population data from R}us_states <- map_data("state") |> # state polygons (from R) select(long:region) |> rename("state" = "region")state_abbr <- state_stats |> # many useful variables in this dataset select(state, abbr) |> mutate(state = tolower(state))state_pop <- county_2019 |> # data by county (aggregated by state) select(state, pop) |> mutate(state=tolower(state), popM = pop/1000000) |> group_by(state) |> summarize(st_popM = sum(popM, na.rm=T)) |> full_join(state_abbr)statepop_map <- left_join(us_states, state_pop) # used left join to filter to lower 48 states # lat/long not available for Hi and AK```## ### Adding State Midpoint (centroid) Lat and Long- In the previous maps (by country) country labels were added to the static map using each polygon's (country) median latitude and longitude- Medians don't work well for U.S. because many states are oddly shaped and small.- Alternative: [use centroid for each state polygon](https://www.latlong.net/category/states-236-14.html) - Centroid is another term for midpoint - Saved data as .csv file named `state_coords.csv` (included) - Data did not include D.C. but those coordinates were found elsewhere - D.C. 
data is appended to other states using `bind_rows` - `state_coords` (centroids) were joined with state demographics data, `statepop_map`.- Final dataset for plot created: `statepop_map`## Code for Addings Centroids to data```{r add lat and long of state midpoints (centroid)}state_coords <- read_csv("data/state_coords.csv", show_col_types = F, col_names = c("state", "m_lat", "m_long")) |> mutate(state = gsub(", USA", "", state, fixed=T), state = gsub(", the USA", "", state, fixed=T), state = gsub(", the US", "", state, fixed=T), state = tolower(state))state <- "district of columbia" # save values for dcm_lat <- 38.9072m_long <- -77.0369dc <- tibble(state, m_lat, m_long) # create dataset of dc data ( 1 obs)state_coords <- bind_rows(state_coords, dc) # add dc to state_coordsrm(dc, state, m_lat, m_long) # remove temporary values from globalstatepop_map <- left_join(statepop_map, state_coords) # centroids to data```## State Population Plot- Similar to previous plots with a few changes - Added borders to states by adding `color="darkgrey"` to `geom_polygon` command. - Used State abbreviations for state labels. - Made State text labels smaller (Size = 2) - Changed breaks for log scaled population legend- These details seem minor but they take time and trial and error.## ### R Code for US State Pop. Map (no transformation)```{r code for us states pop map no transformation}st_pop <- statepop_map |> ggplot(aes(x=long, y=lat, group=group, fill=st_popM)) + geom_polygon(color="darkgrey") + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + scale_fill_continuous(type = "viridis") + geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr), color="white", check_overlap = T, show.legend = F, size=4) + labs(fill= "Pop. 
in Millions", title="Population by State", subtitle="Unit is 1 Million People", caption= "Not Shown: HI: 1.42 Million AK: 0.74 Million Data Source: https://CRAN.R-project.org/package=usdata") + theme(legend.position = "bottom", legend.key.width = unit(1, "cm"), plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), plot.caption = element_text(size = 15), legend.text = element_text(size = 15), legend.title = element_text(size = 15)) ```## ### US State Pop. Map (no transformation)```{r us states pop map no transformation, echo=F, fig.dim=c(15,7), fig.align='center'}st_pop```## ### R Code for US State Pop. Map (log transformed)```{r code for us states pop map with log transformation}st_lpop <- statepop_map |> ggplot(aes(x=long, y=lat, group=group, fill=st_popM)) + geom_polygon(color="darkgrey") + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + scale_fill_continuous(type = "viridis", trans="log", breaks=c(0,1,2,3,5,10,20,35)) + geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr), color="white", check_overlap = T, show.legend = F, size=4) + labs(fill= "Pop. in Millions", title="Population by State", subtitle="Unit is 1 Million People - Log Transformed", caption= "Not Shown: HI: 1.42 Million AK: 0.74 Million Data Source: https://CRAN.R-project.org/package=usdata") + theme(legend.position = "bottom", legend.key.width = unit(1, "cm"), plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), plot.caption = element_text(size = 15), legend.text = element_text(size = 15), legend.title = element_text(size = 15)) ```## ### US State Pop. 
Map (log transfomed)```{r us states pop map log transformation, echo=F, fig.dim=c(15,7), fig.align='center'}st_lpop```## To log or not to log:::::: columns::: {.column width="48%"}In this course, we visualize data using `ggplot`, `hchart`, `dygraph`If you want to explore but (not present) data, you can also use base graphics for quick plots- Base graphics could also be used to make polished visualizations but the code is much longer and more tedious than `ggplot`:::::: {.column width="4%"}:::::: {.column width="48%"}```{r base graphics plots, fig.dim=c(5, 6), fig.align='center', out.extra='style="background-color: #3D3D3D; padding:1px;"'}par(mfrow=c(2,1)) # stacks base graph plotshist(statepop_map$st_popM, main="")hist(log(statepop_map$st_popM), main="")par(mfrow=c(1,1)) # resets base graph options```:::::::::## Filtering a Map to a Region- Map techniques above can also be used for a region- Demo that follows uses an education dataset with data filtered to 10 Northeastern states::: fragment```{r import modify filter education data}edu <- read_csv("data/education by state.csv", skip=3, show_col_types = F, # import data col_names = c("state", "pop_over_25", "pop_hs", "pct_hs", "pop_bachelor", "pct_bachelor", "pop_advanced","pct_advanced")) edu1 <- edu |> select(state, pop_bachelor, pct_bachelor) |> mutate(state = str_trim(state) |> tolower(), pop_bachelor1K = pop_bachelor/1000, pct_bachelor = gsub("%","", pct_bachelor, fixed = T) |> as.numeric()) |> filter(state %in% c("maine", "massachusetts", "connecticut" , "rhode island", "vermont", "new hampshire", "new york", "new jersey", "pennsylvania", "delaware")) |> glimpse()```:::## Exploratory Bachelor Degree Data Plots<center>```{r base R scatterplot and histogram, out.extra='style="background-color: #3D3D3D; padding:1px;"', fig.dim=c(12,7), echo=FALSE}par(mfrow=c(1,2))hist(edu1$pop_bachelor1K, main="")plot(edu1$pop_bachelor1K, edu1$pct_bachelor, main="")par(mfrow=c(1,1))```</center>## ### Week 12 In-class Exercises - 
Q1-Q3***Session ID: bua455f24***<br>**Question 4.** What exploratory plot command (base R code shown) is good for checking if the variable you want to plot is right skewed and might need to be log transformed?<br>**Question 5.** Based on the histogram for the northeastern area of the U.S, which includes only 10 states, do these data appear skewed?## Add Education Data to Map DataIn the chunk below we start from scratch with state data. This chunk does not depend on the data being imported and managed in a previous chunk.```{r join edu data with state map and state abbr data}us_states <- map_data("state") |> # state polygons (from R) select(long:region) |> rename("state" = "region")state_abbr <- state_stats |> # state abbreviations select(state, abbr) |> mutate(state = tolower(state))edu1 <- left_join(edu1, state_abbr) # left join to maintain filter to NE statesedu_NE_map <- left_join(edu1, us_states) # left join to maintain filter to NE statesstate_coords <- read_csv("data/state_coords.csv", show_col_types = F, # add in state midpoints (centroids) col_names = c("state", "m_lat", "m_long")) |> mutate(state = gsub(", USA", "", state, fixed=T), state = gsub(", the USA", "", state, fixed=T), state = gsub(", the US", "", state, fixed=T), state = tolower(state))edu_NE_map <- left_join(edu_NE_map, state_coords) # left join to maintain filter to NE states```## Code for Regional Map 1**Population with Bachelor's Degree**\[Data Source - Wikipedia](https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_educational_attainment)```{r NE edu map pop}ne_edu_pop <- edu_NE_map |> ggplot(aes(x=long, y=lat, group=group, fill=pop_bachelor1K)) + # pop in 1000s geom_polygon(color="darkgrey") + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + scale_fill_continuous(type = "viridis", trans="log", # log transformation breaks = c(100, 500, 1000, 5000)) + geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr), color="white", check_overlap = T, show.legend = F, size=4) + 
labs(fill= "Unit: 1000 People", title="NE States: Pop. with a Bachelor's Degree") + theme(legend.position = "bottom", legend.key.width = unit(1, "cm"), plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), plot.caption = element_text(size = 15), legend.text = element_text(size = 15), legend.title = element_text(size = 15))```## Code for Regional Map 2Percentage of People with Bachelor's Degree [Data Source - Wikipedia](https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_educational_attainment)```{r NE edu map pct}ne_edu_pct <- edu_NE_map |> ggplot(aes(x=long, y=lat, group=group, fill=pct_bachelor)) + # percent data geom_polygon(color="darkgrey") + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + scale_fill_continuous(type = "viridis", # no transformation needed breaks = c(32, 34, 36, 38, 40, 42, 44)) + geom_shadowtext(aes(x=m_long, y=m_lat, label=abbr), color="white", check_overlap = T, show.legend = F, size=4) + labs(fill= "Unit: %", title="NE States: Percent with a Bachelor's Degree") + theme(legend.position = "bottom", legend.key.width = unit(1, "cm"), plot.title = element_text(size = 20), plot.subtitle = element_text(size = 15), plot.caption = element_text(size = 15), legend.text = element_text(size = 15), legend.title = element_text(size = 15))```## Pop. and Pcnt. 
Plots Side by Side```{r display of NE pop and pct maps, fig.dim=c(15,7), fig.align='center', echo=FALSE}grid.arrange(ne_edu_pop, ne_edu_pct, ncol=2)grid.rect(width = .98, height = .98, gp = gpar(lwd = 2, col = "darkgrey", fill = NA))```## Managing Projects- Some of this should be review- next week, we will talk about managing a long term consulting project - Managing files over time - Segmenting and rejoining poorly formatted data - Documenting steps as you progress - Addressing client needs as they eveolve and update requests- Documentation is key - Take good notes and keep README file updated- I use Markdown or Quarto files for everything, even work I don't present to client. - Ideal format for writing notes between code chunks## BUA 455 Project File Conventions- **Main Project Folder:** - Dashboard `qmd` file (Quarto file) - Dashboard .html file (Dashboard presentation) - Project `.rproj` file that makes folder into an R project. - `README.txt` file that includes an organized of all files you created or saved. - Other files created when when .qmd file is rendered. - These 'byproduct' files do not need to be listed in the README.- **Data (`data`) Folder:**- All raw .csv files needed (No data management should be done in Excel!)- **Images (`img`) Folder:** - Any .png or other graphics files needed- **OPTIONAL:** Extraneous **useful** code can be saved in a separate folder within the project.## Rpubs Exercise- **RPubs** (mentioned earlier in this set of slides)- If you want to publish your dashboard or any HTML file you create in R, you can do so for free.- R has a public online repository called [**RPubs**](https://rpubs.com/).- **Rpubs** is very useful if you want post an html file online and provide the link to it.- I Use **RPubs** for slides in this course and it is useful if for work like the project dashboards.- As an in class exercise, I will ask you each to create an account and publish your HW 5 - Part 1 dashboard html file. 
- This exercise will be useful because it allows you to see how this publication process works. - You will see how publishing changes the appearance of your panels and text. - Once you post your final dashboard you may want to include it as a link in your resume and/or LinkedIn profile.## In-class Exercise1. Open your HW 5 - Part 1.Rmd file and knit it to create your dashboard.- Make sure this file has your name in the header.- If you don't have HW 5 - Part 1 done, you can use the [Posit Cloud version of `HW 5 - Part 1` provided for HW 5 - Part 2](https://posit.cloud/content/10125821){target="_blank"}2. Click the **Publish Icon** , create a free account, and publish your html file.- If RStudio asks to install additional packages to complete the publishing process, click `Yes`.3. Submit the link to your published file on Blackboard.- A Link to your published file must be submitted by Friday 4/11 at midnight to count for class participation for today's lecture.## Next Week - Additional Topics- Ask me questions about your project (Others may benefit)- I have some short essential and some optional topics including: - details and recommendations for writing both project memos. - Memos will be written as word documents in Quarto (`.qmd`). - managing a consulting project from beginning to end. - formatting complex tables using the `gt` package. - knitting Quarto files to different formats: word, Powerpoint, etc.## Additional Topics Continued**Review of Skillset Terminology**::: fragmentNow that you are (almost) done with BUA 455, and more so when you graduate, you have a very useful set of skills.:::- Explaining these skills to others is a challenge.- I will spend a little time talking about how to explain those skills to other people- Preview: It took me decades to figure out how to talk about what I do, in part, because this discipline was more obscure. - Increased interest in Data Science and Analytics has resulted in better terminology. 
- [White Paper from DataCamp provides an excellent blueprint](https://drive.google.com/file/d/1_VoM3D6tPftjZpXCnTL8SKYBlOM_4KjG/view?usp=sharing)## ### Key Points from This Week::: fragment**More with Geographic Data**:::- Adding Shadow Text- Filtering Map Data and Comparing Variables::: fragment**Project Management**:::- Review of skills covered throughout course- Managing data projects this way is beneficial::: fragment**Publishing Work on RPubs**:::- Useful for publishing and linking to work::: fragmentYou may submit an 'Engagement Question' about each lecture until midnight on the day of the lecture. **A minimum of four submissions are required during the semester.**:::