HW 5 - Part 1 is due on Wed. 3/26 - Grace Period ends 3/28
Draft Proposals are due Thursday 3/27
Thursday 3/27: In-class work day and NO OFFICE HOURS
Tuesday 4/1: Review using practice questions
Quiz 2 is on Thursday, 4/3
Mostly on Weeks 5 through 9, but material is cumulative.
Mostly on HW Assignments 4 and 5, but material is cumulative.
Data Mgmt. tasks will require 2 or 3 steps and then you will answer questions.
HW 5 - Part 2 will be posted after Quiz 2.
Today, Thursday, and Tuesday, 4/1
Practice Questions for Quiz 2 are posted.
Tue. 4/1: Skills/Concepts Review for Quiz 2
Putting skills together for different goals
Come with questions or submit them as Engagement Questions.
Today: Intro to Managing and Plotting Geographic Data
Today’s geographic maps will not be on Quiz 2.
Mapping data is an effective data curation tool that may be useful in this class, other classes, and your career.
We will cover more about geographic data and map visualizations after Quiz 2.
If you want help with mapping project data, please reach out to me or the TA.
In-class Exercise - Week 10
Purpose:
To gain some experience and understanding of map data available in R and elsewhere.
To experiment with mapping data
Students are encouraged to use domestic or international map data in their dashboards if appropriate.
Data Preparation
The data for today’s in-class exercise come from R packages loaded in the setup chunk (maps for the polygons and usdata for county_2019).
These geographic data are useful if you have information by state or county and you want to show a choropleth map of your data.
R also has world information, e.g., countries, continents, etc.
```{r us data prep}
us_states <- map_data("state") |>       # state polygons (not used today)
  rename("state" = "region")

us_counties <- map_data("county") |>    # county polygons
  rename("state" = "region", "county" = "subregion") |>
  mutate(county = gsub(" ", "", county),
         county = gsub("'", "", county) |> tolower())

#unique(us_counties$county[us_counties$state=="louisiana"]) # note issue Louisiana counties

cnty2019_all <- county_2019

#unique(cnty2019_all$name[cnty2019_all$state=="Louisiana"]) # note issue Louisiana counties

cnty2019_all <- cnty2019_all |>
  mutate(state = tolower(state),
         county = tolower(name),
         county = gsub(" county", "", county),
         county = gsub(" parish", "", county),
         county = gsub("\\.", "", county),  # \\ is required because . alone matches any character
         county = gsub(" ", "", county),
         county = gsub("'", "", county)) |>
  relocate(county, .before = name)

cnty2019_all <- full_join(us_counties, cnty2019_all) # geo data and demographic data
```
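The name-cleaning steps above can be tried on a few strings before running them on the full data. The sketch below uses base R only and three made-up county names (illustrative inputs, not rows taken from county_2019); `clean_county()` and the `geo_keys` vector are hypothetical and are not part of the lecture code.

```r
# Hypothetical helper that applies the same cleaning steps, in the same order,
# as the mutate() call above
clean_county <- function(x) {
  x <- tolower(x)
  x <- gsub(" county", "", x)
  x <- gsub(" parish", "", x)
  x <- gsub("\\.", "", x)   # "\\." matches a literal period
  x <- gsub(" ", "", x)
  gsub("'", "", x)
}

# Illustrative inputs (made up for this sketch)
raw_names <- c("St. Mary Parish", "Prince George's County", "De Kalb County")
cleaned <- clean_county(raw_names)
cleaned   # "stmary" "princegeorges" "dekalb"

# Without the escape, "." matches every character, so the whole string is removed
unescaped <- gsub(".", "", "st. mary")
unescaped   # ""

# Keys found in one dataset but not the other are the rows a join leaves unmatched
geo_keys  <- c("stmary", "princegeorges", "dekalb", "lasalle")  # made-up polygon keys
unmatched <- setdiff(geo_keys, cleaned)
unmatched   # "lasalle"
```

Running a check like `setdiff()` on the real county keys before the join is one way to spot naming problems such as the Louisiana issue noted in the comments.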
Grid is populated left to right and then top to bottom unless otherwise specified.
```{r eval=F}
#| label: grid of pct plots
# note alternative to grid.arrange used
# code shown is for export version of plots
grid <- plot_grid(cnty_hs_grad, cnty_computer,
                  cnty_bachelors, cnty_brdbd, ncol=2)
```
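The fill order can be previewed without drawing anything. This base R sketch places the four plot names from the call above into a two-column matrix filled by row, which mirrors the default left-to-right, top-to-bottom order; the matrix is only an illustration and is not something `plot_grid` produces.

```r
# Plot names in the order they are passed to plot_grid()
plot_names <- c("cnty_hs_grad", "cnty_computer", "cnty_bachelors", "cnty_brdbd")

# Fill a 2-column layout row by row, as the grid does by default
layout <- matrix(plot_names, ncol = 2, byrow = TRUE)
layout
# Row 1: cnty_hs_grad,   cnty_computer
# Row 2: cnty_bachelors, cnty_brdbd
```

To change which plots share a row, reorder the arguments. The `byrow` argument of `plot_grid` is the option for filling column by column instead.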
In-class Exercise - Plot Grid (Steps 1 & 2)
Steps to Follow
Examine the available variables in the cnty2019_all dataset saved from R.
Create Individual Plots. Variable names in R dataset and definitions:
household_has_smartphone: Households with Smart Phones
median_age: Median Age
median_household_income: Median Household Income
median_individual_income: Median Individual Income
You can copy provided plot code and modify it for these variables or you could try to create a function (not required today).
```{r}
#| label: usmaps demographic maps exercise
# select data for plot (not required, but helpful)
# cnty_data2 <-

# create four individual plots, one for each variable
# use provided plot code and modify
```
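For anyone trying the optional function route, the following is a minimal sketch of how the provided plot code might be wrapped. `county_map()` is a hypothetical name, the `toy` data frame is made up purely to show the call, and the sketch assumes the ggplot2 and ggthemes packages are installed (mapproj is also needed once the plot is drawn).

```r
library(ggplot2)
library(ggthemes)  # provides theme_map()

# Hypothetical helper: builds one county choropleth for the column named in fill_var.
# Assumes `data` has long, lat, and group columns, as the joined county data do.
county_map <- function(data, fill_var, plot_title) {
  ggplot(data, aes(x = long, y = lat, group = group, fill = .data[[fill_var]])) +
    geom_polygon() +
    theme_map() +
    coord_map("albers", lat0 = 39, lat1 = 45) +  # projection is applied when drawn
    labs(fill = "", title = plot_title) +
    scale_fill_continuous(type = "viridis") +
    theme(plot.background = element_rect(fill = "lightgrey", color = NA),
          legend.key.size = unit(.4, "cm"),
          plot.title = element_text(size = 10),
          legend.text = element_text(size = 8))
}

# Tiny made-up polygon, only to show how the function is called
toy <- data.frame(long = c(-100, -99, -99, -100),
                  lat = c(40, 40, 41, 41),
                  group = 1,
                  pct = 50)
toy_map <- county_map(toy, "pct", "Toy Example")
```

The column name is passed as a string and looked up with `.data[[fill_var]]`, so one function can serve all four exercise variables by changing only the second and third arguments.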
In-class Exercise - Plot Grid (Steps 3 & 4)
Create 2x2 plot grid of these four variables by US County as part of your class participation credit for this week.
For full credit, plots must be in the order specified.
Row 1: should have smartphone variable and median age.
Row 2: median household and individual income.
Create Plot Grid using plot_grid command.
```{r}
#| label: in-class exercise 2x2 plot grid
# for full credit grid of plots must be in order specified
# use plot_grid command

# export plots to img folder using save_plot
```
Right click on plot grid, then save as… and save to img folder with correct name.
or use save_plot command
Mapping Log Transformed Data
The previous examples are all percent data.
No transformations are needed.
In contrast, population data or financial data are often right-skewed and need to be log transformed.
Recall from MAS 261, BUA 345 (and perhaps FIN classes):
An effective transformation for right skewed data is the natural log (LN) transformation.
The following demo shows how useful it is for mapping right skewed data.
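Before the county data demo, the effect of the natural log on right-skewed values can be checked on simulated numbers. This is a sketch on made-up lognormal data, not real population figures; the seed and distribution parameters are arbitrary choices.

```r
set.seed(455)  # arbitrary seed so the sketch is reproducible

# Simulated right-skewed, positive values (not real county data)
x <- rlnorm(3000, meanlog = 3, sdlog = 1.5)

# Right skew pulls the mean well above the median
raw_ratio <- mean(x) / median(x)

# After the natural log, the mean and median nearly coincide
log_x <- log(x)
log_gap <- abs(mean(log_x) - median(log_x))

raw_ratio
log_gap
```

A mean far above the median is a quick numeric signal of right skew, alongside the histogram.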
```{r}
#| label: untransformed pop hist code
hist_pop <- cnty_data3 |>
  ggplot() +
  geom_histogram(aes(x=pop1k), fill="lightblue", col="darkblue") +
  labs(x="Population", title="Histogram of US Population Data",
       y = "Count", subtitle="Unit is 1000 People") +
  theme_classic()
```
The Problem: We see we have skewed data BUT presenting log transformed data in a map may complicate data interpretation.
Solution - Natural Log Transformed Plot
The data values are not transformed; only the axis or fill scale is.
Options specified in scale_fill_continuous:
trans = "log"
breaks = c(...)
break intervals determined by examining data
```{r}
#| label: log transformed map plot code
cnty_lpop <- cnty_data3 |>
  ggplot(aes(x=long, y=lat, group=group, fill=pop1k)) +
  geom_polygon() +
  theme_map() +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  labs(fill= "", title="Population by County",
       subtitle="Unit is 1000 People and Data are Log-transformed") +
  scale_fill_continuous(type = "viridis", trans="log",
                        breaks=c(1,10,100,1000,10000)) +
  theme(legend.key.width = unit(.4, "cm"))
```
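The choice of powers of 10 for `breaks` can be motivated with a little arithmetic. The sketch below shows that these breaks are evenly spaced on the natural log scale, and illustrates one possible way to propose breaks from the range of the data; `pop1k_example` is a made-up vector standing in for the real column.

```r
breaks <- c(1, 10, 100, 1000, 10000)

# On the natural log scale, consecutive powers of 10 are log(10) apart,
# so these breaks land at evenly spaced positions on the legend
log_spacing <- diff(log(breaks))
log_spacing

# One way to propose breaks: powers of 10 that fall inside the observed range
pop1k_example <- c(0.4, 2.5, 38, 950, 10040)   # made-up values
lo <- ceiling(log10(min(pop1k_example)))
hi <- floor(log10(max(pop1k_example)))
proposed <- 10^(lo:hi)
proposed
```

Because only the scale is transformed, the legend labels stay in the original units (thousands of people), which keeps the map easier to read.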
Histogram of Log transformed Data
Final dashboard doesn’t include exploratory plots
Data exploration plots:
Histograms, Scatterplots and boxplots are all useful
```{r}
#| label: log transformed pop hist code
hist_lpop <- cnty_data3 |>
  ggplot() +
  geom_histogram(aes(x=pop1k), fill="lightblue", col="darkblue") +
  labs(x="Population", title="Histogram of Natural Log of US Population Data",
       y = "Count", subtitle="Unit is 1000 People and Data are Log-transformed") +
  theme_classic() +
  scale_x_continuous(trans="log", breaks=c(1,10,100,1000,10000))
```
Histogram of Log-transformed Data
Population Plot Grid
```{r eval=F}
#| label: plot code for pop grid
grid.arrange(cnty_pop, hist_pop, cnty_lpop, hist_lpop, ncol=2)
```
When and How to Log Transform
Log transformations are useful if you have right-skewed POSITIVE data such as
Prices
Population
Sales
Income
Note: If data (x) have zeros, a good option is to use log(x + 1)
ln(1) = 0 (In R log(1) = 0)
0 values in the data will still be zeros
In the following example we will create plots for number of households by county:
Histograms with and without LN transformation
Map plots with and without LN transformation
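The note about zeros can be checked directly in base R. This sketch uses a small made-up vector of counts that includes a zero.

```r
x <- c(0, 1, 9, 99, 999)   # made-up counts that include a zero

plain <- log(x)
plain[1]                   # -Inf: the plain log cannot handle zeros

shifted <- log(x + 1)
shifted[1]                 # 0: zeros in the data stay zeros

# log1p() is base R's built-in version of log(x + 1)
same_as_log1p <- isTRUE(all.equal(shifted, log1p(x)))

# Back-transform to the original units when reporting values
back <- exp(shifted) - 1
```

The back-transform step matters for interpretation: values read off a log(x + 1) scale need `exp(value) - 1` to return to the original units.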
Number of Households Per County
Without Transformation
Untransformed Data Histogram
```{r}
#| label: data and untransformed hh data histogram code
cnty_data4 <- cnty2019_all |>
  select(long:county, households) |>
  mutate(households1K = households/1000)

hist_hholds <- cnty_data4 |>
  ggplot() +
  geom_histogram(aes(x=households1K), fill="lightblue", col="darkblue") +
  labs(x="Number of Households", title="Histogram of US Household Data",
       y = "Count", subtitle="Unit is 1000 Households") +
  theme_classic()
```
Number of Households is highly right-skewed.
Number of Households Per County
With Transformation
Log transformed Data Histogram
```{r}
#| label: log transformed hh data histogram code
hist_lhholds <- cnty_data4 |>
  ggplot() +
  geom_histogram(aes(x=households1K), fill="lightblue", col="darkblue") +
  labs(x="Number of Households", title="Histogram of Natural Log of US Household Data",
       y = "Count", subtitle="Unit is 1000 Households and Data are Log-transformed") +
  theme_classic() +
  scale_x_continuous(trans="log", breaks=c(1,10,100,1000))
```
Determine if variables need to be log-transformed.
Quiz 2 Practice Questions are posted.
Quiz 2 will be on Thursday, 4/3.
You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions is required during the semester.
Source Code
---title: "Weeks 10 and 11"subtitle: "Introduction To Geographic Data and Quiz 2 Review" author: "Penelope Pooler Eisenbies"date: last-modifiedtoc: truetoc-depth: 3toc-location: lefttoc-title: "Table of Contents"toc-expand: 1format: html: code-line-numbers: true code-fold: true code-tools: trueexecute: echo: fenced---## Housekeeping```{r include=F}#|label: setupknitr::opts_chunk$set(echo=T, highlight=T, message=F, warning=F) # specifies default options for all chunksoptions(scipen=100) # suppress scientific notation # install pacman if neededif (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")pacman::p_load(pacman, tidyverse, ggthemes, gridExtra, magrittr, kableExtra, RColorBrewer, maps, usdata, countrycode, mapproj, shadowtext, cowplot, grid) # install and load required packages p_loaded() # verify loaded packages```### Upcoming Dates- **HW 5 - Part 1 is due on Wed. 3/26 - Grace Period ends 3/28**- **Draft Proposals are due Thursday 3/27**- **Thursday 3/27: In-class work day and NO OFFICE HOURS**- **Tuesday 4/1: Review using practice questions**- **Quiz 2 is on Thursday, 4/3** - Mostly on Weeks 5 through 9, but material is cumulative. - Mostly on HW Assignments 4 and 5 , but material is cumulative. - Data Mgmt. tasks will requires 2 or 3 steps and then you will answer questions.- HW 5 - Part 2 will be posted after Quiz 2.## Today, Thursday, and Tuesday, 4/1- **Practice Questions for Quiz 2 are posted.**- **Tue. 4/1: Skills/Concepts Review for Quiz 2** - Putting skills together for different goals - Come with questions or submit them as `Engagement Questions`.- **Today: Intro to Managing and Plotting Geographic Data** - Today's geographic maps will not be on Quiz 2. 
- Mapping data is an effective data curation tool that may be useful in this class, other classes, your career.- We will cover more about geographic data and map visualizations after Quiz 2.- **If you want help with mapping project data, please reach out to me or TA**.## In-class Exercise - Week 10::::::: columns:::: {.column width="48%"}::: fragment**Purpose:**:::- To gain some experience and understanding of map data available in R and elsewhere.- To experiment with mapping data- Students are encouraged to use domestic or international map data in their dashboards if appropriate.::::::: {.column width="4%"}:::::: {.column width="48%"}{fig.align="center"}::::::::::## Data Preparation- The data for today's in-class exercise is part of R.- These geographic data are useful if you have information by state or county and you want to show a [choropleth map](https://en.wikipedia.org/wiki/Choropleth_map){target="_blank"} of your data.- R also has world information, e.g., countries, continents, etc.::: fragment```{r us data prep}us_states <- map_data("state") |> # state polygons (not used today) rename("state" = "region")us_counties <- map_data("county") |> # county polygons rename("state" = "region", "county" = "subregion") |> mutate(county = gsub(" ", "", county), county = gsub("'","", county) |> tolower())#unique(us_counties$county[us_counties$state=="louisiana"]) # note issue Louisiana countiescnty2019_all <- county_2019#unique(cnty2019_all$name[cnty2019_all$state=="Louisiana"]) # note issue Louisiana countiescnty2019_all <- cnty2019_all |> mutate(state = tolower(state), county = tolower(name), county = gsub(" county", "", county), county = gsub(" parish", "", county), county = gsub("\\.", "", county), # \\ is required because . 
used in R coding county = gsub(" ", "", county), county = gsub("'","", county)) |> relocate(county, .before=name)cnty2019_all <- full_join(us_counties,cnty2019_all) # geo data and demographic data```:::## County Demographic Plots:::::::::::::::::::: panel-tabset### [Data & Plot 1]{style="color:blue;"}::::::: columns:::: {.column width="48%"}- Creating a new dataset not required, but helpful- Plot code could be converted into a function::: fragment```{r}#|label: select data and plot 1 mapcnty_data1 <- cnty2019_all |>select(long:county, hs_grad, bachelors, household_has_computer, household_has_broadband) cnty_hs_grad <- cnty_data1 |>ggplot(aes(x=long, y=lat, group=group, fill=hs_grad)) +geom_polygon() +theme_map() +coord_map("albers", lat0 =39, lat1 =45) +labs(fill="", title="Percent with High School Degree") +scale_fill_continuous(type ="viridis") +theme(plot.background =element_rect(fill ="lightgrey", color =NA),legend.key.size =unit(.4, 'cm'),plot.title =element_text(size =10),legend.text=element_text(size =8))```:::::::::: {.column width="4%"}:::::: {.column width="48%"}Font in plot adjusted for screen.```{r plot 1 shown, echo=F, fig.dim=c(7,6)}(cnty_hs_grad1 <- cnty_hs_grad + theme(plot.background = element_rect(colour = "darkgrey", linewidth=2), legend.key.size = unit(1, 'cm'), plot.title = element_text(size = 18), legend.text= element_text(size = 12)))```::::::::::### [Plot 2]{style="color:blue;"}:::::: columns::: {.column width="48%"}```{r}#|label: plot 2 codecnty_bachelors <- cnty_data1 |>ggplot(aes(x=long, y=lat, group=group, fill=bachelors)) +geom_polygon() +theme_map() +coord_map("albers", lat0 =39, lat1 =45) +labs(fill="", title="Percent with Bachelor's Degree") +scale_fill_continuous(type ="viridis") +theme(plot.background =element_rect(fill ="lightgrey", color =NA),legend.key.size =unit(.4, 'cm'),plot.title =element_text(size =10),legend.text=element_text(size =8))```:::::: {.column width="4%"}:::::: {.column width="48%"}Font in plot adjusted for 
screen.```{r plot 2 shown, echo=F, fig.dim=c(7,6)}(cnty_bachelors1 <- cnty_bachelors + theme(plot.background = element_rect(colour = "darkgrey", linewidth=2), legend.key.size = unit(1, 'cm'), plot.title = element_text(size = 18), legend.text= element_text(size = 12)))```:::::::::### [Plot 3]{style="color:blue;"}:::::: columns::: {.column width="48%"}```{r}#|label: plot 3 codecnty_computer <- cnty_data1 |>ggplot(aes(x=long, y=lat, group=group, fill=household_has_computer)) +geom_polygon() +theme_map() +coord_map("albers", lat0 =39, lat1 =45) +labs(fill="", title="Percent of Households with A Computer")+scale_fill_continuous(type ="viridis") +theme(plot.background =element_rect(fill ="lightgrey", color =NA),legend.key.size =unit(.4, 'cm'),plot.title =element_text(size =10),legend.text=element_text(size =8))```:::::: {.column width="4%"}:::::: {.column width="48%"}Font in plot adjusted for screen```{r plot 3 shown, echo=F, fig.dim=c(7,6)}(cnty_computer1 <- cnty_computer + theme(plot.background = element_rect(colour = "darkgrey", linewidth=2), legend.key.size = unit(1, 'cm'), plot.title = element_text(size = 18), legend.text= element_text(size = 12)))```:::::::::### [Plot 4]{style="color:blue;"}:::::: columns::: {.column width="48%"}```{r plot 4 code}#|label: plot 4 codecnty_brdbd <- cnty_data1 |> ggplot(aes(x=long, y=lat, group=group, fill=household_has_broadband)) + geom_polygon() + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + labs(fill= "", title="Percent of Households with BroadBand") + scale_fill_continuous(type = "viridis") + theme(plot.background = element_rect(fill = "lightgrey", color = NA), legend.key.size = unit(.4, 'cm'), plot.title = element_text(size = 10), legend.text= element_text(size = 8))```:::::: {.column width="4%"}:::::: {.column width="48%"}Font in plot adjusted for screen```{r plot 4 shown, echo=F, fig.dim=c(7,6)}(cnty_brdbd1 <- cnty_brdbd + theme(plot.background = element_rect(colour = "darkgrey", linewidth=2), legend.key.size = 
unit(1, 'cm'), plot.title = element_text(size = 18), legend.text= element_text(size = 12)))```:::::::::::::::::::::::::::::## Demographic Plot Grid - Order MattersGrid is populated left to right and then top to bottom unless otherwise specified.```{r eval=F}#|label: grid of pct plots# note alternative to grid.arrange used# code shown is for export version of plotsgrid <- plot_grid(cnty_hs_grad, cnty_computer, cnty_bachelors, cnty_brdbd, ncol=2)``````{r grid of 4 demo pct plots for slides, fig.align='center', fig.dim=c(12,6), echo=F}plot_grid(cnty_hs_grad1, cnty_computer1, cnty_bachelors1, cnty_brdbd1, ncol=2) # screen versiongrid <- plot_grid(cnty_hs_grad, cnty_computer, cnty_bachelors, cnty_brdbd, ncol=2) # export versionsave_plot("img/grid_plot_example_Penelope_Pooler.png", grid)```## In-class Exercise - Plot Grid (Steps 1 & 2)**Steps to Follow**1. Examine the available variables in the **`cnty_2019_all`** dataset saved from R.2. **Create Individual Plots**. Variable names in R dataset and definitions: - **`household_has_smartphone`: Households with Smart Phones** - **`median_age`: Median Age** - **`median_household_income`: Median Household Income** - **`median_individual_income`: Median Individual Income**- You can copy provided plot code and modify it for these variables or you could try to create a function (not required today).::: fragment```{r}#|label: usmaps demographic maps exercise# select data for plot (not required, but helpful)# cnty_data2 <- # create four individual plots, one for each variable# use provided plot code and modify```:::## In-class Exercise - Plot Grid (Steps 3 & 4)3. Create 2x2 plot grid of these four variables by US County as part of your class participation credit for this week.- **For full credit, plots must be in the order specified.** - **Row 1:** should have smartphone variable and median age. 
- **Row 2:** median household and individual income.- Create Plot Grid using `plot_grid` command.::: fragment```{r}#|label: in-class exercise 2x2 plot grid# for full credit grid of plots must be in order specified# use plot_grid command# export plots to img folder using save_plot ```:::4. Right click on plot grid, then save as... and save to `img` folder with correct name.- or use `save_plot` command## Mapping Log Transformed Data:::::: columns::: {.column width="68%"}- The previous examples above are all percent data.- No transformations are needed- In contrast, population data or financial data are often right-skewed and need to be **log transformed.**- Recall from MAS 261, BUA 345 (and perhaps FIN classes): - An effective transformation for right skewed data is the natural log (LN) transformation. - The following demo shows how useful it is for mapping right skewed data.:::::: {.column width="4%"}:::::: {.column width="28%"}:::::::::## Plot of Skewed Data:::::: columns::: {.column width="48%"}```{r}#|label: untransformed pop. 
map codecnty_data3 <- cnty2019_all |>select(long:county, pop) |>mutate(pop1k = pop/1000)cnty_pop <- cnty_data3 |>ggplot(aes(x=long, y=lat, group=group, fill=pop1k)) +geom_polygon() +theme_map() +coord_map("albers", lat0 =39, lat1 =45) +labs(fill="", title="Population by County",subtitle="Unit is 1000 People") +scale_fill_continuous(type ="viridis") +theme(legend.key.width =unit(.4, "cm"))```:::::: {.column width="4%"}:::::: {.column width="48%"}```{r untransformed pop map shown, echo=F, fig.dim=c(7,6)}cnty_pop + theme(plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```:::::::::## Histogram Clarifies Data Skewness:::::: columns::: {.column width="48%"}```{r}#|label: untransformed pop hist codehist_pop <- cnty_data3 |>ggplot() +geom_histogram(aes(x=pop1k),fill="lightblue", col="darkblue") +labs(x="Population", title="Histogram of US Population Data",y ="Count",subtitle="Unit is 1000 People") +theme_classic()```**The Problem:** We see we have skewed data BUT presenting log transformed data in a map may complicate data interpretation.:::::: {.column width="4%"}:::::: {.column width="48%"}```{r untransformed pop hist shown, fig.dim=c(7,6), echo=F}hist_pop + theme(plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```:::::::::## Solution - Natural Log Transformed Plot::::::: columns:::: {.column width="48%"}- Data are not transformed, but data axes and scale are- Options specified in `scale_fill_continuous`: - `trans = "log"` - `breaks = c(...)`- break intervals determined by examining data::: fragment```{r}#|label: log transformed map plot codecnty_lpop <- cnty_data3 |>ggplot(aes(x=long, y=lat, group=group, fill=pop1k)) +geom_polygon() +theme_map() +coord_map("albers", lat0 =39, lat1 =45) +labs(fill="", title="Population by County",subtitle="Unit is 1000 People and Date are Log-transformed") +scale_fill_continuous(type ="viridis",trans="log",breaks=c(1,10,100,1000,10000)) +theme(legend.key.width =unit(.4, 
"cm"))```:::::::::: {.column width="4%"}:::::: {.column width="48%"}```{r log transformed map plot shown, echo=F, fig.dim=c(7,6)}cnty_lpop + theme(plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```::::::::::## Histogram of Log transformed Data::::::: columns:::: {.column width="48%"}- Final dashboard doesn't include exporatory plots- Data exploration plots: - Histograms, Scatterplots and boxplots are all useful::: fragment```{r}#|label: log transformed pop hist codehist_lpop <- cnty_data3 |>ggplot() +geom_histogram(aes(x=pop1k),fill="lightblue", col="darkblue") +labs(x="Population", title="Histogram of Natural Log of US Population Data",y ="Count",subtitle="Unit is 1000 People and Data are Log-transformed") +theme_classic() +scale_x_continuous(trans="log", breaks=c(1,10,100,1000,10000))```:::::::::: {.column width="4%"}:::::: {.column width="48%"}**Histogram of Log-transformed Data**```{r log transformed pop hist shown, echo=F, fig.dim=c(7,6)}hist_lpop + theme(plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```::::::::::## Population Plot Grid```{r eval=F}#|label: plot code for pop gridgrid.arrange(cnty_pop, hist_pop,cnty_lpop, hist_lpop, ncol=2) ``````{r grid of all four population plots, fig.align='center', fig.dim=c(14,6), echo=F}grid.arrange(cnty_pop, hist_pop,cnty_lpop, hist_lpop, ncol=2) grid.rect(.5,.5,width=unit(.99,"npc"), height=unit(0.99,"npc"), gp=gpar(lwd=3, fill=NA, col="darkgrey"))```## When and How to Log Transform- Log transformation are useful if you have right skewed POSITIVE data such as - Prices - Population - Sales - Income- Note: If data (x) have zeros, a good option is to use log(x + 1) - ln(1) = 0 (In R `log(1)` = 0) - 0 values in the data will still be zeros- In the following example we will create plots for number of households by county: - Histograms with and without LN transformation - Map plots of with and without LN transformation## Number of Households Per County### Without 
Transformation:::::: columns::: {.column width="48%"}**Untransformed Data Histogram**```{r}#|label: data and untransformed hh data histogram code#|cnty_data4 <- cnty2019_all |>select(long:county, households) |>mutate(households1K = households/1000)hist_hholds <- cnty_data4 |>ggplot() +geom_histogram(aes(x=households1K),fill="lightblue", col="darkblue") +labs(x="Number of Households", title="Histogram of US Household Data",y ="Count",subtitle="Unit is 1000 Households") +theme_classic()```:::::: {.column width="4%"}:::::: {.column width="48%"}**Number of Households is highly right-skewed.**```{r untransformed hh data histogram shown, echo=F, fig.dim=c(7,6)}hist_hholds + theme(plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```:::::::::## Number of Households Per County### With Transformation:::::: columns::: {.column width="48%"}**Log transformed Data Histogram**```{r}#|label: log transformed hh data histogram codehist_lhholds <- cnty_data4 |>ggplot() +geom_histogram(aes(x=households1K),fill="lightblue", col="darkblue") +labs(x="Number of Households", title="Histogram of Natural Log of US Household Data",y ="Count",subtitle="Unit is 1000 Households and Data are Log-transformed") +theme_classic() +scale_x_continuous(trans="log", breaks=c(1,10,100,1000))```:::::: {.column width="4%"}:::::: {.column width="48%"}**Log-transformed data appear normally distributed**```{r log transformed hh data histogram shown, echo=F, fig.dim=c(7,6)}hist_lhholds + theme(plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```:::::::::## Number of Households Per County:::::: columns::: {.column width="48%"}```{r untransformed hh plot map code}cnty_hholds <- cnty_data4 |> ggplot(aes(x=long, y=lat, group=group, fill=households1K)) + geom_polygon() + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + labs(fill= "", title="Number of Households by County", subtitle="Unit is 1000 Households") + scale_fill_continuous(type = "viridis") + 
theme(legend.key.width = unit(.4, "cm"))```:::::: {.column width="4%"}:::::: {.column width="48%"}**Map of untransformed households per county data is uninformative.**```{r untransformed data hh map shown, echo=F, fig.dim=c(7,6)}cnty_hholds + theme(plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))```:::::::::## In-class Exercise 2**Log-transformed data map is more more informative about geographic variability.**Submit R code to create log transformed households map in a text (`.txt`) file with your name.```{r log transformed hh data map, echo=F, fig.align='center', fig.dim=c(14,6.5)}(cnty_lhholds <- cnty_data4 |> ggplot(aes(x=long, y=lat, group=group, fill=households1K)) + geom_polygon() + theme_map() + coord_map("albers", lat0 = 39, lat1 = 45) + labs(fill= "", title="Number of Households by County", subtitle="Unit is 1000 Households and data are Log-transformed") + scale_fill_continuous(type = "viridis", trans="log", breaks=c(1,10,100,1000)) + theme(legend.key.width = unit(.4, "cm")))```## Households per County Plot Grid```{r household grid plot code, eval=F}grid.arrange(hist_hholds, hist_lhholds, cnty_hholds, cnty_lhholds, ncol=2)``````{r grid of all four household plots, echo = F, fig.align='center', fig.dim=c(14,6)}grid.arrange(hist_hholds, hist_lhholds, cnty_hholds, cnty_lhholds, ncol=2)```## Quiz 2 Information- Questions and Material from Quiz 1 may be on Quiz 2- Practice Questions will be posted by 10/31 - Review Quiz 1 and Quiz 1 Practice Questions - Review Week HW 4 and HW 5 - Part 1 and recent lectures- Study Tip: Feel free to add on to practice questions .qmd file with extra chunks and notes so that all of your notes are in one place.## Quiz 2 Information Cont'd- Converting text (character) date information to a date using [**lubridate**](https://rawgit.com/rstudio/cheatsheets/main/lubridate.pdf) commands (Week 5) - Example R commands:`ymd`, `dmy`, `mdy`, `ym` combined with `paste` to combine columns- Extracting year, month, or 
day from the date variable using lubridate commands (Week 6) - Example R commands: `year`, `month`, `quarter`, `wday`, `day`- Converting an `xts` dataset to a tibble (standard R dataset) (Week 7) - Creating a lineplot from time series (non-xts) dataset- Converting a tibble to an `xts` dataset - Creating an interactive `hchart`## Quiz 2 Info Cont'd- You should be familiar with the `bls_tidy` function we created and how to use it to import similar datasets.- There will be datasets to be imported AND combined (joined) - Data sets can be joined by row. - You should know how to do the different joins we covered and what each one does: - `full_join` - `right_join` - `left_join` - `inner_join`- Data sets can be stacked by column if the columns are identical - In BUA 455 we covered `bind_rows`## Quiz 2 Info Cont'd- Cleaning messy data (Week 5 and Weeks 8-9) - Dealing with text (character variables) - `gsub` - `separate` - `unite` or `paste` or `paste0` - `ifelse` can be be used for text OR for numeric data - `ifelse` followed by `factor` allows you to make any categorical variable you want.- Other commands for modifying text: - `tolower` and `toupper` - `str_trim`, `str_squish` and `str_pad`## Additional Text Commands::: fragment**Additional skills for Quiz 2 from HW 5 - Part 1:**:::- summing across rows using `sum(c_across(...))`- using `pivot_wider` and then `pivot_longer` and then replacing NAs with 0 to create a 'complete' data set - Useful for area plots- Plotting skills for Quiz 2 - unformatted line plot, area plot, or grouped bar chart - `hchart` in **highcharter** package::: fragment**NOT ON QUIZ 2:**:::- [**Commands to covert case**](https://stringr.tidyverse.org/reference/case.html)- `str_to_title`: First letter of each word- `str_to_sentence`: First letter of first word## ### Key Points::: fragment**Introduction to Geographic Data**:::- Use skills already covered to - clean data and check text variables - join datasets - create plot grids for comparing 
variables- Determine if variables need to be log-transformed.- Quiz 2 Practice Questions are posted.- Quiz 2 will be on Thursday, 4/3.::: fragmentYou may submit an 'Engagement Question' about each lecture until midnight on the day of the lecture. **A minimum of four submissions are required during the semester.**:::