Grid is populated left to right and then top to bottom unless otherwise specified.
#| label: grid of pct plots
# note alternative to grid.arrange used
# code shown is for export version of plots
grid <- plot_grid(cnty_hs_grad, cnty_computer, cnty_bachelors, cnty_brdbd, ncol=2)
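To make the fill order concrete, here is a minimal sketch using four placeholder plots that only display a letter. The helper `make_label_plot` and the letters are hypothetical and not part of the course code; `byrow` is the `plot_grid` argument that switches the fill direction.

```r
library(ggplot2)
library(cowplot)

# Hypothetical helper: a blank plot that shows only its label
make_label_plot <- function(label) {
  ggplot() +
    annotate("text", x = 0, y = 0, label = label, size = 8) +
    theme_void()
}
plots <- lapply(c("A", "B", "C", "D"), make_label_plot)

# Default (byrow = TRUE): row 1 holds A and B, row 2 holds C and D
grid_by_row <- plot_grid(plotlist = plots, ncol = 2)

# byrow = FALSE: column 1 holds A and B, column 2 holds C and D
grid_by_col <- plot_grid(plotlist = plots, ncol = 2, byrow = FALSE)
```

Printing `grid_by_row` and `grid_by_col` side by side shows how the same four plots land in different cells.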
In-class Exercise - Plot Grid (Steps 1 & 2)
Steps to Follow
Examine the available variables in the cnty2019_all dataset saved from R.
Create Individual Plots. Variable names in R dataset and definitions:
household_has_smartphone: Households with Smart Phones
median_age: Median Age
median_household_income: Median Household Income
median_individual_income: Median Individual Income
You can copy provided plot code and modify it for these variables or you could try to create a function (not required today).
#| label: usmaps demographic maps exercise
# select data for plot (not required, but helpful)
# cnty_data2 <-
# create four individual plots, one for each variable
# use provided plot code and modify
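For anyone who wants to try the optional function route, the following is one possible sketch, not the required approach. The function name `cnty_map`, its arguments, and the one-polygon demo data are all hypothetical. It assumes the data have `long`, `lat`, and `group` columns like the provided plot code, takes `theme_map()` from cowplot (ggthemes also provides a `theme_map()`), and assumes the mapproj package is installed, since `coord_map()` needs it when the plot is drawn.

```r
library(ggplot2)
library(cowplot)  # assumption: theme_map() taken from cowplot

# Hypothetical helper: county map filled by the column named in `var`
cnty_map <- function(data, var, title) {
  ggplot(data, aes(x = long, y = lat, group = group, fill = .data[[var]])) +
    geom_polygon() +
    theme_map() +
    coord_map("albers", lat0 = 39, lat1 = 45) +
    labs(fill = "", title = title) +
    scale_fill_continuous(type = "viridis") +
    theme(legend.key.width = unit(.4, "cm"))
}

# Tiny made-up polygon so the sketch can be built without the course data
demo_data <- data.frame(
  long = c(-100, -99, -99, -100),
  lat = c(40, 40, 41, 41),
  group = 1,
  median_age = 38
)
demo_map <- cnty_map(demo_data, "median_age", "Median Age")
```

With the real data, the same call pattern would be repeated once per variable, changing only the column name and the title.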
In-class Exercise - Plot Grid (Steps 3 & 4)
Create 2x2 plot grid of these four variables by US County as part of your class participation credit for this week.
For full credit, plots must be in the order specified.
Row 1: should have smartphone variable and median age.
Row 2: median household and individual income.
Create Plot Grid using plot_grid command.
#| label: in-class exercise 2x2 plot grid
# for full credit grid of plots must be in order specified
# use plot_grid command
# export plots to img folder using save_plot
Right-click on the plot grid, choose Save as…, and save it to the img folder with the correct name,
or use the save_plot command.
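A minimal sketch of the `save_plot` route follows. The placeholder plot and the file name `demo_plot_grid.png` are assumptions for illustration only; use your own four plots and the file name required for the exercise.

```r
library(ggplot2)
library(cowplot)

# Placeholder plot standing in for the four county maps (hypothetical)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
demo_grid <- plot_grid(p, p, p, p, ncol = 2)

# The img folder must exist before saving; the file name is an assumption
dir.create("img", showWarnings = FALSE)
save_plot("img/demo_plot_grid.png", demo_grid, ncol = 2, nrow = 2)
```

Passing `ncol = 2` and `nrow = 2` to `save_plot` scales the output size to match a 2x2 grid, so each panel keeps a readable size.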
Mapping Log Transformed Data
The previous examples all use percent data.
No transformations are needed.
In contrast, population data or financial data are often right-skewed and need to be log transformed.
Recall from MAS 261, BUA 345 (and perhaps FIN classes):
An effective transformation for right skewed data is the natural log (LN) transformation.
The following demo shows how useful it is for mapping right skewed data.
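As a quick numeric illustration of why the LN transformation helps, the sketch below uses simulated right-skewed values, not the course data. The variable names and simulation settings are assumptions.

```r
set.seed(1)
# Simulated right-skewed, strictly positive values (not the course data)
x <- rlnorm(1000, meanlog = 3, sdlog = 1.5)

# On the raw scale the long right tail pulls the mean far above the median
c(mean = mean(x), median = median(x))

# After log(), which is the natural log in R, mean and median are close
lx <- log(x)
c(mean = mean(lx), median = median(lx))
```

A mean far above the median is a common sign of right skew; after the transformation the two are nearly equal.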
#| label: untransformed pop hist code
hist_pop <- cnty_data3 |>
  ggplot() +
  geom_histogram(aes(x=pop1k), fill="lightblue", col="darkblue") +
  labs(x="Population", title="Histogram of US Population Data",
       y="Count", subtitle="Unit is 1000 People") +
  theme_classic()
The Problem: The data are right-skewed, BUT presenting log-transformed values in a map may complicate interpretation.
Solution - Natural Log Transformed Plot
The data values are not transformed, but the axes and fill scale are.
Options specified in scale_fill_continuous:
trans = "log"
breaks = c(...)
break intervals determined by examining data
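One way to choose break intervals by examining the data is sketched below with simulated values, not the course data. The names `pop1k` and `pop_breaks` and the powers-of-10 rule are assumptions, shown as one reasonable option rather than the method used for the slides.

```r
# Simulated county populations in thousands (not the course data)
set.seed(2)
pop1k <- rlnorm(3000, meanlog = 3, sdlog = 1.5)

# Step 1: look at the range of the data
range(pop1k)

# Step 2: powers of 10 that span that range make readable breaks
lo <- floor(log10(min(pop1k)))
hi <- ceiling(log10(max(pop1k)))
pop_breaks <- 10^seq(lo, hi)
pop_breaks
```

The resulting vector can be passed to the `breaks` argument; breaks outside the range of the real data can simply be dropped.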
#| label: log transformed map plot code
cnty_lpop <- cnty_data3 |>
  ggplot(aes(x=long, y=lat, group=group, fill=pop1k)) +
  geom_polygon() +
  theme_map() +
  coord_map("albers", lat0=39, lat1=45) +
  labs(fill="", title="Population by County",
       subtitle="Unit is 1000 People and Data are Log-transformed") +
  scale_fill_continuous(type="viridis", trans="log",
                        breaks=c(1,10,100,1000,10000)) +
  theme(legend.key.width=unit(.4, "cm"))
Histogram of Log-transformed Data
The final dashboard doesn’t include exploratory plots.
Data exploration plots:
Histograms, scatterplots, and boxplots are all useful.
#| label: log transformed pop hist code
hist_lpop <- cnty_data3 |>
  ggplot() +
  geom_histogram(aes(x=pop1k), fill="lightblue", col="darkblue") +
  labs(x="Population", title="Histogram of Natural Log of US Population Data",
       y="Count", subtitle="Unit is 1000 People and Data are Log-transformed") +
  theme_classic() +
  scale_x_continuous(trans="log", breaks=c(1,10,100,1000,10000))
Histogram of Log-transformed Data
Population Plot Grid
#| label: plot code for pop grid
grid.arrange(cnty_pop, hist_pop, cnty_lpop, hist_lpop, ncol=2)
When and How to Log Transform
Log transformations are useful if you have right-skewed POSITIVE data such as:
Prices
Population
Sales
Income
Note: If data (x) have zeros, a good option is to use log(x + 1)
ln(1) = 0 (In R log(1) = 0)
0 values in the data will still be zeros
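The effect of adding 1 before taking the log can be checked on a small made-up vector. The values in `x` are illustrative only; `log1p()` is base R's built-in equivalent of `log(x + 1)`.

```r
x <- c(0, 1, 9, 99, 999)   # example counts that include a zero

log(x)       # first element is -Inf, which cannot be plotted
log(x + 1)   # first element is 0, so zeros stay zeros
log1p(x)     # built-in equivalent of log(x + 1)
```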
In the following example we will create plots for number of households by county:
Histograms with and without LN transformation
Map plots with and without LN transformation
Number of Households Per County
Without Transformation
Untransformed Data Histogram
#| label: data and untransformed hh data histogram code
cnty_data4 <- cnty2019_all |>
  select(long:county, households) |>
  mutate(households1K = households/1000)

hist_hholds <- cnty_data4 |>
  ggplot() +
  geom_histogram(aes(x=households1K), fill="lightblue", col="darkblue") +
  labs(x="Number of Households", title="Histogram of US Household Data",
       y="Count", subtitle="Unit is 1000 Households") +
  theme_classic()
Number of Households is highly right-skewed.
Number of Households Per County
With Transformation
Log transformed Data Histogram
#| label: log transformed hh data histogram code
hist_lhholds <- cnty_data4 |>
  ggplot() +
  geom_histogram(aes(x=households1K), fill="lightblue", col="darkblue") +
  labs(x="Number of Households", title="Histogram of Natural Log of US Household Data",
       y="Count", subtitle="Unit is 1000 Households and Data are Log-transformed") +
  theme_classic() +
  scale_x_continuous(trans="log", breaks=c(1,10,100,1000))
Log-transformed data appear approximately normally distributed.
Number of Households Per County
cnty_hholds <- cnty_data4 |>
  ggplot(aes(x=long, y=lat, group=group, fill=households1K)) +
  geom_polygon() +
  theme_map() +
  coord_map("albers", lat0=39, lat1=45) +
  labs(fill="", title="Number of Households by County",
       subtitle="Unit is 1000 Households") +
  scale_fill_continuous(type="viridis") +
  theme(legend.key.width=unit(.4, "cm"))
Map of untransformed households per county data is uninformative.
In-class Exercise 2
The log-transformed data map is more informative about geographic variability.
Submit R code to create log transformed households map in a text (.txt) file with your name.
Determine if variables need to be log-transformed.
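Histograms are the main tool for this decision, but a numeric check can back them up. The sketch below is one possible approach, not a course requirement: the `skewness` helper and the simulated `demo` data are hypothetical, and the helper computes the moment-based sample skewness.

```r
# Hypothetical helper: moment-based sample skewness
# Values well above 0 suggest right skew
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
}

# Simulated columns (not the course data): one percent-like, one count-like
set.seed(3)
demo <- data.frame(
  pct_var = runif(500, 0, 100),
  count_var = rlnorm(500, meanlog = 3, sdlog = 1.5)
)

sapply(demo, skewness)          # count_var is strongly right-skewed
skewness(log(demo$count_var))   # near 0 after the LN transformation
```

A variable whose skewness is large and positive on the raw scale but near 0 after `log()` is a good candidate for a log-transformed map.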
Quiz 2 Practice Questions are posted.
Quiz 2 will be on Thursday, 4/3.
You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions is required during the semester.