Goals for this exercise

Learn how to label figures in Quarto documents
Use scatterplots to examine assertions from the Chetty study and Simpson’s Paradox

Setup

Once again, we start by defining our exercise setup and loading packages and data. We use the include: true chunk option to keep these blocks visible in the final output for informational purposes at the start of the course. As the semester progresses, we will start to hide them.
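In Quarto, chunk options are written as `#|` comment lines at the top of a code chunk. The sketch below shows how `include`, a figure label, and a figure caption might be set together. The label `fig-toy-scatter`, the caption text, and the toy data are hypothetical and are not taken from this exercise. A label that starts with `fig-` can then be cross-referenced in the text as `@fig-toy-scatter`.

```r
#| label: fig-toy-scatter
#| fig-cap: "Toy scatterplot (hypothetical data)"
#| include: true

# Hypothetical toy data, used only to give the chunk something to draw
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))

# A base R scatterplot; the caption above would be attached to this figure
plot(toy$x, toy$y, xlab = "x", ylab = "y")
```

Changing `include: true` to `include: false` would still run the chunk but leave both the code and its output out of the rendered document.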

Load packages

We load the same packages as we did in our last exercise.

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readxl)

Read in Chetty data

Our dataset this time comes from the Chetty study. You were introduced to Chetty in the reading for today’s class, via the Executive Summary.

This code creates a new data frame named “chetty” by reading in the data from another Excel file.

Also note the use of skip=1 within our read_excel() function. Can you determine why this argument is necessary?

chetty <- read_excel("data/Lab1_Chetty_2014.xlsx", skip=1)
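One common reason to skip a row is that the file starts with a title or note line above the real column headers. Whether that is the case for this Excel file is an assumption, since its contents are not shown here. The sketch below illustrates the idea with base R's read.csv() on a made-up two-column text table rather than with read_excel(); the argument name skip plays the same role in both.

```r
# Hypothetical raw text: a title line sits above the real header row
raw_text <- paste(
  "Toy title row that is not part of the data,",
  "cz,cz_name",
  "100,Johnson City",
  "200,Morristown",
  sep = "\n"
)

# Without skip, the title line is mistaken for the column headers
without_skip <- read.csv(text = raw_text)

# With skip = 1, reading starts at the real header row
with_skip <- read.csv(text = raw_text, skip = 1)

names(without_skip)
names(with_skip)
```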

Get to know the data

Once again, use glimpse() or just invoke the name of the data frame to inspect it:

glimpse(chetty)

## Rows: 741
## Columns: 48
## $ cz                   <dbl> 100, 200, 301, 302, 401, 402, 500, 601, 602, 700,…
## $ cz_name              <chr> "Johnson City", "Morristown", "Middlesborough", "…
## $ state                <chr> "TN", "TN", "TN", "TN", "NC", "VA", "NC", "NC", "…
## $ region               <chr> "South", "South", "South", "South", "South", "Sou…
## $ division             <chr> "East South Central", "East South Central", "East…
## $ pop_2000             <dbl> 576081, 227816, 66708, 727600, 493180, 92753, 105…
## $ top_100              <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ urban                <chr> "urban", "urban", "rural", "urban", "urban", "rur…
## $ rel_mobility         <dbl> 0.37170, 0.35611, 0.37557, 0.37253, 0.38913, 0.42…
## $ abs_mobility         <dbl> 38.38750, 37.77675, 39.04925, 37.84125, 36.96925,…
## $ prob_q1q5            <dbl> 0.06219881, 0.05365194, 0.07263514, 0.05628121, 0…
## $ frac_black           <dbl> 0.0208408, 0.0197791, 0.0146459, 0.0563648, 0.173…
## $ racial_seg           <dbl> 0.0903837, 0.0931530, 0.0642501, 0.2099943, 0.262…
## $ income_seg           <dbl> 0.0348657, 0.0262809, 0.0240811, 0.0921036, 0.071…
## $ poverty_seg          <dbl> 0.0301532, 0.0278582, 0.0146831, 0.0842674, 0.061…
## $ affluence_seg        <dbl> 0.0382395, 0.0253348, 0.0258143, 0.1019656, 0.080…
## $ frac_commute_under15 <dbl> 0.3252578, 0.2764278, 0.3585358, 0.2685696, 0.291…
## $ hhi_percap           <dbl> 31559.77, 29958.93, 22328.48, 35884.29, 38891.75,…
## $ gini                 <dbl> 0.46804, 0.43459, 0.44095, 0.50832, 0.46553, 0.44…
## $ top1inc_share        <dbl> 13.459000, 10.631000, 10.691000, 15.080000, 11.91…
## $ gini_bottom99        <dbl> 0.33345, 0.32828, 0.33404, 0.35752, 0.34636, 0.33…
## $ frac_p25top75        <dbl> 0.54796, 0.53750, 0.46685, 0.50410, 0.49991, 0.53…
## $ taxrate              <dbl> 0.0203924, 0.0234471, 0.0153799, 0.0188704, 0.017…
## $ gov_exp              <dbl> 1886.148, 2004.337, 1189.820, 2356.851, 1891.450,…
## $ inctax_progressive   <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1…
## $ eitc                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ sch_exp              <dbl> 5.184547, 4.505886, 5.614119, 4.899846, 5.462676,…
## $ tchr_ratio           <dbl> NA, NA, 15.08494, NA, 15.38528, NA, 16.67801, 16.…
## $ test_pctile          <dbl> 2.7283790, -3.4002740, -9.3150620, -6.0318310, -2…
## $ hs_dropout           <dbl> -0.0152656, -0.0235207, -0.0046291, -0.0110711, 0…
## $ colleges_percap      <dbl> 0.0138869, 0.0087790, 0.0449721, 0.0109951, 0.014…
## $ college_tuition      <dbl> 4816.820, 4762.230, 11840.400, 3480.440, 9715.330…
## $ college_gradrate     <dbl> -0.0024312, -0.1011827, 0.1112985, -0.0238261, 0.…
## $ labor_participation  <dbl> 0.5873474, 0.6249742, 0.4789631, 0.6148306, 0.656…
## $ manufacturing        <dbl> 0.2373972, 0.2377556, 0.2335314, 0.1455054, 0.215…
## $ growth_imports       <dbl> 5.2937860, 3.0304790, 2.0625960, 1.0783080, 1.016…
## $ teen_lfp             <dbl> 0.0037529, 0.0047773, 0.0028932, 0.0042875, 0.003…
## $ migration_in         <dbl> 0.005639832, 0.016206061, 0.008050009, 0.01630291…
## $ migration_out        <dbl> 0.004697256, 0.014235172, 0.011602806, 0.01355415…
## $ foreign_born         <dbl> 0.0117837, 0.0230553, 0.0070780, 0.0199675, 0.052…
## $ social_capital       <dbl> -0.29785830, -0.76735477, -1.27025130, -0.2218846…
## $ frac_religious       <dbl> 0.5144033, 0.5438951, 0.6678060, 0.6019530, 0.487…
## $ crime_rate           <dbl> 0.0014095, 0.0018436, 0.0008545, 0.0013441, 0.002…
## $ frac_single_mothers  <dbl> 0.1898032, 0.1851060, 0.2110027, 0.2056023, 0.220…
## $ frac_divorced        <dbl> 0.1101729, 0.1159584, 0.1134514, 0.1142778, 0.092…
## $ frac_married         <dbl> 0.6008929, 0.6133591, 0.5902804, 0.5751500, 0.585…
## $ inc_growth           <dbl> -0.0022776, -0.0021528, -0.0037121, -0.0019974, -…
## $ `Urban or Rural`     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

Questions

How many observations (rows) are there in our chetty data frame?

There are 741 observations (rows) in the chetty data frame.

How many variables (columns) are there?

There are 48 variables (columns) in the chetty data frame.

What does each observation represent?

Each observation (row) in the chetty data frame represents a commuting zone (CZ) in the United States.

Variables of interest

Today we will examine the following variables:

Variable	Type	Description
`social_capital`	continuous	Social capital score
`abs_mobility`	continuous	Absolute mobility score
`rel_mobility`	continuous	Relative mobility score
`urban`	categorical	Identifies whether the CZ is `urban` or `rural`
`gini`	continuous	Gini coefficient

Exercise, Part 1

One of the conclusions of the Chetty study in 2014 was that social capital, defined as “the strength of an individual’s social network and community,” may influence upward economic mobility. We’re going to explore that assertion in this exercise using Chetty’s own data and a mixture of the variables seen above.

Question: How strong is the relationship between social capital and economic mobility?

Step 1

Using geom_point(), visualize the relationship between social capital and absolute mobility.

ggplot(data = chetty, mapping = aes(
           x = social_capital,
           y = abs_mobility
           )) +
  geom_point() +
  labs(
    x = "Social capital score",
    y = "Absolute mobility score"
  )

Social capital vs. Absolute mobility

Question 1

Do you see a correlation? Is it what you expected from the Chetty study executive summary?

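Yes, I see a positive correlation: CZs with higher social capital scores tend to have higher absolute mobility scores. This is consistent with the Chetty study's assertion that social capital may influence upward mobility.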

Step 2

Add an aesthetic to your graph to represent whether the CZ is urban or not.

ggplot(data = chetty, mapping = aes(
           x = social_capital,
           y = abs_mobility,
           color = urban
           )) +
  geom_point() +
  labs(
    x = "Social capital score",
    y = "Absolute mobility score",
    color = "Urban vs. Rural"
  )

Social capital vs. Absolute mobility, segmented by urban and rural CZs

Question 2

Do you see any indication that Simpson’s Paradox could be an issue?

Yes. The correlation among urban CZs is not obvious, whereas the correlation among rural CZs looks very strong, so the overall trend may not describe both groups equally well.

Simpson’s Paradox: when a trend appears in separate groups but reverses or disappears when the data is aggregated, leading to misleading conclusions about wages, mobility, policy impacts, or inequality.
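To make the definition concrete, here is a minimal sketch with made-up numbers (not the Chetty data). Within each hypothetical group the correlation between x and y is positive, yet the correlation computed on the pooled data is negative. The data frame toy and its group labels "a" and "b" are assumptions chosen purely for illustration.

```r
# Hypothetical toy data: two groups, each with a perfectly increasing trend
toy <- data.frame(
  group = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 6, 7, 8, 9),
  y = c(6, 7, 8, 9, 1, 2, 3, 4)
)

# Correlation when the groups are pooled together
pooled_cor <- cor(toy$x, toy$y)

# Correlation computed separately within each group
group_cor <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))

pooled_cor  # negative
group_cor   # positive in both groups
```

This is why the steps that follow split the scatterplot by the urban variable: a pooled trend can hide, or even contradict, what happens inside each group.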

Step 3

Use faceting to separate urban and non-urban CZs into two separate plots.

ggplot(data = chetty, mapping = aes(
           x = social_capital,
           y = abs_mobility,
           color = urban
           )) +
  geom_point() +
  labs(
    x = "Social capital score",
    y = "Absolute mobility score",
    color = "Urban vs. Rural"
  ) +
  facet_grid(~urban)

Social capital vs. Absolute mobility, faceted by urban and rural CZs

Question 3

Does this change your answer to the previous question? Do you see any indication that Simpson’s Paradox could be an issue?

Yes. With the two groups in separate panels, Simpson’s Paradox no longer looks like an issue.

Step 4

Use geom_smooth() to add a line of best fit to your faceted plot, and experiment with these different options for your line:

method: determines what kind of regression model to use in calculating the slope of the line of best fit
- lm: linear model (straight line)
- loess: locally-estimated smoothing (smooth line)
- And several others
se: show or hide a shaded area representing the calculated standard error:
- TRUE: show the standard error
- FALSE: hide the standard error

ggplot(data = chetty, mapping = aes(
           x = social_capital,
           y = abs_mobility,
           color = urban
           )) +
  geom_point() +
  labs(
    x = "Social capital score",
    y = "Absolute mobility score",
    color = "Urban vs. Rural"
  ) +
  facet_grid(~urban) +
  geom_smooth(method = "lm",
              se = FALSE)

Social capital vs. Absolute mobility, faceted by urban and rural CZs with line of best fit
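With method = "lm" and faceting, geom_smooth() fits a separate straight line in each panel. As a rough sketch of what that amounts to, the code below fits one lm() per group on made-up numbers and extracts the slope. The toy values are hypothetical and are not drawn from the Chetty data; only the column names mirror the ones used above.

```r
# Hypothetical toy data that mimics the column names used in this exercise
toy <- data.frame(
  urban = rep(c("rural", "urban"), each = 4),
  social_capital = c(-1, 0, 1, 2, -1, 0, 1, 2),
  abs_mobility = c(40, 43, 46, 49, 40, 41, 42, 43)
)

# Fit one linear model per group and keep the slope on social_capital
slopes <- sapply(split(toy, toy$urban), function(d) {
  coef(lm(abs_mobility ~ social_capital, data = d))[["social_capital"]]
})

slopes  # one slope per panel; a steeper slope means a steeper line of best fit
```

Comparing the per-group slopes is a numeric counterpart to comparing the lines in the two panels by eye.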

Question 4

What differences do you observe between the urban and rural CZs? Do you see any indication that Simpson’s Paradox could be an issue?

Yes, there are differences: the urban CZs are more tightly clustered and span a narrower range than the rural CZs.

Exercise, Part 2

Step 5

Let’s examine relative mobility this time. Borrowing your code from above, create a scatterplot that replaces the abs_mobility variable with rel_mobility, and add a line of best fit. Also, apply the color aesthetic to geom_point() only, so that geom_smooth() draws a single line for the entire dataset.

ggplot(data = chetty, mapping = aes(
           x = social_capital,
           y = rel_mobility
           )) +
  geom_point(aes(color = urban)) + # Remember: map color here only, so geom_smooth() fits a single line
  geom_smooth(method = "lm") +
  labs(
    x = "Social capital score",
    y = "Relative mobility score",
    color = "Urban vs. Rural"
  )

Social capital vs. Relative mobility, segmented by urban/rural

Step 6

Now take the same chart from Step 5 above and facet it by the urban variable.

ggplot(data = chetty, mapping = aes(
           x = social_capital,
           y = rel_mobility
           )) +
  geom_point(aes(color = urban)) + # Remember: color is mapped on the points only
  geom_smooth(method = "lm",
              se = FALSE) +
  facet_wrap(~urban) +
  labs(
    x = "Social capital score",
    y = "Relative mobility score",
    color = "Urban vs. Rural"
  )

Social capital vs. Relative mobility, faceted by urban/rural

Question 5

What differences do you observe when we use relative mobility instead of absolute?

The correlation is now negative: using relative mobility reverses the direction of the relationship. Urban CZs in particular appear to have the weakest correlation.

Exercise, Part 3

Question: How strong is the relationship between inequality and economic mobility?

The Gini coefficient is a popular measure of income and wealth inequality. In a perfectly unequal society where one person holds all the wealth and everyone else has none of it, the Gini coefficient would be 1. In a perfectly equal society where all members hold an equal share of the wealth, the Gini coefficient would be 0.

With this in mind, Chetty calculated the Gini coefficient for all of the CZs in the study, and this data is included in our dataset for today’s lesson.
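The two extremes of the Gini coefficient described above can be checked with a few lines of R. This is a minimal sketch using one common formulation (the sum of absolute differences between all pairs, divided by twice the squared count times the mean). The helper name gini_coef and the toy wealth vectors are hypothetical, and the Chetty study may compute its values differently. With a finite number of people, the case where one person holds everything gives (n - 1) / n, which approaches 1 as n grows.

```r
# Hypothetical helper: Gini coefficient from pairwise absolute differences
gini_coef <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

# Perfect equality: everyone holds the same amount
equal_society <- gini_coef(c(5, 5, 5, 5))

# One of four people holds everything: (n - 1) / n = 0.75
unequal_small <- gini_coef(c(0, 0, 0, 20))

# One of 1000 people holds everything: close to 1
unequal_large <- gini_coef(c(rep(0, 999), 1))

c(equal_society, unequal_small, unequal_large)
```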

Step 7

In this last phase, let’s examine the relationship between inequality and mobility by plotting our gini variable on the X axis and our abs_mobility variable on the Y axis.

ggplot(data = chetty, mapping = aes(
           x = gini,
           y = abs_mobility
           )) +
  geom_point(aes(color = urban)) + # Remember: color is mapped on the points only
  geom_smooth(method = "lm",
              se = FALSE) +
  facet_wrap(~urban) +
  labs(
    x = "Gini coefficient",
    y = "Absolute mobility score"
  )

Question 6

What do you observe when we analyze the relationship between the Gini coefficient and the absolute mobility score?

The lower the Gini coefficient, the higher the absolute mobility score. A lower Gini coefficient means a smaller economic gap between the rich and the poor, so the more income equality we see, the greater the economic mobility. We can perhaps conclude, then, that income inequality acts as an impediment to upward mobility in certain CZs.

Wrap Up

Great work! When you’re finished, follow these steps:

In the YAML block at the top of the document, in the “author” line, replace the professor’s name with yours.
Make sure the date on the document is correct.
Click the Render button at the top of your Quarto document to render your work as an interactive HTML document. After the rendering process finishes (it may take 5-10 seconds), the document should be displayed in the Viewer panel in the bottom-right portion of RStudio.
Explore the HTML output. Note the table of contents, the code blocks, the visuals, and the callout boxes that highlight important information.
When studying and trying to reinforce your skills with R and the tidyverse, you may refer back to this completed Quarto document as a resource.

Class 03: Examining Economic Mobility

In-class exercise

Charlotte Li

Jan 28, 2025
