Lab 9 Part II: Analysis in 4 Parts: Time, Space, Scatterplots, and Regression

Are declines in union membership leading to greater income inequality? We will measure income inequality by the share of all income captured by the top 1% of income earners. As this share increases, income inequality increases.

We will examine this question by analyzing inequality and union membership over time, over space, and using a linear regression. Fun stuff!

In this part of the lab:

Generate a time plot of income inequality and union membership
Generate a map showing the spatial distribution of income inequality in the US
Scatter plot of income inequality and union membership
Linear Regression

Time Plot

Let’s look at the historical trends in income inequality (top 1% share) and rates of union membership. We will use the national level datasets we created in Part I.

Now for the time plot. This is a two-axis plot, a throwback to earlier in the semester! Remember to set the coefficient (coeff) used to scale the second-axis variable (Union_Rate). We’ll also set a color for each line: the code below uses the named colors "blue" and "red", but you could instead store hexadecimal color codes in variables such as top1Color and UnionColor.

library(tidyverse)  # provides read_csv(), ggplot(), and %>%
inequality_union_usa <- read_csv("inequality_union_usa.csv")

#First Plot 
ggplot(inequality_union_usa, aes(x = Year)) +
  geom_line(aes(y = top1), color = "blue") +
  geom_line(aes(y = Union_Rate), color = "red")

## Warning: Removed 3 row(s) containing missing values (geom_path).

#Second Attempt - fix the scale for union rates
coeff <- 100
ggplot(inequality_union_usa, aes(x = Year)) +
  geom_line(aes(y = top1), color = "blue") +
  geom_line(aes(y = Union_Rate/coeff), color = "red")

## Warning: Removed 3 row(s) containing missing values (geom_path).

#Third Attempt - make things look nice
coeff <- 100
ggplot(inequality_union_usa, aes(x = Year)) +
  geom_line(aes(y = top1), color = "blue") +
  geom_line(aes(y = Union_Rate/coeff), color = "red") +
  # Multiply by coeff so the right-hand axis is labeled in the original Union_Rate units
  scale_y_continuous(name = "Top 1% Income Share", 
                     sec.axis = sec_axis(~ . * coeff, name = "Union Membership")) +
  theme_classic() +
  ggtitle("Income Inequality and Union Membership\n 1970-2019") +
  theme(plot.title = element_text(hjust = 0.5))

## Warning: Removed 3 row(s) containing missing values (geom_path).
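
The value of coeff does not have to be guessed. As a minimal sketch, assuming made-up numbers in a toy data frame (not the lab data), one common choice is the ratio of the two series' maxima, so that both lines fill a similar vertical range. The same coeff then divides the series when plotting and multiplies the axis labels inside sec_axis():

```r
library(ggplot2)

# Toy data with made-up values, for illustration only
toy <- data.frame(
  Year = 2000:2004,
  top1 = c(0.16, 0.17, 0.18, 0.19, 0.20),
  Union_Rate = c(13.5, 13.0, 12.5, 12.0, 11.5)
)

# Scale factor: how many Union_Rate units correspond to one top1 unit
coeff <- max(toy$Union_Rate, na.rm = TRUE) / max(toy$top1, na.rm = TRUE)

toy_plot <- ggplot(toy, aes(x = Year)) +
  geom_line(aes(y = top1), color = "blue") +
  # Divide by coeff to draw Union_Rate on the top1 scale
  geom_line(aes(y = Union_Rate / coeff), color = "red") +
  scale_y_continuous(
    name = "Top 1% Income Share",
    # Multiply by coeff so the right axis shows Union_Rate units again
    sec.axis = sec_axis(~ . * coeff, name = "Union Membership")
  ) +
  theme_classic()

print(toy_plot)
```

Whatever value you pick, the division in geom_line() and the multiplication in sec_axis() must use the same number, otherwise the right-hand axis labels will not match the red line.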

Question: Can we attribute the increases in income inequality to declines in union membership? Why or why not?

Spatial Analysis

For this analysis, we will use the metro-level dataset. We’ll use mutate() to create a new column called top1_cut, which breaks the income share of the top 1% in each metro area into 5 groups. The breaks = argument sets the cut points between the groups. You’ll see this clearly in the map.

msa_inq_union <- readRDS("msa_inq_union.rds")

msa_inq_union2 <- msa_inq_union %>%
  mutate(top1_cut = cut(top1, breaks = c(0, 12, 15, 20, 25, 45)))

ggplot(data = msa_inq_union2, aes(geometry = geometry)) +
  geom_sf(aes(fill = top1_cut), lwd = 0.25) +
  coord_sf(xlim = c(-126, -66), ylim = c(23, 51.5), expand = FALSE) +
  scale_fill_brewer(na.value = "grey", name = "Percent income share of Top 1%") +
  ggtitle("Top 1% Income Share for Metro Areas") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))
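
To see what cut() does with those breaks, here is a small sketch on a made-up vector of income shares (the values are invented for illustration, not taken from the lab data):

```r
# Made-up income shares: some inside each group, one outside the range
shares <- c(8, 12, 13.5, 19, 24, 30, 50)

# Same breaks as in the lab: six cut points define five groups
groups <- cut(shares, breaks = c(0, 12, 15, 20, 25, 45))

# The group labels, in order
levels(groups)

# How many values fall in each group (including any that fall in none)
table(groups, useNA = "ifany")
```

By default the intervals are closed on the right, so a value of exactly 12 lands in the first group. A value above the last break (50 here) becomes NA, which the map would draw in grey because of na.value = "grey".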

To examine income inequality in a single state (such as Florida), we can change the xlim and ylim values for our coordinates.

ggplot(data = msa_inq_union2, aes(geometry = geometry)) +
  geom_sf(aes(fill = top1_cut), lwd = 0.25) +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) +
  scale_fill_brewer(na.value = "grey") +
  ggtitle("Top 1% Income Share for Florida") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))
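
Picking xlim and ylim by trial and error can be tedious. One alternative, sketched here with the North Carolina shapefile that ships with the sf package (a stand-in, not the lab data), is to read the limits off the bounding box of the features you want to show:

```r
library(sf)
library(ggplot2)

# Stand-in data: the demo shapefile bundled with sf (not the lab dataset)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Bounding box of the features: named values xmin, ymin, xmax, ymax
bb <- st_bbox(nc)

zoom_map <- ggplot(nc) +
  geom_sf() +
  # Use the bounding box as the map window instead of hand-typed limits
  coord_sf(xlim = c(bb[["xmin"]], bb[["xmax"]]),
           ylim = c(bb[["ymin"]], bb[["ymax"]]),
           expand = FALSE) +
  theme_classic()

print(zoom_map)
```

For the lab data, the same idea would apply to a subset of msa_inq_union2 containing only the metro areas of interest, assuming that object is stored as an sf object.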

Scatter Plot

When we want to examine a relationship between two continuous variables, a scatter plot can be helpful. We will also add a linear regression line using geom_smooth(method = 'lm').

ggplot(msa_inq_union2, aes(x = MemTotal, y = top1)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  theme_classic() +
  ggtitle("Top 1% Income Share and Union Membership") +
  xlab("Pct Union Membership") +
  ylab("Top 1% Share")

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 694 rows containing non-finite values (stat_smooth).

## Warning: Removed 694 rows containing missing values (geom_point).
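
The warnings report that 694 rows were dropped because MemTotal or top1 is missing. A small sketch on made-up data (not the lab dataset) shows one way to count such rows yourself and to compute a correlation on the complete rows only:

```r
# Made-up data with missing values in each column
toy <- data.frame(
  MemTotal = c(5, 10, NA, 20, 15),
  top1     = c(18, 16, 15, NA, 14)
)

# Rows where at least one of the two plotted variables is missing
n_dropped <- sum(!complete.cases(toy[, c("MemTotal", "top1")]))
n_dropped

# Correlation using only rows where both values are present
r <- cor(toy$MemTotal, toy$top1, use = "complete.obs")
r
```

Running the same two steps on msa_inq_union2 is a quick check that the number of dropped rows matches the warning.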

Question: What does the relationship look like between union membership and income inequality?

Linear Regression

Next, we regress the top 1% income share on the union membership rate.

result.1 <- lm(top1 ~ MemTotal, data = msa_inq_union2)
summary(result.1)

## 
## Call:
## lm(formula = top1 ~ MemTotal, data = msa_inq_union2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5253 -2.2236 -0.5847  1.2661 26.3450 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.45427    0.46180  35.631   <2e-16 ***
## MemTotal    -0.07893    0.03514  -2.246   0.0256 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.072 on 233 degrees of freedom
##   (694 observations deleted due to missingness)
## Multiple R-squared:  0.0212, Adjusted R-squared:  0.017 
## F-statistic: 5.046 on 1 and 233 DF,  p-value: 0.02562
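
To read the slope: the estimates above imply that a metro area with a union membership rate one percentage point higher has, on average, a top 1% income share about 0.079 points lower. This is an association, not evidence of causation. A minimal sketch of that arithmetic, with the two coefficients copied by hand from the summary output (the helper name predict_top1 is made up for illustration):

```r
# Coefficients copied from the summary(result.1) output above
b0 <- 16.45427   # intercept
b1 <- -0.07893   # slope on MemTotal

# Hypothetical helper: fitted top 1% share at a given union membership rate
predict_top1 <- function(mem_total) b0 + b1 * mem_total

low  <- predict_top1(5)    # fitted value at 5% union membership
high <- predict_top1(15)   # fitted value at 15% union membership

# Difference implied by a 10-point gap in union membership
high - low
```

With the fitted model in your session, coef(result.1) returns the same two numbers without retyping them, and confint(result.1) gives confidence intervals for each.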

What if we want a nicer-looking table? For example, the output above doesn’t look so great for a blog post. The stargazer package makes nice-looking tables. You input the regression results, a title, and the type of output (here it’s plain text).

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

stargazer(result.1,title="Results", align=TRUE, type="text")

## 
## Results
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                top1            
## -----------------------------------------------
## MemTotal                     -0.079**          
##                               (0.035)          
##                                                
## Constant                     16.454***         
##                               (0.462)          
##                                                
## -----------------------------------------------
## Observations                    235            
## R2                             0.021           
## Adjusted R2                    0.017           
## Residual Std. Error      4.072 (df = 233)      
## F Statistic            5.046** (df = 1; 233)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Now we can make it look even better, especially if these were the final results for a blog post. First, let’s make sure the code chunk doesn’t show up. At the beginning of the chunk, use the following options:

{r, echo=FALSE, results='asis', message=FALSE}

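echo=FALSE hides the R code; results='asis' passes the chunk output through unchanged, so the HTML that stargazer produces is rendered as a table rather than printed as raw text; and message=FALSE hides messages such as the stargazer citation note (use warning=FALSE to hide warnings).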

In stargazer change type="text" to type="html".

You won’t see the really nice table until you knit.

**Results**

| | Dependent variable: top1 |
|---|---|
| MemTotal | -0.079** |
| | (0.035) |
| Constant | 16.454*** |
| | (0.462) |
| Observations | 235 |
| R² | 0.021 |
| Adjusted R² | 0.017 |
| Residual Std. Error | 4.072 (df = 233) |
| F Statistic | 5.046** (df = 1; 233) |

Note: *p<0.1; **p<0.05; ***p<0.01

Question: What if you did a t-test for the relationship between top 1% income share and union membership? Do you get the same results as the regression? Why or why not?

We can now add controls:

result.2 <- lm(top1 ~ MemTotal + population, data = msa_inq_union2)

result.3 <- lm(top1 ~ MemTotal + population + median_income, data = msa_inq_union2)

result.4 <- lm(top1 ~ MemTotal + population + median_income + college_pct, data = msa_inq_union2)

result.5 <- lm(top1 ~ MemTotal + population + median_income + college_pct + unemployed, data = msa_inq_union2)

Now let’s make a nice looking table worthy of a blog post. Using stargazer, begin the code chunk with:

{r, results='asis', echo=FALSE, message=FALSE}

and in the chunk

stargazer(result.2, result.3, result.4, result.5, title="Results", align=TRUE, type="html")

and when you knit you have a nice looking table!

**Results**

Dependent variable: top1

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| MemTotal | -0.088*** | -0.103*** | -0.089** | -0.094** |
| | (0.033) | (0.034) | (0.035) | (0.037) |
| population | 0.00000*** | 0.00000*** | 0.00000*** | 0.00000*** |
| | (0.00000) | (0.00000) | (0.00000) | (0.00000) |
| median_income | | 0.0001* | 0.00002 | 0.00002 |
| | | (0.00003) | (0.00004) | (0.00004) |
| college_pct | | | 6.836 | 7.455* |
| | | | (4.281) | (4.459) |
| unemployed | | | | 5.892 |
| | | | | (11.640) |
| Constant | 15.803*** | 13.172*** | 13.311*** | 12.442*** |
| | (0.449) | (1.578) | (1.575) | (2.332) |
| Observations | 235 | 235 | 235 | 235 |
| R² | 0.141 | 0.152 | 0.161 | 0.162 |
| Adjusted R² | 0.133 | 0.141 | 0.146 | 0.144 |
| Residual Std. Error | 3.824 (df = 232) | 3.808 (df = 231) | 3.795 (df = 230) | 3.801 (df = 229) |
| F Statistic | 18.970*** (df = 2; 232) | 13.764*** (df = 3; 231) | 11.030*** (df = 4; 230) | 8.847*** (df = 5; 229) |

Note: *p<0.1; **p<0.05; ***p<0.01

For your blog post, once you are ready to finalize things, you’ll want to use echo=FALSE and message=FALSE a lot! You don’t want your reader looking through all the code.

For more details on formatting regression results using stargazer, see here and here.

One thing to note is that as we add additional controls, the coefficient for union membership is relatively stable and remains statistically significant at the 5% level.
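
One way to check that kind of stability without reading across a table is to pull the same coefficient out of each model. The sketch below uses R's built-in mtcars data as a stand-in (the models and variable names are illustrative, not the lab's):

```r
# Stand-in models on built-in data: same key regressor (wt), growing controls
models <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars),
  m3 = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Extract the coefficient on the key regressor from every model
wt_coefs <- sapply(models, function(m) coef(m)[["wt"]])
wt_coefs

# Matching p-values, taken from each model's coefficient table
wt_pvals <- sapply(models, function(m) coef(summary(m))["wt", "Pr(>|t|)"])
wt_pvals
```

For the lab, the list would hold result.1 through result.5 and the name "wt" would become "MemTotal".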

Question: Are there additional controls you would like to add to the regression? Why?

END

ECON 210 Middlebury College

Last updated Tue Apr 13
