Are declines in union membership leading to greater income inequality? We will measure income inequality using the share of all income captured by the top 1% of income earners. As this share increases, income inequality increases.
We will examine this question by analyzing inequality and union membership over time, over space, and using a linear regression. Fun stuff!
In this part of the lab:
Let’s look at the historical trends in income inequality (top 1% share) and rates of union membership. We will use the national level datasets we created in Part I.
Now for the time plot. This is a two-axis plot, a throwback to earlier in the semester! Remember to set the coefficient used to scale the second-axis variable (Union_Rate). We’re also going to set a color for each line. The code below uses the named colors "blue" and "red"; hexadecimal colors (for example, stored in variables such as top1Color and UnionColor) work the same way.
library(tidyverse) # provides read_csv(), ggplot(), mutate(), and %>%
inequality_union_usa <- read_csv("inequality_union_usa.csv")
#First Plot
ggplot(inequality_union_usa, aes(x = Year)) +
geom_line(aes(y = top1), color = "blue") +
geom_line(aes(y = Union_Rate), color = "red")
## Warning: Removed 3 row(s) containing missing values (geom_path).
#Second Attempt - fix the scale for union rates
coeff <- 100
ggplot(inequality_union_usa, aes(x = Year)) +
geom_line(aes(y = top1), color = "blue") +
geom_line(aes(y = Union_Rate/coeff), color = "red")
## Warning: Removed 3 row(s) containing missing values (geom_path).
#Third Attempt - make things look nice
coeff <- 100
ggplot(inequality_union_usa, aes(x = Year)) +
geom_line(aes(y = top1), color = "blue") +
geom_line(aes(y = Union_Rate/coeff), color = "red") +
scale_y_continuous(name = "Top 1% Income Share",
sec.axis = sec_axis(~ . * coeff, name = "Union Membership")) + # multiply back by coeff so the right axis shows the original union rate
theme_classic() +
ggtitle("Income Inequality and Union Membership\n 1970-2019") +
theme(plot.title = element_text(hjust = 0.5))
## Warning: Removed 3 row(s) containing missing values (geom_path).
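To see why the secondary axis should use the same coefficient as the line, here is a minimal sketch on made-up numbers. The data frame `toy` and its values are invented for illustration and are not taken from the lab's dataset. The union rate is divided by `coeff` so it can share the left axis with the top 1% share, and `sec_axis()` multiplies by `coeff` so the right-hand labels are back in the original percent units.

```r
library(ggplot2)

# Hypothetical data: top1 is a share (0-1), Union_Rate is a percent (0-100)
toy <- data.frame(
  Year       = c(1970, 1990, 2010),
  top1       = c(0.10, 0.14, 0.19),
  Union_Rate = c(27, 16, 12)
)

coeff <- 100  # same scaling factor as in the lab

p <- ggplot(toy, aes(x = Year)) +
  geom_line(aes(y = top1), color = "blue") +
  # divide so the union rate is drawn on the top1 scale
  geom_line(aes(y = Union_Rate / coeff), color = "red") +
  scale_y_continuous(
    name = "Top 1% Income Share",
    # multiply back so the right-hand labels show percent union membership
    sec.axis = sec_axis(~ . * coeff, name = "Union Membership (%)")
  )

p
```

If the transformation in `sec_axis()` were left as the identity, both axes would show the same numbers and the right-hand axis would not describe the union rate in its own units.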
Question: Can we attribute the increases in income inequality to declines in union membership? Why or why not?
For this analysis, we will use the metro-level dataset. We’ll use mutate() to create a new column called top1_cut, which divides the income share of the top 1% in each metro area into 5 groups. The breaks argument sets where the “cuts” between groups fall. You’ll see this clearly in the map.
library(sf) # needed to work with the geometry column and draw it with geom_sf()
msa_inq_union <- readRDS("msa_inq_union.rds")
msa_inq_union2 <- msa_inq_union %>%
mutate(top1_cut = cut(top1, breaks = c(0, 12, 15, 20, 25, 45)))
ggplot(data = msa_inq_union2, aes(geometry = geometry)) +
geom_sf(aes(fill = top1_cut), lwd = 0.25) +
coord_sf(xlim = c(-126, -66), ylim = c(23, 51.5), expand = FALSE) +
scale_fill_brewer(na.value = "grey", name = "Percent income share of Top 1%") +
ggtitle("Top 1% Income Share for Metro Areas") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5))
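As a quick illustration of how `cut()` assigns values to groups, here is a small sketch using invented top 1% shares (the numbers are hypothetical, not from the metro dataset). With the default settings, each interval excludes its lower bound and includes its upper bound, and any value outside the breaks becomes `NA`, which is what the map draws in grey.

```r
# Hypothetical top 1% shares (in percent) for five made-up metro areas
top1 <- c(8.5, 12, 14.2, 22.7, 31)

# Same breaks as in the lab; intervals are open on the left, closed on the right
top1_cut <- cut(top1, breaks = c(0, 12, 15, 20, 25, 45))

# Show each value next to the group it landed in
data.frame(top1, top1_cut)

# Count how many values fall in each of the 5 groups
table(top1_cut)

# A value above the last break is not assigned to any group
cut(50, breaks = c(0, 12, 15, 20, 25, 45))
```

Note that a share of exactly 12 lands in the first group, `(0,12]`, not the second.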
To examine income inequality in a single state (such as Florida), we can change the xlim and ylim values for our coordinates.
ggplot(data = msa_inq_union2, aes(geometry = geometry)) +
geom_sf(aes(fill = top1_cut), lwd = 0.25) +
coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) +
scale_fill_brewer(na.value = "grey") +
ggtitle("Top 1% Income Share for Florida") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5))
When we want to examine a relationship between two continuous variables, a scatter plot can be helpful. We will also add a linear regression line using geom_smooth(method ='lm').
ggplot(msa_inq_union2, aes(x = MemTotal, y = top1)) +
geom_point() +
geom_smooth(method ='lm') +
theme_classic() +
ggtitle("Top 1% Income Share and Union Membership") +
xlab("Pct Union Membership") +
ylab("Top 1% Share")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 694 rows containing non-finite values (stat_smooth).
## Warning: Removed 694 rows containing missing values (geom_point).
Question: What does the relationship look like between union membership and income inequality?
Next, we regress the top 1% income share on the union membership rate.
result.1 <- lm(top1 ~ MemTotal, data = msa_inq_union2)
summary(result.1)
##
## Call:
## lm(formula = top1 ~ MemTotal, data = msa_inq_union2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5253 -2.2236 -0.5847 1.2661 26.3450
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.45427 0.46180 35.631 <2e-16 ***
## MemTotal -0.07893 0.03514 -2.246 0.0256 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.072 on 233 degrees of freedom
## (694 observations deleted due to missingness)
## Multiple R-squared: 0.0212, Adjusted R-squared: 0.017
## F-statistic: 5.046 on 1 and 233 DF, p-value: 0.02562
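One way to read these estimates is to plug them into the fitted line by hand. The sketch below copies the two coefficients from the output above and computes predicted top 1% shares for two hypothetical union membership rates. It assumes that both `top1` and `MemTotal` are measured in percent in the metro dataset; the rates 5 and 15 are chosen only for illustration.

```r
# Coefficients copied from the summary(result.1) output
intercept <- 16.45427
slope     <- -0.07893  # change in top1 per one-point rise in MemTotal

# Hypothetical union membership rates (percent) to compare
mem_total <- c(5, 15)

# Fitted line: predicted top1 = intercept + slope * MemTotal
predicted_top1 <- intercept + slope * mem_total
predicted_top1

# Gap between the two predictions: 10 points of membership times the slope
diff(predicted_top1)
```

Under these assumptions, a metro area with 10 more points of union membership is predicted to have a top 1% share about 0.79 points lower. The R-squared of 0.021 is a reminder that union membership alone explains very little of the variation across metro areas.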
What if we want a nicer looking table? For example, the above doesn’t look so great for a blog post. There’s the stargazer package that will make nice looking tables. You input the regression results, title, and the type of output (here it’s plain text).
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
stargazer(result.1,title="Results", align=TRUE, type="text")
##
## Results
## ===============================================
## Dependent variable:
## ---------------------------
## top1
## -----------------------------------------------
## MemTotal -0.079**
## (0.035)
##
## Constant 16.454***
## (0.462)
##
## -----------------------------------------------
## Observations 235
## R2 0.021
## Adjusted R2 0.017
## Residual Std. Error 4.072 (df = 233)
## F Statistic 5.046** (df = 1; 233)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Now we can make it look even better, especially if these are the final results for a blog post. First, let’s make sure the code chunk doesn’t show up. At the beginning of the chunk, use the following options:
{r, echo=FALSE, results='asis', message=FALSE}
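echo=FALSE hides the R code; results='asis' passes stargazer’s HTML through unchanged so that it renders as a formatted table; and message=FALSE hides messages, such as the citation note printed when stargazer loads. (Warnings are controlled separately with warning=FALSE.)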
In stargazer change type="text" to type="html".
You won’t see the really nice table until you knit.
| | Dependent variable: top1 |
| --- | --- |
| MemTotal | -0.079** |
| | (0.035) |
| Constant | 16.454*** |
| | (0.462) |
| Observations | 235 |
| R2 | 0.021 |
| Adjusted R2 | 0.017 |
| Residual Std. Error | 4.072 (df = 233) |
| F Statistic | 5.046** (df = 1; 233) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
Question: What if you did a t-test for the relationship between top 1% income share and union membership? Do you get the same results as the regression? Why or why not?
We can now add controls, one at a time:
result.2 <- lm(top1 ~ MemTotal + population, data = msa_inq_union2)
result.3 <- lm(top1 ~ MemTotal + population + median_income, data = msa_inq_union2)
result.4 <- lm(top1 ~ MemTotal + population + median_income + college_pct, data = msa_inq_union2)
result.5 <- lm(top1 ~ MemTotal + population + median_income + college_pct + unemployed, data = msa_inq_union2)
Now let’s make a nice looking table worthy of a blog post. Using stargazer, begin the code chunk with:
{r, results='asis', echo=FALSE, message=FALSE}
and in the chunk
stargazer(result.2, result.3, result.4, result.5, title="Results", align=TRUE, type="html")
and when you knit you have a nice looking table!
| Dependent variable: top1 | (1) | (2) | (3) | (4) |
| --- | --- | --- | --- | --- |
| MemTotal | -0.088*** | -0.103*** | -0.089** | -0.094** |
| | (0.033) | (0.034) | (0.035) | (0.037) |
| population | 0.00000*** | 0.00000*** | 0.00000*** | 0.00000*** |
| | (0.00000) | (0.00000) | (0.00000) | (0.00000) |
| median_income | | 0.0001* | 0.00002 | 0.00002 |
| | | (0.00003) | (0.00004) | (0.00004) |
| college_pct | | | 6.836 | 7.455* |
| | | | (4.281) | (4.459) |
| unemployed | | | | 5.892 |
| | | | | (11.640) |
| Constant | 15.803*** | 13.172*** | 13.311*** | 12.442*** |
| | (0.449) | (1.578) | (1.575) | (2.332) |
| Observations | 235 | 235 | 235 | 235 |
| R2 | 0.141 | 0.152 | 0.161 | 0.162 |
| Adjusted R2 | 0.133 | 0.141 | 0.146 | 0.144 |
| Residual Std. Error | 3.824 (df = 232) | 3.808 (df = 231) | 3.795 (df = 230) | 3.801 (df = 229) |
| F Statistic | 18.970*** (df = 2; 232) | 13.764*** (df = 3; 231) | 11.030*** (df = 4; 230) | 8.847*** (df = 5; 229) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | | |
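The population coefficients print as 0.00000 even though they are statistically significant. A likely reason, assuming population is recorded in individual persons, is that the effect of one additional person is too small to show at five decimal places. The sketch below uses simulated data (all values and the variable `population_m` are made up for illustration) to show that rescaling a predictor changes only the units of its coefficient, not the fit.

```r
set.seed(1)

# Made-up data: population in persons, outcome loosely tied to population
n          <- 50
population <- runif(n, 5e4, 5e6)
y          <- 15 + 1e-6 * population + rnorm(n)

# Model with population measured in persons
fit_persons <- lm(y ~ population)

# Hypothetical rescaled predictor: population in millions
population_m <- population / 1e6
fit_millions <- lm(y ~ population_m)

coef(fit_persons)["population"]     # tiny number, rounds to 0.00000 in a table
coef(fit_millions)["population_m"]  # same effect, expressed per million people

# The overall fit is unchanged by the rescaling
c(summary(fit_persons)$r.squared, summary(fit_millions)$r.squared)
```

In the lab's models, the same idea would mean creating a population-in-millions column with mutate() before calling lm(), so the table shows a readable coefficient.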
For your blog post, once you are ready to finalize things, you’ll want to use echo=FALSE and message=FALSE a lot! You don’t want your reader looking through all the code.
For more details on formatting regression results using stargazer, see here and here.
One thing to note is that as we add additional controls, the coefficient for union membership is relatively stable and remains statistically significant at the 5% level.
Question: Are there additional controls you would like to add to the regression? Why?