Declines in Manufacturing Employment and Marriage Rates

Consequences of the China Import Shock

Alexander Ng

May 17, 2020

Abstract

Using data published by Dorn (2019) for all 722 local labor markets in the continental United States from 1990-2014, we study the impact of the “China Shock” (import growth in manufacturing) on gender-specific differentials in employment status, wages, and marital status of young women. This paper replicates the methodology and select findings in Autor, Dorn, & Hanson (2019). The effects are statistically and socially significant: one unit trade shock reduces blue-collar employment by 1.06 percent of the 18-39 age population, reduces the male relative annual earnings gap by $445 at the 50th percentile of the earnings distribution and reduces marriage rates by 0.95 percent of the 18-39 female population. These findings support Gary Becker’s theory on the impact of employment loss on the economic and marital value of young men. We apply instrumental variables (IV) regression and implement results in R. Unlike regression analyses focused on prediction accuracy, this study focuses on inference of parameter estimates for policy planning and model assessment.¹

Keywords: International Trade, Labor-market adjustment, Marriage, Manufacturing, Inequality

1 Introduction

In this paper, we weave together two strands of economic research, international trade and household economics, to study the impact of import competition from China on US labor markets and the concurrent impact on US marriage rates.

China has emerged as the second largest economy in the world in the past three decades driven by its role in world trade. Standard economic theory argues that international trade improves welfare. However, the gains from international trade are not distributed equally to regions or social classes within countries. In the United States, international trade with China has resulted in an increasing net import of Chinese goods over this period (Autor, Dorn, & Hanson, 2016).

The household economic theory of Gary Becker treats marriage as a rational enterprise that seeks to maximize the economic utility of both spouses. Using a comparative advantage argument for marital partners like the one used for international trade between nations, Becker argues that a higher earning spouse (usually the man) and a lower earning spouse (usually the woman) benefit in marriage from a division of labor: earning income vs. household management and child rearing. A prediction of Becker’s theory is that a shrinking earnings gap between men and women reduces the utility of marriage and therefore marriage rates. The sociological theory is hard to test in practice because of the difficulty in distinguishing cause and effect. The “China shock” provides an opportunity to test these theories. Autor et al. (2019) use variations in gender-specific employment and industry specialization across local labor markets to test Becker’s theory using Chinese imports as an external shock.

This paper replicates key findings of Autor et al. (2019), explores some additional directions and translates their results from Stata to R. Section 2 reviews some relevant big picture trends in marriage, manufacturing employment and Becker’s household economic theory. Section 3 explains the methodology used in building the IV regressions. Section 4 reports the regression results and interprets the model output. Section 5 puts the findings in context with current market news.

2 Literature Review

This section reviews salient facts in China import studies, US marriage trends and considers the strengths and weaknesses of Becker’s theory in the context of Autor et al. (2019).

The literature on international trade with China is vast. Autor et al. (2019) is part of an ambitious ongoing research programme undertaken by three economists at MIT, Zurich and Harvard. Their core findings about the impact of Chinese trade on local labor markets were developed in Autor, Dorn, & Hanson (2013), which is cited and used widely. To date, they have published 13 research articles on their website ChinaShock. In a review article (Autor et al., 2016), the authors point out that the China shock began after internal trade liberalization policies were adopted in 1990 and accelerated with China’s accession to the WTO.

Eriksson, Russ, Shambaugh, & Xu (2019) observe that the long-term decline in manufacturing industries over a century was also driven by other forces, including technological innovation and product life cycle effects, not just import competition from China or earlier competition from Japan. Moreover, this phenomenon is not limited to the US and is also documented in European countries. The trend is displayed in Figure A.2.

Bloom, Handley, Kurman, & Luck (2019) extend the ADH methodology to show that the Chinese import shock affected low human capital regions more severely. Unlike Autor et al. (2019), they conclude that Chinese competition caused no net job losses but caused a reallocation by industry (from manufacturing to service sectors) and by region (from the South and Midwest to the coasts). They use non-public microdata from the Census in their study.

The literature on marriage trends is likewise enormous. The most salient fact about US marriages is the long-term decline in US marriage rates since 1970, as shown in Figure A.1. From a high of 14 marriages per 1000 people per year in 1946 after World War II, the US marriage rate has declined to its present rate of 7 marriages. For the purposes of this study, the China trade shock is one of several factors contributing to that decline. Just as important, the decline in marriage rates is documented in many countries and thus not limited to the United States (Wikipedia, 2020).

Gary Becker argues that marriage occurs when the benefits for the two partners exceed the benefits of remaining single. Marriage allocates the division of labor based on the comparative advantage between the two partners (Wikipedia, 2020). Sawhill (2014) does not reject Becker’s theory but argues a revision is needed.
She argues that recent research suggests social class also imposes barriers to marriage. Growing income inequality has produced a “shortage of women at the top of the income distribution … and a shortage of men at the bottom end”. Because of class segmentation, sex ratios within classes matter. With few good men in the bottom third of the earnings scale, women are choosing to raise a single parent family rather than deal with a violent or abusive partner. Likewise, Parker & Stepler (2017) show a growing education gap in marriage rates. Adults with a 4-year college degree have a 65% marriage rate versus 50% among those with a high school degree or less.

Stated differently, whereas, in the past, the doctor used to marry the nurse, now the doctor marries another doctor. This is a challenge to Becker’s comparative advantage argument but not to household utility maximization.

On the other hand, recent survey work by Pew Research Center appears to support the utility maximization perspective of Becker’s theory. Among cohabiting but not engaged couples, the top reason to delay marriage is lack of financial readiness by at least one partner. Among all US adults, 71% believe a man’s ability to support a family financially is very important compared to 32% for a woman’s ability. See Figure A.3 for details. The 39 percentage point male-female opinion gap is the largest asymmetry in the survey results (Graf, 2019).

3 Methodology

The methodology constructs a series of instrumental variables (IV) regressions. The approach closely follows Autor et al. (2019) in its modeling and data. The goal of the analysis is inference – establishing the causality and parameter magnitude between the predictor and response variables – not maximization of prediction accuracy.

Consequently, this paper’s approach (inference) departs from that of many other data science studies which focus on predicting responses (forecasting). The inference approach is concerned with model misspecification, omitted variables and endogeneity while the forecasting approach focuses on model parsimony and out-of-sample testing. The outputs of interest for inference are the estimated coefficients of the relevant predictor variables and their statistical significance. The coefficients of the control variables are of lesser importance. Control variables are included in the model to avoid model misspecification error even if the coefficients are not statistically significant.

The argument for the impact of Chinese trade shocks on employment status, earnings gap and marital status requires building a set of related IV regressions against the target dependent variables. Before we discuss the model, we need to consider the construction of the data set. Section 3.1 describes the key predictor and response variables used in the models. Section 3.2 describes the construction of the IV regression models. Section 3.3 describes the R code and its implementation.

3.1 Data

The dataset is published by Dorn (2019) as a single large dataframe of $M=1444$ observations and $N=228$ columns in Stata dta file format. We explain the subset of data used in this paper, organizing our exposition by rows and columns. The rows, or observations, consist of $K=722$ local labor markets observed over $T=2$ periods, 1990-2000 and 2000-2014, so that $M = K \times T$.

The concept of a local labor market is central to the accuracy of this study. United States Commuting Zones (CZs), as defined in USDA (2019), serve as the proxy for local labor markets. They strike a balance between two competing problems: if the local labor markets are too big (like states or Census divisions), the data is too coarse and the statistical effects are undetectable; if the regions are too granular (e.g. towns or counties), macroeconomic data may be unavailable or inaccurate and the number of entities too large. There were 722 CZs in 1990. Each represents a local cluster of counties (typically 4) with commuting patterns identified by a statistical grouping algorithm: there is more commuting traffic among the counties within a CZ than across CZ boundaries. A CZ does not have a minimum or maximum population and has no political affiliation with a state. CZs are updated on a decennial basis and identified by their CZone code in the dataset. They are a standard geographical division for labor market research.

We only use a subset of the columns in the dataset. The predictor variables can be divided into the numerous control variables, the endogenous predictor(s) and the instrumental variables. Over each of the two periods, changes in employment status, earnings gap and marital status are calculated at the local labor market level.

The control predictors are included in the models to avoid misspecification error that would cast doubt on the parameter inferences. These predictors are enumerated below, each followed by its column names in the dataset.

Share of manufacturing in the overall local labor market: l_shind_manuf_cbp, l_sh_empl_f
Level of college education in the population (%): l_sh_popedu_c
Level of foreign born population (%): l_sh_popfborn
Level of task outsourcing (%): l_sh_routine33, l_task_outsource
Time effect: t2
Region dummies (all labor markets are assigned to the 9 US Census divisions, with New England as the base category; the others are Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain and Pacific): reg_midatl, reg_encen, reg_wncen, reg_satl, reg_escen, reg_wscen, reg_mount, reg_pacific
Racial percentages (Black, Asian, Other, Hispanic): l_sh_pop_black, l_sh_pop_asian, l_sh_pop_oth, l_sh_pop_hispanic

The unit of measure of trade shock is “the average change in Chinese import penetration in a CZ’s industries, weighted by each industry’s share in initial CZ employment” (Autor et al., 2019). The same measure is used in earlier papers such as Autor et al. (2013). The trade shock is defined as:

\[ \Delta IP_{i\tau}^{cu} = \sum_{j}\frac{L_{ij90}}{L_{i90}}\Delta IP_{j\tau}^{cu} = \sum_{j}\frac{L_{ij90}}{L_{i90}} \frac{\Delta M_{j\tau}^{cu}}{(Y_{j91}+M_{j91}-X_{j91})} \]

Because these are slow-changing variables, the observation period $\tau$ is measured in decades and identifies one of two time intervals: 1990 to 2000 and then 2000 to 2014. The variables in the formula are defined as follows. Industries are indexed by $j$, time periods by $\tau$ and commuting zones by $i$. $Y_{j91}$ is the national income from industry $j$ in 1991. $\Delta M_{j\tau}^{cu}$ is the growth in Chinese imports to the US for industry $j$ in period $\tau$. $M_{j91}$ and $X_{j91}$ are the gross imports and gross exports of industry $j$ in base year 1991. The weighting scheme uses $L_{ij90}/L_{i90}$, the share of industry $j$ in CZ $i$ total employment in 1990, based on County Business Patterns data maintained by the United States Census. The value of $\Delta IP_{i\tau}^{cu}$ depends on CZ $i$ only through its labor share in each industry.
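To make the weighting concrete, the following is a minimal sketch of the shift-share calculation. It is written in Python purely for illustration (the paper's own implementation is in R), and every industry name, employment count and trade figure in it is a made-up assumption, not a value from Dorn (2019). The helper names `industry_penetration_change` and `cz_exposure` are hypothetical.

```python
# Toy shift-share calculation of CZ-level import exposure.
# All numbers are invented for illustration; none come from Dorn (2019).

def industry_penetration_change(delta_imports, shipments, imports, exports):
    """Industry-level change: Delta M_j / (Y_j91 + M_j91 - X_j91)."""
    return delta_imports / (shipments + imports - exports)


def cz_exposure(employment_by_industry, industry_delta_ip):
    """Employment-share-weighted average of industry-level changes for one CZ."""
    total = sum(employment_by_industry.values())
    return sum(
        (emp / total) * industry_delta_ip[j]
        for j, emp in employment_by_industry.items()
    )


# Hypothetical industries: (Delta M, Y_91, M_91, X_91)
industries = {
    "furniture": (30.0, 90.0, 20.0, 10.0),  # 30 / (90 + 20 - 10) = 0.30
    "services": (0.0, 200.0, 0.0, 0.0),     # no import growth
}
delta_ip = {j: industry_penetration_change(*v) for j, v in industries.items()}

# Hypothetical 1990 employment counts by industry in two commuting zones
cz_a = {"furniture": 40, "services": 60}  # manufacturing-heavy
cz_b = {"furniture": 5, "services": 95}   # service-heavy

exposure_a = cz_exposure(cz_a, delta_ip)
exposure_b = cz_exposure(cz_b, delta_ip)
print(exposure_a, exposure_b)
```

Both toy zones face the same industry-level import growth; the zone with the larger initial furniture share ends up with the larger exposure, which is the only channel through which CZs differ in this measure.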

The trade shock metric above does not distinguish the impact on male and female workers in each CZ. Because industries hire different proportions of men and women, there could be gender-specific effects of the trade shock on employment outcomes. To distinguish gender effects, two additional metrics are defined:

\[\Delta IP_{i\tau}^{m,cu} = \sum_{j}(1-f_{ij90}) \frac{L_{ij90} } {L_{i90}} \Delta IP_{j\tau}^{cu} \text{ and } \Delta IP_{i\tau}^{f,cu} = \sum_{j}f_{ij90}\frac{L_{ij90} } {L_{i90}} \Delta IP_{j\tau}^{cu} \]

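Here the weight $f_{ij90}$ is the female share of employment in industry $j$ in CZ $i$ in 1990, so that $1-f_{ij90}$ is the male share. Note that the male and female components of trade shock add up to the total trade shock.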

\[ \Delta IP_{i\tau}^{m,cu} + \Delta IP_{i\tau}^{f,cu}= \Delta IP_{i\tau}^{cu} \]
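The identity follows directly from the two definitions, because the gender weights $(1-f_{ij90})$ and $f_{ij90}$ sum to one within each industry:

\[ \Delta IP_{i\tau}^{m,cu} + \Delta IP_{i\tau}^{f,cu} = \sum_{j}\left[(1-f_{ij90}) + f_{ij90}\right]\frac{L_{ij90}}{L_{i90}}\Delta IP_{j\tau}^{cu} = \sum_{j}\frac{L_{ij90}}{L_{i90}}\Delta IP_{j\tau}^{cu} = \Delta IP_{i\tau}^{cu} \]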

The instrumental variable used for the IV regression needs to satisfy two conditions: relevance and exogeneity (a.k.a. the exclusion restriction). Autor et al. (2019) instrument $\Delta IP_{i\tau}^{cu}$ with the concurrent growth of Chinese exports to eight other developed countries. The variable is

\[ \Delta IP_{i\tau}^{co} = \sum_{j} \frac{L_{ij80}}{L_{i80}}\Delta IP_{j\tau}^{co} \] where $\Delta IP_{j\tau}^{co}$ is the import penetration to other countries in industry $j$, calculated with a three year lag, and $L_{ij80}/L_{i80}$ is the employment share of industry $j$ in commuting zone $i$, measured with a ten year lag. The time lag enables the instrumental variable to mitigate simultaneity bias. The validity of the instrumental variable is argued in Autor et al. (2013).

One argument against the exogeneity of the instrumental variable is that product demand shocks across the US and other developed countries are correlated. The counterargument is that the import penetration is explained by China’s rising competitiveness (a supply shock), China’s joining the World Trade Organization (WTO) and the lowering of trade barriers. The authors build a trade model to show that the observed net imports are explainable by implied changes in China’s comparative advantage.

3.2 Model

The standard approach to establishing causal relationships in econometrics is to use an IV regression with a valid identification strategy. That is because economics rarely allows randomized controlled experiments as in the natural sciences. Stock & Watson (2011) describe IV regression in an econometrics context. Sundstrom (2017) describes the IV regression functionality in an R context. Hanck, Arnold, Gerber, & Schmelzer (2019) give a deeper treatment of running IV regression in R as a companion to Stock & Watson (2011) and explain the diagnostic testing.

The IV regression models can be divided into two types based on their choice of trade shock predictors:

those using $\Delta IP_{i\tau}^{cu}$ defined in Section 3.1: the average change in import penetration.
those using two predictors: $\Delta IP_{i\tau}^{m,cu} \text{ and } \Delta IP_{i\tau}^{f,cu}$: the male and female-specific import penetration.

In total, this paper calculates 24 IV regressions against 12 dependent variables. We tabulate all regressions in Table A.1 in the Appendix. Each IV model has 19 control predictors and either 1 or 2 endogenous predictor variables. Each endogenous variable is associated 1-to-1 with a corresponding instrumental variable.

The IV regression model structure has the form:

\[ \Delta Y_{sit} = c + \alpha_t + \beta_{1}\Delta IP_{i\tau}^{cu} + \mathbf{X}_{it}^{T}\mathbf{b_2}+e_{si\tau} \] where $\Delta Y_{sit}$ is the decade change in the response (e.g. the manufacturing employment share of the population ages 18-39) in commuting zone $i$ for gender category $s$ (male, female or both). The constant intercept $c$ is generally not of interest. The term $\alpha_t$ is a decade effect for either 1990-2000 or 2000-2014. Note that all weighted averages are adjusted both for time and population. All changes in the second period 2000-2014 are rescaled by a factor of 1.4 to account for the unequal period lengths (10 versus 14 years). The CZ-level change in import exposure is captured by $\Delta IP_{i\tau}^{cu}$, instrumented by the variable $\Delta IP_{i\tau}^{co}$. The vector of control variables $\mathbf{X}_{it}^{T}$ contains the start-of-period CZ-level predictors listed in Section 3.1.
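The logic of instrumenting can be sketched on simulated data. The example below is in Python for illustration only (the paper's estimates come from R), it omits the control variables, and all data in it are simulated assumptions. With one endogenous predictor, one instrument and an intercept, the two-stage least squares slope reduces to the ratio of two sample covariances, which is what the sketch computes.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


random.seed(0)
n = 20000
true_beta = -1.0  # assumed causal effect of x on y in the simulation

z = [random.gauss(0, 1) for _ in range(n)]  # instrument, independent of u
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_beta * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# OLS slope: biased because u drives both x and y
beta_ols = cov(x, y) / cov(x, x)

# IV slope: uses only the variation in x that comes from z
beta_iv = cov(z, y) / cov(z, x)

print(round(beta_ols, 2), round(beta_iv, 2))
```

In this simulation the OLS slope is pulled toward zero by the confounder, while the IV slope stays close to the assumed value of -1. The same reasoning motivates instrumenting the US import exposure with exposure measured in other developed countries.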

The gender-specific IV regression model structure is similar except for replacing the import shock by gender specific import shocks and instrumenting on $\Delta IP_{i\tau}^{m,co}$ and $\Delta IP_{i\tau}^{f,co}$ which are defined mutatis mutandis.

\[ \Delta Y_{sit} = c + \alpha_t + \beta_{2}\Delta IP_{i\tau}^{m,cu} + \beta_{3}\Delta IP_{i\tau}^{f,cu} + \mathbf{X}_{it}^{T}\mathbf{b_2}+e_{si\tau} \]

This paper’s Data Appendix reports the parameter estimates and standard errors for $\beta_1, \beta_2, \beta_3$. The summaries of each IV regression model are exported into a separate log file discussed later.

3.3 Software

This paper relies on several resources. First, Dorn (2019) provides both the Stata code and scrubbed data file to enable replication. Second, third party R packages were essential to transforming the data, calculating the IV model output and rendering the tables and information in this report. We focus our comments on those aspects likely to cause differences between our regression estimates and those published in Autor et al. (2019). The Stata code calculates the IV regression model using the Stata ivregress command. Since there is no base R equivalent to ivregress, we have to use third party resources.

We found the AER package essential. Hanck et al. (2019) and Sundstrom (2017) provide useful guidance on adapting the instrumental variables regression functionality by calling ivreg and running diagnostic tests for endogeneity and weak instruments. Robust standard errors are also reported using the sandwich covariance estimator. While our coefficient estimates were almost identical to the published ones, standard errors were close but not identical. This did not appear to materially alter the statistical significance levels of the parameters. Additional utility packages were used: stargazer for regression tables, bookdown for authoring, and rio for conversion of Stata dta files to csv.
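The difference between classical and sandwich standard errors can be illustrated with a small simulation. This is a hedged sketch in Python, not the R code used for the paper: it covers only the simplest case (one instrument, one endogenous predictor, an intercept, no controls), uses the basic HC0 form of the sandwich estimator, and all data are simulated assumptions. The finite-sample corrections applied by Stata or by R's sandwich package may differ from HC0.

```python
import math
import random

random.seed(1)
n = 5000
true_beta = -1.0  # assumed effect in the simulation

# Simulated data: z is the instrument, u an unobserved confounder.
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
# The noise spread grows with |z|, so the errors are heteroskedastic.
y = [
    true_beta * xi + 2.0 * ui + (0.5 + 2.0 * abs(zi)) * random.gauss(0, 1)
    for xi, ui, zi in zip(x, u, z)
]

z_bar, x_bar, y_bar = sum(z) / n, sum(x) / n, sum(y) / n
zt = [zi - z_bar for zi in z]                    # demeaned instrument
szx = sum(a * b for a, b in zip(zt, x))          # sum of demeaned z times x
beta = sum(a * b for a, b in zip(zt, y)) / szx   # IV slope
alpha = y_bar - beta * x_bar                     # IV intercept
resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

# Classical variance: one common error variance for all observations.
sigma2 = sum(e * e for e in resid) / (n - 2)
classical_se = math.sqrt(sigma2 * sum(a * a for a in zt)) / abs(szx)

# Sandwich (HC0) variance: each observation contributes its own squared residual.
robust_se = math.sqrt(sum((a * e) ** 2 for a, e in zip(zt, resid))) / abs(szx)

print(classical_se, robust_se)
```

Because the simulated error variance is largest where the instrument is far from its mean, the classical formula understates the uncertainty and the sandwich standard error comes out larger. The slope estimate itself is unaffected by the choice of variance estimator.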

4 Results

4.1 Data Exploration

The data is provided by Dorn (2019) in a Stata file workfile9014wwd.dta of 1444 observations of 228 variables. There are no missing or NA values in any of the columns most likely because the authors wrangled and cleansed the data. The authors also provided a data user guide and systematically named all columns. We limit the study to use approximately 95 columns and all observations.

We will explore the data in two ways: by comparing job losses with the import shock metric and by examining two Commuting Zones in depth. We follow that with a discussion of summary statistics. Figure 4.1 illustrates that CZ job losses are indeed correlated with import shocks. We grouped the dataset by CZone over both time periods and plotted cumulative job losses vs. cumulative import shock $\Delta IP_{i\tau}^{cu}$ below. Note that correlation does not prove causality. Thus, the instrumenting strategy is critical for that analysis. A further observation from the scatterplot is that no CZs were net exporters to China during that period, although a small number had manufacturing job gains.

Figure 4.1: Job Losses vs. Import Shock
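The aggregation behind such a scatterplot can be sketched as follows. The example is in Python for illustration (the paper's figure was produced in R), and the rows are invented toy values: three hypothetical commuting zones with two periods each, not observations from Dorn (2019).

```python
import math

# Hypothetical rows: (czone, period, change in mfg employment share, import shock)
rows = [
    (100, 1, -1.0, 0.5), (100, 2, -2.0, 1.5),
    (200, 1, -0.2, 0.1), (200, 2, -0.5, 0.4),
    (300, 1, 0.1, 0.0), (300, 2, -0.1, 0.2),
]

# Sum both periods within each commuting zone.
totals = {}
for czone, _period, d_mfg, shock in rows:
    job_change, exposure = totals.get(czone, (0.0, 0.0))
    totals[czone] = (job_change + d_mfg, exposure + shock)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


job_changes = [v[0] for v in totals.values()]
exposures = [v[1] for v in totals.values()]
corr = pearson(exposures, job_changes)
print(totals, corr)
```

In the toy data, zones with larger cumulative exposure show larger cumulative declines, so the correlation is strongly negative. As in the paper, such a correlation is descriptive only and does not by itself establish causality.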

We report the top 5 Commuting Zones by young adult manufacturing job loss over the 1990-2014 period in Table A.2 using d_sh_empl_mfg_age1839. A common thread of these top Commuting Zones is that they are exposed to the textile and furniture industries, whose products are easily replaced by competitive Chinese imports. The top CZ, 6501, is Blue Ridge, GA, which spans Fannin and Gilmer Counties. A local news article (Southerland, 2008) cites the companies that closed their plants in Fannin County. As a result of these structural changes, local business leaders and politicians decided to transition the economy from manufacturing to tourism.

That change was kick-started a few years ago by the loss of nearly 1,000 jobs when Levi-Strauss, American Uniform and Shaw Carpet all closed within 12 months of each other. The textile base that once sustained this area was dealt a death blow. The closings were part of a larger trend that has seen these industries flee to low wage foreign countries… Georgia Trend

The second Commuting Zone is CZone 402, centered at Martinsville, Virginia and surrounded by Henry and Patrick counties. The counties of this CZ routinely lead the state in unemployment rate. A 2009 New York Times article (Thomas, 2009) summarizes the challenges. Typical of other southern Virginia counties, Martinsville lost its textile and furniture manufacturing businesses. Over the decade 2000-2009, 10,000 manufacturing jobs disappeared from the area. Martinsville then turned its hopes toward motorsports racing to compete with Charlotte, NC. That effort failed despite statewide support and local economic development stimulus. An economics professor noted that competing in motorsports was a challenge because “There is a specialized labor pool here [in Charlotte] that you can’t replicate anyplace”.

We summarize the statistics of interest in Table A.3 [Mean Outcomes and Levels 1990, 2000]. Regarding Panel A, Manufacturing Employment in the young adult population ages 18-39: in 1990, 17.4 percent of young men, 8.7 percent of young women and 13.0 percent overall were employed in manufacturing. This declined over the next decade to 14.1 percent of men, 6.7 percent of women and 10.4 percent overall. Over the period 1990-2014, total manufacturing employment in young adults shrank at a rate of 2.6 percent per decade.

Regarding the employment gap in the young adult population ages 18-39 in Panel B, men had higher employment to population rates than women throughout the sample. In 1990, the men-women employment gap was 14.6 percent, but 1.2 percent more men were unemployed while 15.9 percent more women were not in the labor force. By 2000, some differences had narrowed: men-women employment gap was 12.1 percent, 0.5 percent more men were unemployed while 17.7 percent more women were not in the labor force. Over the period 1990-2014, the employment gap shrank by 2.7 percent per decade.

In Table A.3 Panel C, we observe that men outearned women at all percentiles of the earnings distribution in both 1990 and 2000 but that gap decreased significantly over 24 years. The average decline in earnings gap per decade was 1893 dollars in the 25th percentile, 2125 at the median, 2490 at the 75th percentile. The earnings gap increases with percentile indicating greater variation further up the economic ladder. The level of earnings gap in 2000 requires a comment. In 2000, the 25th percentile earnings gap widened to 7112 (contrary to our claim), but that earnings gap reversed sharply later (2000-2014) shrinking to 3807.

In Table A.3 Panel D, we observe that in 1990, 53 percent of women ages 18-39 were married. That declined to 49.7 percent in 2000. The rate of decline per decade was 6.9 percent in married status and 1.6 percent in widowed, separated or divorced status, while young single women increased by 8.5 percent per decade.

4.2 Interpretation

IV regression models in this paper are identified by a unique code contained in Table A.1. The 2SLS parameter estimates (called $\beta_1, \beta_2, \beta_3$ in Section 3.2) of the non-control variables are reported grouped by response variable in Tables ??, ??, ?? and ??.

The full model summaries using robust standard errors are exported to a text file called IVmodel_summary.txt as part of the diagnostic output of this analysis. The models are exported in the same order as their model codes (e.g. siv1, siv2, ..., siv24). Consequently, this report only displays the parameters of interest $\beta_1, \ldots , \beta_3$.

We note that the regression coefficients matched very closely between R and the authors’ reported findings from Stata. Generally, we found coefficient estimates to differ by less than 1 percent for all regressions. The non-robust standard errors were quite different (about 20-50% lower) from the authors’ robust standard errors. By using the sandwich robust standard error option we reduce the deviation to about 10-20% relative error. Thus, we do not report replication differences.

In Table ??, we estimate that a 1 unit trade shock reduces young adult employment in manufacturing by 1.06 percent ($\beta_1(iv1)=-1.06$). The realized trade shock has been about 1 unit per decade during the 1990-2014 period. The effects for young men ($\beta_1(iv3) = -0.99$) and for young women ($\beta_1(iv5)=-1.09$) are similar. These are economically significant declines. Based on the 1990 Census population ages 18-39 (89.86 million), a 1 unit trade shock is associated with 952,000 job losses. The realized trade shock over 24 years corresponds to a drop of 2.5 percent among men and 2.7 percent among women. The parameter estimates in models iv1, iv3, iv5 are statistically significant at the 1 percent level.
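The job-loss figure follows from applying the iv1 estimate to the 1990 population base, reading the 1.06 percent as a share of the 18-39 population:

\[ 0.0106 \times 89.86 \text{ million} \approx 0.9525 \text{ million}, \]

which corresponds to the roughly 952,000 job losses per unit of trade shock.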

Focusing on the gender-specific impact of trade shocks to male-dominated or female-dominated industries, we see effects of comparable magnitude. A unit shock to male-dominated industries reduces male employment by 2.6 percent, with a modest, statistically insignificant shock to female employment. A unit shock to female-dominated industries reduces female employment by 2.6 percent, with a moderate positive but statistically insignificant impact on male employment of 0.8 percent (iv4, iv6).

We also test for weak instruments in Table ??. The F-statistics of 97.7 for iv1 and of 42.0 and 86.0 for iv2 clearly reject a null hypothesis of weak instruments. We don’t report this test in subsequent tables because the results are the same (and independent of the response variable). The Wu-Hausman test clearly shows the presence of endogeneity in the variables $\Delta IP_{i\tau}^{cu}, \Delta IP_{i\tau}^{m,cu}, \Delta IP_{i\tau}^{f,cu}$ in all IV regressions and is reported in all subsequent regression tables. Likewise, we report adjusted $R^2$ values, which are in the range of 29% to 75% across all the regressions. These diagnostics support the reasonableness of the IV models and are not reported in Autor et al. (2019).
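The idea behind the weak instrument test can be sketched as a first-stage regression of the endogenous predictor on the instrument. The Python example below is illustrative only: it uses simulated data, a single instrument and no controls, so its F-statistics are not comparable to the values reported above. The function name `first_stage_f` is hypothetical. A commonly used rule of thumb treats a first-stage F-statistic below 10 as a sign of a weak instrument.

```python
import random


def first_stage_f(z, x):
    """F-statistic for the slope in a regression of x on z plus an intercept."""
    n = len(z)
    z_bar, x_bar = sum(z) / n, sum(x) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = x_bar - slope * z_bar
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    slope_var = (rss / (n - 2)) / szz
    return slope ** 2 / slope_var  # with one instrument, F equals t squared


random.seed(2)
n = 1444  # same row count as the dataset; the values themselves are simulated
z = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.8 * zi + random.gauss(0, 1) for zi in z]  # relevant instrument
x_weak = [0.02 * zi + random.gauss(0, 1) for zi in z]   # nearly irrelevant instrument

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f_strong, f_weak)
```

A strongly relevant instrument produces a first-stage F-statistic far above the rule-of-thumb threshold, while a nearly irrelevant one typically does not. This is why the statistic depends only on the first stage and not on the response variable.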

Measuring the employment gap is important for assessing shifts in the comparative economic advantage of males over females in the Becker and Wilson theories. In Table ??, we estimate that a 1 unit trade shock reduces the gap in employed males vs. females by 0.65 percent of the population ($\beta_1(iv7)$). This employment loss is driven not solely by job losses of men in manufacturing but also in other industries. This trade shock is also associated with a 0.46 percent increase in the men-vs.-women gap in those not in the labor force (iv11). A 1 unit male-dominated industry trade shock reduces male relative employment by 3.1 percent and increases male relative non-participation in the labor force (NILF) by 2.8 percent of the population ($\beta_2(iv8), \beta_2(iv12)$). These effects are statistically and economically significant. A 1 unit female-dominated industry trade shock increases male relative employment by 2.2 percent and reduces NILF by 2.1 percent ($\beta_3(iv8), \beta_3(iv12)$).

Looking at the male-female earnings gap in Table ??, a unit trade shock reduces male relative earnings by $672 at the 25th percentile, $445 at the median and $847 at the 75th percentile ($\beta_1(iv13), \beta_1(iv15), \beta_1(iv17)$). A male-dominated industry trade shock is even worse, reducing male relative earnings by $2216, $2945 and $3685 at the 25th, median and 75th percentiles respectively ($\beta_2(iv14), \beta_2(iv16), \beta_2(iv18)$). The impact is uneven across wage earners. Low wage earners are more affected by a unit trade shock since the male relative earnings loss is a larger share of total male earnings. According to the Becker theory, the marital market value of low wage earning males will decline disproportionately.

Finally, marriage rates are also impacted by trade import shocks. Table ?? shows that a unit trade shock reduces the percentage of currently married women in the 18-39 year range by 0.95 percent, reduces the share of divorced, widowed or separated women by 0.21 percent and increases the proportion of single women by 1.2 percent. These results are consistent with the predictions of Becker’s theory of household specialization in marriage.

While the study uses comprehensive data from the entire continental United States, it makes a simplifying assumption in the treatment of the data. It assigns uniform, employment-based weighting to the impact of industry imports on each CZ. However, if regions differ in their productivity or cost efficiency within a single industry, the regressions could produce biased parameter estimates. Likewise, if the sensitivity of industries to imports changes over time, these effects are hard to capture even with the time control variable in the regression.

5 Conclusions

This paper finds that Chinese import shocks from 1990-2014 reduced the employment prospects of young adult men, narrowed their relative earnings gap over women and reduced women’s marital rates while increasing their single status. Using the variation in labor market exposure to competition in manufacturing at the CZ level, we obtain parameter estimates associated with these trade shocks and their impact on these variables. Trade shocks then provide a natural experiment to test and support Becker’s theory of household specialization. This study is a replication of existing peer-reviewed research, so the credit for the data construction, parameter estimates and key ideas belongs to Autor et al. (2019). While R practitioners have replicated Autor et al. (2013) as an R package ADH, this paper contributes, to our knowledge, the first implementation of Autor et al. (2019) in R. We conclude with a big picture concern and some directions for future work.

Because both marriage rates and manufacturing employment have been on long term declines, a logical critique of this paper’s findings is that explaining both effects in terms of the China shock is gratuitous. Labor market shocks from net imports are only partial explanations contributing to the observed effects. The critique would argue that the IV regression model is measuring three coincidental effects not causal effects.

Here, both the economic theory and the identification strategy for the IV regressions are critical to proving the story. Finding other identification strategies to confirm causality may be a worthwhile future direction of research. Overall, I conclude that the economic and social theories remain relevant for today and the central thesis is plausible. My key remaining concern is on policymaking.

Do these results give policy guidance in today’s world? In the current Covid-19 pandemic environment, we are observing a large import shock in the negative direction. Should we expect to see a rise in marriage rates or manufacturing employment as a reverse import shock takes place? Even though imports are contributing factors, social distancing and economic lockdowns may have a larger opposite first-order effect. I conclude that policymakers would need to prioritize among the first- and second-order effects implied by this study. Presumably, policymakers would wish to address employment losses in manufacturing rather than promote marriage rates through governmental means.

This analysis has been extended in other directions. First, by studying the impact of trade shocks on young male mortality, health and populations, we can evaluate changes in the economic attractiveness of young adult males. This would more fully establish Becker’s theory of household specialization. This demographic work has been undertaken in Autor et al. (2019) and Autor et al. (2013). These extensions have not been included here due to space limitations. Second, microdata at the firm level can be used to measure employment and output effects of trade competition from China. This approach has been undertaken by Bloom et al. (2019).

6 References

Autor, D., Dorn, D., & Hanson, G. (2013). The China syndrome: Local labor market effects of import competition in the United States. American Economic Review, 103(6), 2121–2168. https://doi.org/10.1257/aer.103.6.2121

Autor, D., Dorn, D., & Hanson, G. (2016). The China shock: Learning from labor-market adjustment to large changes in trade. Annual Review of Economics, 8, 205–240. https://doi.org/10.1146/annurev-economics-080315-015041

Autor, D., Dorn, D., & Hanson, G. (2019). When work disappears: Manufacturing decline and the falling marriage market value of young men. American Economic Review: Insights, 1(2), 161–178. https://doi.org/10.1257/aeri.20180010

Bloom, N., Handley, K., Kurmann, A., & Luck, P. (2019). The impact of Chinese trade on U.S. employment: The good, the bad, and the debatable. Stanford University, NBER. Retrieved from https://nbloom.people.stanford.edu/sites/g/files/sbiybj4746/f/bhkl_posted_draft.pdf

Dorn, D. (2019). Replication data package: "When work disappears: Manufacturing decline and the falling marriage market value of young men". www.ddorn.net. Retrieved from https://www.ddorn.net/data/ADH-WWD-FileArchive.zip

Eriksson, K., Russ, K., Shambaugh, J. C., & Xu, M. (2019). Trade shocks and the shifting landscape of U.S. manufacturing (Working Paper No. 25646). National Bureau of Economic Research. https://doi.org/10.3386/w25646

Graf, N. (2019). Key findings on marriage and cohabitation in the U.S. Pew Research Center. Retrieved from https://pewrsr.ch/2NnfuOJ

Hanck, C., Arnold, M., Gerber, A., & Schmelzer, M. (2019). Introduction to econometrics with R. University of Duisburg-Essen. Retrieved from https://www.econometrics-with-r.org/

Parker, K., & Stepler, R. (2017). As U.S. marriage rate hovers at 50%, education gap in marital status widens. Pew Research Center. Retrieved from https://www.pewresearch.org/fact-tank/2017/09/14/as-u-s-marriage-rate-hovers-at-50-education-gap-in-marital-status-widens/

Sawhill, I. (2014). The economics of marriage, and family breakdown. Brookings Institution. Retrieved from https://www.brookings.edu/opinions/the-economics-of-marriage-and-family-breakdown/

Southerland, R. (2008). Fannin county: A brand new economy. GeorgiaTrend. Retrieved from https://www.georgiatrend.com/2008/05/01/fannin-county-a-brand-new-economy/

Stock, J., & Watson, M. (2011). Introduction to econometrics (3rd edition). Addison Wesley Longman.

Sundstrom, W. (2017). Instrument variables regression. Santa Clara University. Retrieved from https://rpubs.com/wsundstrom/t_ivreg

Thomas, K. (2009). A longtime racing town shifts its focus. New York Times. Retrieved from https://www.nytimes.com/2009/03/31/sports/othersports/31nascar.html

USDA. (2019). Commuting zones and labor market areas. United States Department of Agriculture, Economic Research Service. Retrieved from https://www.ers.usda.gov/data-products/commuting-zones-and-labor-market-areas/

Wikipedia. (2020). Family economics. Wikipedia.org. Retrieved from https://en.wikipedia.org/wiki/Family_economics

Appendix

A Tables & Charts

Figure A.1: Marriage Rates: 1867-2017, United States

Figure A.2: US Manufacturing Share of Employment 1939-2015

Figure A.3: Survey on Attitudes on Finance in Marriage

Figure A.1 shows that marriage rates declined from a peak of 16 marriages per 1,000 people after World War II to 7 marriages per 1,000 in 2017. Source: U.S. Census Bureau and Centers for Disease Control. Reproduced from “State of Our Unions 2019,” The National Marriage Project, University of Virginia. Figure A.2 shows the manufacturing share of US non-farm employment, which has been in decline since World War II. Figure A.3 shows graphical summaries of Pew Research Center surveys of unmarried, non-engaged cohabiting partners on their readiness for marriage and of views on the suitability of partners for marriage.

Table A.1: Inventory of IV Regression Models
IVModel Code	Response (Column Name in Dataset)	Main shock	Gender shock	Purpose (Description of Model)
iv1	d_sh_empl_mfg_age1839	Y	N	M+F Employment vs. Trade Shock
iv2	d_sh_empl_mfg_age1839	N	Y	M+F Employment vs. Gender Trade Shocks
iv3	d_sh_empl_mfg_age1839m	Y	N	Male Employment vs Trade Shock
iv4	d_sh_empl_mfg_age1839m	N	Y	Male Employment vs Gender Trade Shocks
iv5	d_sh_empl_mfg_age1839f	Y	N	Female Employment vs Trade Shock
iv6	d_sh_empl_mfg_age1839f	N	Y	Female Employment vs Gender Trade Shocks
iv7	d_gender_gap_emp_1839	Y	N	Gender Employment Gap vs Trade Shock
iv8	d_gender_gap_emp_1839	N	Y	Gender Employment Gap vs Gender Trade Shock
iv9	d_gender_gap_unemp_1839	Y	N	Gender Unemployed Gap vs Trade Shock
iv10	d_gender_gap_unemp_1839	N	Y	Gender Unemployed Gap vs Gender Trade Shock
iv11	d_gender_gap_nonemp_1839	Y	N	Gender NILF Gap vs Trade Shock
iv12	d_gender_gap_nonemp_1839	N	Y	Gender NILF Gap vs Gender Trade Shock
iv13	d_gender_gap_inc1839p25	Y	N	Percentile 25 Earnings Gap vs Trade Shock
iv14	d_gender_gap_inc1839p25	N	Y	Percentile 25 Earnings Gap vs Gender Trade Shock
iv15	d_gender_gap_inc1839p50	Y	N	Percentile 50 Earnings Gap vs Trade Shock
iv16	d_gender_gap_inc1839p50	N	Y	Percentile 50 Earnings Gap vs Gender Trade Shock
iv17	d_gender_gap_inc1839p75	Y	N	Percentile 75 Earnings Gap vs Trade Shock
iv18	d_gender_gap_inc1839p75	N	Y	Percentile 75 Earnings Gap vs Gender Trade Shock
iv19	d_sh_fem1839_marrexsep	Y	N	Married Status vs Trade Shock
iv20	d_sh_fem1839_marrexsep	N	Y	Married Status vs Gender Trade Shock
iv21	d_sh_fem1839_widdivsep	Y	N	Widowed, Div, Sep vs Trade Shock
iv22	d_sh_fem1839_widdivsep	N	Y	Widowed, Div, Sep vs Gender Trade Shock
iv23	d_sh_fem1839_single	Y	N	Single vs Trade Shock
iv24	d_sh_fem1839_single	N	Y	Single vs Gender Trade Shock
Note:
IV code (leftmost column) refers to regression outputs of the Data Appendix
¹ IVModel code also refers to the object name in the R code.
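
As a reading aid for Table A.1, the sketch below shows how one response column is paired with either the main shock or the gender-specific shocks in a two-part formula of the form `response ~ endogenous + controls | instruments + controls`. The shock and instrument column names are taken from the code in Appendix B; the helper `sketch_formula` and the shortened control list are illustrative assumptions, not the full specification.

```r
# Illustrative only: builds the two-part formula used for IV regression
# (regressors to the left of "|", instruments to the right).
# sketch_formula is a hypothetical helper written for this example.
sketch_formula <- function(response, endo, instr, controls) {
  rhs_regressors  <- paste(c(endo, controls), collapse = " + ")
  rhs_instruments <- paste(c(instr, controls), collapse = " + ")
  as.formula(paste0(response, " ~ ", rhs_regressors, " | ", rhs_instruments))
}

# Assumed subset of the control variables, shortened for readability.
controls_short <- c("l_shind_manuf_cbp", "t2")

# Main shock (as in iv19): one endogenous regressor, one instrument.
f_main <- sketch_formula("d_sh_fem1839_marrexsep",
                         "d_impusch_p9",
                         "d_impotch_p9_lag",
                         controls_short)

# Gender shock (as in iv20): two endogenous regressors, two instruments.
f_gender <- sketch_formula("d_sh_fem1839_marrexsep",
                           c("d_impuschm_p9cen", "d_impuschf_p9cen"),
                           c("d_impotchm_p9cen_lag", "d_impotchf_p9cen_lag"),
                           controls_short)

print(f_main)
print(f_gender)
```

Each odd-numbered model in the table corresponds to the first pattern and each even-numbered model to the second, with only the response column changing from one pair to the next.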

Table A.2: Top 5 Commuting Zones by Manufacturing Job Losses 1990-2014
CZone	Import Shock	Job Losses	Description
6501	2.11	-27.92	Blue Ridge, GA (Fannin, Gilmer Counties)
402	6.43	-23.68	Martinsville, VA (Henry, Patrick Counties)
1002	7.53	-19.38	Morganton, NC (Burke,McDowell,Mitchell,Yancey Counties)
800	3.58	-19.16	Gastonia, NC (Cleveland, Gaston, Lincoln, Rutherford Counties)
1100	8.31	-18.09	Hickory, NC (Alexander, Caldwell, Catawba, Iredell Counties)
Note:
Job Losses in %Population(18-39 years old)

Table A.3: Mean Outcomes and Levels 1990, 2000
Description	Mean outcome	Level in 1990	Level in 2000
A. Manufacturing Employment
Total (%) 18-39	-2.61	12.98	10.38
Male (%) 18-39	-3.19	17.37	14.14
Female(%) 18-39	-2.06	8.68	6.67
B. Male-Female Employment Gap
Employed(%) 18-39	-2.74	14.64	12.09
Unemploy(%) 18-39	0.03	1.22	0.54
NILF (%) 18-39	2.71	-15.87	-17.69
C. Male-Female Earnings Gap
Income P25 18-39	-1893.86	6925.72	7112.02
Income Med 18-39	-2125.92	13375.94	12679.38
Income P75 18-39	-2490.60	17489.16	16055.27
D. Women’s Marital Status
Married % 18-39	-6.92	53.05	49.73
Widow,Sep,Div18-39	-1.62	12.11	11.16
Single 18-39	8.55	34.84	39.11
Note:
Means are weighted by Commuting Zone/Total Continental US population and time
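
The weighting behind Table A.3 can be illustrated with a toy example. The sketch below uses made-up numbers for three hypothetical commuting zones; `wt` stands in for the population-and-time weight (`timepwt24` in Appendix B), and the mean is the weighted sum divided by the sum of the weights.

```r
# Toy data: three hypothetical commuting zones (all values are made up).
toy <- data.frame(
  czone   = c(1, 2, 3),
  wt      = c(0.5, 0.3, 0.2),    # stand-in for timepwt24
  outcome = c(-2.0, -4.0, -1.0)  # e.g. a change in an employment share
)

# Weighted mean: sum(w * x) / sum(w).
weighted_avg <- sum(toy$wt * toy$outcome) / sum(toy$wt)

# Base R's weighted.mean() computes the same quantity.
builtin_avg <- weighted.mean(toy$outcome, toy$wt)

print(c(manual = weighted_avg, builtin = builtin_avg))
```

Dividing by the sum of the weights matters whenever the weights do not sum to one, which is why the Appendix B code rescales the 2000-2014 levels by 1.4.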

B Code

In this section, we provide all the code used in the assignment.

library(tidyverse)
library(tidyselect)
library(ggplot2)
library(knitr)
library(kableExtra)
library(skimr)
library(AER)
library(stargazer)

knitr::opts_chunk$set(echo = FALSE, message=FALSE, warning=FALSE)
knitr::include_graphics('chart-job-losses.png')

# Code Block: transform-csv
# Sets the working folder

working_dir = "/Volumes/DATA/dat/ang/datascience/621_DATA_MINING/FINAL_PROJECT"

data_dir = paste0(working_dir, "/", "ADH-WWD-AERi-Data/dta")
other_dir = paste0(working_dir, "/", "ADH-WWD-AERi-Data/other")
dat2_dir = paste0(working_dir, "/", "dat2")

setwd(working_dir)
#  Convert all the raw data files from the Autor, Dorn, Hanson website
#  from Stata to CSV format using the rio conversion package.
#  This code chunk does not need to be evaluated more than once.
# ---------------------------------------------------------------------------
raw_files = c("czone9014_percentiles.dta", 
             "graph_bygender.dta" ,
             "graph_gap.dta",
             "workfile7090wwd.dta",
             "workfile9014wwd.dta" )

#
# Convert Stata data files from the researchers to R csv files.
# ---------------------------------------------------------------
library(rio)

# Some files are located in a data subfolder of the zip file.

setwd(data_dir)

for(f in raw_files ){
  
    print(f)
    newfile = str_replace(f, "\\.dta", "\\.csv")
    newfile = paste0(dat2_dir, "/", newfile)
    rio::convert(f, newfile)
}

# Some files are located in an other subfolder of the zip file
setwd(other_dir)

for(f in list.files() ){
    print(f)  
    newfile = str_replace(f, "\\.dta", "\\.csv")
    newfile = paste0(dat2_dir, "/", newfile)
    rio::convert(f, newfile)
}
# All files have been converted and placed into a /dat2 subfolder
# keeping the original name with csv suffix.

setwd(working_dir)
# Load the transformed data files

parse_file <- function(f){
    df = read.csv(paste0(dat2_dir , "/", f ))  
    return(df)
}

#  Load the needed file(s)
# This file has everything staged for all regressions.
#
workfile9014wwd = parse_file("workfile9014wwd.csv")  

# Variable lists are passed to a formula

control_vars = c("l_shind_manuf_cbp", "l_sh_popedu_c" , "l_sh_popfborn", 
    "l_sh_empl_f", "l_sh_routine33", "l_task_outsource", "t2" ,
    "reg_midatl", "reg_encen", "reg_wncen", "reg_satl", "reg_escen" , 
    "reg_wscen", "reg_mount" , "reg_pacif"  )

race_vars = c("l_sh_pop_black", "l_sh_pop_asian",  
              "l_sh_pop_oth", "l_sh_pop_hispanic")

endogenous_mainshock = c("d_impusch_p9")
instrumental_mainshock = c("d_impotch_p9_lag")
endogenous_gendershock = c("d_impuschm_p9cen", "d_impuschf_p9cen")
instrumental_gendershock = c("d_impotchm_p9cen_lag", "d_impotchf_p9cen_lag")


build_formula <- function( Y, ENDO, INSTR, vec_ctrl, vec_race )
{
    f = as.formula(  
      paste0(Y, " ~ ", 
             paste(c(ENDO, vec_ctrl, vec_race), collapse = " + " ),
             " | " ,
             paste(c(INSTR, vec_ctrl, vec_race), collapse = " + ") ) 
      )
    return(f)
}


build_model <- function(Y, ENDO, INSTR, vec_ctrl, vec_race){

    f = build_formula(Y, ENDO, INSTR, vec_ctrl , vec_race )  
    model = ivreg( formula = f, weights = timepwt24, data=workfile9014wwd)
    return(model)
}

build_model_summary <- function( ivmod ){
  return(summary(ivmod, vcov = sandwich, diagnostics = TRUE))  
}

build_results <- function(Yname){
     mod1 = build_model(Yname, endogenous_mainshock, 
                        instrumental_mainshock, control_vars, race_vars )  
  
     mod2 = build_model(Yname, endogenous_gendershock,
                        instrumental_gendershock, control_vars, race_vars)
     
     summary1 = build_model_summary(mod1)
     summary2 = build_model_summary(mod2)
     
     results = list(mainshock_model = mod1, 
                          gendershock_model = mod2,
                          mainshock_summary = summary1, 
                          gendershock_summary = summary2)
     return(results)
}


#
#  Main shock. Table 1A:  Mfg Employment by both Genders
#  Evaluate the both-gender employment share response to the combined
#  (both-gender) trade shock and to the gender-specific trade shocks.
# ------------------------------------------------------------------------

res1A = build_results("d_sh_empl_mfg_age1839")
iv1  = res1A$mainshock_model
iv2  = res1A$gendershock_model
siv1 = res1A$mainshock_summary
siv2 = res1A$gendershock_summary

res1Am = build_results("d_sh_empl_mfg_age1839m")
iv3    = res1Am$mainshock_model
iv4    = res1Am$gendershock_model
siv3   = res1Am$mainshock_summary
siv4   = res1Am$gendershock_summary

res1Af = build_results("d_sh_empl_mfg_age1839f")
iv5    = res1Af$mainshock_model
iv6    = res1Af$gendershock_model
siv5   = res1Af$mainshock_summary
siv6   = res1Af$gendershock_summary

#
#  Main shock. Table 1B:  Employment Gap Status
#
# Table 1B - Column 1

res1B  = build_results("d_gender_gap_emp_1839")
iv7    = res1B$mainshock_model
iv8    = res1B$gendershock_model
siv7   = res1B$mainshock_summary
siv8   = res1B$gendershock_summary

# Table 1B column 2  Gender Gap unemployed

res1B2  = build_results("d_gender_gap_unemp_1839")

iv9     = res1B2$mainshock_model
iv10    = res1B2$gendershock_model
siv9    = res1B2$mainshock_summary
siv10   = res1B2$gendershock_summary

# Table 1B  column 3

res1B3  = build_results("d_gender_gap_nonemp_1839")

iv11     = res1B3$mainshock_model
iv12     = res1B3$gendershock_model
siv11    = res1B3$mainshock_summary
siv12    = res1B3$gendershock_summary

#
#  Main shock. Table 1C:  Earnings Gap in Dollar Terms
#
#  Percentile 25
#
res1Cp25  = build_results("d_gender_gap_inc1839p25")
iv13      = res1Cp25$mainshock_model
iv14      = res1Cp25$gendershock_model
siv13     = res1Cp25$mainshock_summary
siv14     = res1Cp25$gendershock_summary

#
#   Percentile 50
#
res1Cp50  = build_results("d_gender_gap_inc1839p50")
iv15      = res1Cp50$mainshock_model
iv16      = res1Cp50$gendershock_model
siv15     = res1Cp50$mainshock_summary
siv16     = res1Cp50$gendershock_summary

#
#   Percentile 75
#
res1Cp75  = build_results("d_gender_gap_inc1839p75")
iv17      = res1Cp75$mainshock_model
iv18      = res1Cp75$gendershock_model
siv17     = res1Cp75$mainshock_summary
siv18     = res1Cp75$gendershock_summary


#
#   TABLE 3A Women's Marital Status
#
#   for Married ex separated
res3Amar  = build_results("d_sh_fem1839_marrexsep")
iv19      = res3Amar$mainshock_model
iv20      = res3Amar$gendershock_model
siv19     = res3Amar$mainshock_summary
siv20     = res3Amar$gendershock_summary

res3Adiv  = build_results("d_sh_fem1839_widdivsep")
iv21      = res3Adiv$mainshock_model
iv22      = res3Adiv$gendershock_model
siv21     = res3Adiv$mainshock_summary
siv22     = res3Adiv$gendershock_summary


res3Asing = build_results("d_sh_fem1839_single")
iv23      = res3Asing$mainshock_model
iv24      = res3Asing$gendershock_model
siv23     = res3Asing$mainshock_summary
siv24     = res3Asing$gendershock_summary


workfile9014wwd %>% filter(yr == 1990 ) %>% 
      mutate(wt = timepwt24) %>%
      summarize( 
               # CALC-1
               m_l_sh_empl_mfg_age1839  = sum(wt * l_sh_empl_mfg_age1839), 
               m_l_sh_empl_mfg_age1839m = sum(wt * l_sh_empl_mfg_age1839m), 
               m_l_sh_empl_mfg_age1839f = sum(wt * l_sh_empl_mfg_age1839f),
               # CALC-2
               m_l_gender_gap_emp_1839  = sum(wt * l_gender_gap_emp_1839), 
               m_l_gender_gap_unemp_1839  = sum(wt * l_gender_gap_unemp_1839), 
               m_l_gender_gap_nonemp_1839  = sum(wt * l_gender_gap_nonemp_1839),
               # CALC-5 
               m_l_gender_gap_inc1839p25= sum(wt * l_gender_gap_inc1839p25),
               m_l_gender_gap_inc1839p50= sum(wt * l_gender_gap_inc1839p50),
               m_l_gender_gap_inc1839p75= sum(wt * l_gender_gap_inc1839p75),
               # CALC-7
               m_l_sh_fem1839_marrexsep = sum(wt * l_sh_fem1839_marrexsep),
               m_l_sh_fem1839_widdivsep = sum(wt * l_sh_fem1839_widdivsep),
               m_l_sh_fem1839_single    = sum(wt * l_sh_fem1839_single)
               ) -> levels1990


sum_wt2 = 1.4  # time weighting for the 2000-2014 period

workfile9014wwd %>% filter(yr == 2000 ) %>% 
      mutate(wt = timepwt24) %>%
      summarize( 
      # CALC-1
      m_l_sh_empl_mfg_age1839  = sum(wt * l_sh_empl_mfg_age1839)/sum_wt2, 
      m_l_sh_empl_mfg_age1839m = sum(wt * l_sh_empl_mfg_age1839m)/sum_wt2, 
      m_l_sh_empl_mfg_age1839f = sum(wt * l_sh_empl_mfg_age1839f)/sum_wt2,
      # CALC-2
      m_l_gender_gap_emp_1839  = sum(wt * l_gender_gap_emp_1839)/sum_wt2, 
      m_l_gender_gap_unemp_1839  = sum(wt * l_gender_gap_unemp_1839)/sum_wt2, 
      m_l_gender_gap_nonemp_1839  = sum(wt * l_gender_gap_nonemp_1839)/sum_wt2,
      # CALC-5 
      m_l_gender_gap_inc1839p25= sum(wt * l_gender_gap_inc1839p25)/sum_wt2,
      m_l_gender_gap_inc1839p50= sum(wt * l_gender_gap_inc1839p50)/sum_wt2,
      m_l_gender_gap_inc1839p75= sum(wt * l_gender_gap_inc1839p75)/sum_wt2,
      # CALC-7
      m_l_sh_fem1839_marrexsep = sum(wt * l_sh_fem1839_marrexsep)/sum_wt2,
      m_l_sh_fem1839_widdivsep = sum(wt * l_sh_fem1839_widdivsep)/sum_wt2,
      m_l_sh_fem1839_single    = sum(wt * l_sh_fem1839_single)/sum_wt2
               ) -> levels2000


sum_wt = sum(workfile9014wwd$timepwt24)

workfile9014wwd %>% 
      mutate(wt = timepwt24) %>%
      summarize( 
         # CALC-3
         mo_d_sh_empl_mfg_age1839    = sum( wt * d_sh_empl_mfg_age1839) / sum_wt,
         mo_d_sh_empl_mfg_age1839m   = sum( wt * d_sh_empl_mfg_age1839m) / sum_wt,
         mo_d_sh_empl_mfg_age1839f   = sum( wt * d_sh_empl_mfg_age1839f) / sum_wt ,
         
         mo_d_gender_gap_emp_1839    = sum( wt * d_gender_gap_emp_1839) / sum_wt,
         mo_d_gender_gap_unemp_1839  = sum( wt * d_gender_gap_unemp_1839) / sum_wt,
         mo_d_gender_gap_nonemp_1839 = sum( wt * d_gender_gap_nonemp_1839) / sum_wt,
         # CALC-4
         mo_d_gender_gap_inc1839p25  = sum( wt * d_gender_gap_inc1839p25) / sum_wt, 
         mo_d_gender_gap_inc1839p50  = sum( wt * d_gender_gap_inc1839p50) / sum_wt, 
         mo_d_gender_gap_inc1839p75  = sum( wt * d_gender_gap_inc1839p75) / sum_wt, 
         # CALC-6
         mo_d_sh_fem1839_marrexsep   = sum( wt * d_sh_fem1839_marrexsep) / sum_wt, 
         mo_d_sh_fem1839_widdivsep   = sum( wt * d_sh_fem1839_widdivsep) / sum_wt, 
         mo_d_sh_fem1839_single      = sum( wt * d_sh_fem1839_single) / sum_wt
         )  -> mean_outcomes
knitr::include_graphics('img/marriage_rates.png')
imglist = c( 'img/Autor_Manufacturing_Share.png')
knitr::include_graphics(imglist)
imglist = c( 'img/Pew_Marriage_Combined.png')
knitr::include_graphics(imglist)

workfile9014wwd %>% select(czone, yr, timepwt24, d_impusch_p9, d_sh_empl_mfg_age1839) %>%
  group_by(czone) %>% summarise(import_shock = sum(d_impusch_p9), 
                                job_losses = sum(d_sh_empl_mfg_age1839) ) %>% 
  arrange(job_losses) -> gg

gg %>% ggplot(aes(x=import_shock, y=job_losses)) + 
  geom_point(size=0.5, color ="blue", alpha = .5) + 
  ggtitle(label="Import Shocks vs. Job Losses (1990-2014)", 
          subtitle="All Commuting Zones in lower 48 States. Data from ADH 2019") +
  xlab("Change in Chinese Imports d_impusch_p9") + 
  ylab("Job Losses (%/1990 Population)") -> ggchart

ggsave(filename="chart-job-losses.png", plot=ggchart)
   

ggtop5 = head(gg, n=5)

ggtop5$description = c("Blue Ridge, GA (Fannin, Gilmer Counties)", 
                       "Martinsville, VA (Henry, Patrick Counties)", 
                       "Morganton, NC (Burke,McDowell,Mitchell,Yancey Counties)", 
                       "Gastonia, NC (Cleveland, Gaston, Lincoln, Rutherford Counties)", 
                       "Hickory, NC (Alexander, Caldwell, Catawba, Iredell Counties)")


df_regression = data.frame( IVModel   = c("Code", paste("iv", as.character(1:24), sep="" ) ),
                            response = c("Column Name in Dataset", 
                                          "d_sh_empl_mfg_age1839", # 1
                                          "d_sh_empl_mfg_age1839", # 2
                                          "d_sh_empl_mfg_age1839m", # 3
                                          "d_sh_empl_mfg_age1839m", # 4
                                          "d_sh_empl_mfg_age1839f", # 5
                                          "d_sh_empl_mfg_age1839f", # 6
                                          "d_gender_gap_emp_1839",  # 7
                                          "d_gender_gap_emp_1839",  # 8
                                          "d_gender_gap_unemp_1839",  # 9
                                          "d_gender_gap_unemp_1839",  # 10
                                          "d_gender_gap_nonemp_1839",  # 11
                                          "d_gender_gap_nonemp_1839",  # 12
                                          "d_gender_gap_inc1839p25",  # 13
                                          "d_gender_gap_inc1839p25",  # 14
                                          "d_gender_gap_inc1839p50",  # 15
                                          "d_gender_gap_inc1839p50",  # 16
                                          "d_gender_gap_inc1839p75",  # 17
                                          "d_gender_gap_inc1839p75",  # 18
                                          "d_sh_fem1839_marrexsep",  # 19      
                                          "d_sh_fem1839_marrexsep",  # 20      
                                          "d_sh_fem1839_widdivsep",  # 21      
                                          "d_sh_fem1839_widdivsep",  # 22      
                                          "d_sh_fem1839_single",  # 23      
                                          "d_sh_fem1839_single"  # 24      
                                          ) ,
                            Main =       c("shock", 
                                          "Y", "N", "Y","N",
                                          "Y", "N", "Y","N",
                                          "Y", "N", "Y","N",
                                          "Y", "N", "Y","N",
                                          "Y", "N", "Y","N",
                                          "Y", "N", "Y","N"
                                           ) ,
                            Gender =    c("shock", 
                                          "N", "Y", "N","Y",
                                          "N", "Y", "N","Y",                                          
                                          "N", "Y", "N","Y",                                          
                                          "N", "Y", "N","Y",  
                                          "N", "Y", "N","Y",                                          
                                          "N", "Y", "N","Y" ) ,
                    Purpose =  c( "Description of Model",
                                  "M+F Employment vs. Trade Shock",
                                  "M+F Employment vs. Gender Trade Shocks",
                                  "Male Employment vs Trade Shock",
                                  "Male Employment vs Gender Trade Shocks",
                                  "Female Employment vs Trade Shock",                                          
                                  "Female Employment vs Gender Trade Shocks",                                          
                                  "Gender Employment Gap vs Trade Shock",
                                  "Gender Employment Gap vs Gender Trade Shock",
                                  "Gender Unemployed Gap vs Trade Shock",
                                  "Gender Unemployed Gap vs Gender Trade Shock",
                                  "Gender NILF Gap vs Trade Shock",
                                  "Gender NILF Gap vs Gender Trade Shock",
                                  "Percentile 25 Earnings Gap vs Trade Shock",
                                  "Percentile 25 Earnings Gap vs Gender Trade Shock",
                                  "Percentile 50 Earnings Gap vs Trade Shock",
                                  "Percentile 50 Earnings Gap vs Gender Trade Shock",
                                  "Percentile 75 Earnings Gap vs Trade Shock",
                                  "Percentile 75 Earnings Gap vs Gender Trade Shock",
                                  "Married Status vs Trade Shock",
                                  "Married Status vs Gender Trade Shock",
                                  "Widowed, Div, Sep vs Trade Shock",
                                  "Widowed, Div, Sep vs Gender Trade Shock",
                                  "Single vs Trade Shock",
                                  "Single vs Gender Trade Shock"  )
                            ) 
                           
df_regression %>% kable(
            caption="Inventory of IV Regression Models", 
            booktabs=T) %>% 
            kable_styling( full_width = F, 
                           latex_options = c("HOLD_position") ,
                 bootstrap_options = c("bordered", "striped") ) %>%
             footnote(general=
            "IV code (leftmost column) refers to regression outputs of the Data Appendix",
            number = c("IVModel code also refers to the object name in the R code.")
            ) %>%
  landscape()

ggtop5 %>%  kable(caption = 
                    "Top 5 Commuting Zones by Manufacturing Job Losses 1990-2014", 
                  digits=2 ,
                col.names = c("CZone", "Import Shock", "Job Losses", "Description")
        ) %>%
  kable_styling(latex_options = c("HOLD_position"), 
                bootstrap_options = c("bordered", "striped")) %>%
       column_spec(2, width="8em") %>%
       column_spec(3, width="8em") %>%
       column_spec(4, width="20em") %>%
  footnote(general = "Job Losses in %Population(18-39 years old)")


mean_var_labels = c("sh_empl_mfg_age1839", 
                "sh_empl_mfg_age1839m",
                "sh_empl_mfg_age1839f",
              
                "gender_gap_emp_1839" ,
                "gender_gap_unemp_1839" ,
                "gender_gap_nonemp_1839" ,
                
                "gender_gap_inc1839p25" ,
                "gender_gap_inc1839p50" ,
                "gender_gap_inc1839p75" ,
                
                "sh_fem1839_marrexsep" ,
                "sh_fem1839_widdivsep" ,
                "sh_fem1839_single" 
              )

mean_var_english = c( "Total (%) 18-39",
                      "Male  (%) 18-39",
                      "Female(%) 18-39",
                      
                      "Employed(%) 18-39",
                      "Unemploy(%) 18-39",
                      "NILF    (%) 18-39",

                      "Income P25  18-39",
                      "Income Med  18-39",                      
                      "Income P75  18-39",                      
                      
                      "Married %    18-39",
                      "Widow,Sep,Div18-39",                      
                      "Single       18-39" )

sections = c("A. Manufacturing Employment",
             "B. Male-Female Employment Gap",
             "C. Male-Female Earnings Gap",
             "D. Women's Marital Status"
             )

df = data.frame(description = mean_var_english, 
                mean_outcomes = as.vector(t(mean_outcomes)),
                levels_1990 = as.vector(t(levels1990)) ,
                levels_2000 = as.vector(t(levels2000))
                )


df %>% 
  kable(caption = "Mean Outcomes and Levels 1990, 2000", digits=2 ,
        col.names = c("Description", "Mean outcome", "Level in 1990", "Level in 2000")
        ) %>%
  kable_styling(latex_options = c("HOLD_position"), full_width = T, 
                bootstrap_options = c("bordered", "striped")) %>%
  pack_rows(sections[1], 1,3 ) %>%
  pack_rows(sections[2], 4,6 ) %>%
  pack_rows(sections[3], 7,9 ) %>%
  pack_rows(sections[4], 10,12) %>%
  footnote( general = 
  "Means are weighted by Commuting Zone/Total Continental US population and time" )

  

weak_instrument1_stat = c(siv1$diagnostics[1,3],
                          siv3$diagnostics[1,3],
                          siv5$diagnostics[1,3],  
                          siv2$diagnostics[1,3],
                          siv4$diagnostics[1,3],
                          siv6$diagnostics[1,3])

weak_instrument2_stat = c(NA,NA,NA,  
                          siv2$diagnostics[2,3],
                          siv4$diagnostics[2,3],
                          siv6$diagnostics[2,3])

wu_hausman_stat = c( siv1$diagnostics[2,3],
                     siv3$diagnostics[2,3],
                     siv5$diagnostics[2,3],
                     siv2$diagnostics[3,3],
                     siv4$diagnostics[3,3],
                     siv6$diagnostics[3,3]
                     )

weak_instrument1_row = c("Weak Instr 1 (F)", format(round(weak_instrument1_stat,1)) )
weak_instrument2_row = c("Weak Instr 2 (F)", format(round(weak_instrument2_stat,1)) )
wu_hausman_row = c("Wu-Hausman stat:", format(round(wu_hausman_stat,1)))


stargazer(iv1, iv3, iv5, iv2, iv4, iv6, header=FALSE, 
          label="tab:mfg1", 
          title="Manufacturing employment as a share of population, age 18-39" ,
          dep.var.labels=c("M+F", "Males", "Females", "M+F", "Males", "Females"),
         
          se = list( siv1$coefficients[,2], 
                     siv3$coefficients[,2], 
                     siv5$coefficients[,2], 
                     siv2$coefficients[,2], 
                     siv4$coefficients[,2], 
                     siv6$coefficients[,2]), # Robust standard error
          keep = c( endogenous_mainshock, endogenous_gendershock ) ,
          covariate.labels = c("$\\beta_1 = \\Delta Import$", 
                               "$\\beta_2 = \\Delta Import \\times male$", 
                               "$\\beta_3 = \\Delta Import \\times female$"),
          omit.stat = c("ser", "rsq"),
          add.lines = list(weak_instrument1_row, weak_instrument2_row, wu_hausman_row),
          notes = "IV regression used robust SE from sandwich option.",
          digits = 2 ,
          table.placement = "H" ,
          object.names = TRUE ,
          model.numbers = FALSE
          )


wu_hausman_stat2 = c( siv7$diagnostics[2,3],
                     siv9$diagnostics[2,3],
                     siv11$diagnostics[2,3],
                     siv8$diagnostics[3,3],
                     siv10$diagnostics[3,3],
                     siv12$diagnostics[3,3]
                     )
wu_hausman_row2      = c("Wu-Hausman stat:", format(round(wu_hausman_stat2 ,1)))


stargazer(iv7, iv9, iv11, iv8, iv10, iv12, header=FALSE, 
          label="tab:employment", 
          title="Male-female differential by employment status, age 18-39" ,
          dep.var.labels=c("Emp", "Unemp", "NILF", "Emp", "Unemp", "NILF"),
          se = list( siv7$coefficients[,2], siv9$coefficients[,2], siv11$coefficients[,2],
                     siv8$coefficients[,2], siv10$coefficients[,2], siv12$coefficients[,2]
                     ), # Robust standard error
          keep = c( endogenous_mainshock, endogenous_gendershock ) ,
          covariate.labels = c("$\\beta_1 = \\Delta Import$", 
                               "$\\beta_2 = \\Delta Import \\times male$", 
                               "$\\beta_3 = \\Delta Import \\times female$"),
          omit.stat = c("ser", "rsq"),
          add.lines = list(wu_hausman_row2),
          notes = "IV regression used robust SE from sandwich option.",
          digits = 2 ,
          table.placement = "H" ,
          object.names = TRUE ,
          model.numbers = FALSE
          )


wu_hausman_stat3 = c(siv13$diagnostics[2,3],
                     siv15$diagnostics[2,3],
                     siv17$diagnostics[2,3],
                     siv14$diagnostics[3,3],
                     siv16$diagnostics[3,3],
                     siv18$diagnostics[3,3]
                     )

adjr2_stat = c(siv13$adj.r.squared, siv15$adj.r.squared, siv17$adj.r.squared,
               siv14$adj.r.squared, siv16$adj.r.squared, siv18$adj.r.squared)

wu_hausman_row3      = c("Wu-Hausman stat:", format(round(wu_hausman_stat3 ,1)))
adjr2_row = c("Adj $R^2$:", format(round(adjr2_stat,2)))

stargazer(iv13, iv15, iv17, iv14, iv16, iv18, header=FALSE, 
          label="tab:earningsgap", 
          title="Male-female differential in annual earnings, age 18-39" ,
          dep.var.labels=c("P25", "Median", "P75", "P25", "Median", "P75"),
          se = list( siv13$coefficients[,2], 
                     siv15$coefficients[,2], 
                     siv17$coefficients[,2],
                     siv14$coefficients[,2], 
                     siv16$coefficients[,2], 
                     siv18$coefficients[,2]
                     ), # Robust standard error
          keep = c( endogenous_mainshock, endogenous_gendershock ) ,
          covariate.labels = c("$\\beta_1 = \\Delta Import$", 
                               "$\\beta_2 = \\Delta Import \\times male$", 
                               "$\\beta_3 = \\Delta Import \\times female$"),
          omit.stat = c("ser", "rsq", "n", "adj.rsq"),
          add.lines = list( wu_hausman_row3, adjr2_row),
          notes = "IV regression used robust SE from sandwich option.",
          digits = 0,
          table.placement = "H" ,
          object.names = TRUE ,
          model.numbers = FALSE
          )


wu_hausman_stat4 = c(siv19$diagnostics[2,3],
                     siv21$diagnostics[2,3],
                     siv23$diagnostics[2,3],
                     siv20$diagnostics[3,3],
                     siv22$diagnostics[3,3],
                     siv24$diagnostics[3,3]
                     )

adjr2_stat = c(siv19$adj.r.squared, siv21$adj.r.squared, siv23$adj.r.squared,
               siv20$adj.r.squared, siv22$adj.r.squared, siv24$adj.r.squared)

wu_hausman_row4      = c("Wu-Hausman stat:", format(round(wu_hausman_stat4 ,1)))

adjr2_row = c("Adj $R^2$:", format(round(adjr2_stat,2)))

stargazer(iv19, iv21, iv23, iv20, iv22, iv24, header=FALSE, 
          label="tab:maritalstatus", 
          title="Women's marital status 1990-2014, age 18-39" ,
          dep.var.labels=c("Married", "WDS", "Single", "Married", "WDS", "Single"),
          se = list( siv19$coefficients[,2], 
                     siv21$coefficients[,2], 
                     siv23$coefficients[,2],
                     siv20$coefficients[,2], 
                     siv22$coefficients[,2], 
                     siv24$coefficients[,2]
                     ), # Robust standard error
          keep = c( endogenous_mainshock, endogenous_gendershock ) ,
          covariate.labels = c("$\\beta_1 = \\Delta Import$", 
                               "$\\beta_2 = \\Delta Import \\times male$", 
                               "$\\beta_3 = \\Delta Import \\times female$"),
          omit.stat = c("ser", "rsq", "n", "adj.rsq"),
          add.lines = list( wu_hausman_row4, adjr2_row),
          notes = c("IV regression used robust SE from sandwich option.", 
                    "WDS: Widowed, Divorced or Separated"),
          digits = 2,
          table.placement = "H" ,
          object.names = TRUE ,
          model.numbers = FALSE
          )
#
# Build a list of all 24 model summaries by looking up each object by name
# (siv1, siv2, ..., siv24), then dump the printed summaries to a file.
model_summary_list = lapply( paste("siv", 1:24, sep=""), get )

capture.output(model_summary_list, file="IVmodel_summary.txt")
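
#
# Sketch, not part of the original analysis: the positional lookups used above
# (diagnostics[2,3] with one endogenous regressor, diagnostics[3,3] with two)
# could instead be written as a lookup by row and column name. This assumes
# the diagnostics matrix carries a "Wu-Hausman" row and a "statistic" column,
# as printed by summary() of an ivreg fit with diagnostics = TRUE.
# get_wu_hausman() and mock_summary are hypothetical names for illustration,
# and the numbers in mock_summary are made up.
get_wu_hausman = function(model_summary) {
  unname(model_summary$diagnostics["Wu-Hausman", "statistic"])
}

# Stand-in for a summary with two endogenous regressors (filled column by column).
mock_summary = list(diagnostics = matrix(
  c(1, 1, 1, 1,             # df1
    700, 700, 699, NA,      # df2
    50.0, 40.0, 3.2, NA,    # statistic
    0.00, 0.00, 0.07, NA),  # p-value
  nrow = 4,
  dimnames = list(c("Weak instruments (x1)", "Weak instruments (x2)",
                    "Wu-Hausman", "Sargan"),
                  c("df1", "df2", "statistic", "p-value"))
))

# Picks the same cell as mock_summary$diagnostics[3,3], without hard-coding the row.
get_wu_hausman(mock_summary)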

CUNY School of Professional Studies: ngalex8@gmail.com. Submitted for Data 621. Instructor: Nasrin Khan.