Resilience and Recovery

Author

Kristina Bishop

Abstract
The ability to recover from income shocks and return to the pre-shock income level reflects an individual’s resilience and external support. However, we know little about the speed of recovery from income shocks. This paper quantifies how fast households recover from income shocks and examines the characteristics of households that recover more quickly. This is accomplished using an empirical model of income dynamics that allows for measurement error in self-reported income. The model estimates inform the construction of simulated income paths free of measurement error. These measurement-error-free income paths are then perturbed by introducing a 10% or 25% income shock to map out differential recovery paths across households and periods. Accounting for measurement error decreases the percentage of households who recover within the sample period by ten percent for a 25% shock. The results show differential recovery speeds by household head demographic category: White recover faster than Black, high-school educated recover faster than college-educated, and nonmetro recover faster than metro across the entire distribution of recovery times. Half-life recovery times are shorter than full recovery times, and about 80% of households recover the income lost from the shock by the end of the sample period. I decompose the demographic analysis to test whether parameters, random errors, or observables drive group differences. Finally, I calculate the aggregate income loss and the change in the Gini coefficient due to an income shock. This helps to explain how slow recovery from an income shock widens the inequality gap.

JEL: D31, D15, J60, C23
Keywords: Income recovery, Measurement Error

1 Introduction

The COVID-19 pandemic was a huge income shock that affected households and presented problems in income recovery. Of the 64.9% of Americans who experienced income loss, 21.6% think they’ll return to their pre-pandemic financial state within three months, 28.2% by six months, 46.9% within a year, and 3.3% expect that they will never recover (Backman, 2021). Catastrophic as the pandemic has proven, households also experience more mundane types of income loss.

Today’s economy presents inevitable income shocks for households. Income shocks are any loss of income and may be difficult to offset. These income shocks may come from a variety of sources: layoffs, changes in wage or salary, loss of hours worked, inflation, recessions, or the global pandemic. While the sources of an income shock differ, each household shows their resiliency in how well they overcome the income shock.

Historically, Americans have valued the ethic of resiliency, even during the time of the Great Depression. In the movie musical Swing Time (Stevens, 1936), Ginger Rogers and Fred Astaire sang these encouraging words to lift audiences during the Depression period, “Pick yourself up, dust yourself off, and start all over again.” This song, “Pick Yourself Up,” emphasized resiliency–the ability to overcome adversity. While most try to get back up, it takes some longer than others. Some may never be able to stand again.

The United States Agency for International Development (USAID) defines resilience as “the ability of people, households, communities, countries, and systems to mitigate, adapt to, and recover from shocks and stresses in a manner that reduces chronic vulnerability and facilitates inclusive growth” (Dunford & Gottlieb, 2016). I study resilience at the household level to understand the extent to which households can recover on their own from an income shock. The ability to recover from an income shock is the quantitative measure of a household’s resilience: how well does the household “bounce back” or recover its lost income?

It is natural to wonder who recovers from income shocks and how long it takes a household to recover from them. Yet, very little work has been done to predict recovery times after a shock. Another uncertainty is how recovery times differ across household types. By knowing which types of households are at risk for longer recovery times, policymakers can apply this model to other settings to better design policy to aid income shock recovery. Understanding the speed and length of income shock recovery also has important implications for the income inequality gap and intergenerational income inequality.

The specific research questions of this study are twofold. First, I seek to quantify how fast households recover from income shocks, if at all. Second, I show the household characteristics associated with quicker recovery.

To explore the research questions, I estimate a model of income dynamics, allowing for measurement error and demographic heterogeneity. Accounting for measurement error is critical when modeling income dynamics. I follow Lee, Ridder, & Strauss (2017), who developed an estimation procedure to simulate error-free consumption paths. I apply their model of consumption to income data from the United States. Similar to consumption data, income data commonly contain non-classical measurement errors (Bound, Brown, & Mathiowetz, 2001; Glewwe, 2007). Even administrative data are not free of measurement error (Li, Millimet, & Roychowdhury, 2019). The measurement error is non-classical because it is serially correlated and mean-reverting (Bound & Krueger, 2015; Cristia & Schwabish, 2007). These features are explicitly incorporated in the model. In addition to measurement error, I allow for demographic heterogeneity: I repeat the estimation on various sub-samples of the population, so the parameters and error distributions are allowed to vary across demographic groups.

After the model estimation, I identify the determinants of true income, the variances of the measurement errors, and the variances of the unobserved determinants of actual income. Then, I use the estimated determinants of actual income to simulate income paths free of measurement error. I introduce a shock into the initial error term and trace the recovery path. I measure how long it will take for various household types to return to their previous income level before the shock. Finally, I split the sample and re-estimate the empirical model on different sub-samples. These sub-samples are based on observed characteristics of households to map how different demographic groups vary in their recovery paths. I then decompose the differences in recovery paths across demographic groups to isolate the causes of differential recovery times.

The main contribution of this paper is to provide a new method to estimate income shock recovery speeds. Prior literature has examined recovery after specific, realized shocks such as those due to natural disasters, job displacement, health shocks, or the economic environment. The examination of these types of actual shocks has the advantage that they are plausibly exogenous. A disadvantage is that such shocks only occur in unique circumstances. The external validity of these results is unknown. I provide a complementary approach. My approach has the advantage of generalizability. The analysis is performed using nationally representative samples. In addition, I estimate the recovery speed, rather than only whether or not recovery occurs. The approach to learning about recovery speeds may help policymakers in designing aid programs. While this method has definite advantages relative to the existing literature, it comes at the cost of assuming that the shock itself does not change the income dynamics. I assume that the model of income dynamics before the (simulated) shock still holds afterwards.

This study uses data from the Survey of Income and Program Participation (SIPP). The SIPP records monthly income for about four years. I compare the results of the model and simulation across data releases of the SIPP to account for how the economic environment may impact an individual’s ability to recover from shocks.

The analysis yields several important findings. First, I find that recovery rates differ when accounting for measurement error: the distribution of recovery times is sensitive to the modeling of the measurement error. Second, I find that White household heads recover faster than Black, married recover faster than single, high school educated recover faster than college educated, and those who live in metro areas recover more slowly than those in non-metro areas. Next, results show differential effects on the proportion of households who recover by the end of the sample period, that is, within 14 four-month periods (56 months). When I decompose the heterogeneous results, I find that group differences are likely due to differences in the distribution of the random errors rather than parameters, the persistence of income, observed characteristics, or initial income values. Finally, using the economic environment from the 2008 data compared to the 2004 data does not seem to have a large effect on the distribution of recovery speeds.

2 Data

The data come from the Survey of Income and Program Participation (SIPP). The SIPP is a multi-stage stratified sample of the US population. Each cohort is interviewed every four months about the previous four months for 2.5 to 4 years. The Census Bureau randomly divides each cohort into four rotation groups and interviews each rotation group in a separate month. A single interview of each of the four rotation groups comprises a wave. Due to the staggered timing of the survey administration, the 16 waves correspond to different calendar periods across rotation groups. The 2008 cohort has data from May 2008 to November 2013. The 2004 cohort has data from February 2004 to January 2008.

Since respondents are interviewed every four months and asked about the prior four months, the data collection for the SIPP generates seam bias, where a disproportionate number of transitions in the values of a variable for each individual occur between the final month of a wave and the first month of a subsequent wave. As a result of seam bias, the survey responses are most accurate in the interview month.1 To overcome the problem of seam bias, I aggregate the data every four months for each household, which gives 16 possible waves (or periods) for each individual in the data set. Since individuals are likely to report a new monthly income every four months from the previous four months, an additional source of measurement error is present in household income data beyond the measurement error commonly present in income data.

2.1 Sample Construction

For the main analysis, I use a sample of household heads in the SIPP, ages 25 to 55 in the initial period of the sample, who are not currently enrolled full-time in school nor on active military duty. I exclude household heads in agriculture2 or self-employment.3

Table \(\ref{samp-sel}\) gives a summary of the sample construction for the 2008 SIPP data release. The table starts from the universe of all individuals sampled in all time periods. It lists both the number of households and total observations which are dropped and remain at each step.4

2.2 Household Income Variables

I apply the empirical model of income dynamics to the total real, logged household income. The SIPP measures total household income as the sum of various income sources: earnings, investment or property income, means-based transfer income, social insurance payments, and other income. The SIPP provides the monthly total household income after they top code the income values. The top-coded income is treated as the observed income and top coding is an additional source of measurement error, not accounted for in this paper. I adjust the household income measures with CPI month-year data over the same period to generate real income measures.

2.3 Other Variables

The number of persons in each household varies over time. In the analysis, I define the demographic subgroups by the race, education, marital status, and metro status of the household head in the first wave. I do not exclude households that experience changes in these variables over the sample period.

Table 1 shows the descriptive statistics of total real logged household income and household size by the wave of the 2008 data.

Table 1: Summary Statistics of Total Real Log Household Income and Household Size (Mean and Standard Deviation) by Wave, 2008 Data
Columns: Wave; Total Real Log Household Income (Mean, SD); Household Size (Mean, SD)
Wave Mean SD Mean SD
1 9.81 0.94 3.14 1.58
2 9.84 0.94 3.15 1.58
3 9.83 0.90 3.14 1.56
4 9.82 0.93 3.14 1.56
5 9.83 0.88 3.14 1.57
6 9.82 0.91 3.16 1.58
7 9.81 0.92 3.15 1.59
8 9.82 0.91 3.17 1.60
9 9.80 0.94 3.16 1.58
10 9.82 0.88 3.16 1.59
11 9.82 0.91 3.16 1.58
12 9.80 0.96 3.16 1.58
13 9.81 0.94 3.15 1.58
14 9.80 0.95 3.15 1.58
15 9.79 1.00 3.16 1.59
16 9.78 1.03 3.15 1.59

Table 2 shows summary statistics for household heads in the 2004 and 2008 data.

Table 2: Household Head Summary Statistics: 2008 and 2004 Data
2004 2008
Mean Income 35.41 35.95
Mean Age 42.99 43.74
Mean Household Size 3.16 3.15
Mean Number of Children 1.06 1.03
Mean Household Members under 18 1.14 1.11
Mean Household Members over 65 0.03 0.03
Mean Living in Metro Area 0.78 0.8
Mean of Married Household Members 0.64 0.63
Mean College-educated Household Members 0.48 0.54
Mean of White Household Members 0.88 0.88
N 31860 40624

3 Empirical Methodology

3.1 Model

In this section, I introduce the models for the true (unobserved) and the reported (observed) income. I estimate the model using the reported income. Then I use the results to simulate error-free income.

The model is based on a standard model from poverty dynamics to capture persistence in true income (Jalan & Ravallion, 2002). This is a reduced-form dynamic panel income data model. I follow the methods of Lee et al. (2017), who introduce measurement error into an AR(1) dynamic process. They estimate a model of consumption, whereas my model is applied to income data.

The dynamic model for true income is given by:

\[\begin{align} y^{*}_{it} &= \alpha_{i} + \gamma y^{*}_{it-1} + \beta X_{it} + \delta_{it} + \epsilon_{it} \label{eq:true} \quad i = 1, \ldots, N; t = 2, \ldots, T \\ \delta_{it} &= \delta_{it-1} + u_{it} + d_{t} \nonumber \end{align}\]

\(y_{it}^{*}\) is a measure of true, unobserved income of household \(i\) at wave \(t\). \(X_{it}\) is household size. \(\alpha_{i}\) is a time-invariant unobserved household-specific intercept. \(\delta_{it}\) captures time effects with a household-specific stochastic trend. The time indicators in the model mitigate contemporaneous cross-sectional correlation (Roodman, 2009). A shock \(u_{it}\) has a permanent effect on income that converges to \(u_{it}/(1-\gamma)\). \(d_{t}\) is the average time effect across households. \(\epsilon_{it}\) is a random shock.
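
The difference between a trend shock and a transitory shock can be illustrated with a minimal sketch. It traces the deviation of true income from its no-shock path under the dynamic equation above. The function name `income_response`, the persistence value, and the shock size are hypothetical choices for illustration, not estimates from this paper.

```python
import numpy as np

def income_response(gamma, shock, periods, permanent):
    """Deviation of true income from its no-shock path after a one-time shock.

    permanent=True  : the shock enters the stochastic trend (u_it), so delta_it
                      stays shifted by `shock` in every later period.
    permanent=False : the shock enters the transitory term (epsilon_it) once.
    """
    dev = np.zeros(periods)
    dev[0] = shock
    for t in range(1, periods):
        carry = shock if permanent else 0.0
        dev[t] = gamma * dev[t - 1] + carry
    return dev

gamma = 0.5  # hypothetical persistence parameter
trend_path = income_response(gamma, shock=-0.10, periods=40, permanent=True)
transitory_path = income_response(gamma, shock=-0.10, periods=40, permanent=False)
long_run = -0.10 / (1.0 - gamma)  # limit of the trend-shock deviation
```

Under these assumptions the trend-shock deviation approaches `long_run`, while the transitory deviation decays toward zero at rate \(\gamma\).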

The relationship between true income and observed income is given by:

\[\begin{align} \label{eq:me} y_{it} &= y^{*}_{it} + \eta_{it} \quad {i = 1, \ldots, N; t= 1, \ldots, T} \\ \eta_{it} &= e_{i} + v_{it} \nonumber, \end{align}\]

where \(\eta_{it}\) is the measurement error, which is decomposed into a time-invariant component, \(e_{i}\), and a time-varying component, \(v_{it}\). The inclusion of \(e_{i}\) is due to evidence of serial correlation in measurement error in income, which makes the measurement error non-classical.

Substitution of \(\ref{eq:me}\) into \(\ref{eq:true}\) yields the following equation for observed income.

\[\begin{align} \label{eq:yit} y_{it} &= \alpha_{i} + \gamma y_{it-1} + \beta X_{it} + \delta_{it} + \tau_{it} \quad i = 1, \ldots, N; t = 2, \ldots, T \\ \tau_{it} &= (1 - \gamma )e_{i} + v_{it} - \gamma v_{it-1} + \epsilon_{it}\nonumber \end{align}\]
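
The substitution can be written out step by step. Using \(y^{*}_{it} = y_{it} - e_{i} - v_{it}\) in periods \(t\) and \(t-1\), and collecting the measurement error terms on the right-hand side, gives:

\[\begin{align*} y_{it} - e_{i} - v_{it} &= \alpha_{i} + \gamma \left(y_{it-1} - e_{i} - v_{it-1}\right) + \beta X_{it} + \delta_{it} + \epsilon_{it} \\ y_{it} &= \alpha_{i} + \gamma y_{it-1} + \beta X_{it} + \delta_{it} + \underbrace{(1-\gamma)e_{i} + v_{it} - \gamma v_{it-1} + \epsilon_{it}}_{\tau_{it}} \end{align*}\]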

3.2 Estimation

I first-difference the model, which eliminates the time-invariant components, \(\alpha_{i}\) and \(e_{i}\). First-differencing the model yields the main estimating equation:

\[\begin{align} \Delta y_{it} &= \gamma \Delta y_{it-1} + \beta \Delta X_{it} + d_{t} + \Delta \tau_{it} \label{eq:delta_yit} \quad {i = 1, \ldots, N; t = 3, \ldots, T} \\ \Delta \tau_{it} &= u_{it} + \Delta v_{it} - \gamma \Delta v_{it-1} + \Delta \epsilon_{it}, \nonumber \end{align}\]

I use a two-step generalized method of moments (GMM) estimation of the model in first-differences (Arellano & Bond, 1991). This yields parameter estimates: \(\gamma\), \(\beta\), and \(d_{t}\). I estimate the model using moment conditions that are derived using the following set of assumptions:

I assume that \(\epsilon_{it}, u_{it}, \text{ and } v_{it}\) are white noise - mean zero, homoskedastic, serially uncorrelated, and independent of all other terms. Thus, \(v_{it}\) is classical measurement error. The assumptions on \(\epsilon_{it}\), \(u_{it}\), and \(v_{it}\) are:

\[\begin{align*} E[\epsilon_{it}|v_{it},\ldots,\epsilon_{i,t-1},\ldots,u_{it},\ldots,X_{it},\alpha_{i},e_{i}] &= 0\\ E[u_{it}|v_{it},\ldots,\epsilon_{it},\ldots,u_{i,t-1},\ldots,X_{it},\alpha_{i},e_{i}] &= 0 \\ E[v_{it}|v_{i,t-1},\ldots,\epsilon_{it},\ldots,u_{it},\ldots,X_{it},\alpha_{i},e_{i}] &= 0 \\ \end{align*}\] \[\begin{align*} Var[\epsilon_{it}|v_{it},\ldots,\epsilon_{i,t-1},\ldots,u_{it},\ldots,X_{it},\alpha_{i},e_{i}] &= \sigma^{2}_{\epsilon} \\ Var[u_{it}|v_{it},\ldots,\epsilon_{it},\ldots,u_{i,t-1},\ldots,X_{it},\alpha_{i},e_{i}] &= \sigma^{2}_{u} \\ Var[v_{it}|v_{i,t-1},\ldots,\epsilon_{it},\ldots,u_{it},\ldots,X_{it},\alpha_{i},e_{i}] &= \sigma^{2}_{v}\\ \end{align*}\]

\(e_{i}\) can be correlated with \(y^{*}_{it}\) and \(X_{it}\). I assume that the control variable \(X_{it}\) is strictly exogenous: \(E[X_{it}\epsilon_{is}] = 0 \quad \forall t,s\), conditional on the household effect. While not used for estimation, I assume that the errors have a joint normal distribution for the simulation.

Instruments are used for the dynamic panel data model. The exogenous variables (household size and time indicators) serve as instruments for themselves. Due to the MA(1) error in the residual, lagged income in \(t-3\) and before are valid instruments.5 I use past values of income, \(y_{it-3}\) to \(y_{it-8}\), as the instruments for the lagged dependent variable, together with the forward orthogonal deviations transformation. Instead of subtracting the previous observation from the contemporaneous one as in first-differencing, the forward orthogonal deviations transformation subtracts the average of all future available observations of a variable. The transformation is computable for all observations except the last for each household, which minimizes data loss (Arellano & Bover, 1995).6 Windmeijer-corrected cluster-robust standard errors are used to correct for potential bias in the standard errors from two-step estimation (Roodman, 2009; Windmeijer, 2005). Under the stated assumptions, the GMM estimators are consistent and asymptotically normal (Holtz-Eakin, Newey, & Rosen, 1988).
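
A minimal sketch of the forward orthogonal deviations transformation for a single household's series follows. It assumes a balanced series with no gaps and applies the usual scaling factor \(\sqrt{n/(n+1)}\), where \(n\) is the number of future observations. The function name and the toy input are hypothetical.

```python
import numpy as np

def forward_orthogonal_deviations(x):
    """Forward orthogonal deviations of one household's series.

    For every period except the last, subtract the mean of all later
    observations and rescale by sqrt(n / (n + 1)), where n is the number
    of later observations.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty(x.size - 1)
    for t in range(x.size - 1):
        future = x[t + 1:]
        n = future.size
        out[t] = np.sqrt(n / (n + 1.0)) * (x[t] - future.mean())
    return out

toy_series = [10.0, 9.0, 11.0, 10.0]  # hypothetical log income values
fod = forward_orthogonal_deviations(toy_series)
```

The output has one fewer element than the input, which reflects the statement that the transformation is computable for all observations except the last.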

After estimating the slope coefficients in the model, the estimated residuals are used to estimate the error component variances. These estimates are shown in Table 3. The bootstrap standard errors account for the sampling uncertainty of these variances. I use the standard errors when simulating recovery times to ensure that the simulations account for sampling uncertainty. These variances are necessary for the simulation of measurement-error-free income. \(\sigma^{2}_{\epsilon}\), \(\sigma^{2}_{u}\), and \(\sigma^{2}_{v}\) are identified from the GMM residuals, but \(\sigma^{2}_{e}\) remains unidentified. The identification comes from three moment conditions involving the product of \(\Delta \tau_{it}\) with itself and its lags.

\[\begin{align} E[\Delta \tau_{it} \Delta \tau_{it-2}] &= \gamma \sigma^{2}_{v} \label{eq:mc1} \\ E[\Delta \tau_{it} \Delta \tau_{it-1}] &= -\sigma^{2}_{v}(1 + 2\gamma + \gamma^{2}) - \sigma^{2}_{\epsilon} \label{eq:sig2eps} \\ E[(\Delta \tau_{it})^{2}] &= \sigma^{2}_{u}+ 2\sigma^{2}_{v}(1 + \gamma + \gamma^{2}) +2\sigma^{2}_{\epsilon} \label{eq:sig2u} \end{align}\]
Table 3: Variance Estimates from the Income Dynamics Model on 2008 Data
\(\sigma^{2}_{\epsilon}\) \(\sigma^{2}_{v}\) \(\sigma^{2}_{u}\)
Mean 0.04 0.07 0.02
Standard Deviation 0.04 0.03 0.01
Note: Bootstrapped standard errors (1000 repetitions) clustered at the household wave.
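
The three moment conditions for \(\Delta \tau_{it}\) can be inverted for the variances once \(\gamma\) is known. The sketch below is an illustration under assumed inputs, not the paper's estimation code. It simulates \(\Delta \tau_{it}\) from white-noise errors, computes the sample autocovariances, and solves the conditions. The value of `gamma`, the sample size, and the function names are hypothetical. The Table 3 point estimates serve only as illustrative "true" values.

```python
import numpy as np

def variances_from_moments(m0, m1, m2, gamma):
    """Solve the moment conditions for (sig2_eps, sig2_v, sig2_u).

    m0 = E[(dtau_t)^2], m1 = E[dtau_t * dtau_{t-1}], m2 = E[dtau_t * dtau_{t-2}].
    """
    sig2_v = m2 / gamma
    sig2_eps = -m1 - sig2_v * (1.0 + gamma) ** 2
    sig2_u = m0 - 2.0 * sig2_v * (1.0 + gamma + gamma ** 2) - 2.0 * sig2_eps
    return sig2_eps, sig2_v, sig2_u

def simulate_dtau(n, T, gamma, sig2_eps, sig2_v, sig2_u, seed=0):
    """Simulate dtau_t = u_t + dv_t - gamma * dv_{t-1} + deps_t for n households."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(sig2_u), (n, T))
    v = rng.normal(0.0, np.sqrt(sig2_v), (n, T))
    eps = rng.normal(0.0, np.sqrt(sig2_eps), (n, T))
    return (u[:, 2:]
            + (v[:, 2:] - v[:, 1:-1])
            - gamma * (v[:, 1:-1] - v[:, :-2])
            + (eps[:, 2:] - eps[:, 1:-1]))

gamma = 0.4                      # hypothetical persistence parameter
true_vals = (0.04, 0.07, 0.02)   # (sig2_eps, sig2_v, sig2_u), illustrative inputs
dtau = simulate_dtau(100_000, 16, gamma, *true_vals)
m0 = np.mean(dtau ** 2)
m1 = np.mean(dtau[:, 1:] * dtau[:, :-1])
m2 = np.mean(dtau[:, 2:] * dtau[:, :-2])
estimates = variances_from_moments(m0, m1, m2, gamma)
```

With a large simulated sample, `estimates` should be close to `true_vals`. Note that the inversion divides by \(\gamma\), so it is poorly behaved when the persistence parameter is near zero.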

3.3 Simulations

In this section, I describe how the model estimates are used to simulate household income paths and observe the effect of an income shock. The model of income dynamics provides estimates for the income process at the population level, which informs the household income paths used to track resiliency. The household income paths are simulated according to the distribution of the parameter estimates and random errors. However, the first values of the income paths are not known. A linear projection used to obtain the initial values for the income process is described below. I will discuss the identification, assumptions, and bounds on the projection errors. The upper and lower bounds of the projection errors, when substituted, provide bounds for the measurement error correction on the income path. Then, after simulating each household’s income path, I introduce a shock into the income path and calculate the recovery times for each household across replications.

After estimating the parameters and simulating random errors from their respective distributions, first-differencing the model in equation \(\ref{eq:true}\) provides the change in true income:

\[\begin{equation} \Delta y_{it}^{*} = \gamma \Delta y^{*}_{it-1} + \beta \Delta X_{it} + d_{t} + u_{it} + \Delta \epsilon_{it}, \quad t = 3, \ldots, T \end{equation}\]

The first-differenced income, \(\Delta y^{*}_{it}\), is simulated using draws of the GMM parameter estimates from their sampling distributions (based on their standard errors) and draws from the simulated error distributions. The parameters are drawn to account for sampling uncertainty in the estimates. The model gives estimates for the change in true income \(\Delta y^{*}_{it}\), but I am interested in the level of income \(y_{it}^{*}\). Since I will simulate a shock to \(y^{*}_{it}\) and determine when a household returns to its previous level of income, I need values for the income path in levels.

The true income path is simulated from the relation of first-differences: \[\begin{equation*} y^{*}_{it} = y^{*}_{it-1} + \Delta y^{*}_{it}, \quad t \geq 3 \end{equation*}\]

The model provides estimates for \(\Delta y^{*}_{it}\) and with a previous value \(y^{*}_{it-1}\), I find the current household income \(y^{*}_{it}\). However, the preceding equation is only valid for \(t \geq 3\), so to start off the income process, the value of \(y^{*}_{i2}\) is required but not readily available without further distributional assumptions. In addition, the computation of \(\Delta y^{*}_{i3}\) from the model necessitates the value of \(\Delta y^{*}_{i2}\). A linear projection provides estimates for the initial values.

A linear projection for the initial values may be found by first-differencing equation \(\ref{eq:true}\) and recursively substituting for \(\Delta y^{*}_{it-1}\). It follows that \(\Delta y^{*}_{i2}\) is a linear function of \(X_{i2}, X_{i1}, X_{i0}, X_{i,-1}, \ldots\) and \(y^{*}_{i2}\) is a linear function of \(X_{i2}, X_{i1}, \ldots,\) and \(\alpha_{i}\). The household effect \(\alpha_{i}\) may be correlated with \(X_{it}\).

The linear projections of these relations on \(X_{i1}, X_{i2}\) are:

\[\begin{align*} \Delta y_{i2}^{*} &= \delta_{0} + \beta'_{0} X_{i1} + \beta'_{1} X_{i2} + \zeta_{i1} \\ y_{i2}^{*} &= \delta_{1} + \beta'_{2} X_{i1} + \beta'_{3} X_{i2} + \zeta_{i2}, \end{align*}\]

These linear projections use the initial, observed data values \(X_{i1}\) and \(X_{i2}\) with estimated parameters \(\delta_{0}, \delta_{1}, \beta_{0}, \beta_{1}, \beta_{2} \text{ and } \beta_{3}\) and projection errors \(\zeta_{i1}\) and \(\zeta_{i2}\) to obtain estimates for the initial true income values \(y^{*}_{i2}\) and \(\Delta y^{*}_{i2}\). The only remaining unknowns are the projection errors \(\zeta_{1}\) and \(\zeta_{2}\), for which distributional assumptions must be made.

The projection errors \(\zeta_{1},\zeta_{2}\) are assumed to be jointly normal with variances \(\sigma^{2}_{1}, \sigma_{2}^{2}\) and covariance \(\sigma_{12}\). They are also assumed to be homoskedastic and uncorrelated with \(v\). Estimates of their variances and covariance are obtained from regressions based on the linear projections. Substituting observed income into the linear projections gives the following estimating equations:

\[\begin{align*} \Delta y_{i2} &= \delta_{0} + \beta'_{0} X_{i1} + \beta'_{1} X_{i2} + \Delta v_{i2} + \zeta_{i1} \\ y_{i2} &= \delta_{1} + \beta'_{2} X_{i1} + \beta'_{3} X_{i2} + e_{i} + v_{i2} + \zeta_{i2} \end{align*}\]

I estimate the coefficients of the projections and the variances of the regression errors by linear regression. Denote the errors in the estimating equations as \(\psi_{1}, \psi_{2}\). Thus, \(\psi_{1} = \Delta v_{i2} + \zeta_{i1}\) and \(\psi_{2} = e_{i} + v_{i2} + \zeta_{i2}\). Each regression error is therefore the sum of a projection error and measurement error terms.

From the linear projection on \(X_{i1}, X_{i2}\), I identify the variances for the projection errors. The projection errors are estimated from the GMM moment condition residuals and when combined with the linear projection, they provide initial values for \(y^{*}_{i2}\) and \(\Delta y^{*}_{i2}\).

\[\begin{align*} \sigma^{2}_{1} &= var(\psi_{1}) - 2 \sigma^2_{v} \\ \sigma_{12} &= cov(\psi_{1},\psi_{2}) - \sigma^{2}_{v} \\ \sigma^{2}_{2} &= var(\psi_{2}) - \sigma^{2}_{e} - \sigma^{2}_{v} \end{align*}\]

Next, I discuss the identification of the projection error variances and covariance. \(\sigma^{2}_{1}\) is identified from the linear projection estimate of \(var(\psi_{1})\) and from \(\sigma^{2}_{v}\), which is itself identified from the GMM residual moment condition \(\ref{eq:mc1}\). Similarly, \(\sigma_{12}\) is identified from the linear projection estimate of \(cov(\psi_{1},\psi_{2})\) and \(\sigma^{2}_{v}\). \(\sigma^{2}_{2}\) is not identified, since \(\sigma^{2}_{e}\) is not identified from the moment conditions of the GMM residuals. However, \(\sigma^{2}_{2}\) can be bounded above and below. The upper bound follows from \(\sigma^{2}_{e}\geq 0\): when \(\sigma^{2}_{e} = 0\), \(\sigma^{2}_{2}\) is at its maximum value, so the upper bound on \(\sigma^{2}_{2}\) is: \[\sigma^{2}_{2} \leq Var(\psi_{2}) - \sigma^{2}_{v}.\]

Let \(\Sigma\) be the variance matrix of the random errors in simulation.7 Since \(\Sigma\) must be positive semi-definite, this provides a lower bound for \(\sigma^{2}_{2}\). When the smallest eigenvalue is zero, the joint normal distribution of the errors in simulation is singular. When \(\sigma^{2}_{e}\) is at its upper bound, \(\sigma^{2}_{2}\) is at its minimum.
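
The bounds can be sketched numerically. The example below computes \(\sigma^{2}_{1}\), \(\sigma_{12}\), and the upper bound on \(\sigma^{2}_{2}\) from the formulas above. For the lower bound it makes a simplifying assumption: positive semi-definiteness is checked only on the \(2 \times 2\) block for \((\zeta_{1}, \zeta_{2})\), not on the full matrix \(\Sigma\), so the resulting bound is a necessary condition and may be looser than the one used in the paper. All input values and the function name are hypothetical.

```python
import numpy as np

def projection_variance_bounds(var_psi1, var_psi2, cov_psi12, sig2_v):
    """Return sigma^2_1, sigma_12, and (lower, upper) bounds on sigma^2_2."""
    sig2_1 = var_psi1 - 2.0 * sig2_v
    sig_12 = cov_psi12 - sig2_v
    upper = var_psi2 - sig2_v          # attained when sigma^2_e = 0
    # Simplification: require only the 2x2 block [[sig2_1, sig_12],
    # [sig_12, sig2_2]] to be positive semi-definite.
    lower = sig_12 ** 2 / sig2_1
    return sig2_1, sig_12, lower, upper

# Hypothetical regression moments and measurement error variance
sig2_1, sig_12, lower, upper = projection_variance_bounds(
    var_psi1=0.50, var_psi2=0.90, cov_psi12=0.20, sig2_v=0.07
)

# At the lower bound the 2x2 block is singular (smallest eigenvalue is zero)
block_at_lower = np.array([[sig2_1, sig_12], [sig_12, lower]])
smallest_eig = np.linalg.eigvalsh(block_at_lower)[0]
```

Under this simplification, the implied upper bound on \(\sigma^{2}_{e}\) is `upper - lower`, which corresponds to \(\sigma^{2}_{2}\) at its minimum.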

Since the initial observations are for period 2, I allow for correlation between \(\zeta_{i1}, \zeta_{i2}\) and \(\Delta \epsilon_{i3}\) and estimate the associated covariances. These simulated errors in \(\Sigma\) have the following assumptions:

\[\begin{align*} E[\zeta_{1} \Delta \epsilon_{t}] = E[\zeta_{2} \Delta \epsilon_{t} ] = 0, \quad t\geq 4 \\ E[\zeta_{1} u_{t}] = E[\zeta_{2} u_{t} ] = 0, \quad t\geq 3\\ \end{align*}\]

There are two moment conditions to exactly identify \(E[\zeta_{1} \Delta \epsilon_{3}]\) and \(E[\zeta_{2} \Delta \epsilon_{3}]\).

\[\begin{align*} E[\zeta_{1} \Delta \epsilon_{3}] - Cov(\psi_{1},\Delta \tau_{3}) - (1+ 2\gamma) \sigma^{2}_{v} = 0 \\ E[\zeta_{2} \Delta \epsilon_{3}] - Cov(\psi_{2},\Delta \tau_{3}) - (1+ \gamma) \sigma^{2}_{v} = 0 \\ \end{align*}\]

Finally, with bounds on the projection error variances, the projection errors can be simulated from their distributions. Combined with the linear projection equations, the initial income path values \(y^{*}_{i2}\) and \(\Delta y^{*}_{i2}\) are found. From the first-differencing relation, \(\Delta y_{i2}^{*} = y^{*}_{i2} - y^{*}_{i1}\), I can also back out the value of \(y^{*}_{i1}\). Then, the entire income path for each household can be simulated for many replications.
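
The construction of initial values can be sketched as follows, assuming a scalar \(X_{it}\) (household size) and a chosen value of \(\sigma^{2}_{2}\) within its bounds. The coefficients, the covariance matrix, and the household sizes are hypothetical placeholders, not estimates from this paper.

```python
import numpy as np

def draw_initial_values(X1, X2, coef_diff, coef_level, Sigma_zeta, rng):
    """Draw projection errors and build y*_{i1}, y*_{i2}, and Delta y*_{i2}.

    coef_diff  = (delta_0, beta_0, beta_1) for the Delta y*_{i2} projection
    coef_level = (delta_1, beta_2, beta_3) for the y*_{i2} projection
    """
    n = X1.shape[0]
    zeta = rng.multivariate_normal(np.zeros(2), Sigma_zeta, size=n)
    dy2 = coef_diff[0] + coef_diff[1] * X1 + coef_diff[2] * X2 + zeta[:, 0]
    y2 = coef_level[0] + coef_level[1] * X1 + coef_level[2] * X2 + zeta[:, 1]
    y1 = y2 - dy2  # from Delta y*_{i2} = y*_{i2} - y*_{i1}
    return y1, y2, dy2

rng = np.random.default_rng(0)
X1 = np.array([2.0, 3.0, 4.0, 1.0, 5.0])  # hypothetical household sizes, wave 1
X2 = np.array([2.0, 3.0, 5.0, 1.0, 5.0])  # hypothetical household sizes, wave 2
Sigma_zeta = np.array([[0.36, 0.13],
                       [0.13, 0.50]])      # hypothetical, sigma^2_2 within bounds
y1, y2, dy2 = draw_initial_values(
    X1, X2, coef_diff=(0.0, -0.01, 0.02), coef_level=(9.5, 0.05, 0.05),
    Sigma_zeta=Sigma_zeta, rng=rng,
)
```

Repeating the draw with \(\sigma^{2}_{2}\) set to its lower and upper bounds would give the two sets of initial values that bracket the measurement error correction.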

Having described the income path simulations, I now turn to the introduction of a shock into the income path. Simulating an income path with and without the shock allows me to compare how the shock affects the income trajectory. I simulate two income paths: \(y^{*}_{it}\), the corrected income without the shock, and \(y^{**}_{it}\), the corrected income with the shock. For each household \(i\), the income path in the presence of a shock \(y^{**}_{it}\) can be compared to the counterfactual income path in the absence of a shock \(y_{it}^{*}\). Both \(y^{*}_{it}\) and \(y^{**}_{it}\) use the same parameters, observables, and random errors. By construction, \(y^{*}_{i1} = y^{**}_{i1}\). The two paths first differ in \(t=2\), when the shock is introduced. \(y^{**}_{it}\) receives the equivalent of the shock to the income in levels.

The shock size is a parameter of the method that can be specified. I choose two shock sizes to examine how the choice of shock size affects the results. The income shock represents either a 10% or a 25% decrease in total real household income. A 10% shock is comparable to the large shock in Jalan & Ravallion (2002), which they classify as a 10% or higher fall in household expenditure. The 25% shock represents the average of larger shocks that may occur in the data.

I introduce a shock by assuming that \(y^{*}_{i2}\) falls by \(\pi_{i2}\). I consider two ways of choosing values for \(\pi_{i2}\). First, I reduce each household’s income by a fixed percentage, either 10% or 25%. In this case, \(y^{**}_{i2} = (1-\pi_{i2})y^{*}_{i2}\), where \(\pi_{i2} = 0.1 \text{ or } 0.25\). Second, I use a common value of \(\pi_{i2}\) for all households in the same demographic group, \(y^{**}_{i2} = (1-\pi_{i2})y^{*}_{i2}.\) I define \(\pi_{i2} = \pi E[y_{h}],\) where \(E[y_{h}]\) is the expected value of the income for all households belonging to demographic subgroup \(h\). In this manner, each demographic subgroup receives its own shock size, proportional to the group mean income level. For example, the White household heads receive a shock based on their mean, while the Black household heads receive a shock based on their mean. Though both sets of results are shown for the different types of measurement error and demographic subgroups, the main results focus on the household-specific shock.

Further, I set \(y^{*}_{i1} = y^{*}_{i2}\) to compare the \(y^{**}_{it}\) against \(y^{*}_{it}\) and then calculate the aggregate income lost from the shock. For example, assume a household starts with an initial income level of \(100\) in \(t=1\). I set \(y^{*}_{i1} = y^{*}_{i2} = 100\) for this household. Then a 10% shock would decrease the initial income level by 10, so \(y^{**}_{i2} = 90\). After \(t=2\), the income path is calculated from recursive income values, parameter estimates, and random errors. This illustrative example is shown in Table 4.

Table 4: Example of how a shock is introduced
Corrected income without shock Corrected income with shock
t=1 100 100
t=2 100 90
t=3 ... ...
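
The example in Table 4 can be extended with a minimal sketch. It assumes that the shock behaves like a one-time change to the period-2 transitory error, so that the gap between the shocked and unshocked paths shrinks by the factor \(\gamma\) each period because both paths share the same parameters, observables, and random errors. This propagation rule, the value of `gamma`, and the flat no-shock path are illustrative assumptions.

```python
import numpy as np

def shocked_path(y_star, gamma, pi):
    """Income path with a proportional shock of size pi in period 2.

    y_star[0] is period 1 and y_star[1] is period 2. The gap to the
    no-shock path starts at -pi * y_star[1] and decays at rate gamma.
    """
    y_star = np.asarray(y_star, dtype=float)
    y_shock = y_star.copy()
    gap = -pi * y_star[1]
    for t in range(1, y_star.size):
        y_shock[t] = y_star[t] + gap
        gap *= gamma
    return y_shock

no_shock = [100.0, 100.0, 100.0, 100.0]  # hypothetical flat no-shock path
with_shock = shocked_path(no_shock, gamma=0.5, pi=0.10)
```

For this flat path, the shocked income equals 100 in period 1 and 90 in period 2, matching Table 4, and then moves back toward 100. In the full simulation the no-shock path itself varies with the trend and the random errors.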

Once the income paths with and without the shock have been generated, I calculate the recovery times on the income path with the shock. I define the recovery period as the first period \(t_{i}\) such that \(y^{**}_{i}(t_{i}) \geq y^{*}_{i1}\). This is the first period in which household \(i\)’s income in the presence of the shock is at least as large as its income before the shock. The recovery time is \(t_{i} - 2\), since the shock occurs in period 2 and the recovery income is referenced to the income in period 1. This implies that within the SIPP 2008 panel with 16 periods, possible recovery times range from 1 to 14. Alternatively, the half-life recovery period is the first period \(t_{i}\) such that \(y^{**}_{i}(t_{i}) \geq y^{*}_{i1} - 0.5 \times \text{shock}\), again with a recovery time of \(t_{i}-2\). Another measure of recovery is the percentage of income recovered, obtained by comparing the final present discounted value of income to the initial income. For each household, replication, and wave, I compute the income loss as the difference between the present discounted value of income without the shock and the present discounted value of income with the shock. Then, I sum these losses across waves for each household and across all households to obtain the aggregate income loss from the shock.

It is possible that there does not exist any \(t_{i}\) for which \(y^{**}_{i}(t_{i}) \geq y^{*}_{i1}\), that is, \(y^{**}_{i}(t) < y^{*}_{i1}\) for all \(t > 2\). In that case, the household is said not to have recovered. I do not extrapolate out of the sample to look for a recovery time, because the model estimation is only based on the periods for which sample data are available. If I do not see recovery within the sample time frame, then it is impossible to know how long recovery would take.

I calculate the recovery speeds for various demographic subgroups by splitting the sample, repeating the estimation, and simulating the income process for each group of demographic variables. Splitting the sample allows the coefficients to vary and allows for group-wise heteroskedasticity in the error components.

In summary, the approach is as follows:

  1. Use either the main sample or a demographic subgroup

  2. Two-step GMM estimation of \(\Delta y_{it} = \gamma \Delta y_{it-1} + \beta \Delta X_{it} + d_{t} + \Delta \tau_{it}\) with lagged \(y_{it-3}\) through \(y_{it-8}\) as instruments, using Windmeijer’s finite-sample correction for the two-step covariance matrix. The estimation uses the forward orthogonal deviations transformation and does not use the level equation for instruments.

    1. Compute \(\sigma^{2}_{\epsilon}\), \(\sigma^{2}_{u}\), and \(\sigma^{2}_{v}\) from the parameters \(\gamma, \beta,\) and \(d_{t}\) and the residuals \(\Delta \tau_{it}\).
  3. Linear regressions of \(\Delta y_{i2} = \delta_{0} + \beta_{0} X_{i1} + \beta_{1} X_{i2}\) and \(y_{i2} = \delta_{1} + \beta_{2} X_{i1} + \beta_{3} X_{i2}\), both for \(t = 2\).

    1. Compute \(\sigma^2_{1}\) and \(\sigma_{12}\)

    2. Derive bounds on \(\sigma^{2}_{2}\)

  4. For multiple draws of the parameters and the random errors, draw measurement-error-free values to simulate the income process:

    1. Draw \(\zeta_1, \zeta_2 \sim MVN(0,\Sigma_{\zeta})\) where \(\Sigma_{\zeta} = \begin{pmatrix} \sigma^{2}_{1} & \sigma_{12} \\ \sigma_{12} & \sigma^{2}_{2} \end{pmatrix}\)

    2. Generate \(y^{*}_{i2} = \delta_{1} + \beta_{2} X_{i1} + \beta_{3} X_{i2} + \zeta_{1}\) and \(\Delta y^{*}_{i2} = \delta_{0} + \beta_{0} X_{i1} + \beta_{1} X_{i2} + \zeta_{2}\)

    3. Back substitution for first values: \(y^{*}_{i1} = y^{*}_{i2} - \Delta y^{*}_{i2}\)

    4. Draw \(\beta_{n}, \gamma_{n}\) and \(d_{tn}\) for \(n = 1, \ldots, N\) from their respective distributions, estimated by GMM, to provide heterogeneity in the results.

    5. Draw \(u_{itn} \sim N(0,\sigma^{2}_{u})\) and \(\epsilon_{itn} \sim N(0,\sigma^{2}_{\epsilon})\)

    6. Recursively substitute \(\Delta y^{*}_{it} = \gamma \Delta y^{*}_{it-1} + \beta \Delta X_{it} + u_{it} + \Delta \epsilon_{it}\) if \(t>2\) with parameter values and random errors drawn from their distributions

    7. Substitute \(y^{*}_{it} = y^{*}_{it-1} + \Delta y^{*}_{it}\) for \(t>2\)

  5. Simulate the effect of a shock

    1. Reduce income by 10% or 25% of the household’s income or the sample mean in \(t=2\)
  6. Recovery times

    1. Find the first \(t\) such that \(y^{**}_{it} \geq y^{*}_{i1}\). The recovery time is \(t-2\).

    2. Calculate the half-life as the first period \(t\) in which \(y^{**}_{it} \geq y^{*}_{i1}-0.5 \times \text{shock}\). The half-life recovery time is \(t-2\).

Note: All GMM estimates and estimated sample means are weighted by the person weights found in the SIPP.
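Steps 4.6 and 4.7 of the outline can be illustrated with a short Python sketch of the recursion for one household and one replication. This is a simplified sketch, not the paper's implementation: the function name, the standard deviations, and the constant household size are assumptions, and the time effects \(d_{t}\) are omitted, as in the recursion written in step 4.6. The values \(\gamma = 0.61\) and \(\beta = 0.08\) are the full-sample estimates reported in Section 4.1.

```python
import random


def simulate_income_path(y1, y2, gamma, beta, dx, u, eps):
    """Build y*_1..y*_T from the recursion in steps 4.6 and 4.7.

    dx, u and eps are lists of length T in which position k holds the value
    for period t = k + 1. Only entries for t >= 3 enter the recursion, plus
    eps for t = 2 through the differenced error in t = 3.
    """
    path = [y1, y2]
    for k in range(2, len(u)):                  # k = 2 corresponds to t = 3
        d_prev = path[k - 1] - path[k - 2]      # lagged first difference
        d_now = gamma * d_prev + beta * dx[k] + u[k] + (eps[k] - eps[k - 1])
        path.append(path[k - 1] + d_now)        # step 4.7: cumulate changes
    return path


T = 16                                          # periods in the 2008 data
rng = random.Random(0)
sigma_u, sigma_eps = 0.1, 0.2                   # hypothetical std. deviations
u = [rng.gauss(0.0, sigma_u) for _ in range(T)]
eps = [rng.gauss(0.0, sigma_eps) for _ in range(T)]
dx = [0.0] * T                                  # household size held constant
path = simulate_income_path(9.8, 9.8, gamma=0.61, beta=0.08, dx=dx, u=u, eps=eps)
```

Passing the error draws in as arguments, instead of drawing them inside the function, makes it straightforward to reuse the same draws for the paths with and without the shock.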

4 Results

4.1 Full Sample

Before turning to the estimates, I perform several model specification tests to assess fit. The first-differenced model is overidentified using income lagged from \(t-3\) and earlier. I use the third through eighth lagged values of income. There are 78 moment conditions.8 The Hansen J-statistic is 70.5 with 62 degrees of freedom, since there are a total of 16 regression coefficients, including time indicators. The p-value of \(0.215\) implies that the overidentifying restrictions are not rejected.
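The degrees of freedom and the approximate p-value of the J-test can be checked with the standard library alone. The sketch below swaps in the Wilson-Hilferty normal approximation to the chi-square tail probability instead of an exact routine, so it should be read as a rough check under that assumption; the function name is hypothetical.

```python
import math


def chi2_sf_wilson_hilferty(x, df):
    """Approximate upper-tail probability of a chi-square(df) variable.

    Uses the Wilson-Hilferty cube-root normal approximation, which works
    well for moderately large df. An exact routine would be preferable in
    applied work.
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)


n_moments, n_coefficients = 78, 16
df = n_moments - n_coefficients                 # overidentifying restrictions
p_value = chi2_sf_wilson_hilferty(70.5, df)
```

Under this approximation the p-value lands close to the reported 0.215, which is consistent with not rejecting the overidentifying restrictions.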

I check for instrument relevance by estimating the first-stage regressions period by period. The F-statistics and p-values for the instruments are for each period: 4, 29.74 (0.000); 5, 1.68 (0.19); 6, 8.28 (0.00); 7, 10.76 (0.00); 8, 15.61 (0.00); 9, 9.79 (0.00); 10, 8.71 (0.00); 11, 22.34 (0.00); 12, 7.83 (0.00); 13, 11.52 (0.00); 14, 1.98 (0.05); 15, 2.29 (0.02); and 16, 11.34 (0.00). Aside from the time effects, the regression coefficients are time-invariant. The relevant measure of instrument strength is the largest F-statistic (Lee et al., 2017; Stock, Wright, & Yogo, 2002). The largest F-statistic is above 10, suggesting that instrument weakness and the associated small-sample bias in the IV estimates are not a concern.

Next, I test for the error dynamics in the model. If the equation and measurement errors are uncorrelated over time, then \(\Delta \tau_{it}\) is correlated with \(\Delta \tau_{it-1}\) and \(\Delta \tau_{it-2}\) but not with \(\Delta \tau_{it-3}\). This is equivalent to testing whether the autocovariances of order three and higher are zero (MaCurdy, 1982; Meghir & Pistaferri, 2002). I use the Arellano and Bond test for autocorrelation (Roodman, 2009). The Arellano and Bond z-test statistics of orders 1 and 2 are -7.14 with p-value < \(0.001\) and 2.77 with p-value \(0.006\), which are both significant. The autocorrelation tests for orders three and four have z-test statistics of \(0.10\) (p-value: \(0.917\)) and \(0.46\) (p-value: \(0.647\)), both insignificant. Therefore, the hypothesis of no serial correlation at orders three and four cannot be rejected: the AR(3) test statistic is not statistically different from zero.

Finally, I test whether the variances of the errors \(\sigma^{2}_{\epsilon}\), \(\sigma^{2}_{u}\) and \(\sigma^{2}_{v}\) are zero. The respective statistics are 33.28, 55.24, and 82.61, where the standard errors are calculated from bootstrapping. The one-sided test gives near-zero p-values for each, which confirms that the error variances are significantly different from zero.

I conclude from the model specification tests that the household income dynamics may be accurately described by the autoregressive first-order process with a stochastic trend.

The model is estimated with observations on the 2008 data for sample waves \(3\) - \(16\). The estimates can be found in Table \(\ref{main-est}\) of Appendix Section 10. The coefficients are estimated: \(\gamma\) is 0.61 and \(\beta\) is 0.08, both significant at the \(0.01\) level.

The group-wise coefficients can be found in Table \(\ref{gmm8}\). When splitting the sample, the coefficients vary for \(\gamma\), \(\beta\), and \(d_{t}\). By race, Black household heads have a greater persistence of income (\(\gamma=0.59\)) than White household heads (\(\gamma=0.54\)). High-school-educated household heads show larger persistence of income (\(\gamma=0.61\)) than college-educated household heads (\(\gamma=0.50\)). The persistence estimates by marital status are within \(0.01\) of each other. The coefficient for the metro group is the largest of all the subgroups at \(\gamma=0.63\), while the nonmetro subgroup has the only nonsignificant coefficient on lagged income. In each subgroup, the effect of household size is positive and significant at the \(0.01\) level.

Now, I turn to the simulation results. Table 5 shows the variance and covariance estimates of the initial conditions with bounds for \(\sigma^{2}_{2}\). The lower bound for \(\sigma^{2}_{2}\) makes \(\Sigma\) just positive semi-definite. The estimated eigenvector corresponding to the lower bound is (0.22, 0.169, 0.069, 0.001)\(^{T}\). The table shows weak correlation between the differenced equation error \(\Delta \epsilon_{3}\) and the projection errors \(\zeta_{1}\) and \(\zeta_{2}\).9

Table 5: Variances and Covariances for Simulation
$\sigma^{2}_{1}$ $\sigma^{2}_{2,Upper}$ $\sigma^{2}_{2,Lower}$ $\sigma_{12}$ $E[\zeta_{1} \Delta \epsilon_{3}]$ $E[\zeta_{2} \Delta \epsilon_{3}]$
Mean 0.18 0.77 0.05 0.08 -0.03 -0.06
Standard Deviation 0.08 0.07 0.05 0.05 0.05 0.06

Figure 1: For illustrative purposes: the corrected income with and without the shock for one replication of one household by measurement error correction type for a 10% shock. Each wave of the sample is four months.

After the results on the simulation parameters, I show the income paths for one household and the aggregate sample. Figure 1 shows for one example household the corrected income with and without a 10% household-specific shock by measurement error types to illustrate the income paths. The income path without the shock is 10% greater than the income path with the shock and these are compared for the income path without measurement error correction for this graph. Throughout the sample time frame, the household experiences other random shocks. These random shocks are the same between both cases of the income with and without the shock. In this example, the income never reaches the initial value within the sample length, so the household does not recover. However, this is only one replication for one household. This household achieves a 0.86 recovery rate across replications. For the replications when this household recovers, the mean recovery time is 1.37 waves or 5.48 months.

Figure 2: Total Real Household Income by Sample Wave for Measurement Error (ME) Correction Types Averaged across Replications for the 25th and 75th Distribution Quantiles. Note: Upper is the upper bound of the projection error variance, none is no measurement error correction, and lower is the lower bound of the projection error variance.

Figure 2 summarizes the income distribution across household-replications by the 25th and 75th quantile of the distribution on sample wave. After generating all the household income paths, they are averaged across replications by households. The income path is plotted by measurement error type on the sample wave, which are in four-month increments. In this graph, the income path without a measurement error correction shows the widest bounds on the 25\(^{th}\) and 75\(^{th}\) quantiles. The tightest bound is the income path with the lower bound on the projection error variance. The income path using the upper bound on the projection error variance is bounded by the income path without a measurement error correction and the measurement error correction using the lower bound on the projection error variance. After sample wave three, the kinks in the income paths smooth out. This feature is likely due to the linear projection for the first two waves until the recursive nature of the income dynamics kicks in. Income is decreasing over the sample period, starting with a mean of 9.8 in the first wave, 9.43 in wave 8, and 8.55 in wave 16 for the income without measurement error. The data period is from May 2008 to November 2013, during which recovery from the Great Recession occurs.

As the main contribution of this paper is to provide a method to estimate the recovery times after an income shock, the results describe the distributions of recovery times. Previous literature has estimated whether or not households recover from a specific type of shock, but this paper also provides the recovery speed. Since the model provides an upper and lower bound on the projection error variance, I examine how correcting for measurement error affects the recovery speed estimates. The results are shown first for the proportion of households who recover and second for the recovery time distributions.

Table 6: Recovery Proportion by Shock Size and Projection Error Variance Bound
10% 25%
Upper 0.57 0.53
Lower 0.60 0.52
None 0.65 0.62

Table 6 shows the proportion of households who recover from an income shock by the shock size of 10% or 25% and the type of measurement error correction: none, a lower bound on the projection error variance, or an upper bound on the projection error variance. The first result is that the proportion recovered is greater for the 10% shock than for the 25% shock. About 43% of households do not recover from a 10% shock at the upper bound on the projection error variance. For both shock sizes, not correcting for measurement error gives a larger proportion of households who recover than when using either the lower or upper bound for the projection error variance. Using a bound on the projection error variance to account for measurement error decreases the proportion recovered by 5 to 8 percentage points for a 10% shock and by 9 to 10 percentage points for a 25% shock.
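The percentage-point gaps implied by Table 6 can be verified directly. The snippet below only re-enters the table values; the dictionary layout and the helper name `gap_in_points` are illustrative assumptions.

```python
# Recovery proportions copied from Table 6.
recovery = {
    "Upper": {"10%": 0.57, "25%": 0.53},
    "Lower": {"10%": 0.60, "25%": 0.52},
    "None": {"10%": 0.65, "25%": 0.62},
}


def gap_in_points(bound, shock):
    """Percentage-point drop in the recovery proportion relative to 'None'."""
    return round(100 * (recovery["None"][shock] - recovery[bound][shock]))


gaps = {
    (bound, shock): gap_in_points(bound, shock)
    for bound in ("Upper", "Lower")
    for shock in ("10%", "25%")
}
```

For the 10% shock the gaps are 8 points (upper bound) and 5 points (lower bound); for the 25% shock they are 9 and 10 points.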

(a) 10% shock

(b) 25% shock

Figure 3: Main Sample Recovery Speed Distribution Varying the Measurement Error Correction by Shock Size Conditional on Recovery during the Sample Period in the 2008 Data. A recovery period is four months.

The empirical cumulative distribution function (eCDF) for the recovery times from either a 10% or 25% income shock is shown for three cases: (i) no measurement error, (ii) the variance of the projection error is set at its lower bound, and (iii) the variance of the projection error is set at its upper bound. As some households do not achieve recovery within the time frame of the study, the true cumulative distribution function is not known for these households. Thus, the graphs show only the cumulative percentage of household heads who recover within a given number of four-month waves, conditional on recovery.

First, I compare the measurement error corrections by shock size. Figure 3 shows the comparison between no correction for measurement error and the variance of the projection error set at each of the lower and upper bounds, for a 10% income shock in Figure 3 (a) and a 25% shock in Figure 3 (b). The recovery time distribution without measurement-error correction is bounded between the distributions at the lower and upper bounds of the projection error variance. By three waves of recovery time with a 10% shock, 84% of households recover on the lower bound of the projection error variance, 89% of households recover without correcting for measurement error, and 90% of households recover on the upper bound of the projection error variance. Recall that the upper bound of the projection error variance implies no time-invariant measurement error, in which case the recovery times are faster than those based on observed income. Similarly, the lower bound of the projection error variance implies both time-invariant and time-varying measurement error, which gives longer recovery times than those based on observed income.

Next, I show the shock size comparison while keeping constant the measurement error correction type. Figure 4 shows the distribution of recovery times resulting from varying the shock size to 10% or 25% across households using each specification of measurement error correction. As expected, the eCDF for the 25% shock first-order stochastically dominates the 10% shock, since for any cumulative probability value, the 25% shock has a longer recovery time. Larger shocks take longer to recover from. For example, 90% of households have recovered from a 10% shock and 88% have recovered from a 25% shock by three waves - both on the upper bound of the projection error variance. The remaining results will be shown for a 10% shock on the upper bound of the projection error variance for simplicity.
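The eCDF comparison used throughout this section can be sketched as follows. The recovery times below are made-up numbers for illustration, and the helper names are hypothetical. Following the usage in the text, one distribution is said to dominate another when its eCDF lies weakly below the other's at every recovery time, meaning its recovery times are longer.

```python
def ecdf(samples, grid):
    """Share of samples less than or equal to each grid point."""
    n = len(samples)
    return [sum(1 for s in samples if s <= g) / n for g in grid]


def has_longer_times(times_a, times_b, grid):
    """True if A's eCDF lies weakly below B's everywhere on the grid,
    i.e. A's recovery times are longer in the first-order sense."""
    cdf_a, cdf_b = ecdf(times_a, grid), ecdf(times_b, grid)
    return all(a <= b for a, b in zip(cdf_a, cdf_b))


grid = range(1, 15)                 # possible recovery times, 1 to 14 waves
times_10 = [1, 1, 2, 2, 3, 4]       # hypothetical times after a 10% shock
times_25 = [1, 2, 3, 3, 5, 7]       # hypothetical times after a 25% shock
larger_shock_is_slower = has_longer_times(times_25, times_10, grid)
```

Because the comparison is made only among households that recover, it says nothing about the share that never recovers; that share is reported separately in Table 6.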

(a) PE: Lower Bound

(b) PE: Upper Bound

(c) No ME correction

Figure 4: Main Sample Recovery Speed Distribution Conditional on Recovery during the Sample Period Varying Shock Size by Measurement Error Correction Type. A recovery period is four months.

Taken together, these main results show that the proportion of households who recover is larger when not correcting for measurement error. Conditional on recovery, the observed recovery speeds without measurement error correction are bounded by the measurement-error-corrected recovery speeds. Also, larger shocks take longer to recover from.

(a) 10% Shock, Upper Bound PE

(b) 25% Shock, Upper Bound PE

(c) 10% Shock, Lower Bound PE

(d) 25% Shock, Lower Bound PE

(e) 10% Shock, No Correction

(f) 25% Shock, No Correction

Figure 5: Main Sample Full and Half-Life Recovery Speed Distributions Conditional on Recovery during the Sample Period by Measurement Error Correction (rows) and Shock Size (columns). A recovery period is four months.

The previous graphs were computed based on full recovery: obtaining an income level at least as high as the income before the shock. Considering a partial recovery is also of interest. A half-life recovery time is the recovery speed for recovering half of the lost income. Figure 5 compares the eCDF of the half-life recovery times for the 10% and 25% shock size using each measurement error correction. As expected, the half-life recovery first-order stochastically dominates the full recovery. For any cumulative probability, the full recovery times are longer than recovering to the half-life. By three waves, 88% of households have recovered to the half-life and 84% of households have recovered fully from a 10% shock on the lower bound of the projection error variance.

Figure 6: Percentage of Recovered Income by the Final Period for a 10% Shock on the Upper Bound of the Projection Error Variance. Values greater than one represent a complete recovery.

Instead of determining recovery by whether the household has returned to the initial level of income before the shock, a more agnostic approach to recovery is to calculate the percentage of income each household recovers by the final period. This is done by dividing the final period’s 3% present-value discounted income by the household’s initial income. Figure 6 shows the percentage of income recovered by households. If the value is greater than one, then the household’s final income is at least as large as their initial income. On average, households recover about 78% of their income by the end of the sample period. However, note that it is possible that households recover and then receive another shock later on, which explains why only 11.15% of households seem to recover fully in this method of calculating recovery, compared to the previous proportion recovered of 57%.

Introducing a negative shock has impacts on the aggregate economy. I show this by calculating the aggregate income loss from a shock, comparing the income distributions with and without the shock, and calculating the Gini coefficients10 of the initial and final income distributions with and without the shock.

(a) Household

(b) Aggregate

Figure 7: Income Loss until Recovery Occurs for a 10% Shock using the Upper Bound of the Projection Error Variance on the 2008 Data.

The aggregate income loss or gain is the sum across periods of the difference between the corrected income with the shock and the corrected income without the shock for each household replication. These income differences are calculated using the present discounted value of income in each period using a 3% discount rate. The mean log income change for all periods is -1.35. Figure 7 (a) shows the sum of the income lost for each household until recovery occurs, or if not recovered, the sum of income lost across all periods. Thus, this histogram shows the distribution of log income loss resulting from a 10% shock on the upper bound of the projection error variance. For each replication and summing across households, Figure 7 (b) shows the aggregate log income loss. The mean aggregate log income loss for all periods by replications is -238.16.
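A minimal sketch of the discounted loss for one household replication is given below. It assumes that the 3% rate is applied once per period and that discounting is relative to the first period; the text does not state either convention, so both are assumptions, as are the function and argument names.

```python
def discounted_loss(with_shock, without_shock, rate=0.03, stop=None):
    """Sum of discounted differences (with shock minus without shock).

    Position k holds period t = k + 1 and is discounted by (1 + rate) ** k.
    `stop` is the number of periods to include (for example, up to the
    recovery period); by default all periods are used.
    """
    if len(with_shock) != len(without_shock):
        raise ValueError("paths must have equal length")
    last = len(with_shock) if stop is None else stop
    return sum(
        (with_shock[k] - without_shock[k]) / (1.0 + rate) ** k
        for k in range(last)
    )


# Hypothetical paths: a shock of 10 in t=2 that is fully made up by t=4.
loss = discounted_loss([100, 90, 95, 100], [100, 100, 100, 100])
undiscounted = discounted_loss([100, 90, 95, 100], [100, 100, 100, 100], rate=0.0)
```

Summing the result over replications and households would give an aggregate figure of the kind plotted in Figure 7 (b), subject to the same assumptions.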

The income distributions with and without the shock show how a shock impacts the entire distribution and thus income inequality. Figure 8 shows the corrected income distributions without the shock (left panel) and with the shock (right panel). The corrected income uses the upper bound of the projection error variance. The income distributions are shown for the initial period (upper panel) and the final period (lower panel) to show that the distributional differences are not due to the time effect alone. The initial income distributions are the same, but the final income distribution without the shock exhibits less variance than the final corrected income distribution with the shock.

(a) Initial income, without shock

(b) Initial income, with shock

(c) Final income, without shock

(d) Final income, with shock

Figure 8: Total Real Household Log Income Distributions on the 2008 Data using the Upper Bound of the Projection Error Variance.

Table 7: Gini coefficient on the Initial and Final Corrected Income Distributions with and without the 10% Shock using the Upper Bound of the Projection Error Variance.
Without Shock With Shock
Initial 0.46 0.46
Final 0.61 0.71

To further quantify these distributional differences, the Gini coefficient is calculated for the income distributions with and without the shock as a measure of the change in income inequality. The columns of Table 7 show the income distribution with and without the shock. To show that these changes are not due only to the passage of time, the rows of the table show the initial and final distributions. The Gini coefficient of corrected income with the shock in the last sample period is 0.71, compared to 0.61 for the distribution of corrected income without the shock in the last period. Introducing a 10% shock on the upper bound of the projection error variance increases income inequality. These results add to those from Glewwe (2007) and Praag, Hagenaars, & Eck (1983) that measurement error affects the income inequality estimates.
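For reference, the Gini coefficient of a list of incomes can be computed as in the sketch below. It uses the standard rank-based formula for nonnegative incomes, which is an assumption here since the paper's own definition appears in a footnote not reproduced in this section; the function name is illustrative.

```python
def gini(incomes):
    """Gini coefficient of nonnegative incomes via the rank-based formula
    G = 2 * sum_i(i * x_(i)) / (n * sum_i x_i) - (n + 1) / n,
    where x_(1) <= ... <= x_(n) are the sorted incomes."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0 or values[0] < 0:
        raise ValueError("need nonnegative incomes with a positive total")
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal = gini([1, 1, 1, 1])          # everyone has the same income
concentrated = gini([0, 0, 0, 1])   # one household holds all income
```

A value of zero means complete equality, and larger values mean more inequality, so a final-period value of 0.71 with the shock against 0.61 without it indicates a more unequal distribution after the shock.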

Figure 9: Main Sample: Recovery Speed Distribution Conditional on Recovery during the Sample Period for the 2004 and 2008 SIPP Data Releases. Shock size is 10% and no measurement error correction is used. A recovery period is four months.

The previous analysis was carried out on the 2008 wave of the SIPP. The 2008 data is the longest relevant sample in the SIPP, which is why it was used for the main analysis. The 2004 data has only 12 waves of sample data. Now, I compare the recovery times from the 2008 wave to the 2004 wave to observe whether the economic environment has any impact on resilience and recovery. The 2004 data is sampled from February 2004 to January 2008, while the 2008 cohort is interviewed from May 2008 through November 2013. The 2004 survey was conducted before the 2008 Great Recession period, so it acts as a control environment. By 12 waves, 65% of households in the 2008 data have recovered, compared to the 69% of households who recover by 12 waves, the end of the sample period, in the 2004 data. Conditional on recovery, Figure 9 shows that initially, for shorter recovery times, the 2008 data has a higher cumulative probability of households who have recovered. Then, for longer recovery times, the 2004 data has higher cumulative probabilities. For example, by three waves, 87% of 2004 household heads have recovered and 89% of 2008 household heads have recovered. However, by nine waves the ordering reverses, although the difference is not visible after rounding: 99% of 2004 household heads have recovered and 99% of 2008 household heads have recovered. Thus, the economic environment has a lower effect than expected.

4.2 Empirical findings

Table 8: 2008 Empirical Shock Statistics on the Mean of Households
2004 2008
Proportion small shocks 0.53 0.57
Proportion medium shocks 0.04 0.03
Proportion large shocks 0.03 0.03
Proportion any shocks 0.59 0.64
Proportion recover first small shock 0.20 0.18
Proportion recover first medium shock 0.11 0.11
Proportion recover first large shock 0.15 0.16
Proportion recover first shock 0.20 0.18
Recovery time (months) first small shock 1.26 1.23
Recovery time (months) first medium shock 2.52 3.13
Recovery time (months) first large shock 2.08 2.52
Recovery time (months) first shock 1.21 1.18
Number of small shocks 5.75 8.48
Number medium shocks 0.38 0.49
Number large shocks 0.33 0.48
Number any shocks 6.46 9.45

The methods in this paper show recovery speeds from a 10% or 25% shock induced into household income paths in the second period. To what extent are shocks observed in the data and how fast do households recover from them as observed in the data?

While not causal, Table 8 shows the empirical shocks that occur within the data. I calculate a shock as a percentage decrease in income from one period to the next. For the 2008 data, I calculate the proportion of shocks that occur for households by size category: small as less than 5%, medium as between 5% and 10%, large as greater than 10%, and any shock as any percentage decrease in the income level. Small shocks constitute the majority (0.89) of income drops in a period. About 57% of households have an income drop of less than 5% in a period. The proportion of individuals who recover from the first shock ranges between 10% and 20%.
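The size categories can be applied to an observed income path as in the following sketch. The function name and the example path are hypothetical, and the handling of drops of exactly 5% or 10% (both counted as medium here) is an assumption, since the text does not specify the boundaries. The sketch also assumes strictly positive income levels.

```python
def classify_drops(path):
    """Label each period-to-period income change by the size of the drop.

    Thresholds follow the text: small is a drop below 5%, medium is between
    5% and 10%, and large is above 10%. Increases and unchanged incomes are
    labeled "none".
    """
    labels = []
    for prev, curr in zip(path, path[1:]):
        drop = (prev - curr) / prev         # proportional decrease
        if drop <= 0:
            labels.append("none")
        elif drop < 0.05:
            labels.append("small")
        elif drop <= 0.10:
            labels.append("medium")
        else:
            labels.append("large")
    return labels


# Hypothetical income levels over five periods.
labels = classify_drops([100.0, 98.0, 91.0, 70.0, 75.0])
```

Counting the labels per household and averaging across households would produce proportions comparable in spirit to those in Table 8.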

In their discussion of household income dynamics in rural China, Jalan & Ravallion (2002) describe recovery time after shocks of less than 5%, between 5-10%, and greater than a 10% drop in expenditure. I compute these same changes in the SIPP 2008 and 2004 data. I find that the average recovery time in 2008 from a shock larger than 10% is 2.52 waves. For comparison purposes, the average recovery time found in my methods from introducing a 10% shock into the income dynamics model without accounting for measurement error is 1.83. However, these differences must be interpreted with caution as the exercise of this paper introduces an exact 10% shock, whereas the empirical model uses a range of shocks to calculate recovery times as the probability of observing an exact 10% shock for each household in the sample is low. Also, these shocks are endogenous and ignore measurement error in income.

4.2.1 Mean-Reverting Measurement Error

Income measurement error may be mean-reverting (Bound & Krueger, 2015; Pischke, 1995). A model for mean-reverting measurement error (Kim & Solon, 2005) is the following: \[ y_{it} = \lambda y_{it}^{*} + e_{i} + v_{it}, \] where \(0 < \lambda < 1\). Solon, Barsky, & Parker (1994) estimate \(\hat{\lambda} = 0.67\). I set \(\lambda=0.6\) to re-estimate the model and simulations.

(a) Mean-reverting ME, 10% shock

(b) Main Results, 10% shock

(c) Mean-reverting ME, 25% shock

(d) Main Results, 25% shock

Figure 10: Main Sample Recovery Speed Distribution with Mean-Reverting and non-Mean Reverting Measurement Error (ME) by ME Type and Shock Size (columns) Conditional on Recovery during the Sample Period in the 2008 data. A recovery period is four months.

Without mean-reverting measurement error, the recovery proportions by measurement error correction type for a 10% shock were, from lowest to highest: 57% for the upper bound of the projection error variance, 60% for the lower bound of the projection error variance, and 65% for no correction of measurement error. For the mean-reverting measurement error with \(\lambda=0.6\) and a 10% shock, I find that the recovery proportions do not differ by the measurement error correction type. Now, 65% of households recover with no measurement error correction, 65% of households recover with the upper bound on the projection error variance, and 65% of households recover with the lower bound on the projection error variance. As shown in Figure 10, conditional on recovery, the recovery speed distribution for the upper bound of the projection error variance and the distribution for no measurement error correction are almost overlapping, with the distribution of recovery times for the lower bound on the projection error variance stochastically dominating. About 89% of households have recovered by three waves for both the upper bound on the projection error variance and no measurement error correction, while 85% of households have recovered by the same period for the lower bound on the projection error variance on a 10% shock. The recovery speed results differ by the measurement error correction type because of how the measurement error terms show up in the model for mean-reverting measurement error. Compared to the previous main results, the mean-reverting measurement error recovery speed distributions are slightly lower in the first period, but by the second period are more similar.

Figure 11: Main Sample Recovery Speed Distribution Conditional on Recovery during the Sample Period while Varying Measurement Error. Shock size is constant in absolute terms across all households; equal to 10% of mean income in the sample. A recovery period is four months.

If measurement error is mean-reverting, then the misreported income has less variance. Therefore, a 10% shock may be smaller in absolute terms when measurement error is ignored, which biases the estimated recovery time toward faster recovery. There are two ways to account for this. The first is to introduce the shock as 10% of the average household income instead of 10% of the household-specific income. The second is to use the same household shock and parameter values from the reported income but the random draws from the other cases of measurement error types.
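A small numerical illustration shows why a shock looks smaller in reported income under mean reversion. It uses the model \(y_{it} = \lambda y_{it}^{*} + e_{i} + v_{it}\) with \(\lambda = 0.6\); the income levels and the value of \(e_{i}\) are made up, with \(e_{i}\) chosen so that reported and true income coincide before the shock.

```python
def reported_income(true_income, lam, e, v=0.0):
    """Reported income under mean-reverting error: y = lam * y_star + e + v."""
    return lam * true_income + e + v


lam, e = 0.6, 40.0                          # hypothetical household effect e
before = reported_income(100.0, lam, e)     # reported income before the shock
after = reported_income(90.0, lam, e)       # true income falls by 10
reported_drop = before - after              # equals lam times the true drop
```

With these values a true drop of 10 shows up as a reported drop of about 6, so a shock defined as 10% of reported income understates the corresponding change in true income.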

In the first method, Figure 11 shows the recovery time eCDF by measurement error bound type for a 10% shock based on the mean household income level. The distribution of recovery times for the income path corrected using the lower bound of the projection error variance stochastically dominates the other two distributions of recovery times: the income path corrected with the upper bound of the projection error variance and the income path not corrected for measurement error. In the first period of recovery times, the next period after a 10% shock, 19% of household heads have recovered using the upper bound of the projection error variance. Using the lower bound of the projection error variance, 2% of household heads have recovered. When not accounting for measurement error, 37% of household heads recover in the next period after the shock. By five four-month waves, 83% of households recover using the upper bound of the projection error variance, 60% recover using the lower bound of the projection error variance, and 88% recover when not accounting for any measurement error. Overall, introducing a shock at the mean household income level produces slower recovery times than in Figure 3, where the 10% shock is household specific.

Figure 12: Main Sample Recovery Speed Distribution Conditional on Recovery during the Sample Period Varying Measurement Error but Keeping the Initial Income Values Constant. A recovery period is four months.

Secondly, I compute the household-specific shock for the observed income. Then, I use the income in \(t=1\) and \(t=2\) with all the same parameter estimates and applicable random draws to generate the income paths correcting for measurement error. Thus, all households are exposed to the same shock across the different measurement error cases, and the only differences come from the differences in the variances used to correct for the measurement error. However, the upper and lower bounds of the projection error variance provide measurement error corrections that differ mainly in the initial values of income. Thus, Figure 12 shows little difference between the two measurement error correction types. Between the correction for measurement error and the reported income, the income corrected for measurement error stochastically dominates. The reported income recovery times are shorter than when correcting for measurement error, even while using the same initial income values to account for mean-reversion in the measurement error. From the eCDF, 92.74% of households have recovered from a 10% shock in four sample waves with the reported income case, compared to 85.06% in the same time with the lower bound on the projection error variance and 85.09% with the upper bound on the projection error variance.

4.3 Heterogeneity Analysis

Next, I split the sample and re-estimate the parameters, variances, and error-free income paths to capture demographic heterogeneity. I discuss the recovery proportion and recovery time distribution for subgroups in the demographic characteristics of race, education, marital status, and metro status. I compare the recovery rates for a 10% and a 25% income shock across groups. In all cases, the 25% shock gives slower recovery times than the 10% shock, first-order stochastically dominating. While the type of correction for measurement error affects the recovery times, it does not switch which group recovers faster than the other: results are consistent across measurement error types. In summary, I find that a larger proportion of Black household heads recover within their group, but conditional on recovery, White household heads recover faster throughout the distribution. Though roughly similar proportions recovered, married household heads recover faster than single household heads. Recovering in roughly equal proportions, I find that high school educated households recover faster than college educated household heads. Lastly, a larger proportion of non-metro household heads recover, but conditional on recovery, metro household heads recover faster. The results are explained in further detail in the subsequent tables and figures.

4.3.1 Race

Figure 13: Recovery Speed Distribution by Race Conditional on Recovery during the Sample Period for a 10% shock with the Upper Bound on the Projection Error Variance. A recovery period is four months.

Using the upper bound of the projection error variance, 69% of Black household heads and 55% of White household heads recover from a 10% shock. Although a larger percentage of Black household heads recover within their demographic group, Figure 13 shows that, among the households who recover, White household heads recover faster than Black household heads. The eCDF for Blacks first-order stochastically dominates the eCDF for Whites. By three waves, 79% of Black household heads have recovered and 90% of White household heads have recovered. To reconcile the discrepancy between the proportion recovered and the recovery speeds conditional on recovery, recall from Table 2 that White household heads comprise about of the 2008 sample.
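
The dominance statement can be checked mechanically. The sketch below follows the convention used in this section, where the group with longer recovery times is the one said to dominate, so its eCDF lies at or below the other group's eCDF at every wave. The function names and recovery times are hypothetical.

```python
def ecdf(times, grid):
    """Empirical CDF of recovery times evaluated at each wave in `grid`."""
    return [sum(t <= g for t in times) / len(times) for g in grid]


def slower_group_dominates(slow, fast, grid):
    """True if the eCDF of `slow` is never above that of `fast` and is
    strictly below it for at least one wave (first-order dominance)."""
    f_slow, f_fast = ecdf(slow, grid), ecdf(fast, grid)
    weakly_below = all(a <= b for a, b in zip(f_slow, f_fast))
    strictly_below = any(a < b for a, b in zip(f_slow, f_fast))
    return weakly_below and strictly_below


waves = range(1, 6)
group_slow = [1, 2, 3, 3, 5]  # hypothetical recovery times, in waves
group_fast = [1, 1, 2, 3, 4]  # hypothetical recovery times, in waves

slow_dominates = slower_group_dominates(group_slow, group_fast, waves)
fast_dominates = slower_group_dominates(group_fast, group_slow, waves)
```

Because both samples contain only households that recover, the check concerns recovery speed conditional on recovery and says nothing about which group has the larger recovery proportion.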

About 69% of Blacks recover when the 10% shock is given at the household level, while 36% recover when the 10% shock is calculated on the mean of each group. The pattern holds for Whites, though their recovery proportions are lower: 55% recover from a 10% shock at the household level and 22% recover when the 10% shock is calculated on the mean of the demographic subgroup. A larger proportion of Blacks than Whites recover regardless of how the shock is introduced, but a smaller proportion of both groups recover when the shock is calculated on the mean of each demographic subgroup.
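
The two ways of introducing the shock can be contrasted in a short sketch. It assumes that the household-level shock removes 10% of each household's own income, while the group-mean shock removes the same dollar amount, 10% of the group's mean income, from every household. This reading, the function names, and the incomes are assumptions made for illustration.

```python
from statistics import mean


def household_shock(incomes, pct=0.10):
    """Loss in levels when each household loses pct of its own income."""
    return [pct * y for y in incomes]


def group_mean_shock(incomes, pct=0.10):
    """Loss in levels when every household loses pct of the group mean."""
    loss = pct * mean(incomes)
    return [loss for _ in incomes]


incomes = [1000.0, 2000.0, 6000.0]  # hypothetical household incomes

hh_loss = household_shock(incomes)   # proportional to own income
gm_loss = group_mean_shock(incomes)  # identical across households

# Group-mean loss expressed as a share of each household's own income.
gm_share = [loss / y for loss, y in zip(gm_loss, incomes)]
```

Under this assumption, a household below the group mean loses more than 10% of its own income from the group-mean shock, and a household above the mean loses less.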

(a) Household shock

(b) Group mean shock

Figure 14: Recovery Speed Distribution by Race Conditional on Recovery during the Sample Period for a 10% shock with the Upper Bound on the Projection Error Variance. A recovery period is four months.

In Figure 14, both graphs show that the recovery time distribution for Black household heads stochastically dominates that of White household heads. The recovery times are slower when the shock is calculated on the group mean (Figure 14 (b)) rather than at the household level (Figure 14 (a)). For Black household heads, 95% recover from a household-specific 10% shock, while 84% recover from a shock at the group household income mean.

Now I determine the percentage change such that the average shock in levels in the Black sample is the same size as the average shock in levels in the White sample. I find that a 71% decrease for Whites would be equivalent to a 10% decrease for Blacks.
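
The equivalence can be written compactly. As a sketch, let \(\bar{y}_{A}\) and \(\bar{y}_{B}\) denote the mean incomes of two groups and \(p_{A}\) and \(p_{B}\) the percentage shocks applied to them. These symbols are introduced here for illustration and do not appear elsewhere in the paper. Equal average shocks in levels require

\[
p_{A}\,\bar{y}_{A} = p_{B}\,\bar{y}_{B}
\quad\Longrightarrow\quad
p_{A} = p_{B}\,\frac{\bar{y}_{B}}{\bar{y}_{A}},
\]

so the equivalent percentage for one group scales the other group's percentage by the ratio of the two mean incomes.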

4.3.2 Education

Figure 15: Recovery Speed Distribution by Education Conditional on Recovery during the Sample Period for a 10% Shock with the Upper Bound on the Projection Error Variance. A recovery period is four months.

Roughly 58% of high school educated and 56% of college educated household heads recover at some point within the sample period from a 10% shock at the upper bound of the projection error variance. However, conditional on recovery, Figure 15 shows that household heads with a high school education recover faster than college educated household heads. The recovery time distribution of college household heads stochastically dominates that of high school household heads. By three waves, 91% of high school educated households have recovered and 85% of college educated households have recovered.

By educational status, the proportion recovered is again much smaller when the shock is at the group mean. Among high school household heads, 58% recover from a 10% household-specific shock and 30% from a shock at the group mean level. Fewer college educated household heads recover in both categories: 57% for a 10% household shock and 21% for a 10% shock by education level mean.

(a) Household shock

(b) Group mean shock

Figure 16: Recovery Speed Distribution by Education Conditional on Recovery during the Sample Period for a 10% shock with the Upper Bound on the Projection Error Variance. A recovery period is four months.

The distribution of college educated household heads’ recovery times stochastically dominates the high school household heads’ recovery times distribution under both types of shocks. The eCDF of recovery times for the shock at the group means starts lower than that for the household shock. In the first period, 76% of high school educated households have recovered from the household shock but only 26% of high school educated households have recovered from the 10% shock at the high school educated household income mean. Similarly, 70% of college educated households have recovered by the first period from the household shock but only 6% of college educated households have recovered from the 10% shock at the college educated household income mean in the same period.

I find that a 37.89% decrease would be required such that the average shock in levels for the high school educated household heads is of the same size in levels as the average shock in the college educated household heads.

4.3.3 Marital Status

Figure 17: Recovery Speed Distribution by Marital Status Conditional on Recovery during the Sample Period for a 10% Shock and the Upper Bound on the Projection Error Variance. A recovery period is four months.

Similar proportions of married and single household heads recover: 0.55 for married and 0.57 for single household heads. These proportions are calculated for a 10% shock at the upper bound of the projection error variance. Figure 17 shows that, conditional on recovery, married household heads recover faster than single household heads, whose recovery time distribution stochastically dominates. By three waves, 85% of married household heads have recovered and 78% of single household heads have recovered.

The proportion recovered is similar for married and single by each shock type. For the household specific 10% shock at the upper bound of the projection error variance, 56% of married household heads recover and 59% of single household heads recover. For the 10% shock by the subgroup means on the upper bound of the projection error variance, 19% of married household heads recover and 22% of single household heads recover.

(a) Household shock

(b) Group mean shock

Figure 18: Recovery Speed Distribution by Marital Status Conditional on Recovery during the Sample Period for a 10% shock with the Upper Bound on the Projection Error Variance. A recovery period is four months.

Figure 18 shows that the recovery times of single household heads stochastically dominate those of married household heads. Figure 18 (a) shows faster recovery times for the 10% household shock at the upper bound of the projection error variance than Figure 18 (b) shows for the 10% shock from the group means. In fact, by five periods 91% of married household heads have recovered and 87% of single household heads have recovered from a 10% household-specific shock at the upper bound of the projection error variance. In contrast, 70% of married household heads and 58% of single household heads have recovered from a 10% shock at the upper bound of the projection error variance given from the subgroup household income means.

I find that a 40.18% decrease would be required such that the average shock in levels for the single household heads is of the same size in levels as the average shock in the married household heads.

4.3.4 Metro Status

Figure 19: Recovery Speed Distribution by Metro Status Conditional on Recovery during the Sample Period for a 10% shock and the Upper Bound on the Projection Error Variance. A recovery period is four months.

Finally, metro status shows the largest difference in the recovery proportions between the demographic groups considered. Of household heads in a metro area, 53% recovered by the end of the sample period from a 10% shock at the upper bound of the projection error variance. More non-metro household heads, 83%, recovered from the same size shock and measurement error bound. Metro household heads comprise of the sample, from Table 2. Conditional on recovery, Figure 19 shows that the non-metro recovery time distribution stochastically dominates the metro recovery time distribution: metro household heads recover faster than non-metro household heads. By three waves, about 78% of non-metro household heads have recovered, compared with about 92% of metro household heads.

Non-metro household heads show the largest proportion recovered. From a 10% household-specific shock at the upper bound of the projection error variance, 81% of non-metro household heads recovered, while 40% recovered from a 10% shock at the mean for the upper bound of the projection error variance. From the same respective shocks, 55% of metro households recover from the household-specific shock and 27% recover from the 10% shock at the mean of household income by metro status.

(a) Household shock

(b) Group mean shock

Figure 20: Recovery Speed Distribution Conditional on Recovery during the Sample Period by Metro status for a 10% Shock with the Upper Bound on the Projection Error Variance. A recovery period is four months.

The distribution of recovery times for the non-metro household heads also stochastically dominates that of the metro household heads throughout the distribution, as shown in Figure 20. Comparing Figure 20 (a) and Figure 20 (b), the non-metro household heads recover more slowly from a 10% shock at the respective group mean. By the first five four-month waves, 95% of non-metro households have recovered from a 10% household-specific shock with the upper bound of the projection error variance, while 85% of non-metro households have recovered from a 10% shock using the mean household income of non-metro households. Under the same respective shocks and by the same five four-month waves, 88% and 56% of metro household heads have recovered.

I find that a 14.5% decrease would be required such that the average shock in levels for the non-metro household heads is of the same size in levels as the average shock in the metro household heads.

In conclusion, the groups with higher recovery proportions are the household heads who are Black, single, high school educated, and living in a non-metro area. Those household heads who recover faster are those who are White, married, high school educated, and living in a metro area. While accounting for measurement error is important, it does not change which groups recover faster or at all. Introducing the shock as the mean of the subgroup instead of at the household level substantially lengthens recovery times and lowers recovery proportions.

Several explanations for these group differences are possible. For example, mobility and family structure may explain part of these differences. Households with a married household head may recover faster because they need to support a family. On the educational front, household heads with only a high school education may recover faster because more jobs may be available to those with a high school education. College educated household heads may want to find a better paying job that requires a college education or makes use of their degree. Thus, they may spend more time looking for a job. Alternatively, if the income shock is not due to job loss, then college educated workers may be more likely to be in a salaried position, usually contracted. These types of positions have longer periods before adjustment. If inflation increases and decreases real income, salaried workers may take longer to adjust than hourly workers, whose pay rate may change more frequently. Similarly, those in a metro area may be likely to have incomes less responsive to change.

4.3.5 Sample Types

Figure 21: Robustness across Sample Types Conditional on Recovery during the Sample Period for a 10% Shock on the Upper Bound of Projection Error Variance. A recovery period is four months.

Table 9: Robustness across Sample Types for the Recovery Proportion on a 10% Shock with the Upper Bound of the Projection Error Variance.

| Sample | Recovered |
|---|---|
| Employed | 0.56 |
| No change in HH | 0.54 |
| Allow self employed | 0.66 |
| No change in marital | 0.58 |
| Main Sample | 0.56 |

The recovery times previously calculated were on the main sample of household heads. However, what other external factors are driving these changes? Consider a restricted sample where household heads experience little external change, i.e., no change in marital status, number of people in the household, or employment. If the recovery rates differ little from the main sample, then these outside factors have a small effect on recovery. Table 9 shows the proportion recovered for each of these restricted samples using a 10% shock and the upper bound of the projection error variance. The main sample has a proportion recovered of about 56%, which is the same as when using only household heads who are employed. The largest difference in the recovery proportion from the main sample estimate is for the sample where I allow household heads to be self-employed. In that case, the proportion recovered increases to 66%, a difference of 10 percentage points. In the other direction, the sample where I do not allow for any change in the household composition has 54% of households recovering. The sample with no change in marital status shows an effect of similar size but in the opposite direction, raising the proportion from 56% in the main sample to 58%. Of the households who recover, Figure 21 shows the eCDF comparing recovery times of the main sample to those who are only employed, have no change in the household composition, can be self-employed, and have no change in their marital status throughout the panel. Those who recover fastest are those who remain employed, while the sample not allowing for self-employment shows the slowest recovery times. Not allowing for self-employment stochastically dominates all other distributions, with smaller differences between the main sample and the distributions for no change in the household composition and no change in marital status.

Recovery times differ by demographic subgroups. The results show that a larger proportion of Blacks recover than Whites, but Whites recover faster, conditional on recovery. The distribution of recovery times for household heads with a college education stochastically dominates that of those with a high school education. Single household heads have recovery times that stochastically dominate those of married household heads. Non-metro household heads recover at a higher proportion than household heads in a metro area, but metro household heads recover faster, conditional on recovery. Where do these differences between demographic groups come from? Mechanically, the differences must come through one of the following channels: the parameter estimates \(\gamma\), \(\beta\), or \(d_{t}\); the random error variances \(\sigma^{2}_{u}\) or \(\sigma^{2}_{\epsilon}\); the time-varying household size covariate \(X_{it}\); or the initial income values \(y_{i1}\) of each group. I seek to answer the following question: how are the recovery times and the recovery proportion affected by using the values for the other demographic subgroup? For example, if I were to give the Black household heads the random shock from the White distribution, how much would the recovery proportion decrease? Or how would giving the Black household heads the parameter values, the covariates, or the initial values from the White household heads change the recovery proportion or speeds of that group?

To answer this question, I use the previously generated income paths for each demographic subgroup. Then, for each of the seven factors that influence the income path (\(\gamma\), \(\beta\), \(d_{t}\), \(\sigma^{2}_{u}\), \(\sigma^{2}_{\epsilon}\), \(X_{it}\), and \(y_{i1}\)), I reconstruct the income path for that group using random samples from the other group’s values of that same factor, one at a time. I randomly sample \(\gamma\), \(\beta\), \(\sigma^{2}_{u}\), and \(\sigma^{2}_{\epsilon}\) from their respective distributions. At each \(t\), I sample \(d_{t}\) and \(X_{it}\). Finally, I sample \(y_{i1}\) from the income distribution at \(t=1\). From these reconstructed income paths, I calculate the recovery speeds and proportions for each of the seven factors. I compare each factor to the baseline recovery speeds and proportions to show how these statistics would be affected by using the values from the other group. For a more detailed description of how these recovery proportions and times are generated, please see the algorithm in Section 8.
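
The one-factor-at-a-time logic can be sketched as follows. This is a simplified illustration and not the algorithm in Section 8: it swaps fixed values rather than random samples, and `toy_half_life` is a hypothetical stand-in for the step that simulates income paths and measures recovery. All names and numbers are assumptions.

```python
import math

FACTORS = ["gamma", "beta", "d_t", "sigma2_u", "sigma2_eps", "X_it", "y_i1"]


def swap_one_at_a_time(own, other, statistic):
    """For each factor, replace the group's own value with the other group's
    value and report the statistic relative to the own-values baseline."""
    baseline = statistic(own)
    ratios = {}
    for name in FACTORS:
        counterfactual = dict(own)          # copy, so `own` is untouched
        counterfactual[name] = other[name]  # swap exactly one factor
        ratios[name] = statistic(counterfactual) / baseline
    return ratios


def toy_half_life(factors):
    """Hypothetical stand-in statistic: the half-life of an AR(1) process
    with persistence gamma. It ignores every factor except gamma."""
    return math.log(0.5) / math.log(factors["gamma"])


# Hypothetical factor values for two groups.
group_a = {"gamma": 0.5, "beta": 0.1, "d_t": 0.0, "sigma2_u": 1.0,
           "sigma2_eps": 1.0, "X_it": 3, "y_i1": 10.0}
group_b = {"gamma": 0.25, "beta": 0.2, "d_t": 0.1, "sigma2_u": 2.0,
           "sigma2_eps": 0.5, "X_it": 4, "y_i1": 12.0}

ratios = swap_one_at_a_time(group_a, group_b, toy_half_life)
```

Each entry of `ratios` is expressed relative to the baseline, so a value of one means that swapping the factor leaves the statistic unchanged, a value below one means a smaller statistic, and a value above one means a larger one.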

Table 10: Demographic Decomposition by Subgroup Analysis Relative to the Baseline Reference Group in Each Demographic Category

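(a) Recovery Times, Conditional on Recovery

| Category | \(\gamma\) | \(\beta\) | \(\sigma^{2}_{u}\) | \(\sigma^{2}_{\epsilon}\) | \(d_{t}\) | \(X_{it}\) | \(y_{i1}\) |
|---|---|---|---|---|---|---|---|
| Education | 1.06 | 1.00 | 0.86 | 1.24 | 0.78 | 1.01 | 1 |
| Marital | 1.03 | 1.01 | 0.79 | 1.39 | 0.90 | 1.02 | 1 |
| Metro | 0.77 | 0.99 | 0.75 | 1.47 | 0.76 | 1.01 | 1 |
| Race | 1.02 | 1.00 | 0.84 | 1.37 | 1.13 | 1.01 | 1 |

(b) Recovery Proportion

| Category | \(\gamma\) | \(\beta\) | \(\sigma^{2}_{u}\) | \(\sigma^{2}_{\epsilon}\) | \(d_{t}\) | \(X_{it}\) | \(y_{i1}\) |
|---|---|---|---|---|---|---|---|
| Education | 0.95 | 1.00 | 0.66 | 0.90 | 1.38 | 1 | 1 |
| Marital | 0.98 | 1.00 | 0.85 | 0.86 | 1.13 | 1 | 1 |
| Metro | 1.33 | 1.01 | 1.17 | 0.59 | 1.49 | 1 | 1 |
| Race | 0.99 | 1.00 | 0.89 | 0.91 | 0.99 | 1 | 1 |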

Table 10 shows the demographic decomposition for both recovery speeds (Table 10 (a)) and proportions (Table 10 (b)). In both tables, substituting the initial level of income from the other demographic subgroup does not change the recovery speeds or proportions: these values are one, signifying the same value as the baseline group. This is because the income process starts after \(t=2\), requiring two initial values to start the process. Thus, the initial value has no impact on recovery speed. Of interest in the tables are the differences from one, that is, how much the recovery times and the recovery proportion change relative to the baseline computed using the values from the group's own distribution. Values farther from one represent a larger difference in a given category than what was previously generated for that demographic group.

In Table 10 (a), values above one represent slower recovery times than the baseline, while values below one represent faster recovery times on average. Across demographic groups, the recovery speeds with the largest absolute deviation from one are found in the column for \(\sigma^{2}_{\epsilon}\), with differences ranging from 0.24 for the Education group to 0.47 for the Metro group. Thus, \(\sigma^{2}_{\epsilon}\) explains most of the demographic differences in recovery speeds between the subgroups, conditional on recovery. Besides \(y_{i1}\), changing \(\beta\) and \(X_{it}\) has the least impact on recovery times and proportions, with differences of at most \(0.02\) from the baseline. The absolute deviations in recovery times from the baseline for \(\sigma^{2}_{u}\) and \(d_{t}\), while both smaller than those for \(\sigma^{2}_{\epsilon}\), are still elevated. \(\sigma^{2}_{u}\) has higher deviations from the baseline than \(d_{t}\) for all categories except Education. For comparison, the differences from the baseline in \(\sigma^{2}_{u}\) range from 0.14 for the Education group to 0.25 for the Metro group. The greatest deviation from one in the entire table occurs for Metro, with a value of 0.47 when changing \(\sigma^{2}_{\epsilon}\). Substituting \(\sigma^{2}_{\epsilon}\) from the reference group gives longer recovery times than the baseline. Substituting \(\sigma^{2}_{u}\) or, except in the Race category, \(d_{t}\) gives shorter recovery times than the baseline. The demographic groups also differ in which factors matter: Metro has the largest difference from one for \(\gamma\), \(\sigma^{2}_{u}\), \(\sigma^{2}_{\epsilon}\), and \(d_{t}\).

In contrast to Table 10 (a), where \(\sigma^{2}_{\epsilon}\) showed the largest deviations from one, \(d_{t}\) and \(\sigma^{2}_{u}\) show the largest deviations from one in Table 10 (b), which reports the proportion of households who recover from an income shock relative to the baseline when using the values of the reference group. In this table, values greater than one represent a larger proportion of households who recover relative to the baseline, while values less than one represent a smaller proportion. \(d_{t}\) shows the largest difference from the baseline for Education, with a difference of \(0.38\), and for Metro, with a difference of \(0.49\). \(\sigma^{2}_{u}\) shows the largest difference from the baseline for Marital status, with a difference of \(0.15\), and for Race, with a difference of \(0.11\). \(y_{i1}\), \(X_{it}\), and \(\beta\) all show the least impact on the proportion recovered, with absolute differences of at most \(0.01\). By demographic group, Metro also shows the largest deviations from the baseline in the recovery proportion compared to Race, Education, and Marital status. Except for Race, the values for \(d_{t}\) exceed one, showing more households who recover relative to the baseline when \(d_{t}\) comes from the reference group. On the other hand, substituting \(\gamma\), \(\sigma^{2}_{u}\), or \(\sigma^{2}_{\epsilon}\) from the reference group generally shows a smaller proportion of households who recover relative to the baseline, with Metro the exception for \(\gamma\) and \(\sigma^{2}_{u}\). Metro shows the largest difference from one for \(\gamma\), \(\beta\), \(\sigma^{2}_{\epsilon}\), and \(d_{t}\), while Education shows the largest difference for \(\sigma^{2}_{u}\).

Between the two tables, the factor effects differ by demographic group. For example, note that \(\sigma^{2}_{u}\) is less than one for all values except the Metro variable in Table 10 (b). Yet \(\sigma^{2}_{\epsilon}\) is greater than one in Table 10 (a), but less than one in Table 10 (b). Similarly, \(d_{t}\) has values less than one for recovery speeds but greater than one for recovery proportions, except for Race, which shows the opposite pattern. Also, \(\gamma\) shows values greater than one in Table 10 (a) and values less than one in Table 10 (b), except for the Metro variable. \(\sigma^{2}_{\epsilon}\) has the largest deviations for the recovery speeds, while \(d_{t}\) and \(\sigma^{2}_{u}\) have the largest deviations for the recovery proportions.
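
The comparisons of deviations from one can be reproduced directly from the entries of Table 10. The sketch below copies the table values and, for each demographic category, reports the factor with the largest absolute deviation from one. The variable and function names are illustrative.

```python
# Column order follows Table 10.
FACTORS = ["gamma", "beta", "sigma2_u", "sigma2_eps", "d_t", "X_it", "y_i1"]

# Values copied from Table 10 (a): recovery times, conditional on recovery.
recovery_times = {
    "Education": [1.06, 1.00, 0.86, 1.24, 0.78, 1.01, 1],
    "Marital":   [1.03, 1.01, 0.79, 1.39, 0.90, 1.02, 1],
    "Metro":     [0.77, 0.99, 0.75, 1.47, 0.76, 1.01, 1],
    "Race":      [1.02, 1.00, 0.84, 1.37, 1.13, 1.01, 1],
}

# Values copied from Table 10 (b): recovery proportion.
recovery_proportion = {
    "Education": [0.95, 1.00, 0.66, 0.90, 1.38, 1, 1],
    "Marital":   [0.98, 1.00, 0.85, 0.86, 1.13, 1, 1],
    "Metro":     [1.33, 1.01, 1.17, 0.59, 1.49, 1, 1],
    "Race":      [0.99, 1.00, 0.89, 0.91, 0.99, 1, 1],
}


def largest_deviation(table):
    """Map each category to (factor, |value - 1|) for its largest deviation."""
    result = {}
    for category, row in table.items():
        deviations = [round(abs(value - 1), 2) for value in row]
        top = max(range(len(row)), key=lambda i: deviations[i])
        result[category] = (FACTORS[top], deviations[top])
    return result


top_times = largest_deviation(recovery_times)
top_proportion = largest_deviation(recovery_proportion)
```

For recovery times, every category returns `sigma2_eps` as the factor with the largest deviation. For recovery proportions, Education and Metro return `d_t`, while Marital and Race return `sigma2_u`.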

Since the reason households are less likely to recover relative to their reference group comes largely from \(\sigma^{2}_{u}\) and \(d_{t}\), policy can work to decrease the effects of time shocks and time trends. Conditional on recovery, households who have slower recovery speeds compared to their reference group counterpart are those experiencing the \(\sigma^{2}_{\epsilon}\) shocks. In the case of race, \(\sigma^{2}_{u}\) and \(\sigma^{2}_{\epsilon}\) give lower recovery proportions for the White group than otherwise. These same parameters contribute to slower recovery speeds of the same group. Policy measures can work towards helping households who experience random shocks.

5 Discussion

This paper investigates the household income shock recovery speed distributions. It models the income process accounting for measurement error. Measurement error-free incomes are simulated using parameters estimated from an autoregressive model with the initial corrected income values from linear projections. The income dynamics allow for general measurement error types. Incorporating the measurement error into the model for income dynamics corrects for bias when assessing household income recovery length.

When accounting for measurement errors, the recovery times are longer. The recovery speeds are generally right-skewed with the majority of households recovering in the next period, fewer households recovering more gradually, and some households never recovering within the estimated sample period.

Household heads who recover more slowly from income shocks are Black, college-educated, single, and live in a non-metro area. Most of the heterogeneity in recovery times seems driven by differences in the variances of shocks across groups. This research suggests that households that are more mobile overcome income shocks more readily. Policies designed to ease geographical constraints will help households. Specifically, during the 2008 recession, the households projected by this method not to recover, or to recover more slowly, could have been targeted for aid in their recovery.

This paper develops a methodology for calculating income shock recovery speeds. The method could be applied to other settings to predict how long households will take to recover from a specific income shock or a financial crisis. For example, income shock recovery speed may be evaluated in other surveys similar to the SIPP.

Another avenue for further research is understanding the mechanisms underlying the differences in recovery rates and proportions between groups. The methods presented in this paper could be extended to incorporate seasonality in the measurement error or measurement error within a nonlinear model of income dynamics. Other time-varying demographic characteristics could also be added to the model.

Income shock recovery is an undervalued component of the income inequality gap. Income shocks are difficult for households to offset because income shocks affect current and future income states. Alleviating the differences in recovery speeds would let households return to their previous income levels faster and reduce the income inequality gap. Policies focused on mobility and household level constraints may also reduce the income inequality gap. Potential policy approaches include monetary support, mobility assistance, and family support.

When faced with an economic crisis, the methods from this paper provide vital insights into which types of households should be targeted and how long these households may require additional assistance. By understanding the recovery speeds, policymakers can more effectively design strategies to address welfare and income inequality. This paper equips policymakers with tools to determine the most efficient methods for helping households overcome income shocks.

References

Allaire, J. J., Dervieux, C., Scheidegger, C., Teague, C., & Xie, Y. (2022). Quarto (Version 0.3). https://doi.org/10.5281/zenodo.5960048
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. The Review of Economic Studies, 58(2), 277–297. https://doi.org/10.2307/2297968
Arellano, M., & Bover, O. (1995). Another look at the instrumental variable estimation of error-components models. Journal of Econometrics, 68(1), 29–51. https://doi.org/10.1016/0304-4076(94)01642-D
Austin, P. C. (2009). Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine, 28(25), 3083–3107. https://doi.org/10.1002/sim.3697
Backman, M. (2021). How Are Americans Dealing With Income Loss During COVID-19? Retrieved from https://www.fool.com/the-ascent/research/income-during-covid-19/
Bollinger, C. R., & Chandra, A. (2004). Iatrogenic Specification Error: A Cautionary Tale of Cleaning Data [{SSRN} {Scholarly} {Paper}]. Rochester, NY. https://doi.org/10.2139/ssrn.527007
Bound, J., Brown, C., & Mathiowetz, N. (2001). Chapter 59 - Measurement Error in Survey Data. In J. J. Heckman & E. Leamer (Eds.), Handbook of Econometrics (Vol. 5, pp. 3705–3843). Elsevier. https://doi.org/10.1016/S1573-4412(01)05012-7
Bound, J., & Krueger, A. B. (2015). The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right? Journal of Labor Economics. https://doi.org/10.1086/298256
Cantor, D., Brandt, S., & Green, J. (1991). Results of First Wave of SIPP Interviews. Unpublished Westat Report to the U.S. Census Bureau (Memorandum to Chet Bowie).
Chesher, A., & Schluter, C. (2002). Welfare Measurement and Measurement Error. Review of Economic Studies, 69, 357–378. Retrieved from https://christianschluter.github.io/files/published_papers/2002_REStud_Schluter.pdf
Cristia, J., & Schwabish, J. A. (2007). Measurement Error in the SIPP: Evidence from Matched Administrative Records. 27.
Dunford, B., & Gottlieb, G. C. (2016). Resilience at USAID 2016 Progress Report. United States Agency International Development: Center for Resilience. Retrieved from United States Agency International Development: Center for Resilience website: https://2017-2020.usaid.gov/sites/default/files/documents/1867/082816_Resilience_FinalB.PDF
Glewwe, P. (2007). Measurement Error Bias in Estimates of Income and Income Growth among the Poor: Analytical Results and a Correction Formula. Economic Development and Cultural Change, 56(1), 163–189. https://doi.org/10.1086/520559
Greifer, N. (2022). Cobalt: Covariate balance tables and plots. Retrieved from https://CRAN.R-project.org/package=cobalt
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. https://doi.org/10.1093/pan/mpl013
Holtz-Eakin, D., Newey, W., & Rosen, H. S. (1988). Estimating Vector Autoregressions with Panel Data. Econometrica, 56(6), 1371–1395. https://doi.org/10.2307/1913103
Jalan, J., & Ravallion, M. (2002). Discussion Paper No. 2002/10. 20.
Jappelli, T., & Pistaferri, L. (2010). The Consumption Response to Income Changes [Working {Paper}]. National Bureau of Economic Research. https://doi.org/10.3386/w15739
Kim, B., & Solon, G. (2005). Implications of Mean-Reverting Measurement Error for Longitudinal Studies of Wages and Employment. The Review of Economics and Statistics, 87(1), 193–196. Retrieved from https://www.jstor.org/stable/40042933
Lee, N., Ridder, G., & Strauss, J. (2017). Estimation of Poverty Transition Matrices with Noisy Data. Journal of Applied Econometrics, 32(1), 37–55. https://doi.org/10.1002/jae.2506
Li, H., Millimet, D., & Roychowdhury, P. (2019). Measuring Economic Mobility in India Using Noisy Data: A Partial Identification Approach. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3435380
MaCurdy, T. E. (1982). The use of time series processes to model the error structure of earnings in a longitudinal data analysis. Journal of Econometrics, 18(1), 83–114. https://doi.org/10.1016/0304-4076(82)90096-3
Meghir, C., & Pistaferri, L. (2002). Income Variance Dynamics and Heterogeneity [{SSRN} {Scholarly} {Paper}]. Rochester, NY. Retrieved from https://papers.ssrn.com/abstract=359580
Moffitt, R., & Zhang, S. (2022). Estimating Trends in Male Earnings Volatility with the Panel Study of Income Dynamics. Journal of Business & Economic Statistics, 1–6. https://doi.org/10.1080/07350015.2022.2102024
Pischke, J.-S. (1995). Measurement Error and Earnings Dynamics: Some Estimates from the PSID Validation Study. Journal of Business & Economic Statistics, 13(3), 305–314. https://doi.org/10.2307/1392190
Praag, B. van, Hagenaars, A., & Eck, W. van. (1983). The Influence of Classification and Observation Errors on the Measurement of Income Inequality. Econometrica, 51(4), 1093–1108. https://doi.org/10.2307/1912053
R Core Team. (2022). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Roodman, D. (2009). How to do Xtabond2: An Introduction to Difference and System GMM in Stata. The Stata Journal: Promoting Communications on Statistics and Stata, 9(1), 86–136. https://doi.org/10.1177/1536867X0900900106
Schloerke, B., Cook, D., Larmarange, J., Briatte, F., Marbach, M., Thoen, E., … Crowley, J. (2021). GGally: Extension to ’ggplot2’. Retrieved from https://CRAN.R-project.org/package=GGally
Solon, G., Barsky, R., & Parker, J. A. (1994). Measuring the Cyclicality of Real Wages: How Important is Composition Bias?*. The Quarterly Journal of Economics, 109(1), 1–25. https://doi.org/10.2307/2118426
StataCorp. (2021). Stata statistical software: Release 17.
Stevens, G. (1936). Swing Time. RKO Radio Pictures. Retrieved from https://www.youtube.com/watch?v=AGUsRGuZb6k
Stock, J. H., Wright, J. H., & Yogo, M. (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics, 20(4), 518–529. https://doi.org/10.1198/073500102288618658
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., & Müller, K. (2022). Dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr
Wickham, H., & Girlich, M. (2022). Tidyr: Tidy messy data. Retrieved from https://CRAN.R-project.org/package=tidyr
Wickham, H., Miller, E., & Smith, D. (2022). Haven: Import and export ’SPSS’, ’stata’ and ’SAS’ files. Retrieved from https://CRAN.R-project.org/package=haven
Wieland, T. (2019). REAT: A Regional Economic Analysis Toolbox for R. REGION, 6(3), R1–R57. Retrieved from https://doi.org/10.18335/region.v6i3.267
Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of Econometrics, 126(1), 25–51. https://doi.org/10.1016/j.jeconom.2004.02.005
Xie, Y. (2014). Knitr: A comprehensive tool for reproducible research in R. In V. Stodden, F. Leisch, & R. D. Peng (Eds.), Implementing reproducible computational research. Chapman and Hall/CRC. Retrieved from http://www.crcpress.com/product/isbn/9781466561595
Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Boca Raton, Florida: Chapman and Hall/CRC. Retrieved from https://yihui.org/knitr/
Xie, Y. (2022). Knitr: A general-purpose package for dynamic report generation in R. Retrieved from https://yihui.org/knitr/
Zhu, H. (2021). kableExtra: Construct complex table with ’kable’ and pipe syntax. Retrieved from https://CRAN.R-project.org/package=kableExtra

Appendix

6 Mathematical Details

6.1 Instrument Matrix

\[\begin{align*} \begin{pmatrix} y_{i1} & 0 & 0 & 0 & \ldots & 0 \\ 0 & y_{i1} & y_{i2} & 0 & \ldots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & y_{iT-8} & \ldots & y_{iT-3} \end{pmatrix} \end{align*}\]

Proof: \(y_{it-1}\) is independent of \(\Delta \tau_{it}\)

To show that the lagged values of \(y_{it}\) are independent of the GMM residuals, it suffices to show that \(E[y_{it-1}|\Delta \tau_{it}] = E[y_{it-1}]\). Given the previous definition of \(\Delta \tau_{it}\), the conditional expectation can be written as \(E[y_{it-1}|u_{it} + \Delta v_{it} - \gamma \Delta v_{it-1} + \Delta \epsilon_{it}]\). However, \(u_{it}\) and \(\epsilon_{it}\) are assumed to be white noise and \(v_{it}\) is assumed to be classical measurement error. Thus \(y_{it-1}\) is mean independent of \(\Delta \tau_{it}\).
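The block structure of the instrument matrix can be illustrated with a short sketch. It follows the lag pattern listed in the footnote on instrumental variables (levels lagged three or more periods, at most eight per equation, first equation at \(t=4\)). The function name `instrument_matrix` and its arguments are hypothetical, and the code is an illustration for a single household rather than the routine used by xtabond2.

```python
import numpy as np

def instrument_matrix(y, first_t=4, min_lag=3, max_instruments=8):
    """Build a block-style instrument matrix for one household (illustrative).

    y[0] is income in period 1. The row for equation period t holds the
    levels y_s with max(1, t - min_lag - max_instruments + 1) <= s <= t - min_lag,
    placed in their own columns; every other entry is zero.
    """
    T = len(y)
    blocks = []
    for t in range(first_t, T + 1):
        last = t - min_lag                           # most recent usable period
        first = max(1, last - max_instruments + 1)   # earliest usable period
        blocks.append(np.asarray(y[first - 1:last], dtype=float))

    Z = np.zeros((len(blocks), sum(len(b) for b in blocks)))
    col = 0
    for row, block in enumerate(blocks):
        Z[row, col:col + len(block)] = block
        col += len(block)
    return Z

# Hypothetical income levels for T = 16 periods (period index used as value).
y = np.arange(1.0, 17.0)
Z = instrument_matrix(y)
print(Z.shape)  # (13, 76)
```

With \(T=16\) this gives 13 equations (\(t=4,\ldots,16\)) and 76 instrument columns: one to eight instruments for \(t=4,\ldots,11\), then eight each for \(t=12,\ldots,16\).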

6.2 Variance Matrix of Errors for Simulation

\[\begin{align*} \Sigma = \begin{pmatrix} \sigma^{2}_{1} & \sigma_{12} & E[\zeta_{1} \Delta \epsilon_{3}] & 0 \\ & \sigma^{2}_{2} & E[\zeta_{2} \Delta \epsilon_{3}] & 0 \\ & & \sigma^{2}_{u} + 2 \sigma^{2}_{\epsilon} & -\sigma^{2}_{\epsilon} \\ & & & \sigma^{2}_{u} + 2 \sigma^{2}_{\epsilon} \\ \end{pmatrix} \end{align*}\]
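As a minimal sketch of how a matrix of this form could be used in simulation, the code below fills in the symmetric matrix from its upper triangle and draws correlated errors. The numerical values are made up for illustration, the helper `build_sigma` is hypothetical, and joint normality of the errors is an assumption of this sketch rather than something stated here.

```python
import numpy as np

def build_sigma(s1_sq, s2_sq, s12, c1, c2, su_sq, se_sq):
    """Assemble the symmetric 4x4 error covariance matrix from its upper triangle.

    c1 and c2 stand for E[zeta_1 * d_eps_3] and E[zeta_2 * d_eps_3].
    """
    d = su_sq + 2.0 * se_sq  # variance of u_t + (eps_t - eps_{t-1})
    upper = np.array([
        [s1_sq, s12,   c1,  0.0],
        [0.0,   s2_sq, c2,  0.0],
        [0.0,   0.0,   d,  -se_sq],
        [0.0,   0.0,   0.0, d],
    ])
    # Mirror the strictly upper triangle into the lower triangle.
    return upper + np.triu(upper, k=1).T

# Illustrative (not estimated) values.
sigma = build_sigma(0.30, 0.25, 0.05, 0.01, 0.02, 0.04, 0.03)

# A valid covariance matrix must be positive semi-definite.
assert np.all(np.linalg.eigvalsh(sigma) > 0)

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(4), sigma, size=50_000)
sample_cov = np.cov(draws, rowvar=False)
```

The eigenvalue check is worth keeping in any real application, since plugging separately estimated variances and covariances into \(\Sigma\) does not guarantee a valid covariance matrix.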

6.3 Derivation of the Standard Deviation of the Variance Estimates

To find the standard deviation of the estimated variances, solve the inverse of the Jacobian multiplied by the estimated residuals in the moment conditions.

6.4 Derivation of the Moment Conditions using estimated GMM residuals

6.4.1 Derived Assumptions

\[\begin{align*} E[v_{it}v_{it-1}] &= E[v_{it-1}E[v_{it}|v_{it-1}]] = 0 \\ E[\Delta v_{it} \Delta v_{it}] &= E[( v_{it}-v_{it-1})( v_{it}-v_{it-1})] = E[ v_{it}^{2}-2v_{it}v_{it-1} + v_{it-1}^{2}] = \sigma^{2}_{v} - 0 + \sigma^{2}_{v} \\ &= 2 \sigma^{2}_{v} \\ E[\Delta v_{it} \Delta v_{it-1}] &= E[( v_{it}-v_{it-1})( v_{it-1}-v_{it-2})] \\ &= E[ v_{it}v_{it-1}-v_{it-1}^{2} -v_{it-2}v_{it} + v_{it-2}v_{it-1}] \\ &= -\sigma^{2}_{v} \\ E[\Delta v_{it} \Delta v_{it-2}] &= E[( v_{it}-v_{it-1})( v_{it-2}-v_{it-3})]\\ &= E[ v_{it}v_{it-2}-v_{it}v_{it-3} -v_{it-2}v_{it-1} + v_{it-3}v_{it-1}] = 0 \end{align*}\]

These are similar for \(u_{it}\) and \(\epsilon_{it}\).
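These identities for first differences of serially uncorrelated noise can be checked numerically. The sketch below simulates i.i.d. normal draws (normality and the value of \(\sigma_{v}\) are assumptions made only for the illustration) and compares sample moments with \(2\sigma^{2}_{v}\), \(-\sigma^{2}_{v}\), and \(0\).

```python
import numpy as np

rng = np.random.default_rng(42)
sigma_v = 0.5                      # illustrative standard deviation
n = 200_000                        # number of simulated households

# Columns are v_{t-3}, v_{t-2}, v_{t-1}, v_t for each simulated household.
v = rng.normal(0.0, sigma_v, size=(n, 4))

# First differences; columns are dv_{t-2}, dv_{t-1}, dv_t.
dv = np.diff(v, axis=1)

var_dv = np.mean(dv[:, 2] * dv[:, 2])    # approx.  2 * sigma_v**2
cov_lag1 = np.mean(dv[:, 2] * dv[:, 1])  # approx. -sigma_v**2
cov_lag2 = np.mean(dv[:, 2] * dv[:, 0])  # approx.  0

print(round(var_dv, 2), round(cov_lag1, 2), round(cov_lag2, 2))
```

The lag-one covariance is negative because adjacent differences share the term \(v_{it-1}\) with opposite signs, while differences two periods apart share no term.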

\[\begin{align*} Cov(\Delta v_{i2},\Delta v_{i3}) &= E[\Delta v_{i2}\Delta v_{i3}] - E[\Delta v_{i2}]E[\Delta v_{i3}] \\ &= E[( v_{i2}-v_{i1})(v_{i3}-v_{i2})] =E[ v_{i2}v_{i3}-v_{i2}^{2}-v_{i1}v_{i3}+v_{i1}v_{i2}] = -\sigma^{2}_{v} \\ Cov(\Delta v_{i2},\Delta v_{i2}) &= Var(\Delta v_{i2}) = Var(v_{i2} -v_{i1}) = Var(v_{i2}) + Var(v_{i1}) - 2Cov(v_{i2},v_{i1}) = 2\sigma^{2}_{v} \\ Cov(v_{i2},\Delta v_{i2}) &= E[v_{i2}\Delta v_{i2}] = E[v_{i2}^{2} - v_{i2}v_{i1}] = \sigma^{2}_{v} \\ Cov(v_{i2},\Delta v_{i3}) &= E[v_{i2}\Delta v_{i3}] = E[v_{i2}v_{i3} - v_{i2}^{2}] = -\sigma^{2}_{v} \\ Cov(\zeta_{i1},\Delta \epsilon_{i3}) &= E[\zeta_{i1}\Delta \epsilon_{i3}] - E[\zeta_{i1}]E[\Delta \epsilon_{i3}] = E[\zeta_{i1}\Delta \epsilon_{i3}] \end{align*}\]

7 Computational Details

The results in this paper were obtained using Stata 17 (StataCorp, 2021) and R version 4.2.3 (2023-03-15 ucrt) (R Core Team, 2022) with the command xtabond2 (Roodman, 2009) and packages: haven 2.5.0 (Wickham, Miller, & Smith, 2022), dplyr 1.0.10 (Wickham, François, Henry, & Müller, 2022), tidyr 1.2.1 (Wickham & Girlich, 2022), cobalt 4.4.1 (Greifer, 2022), ggplot2 3.4.0 (Wickham, 2016), GGally 2.1.2 (Schloerke et al., 2021), REAT 3.0.3 (Wieland, 2019), knitr 1.42 (Xie, 2014), and kableExtra 1.3.4 (Zhu, 2021). Quarto (Allaire, Dervieux, Scheidegger, Teague, & Xie, 2022) was used for manuscript preparation and analysis reproducibility.

8 Algorithms

For each demographic group of race, education, marital and metro status, use the previously generated income paths \(y^{*}_{it}\) to:

  1. Choose the comparison group, whose income paths are reconstructed, and the reference group, from which values are randomly sampled, as shown in Table 12.

  2. Sample \(\gamma\), \(\beta\), \(\sigma^{2}_{u}\), \(\sigma^{2}_{\epsilon}\), \(d_{t}\), \(X_{it}\), and \(y_{i0}\) from the reference group, where \(\gamma\), \(\beta\), \(\sigma^{2}_{u}\), \(\sigma^{2}_{\epsilon}\) are sampled from all \(i\) and \(t\); \(d_{t}\) and \(X_{it}\) are sampled for all \(i\) and at each \(t\); and \(y_{i0}\) is only sampled at \(t=1\) for all \(i\).

  3. For each of \(\gamma\), \(\beta\), \(\sigma^{2}_{u}\), \(\sigma^{2}_{\epsilon}\), \(d_{t}\), \(X_{it}\), and \(y_{i0}\), and for each replication \(j = 1, \ldots, 100\):

    1. The comparison group baseline consists of the income paths computed without any values from the reference group.

    2. Compute the new income path using \[\Delta y^{*}_{it} = \gamma \Delta y^{*}_{it-1} + \beta \Delta X_{it} + u_{it} + \Delta \epsilon_{it} \quad \text{if } t>2\] where the respective reference value is substituted into the equation for \(\Delta y^{*}_{it}\) and all other values remain constant from the comparison group. Calculate the adjusted income path in levels: substitute \(y^{*}_{it} = y^{*}_{it-1} + \Delta y^{*}_{it}\) for \(t>2\).

    3. Generate the recovery times as previously defined in the main paper

  4. Average the proportion recovered and recovery times across households and replications

  5. Compare the proportions recovered and the recovery times obtained for each factor with the comparison group baseline to see how much faster or slower recovery is, and how much higher or lower the proportion recovered is, for each demographic subgroup when it takes on the other group’s characteristics.
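The path-reconstruction step of the algorithm above can be sketched as follows. This is a simplified illustration, not the paper's code: the function `rebuild_path` is hypothetical, the covariate is treated as a scalar, the composite error \(u_{it} + \Delta \epsilon_{it}\) is passed in as a single series, and the first two income levels are taken as given. Only \(\gamma\) is swapped in the example; any other factor could be substituted in the same way.

```python
import numpy as np

def rebuild_path(y1, y2, gamma, beta, d_x, shocks):
    """Rebuild an income path in levels from the first-difference equation.

    y1, y2 : first two (log) income levels, taken as given
    gamma  : persistence parameter
    beta   : coefficient on the covariate change (scalar for simplicity)
    d_x    : covariate changes for t = 3, ..., T
    shocks : composite errors u_t + (eps_t - eps_{t-1}) for t = 3, ..., T
    """
    levels = [y1, y2]
    dy_prev = y2 - y1
    for dx_t, e_t in zip(d_x, shocks):
        dy_t = gamma * dy_prev + beta * dx_t + e_t  # difference equation
        levels.append(levels[-1] + dy_t)            # back to levels
        dy_prev = dy_t
    return np.array(levels)

# Illustrative inputs: no covariate changes and no shocks.
d_x = np.zeros(3)
shocks = np.zeros(3)

# Baseline uses the comparison group's own (hypothetical) gamma.
baseline = rebuild_path(10.0, 10.2, gamma=0.5, beta=0.0, d_x=d_x, shocks=shocks)

# Counterfactual swaps in a (hypothetical) reference-group gamma,
# holding everything else fixed.
counterfactual = rebuild_path(10.0, 10.2, gamma=0.8, beta=0.0, d_x=d_x, shocks=shocks)

gap = counterfactual - baseline
```

Holding the shocks and covariates fixed while changing one input isolates that input's contribution, which is the logic behind comparing each counterfactual with the comparison group baseline.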

9 Figures

10 Tables

Table 11: Balance of Main Data and the Estimated Sample by Covariate Statistics and Standardized Mean Differences

| Variable | Type | Full Cleaned Sample: Mean | Full Cleaned Sample: SD | Estimation Sample: Mean | Estimation Sample: SD | Difference (Standardized Mean) |
|---|---|---|---|---|---|---|
| Household Log Income | Contin. | 9.57 | 1.09 | 9.82 | 0.92 | 0.24 |
| Income: Proportion Missing | Binary | 0.02 | 0.14 | 0.01 | 0.09 | -0.1 |
| Household Size | Contin. | 2.94 | 1.59 | 3.1 | 1.6 | 0.1 |
| Household Head Age | Contin. | 41.36 | 8.8 | 43.62 | 7.04 | 0.28 |
| Gender | Binary | 0.48 | 0.5 | 0.51 | 0.5 | 0.04 |
| Marital Status | Binary | 0.47 | 0.5 | 0.37 | 0.48 | -0.19 |
| Education | Binary | 0.54 | 0.5 | 0.55 | 0.5 | 0.02 |
| Race | Binary | 0.15 | 0.36 | 0.12 | 0.33 | -0.09 |
| Race: Proportion Missing | Binary | 0.08 | 0.27 | 0.07 | 0.25 | -0.03 |
| Metro | Binary | 0.84 | 0.37 | 0.83 | 0.38 | -0.02 |
| Metro: Proportion Missing | Binary | 0.04 | 0.2 | 0.03 | 0.17 | -0.05 |
| Effective Sample Size | N | 201559 | | 33619 | | 167939 |

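For readers who want to reproduce the order of magnitude of the differences in Table 11, here is a minimal sketch of a standardized mean difference. It assumes the pooled-standard-deviation form, which is one common convention; the exact convention used to produce the table is not stated here, and the function name is hypothetical. Because the inputs are the rounded values from the table, small discrepancies in the second decimal are to be expected.

```python
import math

def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Difference in means (b minus a) divided by the pooled standard deviation.

    The pooled SD used here is sqrt((sd_a^2 + sd_b^2) / 2), an assumption
    of this sketch.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_b - mean_a) / pooled_sd

# Household Head Age: full cleaned sample vs. estimation sample (Table 11).
smd_age = standardized_mean_difference(41.36, 8.80, 43.62, 7.04)

# Household Size: full cleaned sample vs. estimation sample (Table 11).
smd_size = standardized_mean_difference(2.94, 1.59, 3.10, 1.60)

print(round(smd_age, 2), round(smd_size, 2))  # 0.28 0.1
```

Dividing by a standard deviation puts covariates measured in different units, such as years of age and persons per household, on a common scale.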
Table 12: Comparison and Reference Group used for each demographic variable

| Demographic Variable | Comparison Group | Reference Group |
|---|---|---|
| Race | Black | White |
| Education | College | High School |
| Marital Status | Single | Married |
| Metro | Non-metro | Metro |

Figure 22: Balance of Main Data and the Estimated Sample by Covariate Distribution. Panels: (a) Total Real Household Income; (b) Age of Household Head; (c) Household Size; (d) Gender; (e) Race; (f) Marital Status; (g) Metro; (h) Education.

6.4.2 Moment Conditions using Estimated GMM Residuals

\[\begin{align*} E[\Delta \tau_{it} \Delta \tau_{it-2}] &= E[(u_{it} + \Delta v_{it} - \gamma \Delta v_{it-1} + \Delta \epsilon_{it}) (u_{it-2} + \Delta v_{it-2} - \gamma \Delta v_{it-3} + \Delta \epsilon_{it-2})] \\ &= E[u_{it}u_{it-2} + u_{it-2}\Delta v_{it} - u_{it-2}\gamma \Delta v_{it-1} + u_{it-2} \Delta \epsilon_{it} \\ &+ u_{it}\Delta v_{it-2} + \Delta v_{it}\Delta v_{it-2} - \gamma \Delta v_{it-1}\Delta v_{it-2} + \Delta v_{it-2}\Delta \epsilon_{it} \\ &- \gamma u_{it}\Delta v_{it-3} - \gamma \Delta v_{it-3} \Delta v_{it} + \gamma^{2} \Delta v_{it-3} \Delta v_{it-1} - \gamma \Delta v_{it-3}\Delta \epsilon_{it} \\ &+ u_{it} \Delta \epsilon_{it-2}+ \Delta v_{it} \Delta \epsilon_{it-2} - \gamma \Delta v_{it-1} \Delta \epsilon_{it-2} + \Delta \epsilon_{it} \Delta \epsilon_{it-2} ] \\ &= E[- \gamma \Delta v_{it-1}\Delta v_{it-2}] = -\gamma(-\sigma^{2}_{v}) = \gamma \sigma^{2}_{v} \end{align*}\]

\[\begin{align*} E[\Delta \tau_{it} \Delta \tau_{it-1}] &= E[(u_{it} + \Delta v_{it} - \gamma \Delta v_{it-1} + \Delta \epsilon_{it}) (u_{it-1} + \Delta v_{it-1} - \gamma \Delta v_{it-2} + \Delta \epsilon_{it-1})] \\ &= E[u_{it}u_{it-1} + u_{it-1}\Delta v_{it} - \gamma u_{it-1}\Delta v_{it-1} + u_{it-1}\Delta \epsilon_{it} + u_{it} \Delta v_{it-1} \\ &+ \Delta v_{it} \Delta v_{it-1} - \gamma \Delta v_{it-1} \Delta v_{it-1} + \Delta v_{it-1} \Delta \epsilon_{it} \\ &- \gamma u_{it} \Delta v_{it-2} - \gamma \Delta v_{it-2} \Delta v_{it} + \gamma^{2} \Delta v_{it-2} \Delta v_{it-1} - \gamma \Delta v_{it-2} \Delta \epsilon_{it} \\ &+ u_{it} \Delta \epsilon_{it-1} + \Delta v_{it} \Delta \epsilon_{it-1} - \gamma \Delta v_{it-1} \Delta \epsilon_{it-1} + \Delta \epsilon_{it} \Delta \epsilon_{it-1}] \\ &= E[\Delta v_{it} \Delta v_{it-1} - \gamma \Delta v_{it-1} \Delta v_{it-1} + \gamma^{2} \Delta v_{it-2} \Delta v_{it-1} + \Delta \epsilon_{it} \Delta \epsilon_{it-1}] \\ &= -\sigma^{2}_{v}- 2\gamma \sigma^{2}_{v} - \gamma^{2} \sigma^{2}_{v} -\sigma^{2}_{\epsilon} = -\sigma^{2}_{v}(1+ 2\gamma + \gamma^{2}) -\sigma^{2}_{\epsilon} \end{align*}\]

\[\begin{align*} E[(\Delta \tau_{it})^{2}] &= E[( u_{it} + \Delta v_{it} - \gamma \Delta v_{it-1} + \Delta \epsilon_{it})^{2}] \\ &= E[( u_{it} + \Delta v_{it} - \gamma \Delta v_{it-1} + \Delta \epsilon_{it})( u_{it} + \Delta v_{it} - \gamma \Delta v_{it-1} + \Delta \epsilon_{it})] \\ &= E[ u_{it} u_{it} + u_{it}\Delta v_{it} - \gamma u_{it} \Delta v_{it-1} + u_{it} \Delta \epsilon_{it} \\ &+ u_{it} \Delta v_{it} + \Delta v_{it} \Delta v_{it} - \gamma \Delta v_{it} \Delta v_{it-1} + \Delta v_{it} \Delta \epsilon_{it} \\ &- u_{it} \gamma \Delta v_{it-1} - \gamma \Delta v_{it-1}\Delta v_{it} + \gamma \Delta v_{it-1}\gamma \Delta v_{it-1} - \gamma \Delta v_{it-1}\Delta \epsilon_{it} \\ &+ u_{it} \Delta \epsilon_{it} + \Delta v_{it} \Delta \epsilon_{it} - \gamma \Delta v_{it-1} \Delta \epsilon_{it} + \Delta \epsilon_{it} \Delta \epsilon_{it}] \\ &= E[u_{it}u_{it}+ \Delta v_{it} \Delta v_{it} - \gamma \Delta v_{it} \Delta v_{it-1} - \gamma \Delta v_{it-1}\Delta v_{it}\\ &+ \gamma \Delta v_{it-1}\gamma \Delta v_{it-1} + \Delta \epsilon_{it} \Delta \epsilon_{it}] \\ &= \sigma^{2}_{u} + 2\sigma^{2}_{v} - \gamma (-\sigma^{2}_{v}) - \gamma (-\sigma^{2}_{v}) + \gamma^{2} (2\sigma^{2}_{v}) +2\sigma^{2}_{\epsilon} \\ &= \sigma^{2}_{u}+ 2\sigma^{2}_{v} + \gamma \sigma^{2}_{v} + \gamma \sigma^{2}_{v} + 2\gamma^{2}\sigma^{2}_{v} +2\sigma^{2}_{\epsilon} = \sigma^{2}_{u}+ 2\sigma^{2}_{v}(1 + \gamma + \gamma^{2}) +2\sigma^{2}_{\epsilon} \end{align*}\]
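Given \(\gamma\), the three autocovariances of \(\Delta \tau_{it}\) form a triangular system in \(\sigma^{2}_{v}\), \(\sigma^{2}_{\epsilon}\), and \(\sigma^{2}_{u}\): the lag-two moment equals \(\gamma \sigma^{2}_{v}\) (since \(E[\Delta v_{it-1}\Delta v_{it-2}] = -\sigma^{2}_{v}\)), the lag-one moment then pins down \(\sigma^{2}_{\epsilon}\), and the variance pins down \(\sigma^{2}_{u}\). The sketch below shows this back-substitution as a round trip on made-up parameter values. It is meant only to illustrate the algebra, assumes \(\gamma \neq 0\), and is not a description of the estimation routine used in the paper; both function names are hypothetical.

```python
import math

def moments_from_params(gamma, s_v, s_e, s_u):
    """Autocovariances of the differenced residual implied by the variances.

    s_v, s_e, s_u denote sigma^2_v, sigma^2_eps, sigma^2_u.
    """
    m0 = s_u + 2.0 * s_v * (1.0 + gamma + gamma ** 2) + 2.0 * s_e  # variance
    m1 = -s_v * (1.0 + gamma) ** 2 - s_e                            # lag 1
    m2 = gamma * s_v                                                # lag 2
    return m0, m1, m2

def params_from_moments(gamma, m0, m1, m2):
    """Back out the variances by substitution; requires gamma != 0."""
    s_v = m2 / gamma
    s_e = -m1 - s_v * (1.0 + gamma) ** 2
    s_u = m0 - 2.0 * s_v * (1.0 + gamma + gamma ** 2) - 2.0 * s_e
    return s_v, s_e, s_u

# Illustrative (not estimated) values.
gamma, s_v, s_e, s_u = 0.4, 0.05, 0.03, 0.02
m0, m1, m2 = moments_from_params(gamma, s_v, s_e, s_u)
recovered = params_from_moments(gamma, m0, m1, m2)
```

In practice the sample analogues of these moments are noisy, so nothing guarantees that the backed-out variances are positive; that is one reason to treat this as an algebraic check rather than an estimator.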

6.4.3 Moment Conditions using Errors in Linear Projections and Estimated GMM Residuals

\[\begin{align*} Cov(\psi_{1},\Delta \tau_{3}) &= Cov(\Delta v_{i2} + \zeta_{i1}, \Delta v_{i3} - \gamma \Delta v_{i2} + \Delta \epsilon_{i3} + u_{i3}) \\ &= Cov(\Delta v_{i2},\Delta v_{i3}) - \gamma Cov(\Delta v_{i2},\Delta v_{i2}) + Cov( \Delta v_{i2}, \Delta \epsilon_{i3}) +Cov(\Delta v_{i2},u_{i3})\\ &+ Cov(\zeta_{i1}, \Delta v_{i3}) - \gamma Cov(\zeta_{i1}, \Delta v_{i2}) + Cov(\zeta_{i1},\Delta \epsilon_{i3}) + Cov(\zeta_{i1},u_{i3}) \\ &= -\sigma^{2}_{v} - 2\gamma \sigma^{2}_{v} + E[\zeta_{i1}\Delta \epsilon_{i3}] \\ &= E[\zeta_{i1}\Delta \epsilon_{i3}] -\sigma^{2}_{v} (1 + 2\gamma)\\ Cov(\psi_{2},\Delta \tau_{3}) &= Cov(e_{i} + v_{i2} + \zeta_{i2}, \Delta v_{i3} - \gamma \Delta v_{i2} + \Delta \epsilon_{i3} + u_{i3}) \\ &= Cov(e_{i}, \Delta v_{i3}) - \gamma Cov(e_{i}, \Delta v_{i2}) + Cov(e_{i},\Delta \epsilon_{i3}) + Cov(e_{i},u_{i3}) + Cov( v_{i2}, \Delta v_{i3}) \\ &- \gamma Cov( v_{i2},\Delta v_{i2}) + Cov( v_{i2},\Delta \epsilon_{i3}) + Cov( v_{i2},u_{i3}) \\ &+ Cov(\zeta_{i2}, \Delta v_{i3}) - \gamma Cov(\zeta_{i2},\Delta v_{i2}) + Cov(\zeta_{i2},\Delta \epsilon_{i3}) + Cov(\zeta_{i2},u_{i3}) \\ &= Cov(v_{i2}, v_{i3} - v_{i2}) - \gamma Cov(v_{i2}, v_{i2} - v_{i1}) + Cov(\zeta_{i2},\Delta \epsilon_{i3}) \\ &= -\sigma^{2}_{v}(1 + \gamma ) + Cov(\zeta_{i2},\Delta \epsilon_{i3}) \end{align*}\]

Footnotes

  1. Cantor, Brandt, & Green (1991) provide examples of how SIPP participants attempt to reconstruct monthly income.↩︎

  2. Jappelli & Pistaferri (2010) state that “agricultural sectors are almost exclusively family businesses with few employees,” so they may not be representative or they may follow different income dynamics.↩︎

  3. Moffitt & Zhang (2022) exclude self-employment earnings when estimating income volatility.↩︎

  4. Table 11 compares the balance of covariates between the main data and the estimation sample, which is created by restricting to a balanced panel. The table reports the mean and standard deviation of each covariate in the main sample and in the estimation sample. It also shows the standardized mean difference, which is a common approach to assessing balance among the covariates, suggested by Ho, Imai, King, & Stuart (2007). In addition, Figure 22 provides a visual diagnostic, using kernel density plots and proportions, of how the covariate distributions differ (Austin, 2009; Ho et al., 2007). From Figure 22 (a), the total real household income distribution has a similar shape in both samples, with a standardized mean difference of 0.24. Figure 22 (b) shows that both distributions of age are increasing, but at different slopes, with a standardized mean difference of 0.28. Household size is 0.16 persons larger on average in the restricted sample (3.1 versus 2.94), as also shown in Figure 22 (c). The samples are most balanced for Gender, Education and Metro, which have the smallest standardized mean differences of 0.04, 0.02, and -0.02 respectively. Of the binary variables, Race and Marital status show the largest standardized mean differences of \(-0.09\) and \(-0.19\) respectively, which are also seen in Figure 22 (e) and Figure 22 (f). The samples are most balanced across the demographic binary variables and less balanced across household log income and age. However, given the similar shape of the distribution of total real household income, and given that age neither enters the function for income dynamics nor is considered later as a demographic subgroup, I proceed with the description of the variables.↩︎

  5. The instrument matrix is available in the Appendix Section 6.1 with its associated proof that the lagged values of income are independent of the residuals.↩︎

  6. If \(w\) is a variable, then the transform is \(w_{i,t+1}^{\perp} \equiv c_{it} \left( w_{it} - \frac{1}{T_{it}} \sum_{s >t} w_{is} \right)\). The sum is over all future available observations. \(T_{it}\) is the number of such observations, \(c_{it}\) is \(\sqrt{T_{it}/(T_{it}+1)}\).↩︎

  7. \(\Sigma\) is defined in the appendix. The errors required in the simulated equations are: \(\zeta_{1}, \zeta_{2},\) and \(\Delta \epsilon_{t} + u_{t}\). The variances and covariances needed for simulation are: \(\sigma^{2}_{1}, \sigma^{2}_{2}, \sigma_{12}, E[\zeta_{1}\Delta \epsilon_{3}], E[\zeta_{2}\Delta \epsilon_{3}], \sigma^{2}_{u} + 2 \sigma^{2}_{\epsilon}, \text{ and } \sigma^{2}_{\epsilon}.\)↩︎

  8. The instrumental variables are income in \(t=1\) for (\(t=4\)); income in \(t=1,2\) for (\(t=5\)); income in \(t=1,2,3\) for (\(t=6\)); income in \(t=1,2,3,4\) for (\(t=7\)); income in \(t=1,2,3,4,5\) for (\(t=8\)); income in \(t=1,2,3,4,5,6\) for (\(t=9\)); income in \(t=1,2,3,4,5,6,7\) for (\(t=10\)); income in \(t=1,2,3,4,5,6,7,8\) for (\(t=11\)); income in \(t=2,3,4,5,6,7,8,9\) for (\(t=12\)); income in \(t=3,4,5,6,7,8,9,10\) for (\(t=13\)); income in \(t=4,5,6,7,8,9,10,11\) for (\(t=14\)); income in \(t=5,6,7,8,9,10,11,12\) for (\(t=15\)); and income in \(t=6,7,8,9,10,11,12,13\) for (\(t=16\)).↩︎

  9. See Table \(\ref{proj-level}\) in the Appendix Section 10 for coefficient estimates of the projection error.↩︎

  10. Chesher & Schluter (2002) examine the sensitivity of welfare measures, including the Gini coefficient, to measurement error.↩︎