Data Economics Development & Policy MIT 740x

Homework Week 8, Savings

This problem set deals with the savings response to permanent and transitory income shocks discussed in Christina Paxson’s paper “Using Weather Variability to Estimate the Response of Savings to Transitory Income in Thailand (1992)”.

Load the needed libraries

library(tidyverse) # provides functions like 'as.tibble', 'mutate' and 'spread'
library(car) # needed for 'linearHypothesis'
library(plotly) # a plotting library
library(broom) # needed to build ribbons for loess regression plot

Load required data file

Load Paxson’s data and show list of variables.

variable name	variable label
\(year\)	year of measurement
\(region\)	region of measurement
\(inc\)	income
\(save2\)	income minus expenditure on nondurable goods
\(m12\)	males aged 6-11
\(m18\)	males aged 12-17
\(m18elem\), \(m18sec\), \(m18post\)	males aged 18-65 + education level
\(m65\)	females aged 65+
\(f12\)	females aged 6-11
\(f18\)	females aged 12-17
\(f18elem\), \(f18sec\), \(f18post\)	females aged 18-65 + education level
\(f65\)	females aged 65+
\(p0to5\), \(p6to11\), \(p12to17\), \(p18to64\), \(p65\)	people in various age categories
\(of0\)	farm renter
\(of1\), \(of2\), \(of3\), \(of4\), \(of5\)	dummy variables for land ownership
\(dev1\), \(dev2\), \(dev3\), \(dev4\)	deviation of rainfall from mean. four seasons
\(dvsq1\), \(dvsq2\), \(dvsq3\), \(dvsq4\)	deviation of rainfall from mean, squared. four seasons
\(sd1\), \(sd2\), \(sd3\), \(sd4\)	standard deviation of rainfall deviation. four seasons

d <- as.tibble(read.csv('./paxson.csv'))

colnames(d) # show list of variables

##  [1] "hid"     "save3"   "year"    "region"  "inc"     "save1"   "save2"  
##  [8] "m12"     "m18"     "m65"     "m18elem" "m18sec"  "m18pos"  "f12"    
## [15] "f18"     "f65"     "f18elem" "f18sec"  "f18pos"  "of1"     "of2"    
## [22] "of3"     "of4"     "of5"     "p0to5"   "p6to11"  "p12to17" "p18to64"
## [29] "p65"     "of0"     "month"   "cpi"     "dev1"    "dev2"    "dev3"   
## [36] "dev4"    "sd1"     "sd2"     "sd3"     "sd4"     "dvsq1"   "dvsq2"  
## [43] "dvsq3"   "dvsq4"

Create dummy variables from \(year\) and \(region\)

Creates dummy variables of type \(year.X\), in this case \(year.76\), \(year.81\) and \(year.86\) and remove \(year\).

Create list of names of \(year.X\) dummy variables, store in \(year.name\).

Creates dummy variables of type \(region.X\) (e.g. \(region.34\)) and removes \(region\).

Create list of names of \(region.X\) dummy variables, store in \(region.name\)

year.name <- paste('year', unique(d$year), sep='.')
# a list of the soon to be created 'year.X' dummy variables
d <- d %>%
  mutate(new = 1) %>% # the 'new' 'dummy' variable will disappear with the 'spread' command
  spread(year, new, fill = 0, sep = '.')
# a variable will be created for each year, and the '1' from the 'new' variable will
# be placed in the variable if the original 'year' variable contained the year
#
# for example, if a row contains '81' in the 'year' variable, then
# the new variable 'year.81' will be created, and set to '1' in that row
# other created variables such as 'year.86' will contain '0' for that row
#
# after year.X dummy variables are created and filled
# then the original 'year' and 'new' variable will be removed

region.name <- paste('region', unique(d$region), sep='.')
# a list of soon to be created 'region.X' dummy variables
d <- d %>%
  mutate(new = 1) %>%
  spread(region, new, fill = 0, sep = '.')

colnames(d) # show the updated list of variable names

##  [1] "hid"       "save3"     "inc"       "save1"     "save2"    
##  [6] "m12"       "m18"       "m65"       "m18elem"   "m18sec"   
## [11] "m18pos"    "f12"       "f18"       "f65"       "f18elem"  
## [16] "f18sec"    "f18pos"    "of1"       "of2"       "of3"      
## [21] "of4"       "of5"       "p0to5"     "p6to11"    "p12to17"  
## [26] "p18to64"   "p65"       "of0"       "month"     "cpi"      
## [31] "dev1"      "dev2"      "dev3"      "dev4"      "sd1"      
## [36] "sd2"       "sd3"       "sd4"       "dvsq1"     "dvsq2"    
## [41] "dvsq3"     "dvsq4"     "year.76"   "year.81"   "year.86"  
## [46] "region.2"  "region.5"  "region.6"  "region.8"  "region.9" 
## [51] "region.10" "region.11" "region.13" "region.14" "region.15"
## [56] "region.17" "region.19" "region.20" "region.24" "region.26"
## [61] "region.27" "region.28" "region.29" "region.31" "region.32"
## [66] "region.34"

Income against predictors of income regression

Income will be regressed predictors of ‘permanent’ and ‘transient’ income.

co-efficients (betas) will be stored in \(ds\).

Predictors of ‘transient’ income

Seasonal rainfall, deviation from mean \(dev1\), \(dev2\), \(dev3\), \(dev4\)
Square of seasonal rainfall deviation from mean \(dvsq1\), \(dvsq2\), \(dvsq3\), \(dvsq4\)
Region fixed effects, of form \(region.X\)
Year fixed effects, of form \(year.X\)

Predictors of ‘permanent’ income

Demographic/education variables of form \(pX\), \(mX\) and \(fX\)
Land rent/ownership dummy variables of form \(ofX\)
Year fixed effects, of form \(year.X\)

variable of form \(ofX\) are categorical dummy variables, the ‘omitted’ categorical variable is land ownership > 40 rai.

Region and year fixed effects are also categorical dummy variables, and will also require an ‘omitted’ variable to avoid collinearity.

first.regression <- 'inc ~ dev1+dev2+dev3+dev4+
dvsq1+dvsq2+dvsq3+dvsq4+
p0to5+m12+f12+m18+f18+m18elem+m18sec+m18pos+f18elem+f18sec+f18pos+m65+f65+
of0+of1+of2+of3+of4+of5+'

first.regression <- paste(first.regression,
                          paste(year.name[-1],collapse='+'),
                          '+',
                          paste(region.name[-1],collapse='+'))
# add to the regression the 'year.X' dummy variables and 'region.X' dummy variables
#
# in both 'year.X' and 'region.x' the first item in the list is 'omitted' with '[-1]'

ds <- summary(lm(as.formula(first.regression),d))

ds

## 
## Call:
## lm(formula = as.formula(first.regression), data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3912.3  -625.3  -138.6   392.4 20135.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.692e+03  1.151e+02  23.382  < 2e-16 ***
## dev1         1.909e+00  7.565e-01   2.524 0.011642 *  
## dev2         1.250e+00  2.252e-01   5.552 2.97e-08 ***
## dev3         2.282e-01  2.274e-01   1.004 0.315598    
## dev4         1.610e+00  6.269e-01   2.568 0.010272 *  
## dvsq1       -4.499e-02  1.127e-02  -3.993 6.63e-05 ***
## dvsq2        8.775e-04  1.322e-03   0.664 0.506826    
## dvsq3        4.446e-04  7.157e-04   0.621 0.534433    
## dvsq4       -9.472e-03  3.323e-03  -2.851 0.004382 ** 
## p0to5        3.769e+01  2.178e+01   1.731 0.083518 .  
## m12          5.973e+01  2.606e+01   2.292 0.021929 *  
## f12          7.955e+01  2.515e+01   3.162 0.001575 ** 
## m18          2.206e+02  2.718e+01   8.114 6.18e-16 ***
## f18          1.930e+02  2.724e+01   7.084 1.61e-12 ***
## m18elem      3.494e+02  2.659e+01  13.138  < 2e-16 ***
## m18sec       7.657e+02  9.344e+01   8.195 3.19e-16 ***
## m18pos       1.043e+03  1.356e+02   7.690 1.77e-14 ***
## f18elem      6.231e+01  3.850e+01   1.618 0.105641    
## f18sec       3.456e+02  1.336e+02   2.588 0.009682 ** 
## f18pos       6.769e+02  2.040e+02   3.318 0.000912 ***
## m65          1.355e+02  6.817e+01   1.988 0.046881 *  
## f65          1.597e+02  6.129e+01   2.605 0.009211 ** 
## of0         -1.339e+03  7.071e+01 -18.933  < 2e-16 ***
## of1         -1.700e+03  3.110e+02  -5.465 4.87e-08 ***
## of2         -1.769e+03  1.084e+02 -16.317  < 2e-16 ***
## of3         -1.583e+03  7.549e+01 -20.973  < 2e-16 ***
## of4         -1.368e+03  6.481e+01 -21.114  < 2e-16 ***
## of5         -1.008e+03  6.308e+01 -15.986  < 2e-16 ***
## year.81      3.017e+02  4.721e+01   6.391 1.81e-10 ***
## year.86     -4.023e+02  8.294e+01  -4.850 1.28e-06 ***
## region.32   -6.524e+02  9.500e+01  -6.867 7.38e-12 ***
## region.26   -8.330e+02  9.398e+01  -8.864  < 2e-16 ***
## region.28    8.232e+02  1.295e+02   6.357 2.24e-10 ***
## region.29    5.713e+02  1.281e+02   4.460 8.38e-06 ***
## region.14   -7.978e+02  1.287e+02  -6.198 6.20e-10 ***
## region.17   -4.043e+02  1.405e+02  -2.878 0.004024 ** 
## region.13   -3.362e+02  1.045e+02  -3.216 0.001309 ** 
## region.24    1.034e+02  1.120e+02   0.924 0.355626    
## region.9    -7.152e+02  1.532e+02  -4.668 3.12e-06 ***
## region.6     2.989e+01  1.419e+02   0.211 0.833150    
## region.31   -2.529e+02  1.134e+02  -2.230 0.025809 *  
## region.11   -1.125e+03  1.314e+02  -8.561  < 2e-16 ***
## region.15   -3.039e+01  1.727e+02  -0.176 0.860303    
## region.34    1.008e+03  2.104e+02   4.791 1.71e-06 ***
## region.5    -2.309e+02  1.557e+02  -1.484 0.137973    
## region.10   -2.096e+02  1.867e+02  -1.123 0.261630    
## region.27   -3.640e+02  1.305e+02  -2.789 0.005310 ** 
## region.20    8.351e+02  2.060e+02   4.054 5.11e-05 ***
## region.8    -6.910e+02  2.022e+02  -3.417 0.000637 ***
## region.2    -2.367e+02  1.411e+02  -1.678 0.093437 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1243 on 4805 degrees of freedom
## Multiple R-squared:  0.3426, Adjusted R-squared:  0.3359 
## F-statistic: 51.11 on 49 and 4805 DF,  p-value: < 2.2e-16

Estimate ‘permanent’ income

Stored in variable \(incperm\).

Estimated ‘permanent’ income based on

demographics (including education sub-categories)
land holdings
region fixed effects
year fixed effects.

\(region.19\) and \(year.76\) are the ‘omitted variables’

# there is probably a better way to add in all those region fixed effects!

# multiply co-efficients (stored in 'ds') with variables stored in 'd'

d <- d %>%
  mutate(incperm =
           coef(ds)['p0to5','Estimate']*p0to5 +
           coef(ds)['m12','Estimate']*m12 +
           coef(ds)['f12','Estimate']*f12 +
           coef(ds)['m18','Estimate']*m18 +
           coef(ds)['f18','Estimate']*f18 +
           coef(ds)['m65','Estimate']*m65 +
           coef(ds)['f65','Estimate']*f65 +
           coef(ds)['m18elem','Estimate']*m18elem +
           coef(ds)['f18elem','Estimate']*f18elem +
           coef(ds)['m18sec','Estimate']*m18sec +
           coef(ds)['f18sec','Estimate']*f18sec +
           coef(ds)['m18pos','Estimate']*m18pos +
           coef(ds)['f18pos','Estimate']*f18pos +
           coef(ds)['of0','Estimate']*of0 +
           coef(ds)['of1','Estimate']*of1 +
           coef(ds)['of2','Estimate']*of2 +
           coef(ds)['of3','Estimate']*of3 +
           coef(ds)['of4','Estimate']*of4 +
           coef(ds)['of5','Estimate']*of5 +
           coef(ds)['region.32','Estimate']*region.32 +
           coef(ds)['region.26','Estimate']*region.26 +
           coef(ds)['region.28','Estimate']*region.28 +
           coef(ds)['region.29','Estimate']*region.29 +
           coef(ds)['region.14','Estimate']*region.14 +
           coef(ds)['region.17','Estimate']*region.17 +
           coef(ds)['region.13','Estimate']*region.13 +
           coef(ds)['region.24','Estimate']*region.24 +
           coef(ds)['region.9','Estimate']*region.9 +
           coef(ds)['region.6','Estimate']*region.6 +
           coef(ds)['region.31','Estimate']*region.31 +
           coef(ds)['region.11','Estimate']*region.11 +
           coef(ds)['region.15','Estimate']*region.15 +
           coef(ds)['region.34','Estimate']*region.34 +
           coef(ds)['region.5','Estimate']*region.5 +
           coef(ds)['region.10','Estimate']*region.10 +
           coef(ds)['region.27','Estimate']*region.27 +
           coef(ds)['region.20','Estimate']*region.20 +
           coef(ds)['region.8','Estimate']*region.8 +
           coef(ds)['region.2','Estimate']*region.2 +
           coef(ds)['year.81','Estimate']*year.81 +
           coef(ds)['year.86','Estimate']*year.86
           )

print (paste("Standard deviation of estimated permanent income:", sd(d$incperm))) # standard deviation

## [1] "Standard deviation of estimated permanent income: 906.971430727161"

hist(d$incperm, breaks=30) # draw a histogram

Transient income estimate

Stored in variable \(inctrans\).

Components:

Rainfall related
Year fixed effects

\(year.76\) is the omitted dummy category variable.

# multiple coefficents/betas stored in 'ds' with variables stored in 'd'

d <- d %>%
  mutate(inctrans =
           coef(ds)['dev1','Estimate']*dev1 +
           coef(ds)['dev2','Estimate']*dev2 +
           coef(ds)['dev3','Estimate']*dev3 +
           coef(ds)['dev4','Estimate']*dev4 +
           coef(ds)['dvsq1','Estimate']*dvsq1 +
           coef(ds)['dvsq2','Estimate']*dvsq2 +
           coef(ds)['dvsq3','Estimate']*dvsq3 +
           coef(ds)['dvsq4','Estimate']*dvsq4 +
           coef(ds)['year.81','Estimate']*year.81 +
           coef(ds)['year.86','Estimate']*year.86

           )

print(paste("Standard deviation of estimated transient income:",sd(d$inctrans)))

## [1] "Standard deviation of estimated transient income: 325.410827363855"

hist(d$inctrans, breaks=30)

Component Income not explained by ‘permanent’ or ‘transient’ components

Stored in variable \(incunexp\).

d <- d %>%
  mutate(incunexp =
           inc-incperm-inctrans)

print(paste("Standard deviation of unexplained income:",sd(d$incunexp)))

## [1] "Standard deviation of unexplained income: 1258.25912521469"

hist(d$incunexp, breaks=50)

Estimated effect of income on savings

Regression to estimate the effect of permanent and transitory income on savings, result stored in \(mod.d\).

\(save2\) is used as the measure of savings.

Regression of savings against:

Permanent income \(incperm\), transitory income \(inctrans\) and unexplained income \(incunexp\)
Demographics. Variables of form \(pXtoY\) (no education sub-categories)
Variability of rainfall. Variables of form \(sdX\)
Year. Variables of form \(year.X\)

\(year.76\) is the omitted dummy categorical variable.

second.regression <- 'save2 ~ incperm+inctrans+incunexp+
p0to5+p6to11+p12to17+p18to64+p65+
sd1+sd2+sd3+sd4+'

second.regression <- paste(second.regression,
                           paste(year.name[-1],collapse='+'))
# add year to regression

mod.d <- lm(as.formula(second.regression),d)

summary(mod.d)

## 
## Call:
## lm(formula = as.formula(second.regression), data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -122338    -226     174     473    4095 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.754e+03  4.415e+02  -3.973 7.20e-05 ***
## incperm      4.400e-01  4.854e-02   9.064  < 2e-16 ***
## inctrans     8.039e-01  1.633e-01   4.922 8.86e-07 ***
## incunexp     6.925e-01  2.331e-02  29.707  < 2e-16 ***
## p0to5       -5.285e+01  3.454e+01  -1.530   0.1260    
## p6to11       7.832e+00  2.950e+01   0.266   0.7906    
## p12to17     -4.973e+01  3.388e+01  -1.468   0.1422    
## p18to64     -3.881e+01  3.725e+01  -1.042   0.2975    
## p65         -1.228e+02  6.401e+01  -1.918   0.0552 .  
## sd1          1.738e+00  2.958e+00   0.587   0.5570    
## sd2         -3.075e+00  1.588e+00  -1.937   0.0528 .  
## sd3          4.007e+00  2.177e+00   1.841   0.0657 .  
## sd4          3.473e+00  2.037e+00   1.705   0.0883 .  
## year.81     -6.984e+01  8.273e+01  -0.844   0.3986    
## year.86     -2.881e+02  1.374e+02  -2.097   0.0361 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2008 on 4840 degrees of freedom
## Multiple R-squared:  0.1868, Adjusted R-squared:  0.1844 
## F-statistic: 79.39 on 14 and 4840 DF,  p-value: < 2.2e-16

Savings plotted against permanent income

Localized Loess regression and 95% standard error ribbon.

m <- augment(loess(save2 ~ incperm,
                   data = d))

d %>%
  plot_ly(x=~incperm, y=~save2, opacity=0.25, 
          type='scatter', mode='markers') %>%
  add_lines(y = ~fitted(loess(save2 ~ incperm)),
            line = list(color = 'rgba(255,0,0,1)'),
            showlegend = FALSE) %>%
  add_ribbons(data = m,
              x = ~incperm,
              ymin = ~.fitted - 1.96 * .se.fit,
              ymax = ~.fitted + 1.96 * .se.fit,
              line = list(color = 'rgba(7,240,120,0.05)'),
              fillcolor = 'rgba(7,240,120,0.2)',
              showlegend = FALSE) %>% 
  layout(title = "Savings (including durables) vs 'Permanent' Income
         Response of Savings to Transitory Income in Thailand
         Paxson 2014",
         xaxis = list(title = 'Permanent income',
                      range = c(-3000,3000)),
         yaxis = list(title = 'Savings',
                      range = c(-5000,7500)))

Savings plotted against transitory income