This problem set deals with the savings response to permanent and transitory income shocks discussed in Christina Paxson’s paper “Using Weather Variability to Estimate the Response of Savings to Transitory Income in Thailand (1992)”.
library(tidyverse) # provides functions like 'as.tibble', 'mutate' and 'spread'
library(car) # needed for 'linearHypothesis'
library(plotly) # a plotting library
library(broom) # needed to build ribbons for loess regression plot
Load Paxson’s data and show list of variables.
| variable name | variable label |
|---|---|
| \(year\) | year of measurement |
| \(region\) | region of measurement |
| \(inc\) | income |
| \(save2\) | income minus expenditure on nondurable goods |
| \(m12\) | males aged 6-11 |
| \(m18\) | males aged 12-17 |
| \(m18elem\), \(m18sec\), \(m18post\) | males aged 18-65 + education level |
| \(m65\) | females aged 65+ |
| \(f12\) | females aged 6-11 |
| \(f18\) | females aged 12-17 |
| \(f18elem\), \(f18sec\), \(f18post\) | females aged 18-65 + education level |
| \(f65\) | females aged 65+ |
| \(p0to5\), \(p6to11\), \(p12to17\), \(p18to64\), \(p65\) | people in various age categories |
| \(of0\) | farm renter |
| \(of1\), \(of2\), \(of3\), \(of4\), \(of5\) | dummy variables for land ownership |
| \(dev1\), \(dev2\), \(dev3\), \(dev4\) | deviation of rainfall from mean. four seasons |
| \(dvsq1\), \(dvsq2\), \(dvsq3\), \(dvsq4\) | deviation of rainfall from mean, squared. four seasons |
| \(sd1\), \(sd2\), \(sd3\), \(sd4\) | standard deviation of rainfall deviation. four seasons |
d <- as.tibble(read.csv('./paxson.csv'))
colnames(d) # show list of variables
## [1] "hid" "save3" "year" "region" "inc" "save1" "save2"
## [8] "m12" "m18" "m65" "m18elem" "m18sec" "m18pos" "f12"
## [15] "f18" "f65" "f18elem" "f18sec" "f18pos" "of1" "of2"
## [22] "of3" "of4" "of5" "p0to5" "p6to11" "p12to17" "p18to64"
## [29] "p65" "of0" "month" "cpi" "dev1" "dev2" "dev3"
## [36] "dev4" "sd1" "sd2" "sd3" "sd4" "dvsq1" "dvsq2"
## [43] "dvsq3" "dvsq4"
Creates dummy variables of type \(year.X\), in this case \(year.76\), \(year.81\) and \(year.86\) and remove \(year\).
Create list of names of \(year.X\) dummy variables, store in \(year.name\).
Creates dummy variables of type \(region.X\) (e.g. \(region.34\)) and removes \(region\).
Create list of names of \(region.X\) dummy variables, store in \(region.name\)
year.name <- paste('year', unique(d$year), sep='.')
# a list of the soon to be created 'year.X' dummy variables
d <- d %>%
mutate(new = 1) %>% # the 'new' 'dummy' variable will disappear with the 'spread' command
spread(year, new, fill = 0, sep = '.')
# a variable will be created for each year, and the '1' from the 'new' variable will
# be placed in the variable if the original 'year' variable contained the year
#
# for example, if a row contains '81' in the 'year' variable, then
# the new variable 'year.81' will be created, and set to '1' in that row
# other created variables such as 'year.86' will contain '0' for that row
#
# after year.X dummy variables are created and filled
# then the original 'year' and 'new' variable will be removed
region.name <- paste('region', unique(d$region), sep='.')
# a list of soon to be created 'region.X' dummy variables
d <- d %>%
mutate(new = 1) %>%
spread(region, new, fill = 0, sep = '.')
colnames(d) # show the updated list of variable names
## [1] "hid" "save3" "inc" "save1" "save2"
## [6] "m12" "m18" "m65" "m18elem" "m18sec"
## [11] "m18pos" "f12" "f18" "f65" "f18elem"
## [16] "f18sec" "f18pos" "of1" "of2" "of3"
## [21] "of4" "of5" "p0to5" "p6to11" "p12to17"
## [26] "p18to64" "p65" "of0" "month" "cpi"
## [31] "dev1" "dev2" "dev3" "dev4" "sd1"
## [36] "sd2" "sd3" "sd4" "dvsq1" "dvsq2"
## [41] "dvsq3" "dvsq4" "year.76" "year.81" "year.86"
## [46] "region.2" "region.5" "region.6" "region.8" "region.9"
## [51] "region.10" "region.11" "region.13" "region.14" "region.15"
## [56] "region.17" "region.19" "region.20" "region.24" "region.26"
## [61] "region.27" "region.28" "region.29" "region.31" "region.32"
## [66] "region.34"
Income will be regressed predictors of ‘permanent’ and ‘transient’ income.
co-efficients (betas) will be stored in \(ds\).
variable of form \(ofX\) are categorical dummy variables, the ‘omitted’ categorical variable is land ownership > 40 rai.
Region and year fixed effects are also categorical dummy variables, and will also require an ‘omitted’ variable to avoid collinearity.
first.regression <- 'inc ~ dev1+dev2+dev3+dev4+
dvsq1+dvsq2+dvsq3+dvsq4+
p0to5+m12+f12+m18+f18+m18elem+m18sec+m18pos+f18elem+f18sec+f18pos+m65+f65+
of0+of1+of2+of3+of4+of5+'
first.regression <- paste(first.regression,
paste(year.name[-1],collapse='+'),
'+',
paste(region.name[-1],collapse='+'))
# add to the regression the 'year.X' dummy variables and 'region.X' dummy variables
#
# in both 'year.X' and 'region.x' the first item in the list is 'omitted' with '[-1]'
ds <- summary(lm(as.formula(first.regression),d))
ds
##
## Call:
## lm(formula = as.formula(first.regression), data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3912.3 -625.3 -138.6 392.4 20135.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.692e+03 1.151e+02 23.382 < 2e-16 ***
## dev1 1.909e+00 7.565e-01 2.524 0.011642 *
## dev2 1.250e+00 2.252e-01 5.552 2.97e-08 ***
## dev3 2.282e-01 2.274e-01 1.004 0.315598
## dev4 1.610e+00 6.269e-01 2.568 0.010272 *
## dvsq1 -4.499e-02 1.127e-02 -3.993 6.63e-05 ***
## dvsq2 8.775e-04 1.322e-03 0.664 0.506826
## dvsq3 4.446e-04 7.157e-04 0.621 0.534433
## dvsq4 -9.472e-03 3.323e-03 -2.851 0.004382 **
## p0to5 3.769e+01 2.178e+01 1.731 0.083518 .
## m12 5.973e+01 2.606e+01 2.292 0.021929 *
## f12 7.955e+01 2.515e+01 3.162 0.001575 **
## m18 2.206e+02 2.718e+01 8.114 6.18e-16 ***
## f18 1.930e+02 2.724e+01 7.084 1.61e-12 ***
## m18elem 3.494e+02 2.659e+01 13.138 < 2e-16 ***
## m18sec 7.657e+02 9.344e+01 8.195 3.19e-16 ***
## m18pos 1.043e+03 1.356e+02 7.690 1.77e-14 ***
## f18elem 6.231e+01 3.850e+01 1.618 0.105641
## f18sec 3.456e+02 1.336e+02 2.588 0.009682 **
## f18pos 6.769e+02 2.040e+02 3.318 0.000912 ***
## m65 1.355e+02 6.817e+01 1.988 0.046881 *
## f65 1.597e+02 6.129e+01 2.605 0.009211 **
## of0 -1.339e+03 7.071e+01 -18.933 < 2e-16 ***
## of1 -1.700e+03 3.110e+02 -5.465 4.87e-08 ***
## of2 -1.769e+03 1.084e+02 -16.317 < 2e-16 ***
## of3 -1.583e+03 7.549e+01 -20.973 < 2e-16 ***
## of4 -1.368e+03 6.481e+01 -21.114 < 2e-16 ***
## of5 -1.008e+03 6.308e+01 -15.986 < 2e-16 ***
## year.81 3.017e+02 4.721e+01 6.391 1.81e-10 ***
## year.86 -4.023e+02 8.294e+01 -4.850 1.28e-06 ***
## region.32 -6.524e+02 9.500e+01 -6.867 7.38e-12 ***
## region.26 -8.330e+02 9.398e+01 -8.864 < 2e-16 ***
## region.28 8.232e+02 1.295e+02 6.357 2.24e-10 ***
## region.29 5.713e+02 1.281e+02 4.460 8.38e-06 ***
## region.14 -7.978e+02 1.287e+02 -6.198 6.20e-10 ***
## region.17 -4.043e+02 1.405e+02 -2.878 0.004024 **
## region.13 -3.362e+02 1.045e+02 -3.216 0.001309 **
## region.24 1.034e+02 1.120e+02 0.924 0.355626
## region.9 -7.152e+02 1.532e+02 -4.668 3.12e-06 ***
## region.6 2.989e+01 1.419e+02 0.211 0.833150
## region.31 -2.529e+02 1.134e+02 -2.230 0.025809 *
## region.11 -1.125e+03 1.314e+02 -8.561 < 2e-16 ***
## region.15 -3.039e+01 1.727e+02 -0.176 0.860303
## region.34 1.008e+03 2.104e+02 4.791 1.71e-06 ***
## region.5 -2.309e+02 1.557e+02 -1.484 0.137973
## region.10 -2.096e+02 1.867e+02 -1.123 0.261630
## region.27 -3.640e+02 1.305e+02 -2.789 0.005310 **
## region.20 8.351e+02 2.060e+02 4.054 5.11e-05 ***
## region.8 -6.910e+02 2.022e+02 -3.417 0.000637 ***
## region.2 -2.367e+02 1.411e+02 -1.678 0.093437 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1243 on 4805 degrees of freedom
## Multiple R-squared: 0.3426, Adjusted R-squared: 0.3359
## F-statistic: 51.11 on 49 and 4805 DF, p-value: < 2.2e-16
Stored in variable \(incperm\).
Estimated ‘permanent’ income based on
\(region.19\) and \(year.76\) are the ‘omitted variables’
# there is probably a better way to add in all those region fixed effects!
# multiply co-efficients (stored in 'ds') with variables stored in 'd'
d <- d %>%
mutate(incperm =
coef(ds)['p0to5','Estimate']*p0to5 +
coef(ds)['m12','Estimate']*m12 +
coef(ds)['f12','Estimate']*f12 +
coef(ds)['m18','Estimate']*m18 +
coef(ds)['f18','Estimate']*f18 +
coef(ds)['m65','Estimate']*m65 +
coef(ds)['f65','Estimate']*f65 +
coef(ds)['m18elem','Estimate']*m18elem +
coef(ds)['f18elem','Estimate']*f18elem +
coef(ds)['m18sec','Estimate']*m18sec +
coef(ds)['f18sec','Estimate']*f18sec +
coef(ds)['m18pos','Estimate']*m18pos +
coef(ds)['f18pos','Estimate']*f18pos +
coef(ds)['of0','Estimate']*of0 +
coef(ds)['of1','Estimate']*of1 +
coef(ds)['of2','Estimate']*of2 +
coef(ds)['of3','Estimate']*of3 +
coef(ds)['of4','Estimate']*of4 +
coef(ds)['of5','Estimate']*of5 +
coef(ds)['region.32','Estimate']*region.32 +
coef(ds)['region.26','Estimate']*region.26 +
coef(ds)['region.28','Estimate']*region.28 +
coef(ds)['region.29','Estimate']*region.29 +
coef(ds)['region.14','Estimate']*region.14 +
coef(ds)['region.17','Estimate']*region.17 +
coef(ds)['region.13','Estimate']*region.13 +
coef(ds)['region.24','Estimate']*region.24 +
coef(ds)['region.9','Estimate']*region.9 +
coef(ds)['region.6','Estimate']*region.6 +
coef(ds)['region.31','Estimate']*region.31 +
coef(ds)['region.11','Estimate']*region.11 +
coef(ds)['region.15','Estimate']*region.15 +
coef(ds)['region.34','Estimate']*region.34 +
coef(ds)['region.5','Estimate']*region.5 +
coef(ds)['region.10','Estimate']*region.10 +
coef(ds)['region.27','Estimate']*region.27 +
coef(ds)['region.20','Estimate']*region.20 +
coef(ds)['region.8','Estimate']*region.8 +
coef(ds)['region.2','Estimate']*region.2 +
coef(ds)['year.81','Estimate']*year.81 +
coef(ds)['year.86','Estimate']*year.86
)
print (paste("Standard deviation of estimated permanent income:", sd(d$incperm))) # standard deviation
## [1] "Standard deviation of estimated permanent income: 906.971430727161"
hist(d$incperm, breaks=30) # draw a histogram
Stored in variable \(inctrans\).
Components:
\(year.76\) is the omitted dummy category variable.
# multiple coefficents/betas stored in 'ds' with variables stored in 'd'
d <- d %>%
mutate(inctrans =
coef(ds)['dev1','Estimate']*dev1 +
coef(ds)['dev2','Estimate']*dev2 +
coef(ds)['dev3','Estimate']*dev3 +
coef(ds)['dev4','Estimate']*dev4 +
coef(ds)['dvsq1','Estimate']*dvsq1 +
coef(ds)['dvsq2','Estimate']*dvsq2 +
coef(ds)['dvsq3','Estimate']*dvsq3 +
coef(ds)['dvsq4','Estimate']*dvsq4 +
coef(ds)['year.81','Estimate']*year.81 +
coef(ds)['year.86','Estimate']*year.86
)
print(paste("Standard deviation of estimated transient income:",sd(d$inctrans)))
## [1] "Standard deviation of estimated transient income: 325.410827363855"
hist(d$inctrans, breaks=30)
Stored in variable \(incunexp\).
d <- d %>%
mutate(incunexp =
inc-incperm-inctrans)
print(paste("Standard deviation of unexplained income:",sd(d$incunexp)))
## [1] "Standard deviation of unexplained income: 1258.25912521469"
hist(d$incunexp, breaks=50)
Regression to estimate the effect of permanent and transitory income on savings, result stored in \(mod.d\).
\(save2\) is used as the measure of savings.
Regression of savings against:
\(year.76\) is the omitted dummy categorical variable.
second.regression <- 'save2 ~ incperm+inctrans+incunexp+
p0to5+p6to11+p12to17+p18to64+p65+
sd1+sd2+sd3+sd4+'
second.regression <- paste(second.regression,
paste(year.name[-1],collapse='+'))
# add year to regression
mod.d <- lm(as.formula(second.regression),d)
summary(mod.d)
##
## Call:
## lm(formula = as.formula(second.regression), data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -122338 -226 174 473 4095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.754e+03 4.415e+02 -3.973 7.20e-05 ***
## incperm 4.400e-01 4.854e-02 9.064 < 2e-16 ***
## inctrans 8.039e-01 1.633e-01 4.922 8.86e-07 ***
## incunexp 6.925e-01 2.331e-02 29.707 < 2e-16 ***
## p0to5 -5.285e+01 3.454e+01 -1.530 0.1260
## p6to11 7.832e+00 2.950e+01 0.266 0.7906
## p12to17 -4.973e+01 3.388e+01 -1.468 0.1422
## p18to64 -3.881e+01 3.725e+01 -1.042 0.2975
## p65 -1.228e+02 6.401e+01 -1.918 0.0552 .
## sd1 1.738e+00 2.958e+00 0.587 0.5570
## sd2 -3.075e+00 1.588e+00 -1.937 0.0528 .
## sd3 4.007e+00 2.177e+00 1.841 0.0657 .
## sd4 3.473e+00 2.037e+00 1.705 0.0883 .
## year.81 -6.984e+01 8.273e+01 -0.844 0.3986
## year.86 -2.881e+02 1.374e+02 -2.097 0.0361 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2008 on 4840 degrees of freedom
## Multiple R-squared: 0.1868, Adjusted R-squared: 0.1844
## F-statistic: 79.39 on 14 and 4840 DF, p-value: < 2.2e-16
Localized Loess regression and 95% standard error ribbon.
m <- augment(loess(save2 ~ incperm,
data = d))
d %>%
plot_ly(x=~incperm, y=~save2, opacity=0.25,
type='scatter', mode='markers') %>%
add_lines(y = ~fitted(loess(save2 ~ incperm)),
line = list(color = 'rgba(255,0,0,1)'),
showlegend = FALSE) %>%
add_ribbons(data = m,
x = ~incperm,
ymin = ~.fitted - 1.96 * .se.fit,
ymax = ~.fitted + 1.96 * .se.fit,
line = list(color = 'rgba(7,240,120,0.05)'),
fillcolor = 'rgba(7,240,120,0.2)',
showlegend = FALSE) %>%
layout(title = "Savings (including durables) vs 'Permanent' Income
Response of Savings to Transitory Income in Thailand
Paxson 2014",
xaxis = list(title = 'Permanent income',
range = c(-3000,3000)),
yaxis = list(title = 'Savings',
range = c(-5000,7500)))
Localized Loess regression and 95% standard error ribbon.
m <- augment(loess(save2 ~ inctrans,
data = d))
d %>%
plot_ly(x=~inctrans, y=~save2, opacity=0.25,
type='scatter', mode='markers') %>%
add_lines(y = ~fitted(loess(save2 ~ inctrans)),
line = list(color = 'rgba(255,0,0,1)'),
showlegend = FALSE) %>%
add_ribbons(data = m,
x = ~inctrans,
ymin = ~.fitted - 1.96 * .se.fit,
ymax = ~.fitted + 1.96 * .se.fit,
line = list(color = 'rgba(7,240,120,0.05)'),
fillcolor = 'rgba(7,240,120,0.2)',
showlegend = FALSE) %>%
layout(title = "Savings (including durables) vs 'Transient' Income
Response of Savings to Transitory Income in Thailand
Paxson 2014",
xaxis = list(title = 'Transient income',
range = c(-3000,3000)),
yaxis = list(title = 'Savings',
range = c(-5000,7500)))
Is the marginal propoensity to save out of each dollar of transitory income \(inctrans\) equal to the marginal propensity to save out of each dollar of permanent income \(incperm\)?
Can the null hypothesis be rejected?
linearHypothesis(mod.d, 'inctrans-incperm = 0')
## Linear hypothesis test
##
## Hypothesis:
## - incperm + inctrans = 0
##
## Model 1: restricted model
## Model 2: save2 ~ incperm + inctrans + incunexp + p0to5 + p6to11 + p12to17 +
## p18to64 + p65 + sd1 + sd2 + sd3 + sd4 + year.81 + year.86
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 4841 1.9538e+10
## 2 4840 1.9519e+10 1 19460285 4.8254 0.02809 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1