1 The hypothesis

Let \(Z\) indicate whether an individual holds CCP membership (1 if yes, 0 otherwise).
Let \(Y\) be the outcome of interest, which is household income or log household income.
Let \(Y^1\) be the potential outcome for an individual if he/she holds CCP membership (since the group of interest is the treated, this is the observed outcome), and let \(Y^0\) be the potential outcome for the same individual if he/she does not hold CCP membership.
The causal claim is that the Average Treatment Effect on the Treated (ATT) is positive, that is, holding CCP membership increases household income. The hypothesis is as follows:
\[ATT=E(Y^1-Y^0|Z=1)>0\]
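
Because \(Y^0\) is never observed for party members, the ATT cannot be computed directly from the data. The lines below sketch the usual identification argument behind propensity score matching. The covariate vector \(X\) and the propensity score \(e(X)=P(Z=1|X)\) are symbols introduced here for illustration, and the second line is an assumption (selection on observables), not something the data can confirm.
\[ATT=E(Y^1|Z=1)-E(Y^0|Z=1)\]
\[E(Y^0|Z=1,e(X))=E(Y^0|Z=0,e(X))\]
\[ATT=E\big[\,E(Y|Z=1,e(X))-E(Y|Z=0,e(X))\,\big|\,Z=1\big]\]
Under that assumption, the unobserved term \(E(Y^0|Z=1)\) can be replaced by the outcomes of non-members with similar propensity scores, averaged over the members.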

2 Pre-check and data cleaning

2.1 Use the CGSS 2010 data.

setwd('/Users/Tina/Documents/IPM/psm')
load('data/cgss2010.rdata')
attach(newdata)

2.2 Check the summary statistics of the dataset

summary(newdata)
##     hincome          CCPmember           male        age        
##  Min.   :      0   Min.   :0.0000   Min.   :0   Min.   :  17.0  
##  1st Qu.:   3000   1st Qu.:0.0000   1st Qu.:0   1st Qu.:  36.0  
##  Median :  10000   Median :0.0000   Median :0   Median :  46.0  
##  Mean   :  19211   Mean   :0.1242   Mean   :0   Mean   :  47.8  
##  3rd Qu.:  20000   3rd Qu.:0.0000   3rd Qu.:0   3rd Qu.:  58.0  
##  Max.   :6000000   Max.   :1.0000   Max.   :0   Max.   :2013.0  
##  NA's   :1625      NA's   :16                                   
##       race             edu            height          weight     
##  Min.   :-3.000   Min.   :1.000   Min.   :110.0   Min.   : 70.0  
##  1st Qu.: 1.000   1st Qu.:1.000   1st Qu.:158.0   1st Qu.:105.0  
##  Median : 1.000   Median :2.000   Median :164.0   Median :120.0  
##  Mean   : 1.456   Mean   :2.148   Mean   :163.9   Mean   :121.3  
##  3rd Qu.: 1.000   3rd Qu.:3.000   3rd Qu.:170.0   3rd Qu.:135.0  
##  Max.   : 8.000   Max.   :5.000   Max.   :193.0   Max.   :246.0  
##                   NA's   :15      NA's   :21      NA's   :26     
##      faEdu        faCCPmember        english         mandarin    
##  Min.   :1.000   Min.   :0.0000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:1.000   1st Qu.:2.000  
##  Median :1.000   Median :0.0000   Median :1.000   Median :3.000  
##  Mean   :1.443   Mean   :0.1607   Mean   :1.389   Mean   :3.074  
##  3rd Qu.:2.000   3rd Qu.:0.0000   3rd Qu.:2.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :1.0000   Max.   :5.000   Max.   :5.000  
##  NA's   :489     NA's   :222      NA's   :20      NA's   :22

The summary statistics show that the dataset contains some NA values. Also, the minimum of the variable race is -3, which is not a valid category and should be recoded as missing. Next, I check how many rows have missing values.

library("car")
## Loading required package: carData
race.recode <- recode(race,"-3 = NA") #recode -3 into NA
newdata$race <- race.recode
table(rowSums(is.na(newdata)))
## 
##    0    1    2    3    4    5    6   10 
## 9642 1884  207   41    4    1    1    3

The result shows that only 9,642 of the 11,783 observations have no missing values. For the following analysis, I delete all observations with any missing value.

data.nomissing <- newdata[complete.cases(newdata), ]
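
The recoding and listwise deletion above can be sketched on a toy data frame using base R only. The data frame `toy` and its values are made up for illustration, and -3 is treated as a missing-value code for `race`, as in the text.

```r
# Toy data (hypothetical values, not from CGSS)
toy <- data.frame(
  hincome = c(3000, NA, 20000, 8500),
  race    = c(1, -3, 1, 4),
  edu     = c(1, 2, NA, 3)
)

# Recode the sentinel value -3 to NA directly on the data frame column
toy$race[toy$race == -3] <- NA

# Number of missing values per row, then the distribution of that count
na_per_row <- rowSums(is.na(toy))
table(na_per_row)

# Keep only rows with no missing values (listwise deletion)
toy_complete <- toy[complete.cases(toy), ]
nrow(toy_complete)
```

Indexing the column through the data frame avoids depending on an attached copy, which does not change when the data frame is modified later.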

In the following analysis, I use the new dataset (named data.nomissing).

detach(newdata)
attach(data.nomissing)

I also check the type of each variable.

str(data.nomissing)
## 'data.frame':    9642 obs. of  12 variables:
##  $ hincome    : num  0 0 3000 20000 8500 4200 8500 18000 5000 0 ...
##  $ CCPmember  : num  0 0 1 0 0 0 0 1 0 0 ...
##  $ male       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ age        : num  39 62 58 47 41 37 29 76 46 39 ...
##  $ race       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ edu        : num  1 1 1 2 3 1 3 3 2 1 ...
##  $ height     : num  158 145 170 167 175 160 173 170 170 150 ...
##  $ weight     : num  140 100 112 150 130 110 145 105 155 85 ...
##  $ faEdu      : num  1 1 1 1 1 1 3 1 2 1 ...
##  $ faCCPmember: num  0 0 0 0 0 0 1 0 1 0 ...
##  $ english    : num  1 1 1 1 1 1 3 1 1 1 ...
##  $ mandarin   : num  3 2 2 2 2 2 5 1 2 2 ...

Some of the types are not correct: the categorical and ordinal variables are all stored as numeric. I convert these variables into the proper types.

data.nomissing$male <- as.factor(male)
data.nomissing$race <- as.factor(race)
data.nomissing$edu <- as.ordered(edu)
data.nomissing$faEdu <- as.ordered(faEdu)
data.nomissing$faCCPmember <- as.factor(faCCPmember)
data.nomissing$english <- as.ordered(english)
data.nomissing$mandarin <- as.ordered(mandarin)
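
Declaring edu, faEdu, english, and mandarin as ordered factors changes how R codes them in a model: by default, ordered factors get orthogonal polynomial contrasts, which is why terms such as edu.L, edu.Q, edu.C, and edu^4 appear in the logit output in Section 3. A small sketch with a made-up five-level factor `x` (an illustrative name, not a variable in the dataset):

```r
# A made-up ordered factor with five levels, like edu (values are illustrative)
x <- factor(c(1, 2, 3, 4, 5, 3, 2), levels = 1:5, ordered = TRUE)

# Default contrasts for an ordered factor with five levels:
# linear, quadratic, cubic, and quartic orthogonal polynomials
cp <- contr.poly(5)
round(cp, 4)

# The design-matrix columns carry the .L, .Q, .C, ^4 suffixes
colnames(model.matrix(~ x))
```

The linear column takes the values ±0.6325 and ±0.3162 at the outer levels, which are the same magnitudes that show up in the eQQ columns of the balance tables for these variables.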

2.3 Treatment variable: CCPmember

I want to get the frequency table of CCPmember first.

count <- table(CCPmember)
perc <- prop.table(count)
CCPmemer.table <- cbind(count,perc)
CCPmemer.table
##   count      perc
## 0  8384 0.8695291
## 1  1258 0.1304709

The result indicates that the number of observations in the treatment group (i.e., with CCP membership) is much smaller than the number in the control group (1,258 vs. 8,384).

2.4 Outcome variable: hincome

I want to know the distribution of the outcome variable, household income.

summary(hincome)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3000   10000   19332   20000 6000000
par(mfrow=c(1,2))
hist(hincome, main = "Histogram of household income", xlab = "")
boxplot(hincome, main = "Boxplot of household income")

#check how many observations have income equal to 0
noincome <- (hincome ==0)
count <- table(noincome)
perc <- prop.table(count)
noicome.table <- cbind(count,perc)
noicome.table
##       count      perc
## FALSE  8542 0.8859158
## TRUE   1100 0.1140842

The summary statistics, the histogram, and the boxplot all show that the distribution of household income is extremely positively skewed, with an outlier whose income is 6,000,000 dollars. Also, a high percentage (11.4%) of observations have income equal to zero. These issues need to be dealt with before estimating the effect of CCP membership on household income.
First, I try taking the log. To deal with the zeros, I add 1 to each value.

lnhincome <- log(hincome+1)
data.nomissing <- cbind(data.nomissing,lnhincome)
par(mfrow=c(1,2))
hist(lnhincome, main = "Histogram of log household income", xlab = "")
boxplot(lnhincome, main = "Boxplot of log household income")

The histogram shows that the distribution of log income is much closer to a normal distribution than the original one. However, the zeros are still a problem. One explanation is that the income of these observations is not really zero but is another form of missing data. Because this variable is income rather than earnings or salary, a household (rather than an individual) with no income at all is implausible; it should at least have something like rental income, subsidies, or pensions. Hence, I exclude the observations with zero income from the following analysis.

data.nomissing <- data.nomissing[hincome != 0, ]
detach(data.nomissing)
attach(data.nomissing)
## The following object is masked _by_ .GlobalEnv:
## 
##     lnhincome
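
The masking message above appears because a global copy of lnhincome exists alongside the attached column. A minimal sketch of the same transformation and filter written without attach(), so that every column is read from the data frame itself; the data frame `d` and its values are made up, and `log1p(x)` is base R's equivalent of `log(x + 1)`.

```r
# Made-up incomes, including zeros (not CGSS data)
d <- data.frame(hincome = c(0, 3000, 10000, 0, 20000))

# log(x + 1) keeps zero incomes defined: log1p(0) is 0
d$lnhincome <- log1p(d$hincome)

# Drop zero-income rows by indexing on the data frame's own column
d_pos <- d[d$hincome != 0, ]
d_pos
```

Written this way, the filtered data frame keeps its own lnhincome column in step with hincome, and nothing needs to be detached and re-attached.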

3 Estimate propensity score

Estimate propensity score by using logit model:

#male takes a single value (0) for every observation in this dataset, so I do not include it.
ps1 <- glm(CCPmember ~ age + race + edu + height + weight + english +
  mandarin + faEdu + faCCPmember, family = binomial, data = data.nomissing)
summary(ps1)
## 
## Call:
## glm(formula = CCPmember ~ age + race + edu + height + weight + 
##     english + mandarin + faEdu + faCCPmember, family = binomial, 
##     data = data.nomissing)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2200  -0.5060  -0.3148  -0.1830   3.3160  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -16.358147  35.766410  -0.457  0.64741    
## age            0.062913   0.002979  21.122  < 2e-16 ***
## race2          0.689317   0.508335   1.356  0.17509    
## race3         -0.376056   0.414123  -0.908  0.36384    
## race4          0.050520   0.268938   0.188  0.85099    
## race5          1.497205   0.476193   3.144  0.00167 ** 
## race6         -0.099728   0.408653  -0.244  0.80720    
## race7          0.258124   0.421621   0.612  0.54039    
## race8          0.340691   0.217983   1.563  0.11807    
## edu.L          3.781815   0.220059  17.185  < 2e-16 ***
## edu.Q          0.497228   0.170102   2.923  0.00347 ** 
## edu.C          0.046574   0.112626   0.414  0.67922    
## edu^4         -0.145172   0.075862  -1.914  0.05567 .  
## height         0.051731   0.005900   8.768  < 2e-16 ***
## weight         0.002804   0.001927   1.455  0.14576    
## english.L     -0.525491   0.360648  -1.457  0.14510    
## english.Q     -0.304533   0.296095  -1.028  0.30372    
## english.C      0.172648   0.236303   0.731  0.46501    
## english^4      0.319314   0.161395   1.978  0.04788 *  
## mandarin.L     0.297112   0.131854   2.253  0.02424 *  
## mandarin.Q    -0.329774   0.104231  -3.164  0.00156 ** 
## mandarin.C     0.046905   0.092468   0.507  0.61198    
## mandarin^4     0.075792   0.074111   1.023  0.30646    
## faEdu.L       -8.653729 113.066296  -0.077  0.93899    
## faEdu.Q       -6.886321  95.558459  -0.072  0.94255    
## faEdu.C       -4.087590  56.533230  -0.072  0.94236    
## faEdu^4       -1.772875  21.367831  -0.083  0.93388    
## faCCPmember1   0.351880   0.090010   3.909 9.25e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6998.2  on 8541  degrees of freedom
## Residual deviance: 5349.5  on 8514  degrees of freedom
## AIC: 5405.5
## 
## Number of Fisher Scoring iterations: 12

The following lines add the fitted values (the estimated propensity scores) to the data frame.

pscore <- ps1$fitted.values
data.nomissing <- cbind(data.nomissing,pscore)
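
The fitted values of a binomial glm are the estimated propensity scores, that is, the inverse logit of the linear predictor. The sketch below uses simulated data (the data frame `sim`, the covariate `edu_years`, and the coefficients are assumptions for illustration) to show that `fitted.values`, `predict(type = "response")`, and `plogis()` applied to the linear predictor agree.

```r
set.seed(1)
n <- 500

# Simulated covariates and treatment (illustrative, not CGSS)
sim <- data.frame(age = rnorm(n, 48, 14), edu_years = rnorm(n, 9, 3))
p_true <- plogis(-4 + 0.03 * sim$age + 0.15 * sim$edu_years)
sim$treat <- rbinom(n, 1, p_true)

# Logit model for treatment assignment
fit <- glm(treat ~ age + edu_years, family = binomial, data = sim)

# Three equivalent ways to obtain the estimated propensity score
ps_fitted  <- unname(fit$fitted.values)
ps_predict <- unname(predict(fit, type = "response"))
ps_manual  <- unname(plogis(predict(fit, type = "link")))

# Store the score as a column of the data frame
sim$pscore <- ps_fitted
range(sim$pscore)
```

Assigning the score as a column (`sim$pscore <- ...`) keeps it aligned with the rows used in the fit, provided no rows were dropped for missing values.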

4 Estimate ATT

Estimate the Average Treatment Effect on the Treated (ATT):

4.1 With 1-to-1 nearest-neighbor matching

library("Matching")
## Loading required package: MASS
## ## 
## ##  Matching (Version 4.9-6, Build Date: 2019-04-07)
## ##  See http://sekhon.berkeley.edu/matching for additional documentation.
## ##  Please cite software as:
## ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ##   Software with Automated Balance Optimization: The Matching package for R.''
## ##   Journal of Statistical Software, 42(7): 1-52. 
## ##
psm1 <- Match(Y = hincome, Tr = CCPmember, X = pscore, estimand = "ATT", 
                M = 1, replace = TRUE)
summary(psm1)
## 
## Estimate...  -204.34 
## AI SE......  6013.6 
## T-stat.....  -0.03398 
## p.val......  0.97289 
## 
## Original number of observations..............  8542 
## Original number of treated obs...............  1218 
## Matched number of observations...............  1218 
## Matched number of observations  (unweighted).  14791

The point estimate is negative (\(~\beta=-204.34\)) but not significantly different from zero (\(~p=0.97289\)).

4.2 With 1-to-5 nearest-neighbor matching

psm2 <- Match(Y = hincome, Tr = CCPmember, X = pscore, estimand = "ATT", 
                M = 5, replace = TRUE)
summary(psm2)
## 
## Estimate...  2808.9 
## AI SE......  4610 
## T-stat.....  0.60931 
## p.val......  0.54232 
## 
## Original number of observations..............  8542 
## Original number of treated obs...............  1218 
## Matched number of observations...............  1218 
## Matched number of observations  (unweighted).  17421

The point estimate is positive (\(~\beta=2808.9\)) but still not significantly different from zero (\(~p=0.54232\)).
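
To make the estimator concrete, here is a simplified base-R sketch of M-nearest-neighbor matching with replacement on a scalar score. The function `att_nn` and the toy vectors are hypothetical and written for illustration; this is not how Match() is implemented, it breaks ties by order, and it computes no standard error.

```r
# ATT by M-nearest-neighbor matching with replacement on a scalar score.
# Simplified: ties are broken by order, and no standard error is computed.
att_nn <- function(y, tr, ps, M = 1) {
  treated  <- which(tr == 1)
  controls <- which(tr == 0)
  effects <- vapply(treated, function(i) {
    d <- abs(ps[controls] - ps[i])             # distance on the propensity score
    nearest <- controls[order(d)[seq_len(M)]]  # the M closest controls
    y[i] - mean(y[nearest])                    # treated outcome minus matched mean
  }, numeric(1))
  mean(effects)
}

# Tiny made-up example: two treated units, three controls
y  <- c(10, 12, 5, 6, 9)
tr <- c(1, 1, 0, 0, 0)
ps <- c(0.60, 0.30, 0.58, 0.25, 0.40)

att_nn(y, tr, ps, M = 1)
att_nn(y, tr, ps, M = 2)
```

Unlike this sketch, Match() by default keeps all tied controls and weights them (its `ties` argument), which probably explains why the unweighted matched counts reported above are much larger than M times the 1,218 treated observations.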


5 Check balance and common support

5.1 Comparing the means

I check balance for the case of 1-to-5 nearest-neighbor matching.

library("MatchIt")
match1 <- matchit(CCPmember ~ age + race + edu + height + weight + english +
  mandarin + faEdu + faCCPmember, data = data.nomissing, 
  method = "nearest", ratio=5, replace = TRUE)
summary(match1)
## 
## Call:
## matchit(formula = CCPmember ~ age + race + edu + height + weight + 
##     english + mandarin + faEdu + faCCPmember, data = data.nomissing, 
##     method = "nearest", ratio = 5, replace = TRUE)
## 
## Summary of balance for all data:
##              Means Treated Means Control SD Control Mean Diff eQQ Med
## distance            0.3344        0.1107     0.1305    0.2237  0.2195
## age                52.0722       46.9177    14.5738    5.1546  5.0000
## race1               0.9245        0.9035     0.2953    0.0210  0.0000
## race2               0.0057        0.0031     0.0560    0.0026  0.0000
## race3               0.0074        0.0085     0.0916   -0.0011  0.0000
## race4               0.0164        0.0216     0.1453   -0.0052  0.0000
## race5               0.0074        0.0035     0.0595    0.0038  0.0000
## race6               0.0066        0.0112     0.1052   -0.0046  0.0000
## race7               0.0074        0.0089     0.0938   -0.0015  0.0000
## race8               0.0246        0.0397     0.1953   -0.0151  0.0000
## edu.L               0.0096       -0.3085     0.3193    0.3181  0.3162
## edu.Q              -0.2166       -0.0077     0.4390   -0.2089  0.0000
## edu.C              -0.1581        0.0215     0.4600   -0.1796  0.0000
## edu^4              -0.0762       -0.0312     0.4441   -0.0451  0.0000
## height            167.7767      163.9440     7.8356    3.8327  4.0000
## weight            131.5189      121.5943    22.2328    9.9245 10.0000
## english.L          -0.4455       -0.5328     0.2080    0.0872  0.0000
## english.Q           0.1791        0.3397     0.3666   -0.1606  0.0000
## english.C          -0.0436       -0.1521     0.3487    0.1085  0.0000
## english^4           0.0400        0.0529     0.2750   -0.0129  0.0000
## mandarin.L          0.1643       -0.0021     0.3916    0.1665  0.3162
## mandarin.Q         -0.1420       -0.1247     0.4310   -0.0173  0.0000
## mandarin.C         -0.0314        0.0008     0.4315   -0.0322  0.0000
## mandarin^4          0.0984        0.0758     0.5035    0.0226  0.0000
## faEdu.L            -0.4577       -0.5092     0.2395    0.0515  0.0000
## faEdu.Q             0.2530        0.3117     0.3907   -0.0587  0.0000
## faEdu.C            -0.1607       -0.1638     0.3395    0.0031  0.0000
## faEdu^4             0.0425        0.0686     0.2947   -0.0261  0.0000
## faCCPmember1        0.2537        0.1509     0.3580    0.1028  0.0000
##              eQQ Mean eQQ Max
## distance       0.2235  0.4163
## age            5.1667  8.0000
## race1          0.0213  1.0000
## race2          0.0025  1.0000
## race3          0.0016  1.0000
## race4          0.0057  1.0000
## race5          0.0033  1.0000
## race6          0.0049  1.0000
## race7          0.0016  1.0000
## race8          0.0156  1.0000
## edu.L          0.3180  0.6325
## edu.Q          0.2087  0.8018
## edu.C          0.1797  0.6325
## edu^4          0.1393  0.5976
## height         3.8539 19.0000
## weight         9.9639 26.0000
## english.L      0.0867  0.3162
## english.Q      0.1606  0.8018
## english.C      0.1132  0.9487
## english^4      0.1040  0.5976
## mandarin.L     0.1664  0.3162
## mandarin.Q     0.0173  0.8018
## mandarin.C     0.0602  0.3162
## mandarin^4     0.0226  0.5976
## faEdu.L        0.0519  0.3162
## faEdu.Q        0.0586  0.8018
## faEdu.C        0.0296  0.6325
## faEdu^4        0.0407  0.5976
## faCCPmember1   0.1026  1.0000
## 
## 
## Summary of balance for matched data:
##              Means Treated Means Control SD Control Mean Diff eQQ Med
## distance            0.3344        0.3341     0.2257    0.0003  0.1154
## age                52.0722       51.7644    15.6361    0.3079  2.0000
## race1               0.9245        0.9305     0.2543   -0.0061  0.0000
## race2               0.0057        0.0043     0.0652    0.0015  0.0000
## race3               0.0074        0.0064     0.0798    0.0010  0.0000
## race4               0.0164        0.0176     0.1314   -0.0011  0.0000
## race5               0.0074        0.0038     0.0614    0.0036  0.0000
## race6               0.0066        0.0067     0.0818   -0.0002  0.0000
## race7               0.0074        0.0094     0.0963   -0.0020  0.0000
## race8               0.0246        0.0213     0.1446    0.0033  0.0000
## edu.L               0.0096        0.0138     0.3395   -0.0042  0.0000
## edu.Q              -0.2166       -0.2261     0.3301    0.0096  0.0000
## edu.C              -0.1581       -0.1501     0.4757   -0.0080  0.0000
## edu^4              -0.0762       -0.0685     0.5208   -0.0078  0.0000
## height            167.7767      168.1281     7.7595   -0.3514  1.0000
## weight            131.5189      132.5445    24.6256   -1.0256  4.0000
## english.L          -0.4455       -0.4410     0.2695   -0.0045  0.0000
## english.Q           0.1791        0.1794     0.4444   -0.0003  0.0000
## english.C          -0.0436       -0.0423     0.4041   -0.0014  0.0000
## english^4           0.0400        0.0442     0.3693   -0.0041  0.0000
## mandarin.L          0.1643        0.1712     0.3494   -0.0069  0.0000
## mandarin.Q         -0.1420       -0.1301     0.4340   -0.0118  0.0000
## mandarin.C         -0.0314       -0.0236     0.4221   -0.0078  0.0000
## mandarin^4          0.0984        0.0983     0.5052    0.0001  0.0000
## faEdu.L            -0.4577       -0.4421     0.3040   -0.0156  0.0000
## faEdu.Q             0.2530        0.2348     0.4238    0.0182  0.0000
## faEdu.C            -0.1607       -0.1594     0.3687   -0.0013  0.0000
## faEdu^4             0.0425        0.0379     0.3388    0.0046  0.0000
## faCCPmember1        0.2537        0.2542     0.4355   -0.0005  0.0000
##              eQQ Mean eQQ Max
## distance       0.1267  0.2538
## age            2.0082  6.0000
## race1          0.0025  1.0000
## race2          0.0016  1.0000
## race3          0.0033  1.0000
## race4          0.0000  0.0000
## race5          0.0033  1.0000
## race6          0.0025  1.0000
## race7          0.0008  1.0000
## race8          0.0041  1.0000
## edu.L          0.1337  0.3162
## edu.Q          0.0391  0.8018
## edu.C          0.1355  0.6325
## edu^4          0.0569  0.5976
## height         1.3662  4.0000
## weight         4.0599 26.0000
## english.L      0.0382  0.3162
## english.Q      0.0709  0.8018
## english.C      0.0467  0.6325
## english^4      0.0456  0.5976
## mandarin.L     0.0576  0.3162
## mandarin.Q     0.0103  0.8018
## mandarin.C     0.0223  0.3162
## mandarin^4     0.0064  0.5976
## faEdu.L        0.0184  0.3162
## faEdu.Q        0.0160  0.8018
## faEdu.C        0.0091  0.6325
## faEdu^4        0.0177  0.5976
## faCCPmember1   0.0378  1.0000
## 
## Percent Balance Improvement:
##              Mean Diff.  eQQ Med  eQQ Mean  eQQ Max
## distance        99.8549  47.4178   43.3413  39.0198
## age             94.0270  60.0000   61.1314  25.0000
## race1           71.0665   0.0000   88.4615   0.0000
## race2           43.3078   0.0000   33.3333   0.0000
## race3            8.4500   0.0000 -100.0000   0.0000
## race4           77.6921   0.0000  100.0000 100.0000
## race5            5.9052   0.0000    0.0000   0.0000
## race6           96.4519   0.0000   50.0000   0.0000
## race7          -32.6211   0.0000   50.0000   0.0000
## race8           78.2538   0.0000   73.6842   0.0000
## edu.L           98.6943 100.0000   57.9592  50.0000
## edu.Q           95.4205 100.0000   81.2829   0.0000
## edu.C           95.5180   0.0000   24.5665   0.0000
## edu^4           82.7960 100.0000   59.1549   0.0000
## height          90.8316  75.0000   64.5505  78.9474
## weight          89.6659  60.0000   59.2535   0.0000
## english.L       94.8215   0.0000   55.9880   0.0000
## english.Q       99.8088   0.0000   55.8743   0.0000
## english.C       98.7551   0.0000   58.7156  33.3333
## english^4       68.0573   0.0000   56.1321   0.0000
## mandarin.L      95.8824 100.0000   65.3666   0.0000
## mandarin.Q      31.4235   0.0000   40.5063   0.0000
## mandarin.C      75.6439   0.0000   62.9310   0.0000
## mandarin^4      99.5665   0.0000   71.7391   0.0000
## faEdu.L         69.7285   0.0000   64.5000   0.0000
## faEdu.Q         69.0340   0.0000   72.6592   0.0000
## faEdu.C         58.1658   0.0000   69.2982   0.0000
## faEdu^4         82.3399   0.0000   56.6265   0.0000
## faCCPmember1    99.5209   0.0000   63.2000   0.0000
## 
## Sample sizes:
##           Control Treated
## All          7324    1218
## Matched      2653    1218
## Unmatched    4671       0
## Discarded       0       0
library("cobalt")
## 
## Attaching package: 'cobalt'
## The following object is masked from 'package:MatchIt':
## 
##     lalonde
bal.tab(match1)
## Call
##  matchit(formula = CCPmember ~ age + race + edu + height + weight + 
##     english + mandarin + faEdu + faCCPmember, data = data.nomissing, 
##     method = "nearest", ratio = 5, replace = TRUE)
## 
## Balance Measures
##                 Type Diff.Adj
## distance    Distance   0.0014
## age          Contin.   0.0202
## race_1        Binary  -0.0061
## race_2        Binary   0.0015
## race_3        Binary   0.0010
## race_4        Binary  -0.0011
## race_5        Binary   0.0036
## race_6        Binary  -0.0002
## race_7        Binary  -0.0020
## race_8        Binary   0.0033
## edu_1         Binary   0.0094
## edu_2         Binary  -0.0026
## edu_3         Binary  -0.0107
## edu_4         Binary   0.0049
## edu_5         Binary  -0.0010
## height       Contin.  -0.0498
## weight       Contin.  -0.0464
## english_1     Binary   0.0026
## english_2     Binary   0.0026
## english_3     Binary  -0.0028
## english_4     Binary   0.0015
## english_5     Binary  -0.0039
## mandarin_1    Binary   0.0005
## mandarin_2    Binary   0.0003
## mandarin_3    Binary   0.0064
## mandarin_4    Binary   0.0059
## mandarin_5    Binary  -0.0131
## faEdu_1       Binary   0.0205
## faEdu_2       Binary  -0.0030
## faEdu_3       Binary  -0.0064
## faEdu_4       Binary  -0.0112
## faEdu_5       Binary   0.0000
## faCCPmember   Binary  -0.0005
## 
## Sample sizes
##           Control Treated
## All          7324    1218
## Matched      2653    1218
## Unmatched    4671       0
love.plot(match1, abs = F)

The plot shows that before matching (unadjusted), the standardized mean differences are large for several covariates (even though none of them is greater than 1.96). After matching (adjusted), the standardized mean differences are small, with all absolute values below 0.1.
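
As a worked check of what a standardized mean difference is, the sketch below recomputes it for age from numbers printed in this report. It assumes the ATT convention of dividing by the treated-group standard deviation (other denominators are also in use); the helper `smd_att` is a hypothetical name, and the treated SD is backed out from the control SD and the variance ratio (Tr/Co) that MatchBalance reports for age.

```r
# Standardized mean difference with the treated-group SD as denominator
# (assumed ATT convention; other denominators are also in use)
smd_att <- function(mean_t, mean_c, sd_t) (mean_t - mean_c) / sd_t

# Age, using numbers printed in this report
sd_control_age <- 14.5738   # "SD Control", all data (matchit summary)
var_ratio_age  <- 1.0952    # "var ratio (Tr/Co)", before matching (MatchBalance)
sd_treated_age <- sqrt(var_ratio_age) * sd_control_age

smd_before <- smd_att(52.0722, 46.9177, sd_treated_age)  # all data
smd_after  <- smd_att(52.0722, 51.7644, sd_treated_age)  # matched data (matchit)

round(c(before = smd_before, after = smd_after), 4)
```

Up to rounding, the "after" value agrees with the Diff.Adj of 0.0202 that bal.tab reports for age, and 100 times the "before" value agrees with the 33.796 that MatchBalance reports as the std mean diff before matching.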
For a more formal check, I run the command MatchBalance on the 1-to-1 match (psm1).

## I tried nboots = 5000, but unfortunately my poor laptop could hardly run it, haha!
balance1 <- MatchBalance(CCPmember ~ age + race + edu + height + weight + english +
  mandarin + faEdu + faCCPmember,
  match.out = psm1, nboots = 1000, data = data.nomissing)
## 
## ***** (V1) age *****
##                        Before Matching        After Matching
## mean treatment........     52.072             52.072 
## mean control..........     46.918             51.564 
## std mean diff.........     33.796             3.3349 
## 
## mean raw eQQ diff.....     5.1667             1.0741 
## med  raw eQQ diff.....          5                  1 
## max  raw eQQ diff.....          8                  9 
## 
## mean eCDF diff........   0.067003           0.013934 
## med  eCDF diff........   0.060643          0.0087891 
## max  eCDF diff........    0.15073           0.059631 
## 
## var ratio (Tr/Co).....     1.0952            0.97383 
## T-test p-value........ < 2.22e-16            0.39098 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.15073           0.059631 
## 
## 
## ***** (V2) race2 *****
##                        Before Matching        After Matching
## mean treatment........  0.0057471          0.0057471 
## mean control..........  0.0031404          0.0040935 
## std mean diff.........     3.4471             2.1867 
## 
## mean raw eQQ diff.....  0.0024631          0.0029748 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0013034          0.0014874 
## med  eCDF diff........  0.0013034          0.0014874 
## max  eCDF diff........  0.0026068          0.0029748 
## 
## var ratio (Tr/Co).....     1.8265             1.4016 
## T-test p-value........    0.24962            0.56078 
## 
## 
## ***** (V3) race3 *****
##                        Before Matching        After Matching
## mean treatment........  0.0073892          0.0073892 
## mean control..........  0.0084653           0.009978 
## std mean diff.........    -1.2561            -3.0216 
## 
## mean raw eQQ diff.....   0.001642          0.0025015 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........ 0.00053808          0.0012508 
## med  eCDF diff........ 0.00053808          0.0012508 
## max  eCDF diff........  0.0010762          0.0025015 
## 
## var ratio (Tr/Co).....    0.87442            0.74248 
## T-test p-value........    0.68787            0.49302 
## 
## 
## ***** (V4) race4 *****
##                        Before Matching        After Matching
## mean treatment........    0.01642            0.01642 
## mean control..........   0.021573           0.018772 
## std mean diff.........    -4.0527            -1.8495 
## 
## mean raw eQQ diff.....  0.0057471          0.0068961 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0025763           0.003448 
## med  eCDF diff........  0.0025763           0.003448 
## max  eCDF diff........  0.0051525          0.0068961 
## 
## var ratio (Tr/Co).....    0.76569            0.87683 
## T-test p-value........    0.20001            0.65831 
## 
## 
## ***** (V5) race5 *****
##                        Before Matching        After Matching
## mean treatment........  0.0073892          0.0073892 
## mean control..........    0.00355          0.0028792 
## std mean diff.........      4.481             5.2638 
## 
## mean raw eQQ diff.....  0.0032841          0.0016902 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0019196         0.00084511 
## med  eCDF diff........  0.0019196         0.00084511 
## max  eCDF diff........  0.0038392          0.0016902 
## 
## var ratio (Tr/Co).....     2.0749             2.5547 
## T-test p-value........    0.13262            0.12026 
## 
## 
## ***** (V6) race6 *****
##                        Before Matching        After Matching
## mean treatment........  0.0065681          0.0065681 
## mean control..........   0.011196          0.0053493 
## std mean diff.........    -5.7269             1.5082 
## 
## mean raw eQQ diff.....  0.0049261         0.00047326 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.002314         0.00023663 
## med  eCDF diff........   0.002314         0.00023663 
## max  eCDF diff........  0.0046279         0.00047326 
## 
## var ratio (Tr/Co).....     0.5898             1.2263 
## T-test p-value........   0.077679            0.69685 
## 
## 
## ***** (V7) race7 *****
##                        Before Matching        After Matching
## mean treatment........  0.0073892          0.0073892 
## mean control..........  0.0088749           0.012637 
## std mean diff.........    -1.7341            -6.1248 
## 
## mean raw eQQ diff.....   0.001642          0.0060172 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........ 0.00074288          0.0030086 
## med  eCDF diff........ 0.00074288          0.0030086 
## max  eCDF diff........  0.0014858          0.0060172 
## 
## var ratio (Tr/Co).....    0.83441            0.58785 
## T-test p-value........    0.58058            0.19555 
## 
## 
## ***** (V8) race8 *****
##                        Before Matching        After Matching
## mean treatment........   0.024631           0.024631 
## mean control..........   0.039732           0.021889 
## std mean diff.........    -9.7394             1.7677 
## 
## mean raw eQQ diff.....   0.015599          0.0086539 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0075509           0.004327 
## med  eCDF diff........  0.0075509           0.004327 
## max  eCDF diff........   0.015102          0.0086539 
## 
## var ratio (Tr/Co).....    0.63009             1.1221 
## T-test p-value........  0.0025328            0.65361 
## 
## 
## ***** (V9) edu.L *****
##                        Before Matching        After Matching
## mean treatment........  0.0096063          0.0096063 
## mean control..........   -0.30854           0.016892 
## std mean diff.........     92.238            -2.1123 
## 
## mean raw eQQ diff.....    0.31805          0.0089795 
## med  raw eQQ diff.....    0.31623                  0 
## max  raw eQQ diff.....    0.63246            0.31623 
## 
## mean eCDF diff........    0.20121          0.0056791 
## med  eCDF diff........    0.26148          0.0060172 
## max  eCDF diff........    0.39828           0.012913 
## 
## var ratio (Tr/Co).....      1.167             1.0028 
## T-test p-value........ < 2.22e-16            0.46286 
## KS Bootstrap p-value.. < 2.22e-16               0.04 
## KS Naive p-value...... < 2.22e-16            0.16967 
## KS Statistic..........    0.39828           0.012913 
## 
## 
## ***** (V10) edu.Q *****
##                        Before Matching        After Matching
## mean treatment........   -0.21657           -0.21657 
## mean control.......... -0.0076631           -0.21696 
## std mean diff.........    -62.214            0.11508 
## 
## mean raw eQQ diff.....    0.20867           0.011709 
## med  raw eQQ diff..... 3.3307e-16                  0 
## max  raw eQQ diff.....    0.80178            0.80178 
## 
## mean eCDF diff........    0.17109           0.006051 
## med  eCDF diff........    0.15673          0.0052397 
## max  eCDF diff........    0.37091           0.013725 
## 
## var ratio (Tr/Co).....    0.58515             1.0091 
## T-test p-value........ < 2.22e-16            0.97554 
## KS Bootstrap p-value.. < 2.22e-16              0.026 
## KS Naive p-value...... < 2.22e-16             0.1233 
## KS Statistic..........    0.37091           0.013725 
## 
## 
## ***** (V11) edu.C *****
##                        Before Matching        After Matching
## mean treatment........   -0.15811           -0.15811 
## mean control..........   0.021459           -0.15879 
## std mean diff.........      -37.8            0.14197 
## 
## mean raw eQQ diff.....    0.17966            0.02076 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.63246            0.63246 
## 
## mean eCDF diff........    0.11357            0.01313 
## med  eCDF diff........    0.10943           0.018119 
## max  eCDF diff........    0.29156           0.021567 
## 
## var ratio (Tr/Co).....     1.0667            0.99459 
## T-test p-value........ < 2.22e-16            0.96873 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16          0.0020564 
## KS Statistic..........    0.29156           0.021567 
## 
## 
## ***** (V12) edu^4 *****
##                        Before Matching        After Matching
## mean treatment........  -0.076247          -0.076247 
## mean control..........  -0.031186          -0.080147 
## std mean diff.........    -8.7472            0.75704 
## 
## mean raw eQQ diff.....    0.13935           0.011354 
## med  raw eQQ diff.....  9.992e-16                  0 
## max  raw eQQ diff.....    0.59761            0.59761 
## 
## mean eCDF diff........    0.12648          0.0055304 
## med  eCDF diff........    0.10673           0.003448 
## max  eCDF diff........    0.29156           0.016361 
## 
## var ratio (Tr/Co).....     1.3458              1.004 
## T-test p-value........  0.0040318            0.84836 
## KS Bootstrap p-value.. < 2.22e-16              0.007 
## KS Naive p-value...... < 2.22e-16           0.038148 
## KS Statistic..........    0.29156           0.016361 
## 
## 
## ***** (V13) height *****
##                        Before Matching        After Matching
## mean treatment........     167.78             167.78 
## mean control..........     163.94              168.3 
## std mean diff.........     54.349            -7.4255 
## 
## mean raw eQQ diff.....     3.8539             0.8184 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....         19                  4 
## 
## mean eCDF diff........   0.068411           0.015671 
## med  eCDF diff........   0.029911          0.0093638 
## max  eCDF diff........    0.23164           0.062673 
## 
## var ratio (Tr/Co).....    0.80998            0.83895 
## T-test p-value........ < 2.22e-16           0.065974 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.23164           0.062673 
## 
## 
## ***** (V14) weight *****
##                        Before Matching        After Matching
## mean treatment........     131.52             131.52 
## mean control..........     121.59             133.38 
## std mean diff.........     44.862            -8.4079 
## 
## mean raw eQQ diff.....     9.9639             1.0723 
## med  raw eQQ diff.....         10                  0 
## max  raw eQQ diff.....         26                 30 
## 
## mean eCDF diff........   0.078543          0.0080914 
## med  eCDF diff........   0.054683          0.0076736 
## max  eCDF diff........    0.20151           0.029207 
## 
## var ratio (Tr/Co).....    0.99007            0.76899 
## T-test p-value........ < 2.22e-16           0.044873 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         6.6277e-06 
## KS Statistic..........    0.20151           0.029207 
## 
## 
## ***** (V15) english.L *****
##                        Before Matching        After Matching
## mean treatment........   -0.44552           -0.44552 
## mean control..........   -0.53276           -0.43368 
## std mean diff.........     33.317            -4.5225 
## 
## mean raw eQQ diff.....   0.086716           0.010391 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.31623            0.31623 
## 
## mean eCDF diff........   0.055174          0.0065716 
## med  eCDF diff........   0.010158           0.006558 
## max  eCDF diff........    0.17688           0.014874 
## 
## var ratio (Tr/Co).....     1.5843            0.86419 
## T-test p-value........ < 2.22e-16            0.26526 
## KS Bootstrap p-value.. < 2.22e-16              0.003 
## KS Naive p-value...... < 2.22e-16           0.075837 
## KS Statistic..........    0.17688           0.014874 
## 
## 
## ***** (V16) english.Q *****
##                        Before Matching        After Matching
## mean treatment........    0.17905            0.17905 
## mean control..........     0.3397               0.18 
## std mean diff.........    -36.211            -0.2136 
## 
## mean raw eQQ diff.....    0.16062          0.0094683 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.80178            0.80178 
## 
## mean eCDF diff........   0.084037          0.0042762 
## med  eCDF diff........   0.080664          0.0028734 
## max  eCDF diff........    0.17482           0.011358 
## 
## var ratio (Tr/Co).....     1.4646            0.99311 
## T-test p-value........ < 2.22e-16            0.95596 
## KS Bootstrap p-value.. < 2.22e-16              0.028 
## KS Naive p-value...... < 2.22e-16            0.29575 
## KS Statistic..........    0.17482           0.011358 
## 
## 
## ***** (V17) english.C *****
##                        Before Matching        After Matching
## mean treatment........  -0.043618          -0.043618 
## mean control..........   -0.15207          -0.049119 
## std mean diff.........      26.72             1.3554 
## 
## mean raw eQQ diff.....     0.1132            0.01022 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.94868            0.63246 
## 
## mean eCDF diff........    0.07183          0.0064634 
## med  eCDF diff........   0.090107          0.0069637 
## max  eCDF diff........    0.16878           0.011832 
## 
## var ratio (Tr/Co).....     1.3548             1.0211 
## T-test p-value........ < 2.22e-16            0.72978 
## KS Bootstrap p-value.. < 2.22e-16               0.02 
## KS Naive p-value...... < 2.22e-16            0.25174 
## KS Statistic..........    0.16878           0.011832 
## 
## 
## ***** (V18) english^4 *****
##                        Before Matching        After Matching
## mean treatment........   0.040037           0.040037 
## mean control..........    0.05294           0.047825 
## std mean diff.........    -3.4949            -2.1094 
## 
## mean raw eQQ diff.....    0.10402          0.0067879 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.59761            0.59761 
## 
## mean eCDF diff........   0.052319          0.0038537 
## med  eCDF diff........   0.076615          0.0030424 
## max  eCDF diff........   0.098205           0.010006 
## 
## var ratio (Tr/Co).....     1.8018            0.99889 
## T-test p-value........    0.24338            0.60318 
## KS Bootstrap p-value.. < 2.22e-16              0.025 
## KS Naive p-value......  3.571e-09            0.44952 
## KS Statistic..........   0.098205           0.010006 
## 
## 
## ***** (V19) mandarin.L *****
##                        Before Matching        After Matching
## mean treatment........    0.16434            0.16434 
## mean control.......... -0.0021157            0.16801 
## std mean diff.........     48.059            -1.0594 
## 
## mean raw eQQ diff.....    0.16642           0.016591 
## med  raw eQQ diff.....    0.31623                  0 
## max  raw eQQ diff.....    0.31623            0.31623 
## 
## mean eCDF diff........    0.10528           0.010493 
## med  eCDF diff........    0.10163          0.0080454 
## max  eCDF diff........    0.18083           0.021973 
## 
## var ratio (Tr/Co).....    0.78226            0.97584 
## T-test p-value........ < 2.22e-16             0.7829 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16          0.0015837 
## KS Statistic..........    0.18083           0.021973 
## 
## 
## ***** (V20) mandarin.Q *****
##                        Before Matching        After Matching
## mean treatment........   -0.14197           -0.14197 
## mean control..........   -0.12469           -0.13078 
## std mean diff.........    -4.0292            -2.6096 
## 
## mean raw eQQ diff.....   0.017335            0.02938 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.80178            0.80178 
## 
## mean eCDF diff........   0.032699           0.018728 
## med  eCDF diff........   0.019265           0.023088 
## max  eCDF diff........   0.092265           0.028734 
## 
## var ratio (Tr/Co).....    0.99019            0.98612 
## T-test p-value........    0.19339            0.52265 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... 3.7946e-08         9.9427e-06 
## KS Statistic..........   0.092265           0.028734 
## 
## 
## ***** (V21) mandarin.C *****
##                        Before Matching        After Matching
## mean treatment........  -0.031415          -0.031415 
## mean control.......... 0.00077718            -0.0287 
## std mean diff.........     -7.608           -0.64154 
## 
## mean raw eQQ diff.....   0.060234          0.0098988 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.31623            0.31623 
## 
## mean eCDF diff........    0.03804          0.0062606 
## med  eCDF diff........   0.034834          0.0062876 
## max  eCDF diff........   0.079205           0.014401 
## 
## var ratio (Tr/Co).....    0.96168            0.98881 
## T-test p-value........   0.014322            0.87536 
## KS Bootstrap p-value.. < 2.22e-16              0.023 
## KS Naive p-value...... 4.0785e-06            0.09308 
## KS Statistic..........   0.079205           0.014401 
## 
## 
## ***** (V22) mandarin^4 *****
##                        Before Matching        After Matching
## mean treatment........   0.098425           0.098425 
## mean control..........   0.075787           0.089898 
## std mean diff.........     4.4409             1.6728 
## 
## mean raw eQQ diff.....    0.02257           0.017172 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.59761            0.59761 
## 
## mean eCDF diff........   0.043743          0.0066121 
## med  eCDF diff........    0.02547          0.0030424 
## max  eCDF diff........    0.11404           0.023731 
## 
## var ratio (Tr/Co).....     1.0249             1.0156 
## T-test p-value........    0.15074            0.68116 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... 3.1947e-12         0.00048261 
## KS Statistic..........    0.11404           0.023731 
## 
## 
## ***** (V23) faEdu.L *****
##                        Before Matching        After Matching
## mean treatment........   -0.45773           -0.45773 
## mean control..........   -0.50919           -0.43605 
## std mean diff.........     17.628            -7.4251 
## 
## mean raw eQQ diff.....   0.051926            0.02743 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.31623            0.31623 
## 
## mean eCDF diff........   0.032928           0.021685 
## med  eCDF diff........   0.041521           0.022717 
## max  eCDF diff........   0.068011           0.041309 
## 
## var ratio (Tr/Co).....     1.4859            0.87018 
## T-test p-value........ 6.6079e-09           0.070234 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... 0.00012745         2.1855e-11 
## KS Statistic..........   0.068011           0.041309 
## 
## 
## ***** (V24) faEdu.Q *****
##                        Before Matching        After Matching
## mean treatment........      0.253              0.253 
## mean control..........    0.31167            0.23517 
## std mean diff.........    -14.073             4.2756 
## 
## mean raw eQQ diff.....   0.058587           0.043023 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.80178            0.80178 
## 
## mean eCDF diff........   0.034177             0.0299 
## med  eCDF diff........   0.033872           0.039145 
## max  eCDF diff........   0.068966           0.041309 
## 
## var ratio (Tr/Co).....     1.1388            0.98003 
## T-test p-value........ 4.8289e-06            0.28311 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... 9.6957e-05         2.1855e-11 
## KS Statistic..........   0.068966           0.041309 
## 
## 
## ***** (V25) faEdu.C *****
##                        Before Matching        After Matching
## mean treatment........   -0.16071           -0.16071 
## mean control..........   -0.16381           -0.16817 
## std mean diff.........     0.8541             2.0537 
## 
## mean raw eQQ diff.....   0.029598           0.013106 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.63246            0.63246 
## 
## mean eCDF diff........   0.018953           0.010344 
## med  eCDF diff........   0.013856          0.0021297 
## max  eCDF diff........   0.042477           0.037117 
## 
## var ratio (Tr/Co).....     1.1452            0.96407 
## T-test p-value........    0.78063             0.6175 
## KS Bootstrap p-value..      0.002         < 2.22e-16 
## KS Naive p-value......   0.046169         2.8266e-09 
## KS Statistic..........   0.042477           0.037117 
## 
## 
## ***** (V26) faEdu^4 *****
##                        Before Matching        After Matching
## mean treatment........    0.04249            0.04249 
## mean control..........   0.068607           0.027254 
## std mean diff.........    -7.9389             4.6316 
## 
## mean raw eQQ diff.....   0.040724           0.024687 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....    0.59761            0.59761 
## 
## mean eCDF diff........   0.024624           0.011375 
## med  eCDF diff........   0.012633          0.0042255 
## max  eCDF diff........   0.056333            0.03705 
## 
## var ratio (Tr/Co).....      1.246            0.95137 
## T-test p-value........  0.0093446            0.25289 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value......   0.002645         3.0443e-09 
## KS Statistic..........   0.056333            0.03705 
## 
## 
## ***** (V27) faCCPmember1 *****
##                        Before Matching        After Matching
## mean treatment........    0.25369            0.25369 
## mean control..........    0.15087              0.236 
## std mean diff.........      23.62              4.066 
## 
## mean raw eQQ diff.....    0.10263           0.029477 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........    0.05141           0.014739 
## med  eCDF diff........    0.05141           0.014739 
## max  eCDF diff........    0.10282           0.029477 
## 
## var ratio (Tr/Co).....     1.4789             1.0501 
## T-test p-value........ 1.0214e-14            0.29984 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): age edu.L edu.Q edu.C edu^4 height weight english.L english.Q english.C english^4 mandarin.L mandarin.Q mandarin.C mandarin^4 faEdu.L faEdu.Q faEdu^4  Number(s): 1 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 26 
## 
## After Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): age edu.C height weight mandarin.L mandarin.Q mandarin^4 faEdu.L faEdu.Q faEdu.C faEdu^4  Number(s): 1 11 13 14 19 20 22 23 24 25 26

The results show that, although the t-tests are insignificant for most variables after matching, the KS tests still indicate significant differences between the distributions of the treatment group and the control group. To locate the problem, I examine the variables one by one.
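
To illustrate why the two tests can disagree, here is a minimal sketch on made-up data (not the CGSS sample). The two toy samples `x` and `y` are hypothetical: they share the same mean but have different shapes, so a test of means sees nothing while a test of whole distributions does.

```r
# Two deterministic toy samples with the same mean (0) but different shapes.
p <- ppoints(2000)                 # evenly spaced probabilities in (0, 1)
x <- qnorm(p)                      # bell-shaped sample
y <- qunif(p, min = -3, max = 3)   # flat sample with a wider spread

t_p  <- t.test(x, y)$p.value       # compares the two means only
ks_p <- ks.test(x, y)$p.value      # compares the two empirical distributions

cat("t-test p-value: ", t_p, "\n")
cat("KS test p-value:", ks_p, "\n")
```

In this toy case the t-test is far from significant while the KS test rejects equality of the distributions, which is the same pattern as in the balance table above.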

5.2 Comparing the distributions

bal.plot(match1, var.name = "age")
bal.plot(match1, var.name = "race")
bal.plot(match1, var.name = "edu")
bal.plot(match1, var.name = "height")
bal.plot(match1, var.name = "weight")
bal.plot(match1, var.name = "faEdu")
bal.plot(match1, var.name = "faCCPmember")
bal.plot(match1, var.name = "english")
bal.plot(match1, var.name = "mandarin")

The figures show that the main problem lies in the continuous variables: age, height, and weight. In addition, for age, the two groups do not share a common support.
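
One way to check the common support numerically is to compare the range of a covariate within each group and keep only the overlap. The sketch below does this in base R on a small made-up data frame; `toy`, `group`, and the ages are hypothetical and are not taken from the CGSS data.

```r
# Hypothetical toy data: ages for a control group (0) and a treated group (1).
toy <- data.frame(
  age   = c(18, 19, 25, 40, 60, 96, 21, 30, 55, 90),
  group = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
)

# Smallest and largest age within each group.
group_min <- tapply(toy$age, toy$group, min)
group_max <- tapply(toy$age, toy$group, max)

# The common support is the overlap of the two ranges.
lower <- max(group_min)
upper <- min(group_max)

# Keep only observations inside the overlap.
trimmed <- subset(toy, age >= lower & age <= upper)
cat("common support:", lower, "to", upper, "\n")
cat("rows kept:", nrow(trimmed), "of", nrow(toy), "\n")
```

Comparing ranges is only a rough check; the density plots remain useful for spotting regions inside the overlap where one group is very sparse.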

library("psych")
## 
## Attaching package: 'psych'
## The following object is masked from 'package:car':
## 
##     logit
describeBy(age, group = CCPmember)
## 
##  Descriptive statistics by group 
## group: 0
##    vars    n  mean    sd median trimmed   mad min max range skew kurtosis
## X1    1 7324 46.92 14.57     46   46.44 14.83  18  96    78 0.29    -0.46
##      se
## X1 0.17
## -------------------------------------------------------- 
## group: 1
##    vars    n  mean    sd median trimmed   mad min max range skew kurtosis
## X1    1 1218 52.07 15.25     52   51.89 17.79  21  90    69  0.1    -0.85
##      se
## X1 0.44

The age range is 21 to 90 for the treatment group and 18 to 96 for the control group (presumably because very few people become CCP members before their twenties).
In addition, there is probably an interaction between education and age in their relationship with income.

scatterplot(lnhincome ~ age | edu , data = data.nomissing, smooth = FALSE)

The scatter plot shows that, for people with higher education levels, income increases with age (the grey and orange lines). For people with lower education levels, however, income decreases with age (the blue and pink lines).
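
A pattern like this is what an age-by-education interaction term captures. The following is a minimal sketch on made-up numbers, assuming only two education groups for simplicity (in the real data `edu` has five ordered levels); `lnincome`, `noise`, and all values are hypothetical.

```r
# Hypothetical toy data: log income rises with age for the "high" education
# group and falls with age for the "low" education group.
age <- rep(seq(25, 85, by = 10), times = 2)
edu <- factor(rep(c("low", "high"), each = 7), levels = c("low", "high"))
noise <- rep(c(0.05, -0.05, 0.02, -0.02, 0, 0.03, -0.03), times = 2)
lnincome <- ifelse(edu == "high", 8 + 0.02 * age, 10 - 0.02 * age) + noise

# age * edu expands to age + edu + age:edu
fit <- lm(lnincome ~ age * edu)

# Slope for the reference level ("low"), then the slope for "high".
slope_low  <- unname(coef(fit)["age"])
slope_high <- slope_low + unname(coef(fit)["age:eduhigh"])
cat("age slope, low education: ", slope_low, "\n")
cat("age slope, high education:", slope_high, "\n")
```

Without the `age:edu` term the model would force a single age slope on both groups, which would hide the opposite-signed trends.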

5.3 Necessary adjustment

Therefore, I am going to:
1. Drop the observations whose age is greater than 90 or lower than 21.
2. Include age squared and BMI, computed as (weight/2)/(height/100)^2. (Note: according to the questionnaire, weight is recorded in units of 500 g rather than kg, and height is recorded in cm.)
3. Add an interaction term between education and age.
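
As a quick check of the unit conversion in item 2, here is a small worked example. The helper name `bmi_from_survey` is my own and purely illustrative; the inputs 120 and 164 are the sample medians of weight and height from the summary table.

```r
# Convert survey units to BMI: weight is in units of 500 g (so weight / 2 is kg)
# and height is in cm (so height / 100 is metres).
bmi_from_survey <- function(weight_500g, height_cm) {
  (weight_500g / 2) / (height_cm / 100)^2
}

# Worked example: 120 units of 500 g = 60 kg, and 164 cm = 1.64 m,
# so BMI = 60 / 1.64^2, which is about 22.3.
bmi_median <- bmi_from_survey(120, 164)
cat("BMI at the sample medians:", round(bmi_median, 2), "\n")
```

Forgetting to halve the weight would double every BMI value, so a sanity check like this is worth doing before matching.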

data.nomissing <- subset(data.nomissing, age>=21 & age<=90)
detach(data.nomissing)
attach(data.nomissing)
## The following objects are masked _by_ .GlobalEnv:
## 
##     lnhincome, pscore
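# Squared age, to allow a non-linear age effect in the propensity score model
data.nomissing$age.squared <- data.nomissing$age^2
# weight is in units of 500 g, so weight/2 is kg; height/100 is metres
data.nomissing$BMI <- (data.nomissing$weight/2)/((data.nomissing$height/100)^2)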

Now I repeat the whole procedure. This time I use 1:5 nearest-neighbor matching with replacement, so each treated unit is matched to its five closest controls.
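
To make the idea concrete, here is a minimal base-R sketch of 1:5 nearest-neighbor matching on the propensity score with replacement. It is only an illustration of the principle, not the procedure MatchIt runs internally; the propensity scores, unit names, and the helper `nearest_controls` are all hypothetical.

```r
# Hypothetical propensity scores for 3 treated and 10 control units.
ps_treated <- c(t1 = 0.30, t2 = 0.54, t3 = 0.80)
ps_control <- c(c1 = 0.10, c2 = 0.25, c3 = 0.28, c4 = 0.33, c5 = 0.52,
                c6 = 0.58, c7 = 0.61, c8 = 0.77, c9 = 0.02, c10 = 0.05)

# For one treated unit, return the names of the k controls whose scores are
# closest. Controls are never removed from the pool, so the same control can
# be matched to several treated units (matching with replacement).
nearest_controls <- function(p, controls, k) {
  names(sort(abs(controls - p)))[seq_len(min(k, length(controls)))]
}

matches <- lapply(ps_treated, nearest_controls, controls = ps_control, k = 5)
print(matches)

# Number of distinct controls used across all matched sets.
n_distinct_controls <- length(unique(unlist(matches)))
cat("distinct controls used:", n_distinct_controls, "\n")
```

Because controls are reused, the number of distinct matched controls can be well below five times the number of treated units.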

match3 <- matchit(CCPmember ~ age + age.squared + race + edu + age:edu + height + weight +
                  BMI + english + mandarin + faEdu + faCCPmember, data = data.nomissing,
                  method = "nearest", ratio = 5, replace = TRUE)
summary(match3)
## 
## Call:
## matchit(formula = CCPmember ~ age + age.squared + race + edu + 
##     age:edu + height + weight + BMI + english + mandarin + faEdu + 
##     faCCPmember, data = data.nomissing, method = "nearest", ratio = 5, 
##     replace = TRUE)
## 
## Summary of balance for all data:
##              Means Treated Means Control SD Control Mean Diff  eQQ Med
## distance            0.3352        0.1122     0.1316    0.2230   0.2326
## age                52.0722       47.2894    14.2569    4.7829   5.0000
## age.squared      2943.9507     2439.5139  1433.2486  504.4369 485.0000
## race1               0.9245        0.9031     0.2958    0.0213   0.0000
## race2               0.0057        0.0030     0.0551    0.0027   0.0000
## race3               0.0074        0.0083     0.0908   -0.0009   0.0000
## race4               0.0164        0.0216     0.1454   -0.0052   0.0000
## race5               0.0074        0.0036     0.0599    0.0038   0.0000
## race6               0.0066        0.0112     0.1054   -0.0047   0.0000
## race7               0.0074        0.0090     0.0945   -0.0016   0.0000
## race8               0.0246        0.0400     0.1961   -0.0154   0.0000
## edu.L               0.0096       -0.3099     0.3201    0.3195   0.3162
## edu.Q              -0.2166       -0.0040     0.4395   -0.2126   0.0000
## edu.C              -0.1581        0.0181     0.4596   -0.1763   0.0000
## edu^4              -0.0762       -0.0310     0.4425   -0.0452   0.0000
## height            167.7767      163.9055     7.8070    3.8712   4.0000
## weight            131.5189      121.7589    22.2139    9.7600  10.0000
## BMI                23.3063       22.5816     3.2859    0.7247   0.7840
## english.L          -0.4455       -0.5353     0.2057    0.0897   0.0000
## english.Q           0.1791        0.3443     0.3632   -0.1652   0.0000
## english.C          -0.0436       -0.1554     0.3461    0.1118   0.0000
## english^4           0.0400        0.0537     0.2719   -0.0136   0.0000
## mandarin.L          0.1643       -0.0050     0.3917    0.1693   0.3162
## mandarin.Q         -0.1420       -0.1244     0.4310   -0.0175   0.0000
## mandarin.C         -0.0314        0.0010     0.4315   -0.0324   0.0000
## mandarin^4          0.0984        0.0757     0.5034    0.0227   0.0000
## faEdu.L            -0.4577       -0.5108     0.2389    0.0531   0.0000
## faEdu.Q             0.2530        0.3154     0.3888   -0.0624   0.0000
## faEdu.C            -0.1607       -0.1677     0.3358    0.0070   0.0000
## faEdu^4             0.0425        0.0708     0.2923   -0.0283   0.0000
## faCCPmember1        0.2537        0.1511     0.3581    0.1026   0.0000
## age:edu.L          -2.0098      -16.3133    17.1550   14.3035  14.8627
## age:edu.Q         -10.4594        1.8436    22.5601  -12.3030   6.4143
## age:edu.C          -6.8342        0.2327    21.5258   -7.0670   7.5895
## age:edu^4          -2.6015       -0.5594    20.3800   -2.0421   6.2152
##              eQQ Mean  eQQ Max
## distance       0.2229   0.3732
## age            4.7808   8.0000
## age.squared  503.5148 973.0000
## race1          0.0213   1.0000
## race2          0.0025   1.0000
## race3          0.0008   1.0000
## race4          0.0057   1.0000
## race5          0.0033   1.0000
## race6          0.0049   1.0000
## race7          0.0016   1.0000
## race8          0.0156   1.0000
## edu.L          0.3191   0.6325
## edu.Q          0.2126   0.8018
## edu.C          0.1760   0.6325
## edu^4          0.1423   0.5976
## height         3.8916  19.0000
## weight         9.8046  26.0000
## BMI            0.7478   5.0821
## english.L      0.0893   0.3162
## english.Q      0.1650   0.8018
## english.C      0.1168   0.9487
## english^4      0.1070   0.5976
## mandarin.L     0.1693   0.3162
## mandarin.Q     0.0173   0.8018
## mandarin.C     0.0613   0.3162
## mandarin^4     0.0226   0.5976
## faEdu.L        0.0535   0.3162
## faEdu.Q        0.0619   0.8018
## faEdu.C        0.0335   0.6325
## faEdu^4        0.0437   0.5976
## faCCPmember1   0.1026   1.0000
## age:edu.L     14.3216  25.2982
## age:edu.Q     12.3197  37.4166
## age:edu.C      8.1944  24.0333
## age:edu^4      9.5542  24.0241
## 
## 
## Summary of balance for matched data:
##              Means Treated Means Control SD Control Mean Diff  eQQ Med
## distance            0.3352        0.3348     0.2122    0.0005   0.1221
## age                52.0722       51.5200    15.2625    0.5522   1.0000
## age.squared      2943.9507     2887.1696  1626.3585   56.7811 108.0000
## race1               0.9245        0.9353     0.2460   -0.0108   0.0000
## race2               0.0057        0.0053     0.0723    0.0005   0.0000
## race3               0.0074        0.0049     0.0700    0.0025   0.0000
## race4               0.0164        0.0143     0.1187    0.0021   0.0000
## race5               0.0074        0.0021     0.0462    0.0053   0.0000
## race6               0.0066        0.0066     0.0808    0.0000   0.0000
## race7               0.0074        0.0072     0.0847    0.0002   0.0000
## race8               0.0246        0.0243     0.1540    0.0003   0.0000
## edu.L               0.0096        0.0166     0.3496   -0.0070   0.0000
## edu.Q              -0.2166       -0.2073     0.3456   -0.0093   0.0000
## edu.C              -0.1581       -0.1707     0.4662    0.0126   0.0000
## edu^4              -0.0762       -0.0677     0.5140   -0.0085   0.0000
## height            167.7767      167.9844     7.5622   -0.2077   2.0000
## weight            131.5189      132.3353    23.0429   -0.8164   4.0000
## BMI                23.3063       23.3842     3.3359   -0.0778   0.3211
## english.L          -0.4455       -0.4392     0.2664   -0.0063   0.0000
## english.Q           0.1791        0.1706     0.4476    0.0084   0.0000
## english.C          -0.0436       -0.0446     0.4032    0.0010   0.0000
## english^4           0.0400        0.0482     0.3742   -0.0081   0.0000
## mandarin.L          0.1643        0.1692     0.3417   -0.0049   0.0000
## mandarin.Q         -0.1420       -0.1461     0.4228    0.0041   0.0000
## mandarin.C         -0.0314       -0.0477     0.4290    0.0163   0.0000
## mandarin^4          0.0984        0.0818     0.5116    0.0166   0.0000
## faEdu.L            -0.4577       -0.4466     0.2973   -0.0111   0.0000
## faEdu.Q             0.2530        0.2347     0.4237    0.0183   0.0000
## faEdu.C            -0.1607       -0.1502     0.3727   -0.0105   0.0000
## faEdu^4             0.0425        0.0376     0.3388    0.0049   0.0000
## faCCPmember1        0.2537        0.2557     0.4363   -0.0020   0.0000
## age:edu.L          -2.0098       -1.7677    19.7204   -0.2421   6.3246
## age:edu.Q         -10.4594       -9.8750    20.3538   -0.5844   1.3363
## age:edu.C          -6.8342       -7.5014    24.2928    0.6672   5.0596
## age:edu^4          -2.6015       -1.9272    27.5066   -0.6743   2.8685
##              eQQ Mean  eQQ Max
## distance       0.1252   0.2322
## age            1.2660   3.0000
## age.squared  138.3103 393.0000
## race1          0.0057   1.0000
## race2          0.0025   1.0000
## race3          0.0000   0.0000
## race4          0.0033   1.0000
## race5          0.0041   1.0000
## race6          0.0016   1.0000
## race7          0.0008   1.0000
## race8          0.0074   1.0000
## edu.L          0.1485   0.3162
## edu.Q          0.0601   0.8018
## edu.C          0.1205   0.6325
## edu^4          0.0628   0.5976
## height         1.5813   5.0000
## weight         3.8883  26.0000
## BMI            0.3004   3.3302
## english.L      0.0421   0.3162
## english.Q      0.0812   0.8018
## english.C      0.0587   0.6325
## english^4      0.0525   0.5976
## mandarin.L     0.0644   0.3162
## mandarin.Q     0.0118   0.8018
## mandarin.C     0.0275   0.3162
## mandarin^4     0.0137   0.5976
## faEdu.L        0.0244   0.3162
## faEdu.Q        0.0250   0.8018
## faEdu.C        0.0132   0.6325
## faEdu^4        0.0177   0.5976
## faCCPmember1   0.0517   1.0000
## age:edu.L      8.0189  18.0250
## age:edu.Q      4.2650  37.4166
## age:edu.C      6.3290  25.9307
## age:edu^4      4.3500  20.3189
## 
## Percent Balance Improvement:
##              Mean Diff.  eQQ Med eQQ Mean  eQQ Max
## distance        99.7893  47.4759  43.8360  37.7886
## age             88.4543  80.0000  73.5188  62.5000
## age.squared     88.7437  77.7320  72.5310  59.6095
## race1           49.2021   0.0000  73.0769   0.0000
## race2           81.7440   0.0000   0.0000   0.0000
## race3         -166.0767   0.0000 100.0000 100.0000
## race4           58.9354   0.0000  42.8571   0.0000
## race5          -38.7859   0.0000 -25.0000   0.0000
## race6          100.0000   0.0000  66.6667   0.0000
## race7           89.8552   0.0000  50.0000   0.0000
## race8           97.8702   0.0000  52.6316   0.0000
## edu.L           97.8224 100.0000  53.4581  50.0000
## edu.Q           95.6233 100.0000  71.7234   0.0000
## edu.C           92.8411   0.0000  31.5634   0.0000
## edu^4           81.1151 100.0000  55.8621   0.0000
## height          94.6343  50.0000  59.3671  73.6842
## weight          91.6350  60.0000  60.3417   0.0000
## BMI             89.2588  59.0428  59.8272  34.4726
## english.L       92.9402   0.0000  52.9070   0.0000
## english.Q       94.9010   0.0000  50.7979   0.0000
## english.C       99.1177   0.0000  49.7778  33.3333
## english^4       40.3107   0.0000  50.9174   0.0000
## mandarin.L      97.1176 100.0000  61.9632   0.0000
## mandarin.Q      76.4589   0.0000  31.6456   0.0000
## mandarin.C      49.8728   0.0000  55.0847   0.0000
## mandarin^4      26.9832   0.0000  39.1304   0.0000
## faEdu.L         79.0815   0.0000  54.3689   0.0000
## faEdu.Q         70.7658   0.0000  59.5745   0.0000
## faEdu.C        -50.7656   0.0000  60.4651   0.0000
## faEdu^4         82.6841   0.0000  59.5506   0.0000
## faCCPmember1    98.0803   0.0000  49.6000   0.0000
## age:edu.L       98.3076  57.4468  44.0086  28.7500
## age:edu.Q       95.2501  79.1667  65.3807   0.0000
## age:edu.C       90.5590  33.3333  22.7647  -7.8947
## age:edu^4       66.9804  53.8462  54.4699  15.4229
## 
## Sample sizes:
##           Control Treated
## All          7216    1218
## Matched      2643    1218
## Unmatched    4573       0
## Discarded       0       0
library(cobalt)
bal.tab(match3)
## Call
##  matchit(formula = CCPmember ~ age + age.squared + race + edu + 
##     age:edu + height + weight + BMI + english + mandarin + faEdu + 
##     faCCPmember, data = data.nomissing, method = "nearest", ratio = 5, 
##     replace = TRUE)
## 
## Balance Measures
##                 Type Diff.Adj
## distance    Distance   0.0022
## age          Contin.   0.0362
## age.squared  Contin.   0.0348
## race_1        Binary  -0.0108
## race_2        Binary   0.0005
## race_3        Binary   0.0025
## race_4        Binary   0.0021
## race_5        Binary   0.0053
## race_6        Binary   0.0000
## race_7        Binary   0.0002
## race_8        Binary   0.0003
## edu_1         Binary  -0.0056
## edu_2         Binary   0.0167
## edu_3         Binary  -0.0011
## edu_4         Binary  -0.0036
## edu_5         Binary  -0.0064
## height       Contin.  -0.0295
## weight       Contin.  -0.0369
## BMI          Contin.  -0.0238
## english_1     Binary   0.0072
## english_2     Binary   0.0043
## english_3     Binary  -0.0103
## english_4     Binary  -0.0010
## english_5     Binary  -0.0002
## mandarin_1    Binary   0.0021
## mandarin_2    Binary   0.0028
## mandarin_3    Binary   0.0097
## mandarin_4    Binary  -0.0209
## mandarin_5    Binary   0.0062
## faEdu_1       Binary   0.0207
## faEdu_2       Binary  -0.0103
## faEdu_3       Binary  -0.0062
## faEdu_4       Binary  -0.0041
## faEdu_5       Binary   0.0000
## faCCPmember   Binary  -0.0020
## 
## Sample sizes
##           Control Treated
## All          7216    1218
## Matched      2643    1218
## Unmatched    4573       0
love.plot(match3, abs = FALSE)
bal.plot(match3, var.name = "age")
bal.plot(match3, var.name = "age.squared")
bal.plot(match3, var.name = "height")
bal.plot(match3, var.name = "weight")
bal.plot(match3, var.name = "BMI")

Now the figures look much better. However, the age distributions of the two groups are still not similar; the main difference lies in the 45 to 55 age group. Therefore, I add two new dummy variables that indicate whether an individual is below 45 or above 55.
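
The two dummies split age into three bands, with ages 45 to 55 (inclusive) left as the reference band. A small sketch on made-up ages (`age_toy` and `band` are hypothetical names) shows how the boundary values are classified:

```r
# Hypothetical ages chosen to sit on and around the two cut points.
age_toy <- c(30, 44, 45, 50, 55, 56, 70)

below45 <- age_toy < 45   # TRUE for ages strictly below 45
above55 <- age_toy > 55   # TRUE for ages strictly above 55

# The omitted (reference) band is everyone with both dummies FALSE.
band <- ifelse(below45, "below 45", ifelse(above55, "above 55", "45 to 55"))
print(table(band))
```

Since both comparisons are strict, people aged exactly 45 or 55 fall in the reference band.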

# Age-band dummies; the 45 to 55 band is the omitted reference group
data.nomissing$below45 <- data.nomissing$age < 45
data.nomissing$above55 <- data.nomissing$age > 55
match4 <- matchit(CCPmember ~ age + age.squared + below45 + above55 + race + edu + age:edu +
                  height + weight + BMI + english + mandarin + faEdu + faCCPmember,
                  data = data.nomissing, method = "nearest", ratio = 5, replace = TRUE)
summary(match4)
## 
## Call:
## matchit(formula = CCPmember ~ age + age.squared + below45 + above55 + 
##     race + edu + age:edu + height + weight + BMI + english + 
##     mandarin + faEdu + faCCPmember, data = data.nomissing, method = "nearest", 
##     ratio = 5, replace = TRUE)
## 
## Summary of balance for all data:
##              Means Treated Means Control SD Control Mean Diff  eQQ Med
## distance            0.3365        0.1120     0.1323    0.2245   0.2353
## age                52.0722       47.2894    14.2569    4.7829   5.0000
## age.squared      2943.9507     2439.5139  1433.2486  504.4369 485.0000
## below45FALSE        0.6544        0.5443     0.4981    0.1100   0.0000
## below45TRUE         0.3456        0.4557     0.4981   -0.1100   0.0000
## above55TRUE         0.4343        0.2873     0.4525    0.1470   0.0000
## race2               0.0057        0.0030     0.0551    0.0027   0.0000
## race3               0.0074        0.0083     0.0908   -0.0009   0.0000
## race4               0.0164        0.0216     0.1454   -0.0052   0.0000
## race5               0.0074        0.0036     0.0599    0.0038   0.0000
## race6               0.0066        0.0112     0.1054   -0.0047   0.0000
## race7               0.0074        0.0090     0.0945   -0.0016   0.0000
## race8               0.0246        0.0400     0.1961   -0.0154   0.0000
## edu.L               0.0096       -0.3099     0.3201    0.3195   0.3162
## edu.Q              -0.2166       -0.0040     0.4395   -0.2126   0.0000
## edu.C              -0.1581        0.0181     0.4596   -0.1763   0.0000
## edu^4              -0.0762       -0.0310     0.4425   -0.0452   0.0000
## height            167.7767      163.9055     7.8070    3.8712   4.0000
## weight            131.5189      121.7589    22.2139    9.7600  10.0000
## BMI                23.3063       22.5816     3.2859    0.7247   0.7840
## english.L          -0.4455       -0.5353     0.2057    0.0897   0.0000
## english.Q           0.1791        0.3443     0.3632   -0.1652   0.0000
## english.C          -0.0436       -0.1554     0.3461    0.1118   0.0000
## english^4           0.0400        0.0537     0.2719   -0.0136   0.0000
## mandarin.L          0.1643       -0.0050     0.3917    0.1693   0.3162
## mandarin.Q         -0.1420       -0.1244     0.4310   -0.0175   0.0000
## mandarin.C         -0.0314        0.0010     0.4315   -0.0324   0.0000
## mandarin^4          0.0984        0.0757     0.5034    0.0227   0.0000
## faEdu.L            -0.4577       -0.5108     0.2389    0.0531   0.0000
## faEdu.Q             0.2530        0.3154     0.3888   -0.0624   0.0000
## faEdu.C            -0.1607       -0.1677     0.3358    0.0070   0.0000
## faEdu^4             0.0425        0.0708     0.2923   -0.0283   0.0000
## faCCPmember1        0.2537        0.1511     0.3581    0.1026   0.0000
## age:edu.L          -2.0098      -16.3133    17.1550   14.3035  14.8627
## age:edu.Q         -10.4594        1.8436    22.5601  -12.3030   6.4143
## age:edu.C          -6.8342        0.2327    21.5258   -7.0670   7.5895
## age:edu^4          -2.6015       -0.5594    20.3800   -2.0421   6.2152
##              eQQ Mean  eQQ Max
## distance       0.2244   0.3748
## age            4.7808   8.0000
## age.squared  503.5148 973.0000
## below45FALSE   0.1100   1.0000
## below45TRUE    0.1100   1.0000
## above55TRUE    0.1470   1.0000
## race2          0.0025   1.0000
## race3          0.0008   1.0000
## race4          0.0057   1.0000
## race5          0.0033   1.0000
## race6          0.0049   1.0000
## race7          0.0016   1.0000
## race8          0.0156   1.0000
## edu.L          0.3191   0.6325
## edu.Q          0.2126   0.8018
## edu.C          0.1760   0.6325
## edu^4          0.1423   0.5976
## height         3.8916  19.0000
## weight         9.8046  26.0000
## BMI            0.7478   5.0821
## english.L      0.0893   0.3162
## english.Q      0.1650   0.8018
## english.C      0.1168   0.9487
## english^4      0.1070   0.5976
## mandarin.L     0.1693   0.3162
## mandarin.Q     0.0173   0.8018
## mandarin.C     0.0613   0.3162
## mandarin^4     0.0226   0.5976
## faEdu.L        0.0535   0.3162
## faEdu.Q        0.0619   0.8018
## faEdu.C        0.0335   0.6325
## faEdu^4        0.0437   0.5976
## faCCPmember1   0.1026   1.0000
## age:edu.L     14.3216  25.2982
## age:edu.Q     12.3197  37.4166
## age:edu.C      8.1944  24.0333
## age:edu^4      9.5542  24.0241
## 
## 
## Summary of balance for matched data:
##              Means Treated Means Control SD Control Mean Diff  eQQ Med
## distance            0.3365        0.3362     0.2127    0.0003   0.1322
## age                52.0722       51.8360    15.4260    0.2363   1.0000
## age.squared      2943.9507     2924.8379  1634.7174   19.1128 128.0000
## below45FALSE        0.6544        0.6373     0.4809    0.0171   0.0000
## below45TRUE         0.3456        0.3627     0.4809   -0.0171   0.0000
## above55TRUE         0.4343        0.4342     0.4957    0.0002   0.0000
## race2               0.0057        0.0061     0.0777   -0.0003   0.0000
## race3               0.0074        0.0039     0.0627    0.0034   0.0000
## race4               0.0164        0.0153     0.1227    0.0011   0.0000
## race5               0.0074        0.0039     0.0627    0.0034   0.0000
## race6               0.0066        0.0071     0.0837   -0.0005   0.0000
## race7               0.0074        0.0062     0.0788    0.0011   0.0000
## race8               0.0246        0.0220     0.1467    0.0026   0.0000
## edu.L               0.0096        0.0103     0.3476   -0.0007   0.0000
## edu.Q              -0.2166       -0.2115     0.3381   -0.0051   0.0000
## edu.C              -0.1581       -0.1546     0.4776   -0.0035   0.0000
## edu^4              -0.0762       -0.0818     0.5114    0.0056   0.0000
## height            167.7767      168.0517     7.6437   -0.2750   2.0000
## weight            131.5189      132.1369    23.1860   -0.6181   4.0000
## BMI                23.3063       23.3247     3.3201   -0.0184   0.3646
## english.L          -0.4455       -0.4414     0.2694   -0.0041   0.0000
## english.Q           0.1791        0.1801     0.4448   -0.0011   0.0000
## english.C          -0.0436       -0.0453     0.4024    0.0017   0.0000
## english^4           0.0400        0.0468     0.3692   -0.0068   0.0000
## mandarin.L          0.1643        0.1678     0.3474   -0.0035   0.0000
## mandarin.Q         -0.1420       -0.1367     0.4252   -0.0052   0.0000
## mandarin.C         -0.0314       -0.0399     0.4321    0.0085   0.0000
## mandarin^4          0.0984        0.0737     0.5081    0.0247   0.0000
## faEdu.L            -0.4577       -0.4511     0.2990   -0.0066   0.0000
## faEdu.Q             0.2530        0.2481     0.4174    0.0049   0.0000
## faEdu.C            -0.1607       -0.1631     0.3663    0.0024   0.0000
## faEdu^4             0.0425        0.0356     0.3309    0.0069   0.0000
## faCCPmember1        0.2537        0.2583     0.4378   -0.0046   0.0000
## age:edu.L          -2.0098       -2.0717    19.6979    0.0619   6.3246
## age:edu.Q         -10.4594      -10.1322    20.0345   -0.3272   1.6036
## age:edu.C          -6.8342       -6.5871    25.2091   -0.2471   5.3759
## age:edu^4          -2.6015       -2.9077    27.5137    0.3061   2.8685
##              eQQ Mean  eQQ Max
## distance       0.1268   0.2218
## age            1.3998   3.0000
## age.squared  148.7036 393.0000
## below45FALSE   0.0255   1.0000
## below45TRUE    0.0246   1.0000
## above55TRUE    0.0394   1.0000
## race2          0.0016   1.0000
## race3          0.0025   1.0000
## race4          0.0016   1.0000
## race5          0.0033   1.0000
## race6          0.0016   1.0000
## race7          0.0008   1.0000
## race8          0.0057   1.0000
## edu.L          0.1482   0.3162
## edu.Q          0.0507   0.8018
## edu.C          0.1358   0.6325
## edu^4          0.0550   0.5976
## height         1.5567   5.0000
## weight         4.1018  20.0000
## BMI            0.3444   5.0821
## english.L      0.0423   0.3162
## english.Q      0.0821   0.8018
## english.C      0.0582   0.6325
## english^4      0.0535   0.5976
## mandarin.L     0.0659   0.3162
## mandarin.Q     0.0050   0.8018
## mandarin.C     0.0231   0.3162
## mandarin^4     0.0108   0.5976
## faEdu.L        0.0265   0.3162
## faEdu.Q        0.0274   0.8018
## faEdu.C        0.0158   0.6325
## faEdu^4        0.0201   0.5976
## faCCPmember1   0.0484   1.0000
## age:edu.L      7.9431  18.0250
## age:edu.Q      3.8496  37.1493
## age:edu.C      7.0658  26.5631
## age:edu^4      3.9909  19.9603
## 
## Percent Balance Improvement:
##              Mean Diff.  eQQ Med  eQQ Mean  eQQ Max
## distance        99.8521  43.8139   43.5115  40.8121
## age             95.0597  80.0000   70.7196  62.5000
## age.squared     96.2111  73.6082   70.4669  59.6095
## below45FALSE    84.4761   0.0000   76.8657   0.0000
## below45TRUE     84.4761   0.0000   77.6119   0.0000
## above55TRUE     99.8883   0.0000   73.1844   0.0000
## race2           87.8293   0.0000   33.3333   0.0000
## race3         -272.5074   0.0000 -200.0000   0.0000
## race4           77.8883   0.0000   71.4286   0.0000
## race5            8.9217   0.0000    0.0000   0.0000
## race6           89.4219   0.0000   66.6667   0.0000
## race7           28.9864   0.0000   50.0000   0.0000
## race8           82.9613   0.0000   63.1579   0.0000
## edu.L           99.7725 100.0000   53.5395  50.0000
## edu.Q           97.6052 100.0000   76.1610   0.0000
## edu.C           98.0262   0.0000   22.8614   0.0000
## edu^4           87.6271 100.0000   61.3793   0.0000
## height          92.8952  50.0000   60.0000  73.6842
## weight          93.6674  60.0000   58.1645  23.0769
## BMI             97.4610  53.4955   53.9510   0.0000
## english.L       95.4285   0.0000   52.6163   0.0000
## english.Q       99.3361   0.0000   50.2660   0.0000
## english.C       98.4676   0.0000   50.2222  33.3333
## english^4       50.3788   0.0000   50.0000   0.0000
## mandarin.L      97.9455 100.0000   61.0429   0.0000
## mandarin.Q      70.1980   0.0000   70.8861   0.0000
## mandarin.C      73.7352   0.0000   62.2881   0.0000
## mandarin^4      -8.8771   0.0000   52.1739   0.0000
## faEdu.L         87.4880   0.0000   50.4854   0.0000
## faEdu.Q         92.1292   0.0000   55.6738   0.0000
## faEdu.C         65.6672   0.0000   52.7132   0.0000
## faEdu^4         75.7578   0.0000   53.9326   0.0000
## faCCPmember1    95.5206   0.0000   52.8000   0.0000
## age:edu.L       99.5673  57.4468   44.5379  28.7500
## age:edu.Q       97.3404  75.0000   68.7523   0.7143
## age:edu.C       96.5032  29.1667   13.7729 -10.5263
## age:edu^4       85.0082  53.8462   58.2291  16.9154
## 
## Sample sizes:
##           Control Treated
## All          7216    1218
## Matched      2660    1218
## Unmatched    4556       0
## Discarded       0       0
bal.tab(match4)
## Call
##  matchit(formula = CCPmember ~ age + age.squared + below45 + above55 + 
##     race + edu + age:edu + height + weight + BMI + english + 
##     mandarin + faEdu + faCCPmember, data = data.nomissing, method = "nearest", 
##     ratio = 5, replace = TRUE)
## 
## Balance Measures
##                 Type Diff.Adj
## distance    Distance   0.0016
## age          Contin.   0.0155
## age.squared  Contin.   0.0117
## below45       Binary  -0.0171
## above55       Binary   0.0002
## race_1        Binary  -0.0110
## race_2        Binary  -0.0003
## race_3        Binary   0.0034
## race_4        Binary   0.0011
## race_5        Binary   0.0034
## race_6        Binary  -0.0005
## race_7        Binary   0.0011
## race_8        Binary   0.0026
## edu_1         Binary  -0.0005
## edu_2         Binary  -0.0033
## edu_3         Binary   0.0067
## edu_4         Binary   0.0007
## edu_5         Binary  -0.0036
## height       Contin.  -0.0390
## weight       Contin.  -0.0279
## BMI          Contin.  -0.0056
## english_1     Binary   0.0007
## english_2     Binary   0.0059
## english_3     Binary  -0.0043
## english_4     Binary   0.0011
## english_5     Binary  -0.0034
## mandarin_1    Binary  -0.0003
## mandarin_2    Binary  -0.0039
## mandarin_3    Binary   0.0205
## mandarin_4    Binary  -0.0169
## mandarin_5    Binary   0.0007
## faEdu_1       Binary   0.0069
## faEdu_2       Binary  -0.0010
## faEdu_3       Binary   0.0023
## faEdu_4       Binary  -0.0082
## faEdu_5       Binary   0.0000
## faCCPmember   Binary  -0.0046
## 
## Sample sizes
##           Control Treated
## All          7216    1218
## Matched      2660    1218
## Unmatched    4556       0
bal.plot(match4, var.name = "age")
bal.plot(match4, var.name = "below45")
bal.plot(match4, var.name = "above55")

Balance looks much better now: every adjusted difference in the bal.tab output is below 0.04 in absolute value.
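As a rough check on the balance after matching, the adjusted differences printed by bal.tab(match4) can be compared against a threshold. The sketch below copies the Diff.Adj column by hand. The 0.1 cutoff is a commonly used rule of thumb and an assumption here, not something the analysis above specifies. It also assumes cobalt's default reporting, which as far as I know uses standardized mean differences for continuous covariates and raw differences in proportion for binary ones.

```r
# Diff.Adj values copied by hand from the bal.tab(match4) output above.
diff_adj <- c(
  distance = 0.0016, age = 0.0155, age.squared = 0.0117,
  below45 = -0.0171, above55 = 0.0002,
  race_1 = -0.0110, race_2 = -0.0003, race_3 = 0.0034, race_4 = 0.0011,
  race_5 = 0.0034, race_6 = -0.0005, race_7 = 0.0011, race_8 = 0.0026,
  edu_1 = -0.0005, edu_2 = -0.0033, edu_3 = 0.0067, edu_4 = 0.0007,
  edu_5 = -0.0036,
  height = -0.0390, weight = -0.0279, BMI = -0.0056,
  english_1 = 0.0007, english_2 = 0.0059, english_3 = -0.0043,
  english_4 = 0.0011, english_5 = -0.0034,
  mandarin_1 = -0.0003, mandarin_2 = -0.0039, mandarin_3 = 0.0205,
  mandarin_4 = -0.0169, mandarin_5 = 0.0007,
  faEdu_1 = 0.0069, faEdu_2 = -0.0010, faEdu_3 = 0.0023,
  faEdu_4 = -0.0082, faEdu_5 = 0.0000,
  faCCPmember = -0.0046
)

threshold <- 0.1  # assumed rule-of-thumb cutoff, not taken from the analysis

# Covariate with the largest remaining imbalance, and an overall pass/fail flag.
worst        <- diff_adj[which.max(abs(diff_adj))]
all_balanced <- all(abs(diff_adj) < threshold)

print(worst)
print(all_balanced)
```

The largest remaining difference belongs to height, and it is well inside the assumed cutoff.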


6 The causal effect of CCP membership

I then re-estimate the propensity score with the same specification as match4 and run the matching estimation.
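For reference, the two steps can be written out with the notation from Section 1. This is a sketch of the standard logit propensity score and nearest-neighbor matching estimator, not a transcription of what Match() does internally. The symbols \(X\) (the covariate vector), \(\beta\), \(e(X)\), \(N_1\), and \(J_M(i)\) are introduced here for illustration.

\[e(X)=P(Z=1|X)=\frac{1}{1+\exp(-X'\beta)}\]

\[\widehat{ATT}=\frac{1}{N_1}\sum_{i:Z_i=1}\left(Y_i-\frac{1}{M}\sum_{j\in J_M(i)}Y_j\right)\]

Here \(N_1\) is the number of treated individuals and \(J_M(i)\) is the set of \(M=5\) control individuals whose estimated propensity scores are closest to that of treated individual \(i\). Controls can appear in several sets because replace = TRUE. The formula ignores how ties between equally close controls are handled.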

ps3 <- glm(CCPmember ~ age + age.squared + below45 + above55 + race + edu + age:edu + 
          height + weight + BMI + english + mandarin + faEdu + faCCPmember, 
          family = binomial, data = data.nomissing)
summary(ps3)
## 
## Call:
## glm(formula = CCPmember ~ age + age.squared + below45 + above55 + 
##     race + edu + age:edu + height + weight + BMI + english + 
##     mandarin + faEdu + faCCPmember, family = binomial, data = data.nomissing)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8380  -0.5222  -0.3046  -0.1421   3.5133  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.631e+01  3.642e+01  -0.997 0.318776    
## age           1.114e-01  2.324e-02   4.792 1.65e-06 ***
## age.squared  -6.214e-04  1.972e-04  -3.151 0.001626 ** 
## below45TRUE   3.130e-01  1.456e-01   2.150 0.031539 *  
## above55TRUE   4.602e-01  1.359e-01   3.388 0.000705 ***
## race2         7.161e-01  5.133e-01   1.395 0.163031    
## race3        -3.870e-01  4.092e-01  -0.946 0.344327    
## race4         1.016e-01  2.710e-01   0.375 0.707757    
## race5         1.315e+00  4.955e-01   2.654 0.007963 ** 
## race6        -7.970e-02  4.110e-01  -0.194 0.846255    
## race7         2.639e-01  4.194e-01   0.629 0.529227    
## race8         4.151e-01  2.197e-01   1.889 0.058875 .  
## edu.L         6.906e+00  9.413e-01   7.336 2.19e-13 ***
## edu.Q         6.597e-01  7.721e-01   0.854 0.392858    
## edu.C         7.564e-01  5.032e-01   1.503 0.132783    
## edu^4         2.104e-02  3.189e-01   0.066 0.947394    
## height        1.628e-01  3.147e-02   5.172 2.32e-07 ***
## weight       -6.868e-02  1.990e-02  -3.451 0.000559 ***
## BMI           4.045e-01  1.122e-01   3.604 0.000313 ***
## english.L    -6.003e-01  3.609e-01  -1.663 0.096279 .  
## english.Q    -3.161e-01  2.960e-01  -1.068 0.285478    
## english.C     1.611e-01  2.360e-01   0.683 0.494862    
## english^4     3.047e-01  1.610e-01   1.892 0.058511 .  
## mandarin.L    2.552e-01  1.319e-01   1.935 0.053016 .  
## mandarin.Q   -3.220e-01  1.040e-01  -3.096 0.001960 ** 
## mandarin.C    4.184e-02  9.234e-02   0.453 0.650474    
## mandarin^4    8.687e-02  7.415e-02   1.172 0.241364    
## faEdu.L      -8.374e+00  1.139e+02  -0.074 0.941407    
## faEdu.Q      -6.628e+00  9.629e+01  -0.069 0.945121    
## faEdu.C      -3.973e+00  5.696e+01  -0.070 0.944398    
## faEdu^4      -1.724e+00  2.153e+01  -0.080 0.936197    
## faCCPmember1  3.493e-01  9.144e-02   3.819 0.000134 ***
## age:edu.L    -7.105e-02  2.470e-02  -2.877 0.004020 ** 
## age:edu.Q    -1.907e-02  2.057e-02  -0.927 0.354035    
## age:edu.C    -1.857e-02  1.287e-02  -1.443 0.148974    
## age:edu^4    -5.161e-03  6.878e-03  -0.750 0.453087    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6964.8  on 8433  degrees of freedom
## Residual deviance: 5288.5  on 8398  degrees of freedom
## AIC: 5360.5
## 
## Number of Fisher Scoring iterations: 12
pscore <- ps3$fitted.values
data.nomissing$pscore <- pscore
psm3 <- Match(Y = hincome, Tr = CCPmember, X = pscore, estimand = "ATT", 
                M = 5, replace = TRUE)
summary(psm3)
## 
## Estimate...  -4366.9 
## AI SE......  5777 
## T-stat.....  -0.75591 
## p.val......  0.4497 
## 
## Original number of observations..............  8434 
## Original number of treated obs...............  1218 
## Matched number of observations...............  1218 
## Matched number of observations  (unweighted).  16006

The estimated ATT is -4366.9 with a p-value of 0.4497, which is greater than 0.05. There is therefore no evidence that the ATT of CCP membership on household income differs from zero. I then replace the outcome variable with log household income.
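The p-value printed by summary(psm3) can be reproduced from the estimate and its standard error. The sketch below assumes a two-sided test against the standard normal distribution, which is consistent with the printed numbers. The variable names are illustrative.

```r
# Values copied from the summary(psm3) output above.
estimate <- -4366.9
ai_se    <- 5777

# Test statistic and two-sided p-value under a standard normal reference.
t_stat  <- estimate / ai_se
p_value <- 2 * pnorm(-abs(t_stat))

print(round(c(t_stat = t_stat, p_value = p_value), 4))
```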

psm4 <- Match(Y = data.nomissing$lnhincome, Tr = CCPmember, X = pscore, estimand = "ATT", 
                M = 5, replace = TRUE)
summary(psm4)
## 
## Estimate...  0.14272 
## AI SE......  0.036001 
## T-stat.....  3.9645 
## p.val......  7.3545e-05 
## 
## Original number of observations..............  8434 
## Original number of treated obs...............  1218 
## Matched number of observations...............  1218 
## Matched number of observations  (unweighted).  16006

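Now the estimate is 0.14272 log points, indicating that CCP membership raises household income by roughly 15% (\(e^{0.14272}-1\approx 0.15\)). The estimate is significantly different from zero (\(p<0.001\)). Compared with the previous result, the difference suggests that the relationship between CCP membership and household income is non-linear. I consider the estimate for log household income more reliable, because raw household income is heavily right-skewed (a maximum of 6,000,000 against a median of 10,000 in the summary statistics of Section 2.2), so a few extreme values can dominate the estimate in levels.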
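The step from a difference in log income to a percentage change, together with an approximate 95% interval, can be sketched as follows. The interval is built on the log scale from the printed AI standard error and then transformed. This normal-approximation interval is an assumption added here, not something reported by Match().

```r
# Values copied from the summary(psm4) output above.
att_log <- 0.14272
ai_se   <- 0.036001

# Percentage change implied by a difference in log income.
pct_change <- 100 * (exp(att_log) - 1)

# Approximate 95% interval: build it on the log scale, then transform.
ci_log <- att_log + c(-1, 1) * qnorm(0.975) * ai_se
ci_pct <- 100 * (exp(ci_log) - 1)

print(round(pct_change, 1))
print(round(ci_pct, 1))
```

Strictly speaking, the transformed value describes a ratio of geometric means of income, so it should be read as an approximate percentage effect rather than an effect on mean income in levels.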


7 Sensitivity analysis

library(rbounds)
psens(x = psm4, Gamma = 2, GammaInc = 0.1)
## 
##  Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value 
##  
## Unconfounded estimate ....  0 
## 
##  Gamma Lower bound Upper bound
##    1.0           0      0.0000
##    1.1           0      0.0000
##    1.2           0      0.0000
##    1.3           0      0.0000
##    1.4           0      0.3885
##    1.5           0      0.9996
##    1.6           0      1.0000
##    1.7           0      1.0000
##    1.8           0      1.0000
##    1.9           0      1.0000
##    2.0           0      1.0000
## 
##  Note: Gamma is Odds of Differential Assignment To
##  Treatment Due to Unobserved Factors 
## 
hlsens(x = psm4, Gamma = 2, GammaInc = 0.1)
## 
##  Rosenbaum Sensitivity Test for Hodges-Lehmann Point Estimate 
##  
## Unconfounded estimate ....  0.2054 
## 
##  Gamma Lower bound Upper bound
##    1.0   0.2053500     0.20535
##    1.1   0.1053500     0.30535
##    1.2   0.0053548     0.40535
##    1.3   0.0053548     0.40535
##    1.4  -0.0946450     0.40535
##    1.5  -0.0946450     0.50535
##    1.6  -0.0946450     0.50535
##    1.7  -0.1946500     0.60535
##    1.8  -0.1946500     0.60535
##    1.9  -0.1946500     0.60535
##    2.0  -0.2946500     0.60535
## 
##  Note: Gamma is Odds of Differential Assignment To
##  Treatment Due to Unobserved Factors 
## 

The Rosenbaum sensitivity test for the Wilcoxon signed-rank p-value shows that the upper bound exceeds 0.05 once \(\Gamma\) reaches 1.4.
Likewise, the Rosenbaum sensitivity test for the Hodges-Lehmann point estimate shows that the lower bound becomes negative once \(\Gamma\) reaches 1.4.
That is, if some omitted variable makes the odds of being in the treatment group (i.e., being a CCP member) differ by a factor of 1.4 between two matched individuals, the earlier conclusion that CCP membership has an effect would no longer hold.
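The critical value of \(\Gamma\) can be read off the two tables programmatically. The sketch below copies the printed bounds by hand and also converts \(\Gamma\) into the implied range for the probability that a given member of a matched pair is the treated one, using the standard Rosenbaum bounds \(1/(1+\Gamma)\) and \(\Gamma/(1+\Gamma)\). The variable names are illustrative.

```r
# Bounds copied from the psens() and hlsens() output above.
gamma    <- c(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0)
p_upper  <- c(0, 0, 0, 0, 0.3885, 0.9996, 1, 1, 1, 1, 1)
hl_lower <- c(0.20535, 0.10535, 0.0053548, 0.0053548, -0.094645,
              -0.094645, -0.094645, -0.19465, -0.19465, -0.19465, -0.29465)

# Smallest Gamma at which each conclusion breaks down.
gamma_p  <- gamma[which(p_upper > 0.05)[1]]
gamma_hl <- gamma[which(hl_lower < 0)[1]]

# Within a matched pair, Gamma bounds the probability that a given unit
# is the treated one between 1 / (1 + Gamma) and Gamma / (1 + Gamma).
prob_range <- c(1, gamma_p) / (1 + gamma_p)

print(c(gamma_p = gamma_p, gamma_hl = gamma_hl))
print(round(prob_range, 3))
```

With no hidden bias the probability would be exactly 0.5 for each member of the pair, so the width of this range gives a sense of how small a departure from random assignment is enough to overturn the result.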


8 Conclusion and Discussion

8.1 Main result

The results suggest that, after accounting for age, education, height, weight, father’s education and CCP membership, and proficiency in English and Mandarin, CCP membership benefits individuals by increasing their household income. However, this result is somewhat sensitive to unobserved or omitted factors.
Many unobserved characteristics may affect personal income, and collecting and including these variables is very difficult. For example, personal ambition might affect both the probability of joining the CCP and household income, yet it is hard to observe and measure.
It therefore remains hard to conclude that CCP membership has a causal effect on income.

8.2 Appropriateness of the treatment variable

A good treatment variable must fit the following criteria:

1. The cause must precede the outcome
This is a cross-sectional dataset. We only know each interviewee’s income and CCP membership status at a single point in time. We do not know when he or she joined the CCP, or what his or her income was before becoming a member. That is, we cannot be sure whether the high income came first (so that being rich or powerful helped him or her get into the CCP) or the membership came first.

2. The cause should be associated with the outcome
If I just run a simple t-test (after first running a test of homogeneity of variance):

var.test(hincome ~ CCPmember, data=data.nomissing)
## 
##  F test to compare two variances
## 
## data:  hincome by CCPmember
## F = 0.54861, num df = 7215, denom df = 1217, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5027726 0.5970442
## sample estimates:
## ratio of variances 
##          0.5486141
t.test(hincome ~ CCPmember, data=data.nomissing , var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  hincome by CCPmember
## t = -5.413, df = 1450.7, p-value = 7.241e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -24782.05 -11598.36
## sample estimates:
## mean in group 0 mean in group 1 
##        19314.18        37504.39
var.test(lnhincome ~ CCPmember, data=data.nomissing)
## 
##  F test to compare two variances
## 
## data:  lnhincome by CCPmember
## F = 1.3364, num df = 7215, denom df = 1217, p-value = 1.753e-10
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  1.224765 1.454413
## sample estimates:
## ratio of variances 
##           1.336436
t.test(lnhincome ~ CCPmember, data=data.nomissing , var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  lnhincome by CCPmember
## t = -25.927, df = 1812.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.8900196 -0.7648358
## sample estimates:
## mean in group 0 mean in group 1 
##        9.149145        9.976573
par(mfrow=c(1,2))
boxplot(hincome ~ CCPmember, data=data.nomissing, main="Household income", 
        names=c("Control","Treatment"))
boxplot(lnhincome ~ CCPmember, data=data.nomissing, main="Household income (natural log)", 
        names=c("Control","Treatment"))

The results show that individuals with CCP membership have significantly higher income than individuals without it (whether the outcome variable is household income or log household income). In this respect, CCP membership meets the requirement.
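To put the unadjusted gap in perspective, it can be set beside the matched estimate from Section 6. The sketch below uses only numbers printed in the Welch t-test output above and in summary(psm4). The variable names are illustrative.

```r
# Group means of log household income from the Welch t-test output above.
mean_log_control <- 9.149145
mean_log_treated <- 9.976573

# ATT on log household income from summary(psm4) in Section 6.
att_log <- 0.14272

naive_gap_log <- mean_log_treated - mean_log_control  # unadjusted gap in logs
naive_ratio   <- exp(naive_gap_log)                   # ratio of geometric means
share_left    <- att_log / naive_gap_log              # share of the gap left after matching

print(round(c(naive_gap_log = naive_gap_log,
              naive_ratio   = naive_ratio,
              share_left    = share_left), 3))
```

On these numbers the raw gap is several times larger than the matched estimate, which is consistent with much of the raw association reflecting differences in covariates rather than membership itself.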

3. The treatment must be manipulable
Of course, CCP membership is manipulable. However, even if CCP membership does affect income, the estimate still has limited policy implications. It is unlikely that merely becoming a CCP member has some magic power to enhance a person’s human capital. A more plausible explanation is that holding a membership signals special capability or social capital, so that the member can more easily get a job or a promotion. If everyone could get CCP membership, the value of membership would fall. Even if everyone in China became a member, the average salary of the whole population would be unlikely to increase.
As a treatment variable, CCP membership differs from a job training program or medical care. CCP membership in itself has no power to increase income.