The dichotomous dependent variable is called move, and it indicates whether the individual has moved in the past year. A value of 1 indicates they have moved and a value of 0 indicates they have not.
The data used for this homework assignment come from the University of Michigan’s 2009 Panel Study of Income Dynamics.
Can a person’s health and education determine how likely they are to have moved in the past year?
The two predictors used in this homework assignment are health and educ. health takes the values good, fair and poor:
table(hw1$health)
##
## fair good poor
## 624 974 278
and educ (education), which takes the values lths (less than high school), hs (high school) and college (at least some college):
table(hw1$educ)
##
## college hs lths
## 600 589 691
Number within each group:
table(hw1$health, hw1$educ)
##
## college hs lths
## fair 176 210 238
## good 357 282 335
## poor 64 96 118
Proportion within each group:
round(prop.table(table(hw1$health, hw1$educ), margin = 2), digits = 3)
##
## college hs lths
## fair 0.295 0.357 0.344
## good 0.598 0.480 0.485
## poor 0.107 0.163 0.171
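As a check on what margin = 2 does, the column proportions can be rebuilt by hand from the counts printed above. This is a minimal sketch: the matrix counts is typed in from that table rather than read from hw1, and the object names are illustrative only.

```r
# Counts copied from the health-by-education table above
# (rows: health, columns: educ); not read from hw1.
counts <- matrix(
  c(176, 210, 238,
    357, 282, 335,
     64,  96, 118),
  nrow = 3, byrow = TRUE,
  dimnames = list(health = c("fair", "good", "poor"),
                  educ   = c("college", "hs", "lths"))
)

# margin = 2 divides every cell by its column total,
# so each education column sums to 1.
col_props <- sweep(counts, 2, colSums(counts), "/")
round(col_props, 3)
```

Because each column sums to 1, the table reads as "the distribution of health within each education group", not the other way round.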
Finally, a chi-square test of whether health status is independent of an individual’s education:
chisq.test(table(hw1$health, hw1$educ))
##
## Pearson's Chi-squared test
##
## data: table(hw1$health, hw1$educ)
## X-squared = 24.455, df = 4, p-value = 6.473e-05
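To make the test less of a black box, the statistic can be recomputed from the printed counts. The sketch below assumes the counts in the cross-tabulation above and uses illustrative object names; it compares each observed cell with the count expected under independence.

```r
# Observed counts copied from the cross-tabulation above (not read from hw1).
counts <- matrix(
  c(176, 210, 238,
    357, 282, 335,
     64,  96, 118),
  nrow = 3, byrow = TRUE,
  dimnames = list(health = c("fair", "good", "poor"),
                  educ   = c("college", "hs", "lths"))
)

# Expected count under independence: row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic, degrees of freedom and upper-tail p-value.
x2      <- sum((counts - expected)^2 / expected)
df      <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(X_squared = x2, df = df, p_value = p_value)
```

With a p-value this small, independence of health and education is rejected at conventional levels. Note that this version of the test ignores the survey weights and design.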
For the last part, I’m not a hundred percent sure how to answer parts a, b, c and d, so I’m going to calculate descriptive statistics for the unweighted sample, the weighted sample and the full survey design, and compare the three.
Looking at the unweighted sample:
round(prop.table(table(hw1$health, hw1$educ), margin = 2), digits = 4)
##
## college hs lths
## fair 0.2948 0.3571 0.3444
## good 0.5980 0.4796 0.4848
## poor 0.1072 0.1633 0.1708
and the weighted sample:
round(prop.table(wtd.table(hw1$health, hw1$educ, weights = hw1$wt), margin = 2), digits = 4) ### simple weighted
## college hs lths
## fair 0.2968 0.3449 0.3349
## good 0.5987 0.4877 0.5088
## poor 0.1045 0.1675 0.1563
We can see that while the numbers are close, they are not identical, so the 2009 individual cross-sectional weights have had an effect on the estimates.
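wtd.table is not a base R function (it presumably comes from an add-on package such as questionr). As a hedged illustration of what a weighted table does, the sketch below uses base R's xtabs on a small made-up data frame; toy and its values are hypothetical and unrelated to hw1.

```r
# Hypothetical toy data: five respondents with sampling weights.
toy <- data.frame(
  health = c("good", "good", "fair", "poor", "fair"),
  educ   = c("hs", "college", "hs", "hs", "college"),
  wt     = c(2, 1, 3, 4, 1.5)
)

# A weighted table sums the weights in each cell
# instead of counting rows.
weighted <- xtabs(wt ~ health + educ, data = toy)
weighted

# Weighted column proportions, the analogue of the table above.
weighted_props <- prop.table(weighted, margin = 2)
round(weighted_props, 4)
```

Each cell is an estimate of how many people in the population fall in that cell, which is why weighted and unweighted proportions differ whenever the weights vary across cells.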
Here we examine the standard errors within the weighted sample:
n <- table(is.na(hw1$health)==F)
n
##
## FALSE TRUE
## 4 1876
p <- prop.table(wtd.table(hw1$health, hw1$educ, weights = hw1$wt), margin = 2)
# n[2] is the TRUE count above, i.e. the 1876 respondents with non-missing health
se <- sqrt((p*(1-p))/n[2])
stargazer(data.frame(proportion = p, se = se), summary = F, type = "text", digits = 4)
##
## =========================================================================
## proportion.Var1 proportion.Var2 proportion.Freq se.Var1 se.Var2 se.Freq
## -------------------------------------------------------------------------
## 1 fair college 0.2968 fair college 0.0105
## 2 good college 0.5987 good college 0.0113
## 3 poor college 0.1045 poor college 0.0071
## 4 fair hs 0.3449 fair hs 0.0110
## 5 good hs 0.4877 good hs 0.0115
## 6 poor hs 0.1675 poor hs 0.0086
## 7 fair lths 0.3349 fair lths 0.0109
## 8 good lths 0.5088 good lths 0.0115
## 9 poor lths 0.1563 poor lths 0.0084
## -------------------------------------------------------------------------
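The standard errors in this table follow the simple binomial formula sqrt(p(1 - p)/n) with n = 1876, the full non-missing sample. The sketch below reproduces the first row from the printed values. It also shows, as an assumption on my part rather than something done in the analysis above, how the figure would change if the denominator were the unweighted size of the college column (176 + 357 + 64 = 597), since the proportions are taken within education groups.

```r
# Values copied from the output above.
p <- 0.2968   # weighted proportion: fair health among college
n <- 1876     # respondents with non-missing health

# Simple binomial standard error, as used for the table above.
se_total_n <- sqrt(p * (1 - p) / n)
round(se_total_n, 4)

# Assumption for comparison only: use the unweighted college
# column total (176 + 357 + 64) as the denominator instead.
n_college <- 176 + 357 + 64
se_col_n  <- sqrt(p * (1 - p) / n_college)
round(se_col_n, 4)
```

Neither version accounts for clustering or stratification, so both are only rough approximations to a design-based standard error.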
Here we compare the weighted counts with those from the full survey design. The weights:
weighted_counts <- wtd.table(hw1$health, hw1$educ, weights = hw1$wt)
print(weighted_counts)
## college hs lths
## fair 1932391 1924493 2294313
## good 3897847 2721148 3485907
## poor 680154 934473 1070680
and the survey design:
design_counts <- svytable(~health + educ, design = des)
design_counts
## educ
## health college hs lths
## fair 1932391 1924493 2294313
## good 3897847 2721148 3485907
## poor 680154 934473 1070680
stargazer(data.frame(prop.table(svytable(~health + educ, design = des), margin = 2)), summary = F, type = "text", digits = 4)
##
## =======================
## health educ Freq
## -----------------------
## 1 fair college 0.2968
## 2 good college 0.5987
## 3 poor college 0.1045
## 4 fair hs 0.3449
## 5 good hs 0.4877
## 6 poor hs 0.1675
## 7 fair lths 0.3349
## 8 good lths 0.5088
## 9 poor lths 0.1563
## -----------------------
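The design object des is used above but its construction is not shown in this section. The sketch below illustrates, on made-up data, how such an object can be built with the survey package and how design-based standard errors for the column proportions can be requested. The data frame toy and the variable names stratum, psu and wt are hypothetical; the actual PSID design variables may be named and structured differently.

```r
library(survey)

# Hypothetical toy data: 2 strata, 2 PSUs per stratum, 10 people per PSU.
set.seed(1)
toy <- data.frame(
  stratum = rep(1:2, each = 20),
  psu     = rep(1:4, each = 10),
  wt      = runif(40, 500, 1500),
  health  = factor(sample(c("fair", "good", "poor"), 40, replace = TRUE)),
  educ    = factor(sample(c("college", "hs", "lths"), 40, replace = TRUE))
)

# Declare clusters, strata and weights to the survey package.
des_toy <- svydesign(ids = ~psu, strata = ~stratum, weights = ~wt,
                     data = toy, nest = TRUE)

# Weighted counts: the cells sum to the total of the weights.
toy_counts <- svytable(~health + educ, design = des_toy)
toy_counts

# Proportion in each health category within each education group,
# with standard errors that reflect the clustering and stratification.
svyby(~health, ~educ, des_toy, svymean)
```

The point estimates from svytable match a plain weighted table because both only use the weights; the design information changes the standard errors, not the proportions.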
From the results of this analysis we can see that the weighted sample has slightly different proportions than the unweighted sample and exactly the same proportions as the full survey design. The standard error of the weighted sample is slightly different from that of the full survey design and, surprisingly, the weighted sample has slightly smaller standard errors.
In conclusion, the results of this analysis have shown that applying the survey weights changes the estimated proportions (in this case, slightly), and that only the full survey design gives the true standard error, which must be used for any