Introduction

The aim of this analysis is to explore how various U.S. household characteristics and family activities are associated with a child’s grades in school (reported on a scale from A to D).

I used the National Household Education Surveys Program Data (NHES) for the analysis, specifically the Parent and Family Involvement in Education portion of the NHES. Most of the data is collected using self-administered questionnaires delivered and returned through the mail. I used the most recent available data at the time, which was collected in 2016, and contains responses from 14,075 collected surveys. The complete dataset can be found at https://nces.ed.gov/nhes/dataproducts.asp#2016dp.

While the survey collects over 100 variables, I chose to explore the following characteristics and their association with the grades a child receives in school:

-household annual income
-how often the family helps the child with homework (from never to 5+ times a week)
-how often the family checks to see if homework is done (from never to always)
-whether the family attended various cultural/art events or institutions in the past month (defined here as visiting a bookstore, library, concert or other show, museum or gallery*)
-whether the parent is involved in the child’s school in the current school year (defined here as volunteering, fundraising, attending PTA meetings or open houses, attending school plays, dances, sporting events or fairs*)
-whether the household is single parent/guardian or not
-whether the household includes a stay-at-home parent or not

*each of these components was a separate question in the original questionnaire, and has been combined into a composite variable for the purposes of this analysis
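
The composite variables could be built along the following lines. This is only a sketch: the column names are made up rather than taken from the NHES codebook, and the rule that a single "yes" is enough to count as attendance is an assumption, since the exact rule is not spelled out here.

```r
# Hypothetical yes/no item responses (illustrative column names, not NHES names)
items <- data.frame(
  bookstore = c("no", "yes", "no", "no"),
  library   = c("no", "no",  "no", "yes"),
  concert   = c("no", "no",  "no", "no"),
  museum    = c("no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Assumed rule: a household counts as "yes" if it answered yes to at least one item
any_yes <- rowSums(items == "yes") > 0
art_event_attendance <- factor(ifelse(any_yes, "yes", "no"), levels = c("no", "yes"))

table(art_event_attendance)
```

The same pattern would apply to the school involvement composite, with the volunteering, fundraising and event attendance items in place of the columns above.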

A more detailed description of the variables in the version of the dataset used here can be found in the Appendix.

Below is the distribution of income in the dataset. The majority of respondents are in the higher income brackets:

A large percentage of households report their children receiving mostly A and B grades in school. Because this variable is self-reported, there is a danger that many families, even unknowingly, report higher grades than were actually assigned. This may be a particular concern because households are asked to report not GPAs but a general impression, for instance that their child receives ‘mostly B’s’.

Below are the proportions of survey respondents reporting a stay at home parent or a single parent/guardian household:

Here we can see the distribution of grades by income. Higher incomes appear to be strongly associated with children being assigned higher grades. Almost no surveyed families in higher income brackets report that their children are assigned D’s:

Single parent status seems to be closely linked with income, which makes sense as the earning potential of a one-person household is smaller:

Statistical Analysis

Correlations within the data

Chi-squared tests of independence were used to check whether certain pairs of variables are significantly associated. Income level was associated with parent participation in school and with families attending cultural/art events. Income was also associated with single parent household status, since one parent households often have just one income. (The p-value printed as 0 below is too small for R to represent, not exactly zero.)

## Chi-square test p-value for Income and Attending Cultural Activities: 1.820021e-13
## Chi-square test p-value for Income and Parental Involvement in Schools: 3.452529e-47
## Chi-square test p-value for Income and Single Parent Status: 0
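
For readers who want to reproduce this kind of test, here is a minimal sketch using base R's `chisq.test` on a small table of made-up counts (not NHES data). The dimension names echo the variables in this analysis only for readability.

```r
# Made-up counts for illustration only (not NHES data)
counts <- matrix(c(120,  80,
                    90, 110,
                    60, 140),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(income = c("low", "middle", "high"),
                                 art_event_attendance = c("no", "yes")))

# Pearson chi-squared test of independence between rows and columns
test <- chisq.test(counts)

test$statistic  # chi-squared statistic
test$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
test$p.value    # small values suggest the two variables are associated
```

With real survey data the table would come from something like `table(data$income, data$art_event_attendance)` rather than being typed in by hand.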

Cramér’s V (code available at https://www.r-bloggers.com/example-8-39-calculating-cramers-v/) was used to gauge how strong these associations are. Values of Cramér’s V range from 0 to 1, with values close to 0 indicating a weak association and values close to 1 a very strong one. The strongest association is between income and single parent status, which is not unexpected, with a Cramér’s V of about 0.41. Because this value is not extremely high, I will keep both variables in the model.

## Cramer's V for Income and Attending Cultural/Art Activities: 0.08384082
## Cramer's V for Income and Parental Involvement in Schools: 0.1462707
## Cramer's V for Income and Single Parent Status: 0.4126172
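
Cramér's V rescales the chi-squared statistic by the sample size and the table dimensions: V = sqrt(chi2 / (n * (min(rows, columns) - 1))). The sketch below is a small base R implementation of that formula on made-up counts. It is not the code from the linked post, and the function name `cramers_v` is just an illustrative choice.

```r
# Cramér's V for a two-way contingency table
cramers_v <- function(tab) {
  # Pearson chi-squared statistic without continuity correction
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  sqrt(chi2 / (n * (k - 1)))
}

# Made-up counts for illustration only (not NHES data)
counts <- matrix(c(120,  80,
                    90, 110,
                    60, 140),
                 nrow = 3, byrow = TRUE)
v <- cramers_v(counts)
v

# A table with a perfect association gives V = 1
perfect <- matrix(c(50, 0,
                     0, 50),
                  nrow = 2, byrow = TRUE)
cramers_v(perfect)
```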

Cumulative Logistic Regression

I applied a cumulative logistic regression model to the data. Unfortunately, the likelihood ratio test provided in the VGAM package rejected the proportional odds assumption (p = 0.002).

## Likelihood ratio test
## 
## Model 1: grade ~ single_parent + parent_involv_school + art_event_attendance + 
##     family_checks_HW + income + family_helps_with_HW + stayhome_parent
## Model 2: grade ~ single_parent + parent_involv_school + art_event_attendance + 
##     family_checks_HW + income + family_helps_with_HW + stayhome_parent
##     #Df LogLik Df  Chisq Pr(>Chisq)   
## 1 33969 -11110                        
## 2 34009 -11145 40 70.843   0.001893 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
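
The p-value in this output can be checked by hand from the printed statistic and degrees of freedom. The sketch below uses only the numbers shown above. The statistic itself is twice the difference in log-likelihoods, which the rounded values printed here reproduce only approximately.

```r
# Values copied from the likelihood ratio test output above
chisq_stat <- 70.843
df_diff    <- 40

# Upper-tail probability of a chi-squared distribution with 40 degrees of freedom
p_value <- pchisq(chisq_stat, df = df_diff, lower.tail = FALSE)
p_value

# The statistic is 2 * (logLik of larger model - logLik of smaller model);
# with the rounded log-likelihoods printed above this gives roughly 70
approx_stat <- 2 * (-11110 - (-11145))
approx_stat
```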

Since this test is known to be anti-conservative (it rejects the assumption too readily), I also ran a visual check for each separate variable (very helpful code and explanation of the check here: https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/).

It seems that the issue lies mainly with the variable for income.
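
The idea behind such a check can be sketched in base R. This is a simplified illustration on simulated data, not the code from the linked page and not the check actually run here: for each level of a predictor, compute the empirical logit of falling at or below each grade cutpoint, then see whether the gaps between cutpoints stay roughly constant across levels. The helper name `empirical_logits` is hypothetical.

```r
set.seed(1)
n <- 500

# Simulated data for illustration only (not NHES data)
income <- factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                 levels = c("low", "mid", "high"))
grade <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE,
                       prob = c(0.50, 0.30, 0.15, 0.05)),
                levels = c("A", "B", "C", "D"), ordered = TRUE)

# Empirical logit of P(grade <= cutpoint) within each level of a predictor
empirical_logits <- function(y, x) {
  codes <- as.integer(y)
  cutpoints <- seq_len(nlevels(y) - 1)
  out <- sapply(cutpoints, function(j) {
    sapply(split(codes <= j, x), function(z) qlogis(mean(z)))
  })
  colnames(out) <- paste0("<=", levels(y)[cutpoints])
  out
}

logits <- empirical_logits(grade, income)
logits

# Under proportional odds, the gap between two cutpoints should be
# roughly the same in every row (every level of the predictor)
logits[, "<=B"] - logits[, "<=A"]
```

Gaps that differ markedly between rows would point to a predictor that violates the assumption, which is the pattern described for income.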

One alternative is to use a multinomial logistic model, or to collapse the dependent variable (A’s and B’s vs C’s and D’s) and apply binary logistic regression. However, the VGAM package also allows for a partial proportional odds model, in which variables that do not meet the proportional odds assumption can be flagged as such. I therefore used the partial proportional odds model as my final model, flagging income level as not meeting the proportional odds assumption. The final model run in R is below.

library(VGAM)
# Partial proportional odds model: every term is parallel except income (and the intercepts)
partialp_inc <- vglm(grade ~ single_parent + parent_involv_school + art_event_attendance + family_checks_HW + income + family_helps_with_HW + stayhome_parent, data = pfi_pu_pert_m,
                family = cumulative(link = "logit", parallel = FALSE ~ income, reverse = FALSE))

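Below is a table with the estimated model coefficients as well as exponentiated coefficients (odds ratios). The suffixes :1, :2 and :3 refer to the three cumulative cutpoints. Income has a separate coefficient at each cutpoint because it was flagged as not meeting the proportional odds assumption.
-Intercept 1 is for the odds of earning A’s vs B’s, C’s and D’s.
-Intercept 2 is for the odds of earning A’s and B’s vs C’s and D’s.
-Intercept 3 is for the odds of earning A’s, B’s and C’s vs D’s.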

Model Parameter               Estimate    Odds Ratio  P_value    Significance
(Intercept):1                 -1.8650247  0.1548924   0.0000000  *
(Intercept):2                  0.0279738  1.0283688   0.8423046
(Intercept):3                  1.8894087  6.6154561   0.0000000  *
single_parentno                0.2021130  1.2239864   0.0000098  *
parent_involv_schoolyes        0.6680950  1.9505180   0.0000000  *
art_event_attendanceyes        0.5013391  1.6509306   0.0000000  *
family_checks_HWnever          0.4948657  1.6402780   0.0000015  *
family_checks_HWrarely        -0.1445409  0.8654196   0.0417536  *
family_checks_HWsometimes     -0.4601868  0.6311657   0.0000000  *
income10-20K:1                 0.2694610  1.3092586   0.0305238  *
income10-20K:2                 0.2106136  1.2344353   0.1117887
income10-20K:3                 0.2454685  1.2782200   0.3376198
income20-30K:1                 0.3472524  1.4151738   0.0036832  *
income20-30K:2                 0.3547370  1.4258057   0.0064380  *
income20-30K:3                 0.1059756  1.1117947   0.6612234
income30-40K:1                 0.4591517  1.5827307   0.0001164  *
income30-40K:2                 0.3730853  1.4522082   0.0041702  *
income30-40K:3                 0.3488730  1.4174691   0.1668634
income40-50K:1                 0.4188250  1.5201743   0.0006505  *
income40-50K:2                 0.4663943  1.5942354   0.0007101  *
income40-50K:3                 0.9129936  2.4917707   0.0030215  *
income50-60K:1                 0.6623186  1.9392836   0.0000001  *
income50-60K:2                 0.4339959  1.5434126   0.0017382  *
income50-60K:3                 0.5418764  1.7192297   0.0547913
income60-75K:1                 0.8179322  2.2658099   0.0000000  *
income60-75K:2                 0.7738076  2.1680055   0.0000000  *
income60-75K:3                 0.8287190  2.2903828   0.0038130  *
income75-100K:1                0.9495752  2.5846116   0.0000000  *
income75-100K:2                0.8715978  2.3907276   0.0000000  *
income75-100K:3                0.9869052  2.6829186   0.0001607  *
income100-150K:1               1.1107175  3.0365362   0.0000000  *
income100-150K:2               1.2667298  3.5492269   0.0000000  *
income100-150K:3               1.9643001  7.1299203   0.0000000  *
incomeover150K:1               1.3979764  4.0470022   0.0000000  *
incomeover150K:2               1.7957085  6.0237409   0.0000000  *
incomeover150K:3               2.2038628  9.0599429   0.0000000  *
family_helps_with_HW<1/week    0.4924671  1.6363482   0.0000000  *
family_helps_with_HW1-2/week   0.2881690  1.3339827   0.0000152  *
family_helps_with_HW3-4/week   0.2131367  1.2375538   0.0016960  *
family_helps_with_HWnever      0.5133792  1.6709281   0.0000000  *
stayhome_parentno             -0.2137523  0.8075484   0.0000900  *
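
The odds ratio column is simply the exponentiated estimate, and subtracting 1 turns an odds ratio into a percentage change in the odds. A short sketch using three estimates copied from the table:

```r
# Estimates copied from the coefficient table above
estimates <- c(single_parentno         =  0.2021130,
               parent_involv_schoolyes =  0.6680950,
               stayhome_parentno       = -0.2137523)

# Odds ratio = exp(estimate)
odds_ratios <- exp(estimates)

# Percentage change in the odds relative to the reference level
pct_change <- (odds_ratios - 1) * 100

round(cbind(estimate = estimates, odds_ratio = odds_ratios, pct_change = pct_change), 2)
```

A positive percentage means higher odds of being at or above a given grade cutpoint than the reference level, and a negative one means lower odds.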

Here is the confusion matrix showing how well the model classified households. Because of the large number of A’s and B’s in the dataset, the model assigns high probabilities to the occurrence of A’s and B’s. No observations were classified as a D.

##    predicted_grades
##        A    B    C
##   A 5070  878    2
##   B 2743 1111    7
##   C  743  522    7
##   D  114  145    2
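
To put the matrix in perspective, the overall share of correctly classified households can be compared with the share obtained by always predicting the most common grade. The sketch below retypes the counts from the output above and assumes that rows are the reported grades and columns the predicted grades.

```r
# Counts copied from the confusion matrix above
# (assumption: rows = reported grade, columns = predicted grade)
confusion <- matrix(c(5070,  878, 2,
                      2743, 1111, 7,
                       743,  522, 7,
                       114,  145, 2),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(reported  = c("A", "B", "C", "D"),
                                    predicted = c("A", "B", "C")))

# Correct classifications sit where the reported and predicted grades match
correct  <- confusion["A", "A"] + confusion["B", "B"] + confusion["C", "C"]
accuracy <- correct / sum(confusion)

# Baseline: predict "A" for every household
baseline <- sum(confusion["A", ]) / sum(confusion)

round(c(accuracy = accuracy, baseline = baseline), 3)
```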

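Residual deviance and -2*Log Likelihood were higher for the intercept-only model than for the final model, and the likelihood ratio test below confirms that the final model fits significantly better.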

lrtest_vglm(intercept_only,partialp_inc)
## Likelihood ratio test
## 
## Model 1: grade ~ 1
## Model 2: grade ~ single_parent + parent_involv_school + art_event_attendance + 
##     family_checks_HW + income + family_helps_with_HW + stayhome_parent
##     #Df LogLik  Df  Chisq Pr(>Chisq)    
## 1 34029 -11768                          
## 2 33991 -11121 -38 1294.4  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results of the Analysis/Odds Ratios

In general, the odds of a student receiving mostly A’s and B’s instead of C’s and D’s increase for families with higher income, two parent households, families with a stay at home parent, families that are involved in their child’s school, and families that attended culture/art events.

Odds of receiving high grades decrease the more often families help with homework.

No distinct pattern emerged for the relationship between grades and frequency of the family checking to see if homework was done.

For instance:

-Odds of a student receiving A’s or B’s instead of C’s or D’s are 22% higher for two parent households.
-Odds of a student receiving A’s or B’s instead of C’s or D’s are almost twice as high for families with parents involved in their child’s school.
-Odds of a student receiving A’s or B’s instead of C’s or D’s are 65% higher for families that attended cultural/art events or institutions.
-Odds of a student receiving A’s or B’s instead of C’s or D’s are about 19% lower for households without a stay at home parent.
-Odds of a student receiving A’s or B’s instead of C’s or D’s are 67% higher for families that never help a child with their homework compared to those that help 5 or more times a week.
-Odds of a student receiving A’s or B’s instead of C’s or D’s are 6 times as high for the highest income group compared to the lowest income group.

Conclusion

Odds ratios can be difficult to understand because they are not intuitive, so I will also use probabilities to give an idea of the estimated relationships found by the model.

In general, the probability of a student receiving mostly A’s and B’s instead of C’s and D’s increases for families with higher income, two parent households, families with a stay at home parent, families that are involved in their child’s school, and families that attend cultural/art events or institutions. The strongest association is between income and grades.

Probability of receiving high grades decreases the more often families help with homework.

No distinct pattern emerged for the relationship between grades and frequency of the family checking to see if homework was done.

I have plotted the probability of receiving various grades below, highlighting some of the most important associations between grades and family characteristics. The estimated probabilities will change for families with different characteristics (for families in a different income bracket, for instance), though the general direction of the associations will remain the same. The plots below show probabilities of a child earning mostly A’s, B’s, C’s or D’s for a household most commonly found in the U.S. This is for a family:

-earning 50,000-60,000 annually
-with two parents/guardians living at home
-with no stay at home parent
-that has attended a cultural/art event within the past month
-that participated in the child’s school within the past month
-that usually checks if the child’s homework is done every day
-that helps the child with homework assignments less than once a week
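
The probabilities for this household can be reconstructed from the coefficient table. The sketch below is a hand calculation and rests on several assumptions: that the model follows VGAM's `cumulative` convention with `reverse = FALSE`, where logit P(grade <= j) is the intercept plus the coefficients; that "always" is the reference level for checking homework and therefore contributes nothing; and that "usually checks every day" in the list above corresponds to that level.

```r
# Coefficients copied from the model table
intercepts    <- c(-1.8650247, 0.0279738, 1.8894087)  # cutpoints 1, 2, 3
income_50_60k <- c( 0.6623186, 0.4339959, 0.5418764)  # one coefficient per cutpoint

# Terms shared across cutpoints (proportional odds part of the model)
shared <- c(single_parentno             =  0.2021130,
            parent_involv_schoolyes     =  0.6680950,
            art_event_attendanceyes     =  0.5013391,
            family_helps_with_HW_lt1wk  =  0.4924671,
            stayhome_parentno           = -0.2137523)
# family_checks_HW = "always" is assumed to be the reference level (adds 0)

# Linear predictors: logit of P(grade <= A), P(grade <= B), P(grade <= C)
eta <- intercepts + income_50_60k + sum(shared)

# Cumulative probabilities, then differences give the four category probabilities
cumulative_probs <- plogis(eta)
probs <- diff(c(0, cumulative_probs, 1))
names(probs) <- c("A", "B", "C", "D")

round(probs, 3)
```

Changing one characteristic amounts to swapping the matching coefficient (or dropping it for a reference level) and recomputing.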

I will change one of these variables at a time to see how the model found each characteristic to be associated with the child’s grades in school.

In conclusion, receiving higher grades in school is very strongly associated with living in a higher income household. Higher grades are also associated with parental involvement in school and with families attending cultural/art events together, both of which are also linked with higher income. We cannot know from this analysis whether attending cultural/art events causes a child to receive higher grades in school, or whether the effect comes solely from the benefits of living in a higher income household.

Higher grades are also linked with two parent households and households with a stay at home parent, both of which again are correlated with income.

Somewhat surprisingly, the probability of getting high grades decreases the more families help a child with homework. This might be because children whose parents help them with homework don’t get an opportunity to master the material themselves, but a better explanation might be that children who need help with homework may be struggling with other issues. If a child often needs help with homework, parents may want to consider this as a signal that an intervention for an underlying problem is needed, such as a learning disability, behavioral problem, or other issues at school that the child may be facing.

Appendix

Variables in the dataset:

Variable_Names        Variable_Meaning              Variable_Levels
single_parent         single parent                 no
                                                    yes
parent_involv_school  parent involvement in school  no
                                                    yes
art_event_attendance  attended culture/art events   no
                                                    yes
family_checks_HW      family checks homework        never
                                                    rarely
                                                    sometimes
                                                    always
income                total annual income           under 10,000
                                                    10,001-20,000
                                                    20,001-30,000
                                                    30,001-40,000
                                                    40,001-50,000
                                                    50,001-60,000
                                                    60,001-75,000
                                                    75,001-100,000
                                                    100,001-150,000
                                                    150,000+
family_helps_with_HW  family helps with homework    less than once a week
                                                    1-2 times a week
                                                    3-4 times a week
                                                    5 or more times a week
                                                    never
stayhome_parent       stay at home parent           no
                                                    yes