In this short report, I use principal component analysis (PCA) to build an index of student engagement in education from the 2019 National Household Education Survey (NHES), made available by the National Center for Education Statistics (NCES). The index comprises five variables:

- Days absent
- Hours spent completing homework
- Enjoyment of school
- Academic performance
- Academic engagement
The data were filtered to include only children enrolled in public schools in grades 6-8 (middle school).
## Rows: 16446 Columns: 828
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (828): BASMID, ALLGRADEX, EDCPUB, EDCCAT, EDCREL, EDCPRI, EDCINTK12, EDC...
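The filtering step described above could look roughly like the following. This is a minimal sketch on a toy data frame, not the report's actual code: only the column names `EDCPUB` and `ALLGRADEX` come from the read-in message, and I am assuming they hold the public-school indicator and the grade. The numeric codes are placeholders and should be checked against the NHES codebook.

```r
# Toy stand-in for the NHES file (hypothetical values, real column names).
toy <- data.frame(
  BASMID    = 1:6,
  EDCPUB    = c(1, 1, 2, 1, 2, 1),
  ALLGRADEX = c(5, 6, 7, 8, 6, 9)
)

# ASSUMPTION: placeholder codes; the real NHES coding may differ.
public_code        <- 1
middle_grade_codes <- 6:8

# Keep only rows that are both public-school and in a middle-school grade.
middle_public <- subset(
  toy,
  EDCPUB == public_code & ALLGRADEX %in% middle_grade_codes
)

print(middle_public)
```

Keeping the codes in named variables makes it easy to swap in the codebook values without touching the filter itself.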
The summary table of eigenvalues shows that the first three components of the PCA model together account for just over 80% of the variation in the input variables (80.4%). The first component has an eigenvalue greater than 2 (2.11), and the next two components have eigenvalues just below 1 (0.97 and 0.94).
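The percentages quoted above follow directly from the eigenvalues. As a quick check, the sketch below recomputes them from the five eigenvalues reported in the PCA output; the variable names are my own, and the count of eigenvalues above 1 (the Kaiser rule of thumb) is added for illustration only.

```r
# Eigenvalues copied from the PCA output in this report.
eig <- c(2.1145631, 0.9721076, 0.9351943, 0.7169327, 0.2612023)

# With five standardized variables the eigenvalues sum to 5,
# so each share is simply eigenvalue / 5.
pct     <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct)

# Number of components with an eigenvalue above 1.
n_above_one <- sum(eig > 1)

print(round(data.frame(eigenvalue = eig, pct = pct, cum_pct = cum_pct), 3))
print(n_above_one)
```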
According to the variable summary, academic engagement and academic performance contribute the most to the first component (36.2% and 34.8%, respectively), followed by enjoyment of school (20.4%). Hours spent on homework and days absent contribute little to it (2.8% and 5.8%), which makes them weak indicators of the underlying latent variable, student engagement. Hours spent on homework is, in fact, negatively correlated with the first component (r = -0.24), indicating that this variable fits the index poorly.
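To make the idea of a PCA-based index concrete, here is a minimal sketch on simulated data. It is not the report's analysis: it uses base R's `prcomp` in place of `FactoMineR::PCA`, the data are random, and only the variable names are borrowed from the report. It assumes the index is taken as each student's score on the first component.

```r
set.seed(42)
n <- 200
latent <- rnorm(n)  # hypothetical underlying "engagement"

# Simulated indicators; the strengths of association are made up.
toy <- data.frame(
  daysabsnt = 0.3 * latent + rnorm(n),
  hwkhrswk  = -0.2 * latent + rnorm(n),
  schlenjoy = 0.7 * latent + rnorm(n),
  grades    = 1.2 * latent + rnorm(n, sd = 0.5),
  engage    = 1.2 * latent + rnorm(n, sd = 0.5)
)

# PCA on standardized variables (analogous to scale.unit = TRUE).
fit <- prcomp(toy, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components.
eigenvalues <- fit$sdev^2

# Variable-component correlations: eigenvector entries times component sd.
loadings <- sweep(fit$rotation, 2, fit$sdev, `*`)

# Index = score on the first component. The sign of a component is
# arbitrary, so orient it to correlate positively with `engage`.
index <- fit$x[, 1]
if (loadings["engage", 1] < 0) index <- -index

print(round(eigenvalues, 3))
print(round(loadings[, 1], 3))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why an eigenvalue near 1 means a component carries about as much variance as a single input variable.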
## Call:
## PCA(X = nhes19_pca.2[, c(1:5)], scale.unit = T, graph = F)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## Variance 2.115 0.972 0.935 0.717 0.261
## % of var. 42.291 19.442 18.704 14.339 5.224
## Cumulative % of var. 42.291 61.733 80.437 94.776 100.000
##
## Individuals (the 10 first)
##          Dist     Dim.1    ctr   cos2     Dim.2    ctr   cos2     Dim.3    ctr   cos2
## 1  |  1.821 | -1.287  0.076  0.500 |  0.879  0.077  0.233 |  0.064  0.000  0.001 |
## 2  |  1.358 | -0.256  0.003  0.036 | -0.243  0.006  0.032 | -0.259  0.007  0.036 |
## 3  |  1.740 |  0.271  0.003  0.024 |  0.337  0.011  0.037 | -0.599  0.037  0.119 |
## 4  |  1.928 |  0.135  0.001  0.005 | -0.990  0.098  0.264 |  1.496  0.232  0.602 |
## 5  |  6.178 |  3.761  0.650  0.371 |  0.710  0.050  0.013 |  4.835  2.429  0.612 |
## 6  |  1.583 | -1.210  0.067  0.584 |  0.426  0.018  0.072 |  0.008  0.000  0.000 |
## 7  |  2.440 | -2.042  0.192  0.700 |  1.260  0.159  0.267 |  0.176  0.003  0.005 |
## 8  |  1.459 | -1.132  0.059  0.603 | -0.028  0.000  0.000 | -0.048  0.000  0.001 |
## 9  |  1.075 | -0.375  0.006  0.122 | -0.609  0.037  0.320 | -0.358  0.013  0.111 |
## 10 |  2.312 |  0.820  0.031  0.126 | -0.271  0.007  0.014 |  1.226  0.156  0.281 |
##
## Variables
##              Dim.1    ctr   cos2     Dim.2    ctr   cos2     Dim.3    ctr   cos2
## daysabsnt |  0.351  5.821  0.123 | -0.078  0.623  0.006 |  0.931 92.594  0.866 |
## hwkhrswk  | -0.243  2.783  0.059 |  0.962 95.175  0.925 |  0.117  1.455  0.014 |
## schlenjoy |  0.656 20.353  0.430 |  0.048  0.233  0.002 | -0.036  0.138  0.001 |
## grades    |  0.858 34.845  0.737 |  0.174  3.100  0.030 | -0.111  1.313  0.012 |
## engage    |  0.875 36.198  0.765 |  0.092  0.869  0.008 | -0.205  4.500  0.042 |
## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 2.1145631 42.291262 42.29126
## comp 2 0.9721076 19.442152 61.73341
## comp 3 0.9351943 18.703887 80.43730
## comp 4 0.7169327 14.338654 94.77595
## comp 5 0.2612023 5.224045 100.00000
## $coord
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## daysabsnt 0.3508348 -0.07779619 0.93055380 -0.06166391 0.03361407
## hwkhrswk -0.2425999 0.96187562 0.11664665 0.03551630 0.03275247
## schlenjoy 0.6560264 0.04755240 -0.03592329 0.75151121 -0.03617294
## grades 0.8583840 0.17360181 -0.11081292 -0.31256206 -0.35080591
## engage 0.8748883 0.09193390 -0.20515247 -0.22227164 0.36691465
##
## $cor
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## daysabsnt 0.3508348 -0.07779619 0.93055380 -0.06166391 0.03361407
## hwkhrswk -0.2425999 0.96187562 0.11664665 0.03551630 0.03275247
## schlenjoy 0.6560264 0.04755240 -0.03592329 0.75151121 -0.03617294
## grades 0.8583840 0.17360181 -0.11081292 -0.31256206 -0.35080591
## engage 0.8748883 0.09193390 -0.20515247 -0.22227164 0.36691465
##
## $cos2
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## daysabsnt 0.12308504 0.006052247 0.865930374 0.003802438 0.001129906
## hwkhrswk 0.05885471 0.925204715 0.013606440 0.001261407 0.001072724
## schlenjoy 0.43037070 0.002261231 0.001290483 0.564769104 0.001308482
## grades 0.73682308 0.030137588 0.012279503 0.097695044 0.123064789
## engage 0.76542958 0.008451842 0.042087535 0.049404683 0.134626363
##
## $contrib
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## daysabsnt 5.820826 0.6225902 92.5936291 0.5303759 0.4325789
## hwkhrswk 2.783304 95.1751322 1.4549318 0.1759450 0.4106872
## schlenjoy 20.352701 0.2326111 0.1379909 78.7757515 0.5009457
## grades 34.845168 3.1002317 1.3130429 13.6268087 47.1147483
## engage 36.198001 0.8694348 4.5004053 6.8911189 51.5410399
## $quanti
## correlation p.value
## engage 0.8748883 0.000000e+00
## grades 0.8583840 5.747323e-300
## schlenjoy 0.6560264 1.189123e-127
## daysabsnt 0.3508348 3.602935e-31
## hwkhrswk -0.2425999 3.001607e-15
##
## attr(,"class")
## [1] "condes" "list"
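The `$cos2` and `$contrib` tables above are both derived from the variable-component correlations. The sketch below reproduces the first-dimension columns from the `Dim.1` correlations reported in the output; the object names are my own.

```r
# Correlations of each variable with Dim.1, copied from the output above.
cor_dim1 <- c(
  daysabsnt = 0.3508348,
  hwkhrswk  = -0.2425999,
  schlenjoy = 0.6560264,
  grades    = 0.8583840,
  engage    = 0.8748883
)

# cos2 is the squared correlation with the component.
cos2_dim1 <- cor_dim1^2

# The squared correlations sum to the component's eigenvalue.
eig1 <- sum(cos2_dim1)

# Contribution (%) is each variable's share of that eigenvalue.
contrib_dim1 <- 100 * cos2_dim1 / eig1

print(round(eig1, 4))
print(round(contrib_dim1, 2))
```

This also shows why a negative correlation, as for `hwkhrswk`, still yields a positive (if small) contribution: the sign is lost when the correlation is squared.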
The scree plot and radial plots below display the eigenvalues from this PCA.
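For readers who want to redraw the scree plot from the reported numbers, a minimal sketch in base R graphics follows. It is not the code that produced the report's figures, and the dashed reference line at an eigenvalue of 1 is my own addition.

```r
# Eigenvalues copied from the PCA output in this report.
eig <- c(2.1145631, 0.9721076, 0.9351943, 0.7169327, 0.2612023)

scree <- data.frame(
  component  = seq_along(eig),
  eigenvalue = eig
)

# Eigenvalue against component number, with points joined by lines.
plot(
  scree$component, scree$eigenvalue,
  type = "b",
  xlab = "Component",
  ylab = "Eigenvalue",
  main = "Scree plot"
)

# Reference line: the variance of a single standardized variable.
abline(h = 1, lty = 2)
```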