Yun Mai
May 15, 2017
knitr::opts_chunk$set(echo = TRUE)
install.packages(c('devtools','openintro'))
devtools::install_github('jbryer/DATA606',force = TRUE)
install.packages("tibble", repos=c("http://rstudio.org/_packages", "http://cran.rstudio.com"))
Load packages
library(RCurl)
## Loading required package: bitops
library(tibble)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
##
## complete
library(stringr)
library(knitr)
library(ggplot2)
library(DATA606)
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
##
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
##
## demo
Many studies have documented the importance of parent involvement in early childhood education. Parent involvement in a child's education has been found to be positively associated with the child's academic performance and social functioning. The goal of this project is to find out whether there is a statistically significant association between parent involvement and a child's academic performance, using data collected by the Early Childhood Longitudinal Study (ECLS) program conducted by the National Center for Education Statistics (NCES) from 2010 through 2012.
Parent involvement will be divided into three categories:
Q1.1. Does parent involvement in preschool education influence a child's early academic outcomes?
Taking the first set of explanatory variables, I will study the contemporaneous association between parent involvement in school activities and reading achievement in second grade.
Q1.2. Does parent involvement in school activities influence a child's early academic outcomes?
Taking the second set of explanatory variables, I will take advantage of the longitudinal follow-up data to investigate the associations between parent involvement in preschool and kindergarten and reading achievement in later grades.
Q1.3. Does parent involvement in summer learning influence a child's early academic outcomes?
Taking the third set of explanatory variables, I will study the contemporaneous association between parent involvement in summer learning and reading achievement in second grade.
Q2. Does a school rule requiring children to turn 5 before September 1st to enter kindergarten affect children's academic performance in early childhood?
Q3. Does the parents' education level affect children's academic performance in early childhood?
The ECLS-K:2011 is the third and latest study in the Early Childhood Longitudinal Study (ECLS) program, with data collected by the National Center for Education Statistics (NCES) from fall 2010 to spring 2013. Surveys were conducted every semester; the data from fall 2010, spring 2011, spring 2012, and spring 2013 are the most complete.
The data are collected by NCES and are available online here: https://nces.ed.gov/ecls/dataproducts.asp. For this project, the data were downloaded and loaded into SPSS as the website suggests, cleaned up (most of the suppressed variables were removed), and converted to CSV files.
The manuals "ECLS-K:2011 Kindergarten User's Manual, Public Version PDF File" and "ECLS-K:2011 Kindergarten-Second Grade User's Manual, Public Version PDF File" are available here: https://nces.ed.gov/ecls/dataproducts.asp. The Electronic Code Book can be downloaded from the website too.
The data are from the Early Childhood Longitudinal Study, which follows a cohort of children from their kindergarten year (the 2010-11 school year, referred to as the base year) through the 2011-12 school year. The sample includes both children who were in kindergarten for the first time and those who were repeating kindergarten during 2010-11. Students from about 1,310 schools and their parents, teachers, school administrators, and before- and after-school care providers participated in the study. Each case represents a child who participated, or whose parent participated, in at least one of the two kindergarten data collections (Fall 2010 or Spring 2011).
There are currently 18,174 cases in the early childhood dataset under consideration. However, after data cleaning and restricting the analysis to complete cases, the number of cases may be somewhat smaller.
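As a minimal sketch of what restricting to complete cases could look like, the base R function `complete.cases()` flags rows that have no missing values. The toy data frame and its values below are made up for illustration; only the column names follow the naming used later in this report.

```r
# Toy data with hypothetical values, not the ECLS data
toy <- data.frame(
  CHILDID   = c("A", "B", "C", "D"),
  G2_2_READ = c(3.36, NA, 2.07, 1.51),
  PrtEDU    = c(5, 5, NA, 3)
)

# TRUE for rows with no missing value in any column
is_complete <- complete.cases(toy)

n_total      <- nrow(toy)
n_complete   <- sum(is_complete)
toy_complete <- toy[is_complete, ]

cat("Total cases:", n_total, "- complete cases:", n_complete, "\n")
```

In practice it may be preferable to apply `complete.cases()` only to the columns used in a given model, so that missing values in unrelated variables do not remove cases unnecessarily.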
Response variables
For all of the questions, the response variables are the scores on four academic assessments: reading, math, science, and the Dimensional Change Card Sort (DCCS for kindergarten and first grade; a computed, adjusted DCCS score for second grade). They are numerical variables. This project will focus on the second grade results.
Explanatory variables
For Q2, the explanatory variable is the September 1st cutoff. It is a categorical variable. For Q3, the explanatory variable is the parents' highest education level. I will treat this variable as numerical. American education levels can be converted to years of schooling as follows: a high school education is equivalent to 12 years; an Associate's degree to 14 years; a B.S./B.A. to 16 years; a Master's degree to 17-18 years; and education beyond a Master's degree to 20 years or more.
For Q1, the explanatory variables are those on parent involvement related to early literacy, such as whether the parent volunteers regularly (Prthlp) and the parents' work hours per week (PrtW).
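The conversion of education levels to years of schooling described above could be sketched with a named lookup vector. This is only an illustration: the name `edu_to_years` is hypothetical, the value 17.5 for a Master's degree is an assumed midpoint of the 17-18 range, 20 is the lower bound given for education beyond a Master's degree, and levels for which the text gives no value are left as `NA`.

```r
# Years of schooling per education label (labels follow those used in this report)
edu_years <- c(
  "HIGH SCHOOL"        = 12,
  "ASSOCIATE'S DEGREE" = 14,
  "BACHELOR"           = 16,
  "MASTER"             = 17.5,  # assumption: midpoint of 17-18 years
  "BEYOND A MASTER"    = 20     # assumption: lower bound of ">= 20 years"
)

# Hypothetical helper: look up years for each label; unknown labels give NA
edu_to_years <- function(labels) {
  unname(edu_years[as.character(labels)])
}

example_labels <- c("HIGH SCHOOL", "BACHELOR", "SOME COLLEGE", "MASTER")
example_years  <- edu_to_years(example_labels)
print(data.frame(label = example_labels, years = example_years))
```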
The variables in the dataset are listed in the table below:
X | Name | Label | Units | Levels |
---|---|---|---|---|
1 | CHILDID | CHILD IDENTIFICATION NUMBER | NA | NA |
2 | K1_AGE | CHILD ASSESSMENT AGE: Kindergarten_1 | years | NA |
3 | K2_AGE | CHILD ASSESSMENT AGE: Kindergarten_2 | years | NA |
4 | G1_2_AGE | CHILD ASSESSMENT AGE: Grade1_2 | years | NA |
5 | G2_2_AGE | CHILD ASSESSMENT AGE: Grade2_2 | years | NA |
6 | CHSEX | CHILD COMPOSITE SEX | NA | NA |
7 | K1_READ | READING: Kindergarten_1 | NA | NA |
8 | K2_READ | READING: Kindergarten_2 | NA | NA |
9 | G1_2_READ | READING: Grade1_2 | NA | NA |
10 | G2_2_READ | READING: Grade2_2 | NA | NA |
11 | K1_Math | MATH: Kindergarten_1 | NA | NA |
12 | K2_Math | MATH: Kindergarten_2 | NA | NA |
13 | G1_2_Math | MATH: Grade1_2 | NA | NA |
14 | G2_2_Math | MATH: Grade2_2 | NA | NA |
15 | K2_SCI | SCIENCE: Kindergarten_2 | NA | NA |
16 | G1_2_SCI | SCIENCE: Grade1_2 | NA | NA |
17 | G2_2_SCI | SCIENCE: Grade2_2 | NA | NA |
18 | K1_DCCSTOT | Dimensional Change Card Sort: Kindergarten_1 | NA | NA |
19 | K2_DCCSTOT | Dimensional Change Card Sort: Kindergarten_2 | NA | NA |
20 | G1_2_DCCSTOT | Dimensional Change Card Sort: Grade1_2 | NA | NA |
21 | G2_2_DCCSSCR | Dimensional Change Card Sort computed: Grade2_2 | NA | NA |
22 | K2_GIFK | GIFTED-TALENT NOT OFFERED IN K | NA | 2 |
23 | G1_2_GIFK | GIFTED-TALENT NOT OFFERED IN G1 | NA | 2 |
24 | K2_GIFS | GIFTED-TALENT NOT OFFERED AT SCHOOL | NA | 2 |
25 | G1_2_GIFS | GIFTED-TALENT NOT OFFERED AT SCHOOL | NA | 2 |
26 | G2_2_Sep1Cut | A CUTOFF DATE FOR CHILD TO TURN FIVE TO ENTER KINDERGARTEN | NA | 2 |
27 | G2_2_Sep1Cut_t | convert G2_2_Sep1Cut value to 1 or 0 | NA | 2 |
28 | K2_CLSI | CLASS SIZES DECREASED: Kindergarten_2 | NA | 2 |
29 | G1_2_CLSI | CLASS SIZES DECREASED: Grade1_2 | NA | 2 |
30 | G2_2_CLSI | CLASS SIZES DECREASED: Grade2_2 | NA | 2 |
31 | PreK | PRESCHOOL GOOD FOR KINDERGARTEN | NA | 5 |
32 | PreKt | convert PreK value to number | NA | 5 |
33 | K1_LOC | LOCATION TYPE OF SCHOOL: Kindergarten_1 | NA | 4 |
34 | K2_LOC | LOCATION TYPE OF SCHOOL: Kindergarten_2 | NA | 4 |
35 | G1_2_LOC | LOCATION TYPE OF SCHOOL: Grade1_2 | NA | 4 |
36 | G2_2_LOC | LOCATION TYPE OF SCHOOL: Grade2_2 | NA | 4 |
37 | K1_LOCt | convert K1_LOC value to number: 1=CITY, 2=SUBURB, 3=TOWN, 4=RURAL | NA | 4 |
38 | K2_LOCt | convert K2_LOC value to number: 1=CITY, 2=SUBURB, 3=TOWN, 4=RURAL | NA | 4 |
39 | G1_2_LOCt | convert G1_2_LOC value to number: 1=CITY, 2=SUBURB, 3=TOWN, 4=RURAL | NA | 4 |
40 | G2_2_LOCt | convert G2_2_LOC value to number: 1=CITY, 2=SUBURB, 3=TOWN, 4=RURAL | NA | 4 |
41 | K2_Prthlp | PARENT VOLUNTEERS REGULARLY: Kindergarten_2 | NA | 6 |
42 | G1_2_Prthlp | PARENT VOLUNTEERS REGULARLY: Grade1_2 | NA | 6 |
43 | G2_2_Prthlp | PARENT VOLUNTEERS REGULARLY: Grade2_2 | NA | 6 |
44 | K1_PrtW | PARENTS WORK HOURS PER WEEK: Kindergarten_1 | NA | 2 |
45 | G1_2_PrtW | PARENTS WORK HOURS PER WEEK: Grade1_2 | NA | 2 |
46 | G2_2_PrtW | PARENTS WORK HOURS PER WEEK: Grade2_2 | NA | 2 |
47 | PrtEDU | HIGHEST EDUCATION LEVEL PARENTS ACHIEVED | NA | 7 |
48 | PrtEDUt | convert PrtEDU to number: higher number means higher education level | NA | 7 |
49 | K2_Prtconf | PARENT ATTEND CONFERENCES: Kindergarten_2 | NA | 5 |
50 | G1_2_Prtconf | PARENT ATTEND CONFERENCES: Grade1_2 | NA | 5 |
51 | G2_2_Prtconf | PARENT ATTEND CONFERENCES: Grade2_2 | NA | 5 |
52 | K2_Prtoph | PARENT ATTEND OPEN HOUSE/PARTY: Kindergarten_2 | NA | 5 |
53 | G1_2_Prtoph | PARENT ATTEND OPEN HOUSE/PARTY: Grade1_2 | NA | 5 |
54 | G2_2_Prtoph | PARENT ATTEND OPEN HOUSE/PARTY: Grade2_2 | NA | 5 |
55 | G1_2_Prtevt | PARENT ATTEND ART/MUSIC EVENT: Grade1_2 | NA | 5 |
56 | G2_2_Prtevt | PARENT ATTEND ART/MUSIC EVENT: Grade2_2 | NA | 5 |
57 | G1_2_Prtpsk | PARENT OPINION ON HAVING CHILDREN IN PRESCHOOL | NA | 2 |
58 | G2_2_Prtpsk | PARENT OPINION ON HAVING CHILDREN IN PRESCHOOL | NA | 2 |
59 | Prtpremth | PARENT OPINION ON HAVING PRESCH RD/MATH GOOD FOR SCHOOL | NA | 5 |
60 | Prtprelitr | PARENT OPINION ON HAVING CHILD KNOW ALPHABET BEFORE K | NA | 5 |
61 | Prthw | PARENT PROVIDES HOMEWORK TIME | NA | 5 |
62 | Prtrc | PARENT SHD READ/COUNT WITH CHILD | NA | 5 |
63 | Sumsch | CHILD ATTENDED SUMMER SCHOOL | NA | 2 |
64 | Summth | DO MATH ACTVTY WITH CHILD IN SUMMER | NA | 5 |
65 | Sumwrt | DO WRITING ACTVTY WITH CHILD IN SUMMER | NA | 5 |
The data used in this project come from a longitudinal study in which data are gathered for the same subjects repeatedly over a period of time. The subjects were interviewed without any assignment process or experimental design. The data collection process does not interfere with how the data arise, so this can be classified as an observational study.
The population of interest for the ECLS program consists of all US students from kindergarten through the fifth grade (only the kindergarten through second grade data from the ECLS program were successfully obtained and used in this project). Since the respondents were randomly selected, the findings can be generalized to that population.
According to the DATA QUALITY AND COMPARABILITY section for the ECLS program on the NCES website, the potential errors and biases of the study include respondent bias, coverage errors and bias, and nonresponse errors and bias.
One potential source of respondent bias in the ECLS surveys is social desirability bias, which occurs when respondents systematically misreport (intentionally or unintentionally) information in a study.
To evaluate and minimize error and bias, the researchers used the following methods:
"In order to minimize bias, all items were subjected to multiple cognitive interviews and field tests, and actual teachers were involved in the design of the cognitive assessment battery and questionnaires. NCES also followed the criteria recommended in a working paper on the accuracy of teachers' judgments of students' academic performances (see Perry and Meisels 1996)."
Designing the child assessments to be both individually administered and untimed reduced both coverage error and bias.
Three methods were used to determine whether substantial bias was introduced into the data from the kindergarten collections as a result of nonresponse. Findings from these analyses suggest that there is no substantial bias in the kindergarten year due to nonresponse after adjusting for that nonresponse.
The ECLS program design minimizes potential bias, so for the purpose of this project the ECLS results will be considered fully generalizable.
Since these are observational rather than experimental data, causality cannot be established; only association can be assessed.
# Load data from GitHub. getURL() returns the CSV contents as one string, which fread() then parses as text.
url <- getURL("https://raw.githubusercontent.com/YunMai-SPS/DA606/master/final_project/ECLS_2011_K2.csv")
library(data.table)
## -------------------------------------------------------------------------
## data.table + dplyr code now lives in dtplyr.
## Please library(dtplyr)!
## -------------------------------------------------------------------------
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
earlyedu <- fread(url, header = TRUE, sep = ',')
kable(head(earlyedu))
CHILDID | X6AGE | X5AGE | X4AGE | X3AGE | X2KAGE_R | X1KAGE_R | X_CHSEX_R | X1RTHETK2 | X2RTHETK2 | X3RTHETK2 | X4RTHETK2 | X5RTHETK2 | X6RTHETK2 | X1MTHETK2 | X2MTHETK2 | X3MTHETK2 | X4MTHETK2 | X5MTHETK2 | X6MTHETK2 | X2STHETK2 | X3STHETK2 | X4STHETK2 | X5STHETK2 | X6STHETK2 | X1DCCSTOT | X2DCCSTOT | X3DCCSTOT | X4DCCSTOT | X5DCCSSCR | X6DCCSSCR | X6REGION | X5REGION | X4REGION | X3REGION | X2REGION | X1REGION | X6LOCALE | X5LOCALE | X4LOCALE | X3LOCALE | X2LOCALE | X1LOCALE | X6PAR1EMP_I | X4PAR1EMP_I | X1PAR1EMP | X4PAR1ED_I | X6PAR1OCC_I | X4PAR1OCC_I | X1PAR1OCC_I | X6PAR2OCC_I | X4PAR2OCC_I | X1PAR2OCC_I | X6PAR1SCR_I | X4PAR1SCR_I | X1PAR1SCR_I | X6DISTPOV | X4DISTPOV | X_DISTPOV | S2GIFNOG | S2GIFNO | S4GIFNO | S4GIFNOG | S6GIFNOG | S6GIFNO | A1ATNDPR | S6TTLPRE | S4TTLPRE | A1INKNDR | A1VSTK | A1SHRTN | A1STAGGR | A1PRNTOR | A1HMEVST | A1COMM | A1IDCOLO | A1FOLWDR | A1ALPHBT | A1SITSTI | A1SENSTI | A1ENGLAN | A1NOTDSR | A1PENCIL | A1PRBLMS | A1SHARE | A1CNT20 | A1FNSHT | S6CCLSDE | S6TT1CLA | S4TT1CLA | S2TT1CLA | A1FRMLIN | A1ALPHBF | A1LRNREA | A1TCHPRN | A1PRCTWR | A1HMWRK | A1READAT | A1CNTRLC | A1ENJOY | A1MKDIFF | A1TEACH | A1CLSSIZ | A1NATEXM | A1EARLY | A1ELEM | A1DEVLP | A1MTHDRD | A1MTHDMA | A1MTHDSC | A1RSPINT | A1INTSRV | A1STATCT | A1HIGHQL | A1YRBORN | A1HGHSTD | A1HGHPAR | A1YRSCH | P3DOMATH | P3DOWRIT | P3RDBKTC | P3HWLGRD | P3RDALON | P3COMEDU | P3OUTACT | P3TVHR | P3TVMIN | P3VIDHR | P3VIDMIN | P3VISLIB | P3STHLIB | P3SUMBK | P3SUMRD | P3ARTMUS | P3ZOOS | P3AMUSPK | P3BEACHS | P3PLYCRT | P3LRGCTY | P3SUMSCH | P3NDYPRM | P3NHRPRM | P3SMREAD | P3SMMATH | P3SMSCI | P3SMART | P3SMMUSI | P3SMCMPT | P3SMREQ | P3DONCMP | P3NUMCMP | P3NMDCMP | P3NMHCMP | P3NMWCMP | P3CMPSPT | P3CMPART | P3CMPCPT | P3CMPACA | P3CMPMPA | P3CMPSUP | P3TUTOR | A2REGHLP | A4REGHLP | A6REGHLP | A2TPCONF | A4TPCONF | A6TPCONF | A2ATTOPN | A4ATTOPN | A6ATTOPN | VAR00001 | A4ATTART | A6ATTART |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10000001 | 103.66 | NA | 91.66 | NA | 79.76 | 72.39 | 1 | 0.1930 | 1.7614 | NA | 3.1523 | NA | 3.3637 | 0.5827 | 1.5384 | NA | 2.7286 | NA | 3.2950 | 1.3738 | NA | 3.3354 | NA | 2.5071 | 16 | 17 | NA | 16 | NA | 7.3280 | -2 | -2 | -2 | -2 | -2 | -2 | 4 | NA | 4 | NA | 4 | 4 | 4 | 4 | 4 | 5 | -1 | -1 | -1 | 19 | 19 | 19 | -1.00 | -1.00 | -1.00 | 38 | 39 | 45 | -9 | -9 | 1 | 0 | 0 | 0 | 3 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 3 | 2 | 3 | 2 | 3 | 3 | 2 | 3 | 3 | 2 | 3 | 2 | 3 | 2 | 1 | 1 | 2 | 3 | 2 | 4 | 4 | 3 | 2 | 4 | 4 | 5 | 5 | 4 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 3 | 7 | 5 | 6 | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | 2 | 2 | 5 | 5 | 5 | 3 | 5 | 3 | 3 | 5 | 5 |
10000002 | NA | NA | 89.56 | NA | 77.52 | 71.41 | 2 | -0.7870 | 0.7452 | NA | 2.2023 | NA | NA | -0.3473 | 0.8800 | NA | 2.0825 | NA | NA | 0.8811 | NA | 1.3122 | NA | NA | 17 | 16 | NA | 16 | NA | NA | -2 | -2 | -2 | -2 | -2 | -2 | NA | NA | 2 | NA | 2 | 2 | NA | 4 | 4 | 5 | NA | -1 | -1 | NA | 1 | 1 | NA | -1.00 | -1.00 | NA | 7 | 7 | 1 | 1 | 1 | 0 | NA | NA | 3 | NA | -1 | 1 | 1 | 2 | 2 | 1 | 2 | 4 | 3 | 4 | 3 | 3 | 3 | 4 | 4 | 2 | 4 | 5 | 3 | 3 | NA | NA | -1 | 2 | 2 | 3 | 4 | 2 | 2 | 1 | 5 | 5 | 5 | 5 | 5 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 5 | 5 | 27 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 | 4 | NA | 5 | 5 | NA | 5 | 4 | NA | NA | 4 | NA |
10000003 | 96.26 | NA | 84.33 | NA | 73.51 | NA | 1 | NA | 0.3323 | NA | 1.1861 | NA | 2.0689 | NA | 1.0112 | NA | 2.2080 | NA | 2.8384 | 0.4244 | NA | 2.2479 | NA | 2.3529 | NA | 14 | NA | 16 | NA | 8.2395 | -2 | -2 | -2 | -2 | -2 | -2 | 2 | NA | 2 | NA | 2 | NA | 1 | 1 | NA | 9 | 7 | 7 | NA | -1 | -1 | NA | 77.50 | 77.50 | NA | 12 | 11 | 9 | 2 | 1 | 1 | 0 | 0 | 1 | NA | -1 | -1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 | -1 | -1 | -1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 | 2 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
10000004 | 101.19 | NA | 89.23 | NA | 78.84 | 72.39 | 1 | -1.7087 | -0.2922 | NA | 0.8632 | NA | 1.5117 | -2.1245 | -0.2834 | NA | 1.1082 | NA | 2.8392 | 0.6509 | NA | 0.2933 | NA | 1.7845 | 15 | 14 | NA | 17 | NA | 7.2868 | -2 | -2 | -2 | -2 | -2 | -2 | 4 | NA | 4 | NA | 4 | 4 | NA | 1 | 1 | 5 | NA | 13 | 13 | NA | 16 | 16 | NA | 38.18 | 38.18 | 20 | 20 | 17 | -9 | -9 | 0 | 0 | 0 | 0 | 5 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | 5 | 4 | 4 | 5 | 5 | 3 | -9 | 5 | 5 | 3 | 4 | 4 | 4 | 1 | 1 | 2 | 2 | 5 | 5 | 5 | 4 | 4 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 5 | 5 | 5 | 6 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 5 | 3 |
10000005 | NA | NA | NA | NA | 65.06 | 59.51 | 2 | -0.0734 | 0.8013 | NA | NA | NA | NA | -0.5545 | 0.6163 | NA | NA | NA | NA | 0.3996 | NA | NA | NA | NA | 14 | 17 | NA | NA | NA | NA | -2 | -2 | -2 | -2 | -2 | -2 | NA | NA | NA | NA | 2 | 2 | NA | NA | 1 | NA | NA | NA | 4 | NA | NA | 1 | NA | NA | 59.00 | NA | NA | -1 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | 1 | 1 | 2 | 2 | 1 | 2 | 5 | 4 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 3 | 5 | 4 | 4 | NA | NA | NA | NA | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 5 | 5 | 5 | 5 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 2 | 5 | 4 | 3 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 | NA | NA | 5 | NA | NA | 5 | NA | NA | NA | NA | NA |
10000006 | 107.51 | NA | 94.82 | NA | 82.62 | 75.78 | 2 | -1.4601 | -0.6792 | NA | 1.0132 | NA | 1.9750 | -1.0616 | -0.5379 | NA | 0.4484 | NA | 1.0869 | -0.6320 | NA | -0.7748 | NA | -0.2110 | 16 | 14 | NA | 18 | NA | 5.3872 | -2 | -2 | -2 | -2 | -2 | -2 | 4 | NA | 4 | NA | 4 | 4 | 3 | 1 | 1 | 3 | 20 | 19 | 19 | -1 | -1 | -1 | 35.92 | 33.42 | 33.42 | 10 | 12 | 11 | -9 | -9 | 0 | 0 | 0 | 0 | 4 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 4 | 4 | 4 | 3 | 4 | 4 | 3 | 3 | 4 | 3 | 4 | 3 | 4 | 1 | 1 | 1 | 2 | 3 | 3 | 4 | 4 | 3 | 2 | 4 | 4 | 5 | 4 | 4 | 5 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 8 | 5 | 2 | 8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 | 2 | 2 | 5 | 5 | 5 | 2 | 5 | 2 | 2 | 5 | 5 |
# Load variable labels and value labels from GitHub.
lable <- read.csv("https://raw.githubusercontent.com/YunMai-SPS/DA606/master/final_project/variablelable.csv")
kable(head(lable,n=10))
Variable | Position | Label | Measurement.Level | Role | Column.Width | Alignment | Print.Format | Write.Format |
---|---|---|---|---|---|---|---|---|
CHILDID | 1 | CHILD IDENTIFICATION NUMBER | Nominal | Input | 10 | Left | A8 | A8 |
X6AGE | 2 | X6 CHILD ASSESSMENT AGE(MNTHS) | Scale | Input | 9 | Right | F7.2 | F7.2 |
X5AGE | 3 | X5 CHILD ASSESSMENT AGE(MNTHS) | Scale | Input | 9 | Right | F7.2 | F7.2 |
X4AGE | 4 | X4 CHILD ASSESSMENT AGE(MNTHS) | Scale | Input | 8 | Right | F6.2 | F6.2 |
X3AGE | 5 | X3 CHILD ASSESSMENT AGE(MNTHS) | Scale | Input | 8 | Right | F6.2 | F6.2 |
X2KAGE_R | 6 | X2 CHILD ASSESSMENT AGE(MNTHS)-REV | Scale | Input | 10 | Right | F6.2 | F6.2 |
X1KAGE_R | 7 | X1 CHILD ASSESSMENT AGE(MNTHS)-REV | Scale | Input | 10 | Right | F6.2 | F6.2 |
X_CHSEX_R | 8 | CHILD COMPOSITE SEX - REVISED | Scale | Input | 11 | Right | F2 | F2 |
X1RTHETK2 | 9 | X1 READING THETA-K2 DATA FILE | Scale | Input | 11 | Right | F8.4 | F8.4 |
X2RTHETK2 | 10 | X2 READING THETA-K2 DATA FILE | Scale | Input | 11 | Right | F8.4 | F8.4 |
value <- read.csv("https://raw.githubusercontent.com/YunMai-SPS/DA606/master/final_project/variablevalue.csv")
kable(head(value, n=10))
Value | Label |
---|---|
X6AGE | -9: NOT ASCERTAINED |
X5AGE | -9: NOT ASCERTAINED |
X4AGE | -9: NOT ASCERTAINED |
X3AGE | -9: NOT ASCERTAINED |
X2KAGE_R | -9: NOT ASCERTAINED |
X1KAGE_R | -9: NOT ASCERTAINED |
X_CHSEX_R | -9: NOT ASCERTAINED |
X_CHSEX_R | 1: MALE |
X_CHSEX_R | 2: FEMALE |
X1RTHETK2 | -9: NOT ASCERTAINED |
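The value labels above show that -9 stands for "NOT ASCERTAINED". As a minimal sketch, such codes could be recoded to `NA` numerically rather than through string replacement. The helper name `recode_missing` and the example values are hypothetical; which codes count as missing for a given variable should be checked against the code book.

```r
# Hypothetical helper: replace sentinel codes (default: -9) with NA
recode_missing <- function(x, codes = c(-9)) {
  x <- as.numeric(x)
  x[x %in% codes] <- NA
  x
}

# Made-up example values in the style of the theta scores
scores  <- c(0.1930, -9, 1.7614, -0.7870, NA)
cleaned <- recode_missing(scores)
print(cleaned)

# Codes such as -1 can be added where the code book marks them as missing
cleaned_both <- recode_missing(c(4, -1, 2, -9), codes = c(-9, -1))
print(cleaned_both)
```

Because the comparison is numeric, legitimate negative scores such as -0.7870 are left unchanged.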
Select the semesters for analysis. The prefix of each variable name indicates the data collection round and can be inferred from the variable labels. The prefixes are 1: fall 2010 (K1); 2: spring 2011 (K2); 3: fall 2011 (11f); 4: spring 2012 (G1_2); 5: fall 2012 (12f); 6: spring 2013 (G2_2).
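The prefix convention can be sketched as a small lookup from round number to semester label. The helper `readable_name` is hypothetical and assumes variable names of the form one letter, one round digit, then the rest (for example `X4RTHETK2`); names that do not follow this pattern, such as `X_CHSEX_R`, would need separate handling.

```r
# Round number -> semester label, as listed above
round_labels <- c("1" = "K1", "2" = "K2", "3" = "11f",
                  "4" = "G1_2", "5" = "12f", "6" = "G2_2")

# Hypothetical helper: extract the round digit from a variable name and
# build a readable column name such as "G1_2_READ"
readable_name <- function(var, suffix) {
  round_digit <- sub("^[A-Z]([1-6]).*$", "\\1", var)
  paste0(unname(round_labels[round_digit]), "_", suffix)
}

reading_vars  <- c("X1RTHETK2", "X2RTHETK2", "X4RTHETK2", "X6RTHETK2")
reading_names <- readable_name(reading_vars, "READ")
print(reading_names)
```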
# There are a lot of columns in these data frames. I will subset the data frames to get the variables relevant to the analysis and also rename the columns for readability.
# The data were collected over 6 consecutive semesters. In some semesters the survey was administered to only one third of the students.
# The data from the surveys with the same sample size will be selected for this project. I will subset the data to get the survey results from 4 semesters: kindergarten 1st semester (1), kindergarten 2nd semester (2), first grade 2nd semester (4), and second grade 2nd semester (6).
# reading scores
myreading <- c("CHILDID","X1RTHETK2", "X2RTHETK2", "X4RTHETK2", "X6RTHETK2")
reading <- subset(earlyedu,,myreading)
reading$X1RTHETK2 <- as.character(reading$X1RTHETK2)
reading$X2RTHETK2 <- as.character(reading$X2RTHETK2)
reading$X4RTHETK2 <- as.character(reading$X4RTHETK2)
reading$X6RTHETK2 <- as.character(reading$X6RTHETK2)
reading$X1RTHETK2 <- str_replace_all(reading$X1RTHETK2,"(^\\-9)","")
reading$X2RTHETK2 <- str_replace_all(reading$X2RTHETK2,"(^\\-9)","")
reading$X4RTHETK2 <- str_replace_all(reading$X4RTHETK2,"(^\\-9)","")
reading$X6RTHETK2 <- str_replace_all(reading$X6RTHETK2,"(^\\-9)","")
reading$X1RTHETK2 <- as.numeric(reading$X1RTHETK2)
reading$X2RTHETK2 <- as.numeric(reading$X2RTHETK2)
reading$X4RTHETK2 <- as.numeric(reading$X4RTHETK2)
reading$X6RTHETK2 <- as.numeric(reading$X6RTHETK2)
colnames(reading) <- c("CHILDID", "K1_READ", "K2_READ", "G1_2_READ", "G2_2_READ")
# math scores
mymath <- c("CHILDID","X1MTHETK2", "X2MTHETK2", "X4MTHETK2", "X6MTHETK2")
math <- subset(earlyedu,,mymath)
math$X1MTHETK2 <- as.character(math$X1MTHETK2)
math$X2MTHETK2 <- as.character(math$X2MTHETK2)
math$X4MTHETK2 <- as.character(math$X4MTHETK2)
math$X6MTHETK2 <- as.character(math$X6MTHETK2)
math$X1MTHETK2 <- str_replace_all(math$X1MTHETK2,"(^\\-9)","")
math$X2MTHETK2 <- str_replace_all(math$X2MTHETK2,"(^\\-9)","")
math$X4MTHETK2 <- str_replace_all(math$X4MTHETK2,"(^\\-9)","")
math$X6MTHETK2 <- str_replace_all(math$X6MTHETK2,"(^\\-9)","")
math$X1MTHETK2 <- as.numeric(math$X1MTHETK2)
math$X2MTHETK2 <- as.numeric(math$X2MTHETK2)
## Warning: NAs introduced by coercion
math$X4MTHETK2 <- as.numeric(math$X4MTHETK2)
math$X6MTHETK2 <- as.numeric(math$X6MTHETK2)
colnames(math) <- c("CHILDID", "K1_Math", "K2_Math", "G1_2_Math", "G2_2_Math")
# science scores
myscience <- c("CHILDID", "X2STHETK2", "X4STHETK2", "X6STHETK2")
science <- subset(earlyedu,,myscience)
science$X2STHETK2 <- as.character(science$X2STHETK2)
science$X4STHETK2 <- as.character(science$X4STHETK2)
science$X6STHETK2 <- as.character(science$X6STHETK2)
science$X2STHETK2 <- str_replace_all(science$X2STHETK2,"(^\\-9)","")
science$X4STHETK2 <- str_replace_all(science$X4STHETK2,"(^\\-9)","")
science$X6STHETK2 <- str_replace_all(science$X6STHETK2,"(^\\-9)","")
science$X2STHETK2 <- as.numeric(science$X2STHETK2)
science$X4STHETK2 <- as.numeric(science$X4STHETK2)
## Warning: NAs introduced by coercion
science$X6STHETK2 <- as.numeric(science$X6STHETK2)
colnames(science) <- c("CHILDID", "K2_SCI", "G1_2_SCI", "G2_2_SCI")
# The Dimensional Change Card Sort (DCCS): A Method of Assessing Executive Function in Children
myDCCSTOT <- c("CHILDID", "X1DCCSTOT", "X2DCCSTOT", "X4DCCSTOT")
DCCSTOT <- subset(earlyedu,,myDCCSTOT)
DCCSTOT$X1DCCSTOT <- as.character(DCCSTOT$X1DCCSTOT)
DCCSTOT$X2DCCSTOT <- as.character(DCCSTOT$X2DCCSTOT)
DCCSTOT$X4DCCSTOT <- as.character(DCCSTOT$X4DCCSTOT)
DCCSTOT$X1DCCSTOT <- str_replace_all(DCCSTOT$X1DCCSTOT,"(^\\-9)","")
DCCSTOT$X2DCCSTOT <- str_replace_all(DCCSTOT$X2DCCSTOT,"(^\\-9)","")
DCCSTOT$X4DCCSTOT <- str_replace_all(DCCSTOT$X4DCCSTOT,"(^\\-9)","")
DCCSTOT$X1DCCSTOT <- as.numeric(DCCSTOT$X1DCCSTOT)
DCCSTOT$X2DCCSTOT <- as.numeric(DCCSTOT$X2DCCSTOT)
DCCSTOT$X4DCCSTOT <- as.numeric(DCCSTOT$X4DCCSTOT)
colnames(DCCSTOT) <- c("CHILDID", "K1_DCCSTOT", "K2_DCCSTOT", "G1_2_DCCSTOT")
# DCCS composite score, computed by summing the post-switch score and the Border Game score. Relatively complete survey results are only available for the 6th semester.
myDCCSSCR <- c("CHILDID", "X6DCCSSCR")
DCCSSCR <- subset(earlyedu,,myDCCSSCR)
DCCSSCR$X6DCCSSCR <- as.character(DCCSSCR$X6DCCSSCR)
DCCSSCR$X6DCCSSCR <- str_replace_all(DCCSSCR$X6DCCSSCR,"(^\\-9)","")
DCCSSCR$X6DCCSSCR <- as.numeric(DCCSSCR$X6DCCSSCR)
colnames(DCCSSCR) <- c("CHILDID", "G2_2_DCCSSCR")
# GIFTED-TALENT not offered in K. Survey results are only available for the 2nd and 4th semesters.
GTKlevels <- c("yes", "no", "NA")
myGTK <- c("CHILDID", "S2GIFNO","S4GIFNO")
GTK <- subset(earlyedu,,myGTK)
GTK$S2GIFNO <- str_replace_all(GTK$S2GIFNO,"(^\\-9)|(^\\-1)","NA") %>%
str_replace_all("1","yes") %>%
str_replace_all("2","no")
GTK$S4GIFNO <- str_replace_all(GTK$S4GIFNO,"(^\\-9)|(^\\-1)","NA") %>%
str_replace_all("0","yes") %>%
str_replace_all("1","no")
colnames(GTK) <- c("CHILDID", "K2_GIFK", "G1_2_GIFK")
# GIFTED-TALENT offered at some grades or not offered at the school. Survey results are only available for the 2nd and 4th semesters.
GTSlevels <- c("yes", "no", "NA")
myGTS <- c("CHILDID", "S2GIFNOG","S4GIFNOG")
GTS <- subset(earlyedu,,myGTS)
GTS$S2GIFNOG <- str_replace_all(GTS$S2GIFNOG,"(^\\-9)|(^\\-1)","NA") %>%
str_replace_all("1","yes") %>%
str_replace_all("2","no")
GTS$S4GIFNOG <- str_replace_all(GTS$S4GIFNOG,"(^\\-9)|(^\\-1)","NA") %>%
str_replace_all("0","yes") %>%
str_replace_all("1","no")
colnames(GTS) <- c("CHILDID", "K2_GIFS", "G1_2_GIFS")
# A CUTOFF DATE FOR CHILD TO TURN FIVE TO ENTER KINDERGARTEN
Sep1cutofflevels <- c(1,2,NA)
mycutoff <- c("CHILDID", "S6GIFNO")
Sep1cutoff <- subset(earlyedu,,mycutoff)
Sep1cutoff$S6GIFNOt <- str_replace_all(Sep1cutoff$S6GIFNO,"0","yes") %>%
str_replace_all("1","no")
colnames(Sep1cutoff) <- c("CHILDID", "G2_2_Sep1Cut", "G2_2_Sep1Cut_t")
# Whether PreK is helpful for preparing children for kindergarten.
PreKlevles <- c(1, 2, 3, 4, 5)
myprek <- c("CHILDID", "A1ATNDPR")
PreK <- subset(earlyedu,,myprek)
PreK$A1ATNDPR <- str_replace_all(PreK$A1ATNDPR,"^\\-9","NA")
PreK$A1ATNDPRt <- str_replace_all(PreK$A1ATNDPR, "1","STRONGLY DISAGREE") %>%
str_replace_all("2","DISAGREE") %>%
str_replace_all("3","NEITHER AGREE NOR DISAGREE") %>%
str_replace_all("4","AGREE") %>%
str_replace_all("5","STRONGLY AGREE")
colnames(PreK) <- c("CHILDID", "PreK", "PreKt")
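As an alternative sketch to chained string replacement, coded responses on a five-point agreement scale could be mapped to labels with base R `factor()`. The example codes are made up; any value outside 1 to 5, including the -9 code, becomes `NA` automatically.

```r
# Labels for the five-point agreement scale used above
likert_labels <- c("STRONGLY DISAGREE", "DISAGREE", "NEITHER AGREE NOR DISAGREE",
                   "AGREE", "STRONGLY AGREE")

# Made-up example codes, including a -9 (not ascertained) and a missing value
codes <- c(3, 5, -9, 1, NA)

# Values not listed in `levels` are turned into NA
prek_factor <- factor(codes, levels = 1:5, labels = likert_labels)

print(as.character(prek_factor))
print(table(prek_factor, useNA = "ifany"))
```

A factor also keeps the ordering of the levels, which can be convenient for tables and plots.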
agelevels <- c(4, 5, 6, 7, 8)
myage <- c("CHILDID", "X1KAGE_R", "X2KAGE_R", "X4AGE", "X6AGE")
Age <- subset(earlyedu,,myage)
# Recode the -9 "not ascertained" code to NA before converting months to years;
# dividing first would turn -9 into -0.75, which the "^-9" pattern cannot match.
for (col in c("X1KAGE_R", "X2KAGE_R", "X4AGE", "X6AGE")) {
  months <- as.numeric(Age[[col]])
  months[which(months == -9)] <- NA
  Age[[col]] <- months / 12
}
colnames(Age) <- c("CHILDID", "K1_AGE", "K2_AGE", "G1_2_AGE", "G2_2_AGE")
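The age columns are stored in months, with -9 marking "not ascertained" values. A small sketch, using made-up values, of why it can matter whether the missing-value code is recoded before or after the conversion to years:

```r
# Hypothetical ages in months; -9 = not ascertained
age_months <- c(72.39, -9, 79.76)

# Converting first turns the -9 code into -0.75, which no longer starts with "-9"
converted_first <- age_months / 12
still_flagged   <- grepl("^-9", as.character(converted_first))
print(converted_first)
print(still_flagged)

# Recoding first keeps the missing value explicit
recoded <- age_months
recoded[recoded == -9] <- NA
age_years <- recoded / 12
print(age_years)
```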
genderLevels <- c('Male', 'Female')
mygender <- c("CHILDID", "X_CHSEX_R")
Gender <- subset(earlyedu,,mygender)
Gender$X_CHSEX_R <- str_replace_all(Gender$X_CHSEX_R,"1","Male") %>%
str_replace_all("2","Female")
Gender$X_CHSEX_R <- str_replace_all(Gender$X_CHSEX_R,"^\\-9","NA")
colnames(Gender) <- c("CHILDID", "CHSEX")
# Class sizes decreased?
decreasesizelevel <- c("yes", "no", "NA")
mydecreasesize <- c("CHILDID", "S2TT1CLA", "S4TT1CLA", "S6TT1CLA")
Classsize <- subset(earlyedu,,mydecreasesize)
Classsize$S2TT1CLA <- str_replace_all(Classsize$S2TT1CLA,"(^\\-9)|(^\\-1)","NA") %>%
str_replace_all("1","yes") %>%
str_replace_all("2","no")
Classsize$S4TT1CLA <- str_replace_all(Classsize$S4TT1CLA,"(^\\-9)|(^\\-1)","NA") %>%
str_replace_all("1","yes") %>%
str_replace_all("2","no")
Classsize$S6TT1CLA <- str_replace_all(Classsize$S6TT1CLA,"(^\\-9)|(^\\-1)","NA") %>%
str_replace_all("1","yes") %>%
str_replace_all("2","no")
colnames(Classsize) <- c("CHILDID", "K2_CLSI", "G1_2_CLSI", "G2_2_CLSI")
Classsize$K2_CLSIt <- str_replace_all(Classsize$K2_CLSI, "yes", "1") %>%
str_replace_all("no", "2")
Classsize$G1_2_CLSIt <- str_replace_all(Classsize$G1_2_CLSI, "yes", "1") %>%
str_replace_all("no", "2")
Classsize$G2_2_CLSIt <- str_replace_all(Classsize$G2_2_CLSI, "yes", "1") %>%
str_replace_all("no", "2")
# Locale type of the school (city, suburb, town, rural)
regionlevels <-c(1,2,3,4,NA)
myregion <- c("CHILDID", "X1LOCALE", "X2LOCALE", "X4LOCALE", "X6LOCALE")
Region <- subset(earlyedu,,myregion)
Region$X1LOCALE <- str_replace_all(Region$X1LOCALE,"(^\\-9)|(^\\-1)","NA")
Region$X1LOCALt <- str_replace_all(Region$X1LOCALE, "1", "CITY") %>%
str_replace_all("2", "SUBURB") %>%
str_replace_all("3", "TOWN") %>%
str_replace_all("4", "RURAL")
Region$X2LOCALE <- str_replace_all(Region$X2LOCALE,"(^\\-9)|(^\\-1)","NA")
Region$X2LOCALt <- str_replace_all(Region$X2LOCALE, "1", "CITY") %>%
str_replace_all("2", "SUBURB") %>%
str_replace_all("3", "TOWN") %>%
str_replace_all("4", "RURAL")
Region$X4LOCALE <- str_replace_all(Region$X4LOCALE,"(^\\-9)|(^\\-1)","NA")
Region$X4LOCALt <- str_replace_all(Region$X4LOCALE, "1", "CITY") %>%
str_replace_all("2", "SUBURB") %>%
str_replace_all("3", "TOWN") %>%
str_replace_all("4", "RURAL")
Region$X6LOCALE <- str_replace_all(Region$X6LOCALE,"(^\\-9)|(^\\-1)","NA")
Region$X6LOCALt <- str_replace_all(Region$X6LOCALE, "1", "CITY") %>%
str_replace_all("2", "SUBURB") %>%
str_replace_all("3", "TOWN") %>%
str_replace_all("4", "RURAL")
colnames(Region) <- c("CHILDID", "K1_LOC","K2_LOC", "G1_2_LOC", "G2_2_LOC", "K1_LOCt","K2_LOCt", "G1_2_LOCt", "G2_2_LOCt")
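The same code-to-label mapping is applied to several locale columns above. As a minimal sketch, a named lookup vector and a loop could apply it to all columns at once. The data frame `demo` and its values are made up for illustration, and the `_label` suffix is an arbitrary naming choice.

```r
# Locale codes and labels as used above
locale_labels <- c("1" = "CITY", "2" = "SUBURB", "3" = "TOWN", "4" = "RURAL")

# Toy data with hypothetical values; -1 and NA stand for missing locale
demo <- data.frame(
  CHILDID  = c("A", "B", "C"),
  X1LOCALE = c(4, 2, NA),
  X2LOCALE = c(4, 2, -1)
)

# Look up the label for each code; codes without a label become NA
locale_cols <- c("X1LOCALE", "X2LOCALE")
for (col in locale_cols) {
  demo[[paste0(col, "_label")]] <- unname(locale_labels[as.character(demo[[col]])])
}

print(demo)
```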
# Parents volunteering at school
parentschoolhlplevels <- c(0, 1, 2, 3, 4,5,NA)
myhlp <- c("CHILDID", "A2REGHLP", "A4REGHLP", "A6REGHLP")
Prthlp <- subset(earlyedu,,myhlp)
Prthlp$A2REGHLP <- str_replace_all(Prthlp$A2REGHLP,"^\\-9","NA")
Prthlp$A4REGHLP <- str_replace_all(Prthlp$A4REGHLP,"^\\-9","NA")
Prthlp$A6REGHLP <- str_replace_all(Prthlp$A6REGHLP,"^\\-9","NA")
colnames(Prthlp) <- c("CHILDID", "K2_Prthlp", "G1_2_Prthlp", "G2_2_Prthlp")
# Parents work hours per week
parentworklevels <- c("blw35h", "a35h", "NA")  # labels match the recoding below
myprtwork <- c("CHILDID", "X1PAR1EMP","X4PAR1EMP_I", "X6PAR1EMP_I")
Prtwork <- subset(earlyedu,,myprtwork)
Prtwork$X1PAR1EMP <- str_replace_all(Prtwork$X1PAR1EMP,"(^\\-9)|(^3)|(^4)","NA") %>%
str_replace_all("1","a35h") %>%
str_replace_all("2","blw35h")
Prtwork$X4PAR1EMP_I <- str_replace_all(Prtwork$X4PAR1EMP_I,"(^\\-9)|(^3)|(^4)","NA") %>%
str_replace_all("1","a35h") %>%
str_replace_all("2","blw35h")
Prtwork$X6PAR1EMP_I <- str_replace_all(Prtwork$X6PAR1EMP_I,"(^\\-9)|(^3)|(^4)","NA") %>%
str_replace_all("1","a35h") %>%
str_replace_all("2","blw35h")
colnames(Prtwork) <- c("CHILDID", "K1_PrtW", "G1_2_PrtW", "G2_2_PrtW")
Prtwork$K1_PrtWt <- str_replace_all(Prtwork$K1_PrtW, "a35h", "1") %>%
str_replace_all("blw35h", "2")
Prtwork$G1_2_PrtWt <- str_replace_all(Prtwork$G1_2_PrtW, "a35h", "1") %>%
str_replace_all("blw35h", "2")
Prtwork$G2_2_PrtWt <- str_replace_all(Prtwork$G2_2_PrtW, "a35h", "1") %>%
str_replace_all("blw35h", "2")
# HIGHEST EDUCATION LEVEL PARENTS ACHIEVED
Parentedulevles <- c(1, 2, 3, 4, 5, 6, 7)
myedu <- c("CHILDID", "A1HGHPAR")
Parentedu <- subset(earlyedu,,myedu)
Parentedu$A1HGHPAR <- str_replace_all(Parentedu$A1HGHPAR, "(\\-9)|(8)", "NA")
Parentedu$A1HGHPARt <- str_replace_all(Parentedu$A1HGHPAR,"1","DID NOT COMPLETE HIGH SCHOOL") %>%
str_replace_all("2","HIGH SCHOOL") %>%
str_replace_all("3","SOME COLLEGE") %>%
str_replace_all("4","ASSOCIATE'S DEGREE") %>%
str_replace_all("5","BACHELOR") %>%
str_replace_all("6","MASTER") %>%
str_replace_all("7","BEYOND A MASTER")
colnames(Parentedu) <- c("CHILDID", "PrtEDU", "PrtEDUt")
# Parent attend conferences
parentconflevels <- c(0, 1, 2, 3, 4,5,NA)
myprtconf <- c("CHILDID", "A2TPCONF","A4TPCONF", "A6TPCONF")
Prtconf <- subset(earlyedu,,myprtconf)
Prtconf$A2TPCONF <- str_replace_all(Prtconf$A2TPCONF,"^\\-9","NA")
Prtconf$A4TPCONF <- str_replace_all(Prtconf$A4TPCONF,"^\\-9","NA")
Prtconf$A6TPCONF <- str_replace_all(Prtconf$A6TPCONF,"^\\-9","NA")
colnames(Prtconf) <- c("CHILDID", "K2_Prtconf", "G1_2_Prtconf", "G2_2_Prtconf")
# PARENT ATTEND OPEN HOUSE/PARTY
parentophlevels <- c(0, 1, 2, 3, 4,5,NA)
myparentoph <- c("CHILDID", "A2ATTOPN","A4ATTOPN", "A6ATTOPN")
Prtoph <- subset(earlyedu,,myparentoph)
Prtoph$A2ATTOPN <- str_replace_all(Prtoph$A2ATTOPN,"^\\-9","NA")
Prtoph$A4ATTOPN <- str_replace_all(Prtoph$A4ATTOPN,"^\\-9","NA")
Prtoph$A6ATTOPN <- str_replace_all(Prtoph$A6ATTOPN,"^\\-9","NA")
colnames(Prtoph) <- c("CHILDID", "K2_Prtoph", "G1_2_Prtoph", "G2_2_Prtoph")
# PARENT ATTEND ART/MUSIC EVENT
parentevtlevels <- c(0, 1, 2, 3, 4,5,NA)
myparentevt <- c("CHILDID", "A4ATTART","A6ATTART")
Prtevt <- subset(earlyedu,,myparentevt)
Prtevt$A4ATTART <-as.character(Prtevt$A4ATTART )
Prtevt$A6ATTART <-as.character(Prtevt$A6ATTART )
Prtevt$A4ATTART <- str_replace_all(Prtevt$A4ATTART,"^\\-9","NA")
Prtevt$A6ATTART <- str_replace_all(Prtevt$A6ATTART,"^\\-9","NA")
Prtevt$A4ATTART <-as.numeric(Prtevt$A4ATTART )
## Warning: NAs introduced by coercion
Prtevt$A6ATTART <-as.numeric(Prtevt$A6ATTART )
## Warning: NAs introduced by coercion
colnames(Prtevt) <- c("CHILDID", "G1_2_Prtevt", "G2_2_Prtevt")
#CHILDREN IN PRESCHOOL
parentpsklevels <- c(1, 2, NA)
myparentpsk <- c("CHILDID", "S4TTLPRE","S6TTLPRE")
Prtpsk <- subset(earlyedu,,myparentpsk)
Prtpsk$S4TTLPRE <- as.character(Prtpsk$S4TTLPRE)
Prtpsk$S6TTLPRE <- as.character(Prtpsk$S6TTLPRE)
Prtpsk$S4TTLPRE <- str_replace_all(Prtpsk$S4TTLPRE,"^\\-9","NA")
Prtpsk$S4TTLPRE <- str_replace_all(Prtpsk$S4TTLPRE,"^\\-1","NA")
Prtpsk$S6TTLPRE <- str_replace_all(Prtpsk$S6TTLPRE,"^\\-9","NA")
Prtpsk$S6TTLPRE <- str_replace_all(Prtpsk$S6TTLPRE,"^\\-1","NA")
Prtpsk$S4TTLPRE <- as.numeric(Prtpsk$S4TTLPRE)
## Warning: NAs introduced by coercion
Prtpsk$S6TTLPRE <- as.numeric(Prtpsk$S6TTLPRE)
## Warning: NAs introduced by coercion
colnames(Prtpsk) <- c("CHILDID", "G1_2_Prtpsk", "G2_2_Prtpsk")
#PRESCH RD/MATH GOOD FOR SCHOOL
parentpremthlevels <- c(1, 2, 3, 4,5,NA)
myparentpremth <- c("CHILDID", "A1FRMLIN")
Prtpremth <- subset(earlyedu,,myparentpremth)
Prtpremth$A1FRMLIN <- str_replace_all(Prtpremth$A1FRMLIN,"^\\-9","NA")
colnames(Prtpremth) <- c("CHILDID", "Premath")
#Parent HAVE CHILD KNOW ALPHABET BEFORE K
parentlitrlevels <- c(1, 2, 3, 4,5,NA)
myparentprelitr <- c("CHILDID", "A1ALPHBF")
Prtprelitr <- subset(earlyedu,,myparentprelitr)
Prtprelitr$A1ALPHBF <- str_replace_all(Prtprelitr$A1ALPHBF,"^\\-9","NA")
colnames(Prtprelitr) <- c("CHILDID", "Prelitr")
# PARENT PROVIDES HOMEWORK TIME
parenthwlevels <- c(1, 2, 3, 4,5,NA)
myparenthw <- c("CHILDID", "A1PRCTWR")
Prthw <- subset(earlyedu,,myparenthw)
Prthw$A1PRCTWR <- str_replace_all(Prthw$A1PRCTWR,"^\\-9","NA")
colnames(Prthw) <- c("CHILDID", "Prthw")
# PARENT SHD READ/COUNT WITH CHILD
parentrclevels <- c( 1, 2, 3, 4,5,NA)
myparentrc <- c("CHILDID", "A1READAT")
Prtrc <- subset(earlyedu,,myparentrc)
Prtrc$A1READAT <- str_replace_all(Prtrc$A1READAT,"^\\-9","NA")
colnames(Prtrc) <- c("CHILDID", "Prerc")
# CHILD ATTENDED SUMMER SCHOOL
sumschlevels <- c(1, 2,NA)
mysumsch <- c("CHILDID", "P3SUMSCH")
Sumsch <- subset(earlyedu,,mysumsch)
Sumsch$P3SUMSCH <- str_replace_all(Sumsch$P3SUMSCH,"^\\-9","NA")
Sumsch$P3SUMSCH <- str_replace_all(Sumsch$P3SUMSCH,"^\\-8","NA")
Sumsch$P3SUMSCH <- str_replace_all(Sumsch$P3SUMSCH,"^\\-7","NA")
colnames(Sumsch) <- c("CHILDID", "Sumsch")
# DO MATH ACTVTY WITH CHILD IN SUMMER
summthlevels <- c(1,2,3,4,5,NA)
mysummth <- c("CHILDID", "P3DOMATH")
Summth <- subset(earlyedu,,mysummth)
Summth$P3DOMATH <- str_replace_all(Summth$P3DOMATH,"^\\-9","NA")
Summth$P3DOMATH <- str_replace_all(Summth$P3DOMATH,"^\\-8","NA")
Summth$P3DOMATH <- str_replace_all(Summth$P3DOMATH,"^\\-7","NA")
colnames(Summth) <- c("CHILDID", "Summth")
#DO WRITING ACTVTY WITH CHILD IN SUMMER
sumwrtlevels <- c(1,2,3,4,5,NA)
mysumwrt <- c("CHILDID", "P3DOWRIT")
Sumwrt <- subset(earlyedu,,mysumwrt)
Sumwrt$P3DOWRIT <- str_replace_all(Sumwrt$P3DOWRIT,"^\\-9","NA")
Sumwrt$P3DOWRIT <- str_replace_all(Sumwrt$P3DOWRIT,"^\\-8","NA")
Sumwrt$P3DOWRIT <- str_replace_all(Sumwrt$P3DOWRIT,"^\\-7","NA")
Sumwrt$P3DOWRIT <- str_replace_all(Sumwrt$P3DOWRIT,"^\\-1","NA")
colnames(Sumwrt) <- c("CHILDID", "Sumwrt")
# READ BOOKS TO CHILD IN SUMMER
sumrdlevels <- c(1,2,3,4,5,NA)
mysumrd <- c("CHILDID", "P3RDBKTC")
Sumrd <- subset(earlyedu,,mysumrd)
Sumrd$P3RDBKTC <- str_replace_all(Sumrd$P3RDBKTC,"^\\-9","NA")
Sumrd$P3RDBKTC <- str_replace_all(Sumrd$P3RDBKTC,"^\\-8","NA")
Sumrd$P3RDBKTC <- str_replace_all(Sumrd$P3RDBKTC,"^\\-7","NA")
Sumrd$P3RDBKTC <- str_replace_all(Sumrd$P3RDBKTC,"^\\-1","NA")
colnames(Sumrd) <- c("CHILDID", "Sumrd")
# HOW LONG READ TO CHILD
sumreadtimelevels <- c(0, 1, 2, 3, 4,NA)
mysumreadtime <- c("CHILDID", "P3HWLGRD")
Sumreadtime <- subset(earlyedu,,mysumreadtime)
Sumreadtime$P3HWLGRD <- str_replace_all(Sumreadtime$P3HWLGRD,"^\\-9","NA")
Sumreadtime$P3HWLGRD <- str_replace_all(Sumreadtime$P3HWLGRD,"^\\-8","NA")
Sumreadtime$P3HWLGRD <- str_replace_all(Sumreadtime$P3HWLGRD,"^\\-7","NA")
Sumreadtime$P3HWLGRD <- str_replace_all(Sumreadtime$P3HWLGRD,"^\\-1","NA")
colnames(Sumreadtime) <- c("CHILDID", "Sumreadtime")
#HIGHEST ED LEVEL TEACHER ACHIEVED
myteachedu <- c("CHILDID", "A1HGHSTD")
teachedu <- subset(earlyedu,,myteachedu)
teachedu$A1HGHSTD <- as.character(teachedu$A1HGHSTD)
teachedu$A1HGHSTD <- str_replace_all(teachedu$A1HGHSTD,"^\\-9","NA")
teachedu$A1HGHSTD <- as.numeric(teachedu$A1HGHSTD)
## Warning: NAs introduced by coercion
colnames(teachedu) <- c("CHILDID", "teachedu")
# teacher enjoys current job
myteachjoy <- c("CHILDID", "A1ENJOY")
teachjoy <- subset(earlyedu,,myteachjoy)
teachjoy$A1ENJOY <- as.character(teachjoy$A1ENJOY)
teachjoy$A1ENJOY <- str_replace_all(teachjoy$A1ENJOY,"^\\-9","NA")
teachjoy$A1ENJOY <- as.numeric(teachjoy$A1ENJOY)
## Warning: NAs introduced by coercion
colnames(teachjoy) <- c("CHILDID", "teachjoy")
#teacher make difference in children's life
myteachdiff <- c("CHILDID", "A1MKDIFF")
teachdiff <- subset(earlyedu,,myteachdiff)
teachdiff$A1MKDIFF <- as.character(teachdiff$A1MKDIFF)
teachdiff$A1MKDIFF <- str_replace_all(teachdiff$A1MKDIFF,"^\\-9","NA")
teachdiff$A1MKDIFF <- as.numeric(teachdiff$A1MKDIFF)
## Warning: NAs introduced by coercion
colnames(teachdiff) <- c("CHILDID", "teachdiff")
#teacher take exam on national board
myteachboard <- c("CHILDID", "A1NATEXM")
teachboard <- subset(earlyedu,,myteachboard)
teachboard$A1NATEXM <- as.character(teachboard$A1NATEXM)
teachboard$A1NATEXM <- str_replace_all(teachboard$A1NATEXM,"^\\-9","NA")
teachboard$A1NATEXM <- as.numeric(teachboard$A1NATEXM)
## Warning: NAs introduced by coercion
colnames(teachboard) <- c("CHILDID", "teachboard")
#
#A1EARLY
earlychildhood <- inner_join(Age, Gender, by = "CHILDID") %>%
inner_join(reading, by = "CHILDID") %>%
inner_join(math, by = "CHILDID") %>%
inner_join(science, by = "CHILDID") %>%
inner_join(DCCSTOT, by = "CHILDID") %>%
inner_join(DCCSSCR, by = "CHILDID") %>%
inner_join(GTK, by = "CHILDID") %>%
inner_join(GTS, by = "CHILDID") %>%
inner_join(Sep1cutoff, by = "CHILDID") %>%
inner_join(Classsize, by = "CHILDID") %>%
inner_join(PreK, by = "CHILDID") %>%
inner_join(Region, by = "CHILDID") %>%
inner_join(Prthlp, by = "CHILDID") %>%
inner_join(Prtwork, by = "CHILDID") %>%
inner_join(Parentedu, by = "CHILDID") %>%
inner_join(Prtconf, by = "CHILDID") %>%
inner_join(Prtoph, by = "CHILDID") %>%
inner_join(Prtevt, by = "CHILDID") %>%
inner_join(Prtpsk, by = "CHILDID") %>%
inner_join(Prtpremth, by = "CHILDID") %>%
inner_join(Prtprelitr, by = "CHILDID") %>%
inner_join(Prthw, by = "CHILDID") %>%
inner_join(Prtrc, by = "CHILDID") %>%
inner_join(Sumsch, by = "CHILDID") %>%
inner_join(Summth, by = "CHILDID") %>%
inner_join(Sumwrt, by = "CHILDID") %>%
inner_join(Sumrd, by = "CHILDID") %>%
inner_join(Sumreadtime, by = "CHILDID") %>%
inner_join(teachedu, by = "CHILDID") %>%
inner_join(teachjoy, by = "CHILDID") %>%
inner_join(teachdiff, by = "CHILDID") %>%
inner_join(teachboard, by = "CHILDID")
kable(head(earlychildhood,n=10))
CHILDID | K1_AGE | K2_AGE | G1_2_AGE | G2_2_AGE | CHSEX | K1_READ | K2_READ | G1_2_READ | G2_2_READ | K1_Math | K2_Math | G1_2_Math | G2_2_Math | K2_SCI | G1_2_SCI | G2_2_SCI | K1_DCCSTOT | K2_DCCSTOT | G1_2_DCCSTOT | G2_2_DCCSSCR | K2_GIFK | G1_2_GIFK | K2_GIFS | G1_2_GIFS | G2_2_Sep1Cut | G2_2_Sep1Cut_t | K2_CLSI | G1_2_CLSI | G2_2_CLSI | K2_CLSIt | G1_2_CLSIt | G2_2_CLSIt | PreK | PreKt | K1_LOC | K2_LOC | G1_2_LOC | G2_2_LOC | K1_LOCt | K2_LOCt | G1_2_LOCt | G2_2_LOCt | K2_Prthlp | G1_2_Prthlp | G2_2_Prthlp | K1_PrtW | G1_2_PrtW | G2_2_PrtW | K1_PrtWt | G1_2_PrtWt | G2_2_PrtWt | PrtEDU | PrtEDUt | K2_Prtconf | G1_2_Prtconf | G2_2_Prtconf | K2_Prtoph | G1_2_Prtoph | G2_2_Prtoph | G1_2_Prtevt | G2_2_Prtevt | G1_2_Prtpsk | G2_2_Prtpsk | Premath | Prelitr | Prthw | Prerc | Sumsch | Summth | Sumwrt | Sumrd | Sumreadtime | teachedu | teachjoy | teachdiff | teachboard |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10000001 | 6.032500 | 6.646667 | 7.638333 | 8.638333 | Male | 0.1930 | 1.7614 | 3.1523 | 3.3637 | 0.5827 | 1.5384 | 2.7286 | 3.2950 | 1.3738 | 3.3354 | 2.5071 | 16 | 17 | 16 | 7.3280 | NA | no | NA | yes | 0 | yes | no | yes | yes | 2 | 1 | 1 | 3 | NEITHER AGREE NOR DISAGREE | 4 | 4 | 4 | 4 | RURAL | RURAL | RURAL | RURAL | 3 | 2 | 2 | NA | NA | NA | NA | NA | NA | 6 | MASTER | 5 | 5 | 5 | 3 | 5 | 3 | 5 | 5 | 2 | 1 | 3 | 2 | 3 | 4 | NA | NA | NA | NA | NA | 5 | 5 | 5 | 1 |
10000002 | 5.950833 | 6.460000 | 7.463333 | NA | Female | -0.7870 | 0.7452 | 2.2023 | NA | -0.3473 | 0.8800 | 2.0825 | NA | 0.8811 | 1.3122 | NA | 17 | 16 | 16 | NA | yes | no | yes | yes | NA | NA | no | NA | NA | 2 | NA | NA | 3 | NEITHER AGREE NOR DISAGREE | 2 | 2 | 2 | NA | SUBURB | SUBURB | SUBURB | NA | 2 | 4 | NA | NA | NA | NA | NA | NA | NA | 5 | BACHELOR | 5 | 5 | NA | 5 | 4 | NA | 4 | NA | NA | NA | 2 | 3 | 2 | 5 | NA | NA | NA | NA | NA | 5 | 5 | 5 | 1 |
10000003 | NA | 6.125833 | 7.027500 | 8.021667 | Male | NA | 0.3323 | 1.1861 | 2.0689 | NA | 1.0112 | 2.2080 | 2.8384 | 0.4244 | 2.2479 | 2.3529 | NA | 14 | 16 | 8.2395 | yes | no | no | yes | 1 | no | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 | 2 | 2 | NA | SUBURB | SUBURB | SUBURB | 2 | 2 | 4 | NA | a35h | a35h | NA | 1 | 1 | NA | NA | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
10000004 | 6.032500 | 6.570000 | 7.435833 | 8.432500 | Male | -1.7087 | -0.2922 | 0.8632 | 1.5117 | -2.1245 | -0.2834 | 1.1082 | 2.8392 | 0.6509 | 0.2933 | 1.7845 | 15 | 14 | 17 | 7.2868 | NA | yes | NA | yes | 0 | yes | no | no | yes | 2 | 2 | 1 | 5 | STRONGLY AGREE | 4 | 4 | 4 | 4 | RURAL | RURAL | RURAL | RURAL | 2 | 2 | 3 | a35h | a35h | NA | 1 | 1 | NA | 5 | BACHELOR | 3 | 3 | 3 | 3 | 4 | 3 | 5 | 3 | 1 | 1 | 5 | 5 | 4 | 5 | NA | NA | NA | NA | NA | 5 | 5 | 5 | 2 |
10000005 | 4.959167 | 5.421667 | NA | NA | Female | -0.0734 | 0.8013 | NA | NA | -0.5545 | 0.6163 | NA | NA | 0.3996 | NA | NA | 14 | 17 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | STRONGLY AGREE | 2 | 2 | NA | NA | SUBURB | SUBURB | NA | NA | 2 | NA | NA | a35h | NA | NA | 1 | NA | NA | 3 | SOME COLLEGE | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | 5 | 5 | 5 | NA | NA | NA | NA | NA | 4 | 5 | 5 | 1 |
10000006 | 6.315000 | 6.885000 | 7.901667 | 8.959167 | Female | -1.4601 | -0.6792 | 1.0132 | 1.9750 | -1.0616 | -0.5379 | 0.4484 | 1.0869 | -0.6320 | -0.7748 | -0.2110 | 16 | 14 | 18 | 5.3872 | NA | yes | NA | yes | 0 | yes | no | yes | yes | 2 | 1 | 1 | 4 | AGREE | 4 | 4 | 4 | 4 | RURAL | RURAL | RURAL | RURAL | 2 | 2 | 2 | a35h | a35h | NA | 1 | 1 | NA | 2 | HIGH SCHOOL | 5 | 5 | 5 | 2 | 5 | 2 | 5 | 5 | 1 | 2 | 3 | 3 | 3 | 4 | NA | NA | NA | NA | NA | 5 | 5 | 4 | 2 |
10000007 | 5.329167 | 5.825000 | 6.748333 | 7.797500 | Female | -0.3363 | 0.8570 | 1.3456 | 2.4449 | -0.6902 | 0.1032 | 1.4144 | 2.4619 | 0.0669 | 0.2036 | 1.0675 | 14 | 17 | 18 | 6.9906 | yes | no | no | no | 0 | yes | no | no | NA | 2 | 2 | NA | 4 | AGREE | 3 | 3 | 3 | 3 | TOWN | TOWN | TOWN | TOWN | 2 | 2 | 2 | blw35h | blw35h | NA | 2 | 2 | NA | 2 | HIGH SCHOOL | 5 | 5 | 5 | 4 | 5 | 4 | 5 | 4 | 2 | 2 | 4 | 3 | 4 | 5 | NA | NA | NA | NA | NA | 6 | 5 | 5 | 1 |
10000008 | 5.345000 | 5.994167 | 6.967500 | 8.005833 | Female | -0.4149 | 0.6023 | 2.3156 | 2.2707 | -0.2921 | 1.1411 | 2.7230 | 3.1506 | 0.7350 | 1.1363 | 2.5198 | 17 | 17 | 18 | 6.5526 | NA | yes | NA | yes | 0 | yes | NA | NA | NA | NA | NA | NA | 5 | STRONGLY AGREE | 2 | 2 | 2 | 2 | SUBURB | SUBURB | SUBURB | SUBURB | 3 | 2 | 2 | NA | NA | NA | NA | NA | NA | 4 | ASSOCIATE'S DEGREE | 5 | 5 | 4 | 5 | 5 | 3 | 5 | 2 | NA | NA | 5 | 5 | 4 | 5 | NA | NA | NA | NA | NA | 6 | 5 | 5 | 1 |
10000009 | 5.693333 | 6.197500 | NA | NA | Female | -1.7711 | -0.6955 | NA | NA | -1.9759 | -0.3111 | NA | NA | 0.4266 | NA | NA | 15 | 15 | NA | NA | NA | NA | NA | NA | NA | NA | no | NA | NA | 2 | NA | NA | 4 | AGREE | 1 | 1 | NA | NA | CITY | CITY | NA | NA | 2 | NA | NA | blw35h | NA | NA | 2 | NA | NA | 5 | BACHELOR | 4 | NA | NA | 3 | NA | NA | NA | NA | NA | NA | 4 | 3 | 4 | 5 | NA | NA | NA | NA | NA | 6 | 5 | 4 | NA |
10000010 | 5.808333 | 6.265833 | 7.378333 | 8.285000 | Male | -1.7137 | -1.2014 | 0.2007 | 1.0282 | -2.0032 | -0.6192 | 0.7475 | 1.4483 | -1.7469 | -0.7688 | -1.0706 | 15 | 16 | 13 | 2.8750 | yes | yes | yes | yes | 1 | no | yes | yes | NA | 1 | 1 | NA | 3 | NEITHER AGREE NOR DISAGREE | 1 | 1 | 1 | 1 | CITY | CITY | CITY | CITY | 1 | 1 | 1 | NA | NA | NA | NA | NA | NA | 3 | SOME COLLEGE | 5 | 5 | 5 | 5 | 5 | 4 | 2 | 1 | 2 | NA | 3 | 3 | 4 | 4 | 2 | 3 | 3 | 2 | 2 | 6 | 5 | 5 | 1 |
summary(reading$G2_2_READ)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -0.221 1.824 2.231 2.189 2.612 3.829 4337
Histogram
hist(reading$G2_2_READ)
Box plot.
boxplot(reading$G2_2_READ)
Normal probability plot.
qqnorm(reading$G2_2_READ)
qqline(reading$G2_2_READ)
The distribution of the second grade reading scores is nearly normal with a slight left skew: the mean (2.189) is below the median (2.231), and the lower scores deviate from the line in the QQ plot. Because the sample size is large, this slight skewness is not a concern.
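A rough numeric check on the direction of skew, complementing the histogram and QQ plot, is to compare the mean with the median and to compute a moment-based sample skewness. The sketch below uses simulated scores rather than the ECLS data, and `sample_skewness` is a hypothetical helper written for illustration, not a function from any package loaded above.

```r
# Hypothetical helper: moment-based sample skewness.
# Negative values indicate a longer left tail, positive values a longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(606)
# Simulated scores with a longer lower tail (assumption, not ECLS data)
scores <- c(rnorm(900, mean = 2.3, sd = 0.5),
            runif(100, min = -0.2, max = 1.5),
            NA)

# The sign of mean - median hints at the skew direction
mean(scores, na.rm = TRUE) - median(scores, na.rm = TRUE)
sample_skewness(scores)
```

The same two lines could be applied to `reading$G2_2_READ` to confirm what the plots suggest.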
summary(math$G2_2_Math)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -2.845 1.940 2.539 2.453 3.055 6.537 4344
Histogram
hist(math$G2_2_Math)
qqnorm(math$G2_2_Math)
qqline(math$G2_2_Math)
The distribution of the second grade math scores is nearly normal with a slight skew; the mean (2.453) is below the median (2.539), which points to a somewhat longer lower tail. The QQ plot deviates from the line at both the lower and upper ends, but because the sample size is large these deviations are not a concern.
3. Distribution of science score (G2_2_SCI)
summary(science$G2_2_SCI)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -4.490 1.096 1.719 1.587 2.236 5.370 4355
hist(science$G2_2_SCI)
qqnorm(science$G2_2_SCI)
qqline(science$G2_2_SCI)
The distribution of the second grade science scores is nearly normal with a slight skew; the mean (1.587) is below the median (1.719), which points to a somewhat longer lower tail. The QQ plot deviates from the line at both the lower and upper ends, but because the sample size is large these deviations are not a concern.
summary(DCCSSCR$G2_2_DCCSSCR)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.625 6.329 6.983 6.689 7.526 10.000 4400
hist(DCCSSCR$G2_2_DCCSSCR)
qqnorm(DCCSSCR$G2_2_DCCSSCR)
qqline(DCCSSCR$G2_2_DCCSSCR)
The distribution of the second grade Dimensional Change Card Sort (DCCSSCR) scores is bimodal, and the QQ plot deviates strongly from the line at the lower end. Because the sample size is large, these deviations are not a major concern.
Since the distribution of the second grade DCCS score is not ideal, I also looked into the first grade DCCS score, for which the game design and the score calculation differed from second grade.
summary(DCCSTOT$G1_2_DCCSTOT)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 15.00 17.00 16.05 18.00 18.00 3065
hist(DCCSTOT$G1_2_DCCSTOT)
qqnorm(DCCSTOT$G1_2_DCCSTOT)
qqline(DCCSTOT$G1_2_DCCSTOT)
The distribution of the first grade Dimensional Change Card Sort (DCCSTOT) scores is strongly skewed to the left: the third quartile equals the maximum of 18, and the mean (16.05) is below the median (17). The QQ plot shows that the distribution is not normal. A similar pattern is seen in the kindergarten DCCSTOT scores, so it is not appropriate to use these scores as response variables.
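When the third quartile equals the maximum, as in the summary above, at least about a quarter of the children sit at the top score, which is a ceiling effect. A minimal sketch of how that share could be measured, using simulated 0 to 18 scores rather than the ECLS data (the vector `dccs` is hypothetical):

```r
set.seed(7)
# Simulated 0-18 scores piled up near the maximum (assumption, not ECLS data)
dccs <- pmin(18, round(rnorm(1000, mean = 17, sd = 2.5)))
dccs[dccs < 0] <- 0

# Share of children at the ceiling
ceiling_share <- mean(dccs == max(dccs, na.rm = TRUE), na.rm = TRUE)
ceiling_share

# Compare the third quartile with the maximum, as in the summary() output
quantile(dccs, 0.75, na.rm = TRUE)
max(dccs, na.rm = TRUE)
```

Replacing `dccs` with `DCCSTOT$G1_2_DCCSTOT` would give the corresponding share for the real scores.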
plot(earlychildhood$G2_2_READ,earlychildhood$G2_2_Math)
cor(earlychildhood$G2_2_READ,earlychildhood$G2_2_Math,use = "complete.obs")
## [1] 0.7328452
The math score increases as the reading score increases. There is a strong (Pearson's r > 0.6) positive linear relationship between the math score and the reading score.
plot(earlychildhood$G2_2_READ,earlychildhood$G2_2_SCI)
cor(earlychildhood$G2_2_READ,earlychildhood$G2_2_SCI,use = "complete.obs")
## [1] 0.6957673
The science score increases as the reading score increases. There is a strong (Pearson's r > 0.6) positive linear relationship between the science score and the reading score.
plot(earlychildhood$G2_2_READ,earlychildhood$G2_2_DCCSSCR)
cor(earlychildhood$G2_2_READ,earlychildhood$G2_2_DCCSSCR,use = "complete.obs")
## [1] 0.3928881
The DCCSSCR score increases slightly as the reading score increases. There is a weak (Pearson's r < 0.4) positive linear relationship between the DCCSSCR score and the reading score.
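The three correlations above rely on `use = "complete.obs"`, which silently keeps only the children who have both scores. The sketch below, on simulated data rather than the ECLS data, shows that this matches a correlation computed by hand on the complete pairs, and reports how many pairs were actually used. Squaring r gives the share of variance the two scores have in common; for example, the reported r of 0.7328 corresponds to roughly 54%.

```r
set.seed(1)
# Simulated paired scores with some missing values (assumption, not ECLS data)
read <- rnorm(200)
math <- 0.7 * read + rnorm(200, sd = 0.7)
read[sample(200, 20)] <- NA
math[sample(200, 20)] <- NA

# use = "complete.obs" keeps only rows where both scores are present
r_complete <- cor(read, math, use = "complete.obs")

# The same result computed by hand on the complete pairs
ok <- complete.cases(read, math)
r_manual <- cor(read[ok], math[ok])

c(pairs_used = sum(ok), r = r_complete, r_squared = r_complete^2)
```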
6. Is there a difference in academic performance between boys and girls?
sub_1 <- earlychildhood[which(earlychildhood$CHSEX != "NA"),]
boxplot(G2_2_READ ~ CHSEX, data=sub_1)
girl <- earlychildhood[which(earlychildhood$CHSEX == 'Female'),]
boy <- earlychildhood[which(earlychildhood$CHSEX == 'Male'),]
t.test(girl$G2_2_READ,boy$G2_2_READ)
##
## Welch Two Sample t-test
##
## data: girl$G2_2_READ and boy$G2_2_READ
## t = 13.056, df = 13791, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1201849 0.1626479
## sample estimates:
## mean of x mean of y
## 2.262143 2.120727
The median reading score of girls is slightly higher than that of boys. The t-test results suggest that there is a significant difference between the average reading scores of girls and boys (p-value < 2.2e-16).
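The comparison above splits the data into `girl` and `boy` before calling `t.test`. An equivalent route is the formula interface, which takes the grouping column directly. The sketch below uses a simulated data frame (not the ECLS data) whose column names mirror those in the analysis; the means and standard deviation are assumptions chosen only for illustration.

```r
set.seed(2)
# Simulated data frame (assumption, not ECLS data)
d <- data.frame(
  CHSEX = rep(c("Female", "Male"), each = 50),
  G2_2_READ = c(rnorm(50, mean = 2.26, sd = 0.6),
                rnorm(50, mean = 2.12, sd = 0.6))
)

# Formula interface: response ~ two-level grouping variable (Welch test by default)
tt_formula <- t.test(G2_2_READ ~ CHSEX, data = d)

# Two-vector interface, as used in the text
tt_vectors <- t.test(d$G2_2_READ[d$CHSEX == "Female"],
                     d$G2_2_READ[d$CHSEX == "Male"])

# Confidence interval for mean(Female) - mean(Male)
tt_formula$conf.int
```

With the real data, the call would presumably be `t.test(G2_2_READ ~ CHSEX, data = sub_1)`, provided `CHSEX` has exactly two groups after filtering.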
boxplot(G2_2_Math ~ CHSEX, data=sub_1)
t.test(girl$G2_2_Math,boy$G2_2_Math)
##
## Welch Two Sample t-test
##
## data: girl$G2_2_Math and boy$G2_2_Math
## t = -5.7726, df = 13694, p-value = 7.976e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1075887 -0.0530442
## sample estimates:
## mean of x mean of y
## 2.413426 2.493743
The t-test results suggest that there is a slight but significant difference between the average math scores of girls and boys. But I think the difference is so small (about 0.08 points) that we could say the average math scores of girls and boys are about the same.
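With samples this large, even a small gap produces a tiny p-value, so a standardized effect size helps separate statistical from practical significance. A minimal sketch using Cohen's d with a pooled standard deviation follows; `cohens_d` is a hypothetical helper, and the simulated means and standard deviation are assumptions, not ECLS values.

```r
# Hypothetical helper: Cohen's d using the pooled standard deviation
cohens_d <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  pooled_var <- ((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
    (length(x) + length(y) - 2)
  (mean(x) - mean(y)) / sqrt(pooled_var)
}

set.seed(3)
# Simulated scores (assumption, not ECLS data): small mean gap, large samples
girls <- rnorm(7000, mean = 2.41, sd = 0.8)
boys  <- rnorm(7000, mean = 2.49, sd = 0.8)

t.test(girls, boys)$p.value  # can be very small when n is large
cohens_d(girls, boys)        # size of the gap in standard deviation units
```

By a common rule of thumb, an absolute d around 0.2 is considered small; applying `cohens_d(girl$G2_2_Math, boy$G2_2_Math)` would show where the real gap falls.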
boxplot(G2_2_SCI ~ CHSEX, data=sub_1)
t.test(girl$G2_2_SCI,boy$G2_2_SCI)
##
## Welch Two Sample t-test
##
## data: girl$G2_2_SCI and boy$G2_2_SCI
## t = -4.124, df = 13793, p-value = 3.745e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.09743361 -0.03465322
## sample estimates:
## mean of x mean of y
## 1.553854 1.619897
The t-test results suggest that there is a slight but significant difference between the average science scores of girls and boys. But I think the difference is so small (about 0.07 points) that we could say the average science scores of girls and boys are about the same.
boxplot(G2_2_DCCSSCR ~ CHSEX, data=sub_1)
t.test(girl$G2_2_DCCSSCR,boy$G2_2_DCCSSCR)
##
## Welch Two Sample t-test
##
## data: girl$G2_2_DCCSSCR and boy$G2_2_DCCSSCR
## t = 8.0041, df = 13618, p-value = 1.301e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1387685 0.2287779
## sample estimates:
## mean of x mean of y
## 6.783344 6.599571
The median DCCSSCR score of girls is slightly higher than that of boys. The t-test results suggest that there is a significant difference between the average DCCSSCR scores of girls and boys.
Parent involvement related to early literacy is believed to have positive effects on children's academic performance.
1.1 First, we create a scatterplot to see whether there is a positive correlation between the parents' opinion on whether parents should help with children's homework and the reading score:
sub_2 <- earlychildhood[which(earlychildhood$Prthw != "NA"),]
sub_2$Prthw <- as.numeric(sub_2$Prthw)
plot(sub_2$G2_2_READ ~ sub_2$Prthw)
m_Prthw <- lm(sub_2$G2_2_READ ~ sub_2$Prthw)
m_Prthw
##
## Call:
## lm(formula = sub_2$G2_2_READ ~ sub_2$Prthw)
##
## Coefficients:
## (Intercept) sub_2$Prthw
## 2.47035 -0.06722
The equation for the linear model:
$$\widehat{score_{reading}} = 2.47035 - 0.06722\times Prthw $$
plot(jitter(sub_2$G2_2_READ, factor= 1.2) ~ jitter(sub_2$Prthw, factor=1.2))+
abline(m_Prthw)
## numeric(0)
cor(sub_2$G2_2_READ, sub_2$Prthw, use = "complete.obs")
## [1] -0.1001445
summary(m_Prthw)
##
## Call:
## lm(formula = sub_2$G2_2_READ ~ sub_2$Prthw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.42257 -0.36007 0.03647 0.41743 1.69505
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.470347 0.026311 93.89 <2e-16 ***
## sub_2$Prthw -0.067219 0.006213 -10.82 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6326 on 11555 degrees of freedom
## (3650 observations deleted due to missingness)
## Multiple R-squared: 0.01003, Adjusted R-squared: 0.009943
## F-statistic: 117.1 on 1 and 11555 DF, p-value: < 2.2e-16
The average reading score is predicted to decrease by 0.067219 for every one-point increase in the parents' attitude on whether parents should help with children's homework.
The p-value is < 2.2e-16, which is less than 0.05, suggesting that the parents' attitude on whether parents should help with children's homework is a statistically significant predictor. It may not be a practically significant predictor, because the correlation between the reading score and this attitude is very weak (Multiple R-squared = 0.01003). For every one-point increase in the attitude score, the model predicts a decrease of only 0.067219 in the average reading score, which barely changes the score.
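In a regression with a single predictor, the Multiple R-squared is simply the square of Pearson's r, which is why the two figures reported above agree. The sketch below checks this with the reported numbers and then on simulated data; the simulated coefficients are assumptions chosen to resemble the fit, not ECLS values.

```r
# Reported correlation from the output above
r <- -0.1001445
r^2  # about 0.01003, the reported Multiple R-squared

# The same identity on simulated data (assumption, not ECLS data)
set.seed(4)
x <- sample(1:5, 300, replace = TRUE)
y <- 2.5 - 0.07 * x + rnorm(300, sd = 0.6)
fit <- lm(y ~ x)

all.equal(summary(fit)$r.squared, cor(x, y)^2)
```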
1.2 Use residual plots to evaluate whether the conditions of least squares regression are reasonable.
yx.res <- resid(m_Prthw, na.rm=T)
par(mfrow = c(1,2))
hist(yx.res, xlab="Residuals", breaks = 10)
sub_2 <- sub_2[which(sub_2$G2_2_READ != "NA"), ]
sub_2$Prthw <- as.numeric(sub_2$Prthw)
a <- (2.470354-0.06722*sub_2$Prthw)
plot((2.470354-0.06722*sub_2$Prthw), yx.res, ylab="Residuals", xlab="fitted values", main="Parents homework help")
abline(0, 0)
The following conditions were checked to evaluate whether least squares regression is reasonable:
Linearity: the parent homework-help variable is linearly related to the reading score.
Nearly normal residuals: the distribution of the residuals is nearly normal.
Constant variability: the variance of the residuals around the line is roughly constant.
Independent observations: each parent's opinion on whether parents should help with children's homework is independent of the others.
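The residual plots for these checks can also be drawn from the model object itself, which avoids re-typing the coefficients and re-filtering the data by hand. A minimal sketch, assuming a data frame with the same column names as the analysis but filled with simulated values (not the ECLS data):

```r
set.seed(5)
# Simulated stand-in for the analysis data (assumption, not ECLS data)
d <- data.frame(Prthw = sample(c(1:5, NA), 500, replace = TRUE))
d$G2_2_READ <- 2.47 - 0.07 * d$Prthw + rnorm(500, sd = 0.6)

# Rows with a missing predictor or response are dropped automatically
fit <- lm(G2_2_READ ~ Prthw, data = d)

res <- resid(fit)
fit_vals <- fitted(fit)  # same length and order as res

par(mfrow = c(1, 3))
hist(res, xlab = "Residuals", main = "Residual histogram")
plot(fit_vals, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0)
qqnorm(res)
qqline(res)
par(mfrow = c(1, 1))
```

Because `fitted()` and `resid()` come from the same fit, their lengths always match even when some rows were dropped for missingness.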
2.1 Next, let's see whether there is a positive correlation between how often parents read books to the child in summer (Sumrd) and the reading score:
sub_4 <- earlychildhood[which(earlychildhood$Sumrd != "NA"),]
sub_4$Sumrd <- as.numeric (sub_4$Sumrd)
m_sumrd <- lm(sub_4$G2_2_READ ~ sub_4$Sumrd)
m_sumrd
##
## Call:
## lm(formula = sub_4$G2_2_READ ~ sub_4$Sumrd)
##
## Coefficients:
## (Intercept) sub_4$Sumrd
## 1.805 0.117
The equation for the linear model:
$$\widehat{score_{reading}} = 1.805 + 0.117\times Sumrd $$
plot(jitter(sub_4$G2_2_READ, factor= 1.2) ~ jitter(sub_4$Sumrd, factor=1.2))+
abline(m_sumrd)
## numeric(0)
cor(sub_4$G2_2_READ, sub_4$Sumrd,use = "complete.obs")
## [1] 0.1482586
summary(m_sumrd)
##
## Call:
## lm(formula = sub_4$G2_2_READ ~ sub_4$Sumrd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.47357 -0.37566 0.02997 0.43454 1.67382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.80453 0.03851 46.855 <2e-16 ***
## sub_4$Sumrd 0.11699 0.01176 9.947 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6592 on 4402 degrees of freedom
## (548 observations deleted due to missingness)
## Multiple R-squared: 0.02198, Adjusted R-squared: 0.02176
## F-statistic: 98.93 on 1 and 4402 DF, p-value: < 2.2e-16
The average reading score is predicted to increase by 0.11699 for every one-point increase in summer reading.
The p-value is < 2.2e-16, which is less than 0.05, suggesting that summer reading is a statistically significant predictor. It may not be a practically significant predictor, because the correlation between the reading score and summer reading is very weak (Multiple R-squared = 0.02198). For every one-point increase in summer reading, the model predicts an increase of only 0.11699 in the average reading score, which barely changes the score.
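One way to judge practical size is to scale the slope by the full span of the predictor and compare the result with the residual standard error. The sketch below uses the reported estimates; the span of 1 to 5 is an assumption taken from `sumrdlevels`, and the actual coding of `P3RDBKTC` (including which end means "more reading") should be checked against the ECLS codebook.

```r
slope <- 0.11699    # slope from summary(m_sumrd)
resid_se <- 0.6592  # residual standard error from the same output
span <- 5 - 1       # assumption: Sumrd ranges over the codes 1..5 in sumrdlevels

# Predicted difference between the lowest and highest category
predicted_gap <- slope * span
predicted_gap

# The same gap expressed in residual standard error units
predicted_gap / resid_se
```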
2.2 Use residual plots to evaluate whether the conditions of least squares regression are reasonable.
yx.res <- resid(m_sumrd, na.rm=T)
par(mfrow = c(1,2))
hist(yx.res, xlab="Residuals", breaks = 10)
sub_4 <- sub_4[which(sub_4$G2_2_READ != "NA"), ]
sub_4$Sumrd <- as.numeric(sub_4$Sumrd)
a <- (1.805 + 0.117*sub_4$Sumrd)
plot((1.805 + 0.117*sub_4$Sumrd), yx.res, ylab="Residuals", xlab="fitted values", main="Summer reading")
abline(0, 0)
The following conditions were checked to evaluate whether least squares regression is reasonable:
Linearity: the summer reading variable is linearly related to the reading score.
Nearly normal residuals: the distribution of the residuals is nearly normal.
Constant variability: the variance of the residuals around the line is roughly constant.
Independent observations: each child's summer reading report is independent of the others.
3.1 Is there a positive correlation between the summer reading time (Sumreadtime) and the reading score?
sub_5 <- earlychildhood[which(earlychildhood$Sumreadtime != "NA"),]
sub_5$Sumreadtime <- as.numeric (sub_5$Sumreadtime)
m_sumreadtime <- lm(sub_5$G2_2_READ ~ sub_5$Sumreadtime)
m_sumreadtime
##
## Call:
## lm(formula = sub_5$G2_2_READ ~ sub_5$Sumreadtime)
##
## Coefficients:
## (Intercept) sub_5$Sumreadtime
## 2.13244 0.02566
The equation for the linear model:
$$\widehat{score_{reading}} = 2.13244 + 0.02566\times Sumreadtime $$
plot(jitter(sub_5$G2_2_READ, factor= 1.2) ~ jitter(sub_5$Sumreadtime, factor=1.2))+
abline(m_sumreadtime)
## numeric(0)
cor(sub_5$G2_2_READ, sub_5$Sumreadtime,use = "complete.obs")
## [1] 0.03000113
summary(m_sumreadtime)
##
## Call:
## lm(formula = sub_5$G2_2_READ ~ sub_5$Sumreadtime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.38487 -0.39157 0.02553 0.43288 1.67119
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.13244 0.02522 84.563 <2e-16 ***
## sub_5$Sumreadtime 0.02566 0.01306 1.965 0.0495 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6636 on 4284 degrees of freedom
## (533 observations deleted due to missingness)
## Multiple R-squared: 0.0009001, Adjusted R-squared: 0.0006669
## F-statistic: 3.859 on 1 and 4284 DF, p-value: 0.04953
The average reading score is predicted to increase by 0.02566 for every one-point increase in summer reading time.
The p-value is 0.04953, which is just below 0.05, suggesting that summer reading time is a statistically significant predictor, although only marginally. It may not be a practically significant predictor, because the correlation between the reading score and summer reading time is very weak (Multiple R-squared = 0.0009001). For every one-point increase in summer reading time, the model predicts an increase of only 0.02566 in the average reading score, which barely changes the score.
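Since the p-value sits so close to 0.05, a confidence interval for the slope shows how marginal the result is. The sketch below rebuilds the 95% interval from the reported estimate, standard error and residual degrees of freedom; these inputs are rounded, so the bounds are approximate.

```r
est <- 0.02566  # slope estimate from summary(m_sumreadtime)
se  <- 0.01306  # its standard error
df  <- 4284     # residual degrees of freedom

# 95% confidence interval: estimate +/- t quantile * standard error
ci <- est + c(-1, 1) * qt(0.975, df) * se
ci  # the lower bound sits just above zero
```

On the fitted model itself, `confint(m_sumreadtime)` would return the same interval without rounding.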
3.2 Use residual plots to evaluate whether the conditions of least squares regression are reasonable.
yx.res <- resid(m_sumreadtime, na.rm=T)
par(mfrow = c(1,2))
hist(yx.res, xlab="Residuals", breaks = 10)
sub_5 <- sub_5[which(sub_5$G2_2_READ != "NA"), ]
sub_5$Sumreadtime <- as.numeric(sub_5$Sumreadtime)
a <- (2.13244 + 0.02566*sub_5$Sumreadtime)
plot((2.13244 + 0.02566*sub_5$Sumreadtime), yx.res, ylab="Residuals", xlab="fitted values", main="Summer reading time")
abline(0, 0)
The following conditions were checked to evaluate whether least squares regression is reasonable:
Linearity: the summer reading time variable is linearly related to the reading score.
Nearly normal residuals: the distribution of the residuals is nearly normal.
Constant variability: the variance of the residuals around the line is roughly constant.
Independent observations: each child's reported summer reading time is independent of the others.
4.1 Is there a positive correlation between parent volunteer time at school and the reading score?
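Because the predictors here are ordinal codes rather than continuous measurements, the linearity condition can be probed by comparing the straight-line fit with a fit that gives each category its own mean. A minimal sketch on simulated data follows (not the ECLS data; the coefficients are assumptions chosen to resemble the fit above).

```r
set.seed(6)
# Simulated ordinal predictor and score (assumption, not ECLS data)
d <- data.frame(Sumreadtime = sample(0:4, 400, replace = TRUE))
d$G2_2_READ <- 2.1 + 0.03 * d$Sumreadtime + rnorm(400, sd = 0.65)

fit_linear <- lm(G2_2_READ ~ Sumreadtime, data = d)          # one slope across codes
fit_factor <- lm(G2_2_READ ~ factor(Sumreadtime), data = d)  # separate mean per code

# F-test: does a separate mean per category improve on the straight line?
linearity_check <- anova(fit_linear, fit_factor)
linearity_check

# Category means, to see whether they change roughly linearly
category_means <- tapply(d$G2_2_READ, d$Sumreadtime, mean)
category_means
```

A large p-value in the `anova` table is consistent with the straight-line coding; a small one suggests the category means do not follow a line.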
sub_6 <- earlychildhood[which(earlychildhood$G2_2_Prthlp != "NA"),]
sub_6$G2_2_Prthlp <- as.numeric (sub_6$G2_2_Prthlp)
m_G2_2_Prthlp <- lm(sub_6$G2_2_READ ~ sub_6$G2_2_Prthlp)
m_G2_2_Prthlp
##
## Call:
## lm(formula = sub_6$G2_2_READ ~ sub_6$G2_2_Prthlp)
##
## Coefficients:
## (Intercept) sub_6$G2_2_Prthlp
## 1.9003 0.1356
The equation for the linear model:
$$\widehat{score_{reading}} = 1.9003 + 0.1356\times Prthlp $$
plot(jitter(sub_6$G2_2_READ, factor = 1.2) ~ jitter(sub_6$G2_2_Prthlp, factor = 1.2))
abline(m_G2_2_Prthlp)
cor(sub_6$G2_2_Prthlp, sub_6$G2_2_READ,use = "complete.obs")
## [1] 0.2006117
summary(m_G2_2_Prthlp)
##
## Call:
## lm(formula = sub_6$G2_2_READ ~ sub_6$G2_2_Prthlp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.66368 -0.35602 0.03544 0.41270 1.70060
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.90034 0.01436 132.36 <2e-16 ***
## sub_6$G2_2_Prthlp 0.13556 0.00598 22.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6245 on 12255 degrees of freedom
## (145 observations deleted due to missingness)
## Multiple R-squared: 0.04025, Adjusted R-squared: 0.04017
## F-statistic: 513.9 on 1 and 12255 DF, p-value: < 2.2e-16
The average reading score increases by 0.13556 for every one-point increase in parents' volunteer time at school.
The p-value is < 2.2e-16, well below 0.05, so parents' volunteer time at school is a statistically significant predictor. It may not be a practically significant predictor, because the relationship between the average reading score and parents' volunteer time is weak (correlation 0.2006, Multiple R-squared = 0.04025). For every one-point increase in parents' volunteer time at school, the model predicts an increase of only 0.13556 in the average reading score.
4.2 Use residual plots to evaluate whether the conditions of least squares regression are reasonable.
# residuals and fitted values come from the model object, so they stay aligned
yx.res <- resid(m_G2_2_Prthlp)
yx.fit <- fitted(m_G2_2_Prthlp)
par(mfrow = c(1,2))
hist(yx.res, xlab="Residuals", breaks = 10)
plot(yx.fit, yx.res, ylab="Residuals", xlab="Fitted values", main="Parent school helping")
abline(h = 0)
The following conditions were checked to evaluate whether least squares regression is reasonable:
Linearity: parent volunteer time at school is linearly related to the reading score.
Nearly normal residuals: the distribution of the residuals is nearly normal.
Constant variability: the variance of the residuals around the line is roughly constant.
Independent observations: each parent's report of volunteer time at school is independent of the others.
For the convenience of analysis, I extract the second grade academic outcomes and related variables.
# subset second grade and related variables
g2read <- c("CHILDID","G2_2_AGE","CHSEX","G2_2_READ","G2_2_Math","G2_2_SCI", "G2_2_DCCSSCR","G2_2_Sep1Cut","G2_2_CLSIt", "PreK", "G2_2_LOC", "G2_2_Prthlp", "G2_2_PrtWt","PrtEDU", "G2_2_Prtconf", "G2_2_Prtoph", "G2_2_Prtevt", "G2_2_Prtpsk", "Premath" ,"Prelitr" , "Prthw", "Prerc" ,"Sumsch","Summth", "Sumwrt", "Sumrd", "Sumreadtime","teachedu","teachjoy","teachdiff","teachboard")
sub_g2read <- subset(earlychildhood,,g2read)
sub_g2read$G2_2_CLSIt <- as.numeric(sub_g2read$G2_2_CLSIt)
## Warning: NAs introduced by coercion
sub_g2read$PreK <- as.numeric(sub_g2read$PreK)
## Warning: NAs introduced by coercion
sub_g2read$G2_2_LOC <- as.numeric(sub_g2read$G2_2_LOC)
## Warning: NAs introduced by coercion
sub_g2read$G2_2_Prthlp <- as.numeric(sub_g2read$G2_2_Prthlp)
## Warning: NAs introduced by coercion
sub_g2read$G2_2_PrtWt <- as.numeric(sub_g2read$G2_2_PrtWt)
## Warning: NAs introduced by coercion
sub_g2read$PrtEDU <-as.numeric(sub_g2read$PrtEDU)
## Warning: NAs introduced by coercion
sub_g2read$G2_2_Prtconf <-as.numeric(sub_g2read$G2_2_Prtconf)
## Warning: NAs introduced by coercion
sub_g2read$G2_2_Prtoph <- as.numeric(sub_g2read$G2_2_Prtoph)
## Warning: NAs introduced by coercion
sub_g2read$G2_2_Prtevt <-as.numeric(sub_g2read$G2_2_Prtevt)
sub_g2read$G2_2_Prtpsk <- as.numeric(sub_g2read$G2_2_Prtpsk)
sub_g2read$Premath <- as.numeric(sub_g2read$Premath)
## Warning: NAs introduced by coercion
sub_g2read$Prelitr <- as.numeric(sub_g2read$Prelitr)
## Warning: NAs introduced by coercion
sub_g2read$Prthw <- as.numeric(sub_g2read$Prthw)
## Warning: NAs introduced by coercion
sub_g2read$Prerc <- as.numeric(sub_g2read$Prerc)
## Warning: NAs introduced by coercion
sub_g2read$Sumsch <- as.numeric(sub_g2read$Sumsch)
sub_g2read$Summth <- as.numeric(sub_g2read$Summth)
## Warning: NAs introduced by coercion
sub_g2read$Sumwrt <- as.numeric(sub_g2read$Sumwrt)
## Warning: NAs introduced by coercion
sub_g2read$Sumrd <- as.numeric(sub_g2read$Sumrd)
## Warning: NAs introduced by coercion
sub_g2read$Sumreadtime <- as.numeric(sub_g2read$Sumreadtime)
## Warning: NAs introduced by coercion
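The column-by-column conversions can also be written as one step over a vector of column names, which removes the risk of assigning one column's values to another. A sketch on a small made-up data frame follows; the values in `toy` are hypothetical, and only the column names are borrowed from the list above.

```r
# Hypothetical character columns where missing values are stored as text.
toy <- data.frame(
  CHILDID = c("a1", "a2", "a3"),
  Prerc   = c("1", "2", "NA"),
  Sumsch  = c("0", "refused", "1"),
  stringsAsFactors = FALSE
)

num_cols <- c("Prerc", "Sumsch")

# Convert each listed column from its own values; non-numeric text becomes NA.
toy[num_cols] <- lapply(toy[num_cols], function(col) {
  suppressWarnings(as.numeric(col))
})

str(toy)
```

`suppressWarnings()` is used here only to keep the output short; leaving it out reproduces the "NAs introduced by coercion" warnings shown above.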
Q1.1. Does parent involvement in preschool education influence children's early academic outcomes?
Miedel and Reynolds (1999) detected positive associations between parent involvement in preschool and kindergarten and reading achievement in kindergarten and in eighth grade. I want to see whether parent involvement in a child's preschool literacy education is associated with second grade academic outcomes.
The data set contains several variables on parent involvement related to early literacy: child went to preschool before K (G2_2_Prtpsk), preschool read/math good for school (Premath), have child know alphabet before K (Prelitr), parent provides homework time (Prthw), and parent should read and count with child (Prerc).
Q1.1.1. First look at the relationship between each of these variables and the reading score.
plot(sub_g2read$G2_2_READ ~ sub_g2read$G2_2_Prtpsk)
cor(sub_g2read$G2_2_READ, sub_g2read$G2_2_Prtpsk,use = "complete.obs")
## [1] 0.09137424
There is a very weak relationship between attending preschool before K and the reading score.
cor(sub_g2read$G2_2_READ, sub_g2read$Premath,use = "complete.obs")
## [1] -0.09509362
cor(sub_g2read$G2_2_READ, sub_g2read$Prelitr,use = "complete.obs")
## [1] -0.04224958
cor(sub_g2read$G2_2_READ, sub_g2read$Prthw,use = "complete.obs")
## [1] -0.1001445
cor(sub_g2read$G2_2_READ, sub_g2read$Prerc,use = "complete.obs")
## [1] 0.003368523
All of the variables on parent involvement related to preschool literacy have a very weak relationship (positive or negative) with the reading score. The relationships among all of these variables can be viewed with the following command:
g2read_prt <- c("G2_2_READ","G2_2_Prtpsk", "Premath" ,"Prelitr" , "Prthw", "Prerc")
sub_g2read_prt <- subset(sub_g2read,,g2read_prt)
plot(sub_g2read_prt[,1:6])
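The pairwise correlations can also be collected in a single matrix rather than computed one call at a time. A minimal sketch on simulated data follows; the data frame `toy` is hypothetical and only mimics the layout of `sub_g2read_prt`.

```r
set.seed(1)
toy <- data.frame(
  G2_2_READ   = rnorm(200, mean = 2, sd = 0.6),
  G2_2_Prtpsk = sample(c(0, 1, NA), 200, replace = TRUE),
  Premath     = sample(c(1:5, NA), 200, replace = TRUE),
  Prthw       = sample(c(1:5, NA), 200, replace = TRUE)
)

# Each cell uses every row where both variables are present.
cor_mat <- cor(toy, use = "pairwise.complete.obs")

# First column: correlation of each variable with the reading score.
round(cor_mat[, "G2_2_READ"], 3)
```

For a single pair of variables, `use = "pairwise.complete.obs"` keeps the same rows as `use = "complete.obs"`, so each cell matches what the separate `cor()` calls would return.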
Q1.1.2. Search for the best model
I will start with a full model that predicts reading score based on child went to preschool before K (G2_2_Prtpsk), preschool read/math good for school (Premath), have child know alphabet before K (Prelitr), parent provides homework time (Prthw), and parent should read and count with child (Prerc).
sub_q1 <- sub_g2read[!is.na(sub_g2read$G2_2_Prtpsk),]
sub_q1 <- sub_q1[!is.na(sub_q1$Premath),]
sub_q1 <- sub_q1[!is.na(sub_q1$Prelitr),]
sub_q1 <- sub_q1[!is.na(sub_q1$Prthw),]
sub_q1 <- sub_q1[!is.na(sub_q1$Prerc),]
sub_q1 <- sub_q1[!is.na(sub_q1$G2_2_READ),]
m_full <- lm(G2_2_READ ~ G2_2_Prtpsk + Premath + Prelitr + Prthw + Prerc, data = sub_q1)
summary(m_full)
##
## Call:
## lm(formula = G2_2_READ ~ G2_2_Prtpsk + Premath + Prelitr + Prthw +
## Prerc, data = sub_q1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.35048 -0.35064 0.04447 0.40997 1.78065
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.018191 0.101270 19.929 < 2e-16 ***
## G2_2_Prtpsk 0.137998 0.019098 7.226 5.54e-13 ***
## Premath -0.031712 0.009440 -3.359 0.000786 ***
## Prelitr 0.002319 0.009812 0.236 0.813161
## Prthw -0.043327 0.009802 -4.420 1.00e-05 ***
## Prerc 0.028466 0.019591 1.453 0.146272
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6277 on 6639 degrees of freedom
## Multiple R-squared: 0.01746, Adjusted R-squared: 0.01672
## F-statistic: 23.6 on 5 and 6639 DF, p-value: < 2.2e-16
Using backward selection with the p-value as the selection criterion, determine the best model. Drop the variables with the highest p-values, Prelitr and Prerc, and re-fit the model.
m_backward <- lm(G2_2_READ ~ G2_2_Prtpsk + Premath + Prthw, data = sub_q1)
summary(m_backward)
##
## Call:
## lm(formula = G2_2_READ ~ G2_2_Prtpsk + Premath + Prthw, data = sub_q1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.35153 -0.35053 0.04417 0.41251 1.78889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.143248 0.055338 38.730 < 2e-16 ***
## G2_2_Prtpsk 0.140231 0.019032 7.368 1.94e-13 ***
## Premath -0.030383 0.008587 -3.538 0.000405 ***
## Prthw -0.040514 0.009435 -4.294 1.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6277 on 6641 degrees of freedom
## Multiple R-squared: 0.01714, Adjusted R-squared: 0.01669
## F-statistic: 38.6 on 3 and 6641 DF, p-value: < 2.2e-16
After dropping Prelitr and Prerc, the coefficients and significance of the remaining variables changed very little, suggesting that the dropped variables were not collinear with the others.
Linear model:
$$\widehat{Score}_{reading} = 2.143248 + 0.140231\times \mathrm{G2\_2\_Prtpsk} - 0.030383\times \mathrm{Premath} - 0.040514\times \mathrm{Prthw} $$
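A common way to run backward elimination by p-value is to remove one variable at a time and refit after each removal, because p-values can shift once a predictor leaves the model. The sketch below shows that loop on simulated data; the helper `backward_p` and the data are hypothetical, and the helper assumes numeric predictors so that coefficient names match term names.

```r
# Hypothetical helper: drop the least significant predictor, refit, repeat.
backward_p <- function(model, alpha = 0.05) {
  repeat {
    coefs <- summary(model)$coefficients
    p_vals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)", drop = FALSE]
    if (nrow(p_vals) == 0 || max(p_vals) <= alpha) break
    worst <- rownames(p_vals)[which.max(p_vals)]
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
  model
}

set.seed(42)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 0.8 * toy$x1 + rnorm(n)   # only x1 is related to y

m_start <- lm(y ~ x1 + x2 + x3, data = toy)
m_final <- backward_p(m_start)
formula(m_final)
```

The loop stops as soon as every remaining predictor has a p-value at or below `alpha`, so the returned model contains only terms that pass the chosen threshold.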
Q1.1.3. Verify that the conditions for this model are reasonable using diagnostic plots.
Q1.1.3.1 Normal probability plot.
qqnorm(m_backward$residuals)
qqline(m_backward$residuals)
The residuals of the model are nearly normal as shown in the QQ plot. While there are a few observations that deviate noticeably from the line, they are not particularly extreme.
Q1.1.3.2 Absolute values of residuals against fitted values ($\hat{y_i}$).
# fitted values come from the model itself; absolute residuals match the axis label
fitted_backward <- fitted(m_backward)
plot(round(fitted_backward,1), abs(m_backward$residuals), ylab="Absolute value of residuals", xlab="Fitted values")
The plot shows that the variance of the residuals is approximately constant.
A plot of residuals in order of data collection is not applicable to this data set because the second grade scores were collected at the same time.
Q1.1.3.3 Residuals against each predictor variable.
boxplot(m_backward$residuals~G2_2_Prtpsk,data=sub_q1,ylab="Residuals", main="G2_2_Prtpsk")
plot(m_backward$residuals~Premath,data=sub_q1,ylab="Residuals", main="Premath")
plot(m_backward$residuals~Prthw,data=sub_q1,ylab="Residuals", main="Prthw")
The plots show that the variance of the residuals is approximately constant, apart from some deviations at the low end of the predictors.
Based on the multiple linear model, attending preschool before K (G2_2_Prtpsk) is associated with a higher reading score, while Premath and Prthw have small negative coefficients, so their interpretation depends on how those survey items are coded. Overall the relationship is weak (Multiple R-squared: 0.01714).
Q1.2. Does parent involvement in school activities influence children's early academic outcomes?
Schools encourage parent involvement by inviting parents to participate in activities at school, such as open houses, general school meetings, regularly scheduled parent-teacher meetings, and volunteering in the classroom, and by facilitating parent-teacher communication. In this project, I will study the contemporaneous association between parent involvement in school activities and reading achievement.
Q1.2.1. Search for the best model
Full model: predicts reading score based on parent volunteering at school (G2_2_Prthlp), parent attending school conferences (G2_2_Prtconf), parent attending school open house (G2_2_Prtoph), and parent attending school art/music events (G2_2_Prtevt).
sub_q1 <- sub_g2read[!is.na(sub_g2read$G2_2_Prthlp),]
sub_q1 <- sub_q1[!is.na(sub_q1$G2_2_Prtconf),]
sub_q1 <- sub_q1[!is.na(sub_q1$G2_2_Prtoph),]
sub_q1 <- sub_q1[!is.na(sub_q1$G2_2_Prtevt),]
sub_q1 <- sub_q1[!is.na(sub_q1$G2_2_READ),]
m_full <- lm(G2_2_READ ~ G2_2_Prthlp + G2_2_Prtconf + G2_2_Prtoph + G2_2_Prtevt, data = sub_q1)
summary(m_full)
##
## Call:
## lm(formula = G2_2_READ ~ G2_2_Prthlp + G2_2_Prtconf + G2_2_Prtoph +
## G2_2_Prtevt, data = sub_q1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.51444 -0.35236 0.03851 0.41254 1.72053
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.526989 0.028077 54.386 < 2e-16 ***
## G2_2_Prthlp 0.082554 0.006725 12.276 < 2e-16 ***
## G2_2_Prtconf 0.051295 0.006656 7.706 1.40e-14 ***
## G2_2_Prtoph 0.031835 0.006369 4.998 5.87e-07 ***
## G2_2_Prtevt 0.042076 0.005493 7.660 2.00e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6171 on 12103 degrees of freedom
## Multiple R-squared: 0.06441, Adjusted R-squared: 0.0641
## F-statistic: 208.3 on 4 and 12103 DF, p-value: < 2.2e-16
All variables are significant and there is no need to do backward-selection.
Linear model:
$$\widehat{Score}_{reading} = 1.526989 + 0.082554\times \mathrm{G2\_2\_Prthlp} + 0.051295\times \mathrm{G2\_2\_Prtconf} + 0.031835\times \mathrm{G2\_2\_Prtoph} + 0.042076\times \mathrm{G2\_2\_Prtevt}$$
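As a worked example of reading the fitted equation, the sketch below plugs in one hypothetical parent who scores 1 on every involvement item and another who scores 3 on every item. The coefficients are copied from the summary output above; the item scores are assumptions, since the coding of each item is not restated here, and `predict_read` is a hypothetical helper.

```r
# Coefficients copied from summary(m_full) above.
b <- c(intercept = 1.526989, Prthlp = 0.082554, Prtconf = 0.051295,
       Prtoph = 0.031835, Prtevt = 0.042076)

# Hypothetical helper: predicted reading score for one set of item scores.
predict_read <- function(prthlp, prtconf, prtoph, prtevt) {
  unname(b["intercept"] + b["Prthlp"] * prthlp + b["Prtconf"] * prtconf +
           b["Prtoph"] * prtoph + b["Prtevt"] * prtevt)
}

low  <- predict_read(1, 1, 1, 1)
high <- predict_read(3, 3, 3, 3)
c(low = low, high = high, difference = high - low)
```

Raising all four items by two points moves the predicted score by twice the sum of the slopes, which gives a sense of how small the combined effect is relative to the residual standard error of 0.6171.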
Q1.2.2.Verify that the conditions for this model are reasonable using diagnostic plots.
Q1.2.2.1 Normal probability plot.
qqnorm(m_full$residuals)
qqline(m_full$residuals)
The residuals of the model are nearly normal as shown in the QQ plot. While there are a few observations that deviate noticeably from the line, they are not particularly extreme.
Q1.2.2.2 Absolute values of residuals against fitted values ($\hat{y_i}$).
# fitted values come from the model itself; absolute residuals match the axis label
fitted_full <- fitted(m_full)
plot(round(fitted_full,1), abs(m_full$residuals), ylab="Absolute value of residuals", xlab="Fitted values")
Q1.2.2.3 Residuals against each predictor variable.
plot(m_full$residuals~G2_2_Prthlp,data=sub_q1,ylab="Residuals", main="G2_2_Prthlp")
plot(m_full$residuals~G2_2_Prtconf,data=sub_q1,ylab="Residuals", main="G2_2_Prtconf")
plot(m_full$residuals~G2_2_Prtoph,data=sub_q1,ylab="Residuals", main="G2_2_Prtoph")
plot(m_full$residuals~G2_2_Prtevt,data=sub_q1,ylab="Residuals", main="G2_2_Prtevt")
The plots show that the variance of the residuals is approximately constant, with some deviations at high values of G2_2_Prthlp and at low values of G2_2_Prtconf and G2_2_Prtoph.
Based on the multiple linear model, students whose parents actively volunteer at school and attend school conferences, open houses, and art/music events are predicted to have higher reading scores. Overall, though, the relationship is weak (Multiple R-squared: 0.06441).
Q1.3. Does parent involvement in summer learning influence children's early academic outcomes?
Schacter and Jo (2005) demonstrated that summer learning improves the achievement of economically disadvantaged first graders.
Q1.3.1. Search for the best model
Full model: predicts reading score based on child attending summer school (Sumsch), summer school math (Summth), doing writing activities with the child in summer (Sumwrt), summer school reading (Sumrd), and how long the parent reads to the child in summer (Sumreadtime).
sub_q1 <- sub_g2read[!is.na(sub_g2read$Sumsch),]
sub_q1 <- sub_q1[!is.na(sub_q1$Summth),]
sub_q1 <- sub_q1[!is.na(sub_q1$Sumwrt),]
sub_q1 <- sub_q1[!is.na(sub_q1$Sumrd),]
sub_q1 <- sub_q1[!is.na(sub_q1$Sumreadtime),]
sub_q1 <- sub_q1[!is.na(sub_q1$G2_2_READ),]
m_full <- lm(G2_2_READ ~ Sumsch + Summth + Sumwrt + Sumrd +Sumreadtime, data = sub_q1)
summary(m_full)
##
## Call:
## lm(formula = G2_2_READ ~ Sumsch + Summth + Sumwrt + Sumrd + Sumreadtime,
## data = sub_q1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.25695 -0.36256 0.03253 0.42795 1.63264
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.97223 0.14247 13.843 < 2e-16 ***
## Sumsch -0.01305 0.02718 -0.480 0.63123
## Summth -0.02394 0.01583 -1.512 0.13061
## Sumwrt -0.04695 0.01483 -3.166 0.00156 **
## Sumrd 0.13134 0.01485 8.846 < 2e-16 ***
## Sumreadtime 0.02392 0.01444 1.657 0.09767 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6468 on 3554 degrees of freedom
## Multiple R-squared: 0.024, Adjusted R-squared: 0.02262
## F-statistic: 17.48 on 5 and 3554 DF, p-value: < 2.2e-16
Using backward selection with the p-value as the selection criterion, determine the best model. Drop the variables with the highest p-values, Sumsch, Summth, and Sumreadtime, and re-fit the model.
m_backward <- lm(G2_2_READ ~ Sumwrt + Sumrd, data = sub_q1)
summary(m_backward)
##
## Call:
## lm(formula = G2_2_READ ~ Sumwrt + Sumrd, data = sub_q1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.26199 -0.35991 0.03373 0.42663 1.63298
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.91269 0.05125 37.324 < 2e-16 ***
## Sumwrt -0.05484 0.01325 -4.140 3.55e-05 ***
## Sumrd 0.13111 0.01466 8.941 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6469 on 3557 degrees of freedom
## Multiple R-squared: 0.02267, Adjusted R-squared: 0.02212
## F-statistic: 41.26 on 2 and 3557 DF, p-value: < 2.2e-16
After dropping Sumsch, Summth, and Sumreadtime, the coefficient and significance of Sumrd were essentially unchanged, while the p-value of Sumwrt became much smaller (from 0.00156 to 3.55e-05), suggesting that the dropped variables were collinear with Sumwrt.
linear model:
$$\widehat{Score}_{reading} = 1.91269 -0.05484\times Sumwrt +0.13111\times Sumrd $$
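Collinearity can be checked more directly with variance inflation factors than by comparing coefficients before and after dropping variables. A minimal base-R sketch on simulated data follows; the helper `vif_manual` and the data frame are hypothetical, and reading values "far above 1" as a warning sign is a convention rather than a result from this analysis.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the remaining predictors.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

set.seed(7)
n <- 1000
toy <- data.frame(a = rnorm(n))
toy$b <- 0.9 * toy$a + rnorm(n, sd = 0.3)   # built to be collinear with a
toy$c <- rnorm(n)                            # independent of a and b

vifs <- vif_manual(toy, c("a", "b", "c"))
round(vifs, 2)   # values far above 1 flag collinear predictors
```

Applied to the summer learning predictors, the same helper would take `sub_q1` and the five variable names from the full model.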
Q1.3.3.Verify that the conditions for this model are reasonable using diagnostic plots.
Q1.3.3.1 Normal probability plot.
qqnorm(m_backward$residuals)
qqline(m_backward$residuals)
The residuals of the model are nearly normal as shown in the QQ plot. While there are a few observations that deviate noticeably from the line, they are not particularly extreme.
Q1.3.3.2 Absolute values of residuals against fitted values ($\hat{y_i}$).
# fitted values come from the model itself; absolute residuals match the axis label
fitted_backward <- fitted(m_backward)
plot(round(fitted_backward,1), abs(m_backward$residuals), ylab="Absolute value of residuals", xlab="Fitted values")
The plot shows that the variance of the residuals is approximately constant.
Q1.3.3.3 Residuals against each predictor variable.
plot(m_backward$residuals~Sumwrt,data=sub_q1,ylab="Residuals", main="Sumwrt")
plot(m_backward$residuals~Sumrd,data=sub_q1,ylab="Residuals", main="Sumrd")
The plot shows that the variance of the residuals is approximately constant.
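Based on the multiple linear model, summer reading (Sumrd) has a positive coefficient and summer writing activity (Sumwrt) a negative one, so how each relates to a higher reading score depends on how these items are coded. Overall the relationship is weak (Multiple R-squared: 0.02267).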
Q2. Does a school rule that students entering kindergarten must turn 5 before September 1st affect children's academic performance in early childhood?
# To investigate the correlation between age and reading skill, the reading score data from the 1st semester (K1), 2nd semester (K2), 4th semester (grade 1_2), and 6th semester (grade 2_2), and the data on the September cutoff, are cleaned further by filtering out cases with "NA".
# remove rows with NA in reading score
earlychildhood1 <- earlychildhood[!is.na(earlychildhood$K1_READ),]
earlychildhood2 <- earlychildhood[!is.na(earlychildhood$K2_READ),]
earlychildhood3 <- earlychildhood[!is.na(earlychildhood$G1_2_READ),]
earlychildhood4 <- earlychildhood[!is.na(earlychildhood$G2_2_READ),]
# remove rows with NA in September 1st cutoff
earlychildhood5 <- earlychildhood1[!is.na(earlychildhood1$G2_2_Sep1Cut_t),]
earlychildhood6 <- earlychildhood2[!is.na(earlychildhood2$G2_2_Sep1Cut_t),]
earlychildhood7 <- earlychildhood3[!is.na(earlychildhood3$G2_2_Sep1Cut_t),]
earlychildhood8 <- earlychildhood4[!is.na(earlychildhood4$G2_2_Sep1Cut_t),]
# remove rows with NA in PrtEDU
earlychildhood9 <- filter(earlychildhood1, PrtEDU != "NA")
earlychildhood10 <- filter(earlychildhood2, PrtEDU != "NA")
earlychildhood11 <- filter(earlychildhood3, PrtEDU != "NA")
earlychildhood12 <- filter(earlychildhood4, PrtEDU != "NA")
# statistic for explanatory variables
summary(earlychildhood5$G2_2_Sep1Cut)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3836 1.0000 1.0000
summary(earlychildhood6$G2_2_Sep1Cut)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3825 1.0000 1.0000
summary(earlychildhood7$G2_2_Sep1Cut)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3821 1.0000 1.0000
summary(earlychildhood8$G2_2_Sep1Cut)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3812 1.0000 1.0000
summary(earlychildhood9$PrtEDU)
## Length Class Mode
## 14509 character character
summary(earlychildhood10$PrtEDU)
## Length Class Mode
## 14389 character character
summary(earlychildhood11$PrtEDU)
## Length Class Mode
## 12580 character character
summary(earlychildhood12$PrtEDU)
## Length Class Mode
## 11529 character character
# statistic for response variable
summary(earlychildhood2$K2_READ)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.0000 0.0005 0.4950 0.4473 0.9352 2.9780
summary(earlychildhood2$K2_Math)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -5.86600 0.02135 0.52520 0.44730 0.95060 2.88200 43
summary(earlychildhood2$K2_SCI)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -2.40900 -0.56000 0.11450 -0.00694 0.65920 1.89000 249
summary(earlychildhood2$K2_DCCSTOT)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 15.00 15.00 15.15 17.00 18.00 43
summary(earlychildhood1$K1_READ)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.0850 -1.1360 -0.5695 -0.5422 -0.0060 2.9780
summary(earlychildhood1$K1_Math)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -5.8600 -1.0430 -0.3897 -0.4910 0.1369 5.3200 110
summary(earlychildhood1$K1_DCCSTOT)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 14.00 15.00 14.23 16.00 18.00 120
# example for summary of 1st semester reading score
# subset 2nd semester reading score and September 1st cutoff for statistical analysis
k2read <- data.frame("CHILDID"=earlychildhood5$CHILDID,"K2_READ"=earlychildhood5$K2_READ,"G2_2_Sep1Cut_t"=earlychildhood5$G2_2_Sep1Cut_t)
k2read <- k2read %>% mutate_if(is.factor, as.character)
# summary statistics for the 2nd semester reading score by cutoff group
stat <- k2read %>% group_by(G2_2_Sep1Cut_t) %>% summarise(mean=mean(K2_READ,na.rm=T), sd=sd(K2_READ,na.rm=T), median=median(K2_READ,na.rm=T), min=min(K2_READ,na.rm=T), max=max(K2_READ,na.rm=T))
ggplot(data=subset(k2read,!is.na(k2read$K2_READ)),aes(factor(G2_2_Sep1Cut_t),K2_READ))+
geom_boxplot()+
xlab("September 1st Cut Off for Kindergarten") +
ylab("2011 Spring Kindergarten Reading Score")
Sep1cutyes <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == "yes"),]
Sep1cutno <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == 'no'),]
t.test(Sep1cutyes$G2_2_READ,Sep1cutyes$G2_2_READ)
##
## Welch Two Sample t-test
##
## data: Sep1cutyes$G2_2_READ and Sep1cutyes$G2_2_READ
## t = 0, df = 15414, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.02029484 0.02029484
## sample estimates:
## mean of x mean of y
## 2.168926 2.168926
Sep1cutyes <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == "yes"),]
Sep1cutno <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == 'no'),]
t.test(Sep1cutyes$G2_2_Math,Sep1cutyes$G2_2_Math)
##
## Welch Two Sample t-test
##
## data: Sep1cutyes$G2_2_Math and Sep1cutyes$G2_2_Math
## t = 0, df = 15408, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.02602507 0.02602507
## sample estimates:
## mean of x mean of y
## 2.425733 2.425733
Sep1cutyes <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == "yes"),]
Sep1cutno <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == 'no'),]
t.test(Sep1cutyes$G2_2_SCI,Sep1cutyes$G2_2_SCI)
##
## Welch Two Sample t-test
##
## data: Sep1cutyes$G2_2_SCI and Sep1cutyes$G2_2_SCI
## t = 0, df = 15400, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.03009313 0.03009313
## sample estimates:
## mean of x mean of y
## 1.569671 1.569671
Sep1cutyes <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == "yes"),]
Sep1cutno <- earlychildhood[which(earlychildhood$G2_2_Sep1Cut_t == 'no'),]
t.test(Sep1cutyes$G2_2_DCCSSCR,Sep1cutyes$G2_2_DCCSSCR)
##
## Welch Two Sample t-test
##
## data: Sep1cutyes$G2_2_DCCSSCR and Sep1cutyes$G2_2_DCCSSCR
## t = 0, df = 15342, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04300482 0.04300482
## sample estimates:
## mean of x mean of y
## 6.666743 6.666743
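Each of the four t-tests above passes Sep1cutyes as both samples, so t = 0 and p-value = 1 follow by construction and say nothing about the cutoff rule. Comparing children from schools with and without the September 1st cutoff rule requires testing Sep1cutyes against Sep1cutno for the reading, math, science, and DCCSSCR scores; these results do not yet show whether the two groups differ.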
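A two-sample comparison needs a different group in each argument of `t.test()`; passing the same vector twice always returns t = 0 and p-value = 1. The sketch below shows both calls on simulated scores. The data frame `toy`, its group sizes, and its means are hypothetical, not the ECLS values.

```r
set.seed(2017)
toy <- data.frame(
  cutoff = rep(c("yes", "no"), times = c(400, 600)),
  score  = c(rnorm(400, mean = 2.2, sd = 0.6), rnorm(600, mean = 2.1, sd = 0.6))
)

cut_yes <- toy$score[toy$cutoff == "yes"]
cut_no  <- toy$score[toy$cutoff == "no"]

# Same vector twice: the difference in means is exactly zero by construction.
same_group <- t.test(cut_yes, cut_yes)

# Two different groups: this is the comparison of interest.
two_groups <- t.test(cut_yes, cut_no)

c(same_p = same_group$p.value, two_group_p = two_groups$p.value)
```

The formula interface, `t.test(score ~ cutoff, data = toy)`, runs the same two-group Welch test and makes it harder to pass one group twice.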
Q3. Does parents' education level have effects on children's academic performance in the early childhood?
# subset 6th semester reading score and parents' education level for statistical analysis
g2edu <- data.frame("CHILDID"=earlychildhood12$CHILDID,"G2_2_READ"=earlychildhood12$G2_2_READ,"PrtEDU"=earlychildhood12$PrtEDU)
g2edu %>% mutate_if(is.factor, as.character) -> g2edu
ggplot(data=subset(g2edu,!is.na(g2edu$G2_2_READ)),aes(factor(PrtEDU),G2_2_READ))+
geom_boxplot(aes(fill=factor(PrtEDU)))+
xlab("Parent Education Level")+
ylab("2013 Spring 2nd Grade Reading Score")
# subset 6th semester academic scores and parents' education level for statistical analysis
g2edu <- data.frame("CHILDID"=earlychildhood12$CHILDID,"G2_2_READ"=earlychildhood12$G2_2_READ, "G2_2_Math"=earlychildhood12$G2_2_Math, "G2_2_SCI"=earlychildhood12$G2_2_SCI,"G2_2_DCCSSCR"=earlychildhood12$G2_2_DCCSSCR,"PrtEDU"=earlychildhood12$PrtEDU)
g2edu %>% mutate_if(is.factor, as.character) -> g2edu
g2edu$PrtEDU <- str_replace_all(g2edu$PrtEDU,"2|3|4|5|6|7","above high school")
g2edu$PrtEDU <- str_replace_all(g2edu$PrtEDU,"1","below high school")
ggplot(data=subset(g2edu,!is.na(g2edu$G2_2_READ)),aes(factor(PrtEDU),G2_2_READ))+
geom_boxplot(aes(fill=factor(PrtEDU)))+
xlab("Parent Education Level")+
ylab("2013 Spring 2nd Grade Reading Score")+
labs(title = "Reading")
PrtblwHisch <- g2edu[which(g2edu$PrtEDU == "below high school"),]
PrtabvHisch <- g2edu[which(g2edu$PrtEDU == 'above high school'),]
t.test(PrtblwHisch$G2_2_READ,PrtabvHisch$G2_2_READ)
##
## Welch Two Sample t-test
##
## data: PrtblwHisch$G2_2_READ and PrtabvHisch$G2_2_READ
## t = -6.8754, df = 1063.7, p-value = 1.052e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1972397 -0.1096544
## sample estimates:
## mean of x mean of y
## 2.052743 2.206190
ggplot(data=subset(g2edu,!is.na(g2edu$G2_2_Math)),aes(factor(PrtEDU),G2_2_Math))+
geom_boxplot(aes(fill=factor(PrtEDU)))+
xlab("Parent Education Level")+
ylab("2013 Spring 2nd Grade Math Score")+
labs(title = "Math")
PrtblwHisch <- g2edu[which(g2edu$PrtEDU == "below high school"),]
PrtabvHisch <- g2edu[which(g2edu$PrtEDU == 'above high school'),]
t.test(PrtabvHisch$G2_2_Math,PrtblwHisch$G2_2_Math)
##
## Welch Two Sample t-test
##
## data: PrtabvHisch$G2_2_Math and PrtblwHisch$G2_2_Math
## t = 5.4003, df = 1076.1, p-value = 8.184e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.0950871 0.2036204
## sample estimates:
## mean of x mean of y
## 2.464817 2.315463
ggplot(data=subset(g2edu,!is.na(g2edu$G2_2_SCI)),aes(factor(PrtEDU),G2_2_SCI))+
geom_boxplot(aes(fill=factor(PrtEDU)))+
xlab("Parent Education Level")+
ylab("2013 Spring 2nd Grade Science Score")+
labs(title = "Science")
PrtblwHisch <- g2edu[which(g2edu$PrtEDU == "below high school"),]
PrtabvHisch <- g2edu[which(g2edu$PrtEDU == 'above high school'),]
t.test(PrtblwHisch$G2_2_SCI,PrtabvHisch$G2_2_SCI)
##
## Welch Two Sample t-test
##
## data: PrtblwHisch$G2_2_SCI and PrtabvHisch$G2_2_SCI
## t = -8.8582, df = 1029, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3900706 -0.2485934
## sample estimates:
## mean of x mean of y
## 1.304297 1.623629
ggplot(data=subset(g2edu,!is.na(g2edu$G2_2_DCCSSCR)),aes(factor(PrtEDU),G2_2_DCCSSCR))+
geom_boxplot(aes(fill=factor(PrtEDU)))+
xlab("Parent Education Level")+
ylab("2013 Spring 2nd Grade Dimensional Change Card Sort Score")+
labs(title = "Dimensional Change Card Sort")
PrtblwHisch <- g2edu[which(g2edu$PrtEDU == "below high school"),]
PrtabvHisch <- g2edu[which(g2edu$PrtEDU == 'above high school'),]
t.test(PrtblwHisch$G2_2_DCCSSCR,PrtabvHisch$G2_2_DCCSSCR)
##
## Welch Two Sample t-test
##
## data: PrtblwHisch$G2_2_DCCSSCR and PrtabvHisch$G2_2_DCCSSCR
## t = -1.5261, df = 1063.5, p-value = 0.1273
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.16343219 0.02043077
## sample estimates:
## mean of x mean of y
## 6.638675 6.710176
There is a significant difference in reading, math, and science scores between students whose parents' highest education level is below high school and those whose parents' highest education level is above high school. Interestingly, there is no significant difference in Dimensional Change Card Sort scores between these two groups.
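The size of the reading gap can be read directly from the Welch output for the reading score. The sketch below only rearranges numbers already printed there (group means, t statistic, confidence limits); no new data are involved.

```r
# Values copied from the reading-score t-test output above.
mean_below <- 2.052743   # parents below high school
mean_above <- 2.206190   # parents above high school
t_stat     <- -6.8754
ci         <- c(-0.1972397, -0.1096544)

diff_means <- mean_below - mean_above   # estimated difference in means
std_error  <- diff_means / t_stat       # since t = difference / standard error
ci_center  <- mean(ci)                  # the interval is centred on the difference

c(diff_means = diff_means, std_error = std_error, ci_center = ci_center)
```

The difference of about 0.15 points is small relative to the residual standard errors of roughly 0.62 to 0.65 reported for the reading-score models, so the gap is statistically clear but modest in size.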
1. Correlations between the reading score and parent involvement in preschool literacy, school activities, and summer learning are weak. The parent involvement variables are therefore weak predictors of early childhood academic outcomes (reading score).
2. The t-tests for the September 1st cutoff rule compared the "yes" group with itself, so the analysis as run does not show whether the rule influences students' academic outcomes.
3. When looking at the influence of parents' highest education level on academic outcomes, there was a clear difference in reading, math, and science scores between students whose parents finished high school and those whose parents did not. The Dimensional Change Card Sort scores of these two categories, however, did not differ significantly. As math and science are both strongly and positively associated with the reading score, reading, math, and science are more literacy-related. The Dimensional Change Card Sort has a weak correlation with the reading score and may largely reflect IQ. These observations suggest that parents' education level may have a positive influence on literacy but not on IQ. Literacy intervention in early childhood may improve the achievement of students who are not performing well in reading, math, and science but have no difficulty solving problems like the Dimensional Change Card Sort.
1. Early Childhood Longitudinal Study (ECLS) program, collected by the National Center for Educational Statistics (NCES) (http://nces.ed.gov//)
2. ECLS-K:2011 Kindergarten User's Manual, Public Version PDF File. (https://nces.ed.gov/ecls/dataproducts.asp)
3. ECLS-K:2011 Kindergarten-Second Grade User's Manual, Public Version PDF File. (https://nces.ed.gov/ecls/dataproducts.asp)
4. Miedel WT, Reynolds AJ. Parent involvement in early intervention for disadvantaged children: Does it matter? Journal of School Psychology. 1999; 37: 379-402.
5. Schacter J, Jo B. Learning when school is not in session: a reading summer day-camp intervention to improve the achievement of exiting First-Grade students who are economically disadvantaged. Journal of Research in Reading. 2005; 28(2): 158-169.