You must follow the instructions below to get credit for this assignment.
The Early Childhood Longitudinal Study (ECLS) was undertaken by the U.S. Department of Education in the late 1990s to record an extensive set of data on American schoolchildren. The subjects of the ECLS were 20,000 schoolchildren, followed from kindergarten to fifth grade. There were hundreds of variables recorded, from age and grade level to whether the children's parents took them to museums or libraries. The data was collected for use in regression analysis, to find correlations among the variables and to see whether some of them turned out to be merely confounding variables.
Regression analysis is a way to sort through enormous piles of data by holding every variable in the set constant except for the two variables being examined for a correlation. It is used this way because economists often cannot manipulate variables the way physicists or biologists do when they set up experiments with samples. The data an economist works with does not come from a randomized experiment, and regression analysis offers a way to find related variables more easily and effectively.
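To make "holding the other variables constant" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic (it is not ECLS data), the variable names income, books, and score are hypothetical, and the small ols helper is a plain least-squares routine written for this example rather than anything taken from the reading.

```python
import random

def ols(rows, y):
    """Least-squares fit of y on an intercept plus the columns in rows.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    and returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data (assumption): income influences both books and score.
random.seed(0)
n = 5000
income = [random.gauss(0, 1) for _ in range(n)]
books = [inc + random.gauss(0, 1) for inc in income]
score = [1.0 * b + 2.0 * inc + random.gauss(0, 1) for b, inc in zip(books, income)]

# Regressing score on books alone mixes in the effect of income.
simple = ols([[b] for b in books], score)
# Adding income as a second variable "holds it constant".
multi = ols(list(zip(books, income)), score)

print("books slope, income ignored:      ", round(simple[1], 2))
print("books slope, income held constant:", round(multi[1], 2))
print("income slope:                     ", round(multi[2], 2))
```

In this made-up data the true effect of books on score is 1.0. The first fit overstates it (the slope comes out close to 2.0) because families with more income also have more books, while the second fit recovers a slope close to 1.0 once income is included.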
Regression analysis is a limited tool. There are questions that simply cannot be answered with it, so you have to understand exactly what you are asking in order to get a meaningful answer. The reading's example was that regression analysis cannot prove that having a lot of books in your home will cause your child to do well in school, but it can show whether children in homes with lots of books did better in school than children in homes with no books. Regression analysis can show that variables are positively or negatively correlated, but it cannot prove that one variable directly causes another. So if Y increases when X increases, and decreases when X decreases, the analysis can show the correlation between the variables, but it does not prove that an increase in X causes an increase in Y.
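The books example can be sketched in Python to show how two variables can be correlated without one causing the other. This is only an illustration under stated assumptions: the data is synthetic, the names parent_edu, books, and score are hypothetical, and the setup deliberately makes a third variable drive both books and score while books has no effect on score at all.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Synthetic data (assumption): parent_edu drives both books and score.
# Note that books never appears in the formula for score.
random.seed(1)
n = 5000
parent_edu = [random.gauss(0, 1) for _ in range(n)]
books = [z + random.gauss(0, 1) for z in parent_edu]
score = [z + random.gauss(0, 1) for z in parent_edu]

# Correlation is symmetric, so by itself it cannot say which way a cause runs.
r_xy = pearson(books, score)
r_yx = pearson(score, books)

# Look only at families with nearly identical parent_edu (a crude way of
# holding it constant): the link between books and score mostly disappears.
band = [(b, s) for z, b, s in zip(parent_edu, books, score) if abs(z) < 0.1]
r_band = pearson([b for b, _ in band], [s for _, s in band])

print("corr(books, score):", round(r_xy, 2))
print("corr(score, books):", round(r_yx, 2))
print("corr within a narrow parent_edu band:", round(r_band, 2))
```

In this setup books and score are clearly correlated (around 0.5) even though neither affects the other, and the correlation shrinks toward zero once parent_edu is held roughly fixed. The correlation is real, but it says nothing about what causes what.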
Higher quality schools correlate with better student performance. The ECLS has shown that black students are more likely than white students to go to lower quality schools, because school quality is correlated with other variables. Once the variables that widened the gap between black and white students were controlled for, the data showed that the two groups entered kindergarten with the same abilities. However, after kindergarten the gap steadily widened, with white students scoring higher than black students of the same age. The reading also states that all students who attend low quality schools performed worse than those in higher quality schools, and that black students who go to higher quality schools outperform white students who attend a low quality school.
To control for school quality using the ECLS data set, I would use the income of the student's parents as the second variable set against quality, and hold every other variable constant across students. I think parents' income would correlate strongly with school quality, because higher-income parents can pay for private institutions and for schools with better access to supplies and funding. Parents with lower incomes would have less choice about which school to send their children to, so many of those children end up at the nearest and cheapest school, with very few luxuries.
The main thing I took from the reading was how useful regression analysis can be, limited as it is. Regression analysis seems like it would be an overly complicated tool, but the fact that you can use RStudio to manipulate and control your data set and find the correlations between your variables makes it manageable. I also learned how important it is to look at variables that correlate negatively in some situations to see what the underlying problems might be, e.g., the majority of black students performing lower than their white counterparts.