The STAR (Student-Teacher Achievement Ratio) Project is a four-year longitudinal study examining the effect of class size in early grade levels on educational performance and personal development. A longitudinal study is one in which the same participants are followed over time. This particular study lasted from 1985 to 1989 and involved 11,601 students. During the four years of the study, students were randomly assigned to small classes, regular-sized classes, or regular-sized classes with an aide. In all, the experiment cost around $12 million. Even though the program stopped in 1989 after the first kindergarten class in the program finished third grade, the collection of various measurements (e.g., performance on tests in eighth grade, overall high-school GPA) continued through to the end of participants’ high-school attendance. We will analyze just a portion of this data to investigate whether the small class sizes improved educational performance. The data file is STAR.csv, in CSV format. The names and descriptions of the variables in this data set are displayed in table 2.6. Note that there is a fair number of missing values in this data set, which arise, for example, because some students left a STAR school before third grade or did not enter a STAR school until first grade.
Import the data
# s <- read.csv("C:/Users/poppr/Desktop/STAR.csv", header=TRUE)
Alternatively, load the data from GitHub
library(RCurl)
## Warning: package 'RCurl' was built under R version 3.3.3
## Loading required package: bitops
x <- getURL("https://raw.githubusercontent.com/kosukeimai/qss/master/CAUSALITY/STAR.csv")
STAR <- read.csv(text = x)
head(STAR)
## race classtype yearssmall hsgrad g4math g4reading
## 1 1 3 0 NA NA NA
## 2 2 3 0 NA 706 661
## 3 1 3 0 1 711 750
## 4 2 1 4 NA 672 659
## 5 1 2 0 NA NA NA
## 6 1 3 0 NA NA NA
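As a small illustration of what the import step does, the sketch below parses a CSV string with read.csv(text = ...), the same call applied to the getURL() result above, and then counts missing values per column. The three rows are made-up toy values, not rows from STAR.csv. In recent versions of R, read.csv() can usually also be given the URL directly, which would make RCurl unnecessary, but that depends on the local R build and is an assumption here.

```r
# Toy CSV text standing in for the downloaded file (hypothetical values,
# not taken from STAR.csv).
csv_text <- "race,classtype,g4math
1,3,NA
2,1,706
1,2,711"

# read.csv() parses a character string when it is passed via `text =`.
# The string "NA" is read as a missing value by default.
toy <- read.csv(text = csv_text)
str(toy)

# Count missing values per column, a quick first check after any import.
colSums(is.na(toy))
```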
Create a new factor variable called kinder in the data frame. This variable should recode classtype by changing integer values to their corresponding informative labels (e.g., change 1 to small). Similarly, recode the race variable into a factor variable with four levels (white, black, hispanic, others) by combining the Asian and Native American categories with the others category. For the race variable, overwrite the original variable in the data frame rather than creating a new one. Recall that na.rm = TRUE can be added to functions in order to remove missing data (see section 1.3.5).
Create a new factor variable “kinder”
class(STAR$classtype)
## [1] "integer"
STAR$kinder <- NA
# classtype is an integer, so compare against integers rather than strings
STAR$kinder[STAR$classtype == 1] <- "small"
STAR$kinder[STAR$classtype == 2] <- "regular"
STAR$kinder[STAR$classtype == 3] <- "regularwithaid"
table(STAR$kinder)
##
## regular regularwithaid small
## 2194 2231 1900
Convert to factor variable
STAR$kinder <- as.factor(STAR$kinder)
class(STAR$kinder)
## [1] "factor"
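The recode and the conversion to a factor can also be done in one step with factor(), using its levels and labels arguments. This is a minimal sketch on a toy vector (the values are illustrative, not the full classtype column). One difference from the approach above: as.factor() orders the levels alphabetically, whereas here the levels follow the order of the original codes.

```r
# Toy stand-in for STAR$classtype (illustrative values only).
classtype <- c(3L, 3L, 3L, 1L, 2L, 3L)

# Map codes 1, 2, 3 to their labels and create the factor in one call.
kinder <- factor(classtype,
                 levels = 1:3,
                 labels = c("small", "regular", "regularwithaid"))

levels(kinder)  # levels follow the code order, not alphabetical order
table(kinder)
```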
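Create the “race” label variable. The exercise asks us to overwrite the original variable, but overwriting makes recoding mistakes hard to detect, so the recoded version is stored in a new variable, race2, instead.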
table(STAR$race)
##
## 1 2 3 4 5 6
## 4234 2058 14 5 2 9
STAR$race2 <- NA
STAR$race2[STAR$race == 1] <- "white"
STAR$race2[STAR$race == 2] <- "black"
STAR$race2[STAR$race == 4] <- "hispanic"
STAR$race2[STAR$race %in% c(3, 5, 6)] <- "others"
class(STAR$race2)
## [1] "character"
STAR$race2 <- as.factor(STAR$race2)
class(STAR$race2)
## [1] "factor"
table(STAR$race2, exclude = NULL)
##
## black hispanic others white <NA>
## 2058 5 25 4234 3
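A lookup vector is another way to write the same recode: position i of the vector holds the label for code i, following the mapping used above (1 white, 2 black, 4 hispanic, and 3, 5, 6 combined into others). The sketch below uses a toy race vector with one missing value, and the name race_labels is introduced here for illustration only.

```r
# Toy stand-in for STAR$race, including one missing value.
race <- c(1L, 2L, 1L, 4L, 3L, 6L, NA, 5L)

# Position i holds the label for code i (hypothetical helper vector).
race_labels <- c("white", "black", "others", "hispanic", "others", "others")

# Indexing by the codes does the recode; NA codes stay NA.
race2 <- factor(race_labels[race],
                levels = c("white", "black", "hispanic", "others"))

table(race2, useNA = "ifany")
```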
How to delete a variable from a data frame
STAR$kindr <- NA
STAR$kindr <- NULL
How does performance on fourth-grade reading and math tests for those students assigned to a small class in kindergarten compare with those assigned to a regular-sized class? Do students in the smaller classes perform better? Use means to make this comparison while removing missing values. Give a brief substantive interpretation of the results. To understand the size of the estimated effects, compare them with the standard deviation of the test scores.
Mean reading scores by kindergarten class type
tapply(STAR$g4reading, STAR$kinder, mean, na.rm = TRUE)
## regular regularwithaid small
## 719.8900 720.7155 723.3912
Mean maths scores by kindergarten class type
tapply(STAR$g4math, STAR$kinder, mean, na.rm = TRUE)
## regular regularwithaid small
## 709.5214 707.6335 709.1851
The book chapter instead computes the difference in means directly:
mean(STAR$g4reading[STAR$kinder=="small"], na.rm = TRUE) -
mean(STAR$g4reading[STAR$kinder=="regular"], na.rm = TRUE)
## [1] 3.501232
mean(STAR$g4math[STAR$kinder=="small"], na.rm = TRUE) -
mean(STAR$g4math[STAR$kinder=="regular"], na.rm = TRUE)
## [1] -0.3362425
Standard deviations for reading and maths scores
sd(STAR$g4reading, na.rm = TRUE)
## [1] 52.42592
sd(STAR$g4math, na.rm = TRUE)
## [1] 43.09217
range(STAR$g4math, na.rm=TRUE)
## [1] 487 821
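To put the estimated effects on a common scale, the differences in means can be divided by the standard deviations reported above. The sketch below simply reuses the printed numbers, so it runs without the data. The small-class difference comes to roughly 0.07 standard deviations for reading and less than 0.01 (with a negative sign) for maths, both of which are small.

```r
# Differences in means (small minus regular) and standard deviations,
# copied from the output above.
reading_diff <- 3.501232
math_diff    <- -0.3362425
reading_sd   <- 52.42592
math_sd      <- 43.09217

# Effect sizes expressed in standard-deviation units.
effect_sd <- c(reading = reading_diff / reading_sd,
               math    = math_diff / math_sd)
round(effect_sd, 3)
```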
Instead of just comparing average scores of reading and math tests between those students assigned to small classes and those assigned to regular-sized classes, look at the entire range of possible scores. To do so, compare a high score, defined as the 66th percentile, and a low score (the 33rd percentile) for small classes with the corresponding score for regular classes. These are examples of quantile treatment effects. Does this analysis add anything to the analysis based on mean in the previous question?
Reading scores for students assigned to small and regular classes (quantiles)
summary(STAR$g4reading)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 528.0 696.0 723.0 721.2 750.0 836.0 3972
summary(STAR$g4reading[STAR$kinder=="small"])
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 528.0 697.0 724.0 723.4 750.0 836.0 1174
summary(STAR$g4reading[STAR$kinder=="regular"])
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 528.0 693.0 723.0 719.9 749.2 836.0 1358
Maths scores for students assigned to small and regular classes (quantiles)
summary(STAR$g4math)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 487.0 688.0 710.0 708.8 732.5 821.0 3930
summary(STAR$g4math[STAR$kinder=="small"])
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 487.0 686.8 710.0 709.2 736.2 821.0 1160
summary(STAR$g4math[STAR$kinder=="regular"])
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 487.0 688.0 710.0 709.5 731.8 821.0 1352
The exercise asks for the 33rd and 66th percentiles:
quantile(STAR$g4reading[STAR$kinder=="small"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66%
## 705 741
quantile(STAR$g4reading[STAR$kinder=="regular"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66%
## 705 740
quantile(STAR$g4math[STAR$kinder=="small"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66%
## 694 726
quantile(STAR$g4math[STAR$kinder=="regular"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66%
## 696 724
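The quantile treatment effect is the difference between the corresponding quantiles of the two groups. From the output above, the differences (small minus regular) are 0 and 1 points for reading and -2 and 2 points for maths at the 33rd and 66th percentiles. The helper below, qte(), is a hypothetical convenience function written for this note, shown on toy scores rather than the STAR data.

```r
# Hypothetical helper: difference in quantiles between two groups.
qte <- function(y, group, treated = "small", control = "regular",
                probs = c(0.33, 0.66)) {
  q_treated <- quantile(y[group == treated], probs = probs, na.rm = TRUE)
  q_control <- quantile(y[group == control], probs = probs, na.rm = TRUE)
  q_treated - q_control
}

# Toy scores: the "small" group is the "regular" group shifted up by 5,
# plus one missing value that na.rm = TRUE drops.
score <- c(10, 20, 30, 40, 50, NA, 5, 15, 25, 35, 45)
group <- c(rep("small", 6), rep("regular", 5))

toy_qte <- qte(score, group)
toy_qte
```

With the real data, the equivalent call would be qte(STAR$g4reading, STAR$kinder), assuming kinder has been created as above.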
Some students were in small classes for all four years that the STAR program ran. Others were assigned to small classes for only one year and had either regular-sized classes or regular-sized classes with an aide for the rest. How many students of each type are in the data set? Create a contingency table of proportions using the kinder and yearssmall variables. Does participation in more years of small classes make a greater difference in test scores? Compare the average and median reading and math test scores across students who spent different numbers of years in small classes.
Contingency table of proportions by years spent in small classes
summary(STAR$yearssmall)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.9542 2.0000 4.0000
table(STAR$yearssmall)
##
## 0 1 2 3 4
## 3957 768 390 353 857
# or as proportions
prop.table(table(STAR$yearssmall))
##
## 0 1 2 3 4
## 0.62561265 0.12142292 0.06166008 0.05581028 0.13549407
prop.table(table(STAR$kinder, STAR$yearssmall))
##
## 0 1 2 3
## regular 0.310039526 0.015019763 0.009169960 0.012648221
## regularwithaid 0.315573123 0.015335968 0.009486166 0.012332016
## small 0.000000000 0.091067194 0.043003953 0.030830040
##
## 4
## regular 0.000000000
## regularwithaid 0.000000000
## small 0.135494071
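Make the table nicer by naming the variables. The second argument, 2, makes prop.table() compute proportions within each column (each number of years), so every column sums to 1.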
prop.table(table("Class Type" = STAR$kinder,
"Number of Years in small classes" = STAR$yearssmall), 2)
## Number of Years in small classes
## Class Type 0 1 2 3 4
## regular 0.4955775 0.1236979 0.1487179 0.2266289 0.0000000
## regularwithaid 0.5044225 0.1263021 0.1538462 0.2209632 0.0000000
## small 0.0000000 0.7500000 0.6974359 0.5524079 1.0000000
# multiply by 100 to get percentages, then use round() to drop the decimals
round(prop.table(table("Class Type" = STAR$kinder,
"Number of Years in small classes" = STAR$yearssmall), 2) *100)
## Number of Years in small classes
## Class Type 0 1 2 3 4
## regular 50 12 15 23 0
## regularwithaid 50 13 15 22 0
## small 0 75 70 55 100
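The three versions of the table differ only in the margin argument of prop.table(). A minimal sketch on toy data (six made-up students) shows what each choice normalizes: with no margin the whole table sums to 1, with margin 1 each row does, and with margin 2 each column does.

```r
# Toy class types and years in small classes (illustrative values only).
kinder     <- c("small", "small", "small", "regular", "regular", "regularwithaid")
yearssmall <- c(4, 1, 4, 0, 1, 0)

counts <- table(kinder, yearssmall)

overall <- prop.table(counts)      # all cells sum to 1
by_row  <- prop.table(counts, 1)   # each class type (row) sums to 1
by_col  <- prop.table(counts, 2)   # each number of years (column) sums to 1

round(by_col, 2)
```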
Mean/median reading score across years in small classes
tapply(STAR$g4reading, STAR$yearssmall, mean, na.rm = TRUE)
## 0 1 2 3 4
## 719.8754 723.1471 717.8681 719.8986 724.6651
tapply(STAR$g4reading, STAR$yearssmall, median, na.rm = TRUE)
## 0 1 2 3 4
## 722.0 724.5 720.0 721.0 726.0
Mean/median maths score across years in small classes
tapply(STAR$g4math, STAR$yearssmall, mean, na.rm = TRUE)
## 0 1 2 3 4
## 707.9793 707.5524 711.9140 709.6170 710.0519
tapply(STAR$g4math, STAR$yearssmall, median, na.rm = TRUE)
## 0 1 2 3 4
## 710 709 714 712 711
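The mean and median can also be collected in one table, together with the number of non-missing scores in each group, which helps when judging how much weight the smaller groups deserve. This is a sketch on toy data. The helper summarise_scores() is a name introduced here, not part of the original analysis.

```r
# Toy data: years in small classes and reading scores (illustrative only).
toy <- data.frame(
  yearssmall = c(0, 0, 0, 1, 1, 4, 4, 4),
  g4reading  = c(700, 720, NA, 710, 730, 715, 725, 745)
)

# Hypothetical helper returning mean, median, and non-missing count.
summarise_scores <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    n      = sum(!is.na(x)))
}

# One row per group, one column per statistic.
score_summary <- t(sapply(split(toy$g4reading, toy$yearssmall), summarise_scores))
score_summary
```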
Examine whether the STAR program reduced achievement gaps across different racial groups. Begin by comparing the average reading and math test scores between white and minority students (i.e., blacks and Hispanics) among those students who were assigned to regular-sized classes with no aide. Conduct the same comparison among those students who were assigned to small classes. Give a brief substantive interpretation of the results of your analysis.
Examine the achievement gap (scores) for white and minority students in small and regular classes
tapply(STAR$g4reading, STAR$race, mean, na.rm = TRUE)
## 1 2 3 4 5 6
## 726.1261 692.6239 739.8571 737.5000 NaN 807.0000
Subset the data
white <- subset(STAR, race2=="white")
minority <- subset(STAR, race2 =="black" | race2 =="hispanic")
Mean reading score of white students in regular classes (TRUE) versus the other two class types (FALSE)
tapply(white$g4reading, white$kinder=="regular", mean, na.rm = TRUE)
## FALSE TRUE
## 726.6836 725.1158
Racial gap in reading among students in regular classes, using the approach from the book
mean(white$g4reading[white$kinder=="regular"], na.rm = TRUE) -
mean(minority$g4reading[minority$kinder=="regular"], na.rm = TRUE)
## [1] 35.76098
Racial gap in reading among students in small classes
mean(white$g4reading[white$kinder=="small"], na.rm = TRUE) -
mean(minority$g4reading[minority$kinder=="small"], na.rm = TRUE)
## [1] 28.55433
Racial gap in maths among students in regular classes
mean(white$g4math[white$kinder=="regular"], na.rm = TRUE) -
mean(minority$g4math[minority$kinder=="regular"], na.rm = TRUE)
## [1] 12.87811
Racial gap in maths among students in small classes
mean(white$g4math[white$kinder=="small"], na.rm = TRUE) -
mean(minority$g4math[minority$kinder=="small"], na.rm = TRUE)
## [1] 12.96779
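The four gap calculations above repeat the same pattern, so they can be condensed by passing two grouping variables to tapply(), which returns a matrix of group means. The sketch below uses toy data, and the group column (white versus minority) is a hypothetical variable that would first have to be built from race2.

```r
# Toy data: racial group, class type, and reading score (illustrative only).
toy <- data.frame(
  group     = c("white", "white", "minority", "minority",
                "white", "white", "minority", "minority"),
  kinder    = c("regular", "regular", "regular", "regular",
                "small", "small", "small", "small"),
  g4reading = c(730, 720, 690, NA, 728, 732, 700, 710)
)

# Rows are racial groups, columns are class types.
means <- tapply(toy$g4reading, list(toy$group, toy$kinder), mean, na.rm = TRUE)

# Gap (white minus minority) within each class type.
gap <- means["white", ] - means["minority", ]
gap
```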
Consider the long-term effects of kindergarten class size. Compare high-school graduation rates across students assigned to different class types. Also, examine whether graduation rates differ depending on the number of years spent in small classes. Finally, as in the previous question, investigate whether the STAR program has reduced the racial gap between white and minority students’ graduation rates. Briefly discuss the results.
Graduation rate by class type
tapply(STAR$hsgrad, STAR$kinder, mean, na.rm = TRUE)
## regular regularwithaid small
## 0.8251619 0.8392857 0.8359202
Graduation rate by years spent in small class
tapply(STAR$hsgrad, STAR$yearssmall, mean, na.rm = TRUE)
## 0 1 2 3 4
## 0.8286020 0.7910448 0.8131868 0.8324607 0.8775510
Racial gap in graduation rates
mean(white$hsgrad[white$kinder=="regular"], na.rm = TRUE) -
mean(minority$hsgrad[minority$kinder=="regular"], na.rm = TRUE)
## [1] 0.1173787
mean(white$hsgrad[white$kinder=="small"], na.rm = TRUE) -
mean(minority$hsgrad[minority$kinder=="small"], na.rm = TRUE)
## [1] 0.122789
Alternatively, a more compact method with tapply()
tapply(white$hsgrad, white$kinder, mean, na.rm = TRUE) -
tapply(minority$hsgrad, minority$kinder, mean, na.rm = TRUE)
## regular regularwithaid small
## 0.1173787 0.1440545 0.1227890
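To judge whether small classes narrowed the racial gaps, the gap in small classes can be compared with the gap in regular classes. The sketch below reuses the gaps printed above. The reading gap is about 7 points smaller in small classes, while the maths and graduation gaps are essentially unchanged. These are raw differences with no measure of uncertainty, so they should be read as descriptive only.

```r
# White-minus-minority gaps copied from the output above.
gaps <- rbind(
  reading = c(regular = 35.76098,  small = 28.55433),
  math    = c(regular = 12.87811,  small = 12.96779),
  hsgrad  = c(regular = 0.1173787, small = 0.1227890)
)

# Negative values mean the gap is smaller in small classes.
gap_change <- gaps[, "small"] - gaps[, "regular"]
round(gap_change, 3)
```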