CHAPTER 2. CAUSALITY

2.8.1. EFFICACY OF SMALL-CLASS SIZE IN EARLY EDUCATION

The STAR (Student-Teacher Achievement Ratio) Project is a four-year longitudinal study examining the effect of class size in early grade levels on educational performance and personal development. A longitudinal study is one in which the same participants are followed over time. This particular study lasted from 1985 to 1989 and involved 11,601 students. During the four years of the study, students were randomly assigned to small classes, regular-sized classes, or regular-sized classes with an aide. In all, the experiment cost around $12 million. Even though the program stopped in 1989, after the first kindergarten class in the program finished third grade, the collection of various measurements (e.g., performance on tests in eighth grade, overall high-school GPA) continued through the end of participants’ high-school attendance. We will analyze just a portion of these data to investigate whether small class sizes improved educational performance. The data file, STAR.csv, is in CSV format. The names and descriptions of variables in this data set are displayed in table 2.6. Note that this data set has a fair number of missing values, which arise, for example, because some students left a STAR school before third grade or did not enter a STAR school until first grade.

Import the data

# s <- read.csv("C:/Users/poppr/Desktop/STAR.csv", header=TRUE)

Alternatively, load the data from GitHub

library(RCurl)
## Loading required package: bitops
x <- getURL("https://raw.githubusercontent.com/kosukeimai/qss/master/CAUSALITY/STAR.csv")
STAR <- read.csv(text = x)
head(STAR)
##   race classtype yearssmall hsgrad g4math g4reading
## 1    1         3          0     NA     NA        NA
## 2    2         3          0     NA    706       661
## 3    1         3          0      1    711       750
## 4    2         1          4     NA    672       659
## 5    1         2          0     NA     NA        NA
## 6    1         3          0     NA     NA        NA
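As a quick check on the missing values mentioned in the introduction, the six rows printed by head(STAR) can be rebuilt by hand and the NAs counted per column. This is only a minimal sketch: the name star_head is hypothetical, and on the full data the corresponding call would be colSums(is.na(STAR)).

```r
# Rebuild the six rows shown by head(STAR); star_head is a hypothetical name
star_head <- data.frame(
  race       = c(1, 2, 1, 2, 1, 1),
  classtype  = c(3, 3, 3, 1, 2, 3),
  yearssmall = c(0, 0, 0, 4, 0, 0),
  hsgrad     = c(NA, NA, 1, NA, NA, NA),
  g4math     = c(NA, 706, 711, 672, NA, NA),
  g4reading  = c(NA, 661, 750, 659, NA, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs in each column
na_counts <- colSums(is.na(star_head))
print(na_counts)
```

Even in these six rows, the outcome variables (hsgrad, g4math, g4reading) are mostly missing, which is why the later calls all pass na.rm = TRUE.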

EXERCISE 1

Create a new factor variable called kinder in the data frame. This variable should recode classtype by changing integer values to their corresponding informative labels (e.g., change 1 to small etc.). Similarly, recode the race variable into a factor variable with four levels (white, black, hispanic, others) by combining the Asian and Native American categories with the others category. For the race variable, overwrite the original variable in the data frame rather than creating a new one. Recall that na.rm = TRUE can be added to functions in order to remove missing data (see section 1.3.5).

Create a new factor variable “kinder”

class(STAR$classtype)
## [1] "integer"
# Start from NA, then fill in a label for each integer code
STAR$kinder <- NA
STAR$kinder[STAR$classtype == 1] <- "small"
STAR$kinder[STAR$classtype == 2] <- "regular"
STAR$kinder[STAR$classtype == 3] <- "regularwithaid"
table(STAR$kinder)
## 
##        regular regularwithaid          small 
##           2194           2231           1900

Convert to factor variable

STAR$kinder <- as.factor(STAR$kinder)
class(STAR$kinder)
## [1] "factor"
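The recode and the conversion can also be done in one step with factor(), which takes the integer codes as levels and the names as labels. The sketch below uses a toy vector (classtype_toy and kinder_toy are hypothetical names). Note that the level order then follows the order given in levels, whereas as.factor() on a character vector sorts the levels alphabetically.

```r
# Toy classtype codes, matching the first six rows of the data
classtype_toy <- c(3, 3, 3, 1, 2, 3)

# Map code 1 -> "small", 2 -> "regular", 3 -> "regularwithaid"
kinder_toy <- factor(classtype_toy,
                     levels = 1:3,
                     labels = c("small", "regular", "regularwithaid"))

levels(kinder_toy)  # order follows `levels`, not the alphabet
table(kinder_toy)
```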

Create a labelled race variable. The exercise asks to overwrite race, but overwriting an original variable makes mistakes hard to undo, so the labels go into a new variable, race2, instead.

table(STAR$race)
## 
##    1    2    3    4    5    6 
## 4234 2058   14    5    2    9
STAR$race2 <- NA
STAR$race2[STAR$race == 1] <- "white"
STAR$race2[STAR$race == 2] <- "black"
STAR$race2[STAR$race == 4] <- "hispanic"
# codes 3, 5, and 6 are combined into "others"
STAR$race2[STAR$race %in% c(3, 5, 6)] <- "others"

class(STAR$race2)
## [1] "character"
STAR$race2 <- as.factor(STAR$race2)
class(STAR$race2)
## [1] "factor"
table(STAR$race2, exclude = NULL)
## 
##    black hispanic   others    white     <NA> 
##     2058        5       25     4234        3
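The three NA values are students whose race is missing in the original variable: none of the comparisons matches them, so race2 keeps its initial NA. The same recode can be written more compactly with a lookup vector indexed by the integer code. This is a sketch on toy codes, with hypothetical names race_toy, race_labels, and race2_toy, and it assumes the same grouping as the code above (3, 5, and 6 become "others").

```r
# Toy race codes, including one missing value
race_toy <- c(1, 2, 3, 4, 5, 6, NA)

# Position i holds the label for code i (3, 5, and 6 all map to "others")
race_labels <- c("white", "black", "others", "hispanic", "others", "others")

# Indexing by the code looks up the label; an NA code stays NA
race2_toy <- factor(race_labels[race_toy])
table(race2_toy, exclude = NULL)
```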

How to delete a variable from a data frame (shown here on a throwaway variable, kindr)

STAR$kindr <- NA
STAR$kindr <- NULL

EXERCISE 2

How does performance on fourth-grade reading and math tests for those students assigned to a small class in kindergarten compare with those assigned to a regular-sized class? Do students in the smaller classes perform better? Use means to make this comparison while removing missing values. Give a brief substantive interpretation of the results. To understand the size of the estimated effects, compare them with the standard deviation of the test scores.

Mean reading scores by kindergarten class type

tapply(STAR$g4reading, STAR$kinder, mean, na.rm = TRUE)
##        regular regularwithaid          small 
##       719.8900       720.7155       723.3912

Mean maths scores by kindergarten class type

tapply(STAR$g4math, STAR$kinder, mean, na.rm = TRUE)
##        regular regularwithaid          small 
##       709.5214       707.6335       709.1851

The book chapter takes this approach, computing the difference in means between small and regular classes directly:

mean(STAR$g4reading[STAR$kinder=="small"], na.rm = TRUE) -
mean(STAR$g4reading[STAR$kinder=="regular"], na.rm = TRUE)
## [1] 3.501232
mean(STAR$g4math[STAR$kinder=="small"], na.rm = TRUE) -
mean(STAR$g4math[STAR$kinder=="regular"], na.rm = TRUE)
## [1] -0.3362425

Standard deviations for reading and maths scores

sd(STAR$g4reading, na.rm = TRUE)
## [1] 52.42592
sd(STAR$g4math, na.rm = TRUE)
## [1] 43.09217
range(STAR$g4math, na.rm=TRUE)
## [1] 487 821
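To put the estimated effects on the scale the exercise asks for, each difference in means can be divided by the standard deviation of the corresponding test. The sketch below simply reuses the numbers printed above (the variable names are hypothetical); with the data loaded, the same ratios could be computed from STAR directly.

```r
# Differences in means (small minus regular), copied from the output above
diff_reading <- 3.501232
diff_math    <- -0.3362425

# Standard deviations of the fourth-grade scores, copied from the output above
sd_reading <- 52.42592
sd_math    <- 43.09217

# Effect sizes in standard-deviation units
effect_reading <- diff_reading / sd_reading
effect_math    <- diff_math / sd_math
round(c(reading = effect_reading, math = effect_math), 3)
```

Both differences are well under a tenth of a standard deviation (roughly 0.07 for reading and -0.01 for maths), so on this measure the effect of small kindergarten classes on fourth-grade scores is small.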

EXERCISE 3

Instead of just comparing average scores of reading and math tests between those students assigned to small classes and those assigned to regular-sized classes, look at the entire range of possible scores. To do so, compare a high score, defined as the 66th percentile, and a low score (the 33rd percentile) for small classes with the corresponding score for regular classes. These are examples of quantile treatment effects. Does this analysis add anything to the analysis based on mean in the previous question?

Reading scores for students assigned to small and regular classes (quantiles)

summary(STAR$g4reading)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   528.0   696.0   723.0   721.2   750.0   836.0    3972
summary(STAR$g4reading[STAR$kinder=="small"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   528.0   697.0   724.0   723.4   750.0   836.0    1174
summary(STAR$g4reading[STAR$kinder=="regular"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   528.0   693.0   723.0   719.9   749.2   836.0    1358

Maths scores for students assigned to small and regular classes (quantiles)

summary(STAR$g4math)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   487.0   688.0   710.0   708.8   732.5   821.0    3930
summary(STAR$g4math[STAR$kinder=="small"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   487.0   686.8   710.0   709.2   736.2   821.0    1160
summary(STAR$g4math[STAR$kinder=="regular"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   487.0   688.0   710.0   709.5   731.8   821.0    1352

The exercise asks for the 33rd and 66th percentiles, which summary() does not report, so use quantile()

quantile(STAR$g4reading[STAR$kinder=="small"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66% 
## 705 741
quantile(STAR$g4reading[STAR$kinder=="regular"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66% 
## 705 740
quantile(STAR$g4math[STAR$kinder=="small"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66% 
## 694 726
quantile(STAR$g4math[STAR$kinder=="regular"], probs = c(0.33, 0.66), na.rm = TRUE)
## 33% 66% 
## 696 724
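The quantile treatment effect at each percentile is the small-class quantile minus the regular-class quantile. A minimal sketch with a hypothetical helper, qte(), shown on toy scores rather than on the STAR data:

```r
# Hypothetical helper: difference in quantiles between two groups
qte <- function(y, group, treated, control, probs) {
  quantile(y[group == treated], probs = probs, na.rm = TRUE) -
    quantile(y[group == control], probs = probs, na.rm = TRUE)
}

# Toy data: each "small" score is one point above a "regular" score
score      <- c(2, 4, 6, 1, 3, 5, NA)
class_type <- c("small", "small", "small",
                "regular", "regular", "regular", "small")

toy_qte <- qte(score, class_type, "small", "regular", probs = c(0.33, 0.66))
toy_qte
```

On the STAR data the call would be qte(STAR$g4reading, STAR$kinder, "small", "regular", c(0.33, 0.66)). From the quantiles printed above, the reading differences are 0 and 1 points and the maths differences are -2 and 2 points, all small relative to the standard deviations of the scores.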

EXERCISE 4

Some students were in small classes for all four years that the STAR program ran. Others were assigned to small classes for only one year and had either regular-sized classes or regular-sized classes with an aide for the rest. How many students of each type are in the data set? Create a contingency table of proportions using the kinder and yearssmall variables. Does participation in more years of small classes make a greater difference in test scores? Compare the average and median reading and math test scores across students who spent different numbers of years in small classes.

Contingency table of proportions by years spent in small classes

summary(STAR$yearssmall)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.9542  2.0000  4.0000
table(STAR$yearssmall)
## 
##    0    1    2    3    4 
## 3957  768  390  353  857
# or as proportions
prop.table(table(STAR$yearssmall))
## 
##          0          1          2          3          4 
## 0.62561265 0.12142292 0.06166008 0.05581028 0.13549407
prop.table(table(STAR$kinder, STAR$yearssmall))
##                 
##                            0           1           2           3           4
##   regular        0.310039526 0.015019763 0.009169960 0.012648221 0.000000000
##   regularwithaid 0.315573123 0.015335968 0.009486166 0.012332016 0.000000000
##   small          0.000000000 0.091067194 0.043003953 0.030830040 0.135494071

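Make the table nicer by naming the variables. The second argument of prop.table(), 2, gives proportions within each column, so each number-of-years column sums to 1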

prop.table(table("Class Type" = STAR$kinder,
                 "Number of Years in small classes" = STAR$yearssmall), 2)
##                 Number of Years in small classes
## Class Type               0         1         2         3         4
##   regular        0.4955775 0.1236979 0.1487179 0.2266289 0.0000000
##   regularwithaid 0.5044225 0.1263021 0.1538462 0.2209632 0.0000000
##   small          0.0000000 0.7500000 0.6974359 0.5524079 1.0000000
# use the round() function to round the values
round(prop.table(table("Class Type" = STAR$kinder,
                 "Number of Years in small classes" = STAR$yearssmall), 2) *100)
##                 Number of Years in small classes
## Class Type         0   1   2   3   4
##   regular         50  12  15  23   0
##   regularwithaid  50  13  15  22   0
##   small            0  75  70  55 100
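The two proportion tables above differ only in the margin argument of prop.table(): with no margin, each cell is divided by the grand total; with margin = 2, each cell is divided by its column total; with margin = 1, by its row total. A toy illustration (the counts are made up for the example):

```r
# Made-up counts: rows are class types, columns are years in small classes
tab <- matrix(c(10, 30, 20, 40), nrow = 2,
              dimnames = list(class = c("regular", "small"),
                              years = c("0", "1")))

overall <- prop.table(tab)     # all cells together sum to 1
by_col  <- prop.table(tab, 2)  # each column sums to 1
by_row  <- prop.table(tab, 1)  # each row sums to 1

by_col
```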

Mean/median reading score across years in small classes

tapply(STAR$g4reading, STAR$yearssmall, mean, na.rm = TRUE)
##        0        1        2        3        4 
## 719.8754 723.1471 717.8681 719.8986 724.6651
tapply(STAR$g4reading, STAR$yearssmall, median, na.rm = TRUE)
##     0     1     2     3     4 
## 722.0 724.5 720.0 721.0 726.0

Mean/median maths score across years in small classes

tapply(STAR$g4math, STAR$yearssmall, mean, na.rm = TRUE)
##        0        1        2        3        4 
## 707.9793 707.5524 711.9140 709.6170 710.0519
tapply(STAR$g4math, STAR$yearssmall, median, na.rm = TRUE)
##   0   1   2   3   4 
## 710 709 714 712 711
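Means and medians can also be collected in a single table, which makes the comparison across years easier to read. A sketch on toy data (the data frame toy and the object summary_by_years are hypothetical); on the STAR data, toy$score and toy$years would be replaced by STAR$g4math and STAR$yearssmall.

```r
# Toy scores for students with 0 or 4 years in small classes
toy <- data.frame(score = c(700, 710, 720, 730, NA, 750),
                  years = c(0, 0, 0, 4, 4, 4))

# split() groups the scores by years; sapply() computes both statistics per group
summary_by_years <- sapply(
  split(toy$score, toy$years),
  function(x) c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
)
summary_by_years
```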

EXERCISE 5

Examine whether the STAR program reduced achievement gaps across different racial groups. Begin by comparing the average reading and math test scores between white and minority students (i.e., blacks and Hispanics) among those students who were assigned to regular-sized classes with no aide. Conduct the same comparison among those students who were assigned to small classes. Give a brief substantive interpretation of the results of your analysis.

Examine the achievement gap (scores) for white and minority students in small and regular classes

tapply(STAR$g4reading, STAR$race, mean, na.rm = TRUE)
##        1        2        3        4        5        6 
## 726.1261 692.6239 739.8571 737.5000      NaN 807.0000

Subset the data

white <- subset(STAR, race2=="white")
minority <- subset(STAR, race2 =="black" | race2 =="hispanic")

Mean reading score of white students in regular classes (TRUE) and in all other class types (FALSE)

tapply(white$g4reading, white$kinder=="regular", mean, na.rm = TRUE)
##    FALSE     TRUE 
## 726.6836 725.1158

Racial gap in reading among students in regular classes, using the approach from the book

mean(white$g4reading[white$kinder=="regular"], na.rm = TRUE) -
mean(minority$g4reading[minority$kinder=="regular"], na.rm = TRUE)
## [1] 35.76098

Racial gap in reading among students in small classes

mean(white$g4reading[white$kinder=="small"],  na.rm = TRUE) -
mean(minority$g4reading[minority$kinder=="small"], na.rm = TRUE)
## [1] 28.55433

Racial gap in maths among students in regular classes

mean(white$g4math[white$kinder=="regular"],  na.rm = TRUE) -
mean(minority$g4math[minority$kinder=="regular"], na.rm = TRUE)
## [1] 12.87811

Racial gap in maths among students in small classes

mean(white$g4math[white$kinder=="small"],  na.rm = TRUE) -
mean(minority$g4math[minority$kinder=="small"], na.rm = TRUE)
## [1] 12.96779
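To see whether the gap is narrower in small classes, the four gaps printed above can be collected in one table and the change computed. This sketch only reuses those printed numbers; the object names gaps and change are hypothetical.

```r
# White-minus-minority gaps in mean scores, copied from the output above
gaps <- rbind(
  reading = c(regular = 35.76098, small = 28.55433),
  math    = c(regular = 12.87811, small = 12.96779)
)

# Negative values mean the gap is smaller in small classes
change <- gaps[, "small"] - gaps[, "regular"]
round(cbind(gaps, change), 2)
```

The reading gap is about 7.2 points smaller in small classes, while the maths gap is essentially unchanged (about 0.09 points larger).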

EXERCISE 6

Consider the long-term effects of kindergarten class size. Compare high-school graduation rates across students assigned to different class types. Also, examine whether graduation rates differ depending on the number of years spent in small classes. Finally, as in the previous question, investigate whether the STAR program has reduced the racial gap between white and minority students’ graduation rates. Briefly discuss the results.

Graduation rate by class size

tapply(STAR$hsgrad, STAR$kinder, mean, na.rm = TRUE)
##        regular regularwithaid          small 
##      0.8251619      0.8392857      0.8359202

Graduation rate by years spent in small class

tapply(STAR$hsgrad, STAR$yearssmall, mean, na.rm = TRUE)
##         0         1         2         3         4 
## 0.8286020 0.7910448 0.8131868 0.8324607 0.8775510

Racial gap in graduation rates

mean(white$hsgrad[white$kinder=="regular"], na.rm = TRUE) -
mean(minority$hsgrad[minority$kinder=="regular"], na.rm = TRUE)
## [1] 0.1173787
mean(white$hsgrad[white$kinder=="small"], na.rm = TRUE) -
mean(minority$hsgrad[minority$kinder=="small"], na.rm = TRUE)
## [1] 0.122789

Alternatively, a more concise method with tapply(), which gives the gap for all three class types at once

tapply(white$hsgrad, white$kinder, mean, na.rm = TRUE) - 
  tapply(minority$hsgrad, minority$kinder, mean, na.rm = TRUE)
##        regular regularwithaid          small 
##      0.1173787      0.1440545      0.1227890