Overview

This project examines whether there is a relationship between a student’s race and their performance on standardized tests. It also examines whether that relationship holds for students in both public and private schools.

Introduction

Standardized testing is used to compare students’ performance across the country. This project looks at whether standardized testing is a fair measure of performance among different races and whether the school setting has any impact on students’ performance.

The data used for this project comes from the High School and Beyond survey, available as the hsb2 dataset in the openintro package for R. The data was collected by the National Center for Education Statistics through a survey of high school seniors.

Exploring the Data

#Load OpenIntro Library
library(openintro)
## Please visit openintro.org for free statistics materials
## 
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
## 
##     cars, trees
#Store data in environment
hsb2 <- hsb2

#View the structure of the dataset
str(hsb2)
## 'data.frame':    200 obs. of  11 variables:
##  $ id     : int  70 121 86 141 172 113 50 11 84 48 ...
##  $ gender : chr  "male" "female" "male" "male" ...
##  $ race   : chr  "white" "white" "white" "white" ...
##  $ ses    : Factor w/ 3 levels "low","middle",..: 1 2 3 3 2 2 2 2 2 2 ...
##  $ schtyp : Factor w/ 2 levels "public","private": 1 1 1 1 1 1 1 1 1 1 ...
##  $ prog   : Factor w/ 3 levels "general","academic",..: 1 3 1 3 2 2 1 2 1 2 ...
##  $ read   : int  57 68 44 63 47 44 50 34 63 57 ...
##  $ write  : int  52 59 33 44 52 52 59 46 57 55 ...
##  $ math   : int  41 53 54 47 57 51 42 45 54 52 ...
##  $ science: int  47 63 58 53 53 63 53 39 58 50 ...
##  $ socst  : int  57 61 31 56 61 61 61 36 51 51 ...
#Preview of the first few lines of data
head(hsb2)
##    id gender  race    ses schtyp       prog read write math science socst
## 1  70   male white    low public    general   57    52   41      47    57
## 2 121 female white middle public vocational   68    59   53      63    61
## 3  86   male white   high public    general   44    33   54      58    31
## 4 141   male white   high public vocational   63    44   47      53    56
## 5 172   male white middle public   academic   47    52   57      53    61
## 6 113   male white middle public   academic   44    52   51      63    61

There are 200 observations in the dataset. There are 11 variables recorded in the dataset which include the subject ID number, gender, race, socioeconomic status, school type, program, standardized reading score, standardized writing score, standardized math score, standardized science score, and standardized social studies score. There is no missing data in the dataset.
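
The statement that no data is missing can be checked directly. The following is a minimal sketch on a small made-up data frame (`toy` is hypothetical and not part of hsb2) so that it runs on its own; in this project the same calls would be applied to hsb2.

```r
# Hypothetical toy data with one missing value, for illustration only
toy <- data.frame(
  read  = c(57, 68, NA, 63),
  write = c(52, 59, 33, 44)
)

# Count the missing values in each column
naPerColumn <- colSums(is.na(toy))
print(naPerColumn)

# TRUE only when the data frame contains no missing values at all
noMissing <- sum(is.na(toy)) == 0
print(noMissing)
```

A result of zero in every column would support the statement above.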

#Summary of data
summary(hsb2)
##        id            gender              race               ses    
##  Min.   :  1.00   Length:200         Length:200         low   :47  
##  1st Qu.: 50.75   Class :character   Class :character   middle:95  
##  Median :100.50   Mode  :character   Mode  :character   high  :58  
##  Mean   :100.50                                                    
##  3rd Qu.:150.25                                                    
##  Max.   :200.00                                                    
##      schtyp            prog          read           write      
##  public :168   general   : 45   Min.   :28.00   Min.   :31.00  
##  private: 32   academic  :105   1st Qu.:44.00   1st Qu.:45.75  
##                vocational: 50   Median :50.00   Median :54.00  
##                                 Mean   :52.23   Mean   :52.77  
##                                 3rd Qu.:60.00   3rd Qu.:60.00  
##                                 Max.   :76.00   Max.   :67.00  
##       math          science          socst      
##  Min.   :33.00   Min.   :26.00   Min.   :26.00  
##  1st Qu.:45.00   1st Qu.:44.00   1st Qu.:46.00  
##  Median :52.00   Median :53.00   Median :52.00  
##  Mean   :52.65   Mean   :51.85   Mean   :52.41  
##  3rd Qu.:59.00   3rd Qu.:58.00   3rd Qu.:61.00  
##  Max.   :75.00   Max.   :74.00   Max.   :71.00
#View race variable
table(hsb2$race)
## 
## african american            asian         hispanic            white 
##               20               11               24              145

The scores for the reading portion of the standardized test range from 28 to 76, writing scores range from 31 to 67, math scores range from 33 to 75, science scores range from 26 to 74, and social studies scores range from 26 to 71. Race categories include 20 African American students, 11 Asian students, 24 Hispanic students, and 145 White students.

#Create an overall standardized score variable (sum of the five subject scores)
hsb2$TotalScore <- hsb2$read + hsb2$write + hsb2$math + hsb2$science + hsb2$socst
#create AA subset
AAStudents <- subset(hsb2, race == "african american")

#create asian subset
asianStudents <- subset(hsb2, race == "asian")

#create Hispanic subset
HispStudents <- subset(hsb2, race == "hispanic")

#create white subset
whiteStudents <- subset(hsb2, race == "white")

To see whether there is a relationship between race and the total standardized test score, we will look at a boxplot of the data.

#Exploring the data with a boxplot
boxplot(AAStudents$TotalScore, asianStudents$TotalScore,
        HispStudents$TotalScore, whiteStudents$TotalScore,
        main = "Total Standardized Test Score based on Race",
        xlab = "Race", ylab = "Standardized Test Score (total)",
        names = c("African American", "Asian", "Hisp", "White"))

The boxplot suggests that total standardized test scores differ among races. It also shows an outlier in the African American group.
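
For reference, the points a boxplot draws as outliers are those lying more than 1.5 times the distance between the hinges beyond either hinge (R's default, `coef = 1.5` in `boxplot.stats`). The following is a minimal sketch of that rule on made-up scores; the `scores` vector is hypothetical and not taken from hsb2.

```r
# Hypothetical total scores, for illustration only
scores <- c(210, 225, 230, 236, 240, 244, 251, 258, 327)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(scores)
spread <- five[4] - five[2]   # distance between the hinges

# Fences at 1.5 times that distance beyond each hinge
lowerFence <- five[2] - 1.5 * spread
upperFence <- five[4] + 1.5 * spread

# Values outside the fences are flagged as outliers
flagged <- scores[scores < lowerFence | scores > upperFence]
print(flagged)

# The same values are reported by boxplot(..., plot = FALSE)$out
print(boxplot(scores, plot = FALSE)$out)
```

A point flagged this way is unusual relative to its own group. Whether it should be removed is a separate judgment.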

Cleaning and Transforming the Data

#Get the value of the outlier
boxplot(AAStudents$TotalScore, plot=FALSE)$out
## [1] 327
#Assign outlier to a vector
outliers <- boxplot(AAStudents$TotalScore, plot=FALSE)$out
#Check the results
print(outliers)
## [1] 327
#Find the row of the outlier
hsb2[which(AAStudents$TotalScore %in% outliers),]
##    id gender  race  ses schtyp     prog read write math science socst
## 17 76   male white high public academic   47    52   51      50    56
##    TotalScore
## 17        256
#Removing outliers
hsb2 <- hsb2[-which(AAStudents$TotalScore %in% outliers),]
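
One point to check in the step above: `which(AAStudents$TotalScore %in% outliers)` returns a position within AAStudents, but that position is then used to index hsb2. The row printed above (row 17, a White student with a total score of 256) is therefore not the student with the total of 327, and the results reported in the Analysis section reflect the data with row 17 removed. The following is a minimal sketch of the difference on a made-up data frame (`toy`, `grp` and `outlierValue` are hypothetical names), along with one possible way to locate the row in the full data frame.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  race = c("white", "african american", "white", "african american", "white"),
  TotalScore = c(256, 240, 270, 327, 262)
)
grp <- subset(toy, race == "african american")
outlierValue <- 327

# Position within the subset, which is not a row number of the full data frame
posInSubset <- which(grp$TotalScore %in% outlierValue)
print(toy[posInSubset, ])   # a different row of toy, not the outlier

# Position within the full data frame, restricted to the same group
posInFull <- which(toy$race == "african american" &
                     toy$TotalScore %in% outlierValue)
print(toy[posInFull, ])     # the intended row

# Drop the intended row
toyClean <- toy[-posInFull, ]
```

If no row matches, `which()` returns an empty vector and negative indexing with it keeps no rows at all, so a logical filter such as `toy[!(toy$TotalScore %in% outlierValue), ]` may be a safer pattern.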

Analysis

One way anova test to see if there is a significant difference in total scores among different races

HO: There is no difference in mean Total Score among races

HA: At least one race has a different mean Total Score

# Compute the analysis of variance
race.aov <- aov(TotalScore ~ race, data = hsb2)
# Summary of the analysis
summary(race.aov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## race          3  44587   14862   10.08 3.33e-06 ***
## Residuals   195 287553    1475                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

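The p-value is 3.33e-06 (about 0.0000033), which is well below 0.05; therefore, we reject the null hypothesis. At least one race has a different mean total score, and the *** in the model summary marks the race term as significant at the 0.001 level. The anova does not show which groups differ, so pairwise comparisons are needed.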
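
The F value in the table can be reproduced from the sums of squares and degrees of freedom it reports. The following is a short sketch using the rounded figures from the summary above, so the result agrees with the table only up to rounding.

```r
# Values copied from the one-way anova table above
ssRace  <- 44587;  dfRace  <- 3
ssResid <- 287553; dfResid <- 195

# Mean square: sum of squares divided by its degrees of freedom
msRace  <- ssRace / dfRace
msResid <- ssResid / dfResid

# F statistic: between-group mean square over residual mean square
fValue <- msRace / msResid
print(round(fValue, 2))

# Upper-tail probability of the F distribution gives the p-value
pValue <- pf(fValue, dfRace, dfResid, lower.tail = FALSE)
print(pValue)
```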

#Tukey honest significant difference to perform multiple pairwise comparisons between group means
TukeyHSD(race.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = TotalScore ~ race, data = hsb2)
## 
## $race
##                                  diff        lwr        upr     p adj
## asian-african american     35.6363636  -1.717229 72.9899563 0.0675218
## hispanic-african american  -0.2916667 -30.419548 29.8362146 0.9999943
## white-african american     35.9305556  12.184660 59.6764511 0.0007004
## hispanic-asian            -35.9280303 -72.160249  0.3041886 0.0528774
## white-asian                 0.2941919 -30.833802 31.4221858 0.9999947
## white-hispanic             36.2222222  14.282524 58.1619203 0.0001721
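
With several pairwise comparisons, it can help to pull out only the pairs whose adjusted p-value is below 0.05. The following is a minimal sketch using `PlantGrowth`, a small dataset that ships with R, as a stand-in for hsb2; the 0.05 cutoff and the object names are choices made for this example.

```r
# Stand-in data: plant weights for three groups (datasets package)
fit <- aov(weight ~ group, data = PlantGrowth)

# TukeyHSD returns one matrix per factor with columns diff, lwr, upr, p adj
tukey <- TukeyHSD(fit)$group

# Keep only the comparisons with an adjusted p-value below 0.05
significant <- tukey[tukey[, "p adj"] < 0.05, , drop = FALSE]
print(significant)
```

Applied to `TukeyHSD(race.aov)$race`, the same filter would keep the white-african american and white-hispanic rows of the output above.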

Two way anova test to see if the significant difference in total scores among races remains true within public and private schooling

The null hypotheses are:

HO (case 1): There is no difference in the means of Total Scores by race

HO (case 2): There is no difference in the means of Total Scores by type of schooling

HO (case 3): There is no interaction between race and type of schooling

HA (cases 1 and 2): The means are not all equal.

HA (case 3): There is an interaction between race and type of schooling.

#Two way anova with interaction effect to test if difference between groups remains true for type of schooling
raceschool.aov2 <- aov(TotalScore ~ race + schtyp + race:schtyp, data = hsb2)
summary(raceschool.aov2)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## race          3  44587   14862  10.148 3.11e-06 ***
## schtyp        1   2012    2012   1.374    0.243    
## race:schtyp   3   5806    1935   1.321    0.269    
## Residuals   191 279735    1465                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Tukey honest significant difference to perform multiple pairwise comparisons between group means
TukeyHSD(raceschool.aov2)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = TotalScore ~ race + schtyp + race:schtyp, data = hsb2)
## 
## $race
##                                  diff        lwr       upr     p adj
## asian-african american     35.6363636  -1.596481 72.869208 0.0661941
## hispanic-african american  -0.2916667 -30.322157 29.738824 0.9999942
## white-african american     35.9305556  12.261421 59.599691 0.0006699
## hispanic-asian            -35.9280303 -72.043126  0.187065 0.0517584
## white-asian                 0.2941919 -30.733178 31.321562 0.9999946
## white-hispanic             36.2222222  14.353446 58.090998 0.0001635
## 
## $schtyp
##                    diff       lwr      upr     p adj
## private-public 8.595354 -5.971258 23.16197 0.2459183
## 
## $`race:schtyp`
##                                                        diff         lwr
## asian:public-african american:public              38.422222   -7.837991
## hispanic:public-african american:public           -3.050505  -40.328047
## white:public-african american:public              37.410256    7.713941
## african american:private-african american:public  22.222222  -65.201363
## asian:private-african american:public             32.222222  -88.282874
## hispanic:private-african american:public          56.722222  -30.701363
## white:private-african american:public             41.370370    5.679841
## hispanic:public-asian:public                     -41.472727  -86.205778
## white:public-asian:public                         -1.011966  -39.655229
## african american:private-asian:public            -16.200000 -107.053255
## asian:private-asian:public                        -6.200000 -129.215888
## hispanic:private-asian:public                     18.300000  -72.553255
## white:private-asian:public                         2.948148  -40.471255
## white:public-hispanic:public                      40.460761   13.204393
## african american:private-hispanic:public          25.272727  -61.352452
## asian:private-hispanic:public                     35.272727  -84.654403
## hispanic:private-hispanic:public                  59.772727  -26.852452
## white:private-hispanic:public                     44.420875   10.733306
## african american:private-white:public            -15.188034  -98.831191
## asian:private-white:public                        -5.188034 -122.979259
## hispanic:private-white:public                     19.311966  -64.331191
## white:private-white:public                         3.960114  -21.082018
## asian:private-african american:private            10.000000 -133.651609
## hispanic:private-african american:private         34.500000  -82.791047
## white:private-african american:private            19.148148  -66.806028
## hispanic:private-asian:private                    24.500000 -119.151609
## white:private-asian:private                        9.148148 -110.295208
## white:private-hispanic:private                   -15.351852 -101.306028
##                                                         upr     p adj
## asian:public-african american:public              84.682435 0.1830745
## hispanic:public-african american:public           34.227037 0.9999968
## white:public-african american:public              67.106572 0.0037897
## african american:private-african american:public 109.645807 0.9940087
## asian:private-african american:public            152.727318 0.9918484
## hispanic:private-african american:public         144.145807 0.4923467
## white:private-african american:public             77.060900 0.0111135
## hispanic:public-asian:public                       3.260323 0.0910495
## white:public-asian:public                         37.631298 1.0000000
## african american:private-asian:public             74.653255 0.9993729
## asian:private-asian:public                       116.815888 0.9999999
## hispanic:private-asian:public                    109.153255 0.9986147
## white:private-asian:public                        46.367551 0.9999991
## white:public-hispanic:public                      67.717130 0.0002527
## african american:private-hispanic:public         111.897907 0.9863047
## asian:private-hispanic:public                    155.199858 0.9856403
## hispanic:private-hispanic:public                 146.397907 0.4092274
## white:private-hispanic:public                     78.108445 0.0019388
## african american:private-white:public             68.455123 0.9992935
## asian:private-white:public                       112.603191 1.0000000
## hispanic:private-white:public                    102.955123 0.9966995
## white:private-white:public                        29.002246 0.9997163
## asian:private-african american:private           153.651609 0.9999990
## hispanic:private-african american:private        151.791047 0.9856340
## white:private-african american:private           105.102325 0.9973651
## hispanic:private-asian:private                   168.151609 0.9995319
## white:private-asian:private                      128.591504 0.9999980
## white:private-hispanic:private                    70.602325 0.9993660
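
Many of the intervals for comparisons involving private schools are very wide (for example, roughly -133 to 153 for asian:private versus african american:private). This is what one would expect when some race-by-school cells contain few students, and only 32 of the students in the summary attended private school. Cell sizes can be checked with a cross-tabulation. The following is a minimal sketch on made-up data (`toy` is hypothetical); in this project the call would be `table(hsb2$race, hsb2$schtyp)`.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  race   = c("white", "white", "asian", "hispanic", "white", "asian"),
  schtyp = c("public", "private", "public", "public", "public", "public")
)

# Rows are race, columns are school type; each entry is a cell count
cellCounts <- table(toy$race, toy$schtyp)
print(cellCounts)

# Size of the smallest cell
print(min(cellCounts))
```

Comparisons that involve very small cells have little power, so a non-significant result for them says less than one based on larger groups.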

Conclusions

From comparing means between groups, we can see that there is a significant difference in mean Total Scores for only two pairs of races. White students had mean total scores 35.93 points higher than African American students (p-value = 0.0007) and 36.22 points higher than Hispanic students (p-value = 0.0002). The mean total score for Hispanic students was 35.93 points lower than for Asian students, but with a p-value of 0.053 this difference falls just short of significance at the 0.05 level. The same is true of the difference between Asian and African American students (35.64 points, p-value = 0.068).

When testing whether the differences in total score by race hold within each type of schooling, we find that the difference between White and African American students remains in the public school setting: the mean total score is 37.41 points higher for White students in public schools than for African American students in public schools (p-value = 0.0038). However, there was no significant difference between the mean total scores of White students and African American students attending private schools. The difference in mean total scores between Hispanic and Asian students was not significant within either type of schooling; the p-value was greater than 0.05 for both public and private schooling. The difference between White and Hispanic students also remained in public schools, where White students had mean total scores 40.46 points higher than Hispanic students (p-value = 0.0003). However, there was no significant difference between the mean total scores of Hispanic students and White students attending private schools.

Two additional differences in mean total scores were significant when looking at race and type of schooling together. White students attending private schools had mean total scores 41.37 points higher than African American students attending public schools (p-value = 0.011) and 44.42 points higher than Hispanic students attending public schools (p-value = 0.002).

In the two-way anova, the p-value for type of schooling was 0.243; therefore, we fail to reject the null hypothesis, and there is no significant difference in mean total score between public and private schooling. For the interaction between race and type of schooling, the p-value is 0.269, so we also fail to reject the null for this hypothesis. This gives no evidence that the relationship between race and total score depends on the type of schooling.

Limitations

Although the students who participated in this survey were randomly sampled, there are only 200 observations in the data, which is a small number of students on which to base conclusions. We might therefore see different results in a larger study. Other factors that the tests performed here do not capture may also contribute to differences in students’ performance on standardized tests. The dataset includes some of these, such as socioeconomic status and type of program, which we did not test. Other things not recorded in the data could also affect performance, such as class size, or distance to school and transportation needs, which may limit the time a student has to study or the amount of sleep they get each night. The data in the High School and Beyond Survey is a good start at giving us an idea of the types of variables that may impact a student’s performance on standardized testing, but more should be looked into before drawing any conclusions.


This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.
Student Name: Nicholaus Mlekelwa Semester: Spring 2019