This project examines whether there is a relationship between a student’s race and their performance on standardized tests. It also examines whether any such relationship holds for both public and private school students.
Standardized testing is used to compare students’ performance across the country. This project asks whether standardized testing is a fair measure of performance across races and whether the school setting has any impact on students’ performance.
The data for this project come from the High School and Beyond survey, available as the hsb2 dataset in the openintro package for R. The data were collected by the National Center for Education Statistics through a survey of high school seniors.
#Load OpenIntro Library
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
#Store data in environment
hsb2 <- hsb2
#View the structure of the dataset
str(hsb2)
## 'data.frame': 200 obs. of 11 variables:
## $ id : int 70 121 86 141 172 113 50 11 84 48 ...
## $ gender : chr "male" "female" "male" "male" ...
## $ race : chr "white" "white" "white" "white" ...
## $ ses : Factor w/ 3 levels "low","middle",..: 1 2 3 3 2 2 2 2 2 2 ...
## $ schtyp : Factor w/ 2 levels "public","private": 1 1 1 1 1 1 1 1 1 1 ...
## $ prog : Factor w/ 3 levels "general","academic",..: 1 3 1 3 2 2 1 2 1 2 ...
## $ read : int 57 68 44 63 47 44 50 34 63 57 ...
## $ write : int 52 59 33 44 52 52 59 46 57 55 ...
## $ math : int 41 53 54 47 57 51 42 45 54 52 ...
## $ science: int 47 63 58 53 53 63 53 39 58 50 ...
## $ socst : int 57 61 31 56 61 61 61 36 51 51 ...
#Preview of the first few lines of data
head(hsb2)
## id gender race ses schtyp prog read write math science socst
## 1 70 male white low public general 57 52 41 47 57
## 2 121 female white middle public vocational 68 59 53 63 61
## 3 86 male white high public general 44 33 54 58 31
## 4 141 male white high public vocational 63 44 47 53 56
## 5 172 male white middle public academic 47 52 57 53 61
## 6 113 male white middle public academic 44 52 51 63 61
There are 200 observations in the dataset. There are 11 variables recorded in the dataset which include the subject ID number, gender, race, socioeconomic status, school type, program, standardized reading score, standardized writing score, standardized math score, standardized science score, and standardized social studies score. There is no missing data in the dataset.
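The claim that nothing is missing can be checked directly by counting `NA` values per column. A minimal sketch, using a small made-up data frame (`demo`, not the survey data) so that it runs on its own; with the real data the same calls would be applied to `hsb2`:

```r
# Hypothetical stand-in data, invented for illustration only
demo <- data.frame(
  read  = c(57, 68, NA),
  write = c(52, 59, 33)
)

# Count missing values in each column
missing_per_column <- colSums(is.na(demo))
print(missing_per_column)

# TRUE only when no value anywhere in the data frame is missing
no_missing <- !any(is.na(demo))
print(no_missing)
```

A column total of zero everywhere corresponds to the "no missing data" statement.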
#Summary of data
summary(hsb2)
## id gender race ses
## Min. : 1.00 Length:200 Length:200 low :47
## 1st Qu.: 50.75 Class :character Class :character middle:95
## Median :100.50 Mode :character Mode :character high :58
## Mean :100.50
## 3rd Qu.:150.25
## Max. :200.00
## schtyp prog read write
## public :168 general : 45 Min. :28.00 Min. :31.00
## private: 32 academic :105 1st Qu.:44.00 1st Qu.:45.75
## vocational: 50 Median :50.00 Median :54.00
## Mean :52.23 Mean :52.77
## 3rd Qu.:60.00 3rd Qu.:60.00
## Max. :76.00 Max. :67.00
## math science socst
## Min. :33.00 Min. :26.00 Min. :26.00
## 1st Qu.:45.00 1st Qu.:44.00 1st Qu.:46.00
## Median :52.00 Median :53.00 Median :52.00
## Mean :52.65 Mean :51.85 Mean :52.41
## 3rd Qu.:59.00 3rd Qu.:58.00 3rd Qu.:61.00
## Max. :75.00 Max. :74.00 Max. :71.00
#View race variable
table(hsb2$race)
##
## african american asian hispanic white
## 20 11 24 145
The scores for the reading portion of the standardized test range from 28 to 76, writing scores range from 31 to 67, math scores range from 33 to 75, science scores range from 26 to 74, and social studies scores range from 26 to 71. Race categories include 20 African American students, 11 Asian students, 24 Hispanic students, and 145 White students.
#Create an overall standardized score variable
hsb2$TotalScore <- hsb2$read+hsb2$write+hsb2$math+hsb2$science+hsb2$socst
TotalScore <- hsb2$read+hsb2$write+hsb2$math+hsb2$science+hsb2$socst
#Create African American subset
AAStudents <- subset(hsb2, race == "african american")
#Create Asian subset
asianStudents <- subset(hsb2, race == "asian")
#Create Hispanic subset
HispStudents <- subset(hsb2, race == "hispanic")
#Create White subset
whiteStudents <- subset(hsb2, race == "white")
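As an alternative to adding the five score columns by hand and creating one subset per race, `rowSums()` and `split()` can do the same work in two calls. A sketch, assuming the first three rows printed by `head(hsb2)` above as sample input (`sample_rows`, `score_columns`, and `by_race` are illustrative names):

```r
# First three rows printed by head(hsb2) above
sample_rows <- data.frame(
  id      = c(70, 121, 86),
  race    = c("white", "white", "white"),
  read    = c(57, 68, 44),
  write   = c(52, 59, 33),
  math    = c(41, 53, 54),
  science = c(47, 63, 58),
  socst   = c(57, 61, 31),
  stringsAsFactors = FALSE
)

# Sum the five subject scores row by row
score_columns <- c("read", "write", "math", "science", "socst")
sample_rows$TotalScore <- rowSums(sample_rows[, score_columns])
print(sample_rows$TotalScore)

# One data frame per race, stored in a named list
by_race <- split(sample_rows, sample_rows$race)
print(names(by_race))
```

These three rows are all White students, so `split()` returns a single group here; on the full data it would return one element per race.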
To see if there is a relationship between race and the total standardized test score, we will look at a boxplot of the data.
#Exploring the data with a boxplot
boxplot(AAStudents$TotalScore, asianStudents$TotalScore, HispStudents$TotalScore, whiteStudents$TotalScore, main="Total Standardized Test Score based on Race", xlab="Race",names = c("African American", "Asian", "Hisp", "White"), ylab = "Standardized Test Score (total)")
Based on the boxplot, the distribution of total standardized test scores appears to differ among races. The plot also shows one outlier among the African American students.
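By default, `boxplot()` draws as outliers any points lying more than 1.5 times the box length beyond the box edges (the hinges, which are approximately the quartiles). A small sketch of that rule using `boxplot.stats()`, on made-up numbers rather than the survey scores:

```r
# Invented scores for illustration; the last value is far from the rest
scores <- c(10, 12, 13, 14, 15, 40)

bp <- boxplot.stats(scores)    # applies the default 1.5 rule (coef = 1.5)
hinges <- bp$stats[c(2, 4)]    # lower and upper hinge
box_length <- diff(hinges)

# Fences: values outside these limits are reported as outliers
lower_fence <- hinges[1] - 1.5 * box_length
upper_fence <- hinges[2] + 1.5 * box_length

print(c(lower_fence, upper_fence))
print(bp$out)
```

Only the value beyond the upper fence is returned in `bp$out`, which is the same component the `$out` calls on the boxplot object read.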
#Get the value of the outlier
boxplot(AAStudents$TotalScore, plot=FALSE)$out
## [1] 327
#Assign outlier to a vector
outliers <- boxplot(AAStudents$TotalScore, plot=FALSE)$out
#Check the results
print(outliers)
## [1] 327
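#Find the row of the outlier
#Caution: the position is found within AAStudents but used to index hsb2, so the row printed below is not the student with the outlying total of 327
hsb2[which(AAStudents$TotalScore %in% outliers),]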
## id gender race ses schtyp prog read write math science socst
## 17 76 male white high public academic 47 52 51 50 56
## TotalScore
## 17 256
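#Removing outliers (this uses the same positional index as above, so the row dropped is row 17 printed above, not the student with a total of 327)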
hsb2 <- hsb2[-which(AAStudents$TotalScore %in% outliers),]
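A position returned by `which()` on a subset lines up only with that subset, not with the full data frame. One way to sidestep this is to compute a logical mask on the full data frame itself. A sketch on a small made-up data frame (`demo` and its values are invented; only the outlier value 327 is taken from the output above):

```r
# Hypothetical data: two White students followed by three African American students
demo <- data.frame(
  id         = 1:5,
  race       = c("white", "white", "african american",
                 "african american", "african american"),
  TotalScore = c(256, 300, 230, 327, 240),
  stringsAsFactors = FALSE
)
outliers <- 327

# Position of the outlier within the African American subset only
aa_scores <- demo$TotalScore[demo$race == "african american"]
position_in_subset <- which(aa_scores %in% outliers)
# Using that position on the full data frame picks an unrelated row
print(demo[position_in_subset, ])

# Logical mask computed on the full data frame, so rows line up
is_outlier <- demo$race == "african american" & demo$TotalScore %in% outliers
print(demo[is_outlier, ])        # inspect the flagged row
cleaned <- demo[!is_outlier, ]   # drop it
print(nrow(cleaned))
```

A logical mask also behaves sensibly when nothing is flagged, whereas `x[-which(cond), ]` returns zero rows when `which()` comes back empty.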
#One-way ANOVA test to see if there is a significant difference in total scores among races
H0: There is no difference in mean total score among races.
HA: At least one race has a different mean total score.
# Compute the analysis of variance
race.aov <- aov(TotalScore ~ race, data = hsb2)
# Summary of the analysis
summary(race.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## race 3 44587 14862 10.08 3.33e-06 ***
## Residuals 195 287553 1475
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is 3.33e-06 (about 0.0000033); therefore, we reject the null hypothesis. The *** next to race in the model summary marks this result as significant: the mean total score differs among the race groups.
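The F value and p-value in the summary follow from the mean squares and degrees of freedom in the same table. A sketch of that arithmetic, using the rounded figures as printed above, so the results should agree with the table only approximately:

```r
# Values copied from the one-way ANOVA table above (rounded as printed)
mean_sq_race  <- 14862
mean_sq_resid <- 1475
df_race  <- 3
df_resid <- 195

# F is the ratio of between-group to within-group mean squares
f_value <- mean_sq_race / mean_sq_resid

# The upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_race, df_resid, lower.tail = FALSE)

print(round(f_value, 2))
print(signif(p_value, 3))
```

The same two lines, fed with the rows of the two-way table, reproduce the F values and p-values reported there.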
#Tukey honest significant difference to perform multiple pairwise comparisons between group means
TukeyHSD(race.aov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = TotalScore ~ race, data = hsb2)
##
## $race
## diff lwr upr p adj
## asian-african american 35.6363636 -1.717229 72.9899563 0.0675218
## hispanic-african american -0.2916667 -30.419548 29.8362146 0.9999943
## white-african american 35.9305556 12.184660 59.6764511 0.0007004
## hispanic-asian -35.9280303 -72.160249 0.3041886 0.0528774
## white-asian 0.2941919 -30.833802 31.4221858 0.9999947
## white-hispanic 36.2222222 14.282524 58.1619203 0.0001721
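To read the significant pairs off a Tukey table without scanning it by eye, the rows can be filtered on the adjusted p-value. A sketch that re-enters the table printed above by hand (rounded) as a data frame named `tukey_race`, an illustrative name; with the fitted model available, `TukeyHSD(race.aov)$race` holds the same figures as a matrix:

```r
# Differences and adjusted p-values copied (rounded) from the output above
tukey_race <- data.frame(
  pair  = c("asian-african american", "hispanic-african american",
            "white-african american", "hispanic-asian",
            "white-asian", "white-hispanic"),
  diff  = c(35.64, -0.29, 35.93, -35.93, 0.29, 36.22),
  p_adj = c(0.0675, 1.0000, 0.0007, 0.0529, 1.0000, 0.0002),
  stringsAsFactors = FALSE
)

# Keep only the pairs whose adjusted p-value is below 0.05
significant <- tukey_race[tukey_race$p_adj < 0.05, ]
print(significant)
```

The 0.05 cut-off is the conventional level used throughout this report; a different level would only change the threshold in the filter.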
Two-way ANOVA test to see if the difference in total scores among races holds within public and private schooling
The null hypotheses are:
H0 (1): There is no difference in mean total score by race.
H0 (2): There is no difference in mean total score by type of schooling.
H0 (3): There is no interaction between race and type of schooling.
HA (1 and 2): The means are not all equal.
HA (3): There is an interaction between race and type of schooling.
#Two way anova with interaction effect to test if difference between groups remains true for type of schooling
raceschool.aov2 <- aov(TotalScore ~ race + schtyp + race:schtyp, data = hsb2)
summary(raceschool.aov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## race 3 44587 14862 10.148 3.11e-06 ***
## schtyp 1 2012 2012 1.374 0.243
## race:schtyp 3 5806 1935 1.321 0.269
## Residuals 191 279735 1465
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
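The model formula above spells out both main effects and the interaction. In R's formula syntax, `race * schtyp` expands to the same three terms, which can be checked without any data (`long_form` and `short_form` are illustrative names):

```r
# The formula as written above, and the shorthand with the * operator
long_form  <- TotalScore ~ race + schtyp + race:schtyp
short_form <- TotalScore ~ race * schtyp

# terms() expands a formula; "term.labels" lists the model terms
long_terms  <- attr(terms(long_form), "term.labels")
short_terms <- attr(terms(short_form), "term.labels")

print(long_terms)
print(identical(long_terms, short_terms))
```

Either spelling fits the same model; the longer one makes the interaction term explicit.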
#Tukey honest significant difference to perform multiple pairwise comparisons between group means
TukeyHSD(raceschool.aov2)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = TotalScore ~ race + schtyp + race:schtyp, data = hsb2)
##
## $race
## diff lwr upr p adj
## asian-african american 35.6363636 -1.596481 72.869208 0.0661941
## hispanic-african american -0.2916667 -30.322157 29.738824 0.9999942
## white-african american 35.9305556 12.261421 59.599691 0.0006699
## hispanic-asian -35.9280303 -72.043126 0.187065 0.0517584
## white-asian 0.2941919 -30.733178 31.321562 0.9999946
## white-hispanic 36.2222222 14.353446 58.090998 0.0001635
##
## $schtyp
## diff lwr upr p adj
## private-public 8.595354 -5.971258 23.16197 0.2459183
##
## $`race:schtyp`
## diff lwr
## asian:public-african american:public 38.422222 -7.837991
## hispanic:public-african american:public -3.050505 -40.328047
## white:public-african american:public 37.410256 7.713941
## african american:private-african american:public 22.222222 -65.201363
## asian:private-african american:public 32.222222 -88.282874
## hispanic:private-african american:public 56.722222 -30.701363
## white:private-african american:public 41.370370 5.679841
## hispanic:public-asian:public -41.472727 -86.205778
## white:public-asian:public -1.011966 -39.655229
## african american:private-asian:public -16.200000 -107.053255
## asian:private-asian:public -6.200000 -129.215888
## hispanic:private-asian:public 18.300000 -72.553255
## white:private-asian:public 2.948148 -40.471255
## white:public-hispanic:public 40.460761 13.204393
## african american:private-hispanic:public 25.272727 -61.352452
## asian:private-hispanic:public 35.272727 -84.654403
## hispanic:private-hispanic:public 59.772727 -26.852452
## white:private-hispanic:public 44.420875 10.733306
## african american:private-white:public -15.188034 -98.831191
## asian:private-white:public -5.188034 -122.979259
## hispanic:private-white:public 19.311966 -64.331191
## white:private-white:public 3.960114 -21.082018
## asian:private-african american:private 10.000000 -133.651609
## hispanic:private-african american:private 34.500000 -82.791047
## white:private-african american:private 19.148148 -66.806028
## hispanic:private-asian:private 24.500000 -119.151609
## white:private-asian:private 9.148148 -110.295208
## white:private-hispanic:private -15.351852 -101.306028
## upr p adj
## asian:public-african american:public 84.682435 0.1830745
## hispanic:public-african american:public 34.227037 0.9999968
## white:public-african american:public 67.106572 0.0037897
## african american:private-african american:public 109.645807 0.9940087
## asian:private-african american:public 152.727318 0.9918484
## hispanic:private-african american:public 144.145807 0.4923467
## white:private-african american:public 77.060900 0.0111135
## hispanic:public-asian:public 3.260323 0.0910495
## white:public-asian:public 37.631298 1.0000000
## african american:private-asian:public 74.653255 0.9993729
## asian:private-asian:public 116.815888 0.9999999
## hispanic:private-asian:public 109.153255 0.9986147
## white:private-asian:public 46.367551 0.9999991
## white:public-hispanic:public 67.717130 0.0002527
## african american:private-hispanic:public 111.897907 0.9863047
## asian:private-hispanic:public 155.199858 0.9856403
## hispanic:private-hispanic:public 146.397907 0.4092274
## white:private-hispanic:public 78.108445 0.0019388
## african american:private-white:public 68.455123 0.9992935
## asian:private-white:public 112.603191 1.0000000
## hispanic:private-white:public 102.955123 0.9966995
## white:private-white:public 29.002246 0.9997163
## asian:private-african american:private 153.651609 0.9999990
## hispanic:private-african american:private 151.791047 0.9856340
## white:private-african american:private 105.102325 0.9973651
## hispanic:private-asian:private 168.151609 0.9995319
## white:private-asian:private 128.591504 0.9999980
## white:private-hispanic:private 70.602325 0.9993660
Comparing means between groups shows that only some pairs of races differ significantly. White students had a mean total score 35.93 points higher than African American students (p-value = 0.0007) and 36.22 points higher than Hispanic students (p-value = 0.0002). The mean total score for Hispanic students was 35.93 points lower than for Asian students, but with a p-value of 0.053 this difference falls just short of significance at the 0.05 level.
When testing whether the differences in total score by race hold within each type of schooling, we find that the difference between White and African American students remains in the public school setting: the mean total score is 37.41 points higher for White students in public schools than for African American students in public schools (p-value = 0.0038). However, there was no significant difference between the mean total scores of White students and African American students attending private schools. The difference between Hispanic and Asian students was not significant within either type of schooling; the p-value was greater than 0.05 for both private and public schooling. The difference between White and Hispanic students remained in public schools, where White students had mean total scores 40.46 points higher than Hispanic students (p-value = 0.0003). However, there was no significant difference between the mean total scores of Hispanic students and White students attending private schools.
Two additional comparisons were significant when looking at race and type of schooling together. White students attending private schools had mean total scores 41.37 points higher than African American students attending public schools (p-value = 0.011) and 44.42 points higher than Hispanic students attending public schools (p-value = 0.002).
In the two-way ANOVA, the p-value for type of schooling was 0.243; therefore, we fail to reject the null hypothesis, and there is no significant difference in mean total score by type of schooling. For the interaction between race and type of schooling, the p-value is 0.269, so we also fail to reject the null hypothesis. This gives no evidence that the relationship between race and total score depends on the type of schooling.
Although the students who participated in this survey were randomly sampled, there are only 200 observations in the data, which is a small sample on which to base these conclusions. We might therefore see different results in a larger study. Other factors that these tests do not capture may also contribute to differences in students’ performance on standardized testing. The dataset itself includes variables that we did not test and that may matter for mean total scores, such as socioeconomic status and type of program. Factors outside the dataset, such as class size, distance to school, or transportation needs, could also affect how much time a student has to study or how much sleep they get each night. The High School and Beyond survey data are a good starting point for identifying the variables that may affect a student’s performance on standardized testing, but more should be investigated before drawing any conclusions.
This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.
Student Name: Nicholaus Mlekelwa Semester: Spring 2019