Fernanda Aguila
# to set working directory
setwd("C:/Users/saints/Documents")
#to import data
sat = read.csv("sat.csv")
Is there a gender difference in performance on the SAT? Some people hypothesize that males do better on the math section and females outscore males on the verbal section. These perceptions may be so ingrained in our society that girls choose humanities-based majors while boys fill STEM-based majors. These perceived gender discrepancies may be altering pivotal life choices. What follows is an analysis of SAT scores in an effort to see whether there is indeed a gender difference in performance on the SAT.
303 observations were collected from NorCal High. The variables collected were gender, verbal SAT score, and math SAT score. There were 158 males and 145 females in this study.
table(sat$gndr)
  0   1
158 145
gndr <- (sat$gndr)
vrbl <- (sat$vrbl)
math <- (sat$math)
mv <- subset(vrbl, gndr=="0" )
fm <- subset(math, gndr=="1")
fv <- subset(vrbl, gndr=="1")
mm <- subset(math, gndr=="0")
mtotal <- (mm + mv)
ftotal <- (fm + fv)
totalsat <- (math + vrbl)
summary(totalsat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    650    1105    1230    1207    1330    1550
hist(totalsat, xlab = "total", main = "Total SAT scores for males and females combined")
The distribution of total SAT scores for the 303 students from NorCal High is unimodal and skewed left. The median SAT score at NorCal High is 1230, which is higher than the mean of 1207, as expected when the data are skewed left. The middle 50% of students scored between 1105 and 1330. Compared to a national average of 1050 on the SAT (Collegeboard.com), NorCal High students are doing much better than average; the 1st quartile for NorCal High is greater than the national average score.
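The claims in this paragraph can be checked directly from the reported summary. The following is a minimal sketch that retypes the values from the `summary(totalsat)` output by hand (it does not read `sat.csv`); the variable names are chosen here for illustration only.

```r
# Values copied from the summary(totalsat) output reported above
total_summary <- c(min = 650, q1 = 1105, median = 1230,
                   mean = 1207, q3 = 1330, max = 1550)
national_avg <- 1050  # national average cited in the text

# A mean below the median is consistent with a left (negative) skew
mean_minus_median <- unname(total_summary["mean"] - total_summary["median"])

# Width of the middle 50% of scores (interquartile range)
iqr_total <- unname(total_summary["q3"] - total_summary["q1"])

# TRUE means at least 75% of students scored above the national average
q1_above_national <- unname(total_summary["q1"] > national_avg)

print(mean_minus_median)
print(iqr_total)
print(q1_above_national)
```

The mean falls 23 points below the median, the interquartile range is 225 points, and the first quartile exceeds the national average.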
According to the summary statistics, males score higher, on average, than females. When comparing centers, males score 45 points higher based on the median (male = 1255 and female = 1210). Every summary statistic except the minimum is higher for males than for females (the male minimum is 650; the female minimum is 750). Both distributions are slightly skewed left (as indicated by mean values that are less than the median). The middle 50% of males scored between 1110 and 1350, while the middle 50% of females scored between 1090 and 1320.
Summary for Male Performance
summary(mtotal)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    650    1110    1255    1226    1350    1550
Summary for Female Performance
summary(ftotal)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    750    1090    1210    1186    1320    1520
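The male and female summaries above can be compared statistic by statistic. This is a small sketch that retypes the two reported summaries by hand rather than recomputing them from the data; `male_total`, `female_total`, and `gap` are names introduced here for illustration.

```r
# Values copied from the summary(mtotal) and summary(ftotal) output above
stat_names <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
male_total   <- setNames(c(650, 1110, 1255, 1226, 1350, 1550), stat_names)
female_total <- setNames(c(750, 1090, 1210, 1186, 1320, 1520), stat_names)

# Positive values mean the male statistic is higher
gap <- male_total - female_total
print(gap)
```

The gaps are 45 points at the median and 40 points at the mean, while the minimum is the one statistic where the male value is lower (by 100 points).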
boxplot(totalsat ~ gndr, horizontal = TRUE, ylab = "Males = 0, females = 1", xlab = "SAT Score", main = "SAT by Gender")
There is a common belief that boys perform more strongly in math than girls, while girls have a better aptitude for language arts. Does this translate to the SAT? The following subsets will compare genders within each section.
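As an alternative to building each subset by hand, grouped summaries can be produced in one call with `tapply`. The sketch below uses a toy data frame with made-up scores; only the column names (`gndr`, `vrbl`, `math`) follow the variables in this report, and nothing here reproduces the NorCal High results.

```r
# Toy data frame with made-up values; only the column names
# (gndr, vrbl, math) follow the sat.csv variables used in this report
toy <- data.frame(
  gndr = c(0, 0, 0, 1, 1, 1),
  vrbl = c(500, 600, 700, 520, 610, 690),
  math = c(550, 650, 750, 500, 600, 700)
)

# One summary per gender for each section, without manual subsets
math_by_gender <- tapply(toy$math, toy$gndr, summary)
vrbl_by_gender <- tapply(toy$vrbl, toy$gndr, summary)

print(math_by_gender)
print(vrbl_by_gender)

# Group medians only
math_medians <- tapply(toy$math, toy$gndr, median)
print(math_medians)
```

With the real data, the analogous call would presumably be `tapply(sat$math, sat$gndr, summary)`.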
Summary male math
summary(mm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    290     570     640     629     690     800
Summary female math
summary(fm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  360.0   530.0   600.0   592.1   660.0   800.0
boxplot(math ~ gndr, main = "Math Scores by Gender", horizontal = TRUE, xlab = "Math Score", ylab = "Males = 0, Females = 1")
Considering the middle 50% of students surveyed, the males score noticeably higher than the females, although no significance test has been run. The middle 50% of males score between 570 and 690, while the middle 50% of females score between 530 and 660. That being said, both groups are scoring higher than the national average for math, which is 520. Specifically, at least 75% of each group scores higher than the national average, since both first quartiles (570 and 530) exceed 520.
Summary male verbal
summary(mv)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  310.0   540.0   600.0   596.6   667.5   800.0
Summary female verbal
summary(fv)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  280.0   530.0   610.0   594.1   660.0   770.0
boxplot(vrbl ~ gndr, main = "Verbal Scores by Gender", horizontal = TRUE, xlab = "Verbal Score", ylab = "Males = 0, Females = 1")
While the distributions seem to indicate that males from NorCal High are performing better than females on total and math scores, formal hypothesis tests would need to be performed to show whether the difference is statistically significant. This is a large sample (n = 303), but performance on the SAT may vary from year to year. In addition, we can only generalize our findings to this high school, because the sample was specific to NorCal High. A better approach to answering this question may be to take a random sample of SAT data from nationwide results.
It may also be beneficial to collect information on other variables, in addition to gender, verbal, and math scores. Some other variables to consider are GPA (quantitative), number of AP/Honors courses taken (quantitative), race/ethnicity (categorical), highest math course taken (categorical), socioeconomic status based on parent income (categorical), and the region of the United States in which the student lives (categorical). This would allow us to expand our research beyond gender, as there are many other factors that affect SAT performance.
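One possible follow-up test is a Welch two-sample t-test on total scores. The sketch below runs on simulated scores, not the NorCal High data: the group sizes and means are taken from the report, but the standard deviation of 180 and the normal shape are placeholder assumptions, so the printed result says nothing about the real students.

```r
# Simulated scores, NOT the NorCal High data: group sizes and means follow
# the report, but sd = 180 and the normal shape are placeholder assumptions
set.seed(1)
male_scores   <- rnorm(158, mean = 1226, sd = 180)
female_scores <- rnorm(145, mean = 1186, sd = 180)

# Welch two-sample t-test (t.test default: var.equal = FALSE)
result <- t.test(male_scores, female_scores)
print(result)

# With the real data, the analogous call would be t.test(mtotal, ftotal)
```

Because the observed distributions are skewed left, a rank-based alternative such as `wilcox.test(mtotal, ftotal)` may also be worth considering.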