# Set the working directory
setwd("C:/users/saints/Documents")
# Import the data
sat <- read.csv("sat.csv")

Introduction- Is there a gender difference in performance on the SAT? Some people hypothesize that males do better on the math section and that females outscore males on the English section. These perceptions may be so ingrained in our society that girls choose humanities-based majors while boys fill STEM-based majors. These perceived gender discrepancies may be altering pivotal life choices. What follows is an analysis of SAT scores in an effort to see whether there is indeed a gender difference in performance on the SAT.

The Data- 303 observations were collected from NorCal High. The variables collected were gender, English SAT score, and math SAT score. There were 158 males and 145 females in the study.
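The group sizes above, and the per-gender totals built in the code that follows, can also be produced in one step by turning gender into a labelled factor. This is a minimal sketch, assuming gender is coded 0 = male and 1 = female (as the variable names `me`, `mm`, `fe`, `fm` suggest); the helper `totals_by_gender` and the toy data frame are hypothetical illustrations, not the NorCal High data.

```r
# Hypothetical helper: count students and split total SAT scores by gender.
# Assumes gender is coded 0 = male and 1 = female.
totals_by_gender <- function(df) {
  grp <- factor(df$gender, levels = c(0, 1), labels = c("male", "female"))
  total <- df$math + df$english
  list(counts = table(grp),
       totals = split(total, grp))
}

# Toy data for illustration only (not the study data);
# with the real data use totals_by_gender(sat)
toy <- data.frame(gender  = c(0, 1, 0, 1, 0),
                  english = c(600, 650, 500, 700, 550),
                  math    = c(700, 600, 650, 550, 600))
toy_res <- totals_by_gender(toy)
toy_res$counts
sapply(toy_res$totals, mean)
```

With the real data, `toy_res$counts` should match the 158 males and 145 females reported above if the coding assumption holds.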

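# Pull out the three variables
english <- sat$english
gender <- sat$gender
math <- sat$math
# Split each section by gender (0 = male, 1 = female, as the names me/mm/fe/fm imply)
me <- subset(english, gender == 0)
mm <- subset(math, gender == 0)
fe <- subset(english, gender == 1)
fm <- subset(math, gender == 1)
# Total SAT score (math + English) for males, females, and everyone
mtotal <- mm + me
ftotal <- fm + fe
total <- math + english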
summary(math)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   290.0   550.0   620.0   611.3   680.0   800.0
summary(english)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   280.0   535.0   600.0   595.4   665.0   800.0
hist(total, main = "SAT Scores Among All Students", xlab = "Total SAT score (math + English)", col = "hotpink")

The histogram is unimodal and slightly skewed left. The mean total score is about 1207 (611.3 math + 595.4 English), and scores range roughly from 800 to 1500.
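Whether the total scores are really skewed left can be checked numerically with the sample moment coefficient of skewness, g1 = m3 / m2^(3/2), where m_k is the k-th central moment; a negative value indicates a longer left tail. This is a base R sketch; `skewness_g1` is a hypothetical helper written here, not a function from an R package, and the toy vector is made up for illustration.

```r
# Moment coefficient of skewness: g1 = m3 / m2^(3/2),
# where m_k = mean((x - mean(x))^k). Negative values mean a longer left tail.
skewness_g1 <- function(x) {
  x  <- x[!is.na(x)]          # drop missing scores
  d  <- x - mean(x)           # deviations from the mean
  m2 <- mean(d^2)             # second central moment
  m3 <- mean(d^3)             # third central moment
  m3 / m2^(3 / 2)
}

# Hypothetical scores with one very low value (long left tail);
# with the real data use skewness_g1(total)
skewness_g1(c(700, 1150, 1200, 1220, 1250, 1280, 1300, 1320))
```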

# Group labels assume gender is coded 0 = male, 1 = female
boxplot(total ~ gender, horizontal = TRUE, names = c("Male", "Female"), main = "Gender Score Comparison", xlab = "Total SAT score", ylab = "Gender", col = "hotpink")

The boxplots show that the median male score is higher than the median female score. Both distributions are slightly skewed left, with the middle 50% of scores falling roughly between 1100 and 1300. The male boxplot has one unusually low outlier, but even so, male scores are generally higher.
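The medians, middle 50%, and outliers described above can be read off as numbers instead of estimated from the plot. This sketch uses base R's `boxplot.stats()`, which computes the same hinges, whiskers, and outliers that `boxplot()` draws; `describe_box` is a hypothetical helper and the toy scores are made up for illustration.

```r
# Summarise one group's scores the way boxplot() draws them.
# boxplot.stats()$stats holds: lower whisker, lower hinge, median,
# upper hinge, upper whisker; $out holds points beyond 1.5 * hinge spread.
describe_box <- function(scores) {
  s <- boxplot.stats(scores)
  list(median   = s$stats[3],
       middle50 = s$stats[c(2, 4)],
       outliers = s$out)
}

# Hypothetical scores for illustration only; with the real data use
# describe_box(mtotal) and describe_box(ftotal)
toy_scores <- c(1150, 1200, 1220, 1250, 1280, 1300, 1320, 700)
describe_box(toy_scores)
```

Note that `boxplot()` uses Tukey's hinges, which can differ slightly from the quartiles printed by `summary()`.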

plot(fe, fm, main = "Female English and Math Scores", xlab = "English score", ylab = "Math score", col = "hotpink")

plot(me, mm, main = "Male English and Math Scores", xlab = "English score", ylab = "Math score", col = "hotpink")
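The two scatterplots show how English and math scores move together within each gender. One way to put a number on that relationship is Pearson's correlation via base R's `cor()`. This is a sketch; `section_correlation` is a hypothetical helper, it assumes `fe`/`fm` (and `me`/`mm`) are paired by student, which holds because both come from the same rows of `sat`, and the toy vectors are made up.

```r
# Pearson correlation between English and math scores within one group.
# Values near 1 mean students who score high on one section tend to
# score high on the other; values near 0 mean little linear relationship.
section_correlation <- function(english, math) {
  stopifnot(length(english) == length(math))  # scores must be paired
  cor(english, math, method = "pearson")
}

# Hypothetical paired scores (perfectly linear); with the real data use
# section_correlation(fe, fm) and section_correlation(me, mm)
toy_english <- c(500, 550, 600, 650, 700)
toy_math    <- c(520, 570, 620, 670, 720)
section_correlation(toy_english, toy_math)
```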

Conclusion

While the distributions suggest that males at NorCal High are performing better than females, further tests would be needed to determine whether the difference is statistically significant. The sample is fairly large (n = 303), but performance on the SAT may vary from year to year. In addition, because the sample came only from NorCal High, the findings can be generalized only to this high school. A better approach to answering this question may be to take a random sample of SAT results nationwide. It may also be useful to collect variables beyond gender and English and math scores, such as GPA, number of AP/Honors courses taken, race, highest math course taken, socioeconomic status based on parent income, and the region of the United States in which the student lives. This would let the research extend beyond gender, since many other factors affect SAT performance.
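The conclusion notes that a formal significance test is still needed. One common choice (not one the original analysis ran) is Welch's two-sample t-test on total scores, with the Wilcoxon rank-sum test as a check that does not rely on normality, since both distributions look slightly skewed. This is a minimal sketch using base R's `t.test()` and `wilcox.test()`; `compare_totals` is a hypothetical helper, the 0.05 threshold is an assumed convention, and the toy vectors are made up. With the real data you would pass `mtotal` and `ftotal`.

```r
# Compare male and female total SAT scores.
# t.test() defaults to Welch's test (var.equal = FALSE), which does not
# assume the two groups have equal variances.
compare_totals <- function(male_total, female_total, alpha = 0.05) {
  welch <- t.test(male_total, female_total)
  ranks <- wilcox.test(male_total, female_total, exact = FALSE)
  list(mean_difference = unname(welch$estimate[1] - welch$estimate[2]),
       welch_p         = welch$p.value,
       wilcoxon_p      = ranks$p.value,
       significant     = welch$p.value < alpha)
}

# Hypothetical totals for illustration only;
# with the real data use compare_totals(mtotal, ftotal)
toy_male   <- c(1250, 1300, 1180, 1220, 1350, 1280)
toy_female <- c(1200, 1150, 1230, 1190, 1170, 1210)
compare_totals(toy_male, toy_female)
```

A small p-value would support a real difference at NorCal High, but it would not address the generalization limits described above.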