1 Executive Summary

The Scholastic Aptitude Test (SAT) is an American examination taken by senior high school students. It helps universities and colleges gauge students' academic levels and gives them the information they need to decide on admission.1

The report examines data collected from 197 randomly selected high schools, a sample intended to represent the SAT score data of United States high schools for 2012-2013. 2 The aim of this report is to analyse the correlation between participation and SAT performance in high schools, to determine whether one variable affects the other. The report is intended to inform universities and colleges in student selection, and to help the education system influence or improve academic outcomes.

The SAT dataset (2012-2013) is used to examine the effect of district number and student participation on the percentage of students meeting or exceeding the benchmark score of 1550. 3 Comparing 2012 with 2013, the mean participation rate was 76.83% and 76.61% respectively, the mean percentage of students meeting the benchmark was 45.79% and 45.57%, and the correlation coefficient between participation rate and percentage meeting the benchmark was 0.52 and 0.46.
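The summary figures above can be reproduced with a short helper. This is a minimal sketch, assuming the `SAT` data frame loaded in Section 2.1 and the column names printed by `sapply(SAT, class)`; `summarise_year` is a hypothetical name. The report does not say how missing values were handled, so the output may differ slightly from the quoted figures.

```r
# Hypothetical helper (not part of the original report): summary statistics for one year.
# Assumes column names as printed by sapply(SAT, class) in Section 2.1.
summarise_year <- function(df, year) {
  part  <- df[[paste0("Participation.Rate..estimate...", year)]]
  bench <- df[[paste0("Percent.Meeting.Benchmark..", year)]]
  c(mean_participation = mean(part, na.rm = TRUE),               # ignore missing rates
    mean_benchmark     = mean(bench, na.rm = TRUE),              # ignore missing percentages
    correlation        = cor(part, bench, use = "complete.obs")) # schools with both values only
}

# Usage, once SAT has been loaded:
# round(rbind(`2012` = summarise_year(SAT, 2012),
#             `2013` = summarise_year(SAT, 2013)), 2)
```

Computing each year with the same function keeps the 2012 and 2013 figures directly comparable.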

Further analysis shows correlations of -0.050 (2012) and -0.037 (2013) between district number and percentage meeting the benchmark. These results suggest that participation rate may influence the percentage of students meeting the benchmark, while district number shows no meaningful linear relationship with students' performance against the benchmark.



2 Full Report

2.1 Initial Data Analysis (IDA)

SAT = read.csv("https://data.ct.gov/api/views/kbxi-4ia7/rows.csv?accessType=DOWNLOAD")

# LOAD DATA v2 - uncomment the line below to load the data from a local copy of the CSV file
#SAT = read.csv("SAT_School_Participation_and_Performance__2012-2013.csv")

Classification of Variables (complex)

Variables:


  • Test takers: 2012 (independent)
  • Test takers: 2013 (independent)
  • Test takers: Change (dependent)

  • Participation rate: 2012 (independent)
  • Participation rate: 2013 (independent)
  • Participation rate: Change (dependent)

  • Percent meeting benchmark: 2012 (independent)
  • Percent meeting benchmark: 2013 (independent)
  • Percent meeting benchmark: Change (dependent)

  • District number (Classification variable)
  • District (Classification variable)
  • School (Classification variable)

## R's classification of data
#class(SAT)

## R's classification of variables
#str(SAT)
sapply(SAT, class)
##                        District.Number 
##                              "integer" 
##                               District 
##                               "factor" 
##                                 School 
##                               "factor" 
##                      Test.takers..2012 
##                              "integer" 
##                      Test.takers..2013 
##                              "integer" 
##                   Test.takers..Change. 
##                              "integer" 
##    Participation.Rate..estimate...2012 
##                              "integer" 
##    Participation.Rate..estimate...2013 
##                              "integer" 
## Participation.Rate..estimate...Change. 
##                              "integer" 
##        Percent.Meeting.Benchmark..2012 
##                              "integer" 
##        Percent.Meeting.Benchmark..2013 
##                              "integer" 
##     Percent.Meeting.Benchmark..Change. 
##                              "integer"

For the SAT dataset, there are 197 rows (randomly selected high schools) and 12 columns (properties of high schools), 9 of which are numerical variables describing SAT test takers, participation and performance (2012-2013). This is a reasonably large sample of schools, which should give a better picture of overall SAT trends across America.

## Size of data
# For the SAT dataset, there are 197 rows (participating high schools) and 12 variables (properties of high schools).
dim(SAT)
## [1] 197  12
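Because some schools have missing values, functions such as `cor()` return `NA` unless those values are dropped. A quick completeness check is sketched below; `missing_summary` is a hypothetical helper that assumes the `SAT` data frame loaded above.

```r
# Hypothetical helper: how many values are missing in each column,
# and how many schools (rows) have no missing values at all.
missing_summary <- function(df) {
  list(missing_per_column = colSums(is.na(df)),
       complete_rows      = sum(complete.cases(df)))
}

# Usage: missing_summary(SAT)
```

Knowing how many schools are complete also shows how far the effective sample falls below the 197 rows reported by `dim(SAT)`.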

Source of Data

This dataset was found on Data.gov's data catalog, an American government website, which sourced it from the College Board. The College Board is an American non-governmental, not-for-profit organisation (NGO) that administers tests and gathers data for over 6,000 universities and colleges. 4 This data source is considered reliable because it is published through an American government website.

However, it is not a fully accurate representation of SAT benchmark performance, as the data was collected from only about 68,000 students out of the ~3.5 million students who took the SAT in 2012-2013. 5 In addition, each year includes a different group of students with different levels of knowledge and skill, and some values are missing, so the data may not accurately represent all students in 2012 and 2013.

Stakeholders

Stakeholders for the data include the US Department of Education, high schools in the US, and students deciding on a high school in the US.

Domain Knowledge

The dataset originates from the College Board, a not-for-profit organisation (NGO) that researches student success and opportunity. The data is centred on the SAT College and Career Readiness (CCR) Benchmark score of 1550. 6 This score is the combined score of the critical reading, mathematics and writing sections. According to the College Board, a student who achieves this benchmark has a “65 percent or greater likelihood of achieving a B- average or higher during the first year of college”. 7
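As a worked illustration of the combined score described above, the check below adds the three section scores and compares the total with 1550. The section scores in the usage line are invented for illustration, and `meets_ccr_benchmark` is a hypothetical name.

```r
# Hypothetical helper: does a student's combined score meet or exceed the
# 1550 College and Career Readiness benchmark? (2012-2013 SAT: three sections)
meets_ccr_benchmark <- function(reading, math, writing, benchmark = 1550) {
  (reading + math + writing) >= benchmark
}

# Invented example scores: 520 + 530 + 500 = 1550, which just meets the benchmark
meets_ccr_benchmark(520, 530, 500)
```

The function is vectorised, so it can also be applied to whole columns of section scores at once.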

The College Board created these benchmarks using the SAT results of 68,000 previous students, identifying the scores associated with a 65 percent likelihood of achieving a B- average or higher. 8 As a result, it set a benchmark score of 500 for an individual section.

The organisation then investigated student readiness to validate these benchmarks. It found that students who met the benchmarks were more likely to apply to and enrol in a university or college than students who did not, and that these students also received higher grades in both high school and university or college. 9


3 Research Questions

3.1 Research Question 1

Does district number affect the percentage of students meeting the performance benchmark?

The scatter plots “District Number VS Percentage Meeting Benchmark 2012” and “District Number Vs Percentage Meeting Benchmark 2013” yield correlations of -0.050 and -0.037, respectively.

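On further inspection, the points are widely scattered in both graphs. In conclusion, there is no meaningful correlation between a high school's district number and the percentage of students meeting the benchmark. This is unsurprising, since district number is an identifying code (a classification variable) rather than a measured quantity.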

##2012

# Fit the regression that matches the plotted axes: benchmark % (y) on district number (x)
L = lm(Percent.Meeting.Benchmark..2012 ~ District.Number, data = SAT)
plot(SAT$District.Number, SAT$Percent.Meeting.Benchmark..2012, xlab = "District Number", ylab = "Percentage Meeting Benchmark", main = "District Number Vs Percentage Meeting Benchmark (2012)")
abline(L)

# use = "complete.obs" drops schools with missing values; without it cor() returns NA
cor(SAT$District.Number, SAT$Percent.Meeting.Benchmark..2012, use = "complete.obs")
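The code above covers 2012 only, while the text also reports a 2013 correlation. A minimal sketch of a reusable version follows, assuming the `SAT` data frame and column names from Section 2.1; `plot_district_vs_benchmark` is a hypothetical helper, and it drops schools with missing values when computing the correlation.

```r
# Hypothetical helper: district number vs percentage meeting the benchmark for one
# year, with a regression line fitted on the same axes that are plotted.
plot_district_vs_benchmark <- function(df, year) {
  x <- df$District.Number
  y <- df[[paste0("Percent.Meeting.Benchmark..", year)]]
  fit <- lm(y ~ x)                                 # rows with NA are dropped by lm()
  plot(x, y, xlab = "District Number", ylab = "Percentage Meeting Benchmark",
       main = paste0("District Number Vs Percentage Meeting Benchmark (", year, ")"))
  abline(fit)
  cor(x, y, use = "complete.obs")                  # returned for printing or reuse
}

# Usage: plot_district_vs_benchmark(SAT, 2013)
```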

3.2 Research Question 2

How does participation rate affect the percentage of students meeting the benchmark in the 2012 SAT?

The graph below compares participation rate (x-axis) with the percentage of students meeting the benchmark (y-axis) in 2012. It shows a positive trend between the independent variable (x-axis) and the dependent variable (y-axis). In 2012, the correlation coefficient between participation rate and percentage of students meeting the benchmark is approximately 0.52 (2 d.p.). This indicates a moderate positive linear relationship; squaring it gives r² ≈ 0.27, so participation rate accounts for only about 27% of the variation in the percentage of students meeting the benchmark.

# use = "complete.obs" drops schools with missing values; without it cor() returns NA
cor(SAT$Participation.Rate..estimate...2012, SAT$Percent.Meeting.Benchmark..2012, use = "complete.obs")
L = lm(Percent.Meeting.Benchmark..2012 ~ Participation.Rate..estimate...2012, data = SAT)
plot(SAT$Participation.Rate..estimate...2012, SAT$Percent.Meeting.Benchmark..2012, xlab = "Participation rate estimate", ylab = "Percent meeting benchmark", main = "Participation rate estimate VS Percent meeting benchmark (2012)")

abline(L)

These results indicate that schools with higher participation rates tend to have a higher percentage of students scoring at or above the College Board benchmark. However, because the correlation is well below 1, participation rate explains only part of the variation and the relationship is far from conclusive; correlation alone also cannot show that one variable causes the other.

The scatter plot shows the direction of this relationship but does not quantify how much the percentage of students meeting the benchmark in 2012 changes with participation rate. In conclusion, the correlation indicates a moderate positive association between participation rate and the percentage of students meeting the benchmark, although the graph does not show the size of the effect. Overall, the graph suggests that the higher the participation rate, the higher the percentage of students meeting the benchmark.
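One way to put a number on the effect that the scatter plot leaves unquantified is to read the slope and R² from the fitted linear model. The sketch below assumes the `SAT` data frame from Section 2.1; `quantify_relationship` is a hypothetical helper, and its slope describes an association in this sample, not a causal effect.

```r
# Hypothetical helper: slope and R-squared of % meeting benchmark regressed on
# participation rate for one year.
quantify_relationship <- function(df, year) {
  x <- df[[paste0("Participation.Rate..estimate...", year)]]
  y <- df[[paste0("Percent.Meeting.Benchmark..", year)]]
  fit <- lm(y ~ x)                              # rows with NA are dropped by lm()
  c(slope     = unname(coef(fit)[2]),           # change in % meeting benchmark per 1-point rise in participation
    r_squared = summary(fit)$r.squared)         # share of the variation explained by participation rate
}

# Usage: quantify_relationship(SAT, 2012); quantify_relationship(SAT, 2013)
```

For a simple linear regression, R² equals the square of the correlation coefficient computed on the same complete cases.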

3.3 Research Question 3

Does the participation rate and performance benchmark change significantly from 2012 to 2013?

The scatter plot below shows the participation rate and benchmark performance of high schools in 2013. The regression line shows a positive trend, but several data points are outliers. These outliers can pull the regression line down and weaken the correlation coefficient.

# use = "complete.obs" drops schools with missing values; without it cor() returns NA
cor(SAT$Participation.Rate..estimate...2013, SAT$Percent.Meeting.Benchmark..2013, use = "complete.obs")
L = lm(Percent.Meeting.Benchmark..2013 ~ Participation.Rate..estimate...2013, data = SAT)
plot(SAT$Participation.Rate..estimate...2013, SAT$Percent.Meeting.Benchmark..2013, xlab = "Participation rate estimate", ylab = "Percent meeting benchmark", main = "Participation rate estimate VS Percent meeting benchmark (2013)")

abline(L)

The correlation coefficient is approximately 0.46 (2 d.p.). This measures the strength of the linear relationship between the independent and dependent variables, in this case participation rate and benchmark performance. A perfect positive linear relationship has a correlation coefficient of 1; the association between these two variables in 2013 is a moderate-to-weak linear relationship (r < 0.5). By comparison, the 2012 data has a correlation coefficient of 0.52, which is slightly stronger than 2013's.

In 2012 the mean participation rate was 76.83% and the mean percentage meeting the benchmark was 45.79% (2 d.p.), compared with 76.61% and 45.57% in 2013, so both averages declined only slightly, as did the correlation between the two variables. The weaker correlation in 2013 could be due to several outliers in the 2013 data, which may also lower the averages for participation and benchmark performance. Thus, participation rate and benchmark performance changed only to a small extent from 2012 to 2013, although the graphs alone cannot show whether this change is statistically significant.
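The report compares the two years descriptively. If a formal answer to Research Question 3 were wanted, one option (not used in the original analysis) is a paired t-test on the schools that report both years. The sketch below assumes the `SAT` data frame from Section 2.1 is loaded; `compare_years` is a hypothetical helper, and the test assumes the year-to-year differences are roughly normally distributed.

```r
# Hypothetical helper: paired comparison of one measure between 2012 and 2013
# at the same schools (not part of the original report's analysis).
compare_years <- function(df, measure) {
  x2012 <- df[[paste0(measure, "2012")]]
  x2013 <- df[[paste0(measure, "2013")]]
  ok <- complete.cases(x2012, x2013)               # keep schools with both years reported
  test <- t.test(x2013[ok], x2012[ok], paired = TRUE)
  list(mean_change = mean(x2013[ok] - x2012[ok]),  # average 2013 minus 2012 difference
       p_value     = test$p.value)
}

# Usage:
# compare_years(SAT, "Participation.Rate..estimate...")
# compare_years(SAT, "Percent.Meeting.Benchmark..")
```

Pairing by school removes differences between schools from the comparison, which suits data where the same schools appear in both years.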



4 Conclusion

In summary, these research questions evaluate the SAT data with regard to the effect of district number on student performance, and the relationship between participation rate and benchmark performance in 2012. They also compare the statistical results for 2012 and 2013 and assess how strongly the independent and dependent variables are associated.

The results show that student participation is associated with benchmark performance to a certain extent, whereas district number has little to no relationship with the percentage of students achieving the 1550 benchmark score that indicates readiness for college or university. Although the data is considered valid, as it was collected via random selection and published through a government data portal, it is not an accurate representation of all students sitting the SAT, as it covers only a small portion of them (68,000 out of ~3.5 million).


5 References

[1] College Board (2017). Class of 2017 SAT Results - 2017 SAT Suite of Assessments Program Results - The College Board. [online] College Board. Available at: https://reports.collegeboard.org/archive/sat-suite-program-results/2017/class-2017-results [Accessed 10 Apr. 2019].

[2] O’Day, S. (2018). SAT School Participation and Performance: 2012-2013 - Data.gov. [online] Catalog.data.gov. Available at: https://catalog.data.gov/dataset/sat-school-participation-and-performance-2012-2013 [Accessed 15 Apr. 2019].

[3] Wyatt, J., Kobrin, J., Wiley, A., Camara, W. J. and Proestler, N. (2018). SAT Benchmarks. 1st ed. [ebook] pp.1-10. Available at: https://files.eric.ed.gov/fulltext/ED521173.pdf [Accessed 17 Apr. 2019].



  1. [1]

  2. [3]

  3. [2]

  4. [2]

  5. [1]

  6. [2]

  7. [3]

  8. [3]

  9. [3]