Overview

The purpose of this project is to determine which is more strongly related to first-year college GPA: SAT scores or high school GPA.

Introduction

High school GPA is meant to represent the average of the final grades a student earns in high school courses over time. It should therefore indicate how well a student can sustain a workload. The SAT, by contrast, can be seen as an evaluation of everything a student has learned, one that challenges common sense and requires students to apply their own knowledge. Which of the two better indicates how well a student will do in college? We will use a dataset obtained through OpenIntro, which was originally collected by the Educational Testing Service.

Exploring the Data

Storing data

# Storing data in environment 
library(openintro)
## Please visit openintro.org for free statistics materials
## 
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
## 
##     cars, trees
satGPA <- satGPA

Further into data

# Using the str() function to have an overview of the data 
str(satGPA) 
## 'data.frame':    1000 obs. of  6 variables:
##  $ sex   : int  1 2 2 1 1 2 1 1 2 1 ...
##  $ SATV  : int  65 58 56 42 55 55 57 53 67 41 ...
##  $ SATM  : int  62 64 60 53 52 56 65 62 77 44 ...
##  $ SATSum: int  127 122 116 95 107 111 122 115 144 85 ...
##  $ HSGPA : num  3.4 4 3.75 3.75 4 4 2.8 3.8 4 2.6 ...
##  $ FYGPA : num  3.18 3.33 3.25 2.42 2.63 2.91 2.83 2.51 3.82 2.54 ...

The satGPA dataset contains 1000 observations of 6 variables: sex for the gender of the student (stored as an integer code), SATV for the verbal SAT percentile, SATM for the math SAT percentile, SATSum for the total of the verbal and math SAT percentiles, HSGPA for high school grade point average, and FYGPA for first-year college grade point average. The three variables we focus on are SATSum, HSGPA, and FYGPA.
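Before plotting, it can help to look at the three variables of interest on their own. The following is a minimal sketch, assuming the same version of the openintro package used in this report, in which the dataset is named satGPA and has the columns shown by str() above (other releases of the package may use different names). The object names focus, missing_counts, and sum_matches are introduced here only for illustration.

```r
library(openintro)

# Keep only the three variables the analysis focuses on
focus <- satGPA[, c("SATSum", "HSGPA", "FYGPA")]

# Minimum, quartiles, mean, and maximum of each variable
summary(focus)

# Number of missing values in each of the three columns
missing_counts <- colSums(is.na(focus))
missing_counts

# Check whether SATSum equals SATV + SATM in every row,
# as the variable description suggests
sum_matches <- all(satGPA$SATSum == satGPA$SATV + satGPA$SATM)
sum_matches
```

If sum_matches is TRUE, SATSum carries no information beyond the two section scores, which is why the analysis can use it as a single summary of SAT performance.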

Scatterplots

# Creating a scatterplot of SATSum against FYGPA
plot(satGPA$SATSum, satGPA$FYGPA, main = "Sum of SAT scores and first-year college GPA", xlab = "SATSum", ylab = "FYGPA")

# Creating a scatterplot of HSGPA against FYGPA
plot(satGPA$HSGPA, satGPA$FYGPA, main = "High school GPA and first-year college GPA", xlab = "HSGPA", ylab = "FYGPA")

Analysis

Correlation coefficients

# Finding the correlation coefficient between SATSum and FYGPA
cor(satGPA$SATSum, satGPA$FYGPA)
## [1] 0.460281

The linear correlation coefficient between SATSum and FYGPA is 0.460281.

# Finding the correlation coefficient between HSGPA and FYGPA
cor(satGPA$HSGPA, satGPA$FYGPA)
## [1] 0.5433535

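The linear correlation coefficient between HSGPA and FYGPA is 0.5433535.

Based on the two correlation coefficients, I would say first-year college GPA has a stronger linear relationship with high school GPA (about 0.54) than with the sum of SAT scores (about 0.46).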
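The two coefficients are point estimates, so it may be worth checking how precisely each one is estimated. The sketch below uses cor.test() from base R's stats package to get a 95% confidence interval and p-value for each correlation. It assumes the same openintro version and column names as the rest of this report; sat_test, hs_test, and comparison are illustrative names, not part of the original analysis.

```r
library(openintro)

# Pearson correlation tests (cor.test() defaults to Pearson, 95% interval)
sat_test <- cor.test(satGPA$SATSum, satGPA$FYGPA)
hs_test <- cor.test(satGPA$HSGPA, satGPA$FYGPA)

# Collect the estimates and their confidence intervals side by side
comparison <- data.frame(
  predictor = c("SATSum", "HSGPA"),
  r = unname(c(sat_test$estimate, hs_test$estimate)),
  ci_lower = c(sat_test$conf.int[1], hs_test$conf.int[1]),
  ci_upper = c(sat_test$conf.int[2], hs_test$conf.int[2]),
  p_value = c(sat_test$p.value, hs_test$p.value)
)
comparison
```

If the two intervals overlap heavily, the difference between the coefficients should be read with more caution. Note that this compares the intervals informally and is not a formal test of the difference between two correlations measured on the same students.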

Conclusions

Based on the data we analyzed, the relationship between high school GPA and first-year college GPA is stronger than the relationship between the sum of SAT scores and first-year college GPA. Therefore, we conclude that high school GPA can be a better predictor of how well a student will do during his or her first year of college.
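One way to put the word "predictor" in concrete terms is to fit a simple linear regression for each variable and compare the share of variation in FYGPA that each one explains. The following is a sketch under the same assumptions as the rest of the report (same openintro version and column names); fit_sat, fit_hs, r2_sat, and r2_hs are names chosen here for illustration. In a simple linear regression, R-squared equals the square of the correlation coefficient, so squaring the values reported above gives roughly 0.21 for SATSum and 0.30 for HSGPA.

```r
library(openintro)

# One simple linear regression per candidate predictor
fit_sat <- lm(FYGPA ~ SATSum, data = satGPA)
fit_hs <- lm(FYGPA ~ HSGPA, data = satGPA)

# Proportion of variance in FYGPA explained by each model
r2_sat <- summary(fit_sat)$r.squared
r2_hs <- summary(fit_hs)$r.squared
c(SATSum = r2_sat, HSGPA = r2_hs)
```

Either way, most of the variation in first-year GPA is left unexplained by each variable on its own, so "better predictor" here means better of the two, not a precise one.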

Limitations

One thing that concerned me was the SAT scores. The data appear to come from the old version of the SAT, which had only verbal and math sections. The more recent version has three sections: reading (which might be considered verbal), math, and writing. We cannot know for sure how different the results would have been had we used an SAT score composed of three sections.
Another limitation is that the data were collected from a single college. Data from other colleges around the country, or even around the world, would have given us a richer dataset and conclusions that apply more broadly.


This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.
Student Name: Oreadj Chavannes
Semester: Fall 2018