This report analyzes the CollegeScores4yr data set from https://www.lock5stat.com.
I propose the following 12 questions based on my own understanding of the data.
What is the mean of the average SAT scores for all the colleges in the data?
What is the standard deviation of the average SAT scores for all the colleges in the data set?
What is the range of the average SAT score for the entire data set?
What is the distribution of the average SAT scores for the entire data set?
What is the median of the average SAT scores for the entire data set?
What is the mean of the average SAT scores for private colleges in the data?
What is the standard deviation of the average SAT scores for private colleges in the data?
What is the median of the average SAT scores for private colleges in the data?
What is the mean of the average SAT scores for public colleges in the data?
What is the standard deviation of the average SAT scores for public colleges in the data?
What is the median of the average SAT scores for public colleges in the data?
How do the performance and distribution of the average SAT scores differ between public and private schools?
## [1] 1135.25
The mean of the average SAT scores across the data set is 1135.25.
## [1] 128.9077
The standard deviation of the average SAT scores for the data set is 128.91.
## [1] 564 1558
The average SAT scores in the data set range from 564 to 1558.
## [1] 1121
The median of the average SAT scores for the entire data set is 1121.
## [1] 1145.839
The mean of the average SAT scores for private and for-profit colleges is 1145.84.
## [1] 139.3497
The standard deviation of the average SAT scores for private and for-profit colleges in the data is 139.35.
## [1] 1124
The median of the average SAT scores for private and for-profit colleges in the data is 1124.
## [1] 1118.91
The mean of the average SAT scores for public colleges in the data is 1118.91.
## [1] 109.2472
The standard deviation of the average SAT scores for public colleges in the data is 109.25.
## [1] 1105
The median of the average SAT scores for public colleges in the data is 1105.
Here is the code for the analysis:
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
mean(college$AvgSAT, na.rm=TRUE)
## [1] 1135.25
sd(college$AvgSAT, na.rm=TRUE)
## [1] 128.9077
range(college$AvgSAT, na.rm=TRUE)
## [1] 564 1558
hist(college$AvgSAT, main="Histogram of average SAT scores", xlab="Average SAT Score")
median(college$AvgSAT, na.rm=TRUE)
## [1] 1121
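The overall statistics above are computed one call at a time. As a minimal sketch, they could also be gathered in a single step. The helper name `describe_sat` is hypothetical and not part of the original analysis; it assumes a numeric vector such as `college$AvgSAT` and drops missing values the same way `na.rm = TRUE` does.

```r
# Hypothetical helper (an illustration, not the author's code):
# collect the summary statistics used in this report into one named vector.
describe_sat <- function(x) {
  x <- x[!is.na(x)]          # drop missing scores, equivalent to na.rm = TRUE
  c(n      = length(x),      # number of schools with a reported score
    mean   = mean(x),
    sd     = sd(x),
    median = median(x),
    min    = min(x),         # min and max together give the range
    max    = max(x))
}
```

With the data loaded as above, `describe_sat(college$AvgSAT)` should return the same mean, standard deviation, median, and range as the separate calls.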
private_colleges <- college[college$Control %in% c("Private", "Profit"), ]
mean_private_sat <- mean(private_colleges$AvgSAT, na.rm = TRUE)
mean_private_sat
## [1] 1145.696
sd_private_sat <- sd(private_colleges$AvgSAT, na.rm = TRUE)
sd_private_sat
## [1] 139.1025
median_private_sat <- median(private_colleges$AvgSAT, na.rm=TRUE)
median_private_sat
## [1] 1124
public_colleges <- college[college$Control == "Public", ]
mean_public_sat <- mean(public_colleges$AvgSAT, na.rm = TRUE)
mean_public_sat
## [1] 1118.91
sd_public_sat <- sd(public_colleges$AvgSAT, na.rm = TRUE)
sd_public_sat
## [1] 109.2472
median_public_sat <- median(public_colleges$AvgSAT, na.rm = TRUE)
median_public_sat
## [1] 1105
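To compare the two groups side by side, the per-group statistics can be put in one table. The sketch below is only an illustration: the function name `sat_by_sector` and the `Sector` label are hypothetical, and it assumes the `Control` column holds the values "Public", "Private", and "Profit", with "Profit" folded into "Private" as in the subsetting code above.

```r
# Hypothetical helper (an illustration, not the author's code):
# mean, standard deviation, and median of AvgSAT for public vs. private schools.
sat_by_sector <- function(df) {
  # Assumption: every non-public school ("Private" or "Profit") counts as private
  sector <- ifelse(as.character(df$Control) == "Public", "Public", "Private")
  keep   <- !is.na(df$AvgSAT) & !is.na(sector)   # drop rows with missing values
  sat    <- df$AvgSAT[keep]
  sector <- sector[keep]
  means  <- tapply(sat, sector, mean)
  data.frame(
    Sector = names(means),
    Mean   = as.vector(means),
    SD     = as.vector(tapply(sat, sector, sd)),
    Median = as.vector(tapply(sat, sector, median))
  )
}
```

Calling `sat_by_sector(college)` at this point in the analysis should give one row per group, which makes differences in center and spread easier to read than separate printouts.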
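# Note: restricting the levels to "Public" and "Private" turns "Profit" rows into NA,
# so for-profit schools are left out of this plot
college$Control <- factor(college$Control, levels = c("Public", "Private"))
boxplot(AvgSAT ~ Control, data = college, main = "SAT Scores by School Type",
        ylab = "Average SAT Score", xlab = "School Type")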
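Because the `factor()` call above lists only "Public" and "Private" as levels, any "Profit" rows become NA and do not appear in the boxplot, whereas the earlier statistics group for-profit schools with private ones. A sketch of a plot that keeps the same grouping could look like the following. The function name `plot_sat_by_sector` and the `Sector` column are hypothetical, and the assumption that "Profit" belongs with "Private" is taken from the subsetting code, not confirmed elsewhere.

```r
# Hypothetical helper (an illustration, not the author's code):
# boxplot of AvgSAT by school type, counting for-profit schools as private.
plot_sat_by_sector <- function(df) {
  control <- as.character(df$Control)
  # Assumption: every non-public school ("Private" or "Profit") counts as private
  df$Sector <- factor(ifelse(control == "Public", "Public", "Private"),
                      levels = c("Public", "Private"))
  # boxplot() draws the plot and returns the underlying five-number summaries
  boxplot(AvgSAT ~ Sector, data = df,
          main = "SAT Scores by School Type",
          ylab = "Average SAT Score", xlab = "School Type")
}
```

This would need to be called on a copy of the data in which `Control` still holds its original values, for example a freshly loaded data frame, since the line above has already overwritten that column.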