Introduction

This report analyzes a dataset of four-year colleges. The data come from an external site and are available at https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv

I have 11 questions to evaluate and analyze based on the data provided.

Analysis

This section loads the data and works through each question in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

1. What are the average enrollment and the average completion rate across all the colleges in the data?

mean(college$Enrollment, na.rm = TRUE)
## [1] 4484.831
mean(college$CompRate, na.rm = TRUE)
## [1] 52.13524

The average undergraduate enrollment is about 4485 students per college, and the average completion rate is 52.14%.
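Every summary in this report passes na.rm = TRUE, so rows with missing values are dropped without any notice. As a rough sketch of how to check how many values are affected, the block below counts missing entries per column. It uses only the first six rows printed by head(college) above, and the names first6, missing_counts, and used_counts are my own, chosen for illustration.

```r
# First six rows as printed by head(college) above; illustration only,
# not the full dataset.
first6 <- data.frame(
  Enrollment = c(4824, 12866, 322, 6917, 4189, 32387),
  MidACT     = c(18, 25, NA, 28, 18, 28),
  AdmitRate  = c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)
)

# Number of missing values in each column
missing_counts <- colSums(is.na(first6))
print(missing_counts)

# Number of values each na.rm = TRUE summary would actually use
used_counts <- colSums(!is.na(first6))
print(used_counts)
```

On the full data, colSums(is.na(college)) should give the same kind of count for every column at once.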

2. What are the mean and variance of the MidACT scores across all the colleges?

mean(college$MidACT, na.rm = TRUE)
## [1] 23.53514

The mean MidACT score across all the colleges is 23.54 points.

var(college$MidACT, na.rm = TRUE)
## [1] 13.34888

The variance of the MidACT scores across all the colleges is 13.35. Variance is measured in squared points, not points.
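Because variance is in squared units, the standard deviation (its square root) is usually easier to read next to the mean. A minimal sketch of that relationship is below, using the MidACT values from the first six rows of head(college) rather than the full data. The variable names are mine. The last line applies the same conversion to the variance reported above, which works out to roughly 3.65 points.

```r
# MidACT values from the first six rows of head(college); illustration only
mid_act <- c(18, 25, NA, 28, 18, 28)

act_var <- var(mid_act, na.rm = TRUE)  # in squared points
act_sd  <- sd(mid_act, na.rm = TRUE)   # in points
print(c(variance = act_var, sd = act_sd))

# The standard deviation is the square root of the variance
print(isTRUE(all.equal(sqrt(act_var), act_sd)))

# Same conversion applied to the full-data variance reported above
print(sqrt(13.34888))
```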

3. What are the mean and standard deviation of the percentage of students awarded a Pell grant?

mean(college$Pell, na.rm = TRUE)
## [1] 37.85296

The mean percentage of students awarded a Pell grant is 37.85%.

sd(college$Pell, na.rm = TRUE)
## [1] 17.88267

The standard deviation of the percentage of students awarded a Pell grant is 17.88 percentage points.

4. What is the correlation between the percentages of Black and Hispanic students enrolled at each college?

cor(college$Black, college$Hispanic, use = "complete.obs")
## [1] -0.1528694

The correlation between the percentage of Black students and the percentage of Hispanic students is -0.15, a weak negative relationship.

5. What is the distribution of the median family income of students enrolled in college (histogram)?

hist(college$MedIncome, main = "Median Income Histogram", xlab = "Median family income ($1000s)", ylab = "Number of colleges", col = "blue")

6. What are the median and 90th percentile of the average SAT scores?

median(college$AvgSAT, na.rm = TRUE)
## [1] 1121

The median of the colleges' average SAT scores is 1121.

quantile(college$AvgSAT, 0.90, na.rm = TRUE)
##    90% 
## 1315.4

The 90th percentile of the average SAT scores is about 1315, so roughly 10% of colleges have an average SAT score above that value.
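The median is the 50th percentile, so both numbers can come from a single quantile() call. The sketch below shows this on the AvgSAT values from the first six rows of head(college), not on the full data. The names avg_sat and sat_q are mine.

```r
# AvgSAT values from the first six rows of head(college); illustration only
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# Request the 50th and 90th percentiles in one call
sat_q <- quantile(avg_sat, probs = c(0.5, 0.9), na.rm = TRUE)
print(sat_q)

# The 50th percentile matches median()
print(sat_q[["50%"]] == median(avg_sat, na.rm = TRUE))
```

On the full data the equivalent call would be quantile(college$AvgSAT, probs = c(0.5, 0.9), na.rm = TRUE).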

7. How much does the admission rate vary across the colleges (boxplot)?

boxplot(college$AdmitRate, main = "Admission Rate Distribution", col = "red", xlab = "AdmitRate", horizontal = TRUE)
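A boxplot shows the spread visually, but a few numbers make "how much it varies" concrete. As a sketch, the block below computes the range, interquartile range, and standard deviation for the AdmitRate values in the first six rows of head(college). These are not the full-data results, and the names admit and spread are mine.

```r
# AdmitRate values from the first six rows of head(college); illustration only
admit <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

# Common numeric measures of spread, ignoring missing values
spread <- c(
  min = min(admit, na.rm = TRUE),
  max = max(admit, na.rm = TRUE),
  IQR = IQR(admit, na.rm = TRUE),  # width of the box in the boxplot
  sd  = sd(admit, na.rm = TRUE)
)
print(round(spread, 4))
```

The same calls on college$AdmitRate would give the values that correspond to the boxplot above.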

8. What is the correlation between the percentage of first-generation students and the completion rate?

cor(college$FirstGen, college$CompRate, use = "complete.obs")
## [1] -0.6643909

The correlation between the percentage of first-generation students and the completion rate is -0.66, a moderately strong negative relationship.
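One way to read a correlation is to square it: in a simple linear regression with one predictor, the squared correlation equals the share of variance explained (R-squared). For the value above, (-0.6643909)^2 is about 0.44. The sketch below checks that identity on the first six rows of head(college) only. The variable names are mine, and the six-row result is not the full-data result.

```r
# FirstGen and CompRate from the first six rows of head(college); illustration only
first_gen <- c(36.6, 34.1, 51.3, 31.0, 34.3, 22.6)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Pearson correlation and its square
r <- cor(first_gen, comp_rate)
r_squared <- r^2

# R-squared from a simple linear regression of CompRate on FirstGen
fit <- lm(comp_rate ~ first_gen)
lm_r_squared <- summary(fit)$r.squared

print(c(r = r, r_squared = r_squared, lm_r_squared = lm_r_squared))

# Squared correlation reported for the full data above
print((-0.6643909)^2)
```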

9. What is the median percentage of female students?

median(college$Female, na.rm = TRUE)
## [1] 59.15

The median percentage of female students is 59.15%.

10. What is the average percentage of first-generation students?

mean(college$FirstGen, na.rm = TRUE)
## [1] 33.55713

The average percentage of first-generation students is 33.56%.

11. What is the correlation between the net price of attending college and the completion rate (plot)?

cor(college$CompRate, college$NetPrice, use = "complete.obs")
## [1] 0.419522

The correlation between the net price of attending college and the completion rate is 0.42, a moderate positive relationship.

plot(college$NetPrice, college$CompRate, main = "Net Price vs. Completion Rate", xlab = "Net Price", ylab = "Completion Rate", col = "magenta")
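A fitted line can make the direction of the relationship in the scatterplot easier to see. The block below is a sketch of fitting a simple linear regression of completion rate on net price, using only the first six rows of head(college). The names net_price, comp_rate, and fit are mine, and the coefficients from six rows will not match the full data.

```r
# NetPrice and CompRate from the first six rows of head(college); illustration only
net_price <- c(15184, 17535, 9649, 19986, 12874, 21973)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Least-squares line: completion rate as a function of net price
fit <- lm(comp_rate ~ net_price)

# Intercept and slope; a positive slope agrees with a positive correlation
print(coef(fit))
```

On the full data, calling abline(lm(CompRate ~ NetPrice, data = college)) right after the plot() call would overlay the fitted line on the scatterplot.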