Introduction

This report uses the CollegeScores4yr data set, which records a range of statistics for many four-year colleges.

Based on my understanding of statistics, I asked ten questions of the data. Each question is stated and answered in the Analysis section below.

Analysis

The data being used:

college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the correlation between the percentage of full-time faculty and the average faculty salary across schools?

cor(college$FullTimeFac, college$FacSalary, use = "complete.obs")
## [1] 0.1447413

The correlation between the percentage of full-time faculty and the average monthly faculty salary across schools is 0.1447413, which indicates a weak positive linear relationship.
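To make the calculation behind `cor()` more concrete, here is a minimal sketch that computes Pearson's r by hand and compares it with the built-in function. It uses only the six rows visible in the `head(college)` output, so the resulting number is illustrative and is not the full-data answer. The names `sample_rows`, `r_manual`, and `r_builtin` are made up for this example.

```r
# Six rows copied from the head(college) output; illustration only.
sample_rows <- data.frame(
  FullTimeFac = c(71.3, 89.9, 100.0, 64.6, 54.2, 74.0),
  FacSalary   = c(6983, 10640, 3866, 9391, 7399, 10016)
)

# Keep only rows where both variables are present,
# which is what use = "complete.obs" does inside cor().
ok <- complete.cases(sample_rows$FullTimeFac, sample_rows$FacSalary)
x <- sample_rows$FullTimeFac[ok]
y <- sample_rows$FacSalary[ok]

# Pearson's r: sum of cross-products of deviations, divided by the
# square root of the product of the two sums of squared deviations.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(sample_rows$FullTimeFac, sample_rows$FacSalary,
                 use = "complete.obs")

print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree up to floating-point error. With only six schools the sample value can differ a great deal from the value reported for the whole data set.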

Q2: What is the median of the average SAT scores across schools?

median(college$AvgSAT, na.rm = TRUE)
## [1] 1121

The median of average SAT scores across the schools is 1121.
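A median describes the center of the scores but not how far they spread. As a small sketch, assuming only the six `AvgSAT` values visible in the `head(college)` output (so the numbers are illustrative, not the full-data answer), both the median and the range could be obtained with base R as follows. The variable names are made up for this example.

```r
# AvgSAT values from the six rows shown by head(college); row 3 is missing.
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# Center: the middle value once the missing score is dropped.
sat_median <- median(avg_sat, na.rm = TRUE)

# Spread: range() returns the minimum and maximum,
# and diff() turns that pair into a single width.
sat_range <- range(avg_sat, na.rm = TRUE)
sat_width <- diff(sat_range)

print(sat_median)  # 1195
print(sat_range)   # 929 1322
print(sat_width)   # 393
```

On the full data set the same calls would be applied to `college$AvgSAT` instead of the six-value vector.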

Q3: What is the standard deviation of undergraduate enrollment?

sd(college$Enrollment, na.rm = TRUE)
## [1] 7473.072

The standard deviation of undergraduate enrollment is 7473.072.
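For readers who want to see what `sd()` computes, the following sketch reproduces the sample standard deviation step by step. It assumes only the six `Enrollment` values visible in the `head(college)` output, so its result is illustrative and will not match the full-data value. The names `enrollment`, `sd_manual`, and `sd_builtin` are made up for this example.

```r
# Enrollment values from the six rows shown by head(college).
enrollment <- c(4824, 12866, 322, 6917, 4189, 32387)

# Number of non-missing observations.
n <- sum(!is.na(enrollment))

# Deviations of each school from the mean enrollment.
dev <- enrollment - mean(enrollment, na.rm = TRUE)

# Sample standard deviation: square root of the sum of squared
# deviations divided by n - 1 (not n).
sd_manual <- sqrt(sum(dev^2, na.rm = TRUE) / (n - 1))

sd_builtin <- sd(enrollment, na.rm = TRUE)

print(c(manual = sd_manual, builtin = sd_builtin))
```

Dividing by n - 1 rather than n is what makes this the sample (rather than population) standard deviation, which is what `sd()` returns.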

Q4: What is the average percentage of students receiving Pell Grants?

mean(college$Pell, na.rm = TRUE)
## [1] 37.85296

The average percentage of students receiving Pell Grants is 37.85296.

Q5: How does the distribution of average net price vary by school control type (Private, Profit, Public)?

boxplot(college$NetPrice ~ college$Control, 
        main = "Distribution of Net Price by School Control",
        xlab = "School Control", 
        ylab = "Average Net Price", 
        col = c("lightblue", "lightgreen", "lightcoral"))

The box plot shows the distribution of average net price for each school control type, which makes it possible to compare the typical net price and its spread across the Private, Profit, and Public groups.
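A box plot can be backed up with per-group numbers. The sketch below, which assumes only the six rows visible in the `head(college)` output, uses `tapply()` to compute the median and interquartile range of `NetPrice` within each `Control` group. Those six rows contain no Profit schools, so only two groups appear, and the values are illustrative rather than the full-data results. The name `sample_rows` is made up for this example.

```r
# Control and NetPrice from the six rows shown by head(college).
sample_rows <- data.frame(
  Control  = c("Public", "Public", "Private", "Public", "Public", "Public"),
  NetPrice = c(15184, 17535, 9649, 19986, 12874, 21973)
)

# Median net price within each control type (the line inside each box).
price_median <- tapply(sample_rows$NetPrice, sample_rows$Control,
                       median, na.rm = TRUE)

# Interquartile range within each control type (the height of each box).
price_iqr <- tapply(sample_rows$NetPrice, sample_rows$Control,
                    IQR, na.rm = TRUE)

print(price_median)
print(price_iqr)
```

Applied to the full data set, the same two calls would take `college$NetPrice` and `college$Control` as their first two arguments.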

Q6: What is the relationship between the percentage of students receiving Pell Grants and the average faculty salary among public schools?

public_schools <- subset(college, Control == "Public")
plot(public_schools$Pell, public_schools$FacSalary, 
     main = "Scatterplot of Pell Grants Percentage vs Faculty Salary for Public Schools", 
     xlab = "Percentage of Students Receiving Pell Grants", 
     ylab = "Average Faculty Salary", 
     pch = 19, col = "green")

This scatterplot shows the relationship between the percentage of students receiving Pell grants and the average faculty salary for public schools.
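A scatterplot is easier to interpret alongside a numeric summary of the trend. As a hedged sketch, assuming only the five public schools among the six rows visible in the `head(college)` output, the correlation and a least-squares line could be computed as follows. The names `sample_rows`, `public_rows`, `r_public`, and `fit` are made up for this example, and the values it prints describe these five schools only.

```r
# Control, Pell, and FacSalary from the six rows shown by head(college).
sample_rows <- data.frame(
  Control   = c("Public", "Public", "Private", "Public", "Public", "Public"),
  Pell      = c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0),
  FacSalary = c(6983, 10640, 3866, 9391, 7399, 10016)
)

# Same filtering step as in the analysis: keep public schools only.
public_rows <- subset(sample_rows, Control == "Public")

# Strength and direction of the linear relationship.
r_public <- cor(public_rows$Pell, public_rows$FacSalary,
                use = "complete.obs")

# Least-squares line: FacSalary predicted from Pell.
fit <- lm(FacSalary ~ Pell, data = public_rows)

print(r_public)
print(coef(fit))

# After a plot() call, abline(fit) would draw this line on the scatterplot.
```

In these five rows the slope is negative, meaning schools with a higher Pell percentage tend to have lower faculty salaries. Whether that holds for all public schools would need to be checked on the full data set.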

Q7: What is the standard deviation of in-state tuition compared to out-of-state tuition?

sd(college$TuitionIn, na.rm = TRUE)
## [1] 14130.3
sd(college$TuitonOut, na.rm = TRUE)
## [1] 12436.1

The standard deviation of in-state tuition is 14130.3, and the standard deviation of out-of-state tuition is 12436.1, so in-state tuition varies somewhat more from school to school.
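When two columns are compared, it can help to compute the same statistic for both in a single call. The sketch below uses `sapply()` for that. It assumes only the six rows visible in the `head(college)` output, so its numbers are illustrative and need not show the same pattern as the full data set. The out-of-state column really is spelled `TuitonOut` in the data, and the names `tuition`, `tuition_sd`, and `tuition_mean` are made up for this example.

```r
# Tuition columns from the six rows shown by head(college).
# "TuitonOut" matches the spelling used in the data set.
tuition <- data.frame(
  TuitionIn = c(9857, 8328, 6900, 10280, 11068, 10780),
  TuitonOut = c(18236, 19032, 6900, 21480, 19396, 28100)
)

# Apply sd() and mean() to every column at once.
tuition_sd   <- sapply(tuition, sd, na.rm = TRUE)
tuition_mean <- sapply(tuition, mean, na.rm = TRUE)

# Show the means and standard deviations side by side.
print(rbind(mean = tuition_mean, sd = tuition_sd))
```

Printing the means next to the standard deviations gives a sense of how large each spread is relative to the typical tuition level.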

Q8: What is the mean percentage of part-time students across different colleges?

mean(college$PartTime, na.rm = TRUE)
## [1] 16.46559

The mean percentage of part-time students across the different colleges is 16.46559.

Q9: What is the distribution of student debt in the dataset?

hist(college$Debt, main = "Distribution of Student Debt", xlab = "Debt", col = "skyblue", border = "black")

The histogram shows the distribution of student debt across the colleges in the data set.
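The shape of a histogram can also be described numerically. As a small sketch, assuming only the six `Debt` values visible in the `head(college)` output and a bin width of 1000 chosen purely for illustration, the bin counts and a simple skewness check (mean versus median) could be obtained like this. The names `debt` and `h` are made up for this example.

```r
# Debt values from the six rows shown by head(college).
debt <- c(1068, 3755, 109, 1347, 1294, 6430)

# Compute the histogram bins without drawing anything.
# The breaks (0 to 7000 in steps of 1000) are an assumption for this sketch.
h <- hist(debt, breaks = seq(0, 7000, by = 1000), plot = FALSE)
print(h$counts)  # 1 3 0 1 0 0 1

# A mean well above the median suggests a right-skewed distribution.
debt_mean   <- mean(debt, na.rm = TRUE)
debt_median <- median(debt, na.rm = TRUE)
print(c(mean = debt_mean, median = debt_median))
```

In these six rows the mean is larger than the median, which is the pattern a long right tail produces. The full data set would need the same check before drawing that conclusion about all colleges.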

Q10: What is the correlation between the percentage of first-generation students and the average student debt?

cor(college$FirstGen, college$Debt, use = "complete.obs")
## [1] 0.2210194

The correlation between the percentage of first-generation students and the average student debt is 0.2210194, which indicates a weak positive linear relationship.
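A correlation coefficient on its own does not say how precisely it has been estimated. One possible follow-up, sketched here with base R's `cor.test()`, reports a confidence interval and p-value along with the estimate. The sketch assumes only the six rows visible in the `head(college)` output, so its results are illustrative and should not be read as the full-data answer. The names `first_gen`, `debt`, `ct`, and `r` are made up for this example.

```r
# FirstGen and Debt from the six rows shown by head(college).
first_gen <- c(36.6, 34.1, 51.3, 31.0, 34.3, 22.6)
debt      <- c(1068, 3755, 109, 1347, 1294, 6430)

# Pearson correlation test (the default method for cor.test()).
ct <- cor.test(first_gen, debt)

# The estimate is the same r that cor() returns.
r <- unname(ct$estimate)

print(r)
print(ct$conf.int)  # 95% confidence interval for the correlation
print(ct$p.value)   # p-value for the null hypothesis of zero correlation
```

With so few schools the confidence interval is very wide. On the full data set, the same call would take `college$FirstGen` and `college$Debt`.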

Summary

This analysis explored several aspects of the college data set by answering ten questions. The questions covered correlation, central tendency, spread, and the shape of distributions across colleges. The results give a picture of how much colleges differ in characteristics such as enrollment, tuition, net price, faculty salary, and student debt. To summarize and visualize the data, I used correlations, means, a median, standard deviations, a box plot, a scatterplot, and a histogram.

Appendix

The following R code was used to create this analysis and come up with answers to the questions in this report.

# Load the dataset
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7
# Q1: Correlation between the percentage of full-time faculty and average faculty salary
cor(college$FullTimeFac, college$FacSalary, use = "complete.obs")
## [1] 0.1447413
# Q2: Median of average SAT scores
median(college$AvgSAT, na.rm = TRUE)
## [1] 1121
# Q3: Standard deviation of undergraduate enrollment
sd(college$Enrollment, na.rm = TRUE)
## [1] 7473.072
# Q4: Average percentage of students receiving Pell Grants
mean(college$Pell, na.rm = TRUE)
## [1] 37.85296
# Q5: Distribution of average net price by school control type (boxplot)
boxplot(college$NetPrice ~ college$Control, 
        main = "Distribution of Net Price by School Control",
        xlab = "School Control", 
        ylab = "Average Net Price", 
        col = c("lightblue", "lightgreen", "lightcoral"))

# Q6: Scatterplot of Pell Grant percentage vs. average faculty salary for public schools
public_schools <- subset(college, Control == "Public")
plot(public_schools$Pell, public_schools$FacSalary, 
     main = "Scatterplot of Pell Grants Percentage vs Faculty Salary for Public Schools", 
     xlab = "Percentage of Students Receiving Pell Grants", 
     ylab = "Average Faculty Salary", 
     pch = 19, col = "green")

# Q7: Standard deviation of in-state and out-of-state tuition
sd(college$TuitionIn, na.rm = TRUE)
## [1] 14130.3
sd(college$TuitonOut, na.rm = TRUE)
## [1] 12436.1
# Q8: Mean percentage of part-time students across colleges
mean(college$PartTime, na.rm = TRUE)
## [1] 16.46559
# Q9: Distribution of student debt (histogram)
hist(college$Debt, main = "Distribution of Student Debt", xlab = "Debt", col = "skyblue", border = "black")

# Q10: Correlation between the percentage of first-generation students and the average student debt
cor(college$FirstGen, college$Debt, use = "complete.obs")
## [1] 0.2210194