Introduction

This report analyzes college data from the CollegeScores4yr dataset, focusing on tuition, admission rates, net price variation, SAT scores, and demographic distribution across institutions. The analysis applies descriptive statistical methods, including measures of central tendency and variability, correlation, and visualizations, to better understand how these characteristics differ across colleges.

Some questions that will be explored are:

  1. What is the average in-state tuition for colleges across different regions?

  2. What is the median admission rate among colleges in this dataset?

  3. How much variation exists in net price among colleges?

  4. What is the standard deviation of average SAT scores across colleges?

  5. Does the percentage of first-generation students correlate with the completion rate?

  6. What are the quartiles for average debt among students who complete the program?

  7. How are undergraduate enrollment numbers distributed across different institutions?

  8. How does the percent of faculty that are full-time vary across different types of control (Private, Public, Profit)?

  9. How are average SAT scores distributed across colleges?

  10. What is the percentage distribution of colleges based on region?

Methodology

This analysis uses the R programming language, within the RStudio environment, to process and interpret college data from the CollegeScores4yr dataset.

  1. We first load the dataset directly into R using the read.csv() function, which reads the CSV file from its URL into a data frame.

  2. To explore the dataset, we apply descriptive statistics functions such as mean(), median(), var(), sd(), cor(), and quantile(), along with plotting functions such as hist(), boxplot(), stem(), and pie().

  3. Each analysis is followed by an interpretation of the results, providing context.
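Several columns contain missing values (in the head() output in the Analysis section, AdmitRate and AvgSAT are NA for Amridge University), which is why the summary functions in this report are called with na.rm = TRUE. The sketch below illustrates the effect on a small made-up vector; the name admit_rate and its values are hypothetical and are not taken from the dataset.

```r
# Toy vector standing in for a column with one missing value
# (hypothetical values, not taken from the dataset).
admit_rate <- c(0.90, 0.81, NA, 0.53)

# Without na.rm, the missing value propagates and the result is NA
mean_with_na <- mean(admit_rate)
mean_with_na

# With na.rm = TRUE, the mean is taken over the three observed values
mean_observed <- mean(admit_rate, na.rm = TRUE)
mean_observed

# Number of missing entries in the vector
n_missing <- sum(is.na(admit_rate))
n_missing
```

Checking sum(is.na(...)) for a column before summarizing it shows how many colleges are left out of each statistic.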

Analysis

This section explores the questions listed above using the CollegeScores4yr dataset. By applying descriptive statistics, we examine both the financial and academic environments of colleges.

college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1. What is the average in-state tuition for colleges across different regions?

mean(college$TuitionIn, na.rm = TRUE)
## [1] 21948.55

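The mean in-state tuition across all colleges in the dataset, pooled over every region, is $21,948.55. Because mean() is applied to the whole TuitionIn column, this is a single overall figure rather than a separate average for each region; a region-by-region comparison would require grouping by the Region column. The overall mean still gives a baseline for the tuition burden on students.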
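One possible way to obtain a separate mean for each region is tapply(), which applies a function within the groups of a factor. The sketch below runs on a small hypothetical data frame (toy) that reuses the column names Region and TuitionIn; the tuition values are invented for illustration and are not results from the dataset.

```r
# Hypothetical mini data frame with the same column names as the dataset;
# the tuition values are made up for illustration.
toy <- data.frame(
  Region    = c("Southeast", "Southeast", "Midwest", "Midwest", "West"),
  TuitionIn = c(10000, 12000, 20000, NA, 30000)
)

# Mean in-state tuition within each region, ignoring missing values
region_means <- tapply(toy$TuitionIn, toy$Region, mean, na.rm = TRUE)
region_means
```

On the real data, the equivalent call would be tapply(college$TuitionIn, college$Region, mean, na.rm = TRUE).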

Q2. What is the median admission rate among colleges in this dataset?

median(college$AdmitRate, na.rm = TRUE)
## [1] 0.69505

The median admission rate among colleges in this dataset is 69.51%, indicating that half of the institutions with a reported rate admit roughly 70% or more of their applicants. This gives insight into how selective institutions are: the typical college in the dataset admits most of its applicants.

Q3. How much variation exists in net price among colleges?

var(college$NetPrice, na.rm = TRUE)
## [1] 61686826

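The variance of net price among colleges is 61,686,826. Variance is measured in squared dollars, so it is hard to read directly; its square root, the standard deviation, is on the dollar scale. A variance this large indicates a wide spread in net prices, suggesting significant differences in costs across colleges due to factors like financial aid, state funding, and institutional type.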
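To put the variance on the dollar scale, it can be converted to a standard deviation by taking the square root. The short sketch below uses only the variance reported above; the variable names are illustrative.

```r
# Variance of NetPrice as reported above (in squared dollars)
net_price_var <- 61686826

# The standard deviation is the square root of the variance (in dollars)
net_price_sd <- sqrt(net_price_var)
round(net_price_sd, 1)
## [1] 7854.1
```

A standard deviation of about $7,854 is the same quantity that sd(college$NetPrice, na.rm = TRUE) would report.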

Q4. What is the standard deviation of average SAT scores across colleges?

sd(college$AvgSAT, na.rm = TRUE)
## [1] 128.9077

The standard deviation of average SAT scores across colleges is 128.91 points. This spread can reflect differences in academic rigor or student body composition across institutions.

Q5. Does the percentage of first-generation students correlate with the completion rate?

cor(college$FirstGen, college$CompRate, use = "complete.obs")
## [1] -0.6643909

The correlation between the percentage of first-generation students and the completion rate is -0.6644. This moderately strong negative correlation suggests that as the percentage of first-generation students increases, completion rates tend to decrease. Colleges missing either value are excluded from the calculation by use = "complete.obs".
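The sketch below illustrates two points about this calculation: how use = "complete.obs" drops incomplete pairs, and how the reported correlation can be squared to express the strength of the linear relationship. The toy vectors first_gen and comp_rate are hypothetical; only r_reported comes from the output above.

```r
# Hypothetical paired values; the last pair is incomplete.
first_gen <- c(20, 30, 40, 50, NA)
comp_rate <- c(70, 60, 55, 40, 35)

# use = "complete.obs" keeps only rows where both values are present,
# so this matches the correlation of the first four pairs
r_toy <- cor(first_gen, comp_rate, use = "complete.obs")
r_toy

# Squaring the correlation reported above gives the share of variance in one
# variable that a straight-line fit on the other would account for
r_reported <- -0.6643909
r_squared <- round(r_reported^2, 3)
r_squared
## [1] 0.441
```

Under a simple linear fit, roughly 44% of the variation in completion rates would be associated with the first-generation percentage. This describes an association between colleges, not a causal effect.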

Q6. What are the quartiles for average debt among students who complete the program?

quantile(college$Debt, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
##     25%     50%     75% 
##  325.00  713.50 2203.25

The first quartile (25th percentile) of average debt is $325.00, the median (50th percentile) is $713.50, and the third quartile (75th percentile) is $2,203.25. These values help describe the debt burden on students at different levels within this dataset.
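The quartiles can be combined into a few simple spread measures. The sketch below uses only the three values reported above; the 1.5 x IQR fence is the usual boxplot convention and is shown here as one possible rule for flagging unusually high values, not as part of the original analysis.

```r
# Quartiles of Debt as reported above
q <- c(`25%` = 325.00, `50%` = 713.50, `75%` = 2203.25)

# Interquartile range: spread of the middle half of the colleges
iqr <- q[["75%"]] - q[["25%"]]
iqr
## [1] 1878.25

# Distance from the median to each quartile; a larger upper gap
# points to a longer upper tail (right skew)
lower_gap <- q[["50%"]] - q[["25%"]]
upper_gap <- q[["75%"]] - q[["50%"]]
c(lower_gap, upper_gap)
## [1]  388.50 1489.75

# Conventional upper fence for flagging high outliers
upper_fence <- q[["75%"]] + 1.5 * iqr
upper_fence
## [1] 5020.625
```

The gap above the median is almost four times the gap below it, which suggests that the debt values are right-skewed.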

Q7. How are undergraduate enrollment numbers distributed across different institutions?

hist(college$Enrollment, main = "Distribution of Undergraduate Enrollment", xlab = "Enrollment")

The histogram shows the distribution of undergraduate enrollment numbers across institutions, highlighting the variation in student body size. This visualization helps to identify whether most colleges have large or small enrollments.
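A histogram can also be read numerically. The sketch below uses the six Enrollment values visible in the head(college) output, so it is only an illustration of the technique and says nothing reliable about the full dataset; the bin width of 5,000 is an arbitrary choice.

```r
# The six Enrollment values shown in head(college); far too few rows
# to represent the dataset, used here only to illustrate the calls.
enrollment <- c(4824, 12866, 322, 6917, 4189, 32387)

# Bin the values without drawing the plot and inspect the counts per bin
h <- hist(enrollment, breaks = seq(0, 35000, by = 5000), plot = FALSE)
h$counts
## [1] 3 1 1 0 0 0 1

# A mean well above the median is a common sign of right skew
enrollment_mean <- mean(enrollment)
enrollment_median <- median(enrollment)
c(mean = enrollment_mean, median = enrollment_median)
```

Applying the same comparison to college$Enrollment (with na.rm = TRUE) would show whether the full distribution is concentrated at small enrollments with a few very large schools.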

Q8. How does the percent of faculty that are full-time vary across different types of control (Private, Public, Profit)?

boxplot(FullTimeFac ~ Control, data = college, main = "Full-Time Faculty Percentage by Control Type", xlab = "Control", ylab = "Full-Time Faculty %")

The boxplot displays the distribution of full-time faculty percentages based on the control type of each institution (Private, Public, or Profit). This can reflect differences in staffing models or educational priorities across various institution types.
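The center line of each box is the group median, which can also be computed directly. The sketch below uses aggregate() on a hypothetical data frame (toy) that reuses the column names Control and FullTimeFac; the percentages are invented for illustration and are not results from the dataset.

```r
# Hypothetical data with the dataset's column names; values are made up.
toy <- data.frame(
  Control     = rep(c("Private", "Public", "Profit"), each = 3),
  FullTimeFac = c(60, 70, 80, 65, 75, 85, 20, 30, 40)
)

# Median full-time faculty percentage within each control type;
# the formula interface drops rows with missing values by default
med <- aggregate(FullTimeFac ~ Control, data = toy, FUN = median)
med
```

On the real data, the same call with data = college would give the medians that the boxplot draws as the line inside each box.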

Q9. How are average SAT scores distributed across colleges?

stem(college$AvgSAT)
## 
##   The decimal point is 2 digit(s) to the right of the |
## 
##    5 | 6
##    6 | 
##    6 | 
##    7 | 
##    7 | 
##    8 | 234
##    8 | 555556677789999
##    9 | 00111122222233333334444
##    9 | 55555555555566666666666667777777777777788888888888888888899999999999
##   10 | 00000000000000000000000000011111111111111111111111111111111111111222+103
##   10 | 55555555555555555555555555555555555555555555555556666666666666666666+141
##   11 | 00000000000000000000000000000000000000000000000000011111111111111111+174
##   11 | 55555555555555555555555555555555555666666666666666666666666666666666+85
##   12 | 00000000000000000000000000000000000001111111111111111111111111122222+34
##   12 | 55555555555555555666666666667777777777777777888888888888888889999999
##   13 | 0000000000011111122222222222223333333344444444
##   13 | 555555666666667777778888899999999
##   14 | 00000111111111222233334444444
##   14 | 5555555677788888999999
##   15 | 000111222222334
##   15 | 6

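The stem-and-leaf plot provides a quick look at the distribution of SAT scores across colleges. Each leaf is rounded to the nearest 10 points, and rows ending in a count such as +174 hold that many additional values that did not fit on the line. The distribution is roughly bell-shaped and most concentrated between about 1000 and 1200, with an upper tail reaching about 1560 and a single unusually low value near 560.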

Q10. What is the percentage distribution of colleges based on region?

pie(table(college$Region), main="Percentage of Colleges by Region")

The pie chart illustrates the regional distribution of colleges, showing the proportion of institutions in each region.
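By default, pie() labels each slice with the region name only. One possible way to show the percentages themselves is to compute them with prop.table() and add them to the labels. The sketch below uses a short hypothetical region vector, so the percentages are illustrative and not results from the dataset.

```r
# Hypothetical region values standing in for college$Region
region <- c("Southeast", "Southeast", "Midwest", "West")

# Counts per region, then percentages rounded to one decimal place
counts <- table(region)
pct <- round(100 * prop.table(counts), 1)

# Labels that combine the region name with its percentage
labels <- paste0(names(pct), " (", pct, "%)")
labels

# Pie chart with the percentage labels
pie(counts, labels = labels, main = "Percentage of Colleges by Region")
```

On the real data, replacing region with college$Region would label each slice with its share of colleges.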

Summary

This report looks at important details about colleges in the U.S. The mean in-state tuition across all colleges is about $21,949, showing that college can be expensive. The median admission rate is 69.51%, meaning that half of the colleges admit roughly 70% or more of their applicants. Net price varies widely from school to school, and average SAT scores differ across schools with a standard deviation of about 129 points, reflecting a range of student preparation levels. There is a negative correlation (r = -0.66) between the percentage of first-generation students and completion rates, suggesting that colleges serving more first-generation students tend to have lower completion rates. Average debt among students who complete their program has a median of $713.50 and a 75th percentile of $2,203.25. Enrollment ranges from small schools to large ones. The boxplot compares the percentage of full-time faculty across public, private, and for-profit schools. The regional distribution shows where colleges are located across the U.S. Together, these results help describe patterns in college costs, student success, and types of schools.