Introduction to the Project

1. Which questions can help guide the research?

These are some of the questions to explore:

1. What is the average ACT score across all colleges?
2. What is the median SAT score for colleges, and how does it vary across colleges?
3. What is the distribution of undergraduate enrollment across colleges?
4. What percentage of students identify as White, Black, Hispanic, Asian, or Other across colleges?
5. What is the average net price (NetPrice) for students across colleges, and how does it compare to the average total cost (Cost)?
6. Is there a difference in average tuition for in-state students (TuitionIn) versus out-of-state students (TuitonOut)?
7. What is the correlation between undergraduate enrollment (Enrollment) and average ACT scores (MidACT)?
8. What is the variance in average debt (Debt) for students who complete the program?
9. What is the average monthly salary for full-time faculty, and how does it compare to the percentage of full-time faculty?
10. What is the percentage of first-generation students across colleges, and how does it compare with the completion rate?

The analysis below loads the dataset and then addresses each question in turn:

college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1. What is the average ACT score across all colleges?

mean_ACT <- mean(college$MidACT, na.rm = TRUE)
mean_ACT
## [1] 23.53514

The mean ACT score (MidACT) across all colleges is about 23.54.

Q2. What is the median SAT score for colleges, and how does it vary across colleges?

median_SAT <- median(college$AvgSAT, na.rm = TRUE)
median_SAT
## [1] 1121

The median SAT score across all colleges is 1121.
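The question also asks how SAT scores vary, which a single median does not show. A minimal sketch of some spread measures follows. To stay self-contained it uses only the six AvgSAT values visible in the head(college) output above, so the numbers it produces describe those six rows, not the full dataset. The variable names are illustrative.

```r
# Six AvgSAT values copied from the head(college) output (row 3 is NA).
# With the full data loaded, replace this vector with college$AvgSAT.
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# Quartiles, interquartile range, standard deviation, and range,
# each ignoring the missing value.
sat_quartiles <- quantile(avg_sat, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
sat_iqr <- IQR(avg_sat, na.rm = TRUE)
sat_sd <- sd(avg_sat, na.rm = TRUE)
sat_range <- range(avg_sat, na.rm = TRUE)

sat_quartiles
sat_iqr
sat_sd
sat_range
```

Reporting the IQR alongside the median keeps both summaries robust to a few unusually high or low colleges.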

Q3. What is the distribution of undergraduate enrollment across colleges?

mean_enrollment <- mean(college$Enrollment, na.rm = TRUE)
median_enrollment <- median(college$Enrollment, na.rm = TRUE)
mean_enrollment
## [1] 4484.831
hist(college$Enrollment, main = "Distribution of Undergraduate Enrollment", xlab = "Enrollment", col = "lightcoral")

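The mean undergraduate enrollment is about 4485, while the median is 1722. A mean this far above the median suggests a right-skewed distribution, with a small number of very large colleges pulling the mean upward. The histogram is shown in the figure.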
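One quick way to check for skew is to compare the mean with the median. The sketch below does this for the six Enrollment values shown in the head(college) output, so it illustrates the technique only. For the full dataset, the reported mean (4484.831) and median (1722) give a ratio of about 2.6.

```r
# Six Enrollment values copied from the head(college) output.
# With the full data loaded, replace this vector with college$Enrollment.
enrollment <- c(4824, 12866, 322, 6917, 4189, 32387)

# Five-number summary plus the mean.
enrollment_summary <- summary(enrollment)
print(enrollment_summary)

# A ratio well above 1 hints at a long right tail.
mean_to_median <- mean(enrollment, na.rm = TRUE) / median(enrollment, na.rm = TRUE)
mean_to_median
```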

Q4. What percentage of students identify as White, Black, Hispanic, Asian, or Other across colleges?

ethnicity_means <- data.frame(
  White = mean(college$White, na.rm = TRUE),
  Black = mean(college$Black, na.rm = TRUE),
  Hispanic = mean(college$Hispanic, na.rm = TRUE),
  Asian = mean(college$Asian, na.rm = TRUE),
  Other = mean(college$Other, na.rm = TRUE)
)
ethnicity_means
##      White    Black Hispanic    Asian    Other
## 1 55.10905 13.92342 13.10273 4.422476 13.46579
# Barplot of average ethnicity percentages
barplot(as.numeric(ethnicity_means), names.arg = colnames(ethnicity_means), main = "Average Ethnicity Distribution", col = rainbow(5))

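The figure shows the average ethnicity distribution across colleges: White 55.11%, Black 13.92%, Hispanic 13.10%, Asian 4.42%, and Other 13.47%. These are unweighted means of each college's percentages, so every college counts equally regardless of its size.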
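Because the averages above treat every college equally, they describe the typical college, not the overall student population. A possible alternative is to weight each college by its enrollment. The sketch below compares the two approaches on the six rows shown in the head(college) output. The data frame name sample_rows and the other variable names are assumptions for illustration, and the results apply only to these six rows.

```r
# Six rows copied from the head(college) output.
# With the full data loaded, use the college data frame instead.
sample_rows <- data.frame(
  Enrollment = c(4824, 12866, 322, 6917, 4189, 32387),
  White      = c(2.5, 57.8, 7.1, 74.2, 1.5, 78.5),
  Black      = c(90.7, 25.9, 14.3, 10.7, 93.8, 10.1),
  Hispanic   = c(0.9, 3.3, 0.6, 4.6, 1.0, 4.7),
  Asian      = c(0.2, 5.9, 0.3, 4.0, 0.3, 1.2),
  Other      = c(5.6, 7.1, 77.6, 6.5, 3.5, 5.6)
)
groups <- c("White", "Black", "Hispanic", "Asian", "Other")

# Unweighted: each college counts once.
unweighted <- colMeans(sample_rows[groups], na.rm = TRUE)

# Weighted: each college counts in proportion to its enrollment.
weighted <- sapply(sample_rows[groups], weighted.mean,
                   w = sample_rows$Enrollment, na.rm = TRUE)

round(rbind(unweighted, weighted), 2)
```

Note that weighted.mean() drops missing values in the percentages when na.rm = TRUE, but a missing Enrollment weight would still give NA, so the full data may need rows with missing weights filtered out first.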

Q5. What is the average net price (NetPrice) for students across colleges, and how does it compare to the average total cost (Cost)?

mean_net_price <- mean(college$NetPrice, na.rm = TRUE)
mean_cost <- mean(college$Cost, na.rm = TRUE)
mean_net_price
## [1] 19886.82
mean_cost
## [1] 34277.31
# Boxplots to compare NetPrice and Cost
boxplot(college$NetPrice, college$Cost, names = c("Net Price", "Total Cost"), main = "Comparison of Net Price and Total Cost", col = c("lightblue", "lightgreen"))

The average net price for students is 19886.82 and the average total cost is 34277.31, so on average the total cost exceeds the net price by 14390.49.
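The comparison above subtracts two separately computed means. When some colleges are missing one of the two values, that can differ slightly from the average of the per-college gaps. A minimal sketch of the per-college approach follows, using the six rows from the head(college) output. These rows have no missing values, so both methods agree here.

```r
# NetPrice and Cost for the six rows shown in the head(college) output.
# With the full data loaded, use college$NetPrice and college$Cost.
net_price <- c(15184, 17535, 9649, 19986, 12874, 21973)
cost      <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Gap between total cost and net price for each college.
gap <- cost - net_price

# Average of the per-college gaps (rows with a missing value are dropped).
mean_gap <- mean(gap, na.rm = TRUE)

# Difference of the two separately computed means, as in the analysis above.
diff_of_means <- mean(cost, na.rm = TRUE) - mean(net_price, na.rm = TRUE)

c(mean_gap = mean_gap, diff_of_means = diff_of_means)
```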

Q6. Is there a difference in average tuition for in-state students (TuitionIn) versus out-of-state students (TuitonOut)?

mean_in_state <- mean(college$TuitionIn, na.rm = TRUE)
mean_out_state <- mean(college$TuitonOut, na.rm = TRUE)
mean_in_state
## [1] 21948.55
mean_out_state
## [1] 25336.66
# Boxplot comparison
boxplot(college$TuitionIn, college$TuitonOut, names = c("In-State", "Out-of-State"), main = "In-State vs Out-of-State Tuition", col = c("lightblue", "lightpink"))

There appears to be a difference between the two groups: average out-of-state tuition exceeds average in-state tuition by 3388.11.
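The overall gap may hide differences between types of institution. A possible refinement is to compute the out-of-state premium per college and average it within each Control group. The sketch below does this for the six rows shown in the head(college) output, which include only one private college, so it demonstrates the method and does not support a general conclusion.

```r
# TuitionIn, TuitonOut, and Control for the six rows in the head(college) output.
# TuitonOut is spelled as in the dataset's own column name.
tuition_in  <- c(9857, 8328, 6900, 10280, 11068, 10780)
tuition_out <- c(18236, 19032, 6900, 21480, 19396, 28100)
control     <- c("Public", "Public", "Private", "Public", "Public", "Public")

# Out-of-state premium for each college.
premium <- tuition_out - tuition_in

# Average premium within each Control group.
premium_by_control <- tapply(premium, control, mean, na.rm = TRUE)
premium_by_control
```

With the full data, the same call would be tapply(college$TuitonOut - college$TuitionIn, college$Control, mean, na.rm = TRUE).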

Q7. What is the correlation between undergraduate enrollment and average ACT scores?

cor_enrollment_ACT <- cor(college$Enrollment, college$MidACT, use = "complete.obs")
cor_enrollment_ACT
## [1] 0.2572878
# Scatter plot to visualize the relationship
plot(college$Enrollment, college$MidACT, main = "Enrollment vs ACT Scores", xlab = "Enrollment", ylab = "ACT Score", col = "darkblue", pch = 16)

The correlation between undergraduate enrollment and average ACT score is about 0.257, which indicates a weak positive relationship.

Q8. What is the variance in average debt for students who complete the program?

variance_debt <- var(college$Debt, na.rm = TRUE)
variance_debt
## [1] 28740171
# Histogram of debt
hist(college$Debt, main = "Distribution of Average Student Debt", xlab = "Debt ($)", col = "purple")

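From the data, the variance of Debt is 28740171. Because variance is expressed in squared units, the standard deviation (its square root, about 5361) is easier to interpret, since it is in the same units as Debt.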
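The relationship between variance and standard deviation can be checked directly. The sketch below uses the six Debt values from the head(college) output to show that sd() squared equals var(), and then takes the square root of the full-data variance reported above.

```r
# Six Debt values copied from the head(college) output.
# With the full data loaded, replace this vector with college$Debt.
debt <- c(1068, 3755, 109, 1347, 1294, 6430)

debt_var <- var(debt, na.rm = TRUE)
debt_sd  <- sd(debt, na.rm = TRUE)

# The standard deviation is the square root of the variance.
isTRUE(all.equal(debt_sd^2, debt_var))

# Standard deviation implied by the full-data variance reported above.
full_data_sd <- sqrt(28740171)
round(full_data_sd, 2)
```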

Q9. What is the average monthly salary for full-time faculty, and how does it compare to the percentage of full-time faculty?

mean_fac_salary <- mean(college$FacSalary, na.rm = TRUE)
mean_full_time_fac <- mean(college$FullTimeFac, na.rm = TRUE)
mean_fac_salary
## [1] 7465.778
mean_full_time_fac
## [1] 64.8313
# Scatter plot to examine the relationship
plot(college$FullTimeFac, college$FacSalary, main = "Full-Time Faculty Percentage vs Average Salary", xlab = "Percentage of Full-Time Faculty", ylab = "Average Faculty Salary ($)", col = "darkgreen", pch = 16)

The average monthly gross salary of full-time faculty is about 7465.78, and on average about 64.83% of a college's faculty are full-time.
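The scatter plot shows the relationship visually, but the two means alone do not say how salary and full-time percentage move together. One way to quantify that is a correlation coefficient, sketched below on the six rows from the head(college) output. Six points are far too few to draw a conclusion, so this only illustrates the call that could be run on the full data.

```r
# FullTimeFac and FacSalary for the six rows in the head(college) output.
# With the full data loaded, use college$FullTimeFac and college$FacSalary.
full_time_fac <- c(71.3, 89.9, 100.0, 64.6, 54.2, 74.0)
fac_salary    <- c(6983, 10640, 3866, 9391, 7399, 10016)

# Pearson correlation, dropping any pair with a missing value.
cor_fac <- cor(full_time_fac, fac_salary, use = "complete.obs")
cor_fac
```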

Q10. What is the percentage of first-generation students across colleges, and how does it compare with the completion rate?

cor_firstgen_comprate <- cor(college$FirstGen, college$CompRate, use = "complete.obs")
cor_firstgen_comprate
## [1] -0.6643909
# Scatter plot to visualize the relationship
plot(college$FirstGen, college$CompRate, main = "First-Generation Students vs Completion Rate", xlab = "Percentage of First-Generation Students", ylab = "Completion Rate (%)", col = "orange", pch = 16)

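The correlation between the percentage of first-generation students and the completion rate is about -0.66, a moderately strong negative association: colleges with a larger share of first-generation students tend to have lower completion rates. This is an association only and does not by itself show cause.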
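The question also asks for the percentage of first-generation students itself, which the correlation does not report. A minimal sketch follows that computes the mean of FirstGen and fits a simple linear model of completion rate on it. It uses the six rows from the head(college) output, so the fitted values are illustrative only, and the variable names are assumptions.

```r
# FirstGen and CompRate for the six rows in the head(college) output.
# With the full data loaded, use college$FirstGen and college$CompRate.
first_gen <- c(36.6, 34.1, 51.3, 31.0, 34.3, 22.6)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Average percentage of first-generation students.
mean_first_gen <- mean(first_gen, na.rm = TRUE)
mean_first_gen

# Simple linear regression: the slope estimates the change in completion
# rate (percentage points) per one-point increase in FirstGen.
fit <- lm(comp_rate ~ first_gen)
slope <- coef(fit)[["first_gen"]]
slope
```

A negative slope matches the sign of the correlation reported above.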

Conclusion of the Research:

In my analysis, I discovered several key patterns in college characteristics:

Enrollment vs. ACT Scores: I found that enrollment and ACT scores were only weakly positively correlated (about 0.26), so a larger student population does not necessarily come with higher ACT scores.

Tuition Costs: I observed a noticeable difference in tuition costs between in-state and out-of-state students, which reflects the financial burden faced by those attending from out of state.

Ethnicity Distribution: The variation in ethnicity distribution across colleges stood out to me, pointing to differences in campus diversity and how this can shape the student experience.

Faculty Salaries: In the scatter plot, the average faculty salary seemed linked to the percentage of full-time faculty, although I did not quantify this with a correlation coefficient. If the link holds, it could have implications for recruitment and retention strategies.

Overall, this project allowed me to apply descriptive statistical methods to uncover insights into the educational landscape. These findings have laid the foundation for further analysis, such as exploring how college characteristics influence student success or comparing this dataset with others to assess trends over time.