Introduction

We use the CollegeScores4yr dataset from Lock5Stat, which contains statistics on four-year colleges in the United States.

I propose the following 10 questions based on my own understanding of the data.

  1. What is the mean cost of all colleges?
  2. What is the correlation between faculty salary and the rate at which students complete their college degree?
  3. What is the average difference between in-state and out-of-state tuition?
  4. What is the distribution of the percentage of students who report being White across schools?
  5. What is the correlation between being a first-generation college student and completing college?
  6. What percent of undergraduate students attend online-only schools?
  7. Of the students who received a Pell grant, what is their amount of debt?
  8. What is the median family income of Black students?
  9. What is the average ACT score of female college students?
  10. What percentage of students are both first-generation and Black?

Next, I asked ChatGPT for 10 questions based on the dataset's variables.

  1. What is the mean of the NetPrice for all schools?
  2. What is the median ACT score for the schools in the dataset?
  3. What is the standard deviation of TuitionIn for in-state tuition and fees across all schools?
  4. How is Pell (percent of students receiving Pell grants) distributed across the schools? Create a histogram to visualize it.
  5. What is the variance of Cost (average total cost for tuition, room, board, etc.) across the schools?
  6. Is there a correlation between Female (percent of female students) and FullTimeFac (percent of faculty that are full-time)?
  7. What is the mean of MedIncome (median family income) for schools that have a high percentage of FirstGen students?
  8. Create a barplot to compare the percentage of undergraduates who report being Hispanic, Black, and Asian.
  9. Create a boxplot to visualize the distribution of Debt (average debt for students who complete the program) across schools.
  10. What is the mean of TuitionOut (out-of-state tuition and fees) for schools with an enrollment of more than 10,000 students?

Analysis

Below, we explore a selection of these questions in more detail.

# Load the four-year college data and preview the first six rows
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7
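The preview already shows missing values (Amridge University has NA for AdmitRate, MidACT and AvgSAT), which is why the code below passes na.rm = TRUE to mean(), median() and sd(), and use = "complete.obs" to cor(). As a small sketch, the block copies three columns from the first four rows of the head() preview into a data frame and shows what happens with and without na.rm. The name toy is just an illustrative choice.

```r
# First four rows of AdmitRate, MidACT and Cost, copied from the head() preview
toy <- data.frame(
  AdmitRate = c(0.9027, 0.9181, NA, 0.8123),
  MidACT    = c(18, 25, NA, 28),
  Cost      = c(22886, 24129, 15080, 22108)
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Without na.rm, a single NA makes the mean NA
print(mean(toy$MidACT))

# With na.rm = TRUE, the mean uses only the non-missing values
print(mean(toy$MidACT, na.rm = TRUE))
```

On the full data, colSums(is.na(college)) would list the number of missing values in every column.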

Q1 What is the mean cost of all colleges?

mean(college$Cost, na.rm = TRUE)
## [1] 34277.31

The mean cost (average total cost for tuition, room, board, etc.) across all colleges is about $34,277.

Q2 What is the correlation between faculty salary and the rate at which students complete their college degree?

cor(college$FacSalary, college$CompRate, use="complete.obs")
## [1] 0.577221

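The correlation between faculty salary (FacSalary) and completion rate (CompRate) is r ≈ 0.577, a moderate positive relationship: schools that pay their faculty more tend to have higher completion rates.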
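Pearson's correlation coefficient r is a unitless number between -1 and 1, so it is better reported as r ≈ 0.58 than as a percentage. As a sketch, the hypothetical helper pearson_r below computes r from its definition, dropping incomplete pairs the way use = "complete.obs" does, and compares the result with cor(). The values are FacSalary and CompRate for the six schools in the head() preview, so the result describes only those rows.

```r
# FacSalary and CompRate for the six schools in the head() preview
fac  <- c(6983, 10640, 3866, 9391, 7399, 10016)
comp <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Hypothetical helper: Pearson's r from its definition
# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy deviations from the means
pearson_r <- function(x, y) {
  ok <- complete.cases(x, y)  # keep only pairs where both values are present
  x <- x[ok]
  y <- y[ok]
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_r(fac, comp)
r_builtin <- cor(fac, comp, use = "complete.obs")
print(c(manual = r_manual, builtin = r_builtin))
```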

Q3 What is the median ACT score for the schools in the dataset?

median(college$MidACT, na.rm = TRUE)
## [1] 23

The median of the schools' midpoint ACT scores (MidACT) is 23.

Q4 What is the average difference between in-state and out-of-state tuition?

mean(college$TuitonOut, na.rm = TRUE)-mean(college$TuitionIn, na.rm = TRUE)
## [1] 3388.112

On average, out-of-state tuition costs about $3,388 more than in-state tuition.
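The Q4 code subtracts two separately computed means. When the two tuition columns are missing for exactly the same schools, this equals the mean of the per-school differences. If a school is missing only one of the two values, however, the two means are averaged over different sets of schools. The sketch below uses TuitionIn and TuitonOut for the six schools in the head() preview, then sets one value to NA purely for illustration (that NA is hypothetical and does not come from the data).

```r
# TuitionIn and TuitonOut for the six schools in the head() preview
tuition_in  <- c(9857, 8328, 6900, 10280, 11068, 10780)
tuition_out <- c(18236, 19032, 6900, 21480, 19396, 28100)

# With no missing values, both approaches agree
diff_of_means <- mean(tuition_out, na.rm = TRUE) - mean(tuition_in, na.rm = TRUE)
mean_of_diffs <- mean(tuition_out - tuition_in, na.rm = TRUE)

# Hypothetical: pretend the sixth school's out-of-state tuition is missing
tuition_out_na <- tuition_out
tuition_out_na[6] <- NA

# Difference of means now compares 5 out-of-state values with 6 in-state values
diff_of_means_na <- mean(tuition_out_na, na.rm = TRUE) - mean(tuition_in, na.rm = TRUE)
# Mean of paired differences uses only the 5 schools with both values
mean_of_diffs_na <- mean(tuition_out_na - tuition_in, na.rm = TRUE)

print(c(diff_of_means_na = diff_of_means_na, mean_of_diffs_na = mean_of_diffs_na))
```

On the full data, the paired version would be mean(college$TuitonOut - college$TuitionIn, na.rm = TRUE).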

Q5 What is the standard deviation of TuitionIn for in-state tuition and fees across all schools?

sd(college$TuitionIn, na.rm=TRUE)
## [1] 14130.3

The standard deviation of in-state tuition across all schools is about $14,130.
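sd() returns the sample standard deviation, which divides the sum of squared deviations from the mean by n - 1 rather than n before taking the square root. A short check by hand, using the in-state tuition of the six schools in the head() preview:

```r
# TuitionIn for the six schools in the head() preview
tuition_in <- c(9857, 8328, 6900, 10280, 11068, 10780)
n <- length(tuition_in)

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1))
sd_manual  <- sqrt(sum((tuition_in - mean(tuition_in))^2) / (n - 1))
sd_builtin <- sd(tuition_in)
print(c(manual = sd_manual, builtin = sd_builtin))
```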

Q6 What is the distribution of the percentage of students who report being White across schools?

hist(college$White, main = "Distribution of White students across colleges", xlab = "Percent of undergraduates who are White")

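Q7 What is the correlation between being a first-generation college student and completing college?

cor(college$CompRate, college$FirstGen, use = "complete.obs")
## [1] -0.6643909

The correlation between the percentage of first-generation students (FirstGen) and the completion rate (CompRate) is r ≈ -0.664, a fairly strong negative relationship: schools with more first-generation students tend to have lower completion rates. Because the data describe schools rather than individual students, this does not by itself show how likely any particular first-generation student is to finish.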
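Squaring a correlation gives r², which in a simple linear regression of one variable on the other is the fraction of variance explained. For Q7:

```latex
r^2 = (-0.6643909)^2 \approx 0.441
```

As a rough reading, about 44% of the school-to-school variation in completion rate is linearly associated with the percentage of first-generation students.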

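Q8 What percent of undergraduate students attend online-only schools?

# Caution: Online is a 0/1 indicator of an online-only school, not a count of
# students, so this ratio does not measure the share of students who study online
mean(college$Online, na.rm = TRUE)/mean(college$Enrollment, na.rm = TRUE)
## [1] 3.103015e-06

The ratio printed above is about 3.1e-06 (roughly 0.00031% if read as a percentage, not 0.0031%). However, because Online only records whether a school is online-only, dividing its mean by the mean enrollment does not give the percentage of students who are enrolled online, so this value should not be read as an answer to Q8.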
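Since Online marks online-only schools with 1 and other schools with 0, one way to answer Q8 is to add up Enrollment at online-only schools and divide by total enrollment. This is a sketch under that assumption, run on the Online and Enrollment values of the six schools in the head() preview, where only Amridge University is online-only; the result describes these six rows, not the full dataset.

```r
# Online and Enrollment for the six schools in the head() preview
toy <- data.frame(
  Online     = c(0, 0, 1, 0, 0, 0),
  Enrollment = c(4824, 12866, 322, 6917, 4189, 32387)
)

# Share of schools (not students) that are online-only: mean of a 0/1 indicator
share_schools_online <- mean(toy$Online, na.rm = TRUE)

# Percent of enrolled students who attend an online-only school
online_enroll <- sum(toy$Enrollment[toy$Online == 1], na.rm = TRUE)
total_enroll  <- sum(toy$Enrollment, na.rm = TRUE)
pct_students_online <- 100 * online_enroll / total_enroll

print(c(share_schools_online = share_schools_online,
        pct_students_online = pct_students_online))
```

On the full data, the same lines with toy replaced by college would give the percentage across all schools.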

Q9 How is the average debt of students who complete their degree distributed across schools?

boxplot(college$Debt, main = "Debt of students who complete their degree", ylab = "Dollars of debt")
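A boxplot shows the median and quartiles rather than the mean, so it does not directly give an average debt figure. The sketch below, using the Debt values of the six schools in the head() preview, computes the mean alongside the median and Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum). In these six rows, one large value pulls the mean above the median.

```r
# Debt for the six schools in the head() preview
debt <- c(1068, 3755, 109, 1347, 1294, 6430)

avg_debt <- mean(debt, na.rm = TRUE)    # average debt
med_debt <- median(debt, na.rm = TRUE)  # the middle line of a boxplot
five_num <- fivenum(debt)               # min, lower hinge, median, upper hinge, max

print(c(mean = avg_debt, median = med_debt))
print(five_num)
print(summary(debt))                    # quartiles plus the mean
```

On the full data, mean(college$Debt, na.rm = TRUE) and summary(college$Debt) would give the corresponding figures for all schools.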

Q10 Is there a correlation between Female (percent of female students) and FullTimeFac (percent of faculty that are full-time)?

cor(college$Female, college$FullTimeFac, use="complete.obs")
## [1] -0.1777673

The correlation between the percentage of female students and the percentage of full-time faculty is r ≈ -0.178, a weak negative relationship.

Summary

Four-year college statistics

After doing some statistical analysis on four-year colleges throughout the United States, I found some very interesting things. The first statistic was the average cost of tuition, room, and board. The data show that it costs approximately $34,277. I find that crazy, but that is because I don't have to pay for housing. Following that, I was curious how much more out-of-state tuition is than in-state tuition. On average, out-of-state tuition costs about $3,388 more than in-state tuition.

Next, I wondered whether there is any correlation between faculty salary and the rate at which students complete their degrees. There is a moderate positive correlation (r ≈ 0.58): schools that pay their faculty more tend to have higher completion rates, although a correlation alone does not show that better pay causes more students to finish. I was also curious about the relationship between being a first-generation college student and completing a degree. There is a fairly strong negative correlation (r ≈ -0.66): schools with a larger share of first-generation students tend to have lower completion rates.

I also wanted to know what percent of college students in the US study entirely online. My calculation produced a value of only about 3.1e-06, which surprised me, especially today, when almost everything we do is online; I expected a higher percentage. However, the Online variable only marks whether a school is online-only, so that ratio does not actually measure the share of students who study online. The final thing I looked at was the amount of debt students have after completing their degree, which the boxplot displays.

Appendix

mean(college$Cost, na.rm = TRUE)
## [1] 34277.31
cor(college$FacSalary, college$CompRate, use="complete.obs")
## [1] 0.577221
median(college$MidACT, na.rm = TRUE)
## [1] 23
mean(college$TuitonOut, na.rm = TRUE)-mean(college$TuitionIn, na.rm = TRUE)
## [1] 3388.112
sd(college$TuitionIn, na.rm=TRUE)
## [1] 14130.3
hist(college$White, main="distribution of whites in colleges", xlab="population white")

cor(college$CompRate, college$FirstGen , use="complete.obs")
## [1] -0.6643909
# Caution: Online is a 0/1 indicator, so this ratio is not a share of students
mean(college$Online, na.rm = TRUE)/mean(college$Enrollment, na.rm = TRUE)
## [1] 3.103015e-06
boxplot(college$Debt, ylab="Dollars of debt")

cor(college$Female, college$FullTimeFac, use="complete.obs")
## [1] -0.1777673