In this project, I use the CollegeScores4yr dataset from Lock5Stat.com.
Based on the dataset, I wrote 10 questions of my own.
I then asked ChatGPT to produce 10 more questions based on the dataset.
From both sets of 10 questions, I chose 10 to answer.
max(college$Cost, na.rm = TRUE)
## [1] 72717
The maximum cost of a college in the data is $72,717.
mean(college$Female, na.rm = TRUE)
## [1] 59.29588
Across the colleges in the data, the average percentage of female students is 59.3%.
cor(college$TuitionFTE, college$CompRate, use = "complete.obs")
## [1] 0.4556305
The correlation between tuition per full-time-equivalent student (TuitionFTE) and completion rate is 0.456, meaning that there is a moderate positive linear association: colleges with higher tuition tend to have higher completion rates.
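The `use = "complete.obs"` argument matters here because `cor()` would otherwise return `NA` when either variable has missing values. As a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not taken from CollegeScores4yr), the following checks that `complete.obs` gives the same result as dropping incomplete rows by hand:

```r
# Hypothetical toy data, NOT taken from CollegeScores4yr
toy <- data.frame(
  TuitionFTE = c(5000, 8000, NA, 12000, 15000, 9000),
  CompRate   = c(40, 55, 60, NA, 80, 50)
)

# Correlation using only rows where both variables are present
r_complete <- cor(toy$TuitionFTE, toy$CompRate, use = "complete.obs")

# Same calculation after removing incomplete rows by hand
kept <- toy[complete.cases(toy), ]
r_manual <- cor(kept$TuitionFTE, kept$CompRate)

nrow(kept)                       # number of rows actually used: 4
all.equal(r_complete, r_manual)  # the two approaches agree
```

Only rows with both values present contribute, so the reported correlation describes the subset of colleges that report both variables.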
sd(college$MedIncome, na.rm = TRUE)
## [1] 22.85785
The standard deviation of the median family income of students is 22.858. MedIncome appears to be recorded in thousands of dollars, so this is about $22,858.
var(college$FirstGen, na.rm = TRUE)
## [1] 122.8821
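The variance of the percentage of first-generation students is 122.88 (in squared percentage points), which corresponds to a standard deviation of about 11.1 percentage points.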
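Because the variance is in squared units, it is often easier to interpret through the standard deviation, which is its square root. A small sketch with made-up values (the `first_gen` vector is hypothetical, not from the dataset) shows the relationship between `var()` and `sd()`:

```r
# Hypothetical first-generation percentages, NOT from the dataset
first_gen <- c(20, 35, NA, 50, 28, 41)

v <- var(first_gen, na.rm = TRUE)  # variance, in squared percentage points
s <- sd(first_gen, na.rm = TRUE)   # standard deviation, in percentage points

all.equal(sqrt(v), s)  # sd is the square root of the variance
```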
hist(college$CompRate, main = "Completion rate across institutions", xlab = "Completion Rate", ylab = "Frequency", col = "green")
The histogram above shows the distribution of completion rates across colleges. The distribution appears roughly bell-shaped, centered at a completion rate of around 50-60%.
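One way to back up a visual impression of shape is to compare the mean with the median and to look at the bin counts that `hist()` would draw. The sketch below uses made-up completion rates (`comp_rate` is hypothetical, not from the dataset) and `plot = FALSE` so that only the numbers are returned:

```r
# Hypothetical completion rates (percent), NOT from the dataset
comp_rate <- c(22, 35, 41, 48, 52, 55, 57, 60, 63, 68, 74, 90, NA)

# A mean and median that are close together suggest a roughly symmetric shape
m   <- mean(comp_rate, na.rm = TRUE)
med <- median(comp_rate, na.rm = TRUE)

# Bin counts that hist() would draw, without producing the plot;
# hist() silently drops the missing value
h <- hist(comp_rate, breaks = seq(0, 100, by = 20), plot = FALSE)
h$counts
```

Applying the same idea to `college$CompRate` would give a numeric check on the "roughly bell-shaped" description.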
min(college$AvgSAT, na.rm = TRUE)
## [1] 564
The minimum average SAT among these colleges is 564.
median(college$NetPrice, na.rm = TRUE)
## [1] 19337.5
The median net price of colleges in the data is $19,337.50.
stem(college$MidACT)
##
## The decimal point is at the |
##
## 6 | 0
## 8 | 0
## 10 | 0
## 12 |
## 14 | 00
## 16 | 00000000000000000000000
## 18 | 00000000000000000000000000000000000000000000000000000000000000000000+4
## 20 | 00000000000000000000000000000000000000000000000000000000000000000000+182
## 22 | 00000000000000000000000000000000000000000000000000000000000000000000+251
## 24 | 00000000000000000000000000000000000000000000000000000000000000000000+174
## 26 | 00000000000000000000000000000000000000000000000000000000000000000000+37
## 28 | 00000000000000000000000000000000000000000000000000000000000000000000
## 30 | 00000000000000000000000000000000000000000000000
## 32 | 00000000000000000000000000000000000000000
## 34 | 0000000000000
The stem-and-leaf plot above shows the MidACT values for the colleges in the data. There is a large concentration of colleges with a mid ACT score of around 20-26.
boxplot(AdmitRate ~ Region, data = college, main = "Admission Rate Across Regions", ylab = "Admission Rate", col = "green")
The boxplot above shows admission rates across different regions in the United States. We can see the first quartile, median, and third quartile of the admission rate for each region.
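The quartiles can also be read off as numbers instead of from the picture. As a minimal sketch on made-up data (`toy` and its values are hypothetical, not from the dataset), calling `boxplot()` with `plot = FALSE` returns the statistics it would draw for each group:

```r
# Hypothetical admission rates by region, NOT from the dataset
toy <- data.frame(
  Region    = c("Midwest", "Midwest", "Midwest", "West", "West", "West"),
  AdmitRate = c(50, 60, 70, 30, 40, 80)
)

# Compute the boxplot statistics without drawing the plot
b <- boxplot(AdmitRate ~ Region, data = toy, plot = FALSE)

b$names  # group labels, in the order of the columns of b$stats
b$stats  # rows: lower whisker, first quartile (hinge), median,
         #       third quartile (hinge), upper whisker
```

Each column of `b$stats` corresponds to one region, so the same call on `college` would give the exact values behind each box.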
In this project, I analyzed the CollegeScores4yr dataset from Lock5Stat.com. I chose questions from my own list and from ChatGPT's list to demonstrate a broad understanding of the methods in Chapter 6.
R code
Q1: max(college$Cost, na.rm = TRUE)
Q2: mean(college$Female, na.rm = TRUE)
Q3: cor(college$TuitionFTE, college$CompRate, use = "complete.obs")
Q4: sd(college$MedIncome, na.rm = TRUE)
Q5: var(college$FirstGen, na.rm = TRUE)
Q6: hist(college$CompRate, main = "Completion rate across institutions", xlab = "Completion Rate", ylab = "Frequency", col = "green")
Q7: min(college$AvgSAT, na.rm = TRUE)
Q8: median(college$NetPrice, na.rm = TRUE)
Q9: stem(college$MidACT)
Q10: boxplot(AdmitRate ~ Region, data = college, main = "Admission Rate Across Regions", ylab = "Admission Rate", col = "green")