I used the “CollegeScores4yr” dataset from https://www.lock5stat.com/datapage3e.html.
I propose the following 10 questions based on my own understanding of the data given.
1. What is the mean cost of tuition fees for in-state and out-of-state students?
2. What does the distribution of SAT scores look like among colleges?
3. What is the frequency distribution of school locales?
4. What are the average SAT scores by region?
5. What is the average tuition for in-state students for colleges in each Region?
6. What does the distribution of Average SAT scores look like for the colleges listed in the dataset?
7. How does the debt of students vary by the type of school? (Public, Profit, Private)
8. Is there a relationship between the percentage of Part-Time students and the completion rate?
9. What is the mean average net cost after aid for colleges located in different Locale settings?
10. How does median family income relate to the percentage of students receiving Pell grants?
We will explore the following questions in detail.
mean_in_state <- mean(CollegeScores4yr$TuitionIn, na.rm = TRUE)
mean_out_state <- mean(CollegeScores4yr$TuitonOut, na.rm = TRUE)
mean_in_state
## [1] 21948.55
mean_out_state
## [1] 25336.66
The mean cost of tuition and fees is $21,948.55 for in-state students and $25,336.66 for out-of-state students.
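To put the two means side by side, the gap between them can be computed from the values printed above. This is a small sketch that hard-codes the reported means; `tuition_means`, `tuition_gap`, and `tuition_gap_pct` are illustrative names, not objects from the original analysis.

```r
# Means copied from the output above (hard-coded for illustration)
tuition_means <- c(in_state = 21948.55, out_of_state = 25336.66)

# Absolute gap in dollars and relative gap as a percentage of the in-state mean
tuition_gap <- tuition_means[["out_of_state"]] - tuition_means[["in_state"]]
tuition_gap_pct <- 100 * tuition_gap / tuition_means[["in_state"]]

round(tuition_gap, 2)
round(tuition_gap_pct, 1)
```

On these figures, the out-of-state mean is about $3,388 higher, roughly 15% above the in-state mean. Each mean drops its own missing values, so the two may not be based on exactly the same set of colleges.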
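library(ggplot2)  # needed for the ggplot() calls in this report
# Histogram of average SAT scores, excluding colleges with a missing value
ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$AvgSAT), ], aes(x = AvgSAT)) +
  geom_histogram(binwidth = 50, fill = "lightblue", color = "black") +
  labs(title = "Distribution of SAT Scores", x = "SAT Score", y = "Frequency")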
locale_frequency <- table(CollegeScores4yr$Locale)
locale_frequency
##
##   City  Rural Suburb   Town
##   1019    112    510    371
By locale, the dataset contains 1,019 colleges in cities, 112 in rural areas, 510 in suburbs, and 371 in towns.
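Raw counts are easier to compare as shares of the total. The sketch below converts the counts printed above into percentages with `prop.table()`; the counts are hard-coded and `locale_counts` and `locale_pct` are illustrative names, not part of the original analysis.

```r
# Counts copied from the table above (hard-coded for illustration)
locale_counts <- c(City = 1019, Rural = 112, Suburb = 510, Town = 371)

# Share of colleges in each locale, as percentages rounded to one decimal
locale_pct <- round(100 * prop.table(locale_counts), 1)
locale_pct
```

About half of the 2,012 colleges with a recorded locale are in cities (50.6%), followed by suburbs (25.3%), towns (18.4%), and rural areas (5.6%).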
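# Keep only rows with a recorded AvgSAT; na.omit() on the whole data frame
# would also drop colleges that are missing any unrelated column
ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$AvgSAT), ], aes(x = Region, y = AvgSAT, fill = Region)) +
  geom_boxplot() +
  labs(title = "SAT Scores by Region", x = "Region", y = "SAT Score")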
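The boxplot shows the spread within each region, but the question asks for averages. One way to get them, mirroring the `tapply()` calls used elsewhere in this report, is sketched below on a small made-up data frame. `toy_scores` and its values are hypothetical stand-ins so the example runs without the full dataset.

```r
# Hypothetical stand-in for CollegeScores4yr (values are made up)
toy_scores <- data.frame(
  Region = c("Midwest", "Midwest", "West", "West", "Northeast"),
  AvgSAT = c(1100, NA, 1200, 1000, 1300)
)

# Mean SAT per region, ignoring missing scores
sat_by_region <- tapply(toy_scores$AvgSAT, toy_scores$Region, mean, na.rm = TRUE)
sat_by_region
```

On the real data, the same call with `CollegeScores4yr$AvgSAT` and `CollegeScores4yr$Region` should return one mean per region.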
avg_tuition_region <- tapply(CollegeScores4yr$TuitionIn, CollegeScores4yr$Region, mean, na.rm = TRUE)
avg_tuition_region
##   Midwest Northeast Southeast Territory      West
## 22834.785 26915.949 18598.277  5096.872 20188.200
The average in-state tuition is $22,834.79 in the Midwest, $26,915.95 in the Northeast, $18,598.28 in the Southeast, $5,096.87 in the Territory region, and $20,188.20 in the West.
ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$AvgSAT), ], aes(x = AvgSAT)) +
geom_histogram(binwidth = 50, fill = "steelblue", color = "black") +
labs(title = "Distribution of Average SAT Scores", x = "Average SAT Score", y = "Frequency")
# Keep only rows with a recorded Debt value rather than dropping every row with any NA
ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$Debt), ], aes(x = Control, y = Debt, fill = Control)) +
  geom_boxplot() +
  labs(title = "Debt Levels by School Control Type", x = "School Control Type", y = "Average Debt") +
  scale_fill_brewer(palette = "Pastel1")
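# Use the same rows as cor() below: colleges with both PartTime and CompRate recorded
ggplot(CollegeScores4yr[complete.cases(CollegeScores4yr[, c("PartTime", "CompRate")]), ], aes(x = PartTime, y = CompRate)) +
  geom_point(color = "darkgreen") +
  labs(title = "Part-Time Percentage vs. Completion Rate", x = "Percentage of Part-Time Students", y = "Completion Rate") +
  geom_smooth(method = "lm", color = "blue", se = FALSE)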
## `geom_smooth()` using formula = 'y ~ x'
cor(CollegeScores4yr$PartTime, CollegeScores4yr$CompRate, use = "complete.obs")
## [1] -0.4190961
The correlation between the percentage of part-time students and the completion rate is -0.42, a moderate negative linear relationship: colleges with more part-time students tend to have lower completion rates.
mean_netprice_locale <- tapply(CollegeScores4yr$NetPrice, CollegeScores4yr$Locale, mean, na.rm = TRUE)
mean_netprice_locale
##     City    Rural   Suburb     Town
## 20527.98 18707.56 20226.53 18130.08
The mean net price after aid is $20,527.98 for colleges in cities, $18,707.56 in rural areas, $20,226.53 in suburbs, and $18,130.08 in towns.
# Use the same rows as cor() below: colleges with both MedIncome and Pell recorded
ggplot(CollegeScores4yr[complete.cases(CollegeScores4yr[, c("MedIncome", "Pell")]), ], aes(x = MedIncome, y = Pell)) +
  geom_point(color = "purple") +
  labs(title = "Median Family Income vs Pell Grant Percentage", x = "Median Family Income ($1000s)", y = "Pell Grant Percentage") +
  geom_smooth(method = "lm", color = "red", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
cor(CollegeScores4yr$MedIncome, CollegeScores4yr$Pell, use = "complete.obs")
## [1] -0.7079352
The correlation between median family income and the percentage of students receiving Pell grants is -0.71, a strong negative linear relationship.
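Squaring a correlation gives the share of variance that a simple linear fit accounts for, which makes the two correlations reported by `cor()` in this report easier to compare. The sketch below hard-codes those two values; `r_values` and `r_squared` are illustrative names, not objects from the original analysis.

```r
# Correlations copied from the cor() output in this report
r_values <- c(parttime_comprate = -0.4190961, medincome_pell = -0.7079352)

# Squared correlation: proportion of variance explained by a simple linear fit
r_squared <- round(r_values^2, 3)
r_squared
```

On these figures, a straight line accounts for roughly 18% of the variation in completion rate from part-time percentage, and about 50% of the variation in Pell grant percentage from median family income.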
Summary