1. Introduction

I used data from https://www.lock5stat.com/datapage3e.html, specifically the dataset “CollegeScores4yr”.
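The report does not show how the dataset was loaded, so the following is only a minimal sketch. It assumes a CSV copy of the dataset has been downloaded from the page above into the working directory; the helper name `load_college_scores` and the default file name are hypothetical.

```r
# Hypothetical helper (not part of the original report): read a local CSV
# copy of the dataset. The default file name is an assumption.
load_college_scores <- function(path = "CollegeScores4yr.csv") {
  # Treat empty cells as missing so that na.rm = TRUE behaves as expected later
  read.csv(path, na.strings = c("", "NA"), stringsAsFactors = FALSE)
}

# Only load when the file is actually present in the working directory
if (file.exists("CollegeScores4yr.csv")) {
  CollegeScores4yr <- load_college_scores()
  str(CollegeScores4yr)  # quick check of column names and types
}
```

The plots in the Analysis section additionally need the ggplot2 package to be attached with `library(ggplot2)`.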

I propose the following 10 questions based on my own understanding of the data given.

  1. What is the mean cost of tuition fees for in-state and out-of-state students?

  2. What does the distribution of SAT scores look like among colleges?

  3. What is the frequency distribution of school locales?

  4. What are the average SAT scores by region?

  5. What is the average tuition for in-state students for colleges in each Region?

  6. What does the distribution of Average SAT scores look like for the colleges listed in the dataset?

  7. How does the debt of students vary by the type of school? (Public, Profit, Private)

  8. Is there a relationship between the percentage of Part-Time students and the completion rate?

  9. What is the mean average net cost after aid for colleges located in different Locale settings?

  10. How does median family income relate to the percentage of students receiving Pell grants?

Analysis

We will explore the following questions in detail.

Q1: What is the mean cost of tuition fees for in-state and out-of-state students?

mean_in_state <- mean(CollegeScores4yr$TuitionIn, na.rm = TRUE)
mean_out_state <- mean(CollegeScores4yr$TuitonOut, na.rm = TRUE)

mean_in_state
## [1] 21948.55
mean_out_state
## [1] 25336.66

The mean cost of tuition and fees is $21,948.55 for in-state students and $25,336.66 for out-of-state students.

Q2: What does the distribution of SAT scores look like among colleges?

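library(ggplot2)  # needed for ggplot() and the geoms used below

# Drop colleges with no reported average SAT score, then plot a histogram
ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$AvgSAT), ], aes(x = AvgSAT)) +
  geom_histogram(binwidth = 50, fill = "lightblue", color = "black") +
  labs(title = "Distribution of SAT Scores", x = "SAT Score", y = "Frequency")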
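A histogram shows the shape of the distribution, and a few summary numbers can describe it alongside the plot. The sketch below is one possible way to do that; the helper `describe_distribution` is hypothetical, and the usage line assumes `CollegeScores4yr` is already loaded as in the rest of this report.

```r
# Hypothetical helper: numeric summary of one variable, ignoring missing values
describe_distribution <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), median = median(x),
    sd = sd(x), min = min(x), max = max(x))
}

# Usage, assuming the dataset is available in the session
if (exists("CollegeScores4yr")) {
  print(round(describe_distribution(CollegeScores4yr$AvgSAT), 1))
}
```

Comparing the mean with the median gives a rough indication of skew: a mean well above the median suggests a longer right tail.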

Q3: What is the frequency distribution of school locales?

locale_frequency <- table(CollegeScores4yr$Locale)
locale_frequency
## 
##   City  Rural Suburb   Town 
##   1019    112    510    371

Of the colleges with a recorded locale, 1,019 are in a City, 112 are Rural, 510 are in a Suburb, and 371 are in a Town.
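The counts are easier to compare as percentages. As a small sketch, the counts below are copied from the table() output above, and `prop.table()` converts them to shares of the total.

```r
# Counts copied from the table() output for Locale
locale_counts <- c(City = 1019, Rural = 112, Suburb = 510, Town = 371)

# Share of each locale as a percentage of all colleges with a recorded locale
locale_percent <- round(100 * prop.table(locale_counts), 1)
locale_percent
```

This gives about 50.6% City, 5.6% Rural, 25.3% Suburb, and 18.4% Town, so roughly half of the colleges are located in cities.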

Q4: What are the average SAT scores by region?

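# Drop only rows with a missing AvgSAT; na.omit() on the whole data frame
# would also discard colleges that are missing any unrelated column
ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$AvgSAT), ], aes(x = Region, y = AvgSAT, fill = Region)) +
  geom_boxplot() +
  labs(title = "SAT Scores by Region", x = "Region", y = "SAT Score")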
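The boxplot shows medians and spread, while the question asks for averages. One way to get the means themselves is the same `tapply()` pattern used for tuition in Q5. The sketch below wraps it in a hypothetical helper, `mean_by_group`, and the usage line assumes `CollegeScores4yr` is already loaded.

```r
# Hypothetical helper: mean of a numeric column within each level of a
# grouping column, ignoring missing values
mean_by_group <- function(df, value, group) {
  tapply(df[[value]], df[[group]], mean, na.rm = TRUE)
}

# Usage, assuming the dataset is available in the session
if (exists("CollegeScores4yr")) {
  print(round(mean_by_group(CollegeScores4yr, "AvgSAT", "Region"), 1))
}
```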

Q5: What is the average tuition for in-state students for colleges in each Region?

avg_tuition_region <- tapply(CollegeScores4yr$TuitionIn, CollegeScores4yr$Region, mean, na.rm = TRUE)
avg_tuition_region
##   Midwest Northeast Southeast Territory      West 
## 22834.785 26915.949 18598.277  5096.872 20188.200

The average in-state tuition is $22,834.79 in the Midwest, $26,915.95 in the Northeast, $18,598.28 in the Southeast, $5,096.87 in the Territory region, and $20,188.20 in the West.

Q6: What does the distribution of Average SAT scores look like for the colleges listed in the dataset?

ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$AvgSAT), ], aes(x = AvgSAT)) +
  geom_histogram(binwidth = 50, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Average SAT Scores", x = "Average SAT Score", y = "Frequency")

Q7: How does the debt of students vary by the type of school? (Public, Profit, Private)

# Drop only rows with a missing Debt value rather than every incomplete row
ggplot(CollegeScores4yr[!is.na(CollegeScores4yr$Debt), ], aes(x = Control, y = Debt, fill = Control)) +
  geom_boxplot() +
  labs(title = "Debt Levels by School Control Type", x = "School Control Type", y = "Average Debt") +
  scale_fill_brewer(palette = "Pastel1")

Q8: Is there a relationship between the percentage of Part-Time students and the completion rate?

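# Keep rows where both variables are present, matching use = "complete.obs" below
q8_data <- CollegeScores4yr[complete.cases(CollegeScores4yr[, c("PartTime", "CompRate")]), ]
ggplot(q8_data, aes(x = PartTime, y = CompRate)) +
  geom_point(color = "darkgreen") +
  labs(title = "Part-Time Percentage vs. Completion Rate", x = "Percentage of Part-Time Students", y = "Completion Rate") +
  geom_smooth(method = "lm", color = "blue", se = FALSE)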
## `geom_smooth()` using formula = 'y ~ x'

cor(CollegeScores4yr$PartTime, CollegeScores4yr$CompRate, use = "complete.obs")
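## [1] -0.4190961

The correlation between the percentage of part-time students and the completion rate is -0.42, a moderate negative relationship: colleges with a larger share of part-time students tend to have lower completion rates.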

Q9: What is the mean average net cost after aid for colleges located in different Locale settings?

mean_netprice_locale <- tapply(CollegeScores4yr$NetPrice, CollegeScores4yr$Locale, mean, na.rm = TRUE)
mean_netprice_locale
##     City    Rural   Suburb     Town 
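## 20527.98 18707.56 20226.53 18130.08

The mean net price is $20,527.98 for City, $18,707.56 for Rural, $20,226.53 for Suburb, and $18,130.08 for Town colleges.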

Q10: How does median family income relate to the percentage of students receiving Pell grants?

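# Keep rows where both variables are present, matching use = "complete.obs" below
q10_data <- CollegeScores4yr[complete.cases(CollegeScores4yr[, c("MedIncome", "Pell")]), ]
ggplot(q10_data, aes(x = MedIncome, y = Pell)) +
  geom_point(color = "purple") +
  labs(title = "Median Family Income vs Pell Grant Percentage", x = "Median Family Income ($1000s)", y = "Pell Grant Percentage") +
  geom_smooth(method = "lm", color = "red", se = FALSE)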
## `geom_smooth()` using formula = 'y ~ x'

cor(CollegeScores4yr$MedIncome, CollegeScores4yr$Pell, use = "complete.obs")
## [1] -0.7079352

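The correlation between median family income and Pell grant percentage is -0.71, a strong negative relationship: colleges whose students come from higher-income families tend to have a smaller share of Pell grant recipients.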
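Squaring a correlation gives the share of variation in one variable that a straight-line fit on the other accounts for, which makes the two correlations in Q8 and Q10 easier to compare. This is only a rough sketch: the values are copied from the cor() output above, and the variable names are my own.

```r
# Correlations copied from the cor() output in Q8 and Q10
r_parttime_comprate <- -0.4190961
r_income_pell <- -0.7079352

# Squared correlations: proportion of variance explained by a linear fit
r_squared <- c(PartTime_CompRate = r_parttime_comprate^2,
               MedIncome_Pell = r_income_pell^2)
round(r_squared, 2)
```

The results are about 0.18 and 0.50, so a linear relationship with median family income accounts for roughly half of the variation in Pell grant percentage, while part-time percentage accounts for under a fifth of the variation in completion rate.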

Summary