Introduction

Suppose you are a consultant at an education development agency. You are given a data file, college.csv, which you need to use to answer several business questions.

Packages

Load required packages:

library(tidyverse)
theme_set(theme_minimal())  # sets a theme for ggplot2

Data

college <- read.csv("Data/college.csv")

Question 1 (2 points)

Filter all colleges that are private and have an average SAT score greater than 1100. For these filtered colleges, calculate the average admission rate and median debt.
Hint: Use filter() and summarise() from dplyr.

Avg_admission_rate Median_debt
0.5552191 25000
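As a rough sketch of the filter() and summarise() pattern the hint points to, the example below runs on a small made-up data frame rather than college.csv. The data frame toy and the column names sat_avg and median_debt are assumptions for illustration; check names(college) for the actual names.

```r
library(dplyr)

# Hypothetical toy data; sat_avg and median_debt are assumed column names.
toy <- data.frame(
  control        = c("Private", "Private", "Public", "Private"),
  sat_avg        = c(1200, 1050, 1300, 1150),
  admission_rate = c(0.40, 0.90, 0.50, 0.60),
  median_debt    = c(20000, 15000, 18000, 26000)
)

q1 <- toy %>%
  # Keep private colleges whose average SAT score exceeds 1100
  filter(control == "Private", sat_avg > 1100) %>%
  # Collapse the remaining rows into one summary row
  summarise(
    Avg_admission_rate = mean(admission_rate, na.rm = TRUE),
    Median_debt        = median(median_debt, na.rm = TRUE)
  )

q1
```

Passing na.rm = TRUE is a defensive choice here, on the assumption that the real file may contain missing values.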

Question 2 (2 points)

Group the colleges (use all data, not the filtered data from question 1) by region and calculate the average tuition and average faculty salary for each region.

Hint: Use group_by() and summarise().

region     Avg_tuition  Avg_faculty_salary
Midwest    22114.78     7059.065
Northeast  25297.97     8755.258
South      17263.13     7036.401
West       21430.99     8709.639
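The group_by() and summarise() combination can be sketched on made-up data as follows. The data frame toy and the column names tuition and faculty_salary_avg are assumptions for illustration and may not match college.csv.

```r
library(dplyr)

# Hypothetical toy data; tuition and faculty_salary_avg are assumed column names.
toy <- data.frame(
  region             = c("West", "West", "South", "South"),
  tuition            = c(20000, 30000, 10000, 20000),
  faculty_salary_avg = c(8000, 9000, 6000, 7000)
)

q2 <- toy %>%
  # One group per region
  group_by(region) %>%
  # One output row per group, with the two averages
  summarise(
    Avg_tuition        = mean(tuition, na.rm = TRUE),
    Avg_faculty_salary = mean(faculty_salary_avg, na.rm = TRUE)
  )

q2
```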

Question 3 (2 points)

Find the top 10 colleges with the highest admission rate in the dataset. Display only the college name, city, state, and admission rate.

Hint: Use arrange() with desc() to sort from highest to lowest, then slice_head(n = 10) to keep the top 10. See the help page (?slice_head) to understand how slice_head() works.

name                                            city                     state  admission_rate
Southeastern Bible College                      Birmingham               AL     1
Adventist University of Health Sciences         Orlando                  FL     1
Trinity International University-Illinois       Deerfield                IL     1
Saint Mary-of-the-Woods College                 Saint Mary of the Woods  IN     1
Cleveland University-Kansas City                Overland Park            KS     1
University of Pikeville                         Pikeville                KY     1
Calvary Bible College and Theological Seminary  Kansas City              MO     1
Montana State University-Northern               Havre                    MT     1
Cleveland State Community College               Cleveland                TN     1
The King’s University                           Southlake                TX     1
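A minimal sketch of the sort-then-slice idea, using a made-up data frame with invented college names. The example keeps the top 2 rows so the effect is visible on four rows; the question itself asks for n = 10.

```r
library(dplyr)

# Hypothetical toy data with invented colleges.
toy <- data.frame(
  name           = c("College A", "College B", "College C", "College D"),
  city           = c("Austin", "Boston", "Chicago", "Denver"),
  state          = c("TX", "MA", "IL", "CO"),
  admission_rate = c(0.55, 0.95, 0.30, 0.80),
  tuition        = c(10000, 20000, 30000, 15000)
)

q3 <- toy %>%
  # Highest admission rate first
  arrange(desc(admission_rate)) %>%
  # Keep only the columns to display
  select(name, city, state, admission_rate) %>%
  # Take the first n rows of the sorted data
  slice_head(n = 2)

q3
```

Because slice_head() simply takes the first rows, colleges tied at the cutoff are included or dropped according to their sorted order. When many colleges share the same admission rate, the selection among them depends on how they were ordered in the input.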

Question 4 (2 points)

Create a scatter plot that shows the relationship between tuition and average faculty salary for all colleges. Use control (Private/Public) to color the points.

Hint: Use ggplot2 to create the plot and color points by control.
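One way to set up such a scatter plot is sketched below on made-up data. The data frame toy and the column names tuition and faculty_salary_avg are assumptions for illustration; substitute the names used in college.csv.

```r
library(ggplot2)

# Hypothetical toy data; tuition and faculty_salary_avg are assumed column names.
toy <- data.frame(
  tuition            = c(10000, 25000, 32000, 8000),
  faculty_salary_avg = c(6500, 8000, 9500, 6000),
  control            = c("Public", "Private", "Private", "Public")
)

# Map control to colour so each institution type gets its own colour
p <- ggplot(toy, aes(x = tuition, y = faculty_salary_avg, color = control)) +
  geom_point() +
  labs(
    x     = "Tuition",
    y     = "Average faculty salary",
    color = "Control"
  )

p
```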

Question 5 (2 points)

Filter the colleges in the West and South regions where the loan default rate is below 0.1, and plot a bar chart showing the count of such colleges by control.

Hint: Use filter() from dplyr and geom_bar() in ggplot2.
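The filter-then-count pattern might look like the sketch below, again on made-up data. The data frame toy and the column name loan_default_rate are assumptions for illustration. The %in% operator tests membership in a set of regions, and geom_bar() counts the rows in each control category itself.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; loan_default_rate is an assumed column name.
toy <- data.frame(
  region            = c("West", "South", "Midwest", "West", "South"),
  control           = c("Private", "Public", "Private", "Public", "Public"),
  loan_default_rate = c(0.05, 0.08, 0.02, 0.15, 0.03)
)

low_default <- toy %>%
  # Keep West or South colleges with a default rate under 0.1
  filter(region %in% c("West", "South"), loan_default_rate < 0.1)

# geom_bar() uses stat = "count" by default, so no y aesthetic is needed
p <- ggplot(low_default, aes(x = control)) +
  geom_bar() +
  labs(x = "Control", y = "Number of colleges")

p
```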

Question 6 (Bonus 2 points)

Find the top 10 states with the highest number of undergraduates enrolled. Visualize it in a bar chart.

Hint: Use group_by(state) and summarise() to calculate the total, and then arrange() to rank them. Use geom_col() to visualize.
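The steps in the hint can be sketched as follows on made-up data. The data frame toy and the column name undergrads are assumptions for illustration. Reordering the states by their totals and flipping the coordinates are optional choices that make the ranking easier to read.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; undergrads is an assumed column name.
toy <- data.frame(
  state      = c("CA", "CA", "TX", "NY", "TX"),
  undergrads = c(1000, 2000, 1500, 500, 1000)
)

top_states <- toy %>%
  group_by(state) %>%
  # Total undergraduates per state
  summarise(total_undergrads = sum(undergrads, na.rm = TRUE)) %>%
  # Largest totals first, then keep at most 10 states
  arrange(desc(total_undergrads)) %>%
  slice_head(n = 10)

# geom_col() plots the precomputed totals as bar heights
p <- ggplot(top_states, aes(x = reorder(state, total_undergrads), y = total_undergrads)) +
  geom_col() +
  coord_flip() +
  labs(x = "State", y = "Total undergraduates")

p
```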