Suppose you are a consultant at an education development
agency. You are given a data file, college.csv, which you
need to use to answer some business questions.
Load the required packages and read in the data:
library(tidyverse)
theme_set(theme_minimal()) # sets a theme for ggplot2
college <- read.csv("Data/college.csv")
Filter all colleges that are private and have an average SAT
score greater than 1100. For these filtered colleges, calculate the
average admission rate and median debt.
Hint: Use filter() and summarise() from dplyr.
| Avg_admission_rate | Median_debt |
|---|---|
| 0.5552191 | 25000 |
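A minimal sketch of the filter-then-summarise pattern, run on a tiny made-up data frame rather than the real file. The column names `sat_avg` and `median_debt` are assumptions (check `names(college)` for the actual names), and the toy values are hypothetical, so the numbers will not match the table above.

```r
library(dplyr)

# Toy data with hypothetical values; sat_avg and median_debt are assumed column names
toy <- tibble(
  control        = c("Private", "Private", "Public", "Private"),
  sat_avg        = c(1200, 1050, 1300, 1150),
  admission_rate = c(0.40, 0.80, 0.50, 0.60),
  median_debt    = c(20000, 15000, 18000, 26000)
)

q1 <- toy %>%
  # keep rows where both conditions hold
  filter(control == "Private", sat_avg > 1100) %>%
  # collapse the remaining rows into one summary row
  summarise(
    Avg_admission_rate = mean(admission_rate, na.rm = TRUE),
    Median_debt        = median(median_debt, na.rm = TRUE)
  )
q1
```

Passing `na.rm = TRUE` is a cautious choice here: without it, a single missing value in either column would make the summary `NA`.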
Group the colleges (use all data, not the filtered data from question 1) by region and calculate the average tuition and average faculty salary for each region.
Hint: Use group_by() and summarise().
| region | Avg_tuition | Avg_faculty_salary |
|---|---|---|
| Midwest | 22114.78 | 7059.065 |
| Northeast | 25297.97 | 8755.258 |
| South | 17263.13 | 7036.401 |
| West | 21430.99 | 8709.639 |
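The grouped summary can be sketched as follows on a small made-up data frame. The column names `tuition` and `faculty_salary_avg` are assumptions and the values are hypothetical; only `region` is taken from the table above.

```r
library(dplyr)

# Toy data with hypothetical values; tuition and faculty_salary_avg are assumed column names
toy <- tibble(
  region             = c("West", "West", "South", "South"),
  tuition            = c(20000, 30000, 10000, 14000),
  faculty_salary_avg = c(8000, 9000, 6000, 7000)
)

q2 <- toy %>%
  group_by(region) %>%               # one group per region
  summarise(
    Avg_tuition        = mean(tuition, na.rm = TRUE),
    Avg_faculty_salary = mean(faculty_salary_avg, na.rm = TRUE)
  )
q2
```

`summarise()` returns one row per group, so the result has as many rows as there are distinct regions.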
Find the top 10 colleges with the highest admission rate in the dataset. Display only the college name, city, state, and admission rate.
Hint: Use arrange() with desc() to sort in descending order, then slice_head(n = 10) to keep the first 10 rows. See the help page (?slice_head) to understand how slice_head() works.
| name | city | state | admission_rate |
|---|---|---|---|
| Southeastern Bible College | Birmingham | AL | 1 |
| Adventist University of Health Sciences | Orlando | FL | 1 |
| Trinity International University-Illinois | Deerfield | IL | 1 |
| Saint Mary-of-the-Woods College | Saint Mary of the Woods | IN | 1 |
| Cleveland University-Kansas City | Overland Park | KS | 1 |
| University of Pikeville | Pikeville | KY | 1 |
| Calvary Bible College and Theological Seminary | Kansas City | MO | 1 |
| Montana State University-Northern | Havre | MT | 1 |
| Cleveland State Community College | Cleveland | TN | 1 |
| The King’s University | Southlake | TX | 1 |
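A sketch of the sort-and-slice approach on made-up data, taking the top 2 rows instead of 10 to keep the example small. The college names and values are hypothetical; the column names come from the table above.

```r
library(dplyr)

# Toy data with hypothetical colleges and values
toy <- tibble(
  name           = c("A College", "B College", "C College"),
  city           = c("Xville", "Ytown", "Zburg"),
  state          = c("AL", "FL", "TX"),
  admission_rate = c(0.3, 0.9, 0.7),
  other_column   = c(1, 2, 3)         # stands in for the columns we do not display
)

q3 <- toy %>%
  arrange(desc(admission_rate)) %>%   # highest admission rate first
  slice_head(n = 2) %>%               # keep the first n rows after sorting
  select(name, city, state, admission_rate)
q3
```

When many colleges tie at the maximum (as in the table above, where every rate is 1), which tied rows are kept depends on the original row order. `slice_max(admission_rate, n = 10, with_ties = TRUE)` is an alternative that keeps all tied rows.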
Create a scatter plot that shows the relationship between tuition and average faculty salary for all colleges. Use control (Private/Public) to color the points.
Hint: Use ggplot2 to create the plot and color points by control.
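A minimal sketch of such a scatter plot on made-up data. The column names `tuition` and `faculty_salary_avg` are assumptions and the values are hypothetical; `control` is the variable named in the question.

```r
library(ggplot2)

# Toy data with hypothetical values; tuition and faculty_salary_avg are assumed column names
toy <- data.frame(
  tuition            = c(10000, 15000, 30000, 40000),
  faculty_salary_avg = c(6000, 7000, 8500, 9500),
  control            = c("Public", "Public", "Private", "Private")
)

p <- ggplot(toy, aes(x = tuition, y = faculty_salary_avg, color = control)) +
  geom_point(alpha = 0.6) +           # semi-transparent points reduce overplotting
  labs(x = "Tuition", y = "Average faculty salary", color = "Control")
p
```

Mapping `color` inside `aes()` is what ties the point color to the data; setting it outside `aes()` would apply a single fixed color to every point.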
Filter the colleges in the West and South regions where the loan default rate is below 0.1, and plot a bar chart showing the count of such colleges by control.
Hint: Use filter() from dplyr and geom_bar() in ggplot2.
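One possible shape for this, sketched on made-up data. The column name `loan_default_rate` is an assumption and all values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy data with hypothetical values; loan_default_rate is an assumed column name
toy <- tibble(
  region            = c("West", "South", "Midwest", "West", "South"),
  control           = c("Public", "Private", "Public", "Private", "Public"),
  loan_default_rate = c(0.05, 0.08, 0.02, 0.15, 0.09)
)

low_default <- toy %>%
  # %in% matches either region; the comma acts as a logical AND
  filter(region %in% c("West", "South"), loan_default_rate < 0.1)

p <- ggplot(low_default, aes(x = control)) +
  geom_bar() +                        # geom_bar() counts the rows in each category
  labs(x = "Control", y = "Number of colleges")
p
```

`geom_bar()` does the counting itself, so no `summarise()` step is needed before plotting.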
Find the top 10 states with the highest number of undergraduates enrolled. Visualize it in a bar chart.
Hint: Use group_by(state) and summarise() to calculate the total, and then arrange() to rank them. Use geom_col() to visualize.
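The steps in the hint can be sketched as below on made-up data, taking the top 3 states instead of 10 to keep the example small. The column name `undergrads` is an assumption and the values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy data with hypothetical values; undergrads is an assumed column name
toy <- tibble(
  state      = c("CA", "CA", "TX", "NY", "TX", "AL"),
  undergrads = c(100, 200, 150, 50, 100, 20)
)

top_states <- toy %>%
  group_by(state) %>%
  summarise(Total_undergrads = sum(undergrads, na.rm = TRUE)) %>%
  arrange(desc(Total_undergrads)) %>% # largest total first
  slice_head(n = 3)

# reorder() sorts the bars by total rather than alphabetically
p <- ggplot(top_states, aes(x = reorder(state, Total_undergrads), y = Total_undergrads)) +
  geom_col() +
  coord_flip() +
  labs(x = "State", y = "Total undergraduates")
p
```

`geom_col()` is used instead of `geom_bar()` because the bar heights are already computed in `Total_undergrads`.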