The first step in analyzing U.S. colleges for Cornelia is to import the data we need for her criteria. The Crime_2015 CSV file contains data about violent crime rates in U.S. cities. The CollegeScorecard Excel file contains data about college names, cities, accrediting agencies, and the midpoint of SAT math scores. The code below shows that I only imported the columns of interest in each data set and dropped any NA values.
I noticed that Dallas, TX, had multiple violent crime rates, so I pre-aggregated violent crime by mean violent crime grouped by city. The next step was to merge the two data sets by city and state (given the same city names appear in different states).
library(tidyverse)
library(readxl)
url <- "http://jsuleiman.com/datasets/CollegeScorecard.xlsx"
destfile <- "./data/CollegeScorecard.xlsx"
curl::curl_download(url, destfile)
CollegeScorecard <- read_excel("./data/CollegeScorecard.xlsx")[c(4:6, 8, 46)] %>% rename(State = STABBR, College_Name = INSTNM, SAT_Math_Mid = SATMTMID, City = CITY)
CollegeRequire <- CollegeScorecard %>% drop_na()
url <- "http://jsuleiman.com/datasets/Crime_2015.csv"
destfile <- "./data/Crime_2015.csv"
curl::curl_download(url, destfile)
Crime_2015 <- read_csv("./data/Crime_2015.csv")[c(2, 11, 12)]
Crime_2015 <- Crime_2015 %>% drop_na() %>% group_by(City, State) %>% summarize(ViolentCrime_avg = mean(ViolentCrime))
CollegeList <- merge(CollegeRequire, Crime_2015,
by.x = c("City", "State"),
by.y = c("City", "State"),
all.x = FALSE,
all.y = FALSE)
Now I take a look at the first few rows of data to see if everything looks okay.
head(CollegeRequire)
## # A tibble: 6 x 5
## College_Name City State AccredAgency SAT_Math_Mid
## <chr> <chr> <chr> <chr> <dbl>
## 1 Alabama A & M Unive… Normal AL Southern Association of Coll… 400
## 2 University of Alaba… Birming… AL Southern Association of Coll… 585
## 3 University of Alaba… Huntsvi… AL Southern Association of Coll… 580
## 4 Alabama State Unive… Montgom… AL Southern Association of Coll… 425
## 5 The University of A… Tuscalo… AL Southern Association of Coll… 570
## 6 Auburn University Auburn … AL Southern Association of Coll… 595
head(Crime_2015)
## # A tibble: 6 x 3
## # Groups: City [5]
## City State ViolentCrime_avg
## <chr> <chr> <dbl>
## 1 Abilene TX 412.
## 2 Aguadilla Aguadilla-Isabela, Puerto Rico M.S.A. 109.
## 3 Akron OH 238.
## 4 Albany GA 668.
## 5 Albany OR 114.
## 6 Albuquerque NM 793.
head(CollegeList)
## City State College_Name
## 1 Abilene TX Abilene Christian University
## 2 Abilene TX Hardin-Simmons University
## 3 Abilene TX McMurry University
## 4 Akron OH University of Akron Main Campus
## 5 Albany GA Albany State University
## 6 Albuquerque NM University of New Mexico-Main Campus
## AccredAgency
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 North Central Association of Colleges and Schools The Higher Learning Commission
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 North Central Association of Colleges and Schools The Higher Learning Commission
## SAT_Math_Mid ViolentCrime_avg
## 1 545 412.5
## 2 520 412.5
## 3 495 412.5
## 4 540 238.4
## 5 435 667.9
## 6 535 792.6
The next step is to filter the data based on the following criteria:
I sort the data by the midpoint SAT math score, from highest to lowest.
CollegeSelect <- CollegeList %>% filter(AccredAgency == "Southern Association of Colleges and Schools Commission on Colleges" & ViolentCrime_avg < median(ViolentCrime_avg) & SAT_Math_Mid > median(SAT_Math_Mid)) %>% select(College_Name, City, ViolentCrime_avg, SAT_Math_Mid, AccredAgency) %>% arrange(desc(SAT_Math_Mid))
The easiest way I can get a count of the number of colleges on Cornelia’s list is to turn the categorical variable College_Name into a factor. This creates a count of the number of College_Names that meet Cornelia’s criteria. We can see that the number of colleges she has to pick from is 25.
CollegeSelect$College_Name <- factor(CollegeSelect$College_Name)
summary(CollegeSelect)
## College_Name City
## Austin College : 1 Length:25
## Citadel Military College of South Carolina: 1 Class :character
## College of Charleston : 1 Mode :character
## Dallas Baptist University : 1
## Florida Southern College : 1
## Guilford College : 1
## (Other) :19
## ViolentCrime_avg SAT_Math_Mid AccredAgency
## Min. :145.8 Min. :530 Length:25
## 1st Qu.:256.3 1st Qu.:550 Class :character
## Median :325.7 Median :570 Mode :character
## Mean :307.1 Mean :583
## 3rd Qu.:359.4 3rd Qu.:600
## Max. :399.0 Max. :685
##
I use the kable function from the knitr package to make a prettier table of Cornelia’s colleges.
library(knitr)
kable(CollegeSelect, align = "llccl", col.names = c('College Name', 'City', 'Violent Crime', 'SAT Math Mid', 'Accrediting Agency'), caption = "Table of Cornelia's Colleges by Criteria")
| College Name | City | Violent Crime | SAT Math Mid | Accrediting Agency |
|---|---|---|---|---|
| University of Virginia-Main Campus | Charlottesville | 164.6 | 685 | Southern Association of Colleges and Schools Commission on Colleges |
| Southern Methodist University | Dallas | 334.1 | 660 | Southern Association of Colleges and Schools Commission on Colleges |
| The University of Texas at Austin | Austin | 287.7 | 650 | Southern Association of Colleges and Schools Commission on Colleges |
| University of Georgia | Athens | 325.7 | 625 | Southern Association of Colleges and Schools Commission on Colleges |
| Austin College | Sherman | 327.1 | 615 | Southern Association of Colleges and Schools Commission on Colleges |
| Texas A & M University-College Station | College Station | 315.5 | 605 | Southern Association of Colleges and Schools Commission on Colleges |
| Texas Christian University | Fort Worth | 351.2 | 600 | Southern Association of Colleges and Schools Commission on Colleges |
| University of North Carolina at Asheville | Asheville | 202.0 | 595 | Southern Association of Colleges and Schools Commission on Colleges |
| University of South Florida-Main Campus | Tampa | 384.1 | 595 | Southern Association of Colleges and Schools Commission on Colleges |
| Mercer University | Macon | 319.9 | 590 | Southern Association of Colleges and Schools Commission on Colleges |
| College of Charleston | Charleston | 399.0 | 585 | Southern Association of Colleges and Schools Commission on Colleges |
| James Madison University | Harrisonburg | 145.8 | 580 | Southern Association of Colleges and Schools Commission on Colleges |
| Millsaps College | Jackson | 359.4 | 570 | Southern Association of Colleges and Schools Commission on Colleges |
| Transylvania University | Lexington | 256.3 | 570 | Southern Association of Colleges and Schools Commission on Colleges |
| The University of Alabama | Tuscaloosa | 392.9 | 570 | Southern Association of Colleges and Schools Commission on Colleges |
| University of Kentucky | Lexington | 256.3 | 565 | Southern Association of Colleges and Schools Commission on Colleges |
| Dallas Baptist University | Dallas | 334.1 | 562 | Southern Association of Colleges and Schools Commission on Colleges |
| Saint Edward’s University | Austin | 287.7 | 560 | Southern Association of Colleges and Schools Commission on Colleges |
| Citadel Military College of South Carolina | Charleston | 399.0 | 550 | Southern Association of Colleges and Schools Commission on Colleges |
| Virginia Commonwealth University | Richmond | 236.5 | 550 | Southern Association of Colleges and Schools Commission on Colleges |
| Florida Southern College | Lakeland | 339.7 | 545 | Southern Association of Colleges and Schools Commission on Colleges |
| The University of Texas at Tyler | Tyler | 295.7 | 544 | Southern Association of Colleges and Schools Commission on Colleges |
| The University of Tampa | Tampa | 384.1 | 540 | Southern Association of Colleges and Schools Commission on Colleges |
| Randolph College | Lynchburg | 180.4 | 535 | Southern Association of Colleges and Schools Commission on Colleges |
| Guilford College | Greensboro | 398.2 | 530 | Southern Association of Colleges and Schools Commission on Colleges |