Data Import

The first step in analyzing U.S. colleges for Cornelia is to import the data we need for her criteria. The Crime_2015 CSV file contains data about violent crime rates in U.S. cities. The CollegeScorecard Excel file contains data about college names, cities, accrediting agencies, and the midpoint of SAT math scores. The code below shows that I only imported the columns of interest in each data set and dropped any NA values.

I noticed that Dallas, TX, had multiple violent crime rates, so I pre-aggregated violent crime by mean violent crime grouped by city. The next step was to merge the two data sets by city and state (given the same city names appear in different states).

library(tidyverse)
library(readxl)
url <- "http://jsuleiman.com/datasets/CollegeScorecard.xlsx"
destfile <- "./data/CollegeScorecard.xlsx"
curl::curl_download(url, destfile)
CollegeScorecard <- read_excel("./data/CollegeScorecard.xlsx")[c(4:6, 8, 46)] %>% rename(State = STABBR, College_Name = INSTNM, SAT_Math_Mid = SATMTMID, City = CITY)
CollegeRequire <- CollegeScorecard %>% drop_na()
url <- "http://jsuleiman.com/datasets/Crime_2015.csv"
destfile <- "./data/Crime_2015.csv"
curl::curl_download(url, destfile)
Crime_2015 <- read_csv("./data/Crime_2015.csv")[c(2, 11, 12)]
Crime_2015 <- Crime_2015 %>% drop_na() %>% group_by(City, State) %>% summarize(ViolentCrime_avg = mean(ViolentCrime))
CollegeList <- merge(CollegeRequire, Crime_2015, 
  by.x = c("City", "State"),
  by.y = c("City", "State"),
  all.x = FALSE,
  all.y = FALSE)

Quick Check of Data

Now I take a look at the first few rows of data to see if everything looks okay.

head(CollegeRequire)
## # A tibble: 6 x 5
##   College_Name         City     State AccredAgency                  SAT_Math_Mid
##   <chr>                <chr>    <chr> <chr>                                <dbl>
## 1 Alabama A & M Unive… Normal   AL    Southern Association of Coll…          400
## 2 University of Alaba… Birming… AL    Southern Association of Coll…          585
## 3 University of Alaba… Huntsvi… AL    Southern Association of Coll…          580
## 4 Alabama State Unive… Montgom… AL    Southern Association of Coll…          425
## 5 The University of A… Tuscalo… AL    Southern Association of Coll…          570
## 6 Auburn University    Auburn … AL    Southern Association of Coll…          595
head(Crime_2015)
## # A tibble: 6 x 3
## # Groups:   City [5]
##   City        State                                 ViolentCrime_avg
##   <chr>       <chr>                                            <dbl>
## 1 Abilene     TX                                                412.
## 2 Aguadilla   Aguadilla-Isabela, Puerto Rico M.S.A.             109.
## 3 Akron       OH                                                238.
## 4 Albany      GA                                                668.
## 5 Albany      OR                                                114.
## 6 Albuquerque NM                                                793.
head(CollegeList)
##          City State                         College_Name
## 1     Abilene    TX         Abilene Christian University
## 2     Abilene    TX            Hardin-Simmons University
## 3     Abilene    TX                   McMurry University
## 4       Akron    OH      University of Akron Main Campus
## 5      Albany    GA              Albany State University
## 6 Albuquerque    NM University of New Mexico-Main Campus
##                                                                       AccredAgency
## 1              Southern Association of Colleges and Schools Commission on Colleges
## 2              Southern Association of Colleges and Schools Commission on Colleges
## 3              Southern Association of Colleges and Schools Commission on Colleges
## 4 North Central Association of Colleges and Schools The Higher Learning Commission
## 5              Southern Association of Colleges and Schools Commission on Colleges
## 6 North Central Association of Colleges and Schools The Higher Learning Commission
##   SAT_Math_Mid ViolentCrime_avg
## 1          545            412.5
## 2          520            412.5
## 3          495            412.5
## 4          540            238.4
## 5          435            667.9
## 6          535            792.6

Subset Data

The next step is to filter the data based on the following criteria:

I sort the data by the midpoint SAT math score, from highest to lowest.

CollegeSelect <- CollegeList %>% filter(AccredAgency == "Southern Association of Colleges and Schools Commission on Colleges" & ViolentCrime_avg < median(ViolentCrime_avg) & SAT_Math_Mid > median(SAT_Math_Mid)) %>% select(College_Name, City, ViolentCrime_avg, SAT_Math_Mid, AccredAgency) %>% arrange(desc(SAT_Math_Mid))

Count of Colleges

The easiest way I can get a count of the number of colleges on Cornelia’s list is to turn the categorical variable College_Name into a factor. This creates a count of the number of College_Names that meet Cornelia’s criteria. We can see that the number of colleges she has to pick from is 25.

CollegeSelect$College_Name <- factor(CollegeSelect$College_Name)
summary(CollegeSelect)
##                                      College_Name     City          
##  Austin College                            : 1    Length:25         
##  Citadel Military College of South Carolina: 1    Class :character  
##  College of Charleston                     : 1    Mode  :character  
##  Dallas Baptist University                 : 1                      
##  Florida Southern College                  : 1                      
##  Guilford College                          : 1                      
##  (Other)                                   :19                      
##  ViolentCrime_avg  SAT_Math_Mid AccredAgency      
##  Min.   :145.8    Min.   :530   Length:25         
##  1st Qu.:256.3    1st Qu.:550   Class :character  
##  Median :325.7    Median :570   Mode  :character  
##  Mean   :307.1    Mean   :583                     
##  3rd Qu.:359.4    3rd Qu.:600                     
##  Max.   :399.0    Max.   :685                     
## 

Table of Cornelia’s Colleges

I use the kable function from the knitr package to make a prettier table of Cornelia’s colleges.

library(knitr)
kable(CollegeSelect, align = "llccl", col.names = c('College Name', 'City', 'Violent Crime', 'SAT Math Mid', 'Accrediting Agency'), caption = "Table of Cornelia's Colleges by Criteria")
Table of Cornelia’s Colleges by Criteria
College Name City Violent Crime SAT Math Mid Accrediting Agency
University of Virginia-Main Campus Charlottesville 164.6 685 Southern Association of Colleges and Schools Commission on Colleges
Southern Methodist University Dallas 334.1 660 Southern Association of Colleges and Schools Commission on Colleges
The University of Texas at Austin Austin 287.7 650 Southern Association of Colleges and Schools Commission on Colleges
University of Georgia Athens 325.7 625 Southern Association of Colleges and Schools Commission on Colleges
Austin College Sherman 327.1 615 Southern Association of Colleges and Schools Commission on Colleges
Texas A & M University-College Station College Station 315.5 605 Southern Association of Colleges and Schools Commission on Colleges
Texas Christian University Fort Worth 351.2 600 Southern Association of Colleges and Schools Commission on Colleges
University of North Carolina at Asheville Asheville 202.0 595 Southern Association of Colleges and Schools Commission on Colleges
University of South Florida-Main Campus Tampa 384.1 595 Southern Association of Colleges and Schools Commission on Colleges
Mercer University Macon 319.9 590 Southern Association of Colleges and Schools Commission on Colleges
College of Charleston Charleston 399.0 585 Southern Association of Colleges and Schools Commission on Colleges
James Madison University Harrisonburg 145.8 580 Southern Association of Colleges and Schools Commission on Colleges
Millsaps College Jackson 359.4 570 Southern Association of Colleges and Schools Commission on Colleges
Transylvania University Lexington 256.3 570 Southern Association of Colleges and Schools Commission on Colleges
The University of Alabama Tuscaloosa 392.9 570 Southern Association of Colleges and Schools Commission on Colleges
University of Kentucky Lexington 256.3 565 Southern Association of Colleges and Schools Commission on Colleges
Dallas Baptist University Dallas 334.1 562 Southern Association of Colleges and Schools Commission on Colleges
Saint Edward’s University Austin 287.7 560 Southern Association of Colleges and Schools Commission on Colleges
Citadel Military College of South Carolina Charleston 399.0 550 Southern Association of Colleges and Schools Commission on Colleges
Virginia Commonwealth University Richmond 236.5 550 Southern Association of Colleges and Schools Commission on Colleges
Florida Southern College Lakeland 339.7 545 Southern Association of Colleges and Schools Commission on Colleges
The University of Texas at Tyler Tyler 295.7 544 Southern Association of Colleges and Schools Commission on Colleges
The University of Tampa Tampa 384.1 540 Southern Association of Colleges and Schools Commission on Colleges
Randolph College Lynchburg 180.4 535 Southern Association of Colleges and Schools Commission on Colleges
Guilford College Greensboro 398.2 530 Southern Association of Colleges and Schools Commission on Colleges