The purpose of this document is to show how the scorecard data set has been edited in order to provide clear and concise visualizations and calculations

The data set includes a list of colleges and other variables that describe different aspects of it. This includes location, ACT and SAT Scores, cost, admission rate, and others.

One will see graphs and tables that compare ohio schools to national averages and bordering averages and numbers in areas like number of institutions and cost averages

#install.packages("tidyverse") #for all ofthe ggplots
#install.packages("sqldf") #for sql commands i run to make tables
#install.packages("dplyr") #for filtering, selecting, and summarizing data
#install.packages("skimr") #Skim the data for column bar charts and a data summary
#install.packages('DT') #Create Data table
library(DT)
library(skimr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
##   dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libSM.6.dylib
##   Referenced from: /Library/Frameworks/R.framework/Resources/modules//R_X11.so
##   Reason: image not found
## Could not load tcltk.  Will use slower R code instead.
## Loading required package: RSQLite
library(tidyverse)
## ── Attaching packages ─────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ stringr 1.4.0
## ✓ tidyr   1.0.2     ✓ forcats 0.4.0
## ✓ readr   1.3.1
## ── Conflicts ────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#Read in the dataset
scorecard <- read_csv("http://asayanalytics.com/scorecard")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   INSTNM = col_character(),
##   CITY = col_character(),
##   STABBR = col_character(),
##   ZIP = col_character(),
##   CONTROL = col_character(),
##   HBCU = col_character(),
##   UGDS = col_character()
## )
## See spec(...) for full column specifications.

This data set was not cleaned of its missing values. Instead, Data conformity was peformed on smaller errors. Mis-inputs and blatant errors were addressed directly, And dummy variables were created. One indicates whether the average family income of students is above or below the Ohio average. Another indicated whether a school is a “College” or “University.”

#Data Cleaning
#convert Undergrads and ZIP values into a numeric value
scorecard$UGDS <- as.numeric(scorecard$UGDS)
## Warning: NAs introduced by coercion
scorecard$ZIP <- as.numeric(scorecard$ZIP)
#Convert all undergrad values to positive
scorecard$UGDS <- abs(scorecard$UGDS) 
#sum(scorecard$FEMALE > 1.0)
#Ran this for every percent column and found no values above 1

Here I removed the SQL terminoloy used on missing variables. The CONTROL column had words for some values, so i blended them into character numbers.

Five dummy variables are created here.

Summary of the variables

Variable Description
LATITUDE Latitude
LONGITUDE Longitude
HBCU Flag for Historically Black College and University
MENONLY Flag for men-only college
WOMENONLY Flag for women-only college
ADM_RATE Admission rate
ACTCM25 25th percentile of the ACT cumulative score
ACTCM75 75th percentile of the ACT cumulative score
ACTCMMID Midpoint of the ACT cumulative score
SAT_AVG Average SAT equivalent score of students
UGDS Enrollment of degree-seeking students
COSTT4_A Average cost of attendance (academic year)
AVGFACSAL Average faculty salary (monthy)
PCTPELL Percentage of undergraduates who receive a Pell Grant
PCTFLOAN Percent of all undergraduate students receiving a federal/student loan
AGE_ENTRY Average age of entry
FEMALE Share of female students
MARRIED Share of married students
DEPENDENT Share of dependent students
VETERAN Share of veteran students
FIRST_GEN Share of first-generation students
FAMINC Average family income in real 2015 dollars
CONTROL 1 Public
2 Private nonprofit
3 Private for-profit
LOCALE:
11 City: Large (population of 250,000 or more) Shown/used on consumer website.
12 City: Midsiz e (population of at least 100,000 but less than 250,000)
13 City: Small (population less than 100,000)
21 Suburb: Larg e (outside principal city, in urbanized area with population of 250,000 or more)
22 Suburb: Mids ize (outside principal city, in urbanized area with population of at least 100,000 but less than 250,000)
23 Suburb: Smal l (outside principal city, in urbanized area with population less than 100,000)
31 Town: Fringe (in urban cluster up to 10 miles from an urbanized area)
32 Town: Distan t (in urban cluster more than 10 miles and up to 35 miles from an urbanized area)
33 Town: Remote (in urban cluster more than 35 miles from an urbanized area)
41 Rural: Fring e (rural territory up to 5 miles from an urbanized area or up to 2.5 miles from an urban cluster)
42 Rural: Dista nt (rural territory more than 5 miles but up to 25 miles from an urbanized area or more than 2.5 and up to 10 miles from an urban cluster)
43 Rural: Remot e (rural territory more than 25 miles from an urbanized area and more than 10 miles from an urban cluster)

Data Table

## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html

Summary and skim of the variables

Data summary
Name scorecard
Number of rows 7115
Number of columns 35
_______________________
Column type frequency:
character 9
numeric 26
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
INSTNM 0 1.00 3 93 0 6981 0
CITY 0 1.00 3 24 0 2475 0
STABBR 0 1.00 2 2 0 59 0
CONTROL 0 1.00 1 1 0 3 0
HBCU 0 1.00 1 4 0 3 0
avg_family_income 500 0.93 13 13 0 2 0
ZIP_Size 0 1.00 5 5 0 2 0
Veteran_Above_10 4538 0.36 9 9 0 2 0
Young_Entry 500 0.93 3 5 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ID 0 1.00 1866527.32 6957330.02 100654.00 174096.50 229027.00 450609.50 49005401.00 ▇▁▁▁▁
ZIP 0 1.00 48903.82 29606.47 602.00 23826.00 46208.00 75441.50 99801.00 ▇▇▆▇▇
LOCALE 444 0.94 19.79 9.54 -3.00 12.00 21.00 22.00 43.00 ▁▇▅▂▂
LATITUDE 445 0.94 37.36 5.84 -14.32 33.96 38.79 41.33 71.32 ▁▁▆▇▁
LONGITUDE 445 0.94 -90.32 17.84 -170.74 -97.34 -86.34 -78.89 171.38 ▂▇▁▁▁
MENONLY 444 0.94 0.01 0.10 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
WOMENONLY 444 0.94 0.01 0.08 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
ADM_RATE 5078 0.29 0.68 0.21 0.00 0.55 0.71 0.84 1.00 ▁▂▅▇▇
ACTCM25 5823 0.18 20.55 3.88 1.00 18.00 20.00 22.00 34.00 ▁▁▇▅▁
ACTCM75 5823 0.18 25.82 3.65 9.00 23.00 26.00 28.00 35.00 ▁▁▆▇▂
ACTCMMID 5823 0.18 23.44 3.71 6.00 21.00 23.00 25.00 35.00 ▁▁▇▅▁
SAT_AVG 5795 0.19 1131.77 130.10 564.00 1044.75 1117.00 1195.00 1558.00 ▁▁▇▃▁
UGDS 748 0.89 2426.06 5481.33 0.00 106.00 401.00 2018.00 77269.00 ▇▁▁▁▁
COSTT4_A 3531 0.50 26337.07 15015.59 0.00 14000.25 22646.50 33941.75 93704.00 ▇▇▃▁▁
AVGFACSAL 2868 0.60 6617.13 2458.97 0.00 4965.00 6364.00 7945.50 22924.00 ▂▇▂▁▁
PCTPELL 770 0.89 0.48 0.22 0.00 0.31 0.46 0.65 1.00 ▃▇▇▆▂
PCTFLOAN 770 0.89 0.48 0.28 0.00 0.26 0.54 0.70 1.00 ▆▃▆▇▃
AGE_ENTRY 500 0.93 26.01 3.99 17.43 23.18 25.78 28.51 58.90 ▇▇▁▁▁
FEMALE 1429 0.80 0.64 0.19 0.02 0.55 0.63 0.77 0.98 ▁▁▆▇▅
MARRIED 1392 0.80 0.16 0.10 0.00 0.10 0.15 0.22 0.82 ▇▆▁▁▁
DEPENDENT 921 0.87 0.49 0.25 0.03 0.29 0.46 0.68 0.99 ▃▇▆▅▅
VETERAN 4538 0.36 0.02 0.02 0.00 0.01 0.01 0.02 0.35 ▇▁▁▁▁
FIRST_GEN 1247 0.82 0.46 0.13 0.09 0.38 0.48 0.54 0.96 ▁▅▇▁▁
FAMINC 500 0.93 38482.72 23169.95 321.39 22668.03 31447.49 48098.65 174263.25 ▇▅▁▁▁
school_category 0 1.00 0.65 0.48 0.00 0.00 1.00 1.00 1.00 ▅▁▁▁▇
bordering_ohio 0 1.00 0.13 0.33 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁

Below are a few visualizations on Column comparisons

Number of Institutions in Ohio and in each of the States that borders Ohio

The Cost for Attendance varies by Family Income for all Institutions.

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 3554 rows containing non-finite values (stat_smooth).

The Number of Undergraduates across each of the 3 Institutional Control Types for all Institutions

## Warning: Removed 748 rows containing missing values (position_stack).

Relationship between ACT or SAT scores and Family Income across each of the states that border the state of Ohio.

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 689 rows containing non-finite values (stat_smooth).

“Private schools cost more than public schools”

## Warning: Removed 3531 rows containing missing values (position_stack).

One can see that private schools have a generally higher cost then public schools, so I support the adage. (2=private, 1=public)

Average family income of students at Xavier University compared nationally?

##              INSTNM AVG(FAMINC)
## 1 Xavier University    114329.6

National Average

##   AVG(FAMINC)
## 1    38471.25

Average family income of students at Xavier University compared against Ohio schools?

##       ID            INSTNM AVG(FAMINC)
## 1 206622 Xavier University    114329.6

Ohio Average

##   AVG(FAMINC)
## 1    42379.96

The cost of attending an Ohio ‘university’ compared to universities in states that border Ohio The cost of attending an Ohio ‘university’ compared to universities nationally

##   avg(COSTT4_A) STABBR
## 1      26951.46     OH

National Average

##   avg(COSTT4_A)
## 1      26337.07

The schools that have the highest and lowest percentage of undergraduate students receiving a Pell grant

MAX Pell Percent

##    PCTPELL                                                              INSTNM
## 1        1                                            MTI Business College Inc
## 2        1                                 Mr Bela's School of Cosmetology Inc
## 3        1                                       Southern School of Beauty Inc
## 4        1                                         Victoria Beauty College Inc
## 5        1                                 Central School of Practical Nursing
## 6        1                                      Virginia School of Hair Design
## 7        1          Instituto de Educacion Tecnica Ocupacional La Reine-Manati
## 8        1                                     Colegio Mayor de Tecnologia Inc
## 9        1                                     Liceo de Arte-Dise-O y Comercio
## 10       1                                                  Nouvelle Institute
## 11       1                              Professional Technical Institution Inc
## 12       1                                          Professional Beauty School
## 13       1                                            Yukon Beauty College Inc
## 14       1                     College of Business and Technology-Miami Campus
## 15       1                                Quality Technical and Beauty College
## 16       1                                                    D'Mart Institute
## 17       1                                        Ailano School of Cosmetology
## 18       1                                         Merkaz Bnos-Business School
## 19       1                                      Trinity College of Puerto Rico
## 20       1                                      Vogue Beauty and Barber School
## 21       1                                           Franklin Career Institute
## 22       1                                            Dewey University-Bayamon
## 23       1                                                     Century College
## 24       1                  Monteclaro Escuela de Hoteleria y Artes Culinarias
## 25       1                                      MDT College of Health Sciences
## 26       1                                      International Training Careers
## 27       1                                     Dallas Barber & Stylist College
## 28       1                                       Cambridge Technical Institute
## 29       1                                     Expertise Cosmetology Institute
## 30       1                                        Industrial Technical College
## 31       1                Institute of Beauty Occupation and Technology Course
## 32       1                             Rosslyn Training Academy of Cosmetology
## 33       1                                     Palladium Technical Academy Inc
## 34       1                      College of Business and Technology-Main Campus
## 35       1                          College of Business and Technology-Hialeah
## 36       1                       College of Business and Technology-Cutler Bay
## 37       1                                           National Career Institute
## 38       1                                               Future-Tech Institute
## 39       1                                        American Technical Institute
## 40       1                                                    Global Institute
## 41       1                                              Larry's Barber College
## 42       1                                        Belle Academy of Cosmetology
## 43       1                    College of Business and Technology-Miami Gardens
## 44       1                                       New Beginnings Beauty Academy
## 45       1                                             Celebrity Barber School
## 46       1                                        Construction Training Center
## 47       1                                       DMost Beauty & Body Institute
## 48       1 Institute of Beauty Occupations and Technology Course-San Sebastian

MIN Pell Percent

##    PCTPELL                                                         INSTNM
## 1        0                                          Bais Binyomin Academy
## 2        0                              United States Coast Guard Academy
## 3        0                                       American Islamic College
## 4        0                                              Principia College
## 5        0                      The Southern Baptist Theological Seminary
## 6        0                       New Orleans Baptist Theological Seminary
## 7        0                                    United States Naval Academy
## 8        0                            MGH Institute of Health Professions
## 9        0                                          Saint John's Seminary
## 10       0                                              Hillsdale College
## 11       0                                 Assumption College for Sisters
## 12       0                 Saint Francis Medical Center School of Nursing
## 13       0                      Southeastern Baptist Theological Seminary
## 14       0                                              Athenaeum of Ohio
## 15       0                  University of Oklahoma-Health Sciences Center
## 16       0                         American College of Financial Services
## 17       0                                    Bidwell Training Center Inc
## 18       0                                                  Gratz College
## 19       0                                             Grove City College
## 20       0            Washington Hospital School of Radiologic Technology
## 21       0        Avera McKennan Hospital School of Radiologic Technology
## 22       0                                    Avera Sacred Heart Hospital
## 23       0                                         Sanford Medical Center
## 24       0                               Pentecostal Theological Seminary
## 25       0                                Ecumenical Theological Seminary
## 26       0 HSHS St. John's Hospital School of Clinical Laboratory Science
## 27       0                   Pike County Joint Vocational School District
## 28       0                          Howell Cheney THS/CT Aero Tech School
## 29       0                       Forbes Road Career and Technology Center
## 30       0                             Bristol Technical Education Center
## 31       0                                   Mitchell Cosmetology College
## 32       0                        Atlantic Institute of Oriental Medicine
## 33       0                 National Graduate School of Quality Management
## 34       0                                 Faith International University
## 35       0                                               Mid City College
## 36       0                                             W L Bonner College
## 37       0                                         Taft University System
## 38       0                     Focus-Hope Information Technologies Center
## 39       0           Stratford School of Aviation Maintenance Technicians
## 40       0                                          Mauna Loa Helicopters
## 41       0                             Relay Graduate School of Education
## 42       0                                                  Brand College
## 43       0                          American Academy of Personal Training
## 44       0                                        Lynnes Welding Training
## 45       0                                             IGlobal University
## 46       0        Montessori Education Institute of the Pacific Northwest
## 47       0                                Unilatina International College
## 48       0                                            The Training Domain
## 49       0            American Academy of Personal Training-Boston Campus
## 50       0                              Regional Center for Border Health
## 51       0                                            Martinsburg College
## 52       0                                     Abraham Lincoln University
## 53       0                                     Northeast Lakeview College
## 54       0                              U.S. Truck Driver Training School
## 55       0   South Dade Technical College-South Dade Skills Center Campus
## 56       0                      Caribbean Aviation Training Institute Inc
## 57       0                                     Southern States University
## 58       0                                    Gadsden Technical Institute
## 59       0                                           Westcliff University
## 60       0                                 Elite Institute of Cosmetology
## 61       0                                           MAK Beauty Institute
## 62       0                                            Reformed University
## 63       0                                      Mobile Technical Training
## 64       0                      California Institute of Arts & Technology
## 65       0                                                  Elyon College
## 66       0                           Digital Media Institute at InterTech
## 67       0                                       Allgood Beauty Institute
## 68       0               Cincinnati School of Barbering & Hair Design Inc
## 69       0                                             L Makeup Institute
## 70       0                                 Yeshiva Kollel Tifereth Elizer
## 71       0                                   Unlimited Cosmetology School
## 72       0                  Johnny Matthew's Hairdressing Training School
## 73       0                          University of Wisconsin-Parkside Flex
## 74       0                          University of Wisconsin Colleges Flex

Compare the average cost of attendance across the number of undergraduates, the percent of students receiving a Pell grant, the average faculty salary and the average family income in whatever way you choose. If one of these variables was classified as a ‘dependent’ variable, which would you say it is and how would you evaluate the effect of the other variables on your dependent variable?


The average cost of attendance across the number of undergraduates

## Warning: Ignoring unknown parameters: zoom
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 189 rows containing non-finite values (stat_smooth).

The average cost of attendance across the percent of students receiving a Pell grant.

## Warning: Ignoring unknown parameters: zoom
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 4 rows containing non-finite values (stat_smooth).

The average cost of attendance across the the average faculty salary

## Warning: Ignoring unknown parameters: zoom
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 517 rows containing non-finite values (stat_smooth).

The average cost of attendance across the the average family income

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 1943 rows containing non-finite values (stat_smooth).

The Average cost of the attendance seems to have a positive correlation with Faculty Salary. Meaning: A hbe a dependent variable against these variables. There’s a major drop-off in Pell grants when the university cost approaches 30k. Attendance cost also has a negative correlation with Faculty Salary and Avg Family Income.

Student populations of schools in heavily urbanized areas vs. those in very rural areas

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 748 rows containing non-finite values (stat_smooth).

One can see that Urban (Lower number around 12) are only slightly above the other extreme of rural areas.Those in the middle have much lower student population

A multiple regression would be a good way to student populations without any bias. Here one would be able to see which variables best show student population in rural vs. urban areas

Self-Directed Questions

Question 1 - How is Admission Rate related to the cost of the institutions

APPROACH: I will graph the Admission rate against the attendance cost and seek a correlation

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5282 rows containing non-finite values (stat_smooth).

The graph shows that the admission rate is about the same for colleges under 50k, but then drops off greatly as it approaches 60k. I believe this shows that the “elite” schools have higher costs than other schools.

More variables could be added to make this more accurate, such as ACT and SAT scores, high school GPA accepted, or even location and cost of living in that location. In this case, a multiple regression would be a useful tool in showing how greatly multiple variables affect admission rate.

Question 2 - WHow does marriage affect one’s requirement for a loan?

APPROACH: I intend to graph the percent of married at a chool with the percent needing a loan. I’ll also include age of entry because a younger couple could be needing more of a loan then older

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 1809 rows containing non-finite values (stat_smooth).

## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
## # A tibble: 72 x 2
##    MARRIED PCTFLOAN
##      <dbl>    <dbl>
##  1    0       0.14 
##  2    0.01    0.524
##  3    0.02    0.561
##  4    0.03    0.562
##  5    0.04    0.540
##  6    0.05    0.509
##  7    0.06    0.426
##  8    0.07    0.478
##  9    0.08    0.408
## 10    0.09    0.400
## # … with 62 more rows

An ANOVA test could be used on multiple variables to see which results in the most impact on the Percent looking to get a loan