Data for The Ohio Department of Education Comparing in State and National Higher Education Institutions

This document was created for BAIS 496 Exam 1 to explore data from the “collegescorecard” data set and create a deliverable based on our findings. In the segments to follow I will go over the process I took to clean the data given and who I choose to answer and create each of my visuals for each question.

Packages Used

Packages Description
RMySQL Allows for interface with SQL
readxl The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf)
tidyverse The tidyverse is a coherent system of packages for data manipulation, exploration and visualization that share a common design philosophy
dplyr provides a consistent set of verbs that help you solve the most common data manipulation challenges
DT R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides filtering, pagination, sorting, and many other features in the tables
naniar provides data structures and functions that facilitate the plotting of missing values and examination of imputations

Data Cleaning Process

Everyone does data cleaning differenly and I am no exception. Where I started was with data uniformity. Some of the data read in with SQL syntax i.e “Null”. I converted these syntaxual errors to match R syntax and the majority of the data by replacing “NUll” values with “NA”. From there I unified the Control variable changing the numeric vaule into their character value, replaceing what would return 7 possible outcomes to what should have been 3. I accomplished this by using gsub. Next I did the same for the Locale variable. I found this to be more pleasing to the eye in my visualizations below, though not without concequence that I will explain with said visual. The final piece of cleaning I did early on was converting the variable UGDS(undergraduate degree seeking) from a character value to a numeric vaule

Variable and Final Data Table

Packages Description
CITY City
STABBR State postcode
ZIP ZIP code
CONTROL Control of institution
1 Public
2 Private nonprofit
3 Private for-profit
LOCALE Location/locale of Insitution
11 City: Large (population of 250,000 or more) Shown/used on consumer website.
12 City: Midsize (population of at least 100,000 but less than 250,000)
13 City: Small (population less than 100,000
21 Suburb: Large (outside principal city, in urbanized area with population of 250,000 or more)
22 Suburb: Midsize (outside principal city, in urbanized area with population of at least 100,000 but less than 250,000)
23 Suburb: Small (outside principal city, in urbanized area with population less than 100,000)
31 Town: Fringe (in urban cluster up to 10 miles from an urbanized area)
32 Town: Distant (in urban cluster more than 10 miles and up to 35 miles from an urbanized area)
31 Town: Remote (in urban cluster more than 35 miles from an urbanized area)
41 Rural: Fringe (rural territory up to 5 miles from an urbanized area or up to 2.5 miles from an urban cluster)
42 Rural: Distant (rural territory more than 5 miles but up to 25 miles from an urbanized area or more than 2.5 and up to 10 miles from an urban cluster)
43 Rural: Remote (rural territory more than 25 miles from an urbanized area and more than 10 miles from an urban cluster)
LATITUDE Latitude
LONGITUDE Longitude
HBCU Historically Black College and University
MENONLY Flag for men-only college
WOMENONLY Flag for women-only college
ADM_RATE Admission rate
ACTCM25 25th percentile of the ACT cumulative score
ACTCM75 75th percentile of the ACT cumulative score
ACTCMMID Midpoint of the ACT cumulative score
SAT_AVG Average SAT equivalent score of students admitted
UGDS Enrollment of undergraduate certificate/degree-seeking students
COSTT4_A Average cost of attendance (academic year institutions)
AVGFACSAL Average faculty salary
PCTPELL Percentage of undergraduates who receive a Pell Grant
PCTFLOAN Percent of all undergraduate students receiving a federal student loan
AGE_ENTRY Average age of entry
FEMALE Share of female students
MARRIED Share of married students
DEPENDENT Share of dependent students
VETERAN Share of veteran students
FIRST_GEN Share of first-generation students
FAMINC Average family income in real 2015 dollars
(datatable(df))

Analysis of Data by Question 4.1 - 6.3

The following portion of this page will be the visualizations I used to answer each question given in our exam with my rational behind using the plots I choose if need be.

Question 4.1

Show the number of institutions in Ohio and in each of the states that borders Ohio

Question 4.2

Illustrate how the cost for attendance varies by family income for all institutions

Question 4.3

Compare the number of undergraduates across each of the 3 institutional control types for all institutions

Question 4.4

Show a relationship between ACT or SAT scores and family income across each of the states that border the state of Ohio.

ACT

SAT

Question 5.1

5.1 Do you find support for the old adage: “Private schools cost more than public schools” Explain.

Based on the above boxplots, outliers aside, it would appear that the old adage does hold some weight as the mean for both control types of private schools far exceed the mean for public schools

Question 5.2

5.2 How does the average family income of students at Xavier University compare nationally? Within Ohio?

## [1] 193

The avgerage family income at Xavier far exceeds the national average.

Question 5.3

5.3 How does the cost of attending an Ohio ‘university’ compare to universities in states that border Ohio? What about universities nationally, not considering state?

The average cost of attendance throughout Ohio and the States that border it are fairly similar with Iniana having the highest average cost of attendance and West Virgina the lowest.

Question 5.4

5.4 What schools have the highest and lowest percentage of undergraduate students receiving a Pell grant?

Above are datatables for school that have the highest possible precentage (100%) of students receiving Pell Grants and the lowest possible (0%) respectivley

Question 6.1

6.1 Compare the average cost of attendance across the number of undergraduates, the percent of students receiving a Pell grant, the average faculty salary and the average family income in whatever way you choose. If one of these variables was classified as a ‘dependent’ variable, which would you say it is and how would you evaluate the effect of the other variables on your dependent variable?

Question 6.2

Compare the student populations of schools in heavily urbanized areas with those in very rural areas. Keep in mind, the type of school varies considerably by urban and rural areas. Do your best to control for this bias with the variables you have available to focus on differences within the populations of urban and rural schools and NOT the differences between the type of school.

Additional Self-Directed Analysis

For my additional reseach I wanted to see how the average ACT score and SAT score compared to the Average Cost of Attendance. In other words did those who scored higher on their standardized tests choose to go to schools that had a high cost of attendance.

While this reseach doesn’t provide a reason for why it can lead to some speculation and further research.

Conclusion

I hope my research was able to provide answers to the above questions and I thank anyone who made it this far for looking at my work!!!