Section 1: Introduction

The purpose of the document is to illustrate how Ohio colleges and universities compare to colleges and universities in other states. The data used in this document is only a subset of all college scorecard data available for the 2018-2019 reporting period and is from the US Department of Education, which is one of the many federal organizations responsible for collecting and reporting data on institutions of higher education. Part of the data collected and reported by the department of education includes all institutions of post-secondary education receiving some form of financial aid, and various statistics about each institution.

The analysis in this document is, in hope, to help an individual to better understand how institutions of post-secondary education in Ohio compare to other schools regionally and nationally. For example, how the cost of attendance varies by family income for all institutions, how Ohio compares to neighboring states with respect to the percent of institutions that are HBCU’s, men only or women only, how the average family income of Xavier University students compares nationally and within Ohio, and how Xavier University of Ohio compares with the other ‘Xavier’ schools in the country.

Section 2: Required Packages

The packages required for this markdown are:

Package Summary
tidyverse The tidyverse collection of packages
rmdformats RMarkdown themes
knitr RMarkdown documents
DT Javascript enabled data tables
stargazer Fancy regression tables
corrplot Simple correlation plots
PerformanceAnalytics Detailed plots and tables for analytics
pander Pretty summary tables output
lubridate Working with date formats
scales Providing methods for breaks and labels for axes

Section 3: Data Preparation (Cleaning & Wrangling)

Source Data Explained

Data was imported from http://asayanalytics.com/scorecard_csv. After cleaning & wrangling, there are a total of 7,115 entries and 34 columns in the scorecard data frame, where 5 columns are added columns for the ease of further analysis. There are 3 parts to data cleaning & wrangling: Data Conformity, Blatant Data Errors, and Dummy Variables.

Data conformity: - Missing values are recorded as “0” since data type of the fields is numeric. - Character values recorded as “NULL” and “Not Available” are also replaced with “0” and then converted to numeric values.

Blatant Data Errors: - Anomalies and errors are addressed. For example, percenteages should never exceed 100% or be negative, and any number except longitude should not be negative.d Any rows with such values would be removed. However, there are not such values in this subset of data, so no rows were removed. - Control of institution where there are multiple variations on 3 possible values (1 - Public, 2 - Private nonprofit, and 3 - Private for-profit). Entries with character values are converted into appropriate number of control, and the column is converted into an appropriate data type, numeric. - Other fields that were character type are also converted into numeric as they should be, including FAMINC, UGDS, CONTROL, and HBCU. Data type of ZIP entity is also converted from numeric to character data type.

Dummy Variables: - A dummy variable for average family income of an institution whose income is greater than the median of household income for Ohio in 2019, which is $54,021. Values are recorded as TRUE if average family income is higher than the median of Ohio household income in 2019, otherwise recorded as FALSE. - Dummy variables indicating whether the institution is a ‘University,’ ‘College,’ or ‘Other’ based on the words used in the institution name. Values are recorded as TRUE if name of institution matches any of the above condition. - A dummy variable for institutions in states bordering Ohio. Values are recorded as TRUE if institutions are in states bordering Ohio.

Below is a summary table explaining each variable in the data set.

Variable Name Explanation
ID Unique ID for institution
INSTNM Institution Name
CITY City of institution
STABBR State postcode
ZIP ZIP code
CONTROL Control of institution (1 - Public, 2 - Private nonprofit, 3 - Private for-profit)
LOCALE Locale of institution
LATITUDE Latitude
LONGITUDE Longitude
HBCU Flag for Historically Black College and University
MENONLY Flag for men-only institution
WOMENONLY Flag for women-only institution
ADM_RATE Admission rate
ACTCM25 25th percentile of the ACT cumulative score
ACTCM75 75th percentile of the ACT cumulative score
ACTCMMID Midpoint of the ACT cumulative score
SAT_AVG Average SAT equivalent score of students admitted
UGDS Enrollment of undergraduate certificate/degree-seeking students
COSTT4_A Average cost of attendance (academic year institutions)
AVGFACSAL Average faculty salary
PCTPELL Percentage of undergraduates who receive a Pell Grant
PCTFLOAN Percentage of all undergraduates students receiving a federal student loan
AGE_ENTRY Average age of entry
FEMALE Share of female students
MARRIED Share of marrie students
DEPENDENT Share of dependent students
VETERAN Share of veteran students
FIRST_GEN Share of first-generation students
FAMINC Average family income in real 2015 dollars
Above MED TRUE if FAMINC is above the median of Ohio household income of $54,021
University TRUE if an institution name contains the word ‘University’
College TRUE if an institution name contains the word ‘College’
Other TRUE if an institution name contains neither the word ‘University’ nor ‘College’
BORDEROH TRUE is an institution is in stated bordering Ohio.

Data Table

Below is a sample data table of the Institutions Scorecard.

Summary Information

Summary 1

Below are summary tables summarizing the overall average cost of attendance of universities and colleges. It shows that University has higher average cost of attendance than College

“College” Average Cost of Attendance
College 23065
“University” Average Cost of Attendance
University 30373

Summary 2

The tables below shows institutions with highest midpoint of the ACT cumulative score (ACTCMMID) and another with highest average SAT equivalent score of students admitted (SAT_AVG). I found the result interesting because I anticipated that the scores would be higher in top institution like MIT and Harvard. However, the result shows that the institution with the both highest of ACTCMMID and SAT_AVG is California Institute of Technology.

Institution Name Midpoint of ACT cum score
California Institute of Technology 35
Carnegie Mellon University 34
Franklin W Olin College of Engineering 34
Harvard University 34
Harvey Mudd College 34
Institution Name SAT Score
California Institute of Technology 1558
Rice University 1535
Massachusetts Institute of Technology 1532
Harvey Mudd College 1527
Johns Hopkins University 1524

Summary 3

This last table shows institutions in Ohio whose average family income (as recorded in 2015 dollars) is higher than the median of Ohio in 2019. The result shows that Xavier University is ranked number 4 on the list, with average family income in 2015 of $114,330, which is more than twice as high as the median of Ohio household income in 2019.

Institution Name Average Family Income
Kenyon College 137788
University of Dayton 134621
Oberlin College 130188
Xavier University 114330
John Carroll University 111360
Case Western Reserve University 109858
The College of Wooster 107065
Cleveland Institute of Music 104656
Denison University 103917
Franciscan University of Steubenville 103566

Section 5: Directed Analysis

5.1

Question: Do you find support for the old adage: “Private schools cost more than public schools”?

From the bar chart below, I do find support for the old adage: “Private schools cost more than public schools.” The chart shows the average cost of attendance of all institutions in each category. It shows that the average cost of attendance of private nonprofit schools is $26,264, which is more than twice higher than the average cost of attendance of public institutions.

5.2

Question: How does the average family income of Xavier University students compare nationally and within Ohio?

From the tables below, comparing with other institutions within Ohio and nationally, the average family income of Xavier University students, $114,330, is much higher, whereas the average family income within Ohio and nationally are only 39,848 and 35,778 dollars, respectively.

Institution Name Average Family Income
Xavier University $114,330
State Average Family Income
OH $39,848
“Nationally” Average Family Income
Nationally $35,778

5.3

Question: How much variance exists in the cost of attending an Ohio college or university? How does this variance compare to colleges and universities in the states that border Ohio?

Variance in the cost of attending an Ohio college or university is $191,608,900, which is similar to the variance in the cost of attending colleges and universities in Indiana and Michigan, whereas the variances in Kentucky and West Virginia are lower (with WV being the lowest), and the variance is highest in Philadelphia, 264,049,015 dollars.

5.4

Question: What types of schools have the highest and lowest percentage of students receiving Pell grants?

As shown in the bar chart below, the type of school that has the highest percentage of students receiving Pell grants is private for-profit institutions with 60.99%, whereas the type with lowest percentage is private nonprofit institutions, with 41.26 percents of students receiving Pell grants.

5.5

Question: Do schools with wealthier student families and high dependency rates have a higher or lower average age of entry?

From the points graph below, the higher the average student family income and higher dependency rates, the lower the average of entry.

5.6

Question: Does there appear to be a relationship between the tuition charged by the institution and the average faculty salary?

Yes, there appears to be a relationship between the tuition charged by the institution and the average faculty salary. From the jitter plot below, the higher the cost of attendance, the higher the average faculty salary.

Section 6: Self-Directed Analysis

6.1

Instruction: Compare the relationship between the average cost of attendance, the percent of students receiving a Pell grant, the percent of students receiving a federal student loan and the average family income in whatever way you choose. If one of these variables was thought to cause the others, which would you say it is?

Visualization 4.2 already showed that the higher the average cost of attendance, the average family income of students is also higher. From the graphs below, there are a few relationships shown. First, there is a correlation between percentage of students receiving Pell grants and receiving Federal loans and family income - there are more students receiving Pell grants and Federal loans when the average family income is lower.

Note that an outlier is taken out from the following graphs for clearer visualization

Comparing Pell Grant & Federal Loan with Family Income and Cost of Attendance

6.2

Instructions: How does Xavier University of Ohio compare with the other ‘Xavier’ schools in the country? Identify at least 2 similarities and 2 differences between our Xavier University and the other Xavier schools.

Similarities:

  1. All ‘Xavier’ institutions are private schools.

  2. The admission rates of all ‘Xavier’ institutions are +/-10 from the average admission rate within its control type (private nonprofit).

  3. All ‘Xavier’ institutions having Private nonprofit control type are all Universities.

Differences:

  1. Xavier University of Ohio has the highest average family income out of all ‘Xavier’ institutions.

  2. Xavier University of Ohio also has the highest average cost of attendance.

  3. Xavier University of Ohio has the lowest percentage of female and first generation students.

  4. Xavier University of Ohio has the lowest percetange of students receiving Pell Grants and Federal Student Loans.

Note that Xavier College School of Nursing of California is not included as its cost of attendance is missing in the data

Family Income and Control Type

Cost of Attendance and Family Income

Institution Name Family Income Cost of Attendance
Xavier University $114,330 $50,880
Xavier University of Louisiana $49,214 $33,593
Saint Xavier University $49,968 $38,775

Cost of Attendance and Rate of Admission

Institution Name Admission Rate Cost of Attendance
Xavier University of Louisiana 64% $33,593
Saint Xavier University 75% $38,775
Xavier University 74% $50,880
Control Type Admission Rate
Private nonprofit 66.06%

Female and First Generation Rates

Institution Name Female First Generation
Saint Xavier University 66% 49%
Xavier College School of Nursing 78% 44%
Xavier University 57% 16%
Xavier University of Louisiana 75% 27%

Pell Grant and Federal Student Loan Rates

Institution Name Pell Grant Rate Federal Loan Rate
Saint Xavier University 54% 77%
Xavier College School of Nursing 82% 100%
Xavier University 16% 55%
Xavier University of Louisiana 54% 67%

Control Type and Institution Type

Institution Name University Control Type
Saint Xavier University TRUE 2
Xavier University TRUE 2
Xavier University of Louisiana TRUE 2

6.3 Additional Self-Directed Analysis Using a Framework

6.3.1 Self-Directed Analysis #1

Question: What is the relationship between first generation rates, dependency rates, average age of entry and the average family income? Does the relationship make sense?

I find this question interesting because we already know that the higher the average family income, the higher the dependency rates. This question intends to see whether this correlation relates to the percentage of first generations.

Proposed Solution I intend to answer this question by doing two separate analysis. First, I will filter out rows that have zero percents of first generation rates and of dependency rates as well as rows that have $0 cost of attendance for more accurate results. Next, I use group_by() function to include average family income (for x-axis) and dependency rates (for factoring). Then, I will summarize the data by using first generation rates (y-axis) to see how it depends on the average family income. On the second analysis, the only difference is with the age of entry instead of dependency rates. Here, I will filter out AGE_ENTRY that are zero instead of the dependency rates. And the rest of wrangling is the same, only substitute DEPENDENT with AGE_ENTRY. Finally, I will choose to use point graph since all FAMINC, DEPENDENT, and AGE_ENTRY are all continuous variables.

Analysis From the result below, it shows that the higher the average family income, the higher the dependency rates, the lower the first generation rates, and the lower the average age of entry. This makes sense as first generations are usually older and are more independent.

6.3.2 Self-Directed Analysis #2

Question: How are SAT scores related to the percentage of student receiving Pell grants?

As we already know from Visualization 4.4 that the higher the average family income, the higher the SAT score seems to be, and from Analysis 6.1 that the higher the average family income, the lower the percentage of students receiving Pell grants. This question is intended to find out whether there is any correlation between the average SAT scores and the percentage of students receiving Pell grant.

Proposed Solution How I intend to answer this question is to first filter out rows with missing average SAT scores and percentage of Pell grants for accurate analysis, as well as to filter out an outlier of an SAT score that scored less than 750 points. Then, I will group data by the average SAT score and summarize the percentage of Pell grants received to see how Pell grant rates depend on the SAT score. Finally, I choose to use point graph as both SAT_AV and PCTPELL are continuous variables.

Analysis As shown in the graph below, it shows that most students who received Pell grants are those who scored lower in on the SAT exam. As the smooth graph shows, there is a downward slope indicating that the higher the average SAT scores, it is less likely for students to receive Pell grants.

Fun Stuff

This section exists just to explore fun data.

Latitude and Longitude Graph

The graph below plots out latitude and longitude of each institution in the data set. Its aesthetics shows the map of the United States. From the graph, it looks like most of the institutions from the data set are mostly from the Eastern side of the U.S.

Note: longitudes and latitudes of some institutions seem to be recorded incorrectly making the graphs look odd, so I had to filter out those institutions to get the map below.