This document presents an analysis of over 7,000 post-secondary institutions using data from the Department of Education, which is updated yearly. The purpose is to discover how institutions in Ohio compare to those outside of Ohio on a multitude of factors. Hopefully, after reading, you will gain insight into where Ohio excels in education and where it can improve.
Stay tuned for my first exam to see data cleaning, visualizations, and beautiful coding/formatting. I hope you enjoy the ride.
library(tidyverse) – Takes advantage of all the capabilities of the tidyverse to work with data
library(dplyr) – Allows for easy manipulation of dataframes to work with variables
library(DT) – Allows me to create aesthetically pleasing data tables
library(data.table) – Is useful for adding new columns and quickly analyzing data

## Initial Steps
This step is essential: downloading the data from the website where it is stored and loading it into the environment for further manipulation.
Investigating missing values within each column
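| Column | Missing values |
|---|---|
| ID | 0 |
| INSTNM | 0 |
| CITY | 0 |
| STABBR | 0 |
| ZIP | 0 |
| CONTROL | 0 |
| LOCALE | 444 |
| LATITUDE | 445 |
| LONGITUDE | 445 |
| HBCU | 0 |
| MENONLY | 444 |
| WOMENONLY | 444 |
| ADM_RATE | 5078 |
| ACTCM25 | 5823 |
| ACTCM75 | 5823 |
| ACTCMMID | 5823 |
| SAT_AVG | 5795 |
| UGDS | 0 |
| COSTT4_A | 3531 |
| AVGFACSAL | 2868 |
| PCTPELL | 770 |
| PCTFLOAN | 770 |
| AGE_ENTRY | 500 |
| FEMALE | 1429 |
| MARRIED | 1392 |
| DEPENDENT | 921 |
| VETERAN | 4538 |
| FIRST_GEN | 1247 |
| FAMINC | 500 |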
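A per-column count like the one above can be produced with `colSums(is.na(...))`. The sketch below uses a small made-up data frame (`schools_demo` and its values are illustrative, not the real data) and assumes missing entries are already coded as `NA`.

```r
# Toy data frame standing in for the real data set (values are made up).
schools_demo <- data.frame(
  INSTNM   = c("School A", "School B", "School C", "School D"),
  ADM_RATE = c(0.62, NA, 0.48, NA),
  SAT_AVG  = c(1100, NA, NA, NA),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
missing_counts <- colSums(is.na(schools_demo))
print(missing_counts)
```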
Change 0% admission rates to NA, since they are data entry errors
Adjust values within Locale column to fix data entry error
Create uniformity in missing values by changing “NULL” to NA, which is R friendly
Change data types once the “NULL” values are removed
Adjust control values to be dummy variables rather than character strings for analysis
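Most of the cleaning steps above can be sketched in base R as follows. This is a minimal illustration on made-up data, not necessarily the order or the code used for this report. The text labels in `CONTROL` are assumed for the example, and the locale fix is left out because the nature of that entry error is not described here.

```r
# Toy raw data; column names follow the data dictionary, values are made up.
raw <- data.frame(
  INSTNM   = c("School A", "School B", "School C"),
  CONTROL  = c("Public", "Private nonprofit", "Private for-profit"),  # assumed labels
  ADM_RATE = c("0.55", "0", "NULL"),
  SAT_AVG  = c("1210", "NULL", "980"),
  stringsAsFactors = FALSE
)

clean <- raw

# 1. Make missing values uniform: the string "NULL" becomes NA.
clean[clean == "NULL"] <- NA

# 2. Convert the numeric columns once the "NULL" strings are gone.
num_cols <- c("ADM_RATE", "SAT_AVG")
clean[num_cols] <- lapply(clean[num_cols], as.numeric)

# 3. Treat an admission rate of 0 as a data entry error.
clean$ADM_RATE[which(clean$ADM_RATE == 0)] <- NA

# 4. Recode CONTROL to the numeric codes used in the data dictionary.
control_codes <- c("Public" = 1, "Private nonprofit" = 2, "Private for-profit" = 3)
clean$CONTROL <- unname(control_codes[clean$CONTROL])

str(clean)
```

Converting types only after the “NULL” strings are replaced avoids coercion warnings from `as.numeric()`.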
Some additional variables are necessary to do the analysis I am looking to do on the data. These dummy variables will make visualizations in the future easier to create and understand!
For the columns created in these six questions, a value of 1 indicates Yes, and 0 indicates No
Is the average family income of students at this post-secondary institution higher than the Ohio median household income in 2018?
Is this institution a University or College?
Does the state this institution is located in border Ohio?
Is the enrollment of this institution higher than the nationwide average?
Are the majority of entrants above the age of 22 (the societal norm for a graduation age)?
If the post-secondary institution is public, is its admission rate below average (when available)?
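The six questions above could be turned into 1/0 columns along these lines. This is a hedged sketch on made-up data: the new column names, the `ohio_median_income` placeholder value, the name-based rule for "University or College", the use of average entry age for the age question, and the choice to leave the last flag as `NA` for non-public schools are all assumptions, not the report's actual code.

```r
# Toy data; every value below is made up for illustration.
demo <- data.frame(
  INSTNM    = c("Example University", "Sample College",
                "Demo Beauty Academy", "Test Institute"),
  STABBR    = c("OH", "MI", "CA", "KY"),
  CONTROL   = c(1, 2, 3, 1),
  UGDS      = c(30000, 1200, 150, 5000),
  AGE_ENTRY = c(20.5, 23.1, 27.8, 21.9),
  FAMINC    = c(70000, 45000, 20000, 58000),
  ADM_RATE  = c(0.50, 0.80, NA, 0.90),
  stringsAsFactors = FALSE
)

ohio_median_income <- 55000                       # placeholder, NOT the real 2018 figure
border_states <- c("MI", "IN", "KY", "WV", "PA")  # states that border Ohio

demo$above_oh_income <- as.integer(demo$FAMINC > ohio_median_income)
# Assumption: "University or College" is read from the institution's name.
demo$univ_or_college <- as.integer(grepl("University|College", demo$INSTNM))
demo$borders_ohio    <- as.integer(demo$STABBR %in% border_states)
demo$above_avg_ugds  <- as.integer(demo$UGDS > mean(demo$UGDS, na.rm = TRUE))
# Assumption: average entry age is used as a proxy for "majority above 22".
demo$older_entrants  <- as.integer(demo$AGE_ENTRY > 22)

# Only defined for public schools (CONTROL == 1) that report an admission rate.
avg_adm <- mean(demo$ADM_RATE, na.rm = TRUE)
demo$selective_public <- ifelse(demo$CONTROL == 1 & !is.na(demo$ADM_RATE),
                                as.integer(demo$ADM_RATE < avg_adm),
                                NA_integer_)

print(demo[, c("INSTNM", "borders_ohio", "univ_or_college", "selective_public")])
```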
| Variable | Description |
|---|---|
| ID | A unique ID assigned to each institution |
| INSTNM | Name of institution |
| CITY | City of institution |
| STABBR | State of institution |
| ZIP | Zipcode of institution |
| CONTROL | Type of university (1 = public, 2 = private non-profit, 3 = private for-profit) |
| LOCALE | Type of location (urban, suburbs, rural) |
| LATITUDE | Latitude |
| LONGITUDE | Longitude |
| HBCU | 1 if it is a Historically Black College/University |
| MENONLY | 1 if only men are admitted |
| WOMENONLY | 1 if only women are admitted |
| ADM_RATE | Admission rate as a percentage |
| ACTCM25, ACTCM75, ACTCMMID | 25th percentile, 75th percentile, and midpoint of cumulative ACT scores |
| SAT_AVG | Average SAT scores |
| UGDS | Number of enrolled undergraduates |
| COSTT4_A | Cost for attendance each year |
| AVGFACSAL | Average faculty salary |
| PCTPELL | Percent of students on Pell Grant |
| PCTFLOAN | Percent of students using federal loans |
| AGE_ENTRY | Average age of students when enrolling |
| FEMALE | Percent of students that are female |
| MARRIED | Percent of students that are married |
| DEPENDENT | Percent of students that are dependent |
| VETERAN | Percent of students that are veterans |
| FIRST_GEN | Percent of students that are first-generation college students |
| FAMINC | Average family income of students enrolled |
| Dummy Var. | The last six columns were created based on questions in Variable Creation tab. |
Of the 7,115 institutions listed in the data set, the largest enrolls 77,269 students, but the overall average is only 2,426.
Interestingly, the average SAT score for these institutions is 1132.
In regard to the actual students, on average 46% are the first member of their family to attend college (first generation). On average, 64% of students at an institution are female, and the average age at which a student enters a school is 26.
Lastly, families that send students to post-secondary institutions have an average income of $38,483. The richest family sending someone to school? Their income is $174,263.
avg_income
1 114329.6
avg_income
1 42138.52
avg_income
1 38471.25
How does the cost to attend a university in Ohio compare nationally?
There are 74 institutions where 0% of students have Pell Grants. I chose to show these in a searchable data table because there are too many to list, and it makes them easy for viewers to explore.
On the other hand, there are 48 institutions where 100% of students use Pell Grants.
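Counts like these can be obtained by filtering on `PCTPELL`. The sketch below uses made-up values and assumes the column is stored as a proportion between 0 and 1; the `DT::datatable()` call at the end is optional and only runs in an interactive session where the DT package is installed.

```r
# Toy data; values are made up for illustration.
pell_demo <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D", "School E"),
  PCTPELL = c(0, 0.35, 1, NA, 0),
  stringsAsFactors = FALSE
)

# which() drops NA comparisons, so schools with no reported value are excluded.
no_pell  <- pell_demo[which(pell_demo$PCTPELL == 0), ]
all_pell <- pell_demo[which(pell_demo$PCTPELL == 1), ]

print(nrow(no_pell))   # institutions where no students have Pell Grants
print(nrow(all_pell))  # institutions where all students have Pell Grants

# Optional: a searchable table, as used in this report.
if (interactive() && requireNamespace("DT", quietly = TRUE)) {
  print(DT::datatable(no_pell))
}
```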
1. Compare the average cost of attendance across the number of undergraduates, the percent of students receiving a Pell grant, the average faculty salary and the average family income.
2. Compare the student populations of schools in heavily urbanized areas with those in very rural areas.
The populations that attend heavily urbanized schools vs. definitively rural schools are different, and not just in number of students. Rural schools have a lower percentage of females compared to males, and their rate of students using federal loans is almost halved.
Likewise, the distribution of first-generation college students is much wider for urban institutions, with outliers on both ends of the scale. This suggests a clear difference in the type of students who seek out a rural school: they are more likely to have a parent who also attended college.
Lastly, students who attend a rural institution are younger, on average, by about 2 years. Older students may not want to go to the middle of nowhere to get their degree and may instead stick to an urban area where they can work or be close to family and friends.
I would use a two-sample t-test to evaluate whether the differences between the two means in these cases are meaningful.
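As a rough illustration of that test, the sketch below runs a two-sample t-test on simulated entry ages. The numbers are randomly generated for the example and are not results from this data set; `urban_age` and `rural_age` are hypothetical names.

```r
set.seed(42)  # reproducible simulated data, not real results

# Simulated average entry ages for two groups of schools.
urban_age <- rnorm(40, mean = 27, sd = 3)
rural_age <- rnorm(35, mean = 25, sd = 3)

# t.test() runs Welch's two-sample t-test by default,
# which does not assume equal variances in the two groups.
age_test <- t.test(urban_age, rural_age)
print(age_test)
```

A small p-value would suggest the difference in means is unlikely to be due to chance alone.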
Approach: I will use the columns that flag men-only and women-only schools as 1s and 0s, the control variable, the SAT/ACT columns, admission rates, and the number of undergraduates to compare samples. I want to first understand the populations of men’s vs. women’s schools.
Visualization:
Statistical Test to Confirm Results: I’d use a two-sample t-test here to look at the differences between the means of each “type” of school to see if they are significantly different. I could also use a regression analysis to look at the relationship shown in the last visualization.
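A regression of the kind mentioned here might look like the sketch below. The choice of `SAT_AVG` as the outcome and `ADM_RATE` plus `WOMENONLY` as predictors is an assumption made for illustration, and the data are simulated rather than taken from this report.

```r
set.seed(1)  # reproducible simulated data, not real results
n <- 60

single_sex <- data.frame(
  WOMENONLY = rep(c(0, 1), each = n / 2),
  ADM_RATE  = runif(n, 0.2, 0.9)
)
# Simulated relationship: lower admission rates go with higher SAT averages.
single_sex$SAT_AVG <- 1300 - 250 * single_sex$ADM_RATE + rnorm(n, sd = 40)

# Linear model with the school type as an additional predictor.
fit <- lm(SAT_AVG ~ ADM_RATE + WOMENONLY, data = single_sex)
print(summary(fit)$coefficients)
```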
As someone who is a member of the dependent college student population, I think it will be interesting to see whether there is a difference in where they attend school in terms of location and cost. I will look at schools that have an above-average rate of dependency.
Approach: I want to use the dummy variable which determines if it is a college or university, as well as the locale, and control variables to determine characteristics of the school.
Visualization:
There are 3,301 schools below the average dependent rate and 2,893 above it. 921 schools do not report a rate for dependent students.
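A split like this, including the schools with no reported value, can be tallied as in the sketch below. The data are made up and `above_avg_dep` is a hypothetical column name.

```r
# Toy data; NA marks schools that do not report a dependent rate.
dep_demo <- data.frame(DEPENDENT = c(0.20, 0.75, NA, 0.60, 0.40, NA, 0.90))

# Average over the schools that do report a value.
avg_dep <- mean(dep_demo$DEPENDENT, na.rm = TRUE)

# Comparisons with NA stay NA, so non-reporting schools keep their own group.
dep_demo$above_avg_dep <- dep_demo$DEPENDENT > avg_dep

# useNA = "ifany" adds the missing group to the tally.
dep_counts <- table(dep_demo$above_avg_dep, useNA = "ifany")
print(dep_counts)
```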
These visualizations show that private, non-profit schools have the highest count of institutions with both an above-average dependency rate and a family income above the Ohio median. This suggests that students who are dependent on someone else’s income (i.e., do not pay for their own school) and have a higher family income are likely to choose a private, non-profit school.
Private, non-profit schools have the highest average dependent rate of around 65%, compared to public schools at around 60%.
When looking specifically at the location of these schools, those with locale code “31”, a town less than ten miles from an urban cluster, have the highest rate. I attribute this to the college town phenomenon. Code 21, which marks schools located in suburbs, has the lowest rate.
Statistical Test to Confirm Results: I would use a one-way ANOVA to test for differences in dependency in both the location of the school and the type of school.
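A one-way ANOVA of this kind could be run as sketched below. The group means are simulated for illustration and are not results from this data set. The same call with `LOCALE` converted to a factor would cover the location comparison.

```r
set.seed(7)  # reproducible simulated data, not real results

# Simulated dependent rates for the three types of school.
anova_demo <- data.frame(
  CONTROL = factor(rep(c("public", "private non-profit", "private for-profit"),
                       each = 20)),
  DEPENDENT = c(rnorm(20, mean = 0.60, sd = 0.10),
                rnorm(20, mean = 0.65, sd = 0.10),
                rnorm(20, mean = 0.45, sd = 0.10))
)

# One-way ANOVA: does the mean dependent rate differ across school types?
fit_aov <- aov(DEPENDENT ~ CONTROL, data = anova_demo)
print(summary(fit_aov))
```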