#This document examines colleges and universities in the United States and evaluates financial aid for students based on the university they attend. It then compares post-secondary education options in Ohio with other schools regionally and nationally.
#The data comes from the Department of Education’s website: http://collegescorecard.ed.gov
library(tidyverse)
#The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. It is the most important package in this analysis and one of the best for analyzing data in R.
library(dbplyr)
#A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author. In practice, it translates dplyr code into SQL so that the work can run inside the database.
library(skimr)
#The package's API is tidy: functions take data frames, return data frames, and can work as part of a pipeline. The returned skimr object is subsettable and offers human-readable output. This makes it easier to inspect and clean up datasets.
library(stringr)
#A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.
library(sqldf)
#This package runs SQL statements directly against R data frames, which can make it easier to prepare data for graphs or other functions.
library(DT)
#This package creates interactive data tables from the information in your data frame.
library(knitr)
#A general-purpose package for dynamic report generation.
CollegeScoreCard <- read_csv("http://asayanalytics.com/scorecard")
###This analysis helps students and parents understand how post-secondary institutions in Ohio compare with schools in other regions and nationally, so they can decide whether to attend a school in Ohio or look elsewhere.
###The metrics: school ID, institution name, city, state postal code, ZIP code, control of institution, locale of institution, latitude and longitude, whether the school is a historically black college or university (1 = yes, 0 = no), men-only (1 = yes, 0 = no), or women-only (1 = yes, 0 = no), admission rate, 25th percentile, 75th percentile, and midpoint of the cumulative ACT score, average SAT score of admitted students, enrollment of degree-seeking undergraduates, average cost of attendance, average faculty salary, percentage of undergraduates who receive a Pell Grant, percentage of undergraduates receiving a federal student loan, average age of entry, percentages of female, married, dependent, veteran, and first-generation students, and average family income in real 2015 dollars.
summary(CollegeScoreCard)
## ID INSTNM CITY STABBR
## Min. : 100654 Length:7115 Length:7115 Length:7115
## 1st Qu.: 174096 Class :character Class :character Class :character
## Median : 229027 Mode :character Mode :character Mode :character
## Mean : 1866527
## 3rd Qu.: 450610
## Max. :49005401
##
## ZIP CONTROL LOCALE LATITUDE
## Length:7115 Length:7115 Min. :-3.00 Min. :-14.32
## Class :character Class :character 1st Qu.:12.00 1st Qu.: 33.96
## Mode :character Mode :character Median :21.00 Median : 38.79
## Mean :19.79 Mean : 37.36
## 3rd Qu.:22.00 3rd Qu.: 41.33
## Max. :43.00 Max. : 71.32
## NA's :444 NA's :445
## LONGITUDE HBCU MENONLY WOMENONLY
## Min. :-170.74 Length:7115 Min. :0.0000 Min. :0.0000
## 1st Qu.: -97.34 Class :character 1st Qu.:0.0000 1st Qu.:0.0000
## Median : -86.34 Mode :character Median :0.0000 Median :0.0000
## Mean : -90.32 Mean :0.0094 Mean :0.0058
## 3rd Qu.: -78.89 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. : 171.38 Max. :1.0000 Max. :1.0000
## NA's :445 NA's :444 NA's :444
## ADM_RATE ACTCM25 ACTCM75 ACTCMMID SAT_AVG
## Min. :0.000 Min. : 1.00 Min. : 9.00 Min. : 6.00 Min. : 564
## 1st Qu.:0.550 1st Qu.:18.00 1st Qu.:23.00 1st Qu.:21.00 1st Qu.:1045
## Median :0.710 Median :20.00 Median :26.00 Median :23.00 Median :1117
## Mean :0.682 Mean :20.55 Mean :25.82 Mean :23.44 Mean :1132
## 3rd Qu.:0.840 3rd Qu.:22.00 3rd Qu.:28.00 3rd Qu.:25.00 3rd Qu.:1195
## Max. :1.000 Max. :34.00 Max. :35.00 Max. :35.00 Max. :1558
## NA's :5078 NA's :5823 NA's :5823 NA's :5823 NA's :5795
## UGDS COSTT4_A AVGFACSAL PCTPELL
## Length:7115 Min. : 0 Min. : 0 Min. :0.0000
## Class :character 1st Qu.:14000 1st Qu.: 4965 1st Qu.:0.3100
## Mode :character Median :22646 Median : 6364 Median :0.4600
## Mean :26337 Mean : 6617 Mean :0.4822
## 3rd Qu.:33942 3rd Qu.: 7946 3rd Qu.:0.6500
## Max. :93704 Max. :22924 Max. :1.0000
## NA's :3531 NA's :2868 NA's :770
## PCTFLOAN AGE_ENTRY FEMALE MARRIED
## Min. :0.0000 Min. :17.43 Min. :0.0200 Min. :0.0000
## 1st Qu.:0.2600 1st Qu.:23.18 1st Qu.:0.5500 1st Qu.:0.1000
## Median :0.5400 Median :25.78 Median :0.6300 Median :0.1500
## Mean :0.4832 Mean :26.01 Mean :0.6402 Mean :0.1638
## 3rd Qu.:0.7000 3rd Qu.:28.50 3rd Qu.:0.7700 3rd Qu.:0.2200
## Max. :1.0000 Max. :58.90 Max. :0.9800 Max. :0.8200
## NA's :770 NA's :500 NA's :1429 NA's :1392
## DEPENDENT VETERAN FIRST_GEN FAMINC
## Min. :0.0300 Min. :0.000 Min. :0.0900 Min. : 321.4
## 1st Qu.:0.2900 1st Qu.:0.010 1st Qu.:0.3800 1st Qu.: 22668.0
## Median :0.4600 Median :0.010 Median :0.4800 Median : 31447.5
## Mean :0.4889 Mean :0.015 Mean :0.4557 Mean : 38482.7
## 3rd Qu.:0.6800 3rd Qu.:0.020 3rd Qu.:0.5400 3rd Qu.: 48098.7
## Max. :0.9900 Max. :0.350 Max. :0.9600 Max. :174263.2
## NA's :921 NA's :4538 NA's :1247 NA's :500
###This summary gives a quick overview of every metric in the table, so you can get a general idea of the data without spending a lot of time looking through it.
#I cleaned the data to make it consistent. The columns HBCU, LOCALE, UGDS, and CONTROL were all adjusted. In HBCU, some NULL entries had to be changed to NA because of an error in the transfer from CSV to R. LOCALE contained the value -3, which is not mentioned in the data dictionary, so I changed it to NA for consistency. UGDS also had NULL entries, which I changed to NA as with HBCU. The last variable was CONTROL: some entries held text describing the type of school instead of the number 1, 2, or 3, so I recoded them so that every value is numeric rather than a mix of text and numbers. Missing values were recorded as NA.
##I created dummy variables indicating whether an institution’s average family income is greater than, or less than or equal to, the Ohio average; whether an institution is a university, college, or trade school; and whether its state borders Ohio. All of these are important when comparing institutions with Ohio institutions and deciding where to attend college. The other variables I created flag schools with an average SAT score greater than or equal to 1300, schools with an average age of entry greater than 40, and schools with an admission rate less than or equal to 10%.
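#The cleaning steps described above could be sketched roughly as follows. This is a minimal illustration, not the original cleaning code: the toy rows are made up, and the text labels assumed for CONTROL ("Public", "Private nonprofit", "Private for-profit") are a guess at how the raw file spells them.

```r
library(dplyr)

# Made-up toy rows that mimic the problems described above; not real Scorecard data.
raw <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D"),
  HBCU    = c("0", "NULL", "1", "0"),
  LOCALE  = c(11, -3, 21, 43),
  UGDS    = c("1200", "NULL", "350", "75"),
  CONTROL = c("1", "Public", "Private nonprofit", "3"),
  stringsAsFactors = FALSE
)

# Assumed text labels for CONTROL; the real file may spell them differently.
control_codes <- c("Public" = 1, "Private nonprofit" = 2, "Private for-profit" = 3)

cleaned <- raw %>%
  mutate(
    HBCU    = as.integer(na_if(HBCU, "NULL")),   # "NULL" text becomes NA
    LOCALE  = na_if(LOCALE, -3),                 # undocumented -3 becomes NA
    UGDS    = as.integer(na_if(UGDS, "NULL")),   # "NULL" text becomes NA
    CONTROL = if_else(CONTROL %in% names(control_codes),
                      unname(control_codes[CONTROL]),        # text label to code
                      suppressWarnings(as.numeric(CONTROL))) # already a number
  )
```

#After this step HBCU, LOCALE, UGDS, and CONTROL are all numeric, and every missing entry is NA.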
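#A minimal sketch of how dummy variables like these could be built with dplyr. The toy rows and the new column names (ABOVE_OH_FAMINC, BORDERS_OH, and so on) are hypothetical. The institution-type indicator is left out because the text does not say how it was derived.

```r
library(dplyr)

# Made-up toy rows for illustration only; not real Scorecard data.
schools <- data.frame(
  INSTNM    = c("Alpha University", "Beta College", "Gamma Institute", "Delta University"),
  STABBR    = c("OH", "KY", "CA", "OH"),
  FAMINC    = c(50000, 30000, 80000, 40000),
  SAT_AVG   = c(1320, 1010, NA, 1150),
  AGE_ENTRY = c(20.5, 41.2, 24.0, 22.3),
  ADM_RATE  = c(0.08, 0.75, 0.40, NA),
  stringsAsFactors = FALSE
)

# States that share a border with Ohio.
border_states <- c("IN", "KY", "MI", "PA", "WV")

# Average family income across the Ohio rows only.
oh_avg_faminc <- mean(schools$FAMINC[schools$STABBR == "OH"], na.rm = TRUE)

schools <- schools %>%
  mutate(
    ABOVE_OH_FAMINC = as.integer(FAMINC > oh_avg_faminc),
    BORDERS_OH      = as.integer(STABBR %in% border_states),
    SAT_1300_PLUS   = as.integer(SAT_AVG >= 1300),
    ENTRY_OVER_40   = as.integer(AGE_ENTRY > 40),
    ADM_10_OR_LESS  = as.integer(ADM_RATE <= 0.10)
  )
```

#A missing input stays missing: a school with no SAT_AVG gets NA for SAT_1300_PLUS instead of 0.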
#This is a condensed version of the full data table. It shows the institution, city, state, type of school (1 = Public, 2 = Private nonprofit, 3 = Private for-profit), admission rate, average family income of the students who attend, and average cost of attendance. These are the things high school students are most likely to want to know, in a short table without the excess data.
summary(CollegeScoreCard$SAT_AVG)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 564 1045 1117 1132 1195 1558 5795
summary(CollegeScoreCard$FAMINC)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 321.4 22668.0 31447.5 38482.7 48098.7 174263.2 500
summary(CollegeScoreCard$AGE_ENTRY)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 17.43 23.18 25.78 26.01 28.50 58.90 500
summary(CollegeScoreCard$ADM_RATE)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.550 0.710 0.682 0.840 1.000 5078
#This is a quicker summary of the key variables that readers are most likely to be curious about when first learning about the data.
#This graph shows the number of institutions in the state of Ohio and in the bordering states of Indiana, Kentucky, Michigan, Pennsylvania, and West Virginia.
###This graph shows the relationship between the cost of attendance and the average family income for every institution. The two metrics are positively correlated.
###1 represents a public institution, 2 a private nonprofit institution, and 3 a private for-profit institution. This graph shows the number of undergraduates at each type of institution.
###This graph shows the relationship between average SAT score and family income in Ohio and the bordering states of Indiana, Kentucky, Michigan, Pennsylvania, and West Virginia. In every state there is a positive correlation between higher family income and better average SAT performance. This is not too surprising: one plausible explanation is that families with more money can hire tutors to help their children prepare for the test.
###This graph shows a box plot of the average cost of attendance by type of institution (1 represents a public institution, 2 a private nonprofit institution, and 3 a private for-profit institution). The box plot clearly shows a difference between the cost of attending a public university and a private one. This isn’t too surprising, as state universities can charge less as an incentive for in-state students to stay and attend college in the state. Most private universities don’t operate that way; they set one price that all students pay to attend.
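#A graph like the one described above could be produced along these lines. This is a sketch on made-up rows, so the counts are illustrative only; the names `region`, `state_counts`, and `state_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Made-up toy rows for illustration only; not real Scorecard data.
schools <- data.frame(
  STABBR = c("OH", "OH", "OH", "IN", "KY", "KY", "MI", "PA", "PA", "WV", "CA", "TX"),
  stringsAsFactors = FALSE
)

# Ohio plus the five bordering states named above.
region <- c("OH", "IN", "KY", "MI", "PA", "WV")

# One row per state with the number of institutions in it.
state_counts <- schools %>%
  filter(STABBR %in% region) %>%
  count(STABBR, name = "institutions")

# Bar chart of institutions per state, tallest bar first.
state_plot <- ggplot(state_counts,
                     aes(x = reorder(STABBR, -institutions), y = institutions)) +
  geom_col() +
  labs(x = "State", y = "Number of institutions")
```

#Calling `print(state_plot)` draws the chart; `state_counts` holds the same numbers as a table.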
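###The strength of a relationship like this can be checked numerically as well as visually. The sketch below uses made-up values, so the resulting coefficient says nothing about the real data; `cost_income`, `r`, and `cost_plot` are hypothetical names.

```r
library(ggplot2)

# Made-up toy values for illustration only; not real Scorecard data.
cost_income <- data.frame(
  FAMINC   = c(20000, 35000, 50000, NA, 90000),
  COSTT4_A = c(12000, 18000, 26000, 30000, NA)
)

# Pearson correlation using only rows where both values are present.
r <- cor(cost_income$FAMINC, cost_income$COSTT4_A, use = "complete.obs")

# Scatter plot of cost against family income; rows with an NA are dropped quietly.
cost_plot <- ggplot(cost_income, aes(x = FAMINC, y = COSTT4_A)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Average family income", y = "Average cost of attendance")
```

###A value of `r` near 1 indicates a strong positive linear relationship, a value near 0 indicates little linear relationship, and a negative value indicates that one metric falls as the other rises.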
| INSTNM | FAMINC |
|---|---|
| Xavier University | 114329.6 |

| mean(FAMINC, na.rm = TRUE) |
|---|
| 38482.72 |

| OH_AVG_FAMINC |
|---|
| 42379.96 |
###These tables show how Xavier University’s average family income compares with the national and Ohio averages. Surprisingly, Xavier families earn far more than both: about $114,330, roughly three times the national average ($38,483) and about 2.7 times the Ohio average ($42,380). That lends some support to the stereotype of private institutions being filled with “richer” families. As a student at Xavier University, I found this number quite surprising, since you would not get that impression from talking to most students there.
###This graph shows the cost of attending schools in Ohio and the bordering states of Indiana, Kentucky, Michigan, Pennsylvania, and West Virginia. This is especially important if you are from Ohio, because you don’t want to pay more for school than you have to. Analyzing this graph gives you a better sense of which states you might want to apply to.
###This graph shows the national average cost of attendance and Ohio’s average cost of attendance. It can be a key indicator for deciding whether to stay in state or look nationally. It might not be the best direct indicator for this decision, though, since state schools generally offer lower prices to in-state students. Making this decision properly requires more analysis and research, but the graph gives a quick first answer to the question.
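###To put the gap in proportion, the three figures from the tables above can be compared directly. The variable names are hypothetical; the numbers are the ones reported in the tables.

```r
# Average family income figures as reported in the tables above.
xavier_faminc   <- 114329.6
national_faminc <- 38482.72
ohio_faminc     <- 42379.96

# How many times larger Xavier's figure is than each benchmark.
ratios <- c(
  national = xavier_faminc / national_faminc,
  ohio     = xavier_faminc / ohio_faminc
)

round(ratios, 2)
# national     ohio
#     2.97     2.70
```

###Xavier’s average family income is therefore close to three times the national figure and about 2.7 times the Ohio figure.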
| INSTNM | PCTPELL | UGDS |
|---|---|---|
| MTI Business College Inc | 1 | 75 |
| Mr Bela’s School of Cosmetology Inc | 1 | 34 |
| Southern School of Beauty Inc | 1 | 24 |
| Victoria Beauty College Inc | 1 | 113 |
| Central School of Practical Nursing | 1 | 18 |
| Virginia School of Hair Design | 1 | 21 |
| Instituto de Educacion Tecnica Ocupacional La Reine-Manati | 1 | 176 |
| Colegio Mayor de Tecnologia Inc | 1 | 62 |
| Liceo de Arte-Dise-O y Comercio | 1 | 197 |
| Nouvelle Institute | 1 | 86 |

| INSTNM | PCTPELL | UGDS |
|---|---|---|
| Bais Binyomin Academy | 0 | 25 |
| United States Coast Guard Academy | 0 | 1044 |
| American Islamic College | 0 | 16 |
| Principia College | 0 | 447 |
| The Southern Baptist Theological Seminary | 0 | 873 |
| New Orleans Baptist Theological Seminary | 0 | 0 |
| United States Naval Academy | 0 | 4495 |
| MGH Institute of Health Professions | 0 | 195 |
| Saint John’s Seminary | 0 | 23 |
| Hillsdale College | 0 | 1511 |
###These tables list institutions with the highest and lowest percentages of undergraduates receiving a federal Pell Grant.
##I would treat the cost of attendance as the dependent variable. The cost of attendance determines how many undergraduates can afford to go to college, so it needs to be attainable. Pell Grants go to admitted students who cannot afford to pay, generally because their families make $10,000-$20,000 annually. Based on the graph, the cost of attendance is negatively correlated with the percentage of students receiving Pell Grants, which could be interpreted as follows: as the cost of college goes up, a smaller share of students receive Pell Grants. Average faculty salary and family income are both positively correlated with the cost of attendance. Families with more money appear willing to pay for more expensive schools than families with less. For average faculty salary, it looks as though the cost of attendance rises as professors earn more on average. That isn’t surprising, as schools have to pay professors somehow, and tuition money can be a factor in this. For these reasons, I think the cost of attendance depends on the other variables.
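###Lists like these could be built by sorting on PCTPELL and keeping the first rows. This is a sketch on made-up rows with hypothetical school names; it keeps two rows per list where the tables above keep ten.

```r
library(dplyr)

# Made-up toy rows for illustration only; not real Scorecard data.
pell <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D", "School E"),
  PCTPELL = c(1, 0.45, 0, NA, 0.82),
  UGDS    = c(75, 1200, 447, 60, 310),
  stringsAsFactors = FALSE
)

# Drop schools with no reported Pell share before ranking.
ranked <- pell %>%
  filter(!is.na(PCTPELL)) %>%
  select(INSTNM, PCTPELL, UGDS)

# Largest Pell shares first, then smallest Pell shares first.
highest_pell <- ranked %>% arrange(desc(PCTPELL)) %>% head(2)
lowest_pell  <- ranked %>% arrange(PCTPELL) %>% head(2)
```

###Because many schools report a share of exactly 1 or exactly 0, the rows that make a short list depend on the order of the tied rows in the data.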
| LOCALE | count | SAT_AVG | FAMINC | PCTPELL | ADM_RATE | COSTT4_A | PCTFLOAN |
|---|---|---|---|---|---|---|---|
| 11 | 1570 | 1149.870 | 37103.87 | 0.4964127 | 0.6743053 | 29895.12 | 0.5064058 |
| 43 | 62 | 1064.462 | 32992.44 | 0.4519298 | 0.6316667 | 19744.89 | 0.2707018 |
###I think the best format is a side-by-side comparison table, since it makes the differences between the two areas easy to see across the same metrics. Locale 11 is a large city and locale 43 is a remote rural area. One inference is that the average cost of attendance is lower in remote rural areas (about $19,745 versus $29,895). But the table doesn’t show what types of schools these are, which could make a big difference in the cost. Another big difference is the share of undergraduates receiving a federal student loan (about 27% versus 51%). The next steps would be to run a regression on the factors that might explain why the metrics differ and to research these areas to help explain the regression findings.
##This shows the more competitive schools in the country, as 1300 is a high SAT score. Especially for someone who scored this well on the SAT, it offers options for colleges to apply to in their area or nationally.
#Using a data table, we can analyze all the colleges whose average SAT score is greater than 1300. This can lead to further analysis of other factors that make you more likely to be accepted. If one of your dream schools meets this criterion, you can assess your own chances of getting in, or simply calculate how much money or financial aid it would take to attend.
#For some people, this is a big issue when choosing a school, as some men have more trouble than others finding someone to date. When a school is more than 65% female, it could statistically help your odds of finding someone. Plus, with an average SAT score of 1200 or greater, you are most likely surrounded by smart people. So you can get a good education and maybe a date!
#If these are the types of schools you are looking at, you will receive a good education. You could run a further statistical analysis of your chances of getting a date based on the total student body and the percentage of female students. You could also analyze how far the school is from your home, whether you would fly or drive there, and how you would transport your supplies to your dorm room. All of these analyses and more are possible if you pick one of these schools from the data table above.
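###A side-by-side table like the one above could be produced by grouping on LOCALE and averaging each metric. This is a sketch on made-up rows with fewer metrics than the real table; `locale_summary` is a hypothetical name.

```r
library(dplyr)

# Made-up toy rows for illustration only; not real Scorecard data.
schools <- data.frame(
  LOCALE   = c(11, 11, 43, 43, 21),
  SAT_AVG  = c(1150, 1210, 1060, NA, 1100),
  COSTT4_A = c(30000, 28000, 20000, NA, 25000),
  PCTFLOAN = c(0.55, 0.45, 0.30, 0.20, 0.50)
)

# Keep the two locales being compared, then summarise each one.
locale_summary <- schools %>%
  filter(LOCALE %in% c(11, 43)) %>%
  group_by(LOCALE) %>%
  summarise(
    count = n(),
    across(c(SAT_AVG, COSTT4_A, PCTFLOAN), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )
```

###`count` is the number of institutions in the locale, not the number with a reported value, so a mean can rest on far fewer schools than `count` suggests when a column has many NA entries.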
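#The two criteria above can be applied together in a single filter. This is a sketch on made-up rows with hypothetical school names; `matches` is a hypothetical name.

```r
library(dplyr)

# Made-up toy rows for illustration only; not real Scorecard data.
schools <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D"),
  FEMALE  = c(0.70, 0.66, 0.52, 0.80),
  SAT_AVG = c(1250, 1100, 1400, NA),
  stringsAsFactors = FALSE
)

# Keep schools that are more than 65% female AND average 1200 or higher on the SAT.
matches <- schools %>%
  filter(FEMALE > 0.65, SAT_AVG >= 1200) %>%
  select(INSTNM, FEMALE, SAT_AVG)
```

#Schools with a missing FEMALE or SAT_AVG value are dropped by `filter()`, so the result only includes schools that report both numbers.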