The purpose of this document

#This document is a record of colleges and universities in the United States and evaluates financial aid for students based on which institution they attend. We will then compare post-secondary education options in Ohio to other schools regionally and nationally.
#The data comes from the Department of Education’s website: http://collegescorecard.ed.gov

Packages you will need and the Dataset

library(tidyverse)
#The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. This is the most important package here: it is one of the best tools for analyzing data and for working with strings in R.
library(dbplyr)
#A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author. This package lets 'dplyr' code run as SQL from R, which helps prepare clean data for graphs and other analytical functions.
library(skimr)
#The package's API is tidy: functions take data frames, return data frames, and can work as part of a pipeline. The returned skimr object is subsettable and offers human-readable output. It makes cleaning up datasets and performing other analytical functions easier.
library(stringr)
#A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.
library(sqldf)
#This package is used to run SQL queries on data frames, which makes it easier to build graphs or run functions.
library(DT)
#This package is good for creating data tables with all the information in your dataframe.
library(knitr)
#A general-purpose package for dynamic reports.
CollegeScoreCard <- read_csv("http://asayanalytics.com/scorecard")
#The dataset is a record of colleges and universities in the United States, used here to evaluate financial aid for students based on which institution they attend.

Explanation of Data Continued

###This analysis will help students and parents better understand how post-secondary institutions in Ohio compare to schools in other regions of the country and nationally, so they can see whether they are interested in a school in Ohio or should look elsewhere.
###The metrics: school ID, institution name, city, state postcode, ZIP code, control of institution, locale of institution, latitude and longitude of the school, whether the school is a historically black college or university (1 = yes, 0 = no), a men-only college (1 = yes, 0 = no), or a women-only college (1 = yes, 0 = no), admission rate, 25th percentile of cumulative ACT scores, 75th percentile of cumulative ACT scores, midpoint of cumulative ACT scores, average SAT score of admitted students, enrollment of undergraduate degree-seeking students, average cost of attending the school, average faculty salary, percentage of undergraduates who receive a Pell grant, percentage of undergraduates receiving a federal student loan, average age of entry, percentage of female, married, dependent, veteran, and first-generation students, and average family income in real 2015 dollars.

summary(CollegeScoreCard)
##        ID              INSTNM              CITY              STABBR         
##  Min.   :  100654   Length:7115        Length:7115        Length:7115       
##  1st Qu.:  174096   Class :character   Class :character   Class :character  
##  Median :  229027   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 1866527                                                           
##  3rd Qu.:  450610                                                           
##  Max.   :49005401                                                           
##                                                                             
##      ZIP              CONTROL              LOCALE         LATITUDE     
##  Length:7115        Length:7115        Min.   :-3.00   Min.   :-14.32  
##  Class :character   Class :character   1st Qu.:12.00   1st Qu.: 33.96  
##  Mode  :character   Mode  :character   Median :21.00   Median : 38.79  
##                                        Mean   :19.79   Mean   : 37.36  
##                                        3rd Qu.:22.00   3rd Qu.: 41.33  
##                                        Max.   :43.00   Max.   : 71.32  
##                                        NA's   :444     NA's   :445     
##    LONGITUDE           HBCU              MENONLY         WOMENONLY     
##  Min.   :-170.74   Length:7115        Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: -97.34   Class :character   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : -86.34   Mode  :character   Median :0.0000   Median :0.0000  
##  Mean   : -90.32                      Mean   :0.0094   Mean   :0.0058  
##  3rd Qu.: -78.89                      3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   : 171.38                      Max.   :1.0000   Max.   :1.0000  
##  NA's   :445                          NA's   :444      NA's   :444     
##     ADM_RATE        ACTCM25         ACTCM75         ACTCMMID        SAT_AVG    
##  Min.   :0.000   Min.   : 1.00   Min.   : 9.00   Min.   : 6.00   Min.   : 564  
##  1st Qu.:0.550   1st Qu.:18.00   1st Qu.:23.00   1st Qu.:21.00   1st Qu.:1045  
##  Median :0.710   Median :20.00   Median :26.00   Median :23.00   Median :1117  
##  Mean   :0.682   Mean   :20.55   Mean   :25.82   Mean   :23.44   Mean   :1132  
##  3rd Qu.:0.840   3rd Qu.:22.00   3rd Qu.:28.00   3rd Qu.:25.00   3rd Qu.:1195  
##  Max.   :1.000   Max.   :34.00   Max.   :35.00   Max.   :35.00   Max.   :1558  
##  NA's   :5078    NA's   :5823    NA's   :5823    NA's   :5823    NA's   :5795  
##      UGDS              COSTT4_A       AVGFACSAL        PCTPELL      
##  Length:7115        Min.   :    0   Min.   :    0   Min.   :0.0000  
##  Class :character   1st Qu.:14000   1st Qu.: 4965   1st Qu.:0.3100  
##  Mode  :character   Median :22646   Median : 6364   Median :0.4600  
##                     Mean   :26337   Mean   : 6617   Mean   :0.4822  
##                     3rd Qu.:33942   3rd Qu.: 7946   3rd Qu.:0.6500  
##                     Max.   :93704   Max.   :22924   Max.   :1.0000  
##                     NA's   :3531    NA's   :2868    NA's   :770     
##     PCTFLOAN        AGE_ENTRY         FEMALE          MARRIED      
##  Min.   :0.0000   Min.   :17.43   Min.   :0.0200   Min.   :0.0000  
##  1st Qu.:0.2600   1st Qu.:23.18   1st Qu.:0.5500   1st Qu.:0.1000  
##  Median :0.5400   Median :25.78   Median :0.6300   Median :0.1500  
##  Mean   :0.4832   Mean   :26.01   Mean   :0.6402   Mean   :0.1638  
##  3rd Qu.:0.7000   3rd Qu.:28.50   3rd Qu.:0.7700   3rd Qu.:0.2200  
##  Max.   :1.0000   Max.   :58.90   Max.   :0.9800   Max.   :0.8200  
##  NA's   :770      NA's   :500     NA's   :1429     NA's   :1392    
##    DEPENDENT         VETERAN        FIRST_GEN          FAMINC        
##  Min.   :0.0300   Min.   :0.000   Min.   :0.0900   Min.   :   321.4  
##  1st Qu.:0.2900   1st Qu.:0.010   1st Qu.:0.3800   1st Qu.: 22668.0  
##  Median :0.4600   Median :0.010   Median :0.4800   Median : 31447.5  
##  Mean   :0.4889   Mean   :0.015   Mean   :0.4557   Mean   : 38482.7  
##  3rd Qu.:0.6800   3rd Qu.:0.020   3rd Qu.:0.5400   3rd Qu.: 48098.7  
##  Max.   :0.9900   Max.   :0.350   Max.   :0.9600   Max.   :174263.2  
##  NA's   :921      NA's   :4538    NA's   :1247     NA's   :500

###This summary gives a quick overview of all the metrics in the table, so you can get a general idea of the data without spending a lot of time looking through it.

Wrangling data in dataset to clean up information

#I cleaned up the data to make sure it was all consistent. The columns HBCU, LOCALE, UGDS, and CONTROL were all adjusted. For HBCU, it was necessary to change some NULL values to NA, as there was an error in the transfer from CSV to R. LOCALE had a value of -3 that wasn’t mentioned in the data dictionary, so I changed it to NA for consistency. UGDS also had some NULL values, so I changed them to NA as with HBCU. The last variable was CONTROL. Some entries had text explaining the type of institution instead of the number 1, 2, or 3, so I cleaned up the data so that all values are numeric rather than a mix of text and numbers. Missing values were recorded as NA.
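
#The cleaning steps described above could be sketched roughly as follows. This is a minimal illustration, not the code actually used for this report: the toy rows and the text labels for CONTROL are assumptions, since the exact spellings in the raw file are not shown here.

```r
# Toy rows standing in for the raw file; values are illustrative only.
raw <- data.frame(
  HBCU    = c("0", "NULL", "1"),
  LOCALE  = c(11, -3, 43),
  UGDS    = c("1200", "NULL", "350"),
  CONTROL = c("1", "Private nonprofit", "3"),
  stringsAsFactors = FALSE
)

clean <- raw
# Literal "NULL" strings become NA, then the columns are converted to numbers.
clean$HBCU <- as.numeric(ifelse(clean$HBCU == "NULL", NA, clean$HBCU))
clean$UGDS <- as.numeric(ifelse(clean$UGDS == "NULL", NA, clean$UGDS))
# LOCALE code -3 is not in the data dictionary, so it is treated as missing.
clean$LOCALE[clean$LOCALE == -3] <- NA
# Map text labels onto the codes 1, 2, 3 (label spellings are assumed).
control_labels <- c("Public" = "1", "Private nonprofit" = "2",
                    "Private for-profit" = "3")
is_label <- clean$CONTROL %in% names(control_labels)
clean$CONTROL[is_label] <- control_labels[clean$CONTROL[is_label]]
clean$CONTROL <- as.numeric(clean$CONTROL)

print(clean)
```

#Converting the "NULL" strings before calling as.numeric() avoids coercion warnings and makes the intent of each step explicit.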

Dummy Variables and other variables

##I created dummy variables indicating whether an institution’s average family income is greater than, or less than or equal to, the Ohio average; whether an institution is a university, college, or trade school; and whether its state borders Ohio. All of these are important factors when comparing institutions to Ohio institutions and deciding where to attend college. The other variables I created flag schools with an average SAT score greater than or equal to 1300, schools with an average age of entry greater than 40 years, and schools with an admission rate less than or equal to 10%.
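
#As a rough sketch of how indicator variables like these can be built, the example below uses a few made-up rows. The new column names and the name-based rule for the institution type are hypothetical choices for illustration, not necessarily the ones used in this report.

```r
# Illustrative rows only; the numbers are not taken from the real dataset.
schools <- data.frame(
  INSTNM    = c("Example University", "Sample College", "Demo Trade School"),
  STABBR    = c("OH", "IN", "CA"),
  FAMINC    = c(40000, 55000, NA),
  SAT_AVG   = c(1350, 1100, NA),
  AGE_ENTRY = c(20, 24, 41),
  ADM_RATE  = c(0.08, 0.65, NA),
  stringsAsFactors = FALSE
)

border_states <- c("IN", "KY", "MI", "PA", "WV")
oh_avg_faminc <- mean(schools$FAMINC[schools$STABBR == "OH"], na.rm = TRUE)

# 1 = condition holds, 0 = it does not, NA = the input is missing.
schools$ABOVE_OH_FAMINC <- as.integer(schools$FAMINC > oh_avg_faminc)
schools$BORDERS_OH      <- as.integer(schools$STABBR %in% border_states)
schools$SAT_1300_PLUS   <- as.integer(schools$SAT_AVG >= 1300)
schools$AGE_OVER_40     <- as.integer(schools$AGE_ENTRY > 40)
schools$ADM_10_OR_LESS  <- as.integer(schools$ADM_RATE <= 0.10)

# Name-based guess at the institution type (a hypothetical rule).
schools$INST_TYPE <- ifelse(
  grepl("University", schools$INSTNM), "University",
  ifelse(grepl("College", schools$INSTNM), "College", "Trade school")
)

print(schools)
```

#Comparisons against NA stay NA, so a missing SAT score or admission rate is not silently counted as a 0.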

#This is a condensed version of the full data table. It shows institution, city, state, type of school (1 = Public, 2 = Private nonprofit, 3 = Private for-profit), admission rate, average family income of the students who attend, and average cost of attendance. These are all important things that high school students might want to know, in a short table without excess data.

Summary of Variables

summary(CollegeScoreCard$SAT_AVG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     564    1045    1117    1132    1195    1558    5795
summary(CollegeScoreCard$FAMINC)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##    321.4  22668.0  31447.5  38482.7  48098.7 174263.2      500
summary(CollegeScoreCard$AGE_ENTRY)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   17.43   23.18   25.78   26.01   28.50   58.90     500
summary(CollegeScoreCard$ADM_RATE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.550   0.710   0.682   0.840   1.000    5078

#This is a quicker summary of the important variables in the table that people might be most curious about when learning about the data.

Section 4 Visualizations

4.1 Show the number of institutions in Ohio and in each of the states that borders Ohio.

#This graph shows the number of institutions in the state of Ohio and the bordering states of Indiana, Kentucky, Michigan, Pennsylvania, and West Virginia.
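
#The counts behind a chart like this can be computed with a simple filter and tally. The sketch below uses a short made-up vector of state codes in place of the real STABBR column, so the numbers are illustrative only.

```r
# Illustrative state codes only; the real column is CollegeScoreCard$STABBR.
stabbr <- c("OH", "OH", "IN", "KY", "MI", "PA", "PA", "WV", "CA", "TX")

ohio_region <- c("OH", "IN", "KY", "MI", "PA", "WV")
# Keep regional schools and count them per state, most to fewest.
region_counts <- sort(table(stabbr[stabbr %in% ohio_region]), decreasing = TRUE)
print(region_counts)
# barplot(region_counts) would draw the bar chart itself.
```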

4.2 Illustrate how the cost for attendance varies by family income for all institutions.

###This graph shows the correlation between the cost of attendance and the average family income for every institution. The graph shows a positive correlation between these two metrics.
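
#To put a number on the relationship seen in the graph, one option is the Pearson correlation coefficient. The sketch below uses made-up values; with the real data, the FAMINC and COSTT4_A columns would take their place.

```r
# Made-up values for a handful of institutions (illustration only).
faminc   <- c(20000, 35000, 50000, NA, 90000, 120000)
costt4_a <- c(12000, 18000, 26000, 30000, NA, 60000)

# use = "complete.obs" drops any pair that has a missing value.
r <- cor(faminc, costt4_a, use = "complete.obs")
print(round(r, 3))
```

#A value near 1 indicates a strong positive linear relationship, a value near 0 indicates little linear relationship, and a negative value indicates that one metric falls as the other rises.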

4.3 Compare the number of undergraduates across each of the 3 institutional control types for all institutions

###1 represents a public institution, 2 represents a private nonprofit institution, and 3 represents a private for-profit institution. This graph shows the number of undergraduates at the different types of institutions.
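
#A comparison like this can also be summarized numerically. The sketch below assumes toy data and converts UGDS from text first, because the summary output earlier shows UGDS read in as a character column.

```r
# Toy data; UGDS starts as text, matching how the column was read in.
schools <- data.frame(
  CONTROL = c(1, 1, 2, 2, 3, 3),
  UGDS    = c("15000", "22000", "3000", NA, "400", "250"),
  stringsAsFactors = FALSE
)
schools$UGDS <- as.numeric(schools$UGDS)
schools$CONTROL <- factor(
  schools$CONTROL, levels = 1:3,
  labels = c("Public", "Private nonprofit", "Private for-profit")
)

# Total and median undergraduate enrollment per control type, ignoring NAs.
total_ugds  <- tapply(schools$UGDS, schools$CONTROL, sum, na.rm = TRUE)
median_ugds <- tapply(schools$UGDS, schools$CONTROL, median, na.rm = TRUE)
print(rbind(total = total_ugds, median = median_ugds))
```

#Reporting both the total and the median helps separate "this sector enrolls many students overall" from "a typical school in this sector is large".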

4.4 Show a relationship between ACT or SAT scores and family income across each of the states that border the state of Ohio.

###This graph shows the correlation between average SAT score and family income in Ohio and the bordering states of Indiana, Kentucky, Michigan, Pennsylvania, and West Virginia. As you can see, in every state there is a positive correlation between average family income and average SAT score. This is not too surprising, as families with more money can hire tutors for their kids so they can perform better on this test.
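
#The per-state relationship can be checked by computing one correlation per state. This is a sketch on made-up rows, and only two of the regional states appear in the toy data.

```r
# Made-up rows; only OH and IN from the region appear in this toy example.
schools <- data.frame(
  STABBR  = c("OH", "OH", "OH", "IN", "IN", "IN", "CA", "CA", "CA"),
  FAMINC  = c(30000, 60000, 90000, 25000, 50000, 80000, 40000, 70000, 100000),
  SAT_AVG = c(1000, 1120, 1250, 980, 1100, 1190, 1050, 1200, 1400),
  stringsAsFactors = FALSE
)

ohio_region <- c("OH", "IN", "KY", "MI", "PA", "WV")
region <- schools[schools$STABBR %in% ohio_region, ]

# Split the regional rows by state and compute one correlation per state.
by_state  <- split(region, region$STABBR)
state_cor <- sapply(by_state, function(d) {
  cor(d$FAMINC, d$SAT_AVG, use = "complete.obs")
})
print(round(state_cor, 2))
```

#Because each row is an institution, these correlations describe schools (average income against average score), not individual students.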

Section 5

5.1 Public vs Private Post Secondary Education

###This graph shows a box plot of the average cost of attending by type of institution (1 represents a public institution, 2 a private nonprofit institution, and 3 a private for-profit institution). The box plot clearly shows that there are differences between the cost of attending a public university and a private one. This generally isn’t too surprising, as state universities can keep costs lower as an incentive for students in that state to stay and attend college in state. Most private universities don’t have that same mentality; they can choose a price that all students have to pay to attend.
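
#The numbers a box plot is drawn from can be inspected directly. The sketch below uses made-up costs and calls boxplot() with plot = FALSE, which returns the five-number summary for each group instead of drawing the figure.

```r
# Made-up costs for three schools of each control type (illustration only).
schools <- data.frame(
  CONTROL  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  COSTT4_A = c(14000, 18000, 22000, 35000, 48000, 60000, 20000, 26000, 31000)
)
schools$CONTROL <- factor(
  schools$CONTROL, levels = 1:3,
  labels = c("Public", "Private nonprofit", "Private for-profit")
)

# plot = FALSE skips the drawing and returns the statistics behind each box.
bp <- boxplot(COSTT4_A ~ CONTROL, data = schools, plot = FALSE)
# Row 3 of bp$stats holds the median of each group.
medians <- setNames(bp$stats[3, ], bp$names)
print(medians)
```

#Dropping plot = FALSE draws the box plot itself.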

5.2 How does the average family income of students at Xavier University compare nationally? Within Ohio?

Xavier University FAMINC
INSTNM FAMINC
Xavier University 114329.6
National AVG FAMINC
mean(FAMINC, na.rm = TRUE)
38482.72
Ohio Average FAMINC
OH_AVG_FAMINC
42379.96

###These tables show how Xavier University’s average family income compares to the national and Ohio averages. Surprisingly, the average family income at Xavier University (114,329.6) is roughly three times the national average (38,482.72) and about 2.7 times the Ohio average (42,379.96). This lends some support to the stereotype of private institutions being filled with “richer” families. As a student of Xavier University, I found this number quite surprising, since you would not get that impression from talking to most of the students there.
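
#A comparison like this comes down to one lookup and two means. The sketch below uses made-up rows and a hypothetical family income for Xavier; the state filter is an extra guard in case several institutions share a similar name.

```r
# Made-up rows; the FAMINC values are illustrative, not the real figures.
schools <- data.frame(
  INSTNM = c("Xavier University", "Ohio Example College",
             "Indiana Example University", "Texas Example Institute"),
  STABBR = c("OH", "OH", "IN", "TX"),
  FAMINC = c(110000, 40000, 35000, NA),
  stringsAsFactors = FALSE
)

xavier_faminc <- schools$FAMINC[
  schools$INSTNM == "Xavier University" & schools$STABBR == "OH"
]
national_faminc <- mean(schools$FAMINC, na.rm = TRUE)
ohio_faminc     <- mean(schools$FAMINC[schools$STABBR == "OH"], na.rm = TRUE)

comparison <- c(Xavier = xavier_faminc, National = national_faminc,
                Ohio = ohio_faminc)
# Express each figure as a multiple of the national average.
print(round(comparison / national_faminc, 2))
```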

5.3 How does the cost of attending an Ohio ‘university’ compare to universities in states that border Ohio? What about universities nationally, not considering state?

###This graph shows the cost of attending schools in Ohio and the bordering states of Indiana, Kentucky, Michigan, Pennsylvania, and West Virginia. This is very important, especially if you are from Ohio, because you don’t want to pay a lot for school if you don’t have to. Analyzing this graph gives a better perspective on which states you might want to apply to.

###This graph shows the national average cost of attendance and Ohio’s average cost of attendance. It can be a key indicator for deciding between staying in state and looking nationally. It might not be the best direct indicator for this decision, since state schools generally offer lower prices to in-state students. To truly make this decision, more analysis and research are required, but this graph gives a quick answer to our question.

5.4 What schools have the highest and lowest percentage of undergraduate students receiving a Pell grant?

Top 10 Highest Percentage Receiving Pell Grant
INSTNM PCTPELL UGDS
MTI Business College Inc 1 75
Mr Bela’s School of Cosmetology Inc 1 34
Southern School of Beauty Inc 1 24
Victoria Beauty College Inc 1 113
Central School of Practical Nursing 1 18
Virginia School of Hair Design 1 21
Instituto de Educacion Tecnica Ocupacional La Reine-Manati 1 176
Colegio Mayor de Tecnologia Inc 1 62
Liceo de Arte-Dise-O y Comercio 1 197
Nouvelle Institute 1 86
Top 10 Lowest Percentage Receiving Pell Grant
INSTNM PCTPELL UGDS
Bais Binyomin Academy 0 25
United States Coast Guard Academy 0 1044
American Islamic College 0 16
Principia College 0 447
The Southern Baptist Theological Seminary 0 873
New Orleans Baptist Theological Seminary 0 0
United States Naval Academy 0 4495
MGH Institute of Health Professions 0 195
Saint John’s Seminary 0 23
Hillsdale College 0 1511

###This shows the lists of institutions with the highest and lowest percentages of undergraduates receiving a Pell grant from the government.
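
#Lists like these come from sorting on PCTPELL and taking the first rows from each end. The sketch below uses made-up schools. Note that in the tables above every listed school sits at exactly 1 or exactly 0, so which tied schools make a "top 10" depends on row order; counting the ties, as done at the end, makes that visible.

```r
# Made-up schools (illustration only).
schools <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D", "School E"),
  PCTPELL = c(1.00, 0.00, 0.45, NA, 0.80),
  UGDS    = c(75, 1044, 5200, 300, 860),
  stringsAsFactors = FALSE
)

# Drop schools with no reported Pell share before ranking.
ranked  <- schools[!is.na(schools$PCTPELL), ]
highest <- head(ranked[order(ranked$PCTPELL, decreasing = TRUE), ], 2)
lowest  <- head(ranked[order(ranked$PCTPELL), ], 2)

# How many schools are tied at each extreme.
n_all_pell <- sum(ranked$PCTPELL == 1)
n_no_pell  <- sum(ranked$PCTPELL == 0)

print(highest)
print(lowest)
```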

Section 6

6.1 Compare the average cost of attendance across the number of undergraduates, the percent of students receiving a Pell grant, the average faculty salary and the average family income in whatever way you choose. If one of these variables was classified as a ‘dependent’ variable, which would you say it is and how would you evaluate the effect of the other variables on your dependent variable?

##I would say the dependent variable is the cost of attendance. Since the cost of attendance determines how many undergraduates can afford to go to college, it needs to be an attainable cost. Pell grants are given to admitted students who can’t afford to pay, since their families generally make $10,000-$20,000 annually. Based on the graph, the cost of attendance has a negative correlation with the percentage of students receiving Pell grants. This could be interpreted as follows: as the cost of college goes up, a smaller share of students receive Pell grants. Average faculty salary and family income both have positive correlations with the cost of attendance. So families with more money are willing to pay for more expensive schools than families with less money. For average faculty salary, it looks as though the more professors make on average, the higher the cost of attending. This isn’t surprising, as schools have to pay professors somehow, and tuition money can be a factor in this. For these reasons, I think the cost of attendance is dependent on the other variables.
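
#One way to evaluate the effect of the other variables on cost of attendance is a multiple linear regression. The sketch below fits one with lm() on made-up rows; it shows the mechanics only, and the coefficients it prints say nothing about the real data.

```r
# Made-up institutions (illustration only).
schools <- data.frame(
  UGDS      = c(1200, 15000, 800, 30000, 5000, 2200, 9500, 400),
  PCTPELL   = c(0.70, 0.30, 0.85, 0.20, 0.45, 0.60, 0.35, 0.90),
  AVGFACSAL = c(5000, 9000, 4500, 11000, 7000, 6000, 8500, 4000),
  FAMINC    = c(25000, 70000, 20000, 95000, 50000, 32000, 64000, 18000),
  COSTT4_A  = c(15000, 32000, 13000, 52000, 27000, 19000, 36000, 12000)
)

# Cost of attendance is the dependent variable; the rest are predictors.
fit <- lm(COSTT4_A ~ UGDS + PCTPELL + AVGFACSAL + FAMINC, data = schools)
print(round(coef(fit), 4))
```

#Each coefficient estimates the change in cost associated with a one-unit change in that predictor while the others are held fixed. With the real data, summary(fit) would also report standard errors and p-values, and rows with missing values are dropped by default.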

6.2 Compare the student populations of schools in heavily urbanized areas with those in very rural areas. Keep in mind, the type of school varies considerably by urban and rural areas. Do your best to control for this bias with the variables you have available to focus on differences within the populations of urban and rural schools and NOT the differences between the type of school.

Urban (11) vs Rural (43) School Populations
LOCALE count SAT_AVG FAMINC PCTPELL ADM_RATE COSTT4_A PCTFLOAN
11 1570 1149.870 37103.87 0.4964127 0.6743053 29895.12 0.5064058
43 62 1064.462 32992.44 0.4519298 0.6316667 19744.89 0.2707018

###I think the best form is a side-by-side comparison table, since it makes the differences between the two areas easy to see across the same metrics. One inference you can make is that the cost of attending is lower in very rural areas. But the table does not show what type of school students are attending, which could make a big difference in the cost. Another big difference is the percentage of students receiving federal student loans. The next steps would be to run a regression to examine why the metrics differ and to research these areas to help explain the regression findings.
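
#One way to account for school type in a comparison like this is to group by CONTROL as well as LOCALE, so that urban and rural schools are compared within the same type. The sketch below uses made-up rows and only two of the metrics.

```r
# Made-up rows; LOCALE 11 is the urban code and 43 the rural code used above.
schools <- data.frame(
  LOCALE   = c(11, 11, 11, 11, 43, 43, 43, 21),
  CONTROL  = c(1, 1, 2, 3, 1, 1, 2, 1),
  COSTT4_A = c(20000, 24000, 45000, 28000, 16000, 18000, 30000, 21000),
  PCTFLOAN = c(0.50, 0.54, 0.60, 0.70, 0.25, 0.29, 0.40, 0.50)
)

urban_rural <- schools[schools$LOCALE %in% c(11, 43), ]
# Mean of each metric for every LOCALE-by-CONTROL combination present.
summary_tbl <- aggregate(cbind(COSTT4_A, PCTFLOAN) ~ LOCALE + CONTROL,
                         data = urban_rural, FUN = mean)
print(summary_tbl)
```

#With the formula interface, aggregate() drops any row that has a missing value in one of the listed columns, so the groups in the real data would be based on complete rows only.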

6.3 Question 1: Count the number of schools that have an SAT Average Greater than 1300

##This will show the more competitive schools in the country, as 1300 is a high-performing SAT score. Especially if someone scored this well on their SAT, it can give them some options for which colleges they might want to apply to in their area or nationally.

#By using a data table, we are able to analyze all the colleges whose average SAT score is greater than 1300. This can lead to further analysis of other factors that make you more likely to be accepted into the school. If one of your dream schools meets this criterion, you can do a self-analysis of your chances of getting in, or simply calculate how much money or financial aid it would take to attend.
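
#The count itself is a one-line calculation. The sketch below uses made-up scores and shows both the strict cutoff from the question (greater than 1300) and the inclusive cutoff used for the variable created earlier (greater than or equal to 1300), since a school at exactly 1300 is counted differently by the two.

```r
# Made-up SAT averages (illustration only).
sat_avg <- c(1350, 1100, NA, 1300, 1480, 990)

# na.rm = TRUE skips schools with no reported SAT average.
n_above_1300    <- sum(sat_avg > 1300, na.rm = TRUE)
n_at_least_1300 <- sum(sat_avg >= 1300, na.rm = TRUE)

print(c(above = n_above_1300, at_least = n_at_least_1300))
```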

6.3 Question 2: What schools have a female percentage that is over 65% and SAT Average Greater than 1200?

#For some people, this is a big issue when choosing a school, as some men have more trouble than others finding someone to date. When your school is more than 65% female, it could statistically help your odds of finding someone. Plus, with an average SAT score above 1200, you are most likely surrounded by smarter people. So you can get a good education and maybe a date!

#If these are the types of schools you are looking at, then you will receive a good education. You can run a further statistical analysis of your chances of getting a date based on the total student body and the percentage of female students. You can also analyze how far the school is from your home, whether you will be flying or driving there, and how you will transport your supplies to your dorm room. All of these analyses and more can be done if you pick one of these schools from the data table above.
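
#A table like the one for this question comes from filtering on two conditions at once. The sketch below uses made-up schools, so the names and values are illustrative only.

```r
# Made-up schools (illustration only).
schools <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D", "School E"),
  FEMALE  = c(0.70, 0.55, 0.80, 0.66, 0.90),
  SAT_AVG = c(1250, 1400, NA, 1201, 1100),
  stringsAsFactors = FALSE
)

# subset() keeps rows where both conditions are TRUE and drops rows
# where the condition is NA (for example, a missing SAT average).
hits <- subset(schools, FEMALE > 0.65 & SAT_AVG > 1200)
print(hits)
```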