If you made it to this page, chances are you have yet to fall victim to the virus! Good for you :)
This document explores my findings from the exam and some of the results I found when exploring the data.
The data provided is from the Ohio Department of Education. It covers multiple universities and colleges across the states and compares characteristics such as ACT and SAT admission scores and average family income.
The next few segments explore the data in more depth with analysis and graphs. These will better equip you to understand how higher education in Ohio differs from schools regionally and nationally.
The packages I used for this exam were tidyverse, dplyr, knitr, and DT. These packages provide the basic functions I needed to analyze and graph my results, and they must be installed in order to run the commands below.
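Before running anything, it can help to confirm that the four packages are actually available. The check below is only a minimal sketch in base R; the variable names `needed`, `installed`, and `missing_pkgs` are my own and are not part of the exam code.

```r
# Packages named in this write-up.
needed <- c("tidyverse", "dplyr", "knitr", "DT")

# requireNamespace() returns FALSE (rather than stopping with an error)
# when a package is not installed.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All packages are installed.")
}
```

Once everything is installed, each package can be attached with `library()` as usual.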
In this part of the assignment, I looked into the cleanliness of the data as well as some of the missing values. One thing I noticed was the negative values in the LOCALE column. I decided to leave these as they were and remove them later when running any analysis. I also changed every NULL in the data to NA and checked the data type of each variable. In addition to these checks, it was important to address the NA values present throughout the dataset. Many of these values are either missing or were not provided during the data gathering process.
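The cleaning steps described above could look something like the sketch below. It uses a small made-up data frame rather than the real file, so every value in `schools` is hypothetical and only the column names CONTROL, LOCALE, and HBCU come from the dataset. It is written in base R for simplicity, which may differ from the dplyr calls in the exam itself.

```r
# Toy stand-in for the real dataset (all values are invented for illustration).
schools <- data.frame(
  CONTROL = c("1", "2", "Public", "3"),
  LOCALE  = c("11", "-3", "21", "NULL"),
  HBCU    = c("0", "NULL", "1", "0"),
  stringsAsFactors = FALSE
)

# Turn the literal text "NULL" into a real missing value in every column.
schools[schools == "NULL"] <- NA

# LOCALE can be converted to numeric now that the text placeholder is gone.
schools$LOCALE <- as.numeric(schools$LOCALE)

# Treat negative locale codes as missing before any analysis.
schools$LOCALE[which(schools$LOCALE < 0)] <- NA

# Check the data types and count the NA values per column.
str(schools)
colSums(is.na(schools))
```

Wrapping the comparison in `which()` keeps rows that are already NA from interfering with the assignment.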
The next few sections will explore some questions that were raised regarding the dataset.
| Variable | Description |
|---|---|
| CONTROL | A few data points in this column were text |
| LOCALE | The negative numbers were replaced with NA |
| HBCU | Some of these values carried over as NULL |

Please note: these errors were fixed within the dataset.
This graph shows the number of institutions in Ohio and in each of the surrounding states.
The next graph shows how cost varies by family income for all institutions.
The next graph compares the number of undergraduates across the three types of institutional control.
This graph shows the relationship between SAT scores and family income across the states that border Ohio.
My response is that it does cost more to attend a private institution than a public institution. I created a box plot to show the range for each type of institution as well as the means. The average cost to attend a public institution is roughly $14,000 a year, while the average cost to attend a private institution is roughly $30,000 a year. So I do believe it costs more to go to a private institution.
The average family income of students at Xavier University, according to the data, is $114,329.60. The average family income for students in Ohio is roughly $30,000, and the national average is roughly $20,000. Based on these findings, it is true that Xavier has one of the highest average family incomes in the state and in the nation.
There is not much difference in the average cost of attendance between Ohio schools and schools in the surrounding states. However, the cost range is wider for the schools in states bordering Ohio. Nationally, the average is somewhat higher than in the other two groups.
There are a few schools where 0% of undergraduates receive a Pell grant, as well as a few schools where 100% of undergraduates receive one. These schools are listed within the markdown file.
This section dives deeper into the analysis of the data.
The question asked me to compare the average cost of attendance across the number of undergraduates, the percent of students receiving a Pell grant, the average faculty salary, and the average family income in whatever way I chose.
If one of these variables were classified as a 'dependent' variable, I would say the average cost of attendance is the dependent variable while the others are independent. The independent variables are the ones that affect the overall cost.
I would run a test with all of the variables to see which ones are truly significant, in order to better estimate what affects average cost.
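One possible way to run such a test is a multiple linear regression with average cost as the dependent variable. The sketch below is an assumption about how that could be set up, not the analysis from the exam: the data is simulated, and the column names (`undergrads`, `pct_pell`, `faculty_salary`, `family_income`, `avg_cost`) are hypothetical stand-ins for the real variables.

```r
# Simulated data for illustration only; none of these numbers come from the dataset.
set.seed(1)
n <- 50
toy <- data.frame(
  undergrads     = round(runif(n, 500, 30000)),
  pct_pell       = runif(n, 0, 1),
  faculty_salary = runif(n, 4000, 12000),
  family_income  = runif(n, 20000, 120000)
)
toy$avg_cost <- 10000 + 0.15 * toy$family_income + rnorm(n, sd = 3000)

# Average cost is the dependent variable; the other four are independent.
fit <- lm(avg_cost ~ undergrads + pct_pell + faculty_salary + family_income,
          data = toy)

# The last column of the coefficient table, Pr(>|t|), gives a p-value for
# each variable.
summary(fit)$coefficients
```

Variables with small p-values in that table would be the ones to treat as significant predictors of average cost.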
This next set of charts compares student populations in heavily populated areas with those in less populated areas. The schools were then filtered to those with a mean family income greater than the average. This helped to eliminate some of the bias that would have occurred from lower average income universities skewing the data.
My data showed that the average admission number was higher in areas with lower total populations than in the more heavily populated areas.
Compare the average family income at Texas universities or colleges to those in Ohio.
This will be done by creating a dummy variable that flags schools with the state abbreviation TX and then finding the ones in OH. Then I will create a graph that compares the two. I thought this would be an interesting question to ask because one of my good friends from high school went to a university in Texas, and I wanted to compare the family incomes between the two states. I found that the average family income in Texas was actually half of the average family income in Ohio. Performing a hypothesis test would help to show whether the findings are significant.
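The dummy variable and the hypothesis test mentioned above could be sketched as follows. This is only an illustration on made-up numbers: the `state` and `family_income` columns are hypothetical names, and the Welch two-sample t-test is my assumption about which test would fit, since the exam does not name one.

```r
# Invented values for illustration; these are not figures from the dataset.
toy <- data.frame(
  state = c("TX", "TX", "TX", "TX", "OH", "OH", "OH", "OH", "MI"),
  family_income = c(41000, 38000, 52000, 45000,
                    60000, 72000, 55000, 66000, 50000)
)

# Keep only the two states being compared.
two_states <- toy[toy$state %in% c("TX", "OH"), ]

# Dummy variable: 1 for Texas schools, 0 for Ohio schools.
two_states$is_texas <- as.integer(two_states$state == "TX")

# Average family income in each state.
aggregate(family_income ~ state, data = two_states, FUN = mean)

# Welch two-sample t-test of the difference in means.
test <- t.test(family_income ~ state, data = two_states)
test$p.value
```

A p-value below the chosen significance level (0.05 is a common choice) would suggest the difference between the two states is unlikely to be due to chance alone.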
I will do this by creating a dot plot that has SAT scores on the x axis and the number of students on the y axis. I have always heard that schools with a higher SAT average are harder to get into, and I wanted to see whether that means they accept fewer students. I found that there is a slight increase as SAT scores approach their mean, then a slight decrease as SAT scores approach their highest values. I am not sure how significant my findings are. Performing a hypothesis test would help to show whether they are significant.