Introduction

For the final project in my intro to analytics programming course, I decided to analyze the College Scorecard data set provided by the U.S. Department of Education. The department offers a great API that can be used to obtain a JSON file to work with in R.
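As a rough sketch of what a request to that API could look like from R, the snippet below builds a query URL and defines a function that would parse one page of results. It assumes the jsonlite package is installed and that you have your own api.data.gov key. The helper names, the placeholder key, and the particular fields requested are illustrative choices, not necessarily the ones behind the results in this article.

```r
# Base URL of the College Scorecard API (served through api.data.gov).
scorecard_base_url <- "https://api.data.gov/ed/collegescorecard/v1/schools"

# Hypothetical helper: build the query URL for one page of results.
build_scorecard_url <- function(api_key, fields, page = 0, per_page = 100) {
  paste0(
    scorecard_base_url,
    "?api_key=", api_key,
    "&fields=", paste(fields, collapse = ","),
    "&page=", page,
    "&per_page=", per_page
  )
}

# Hypothetical helper: download one page and return its "results" element.
# It needs network access and a valid key, so it is defined but not called here.
fetch_scorecard_page <- function(api_key, fields, page = 0) {
  response <- jsonlite::fromJSON(build_scorecard_url(api_key, fields, page))
  response$results
}

# Example field names; check the API documentation for the full list.
fields <- c(
  "id",
  "school.name",
  "school.state",
  "school.ownership",
  "latest.admissions.admission_rate.overall"
)

# "YOUR_API_KEY" is a placeholder, not a working key.
example_url <- build_scorecard_url("YOUR_API_KEY", fields)
print(example_url)
```

To pull the whole data set, fetch_scorecard_page would be called in a loop over page numbers until a page comes back empty.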

The College Scorecard data set covers aspects such as the cost to attend a school, its accreditation, and general attributes like whether it is an HBCU or a women-only college. It includes all kinds of postsecondary options, from Harvard to various trade schools.

Here is a link to the College Scorecard Data if you would like to check it out: https://collegescorecard.ed.gov/data/documentation/

In this article, I will analyze some of the data in the College Scorecard data set and also run a sentiment analysis on my soon-to-be alma mater, Xavier University. The sentiment analysis will show how the university is doing from the perspective of its former and current students.

For the sentiment analysis, I will be using data that I scraped from a website called Niche.com. The site rates various elements of the university. I decided to scrape reviews from the Academics, Health and Safety, and Food pages of the website to see what the sentiment toward the university as a whole was.

If you are interested, please check out this link, which will take you to the source of the data: https://www.niche.com/colleges/xavier-university/reviews/

In the end, I hope to discover some trends among colleges today and view the sentiment toward my soon-to-be alma mater. Below is a brief data dictionary outlining some of the variables in the data set.

Data Dictionary

Summary Statistics

Before going into further analysis, let's take a moment to review some general summary statistics for the data that we are using.

Next, I want to see how this differs among the various postsecondary school types. Do certain schools cost more, have better students, etc.?
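A comparison like this can be sketched in base R by grouping on the control code and averaging each measure. The rows below are made up purely for illustration, and the column names (CONTROL, ADM_RATE, SAT_AVG, TUITIONFEE_IN) are assumed to follow the College Scorecard data dictionary; your own data frame may name them differently.

```r
# Made-up rows for illustration only; real values come from the Scorecard data.
schools <- data.frame(
  CONTROL       = c(1, 1, 2, 2, 3, 3),
  ADM_RATE      = c(0.70, 0.80, 0.50, 0.60, 0.90, 0.80),
  SAT_AVG       = c(1050, 1100, 1250, 1300, 950, NA),
  TUITIONFEE_IN = c(9000, 11000, 38000, 42000, 15000, 17000)
)

# Label the control codes following the College Scorecard data dictionary.
schools$control_label <- factor(
  schools$CONTROL,
  levels = 1:3,
  labels = c("Public", "Private nonprofit", "Private for-profit")
)

# Number of schools of each type.
school_counts <- table(schools$control_label)

# Mean of each measure by school type. na.pass keeps rows with a missing
# SAT score, and na.rm = TRUE drops the missing value inside mean().
summary_by_control <- aggregate(
  cbind(ADM_RATE, SAT_AVG, TUITIONFEE_IN) ~ control_label,
  data = schools,
  FUN = mean,
  na.rm = TRUE,
  na.action = na.pass
)

print(school_counts)
print(summary_by_control)
```

Handling missing values explicitly matters here because many schools do not report an average SAT score, and the default behavior would silently drop those rows from every column.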

The Control is the type of university. Control 1 is Public, 2 is Private Non-Profit, and 3 is Private For-Profit. Private for-profit schools appear to be the most prevalent. However, the average admissions rate tends to be slightly higher for public schools. Private non-profit schools tended to have the highest SAT scores on average but also the most expensive tuition. It may be that private non-profit schools attract wealthier families who can also afford SAT prep, which may help their children achieve better test scores. However, it may also be that students with better SAT scores are attracted to private non-profit schools.

Now I want to examine how admission rates vary among the different Controls, or school types. As a reminder: Control 1 is Public, 2 is Private Non-Profit, and 3 is Private For-Profit.
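One way to make this comparison is a box plot per school type. The sketch below uses base R and made-up admission rates, so the numbers it prints are not the ones discussed in this article; the column names are assumptions as well.

```r
# Made-up admission rates for illustration only.
type_levels <- c("Public", "Private nonprofit", "Private for-profit")
schools <- data.frame(
  control_label = factor(rep(type_levels, each = 5), levels = type_levels),
  ADM_RATE = c(
    0.15, 0.65, 0.70, 0.75, 0.80,  # Public
    0.10, 0.55, 0.60, 0.65, 0.70,  # Private nonprofit
    0.75, 0.80, 0.85, 0.90, 0.95   # Private for-profit
  )
)

# Draw one box per school type; boxplot() also returns the plotted statistics.
bp <- boxplot(
  ADM_RATE ~ control_label,
  data = schools,
  xlab = "School type",
  ylab = "Admission rate",
  main = "Admission rate by school type"
)

# Median admission rate for each school type.
medians <- tapply(schools$ADM_RATE, schools$control_label, median)
print(medians)

# Points beyond the whiskers (more than 1.5 * IQR from the box) are drawn
# as outliers; bp$out lists their values.
print(bp$out)
```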

It appears that private for-profit schools have the highest median admission rate. Moreover, I noticed that public and private non-profit schools had many outliers with much lower admissions rates.

Analysis of Colleges

This portion of the article will try to examine deeper trends among postsecondary schools in the United States. First, let's look at where each of these colleges is on a map of the United States. I want to see if certain regions tend to have more universities.

It looks like most universities fall within the coastal regions of the United States. There are many schools in the eastern half of the country, which makes sense because the country's early colonists arrived in the east and moved west. Moreover, I was a bit surprised to see a huge empty area in the Great Plains region. I suppose fewer people want to attend schools there because the area is so sparsely populated.

Next, I want to examine the distribution of admissions rates to see whether it is normal. I will do so by making a histogram to see if the distribution is normal, right-tailed, or left-tailed.
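A histogram shows the shape visually, and a skewness statistic can back it up with a number: a positive value points to a longer right tail, a negative value to a longer left tail. The sketch below uses base R, a small made-up vector of admission rates, and a hand-written helper (sample_skewness is my own name, not a built-in function).

```r
# Made-up admission rates for illustration only.
adm_rate <- c(0.05, 0.10, 0.12, 0.15, 0.18, 0.20,
              0.22, 0.25, 0.30, 0.45, 0.70, 0.95)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 3/2.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Histogram with ten equal-width bins between 0 and 1.
hist(
  adm_rate,
  breaks = seq(0, 1, by = 0.1),
  xlab = "Admission rate",
  main = "Distribution of admission rates"
)

skew <- sample_skewness(adm_rate)
print(skew)
```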

As seen in the graph above, it looks like we have a right-skewed distribution. In general, this is not that surprising, as few schools can afford to be super selective. On the other hand, many schools need attendees and will accept most applicants.

Next, I want to examine whether there is a trend between SAT scores and admissions rate. Is there any correlation between the two? To find out, I will make a scatter plot.
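Alongside the scatter plot, a correlation coefficient gives a single number for the direction and strength of the linear relationship. The sketch below uses base R with made-up values and assumed column names (SAT_AVG, ADM_RATE), so its output says nothing about the real data.

```r
# Made-up values for illustration only; one school has no SAT average.
schools <- data.frame(
  SAT_AVG  = c(950, 1000, 1050, 1100, 1200, 1300, 1450, NA),
  ADM_RATE = c(0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.10, 0.75)
)

# Scatter plot of admission rate against average SAT score.
plot(
  schools$SAT_AVG, schools$ADM_RATE,
  xlab = "Average SAT score",
  ylab = "Admission rate",
  main = "Admission rate vs. average SAT score"
)

# Pearson correlation, using only rows where both values are present.
sat_adm_cor <- cor(schools$SAT_AVG, schools$ADM_RATE, use = "complete.obs")
print(sat_adm_cor)
```

The same two calls work for any other pair of numeric columns, such as undergraduate enrollment and admission rate.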

In general, it looks like the higher the average SAT score, the lower the admissions rate. Think about schools like Harvard: many students apply, but the university can only accept the best of the best.

Next, I want to see if there is a correlation between the number of students a given university has and its admissions rate. To find out, I will make a scatter plot of the number of students against admissions rates.

There does not appear to be any relationship between the number of students a given university has and its admissions rate. I have to admit that I did not expect this result. However, the plot suggests that the total number of undergraduates has little bearing on admissions rates.

Sentiment Analysis

First, let's take a look at some of the summary statistics for Xavier University from the College Scorecard data set to get a better understanding of what the school is like.

Xavier University is a small private non-profit school in Cincinnati, Ohio. It is very expensive to attend; however, the university has an admissions rate of upwards of 81%. As a former student, I was a bit shocked at this admissions rate. I would have expected it to be closer to 70%.

Before diving into the analysis, I want to share the results of the scraping at a high level so you have an idea of what the data looks like.

As seen above, there are 256 reviews from varying groups of students. Freshmen wrote more reviews than any other category. Additionally, it looks like alumni and unidentified current students tend to rate the school the highest.

Next, I want to see the sentiment of the reviews using the NRC lexicon. This should be a good indication of which emotions people express when talking about Xavier.

First, I want to make a graph to find out which words are used most often when talking about Xavier in Niche.com's reviews.

To do this, I will make a new data frame with the reviewer_description broken up by word and remove stop words, which carry little meaning on their own. Then, I will graph the word counts.
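The idea can be sketched in base R as follows. The two reviews and the short stop-word list are made up for illustration, and the tokenize helper is a simple stand-in; in practice a package such as tidytext (unnest_tokens together with its stop_words data set) does the same job with a much fuller stop-word list.

```r
# Two made-up reviews for illustration only.
reviews <- data.frame(
  reviewer_description = c(
    "The campus is safe and the police are helpful.",
    "Professors are helpful, and the food is good!"
  ),
  stringsAsFactors = FALSE
)

# Tiny stand-in stop-word list; a real analysis would use a fuller one.
example_stop_words <- c("the", "is", "and", "are", "a", "of", "to")

# Hypothetical helper: lowercase the text, replace anything that is not a
# letter, apostrophe, or space with a space, then split on runs of spaces.
tokenize <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z' ]", " ", text)
  words <- strsplit(text, " +")[[1]]
  words[words != ""]
}

# One long vector of words across all reviews, with stop words removed.
all_words <- unlist(lapply(reviews$reviewer_description, tokenize))
kept_words <- all_words[!all_words %in% example_stop_words]

# Count each remaining word and sort from most to least frequent.
word_counts <- sort(table(kept_words), decreasing = TRUE)
print(word_counts)

# Bar chart of the word counts.
barplot(word_counts, las = 2, main = "Most common words in reviews")
```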

The words that appear are what I would have expected. It's good to see that words like police and safe are appearing. I have heard that Xavier's campus wasn't safe, but I guess that isn't the case.

Now, I want to see what emotions people are using when they write reviews about Xavier.

I will use the NRC lexicon and graph the different emotions for some of the more used words in the reviews.

In general, the sentiment toward Xavier University seems to be pretty positive. I was a bit concerned to see the word emergency there, but it may simply be misread without its proper context.

Next, I want to find out which words used in reviews about Xavier are the most negative. I will use the AFINN lexicon to quantify which words from the reviews are the most negative, and I will make a table of these words.
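AFINN assigns each word an integer score, so one way to rank words is by count multiplied by score. The sketch below shows that logic in base R with a made-up word list and a tiny stand-in lexicon; the scores are illustrative placeholders rather than values copied from AFINN, which tidytext's get_sentiments("afinn") would provide in a real analysis.

```r
# Made-up words, as they might look after tokenizing the reviews.
review_words <- c("emergency", "safe", "bad", "emergency",
                  "stuck", "great", "emergency", "campus")

# Tiny stand-in lexicon in the AFINN style (word -> integer score).
# The scores are illustrative placeholders, not copied from AFINN.
example_lexicon <- data.frame(
  word  = c("emergency", "bad", "stuck", "safe", "great"),
  value = c(-2, -3, -2, 1, 3),
  stringsAsFactors = FALSE
)

# Count how often each word occurs.
word_counts <- as.data.frame(
  table(word = review_words),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Inner join: words missing from the lexicon (here "campus") drop out.
scored <- merge(word_counts, example_lexicon, by = "word")

# A word's contribution is its count times its score.
scored$contribution <- scored$n * scored$value

# Keep the negative words and put the most negative contribution first.
negative_words <- scored[scored$value < 0, ]
negative_words <- negative_words[order(negative_words$contribution), ]
print(negative_words)
```

Ranking by contribution rather than raw score explains how a mildly negative word can top the table simply because it is used often.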

I found that the most negative word used is emergency. As mentioned above, this may just be because the context is not properly understood. Moreover, I see that words like stuck and bad appeared, which is a bit concerning for Xavier.

Now, I want to see if there is a month in which reviews were more positive or more negative. To find out, I will use the Bing lexicon and graph the count of positive and negative words by month.
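The Bing lexicon labels each word as positive or negative, so the monthly comparison comes down to a join followed by a two-way count. The sketch below shows this in base R with made-up rows and a tiny stand-in lexicon; the column names review_date and word are assumptions, and tidytext's get_sentiments("bing") would supply the real lexicon.

```r
# Made-up tokenized reviews: one row per word, with the review date.
review_words <- data.frame(
  review_date = as.Date(c("2019-01-05", "2019-01-05", "2019-01-20",
                          "2019-03-02", "2019-03-02", "2019-03-15")),
  word = c("great", "safe", "helpful", "bad", "stuck", "great"),
  stringsAsFactors = FALSE
)

# Tiny stand-in lexicon in the Bing style (word -> positive/negative).
example_lexicon <- data.frame(
  word      = c("great", "safe", "helpful", "bad", "stuck"),
  sentiment = c("positive", "positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Two-digit month number, which avoids locale-dependent month names.
review_words$month <- format(review_words$review_date, "%m")

# Attach a sentiment label to every word that appears in the lexicon.
labeled <- merge(review_words, example_lexicon, by = "word")

# Rows are months, columns are sentiment labels.
monthly_counts <- table(labeled$month, labeled$sentiment)
print(monthly_counts)

# Side-by-side bars: one group per month, one bar per sentiment.
barplot(
  t(monthly_counts),
  beside = TRUE,
  legend.text = TRUE,
  xlab = "Month",
  ylab = "Word count",
  main = "Positive and negative words by month"
)
```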

Reviews appear to be the most positive in January and the most negative in March. I wonder if this is because Xavier did not perform well during March Madness.

Conclusion

In general, it seems that the sentiment toward Xavier University is positive. I was surprised to see that both the cost of the university and its admissions rate were very high, which does make me slightly worried. However, it was reassuring to see that the sentiment was positive. Maybe this means attending Xavier University is worth the money.

Further Analysis

To improve this analysis, I would recommend collecting more reviews and comparing Xavier to another school, such as Butler University, to see how it matches up.