Assignment 7

Author

Jamison Mogensen

Published

April 30, 2024

Overview

For this assignment I decided to compare student sentiment toward Xavier University with sentiment toward the University of Cincinnati. I started by scraping data from niche.com, which contains reviews written by students who have attended each school. To scrape enough data, I had to use the website's rating filter, which sorts reviews into excellent, very good, average, poor, or terrible. This also gave me a wide variety of reviews to compare the schools by. One issue I ran into when scraping was that every review was scraped twice, but this should not affect the comparison because it happened the same way for each school.
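If the doubled reviews ever needed to be removed rather than left in, a minimal sketch of one way to do it is shown here. It is written in Python purely for illustration; the field names `school` and `text` and the sample rows are assumptions, not the actual scraped data.

```python
def drop_duplicate_reviews(reviews):
    """Return reviews with exact repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for review in reviews:
        # A review counts as a duplicate only if school and text both match.
        key = (review["school"], review["text"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


# Hypothetical sample: each review appears twice, as described above.
scraped = [
    {"school": "Xavier", "text": "Great campus."},
    {"school": "Xavier", "text": "Great campus."},
    {"school": "UC", "text": "Lots of majors."},
    {"school": "UC", "text": "Lots of majors."},
]
unique_reviews = drop_duplicate_reviews(scraped)
print(len(unique_reviews))
```

Because every review is doubled for both schools, removing the repeats would halve the raw counts without changing how the schools compare to each other.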

Question 1

What are the most frequent words associated with each school?

For this question, the data I am interested in is the content of each review for the respective school. To get it, I scraped the reviews from each school's page on niche.com and put all of the reviews into a data frame. Once I had each review in a data frame, I created a new data frame called tidy_reviews that breaks each review up into individual words. I then listed each word with the number of times it appeared. Below are the top 30 words. I did end up excluding words that carried no meaning for this comparison, such as “school”, because those types of words would appear too often.

The results do not show me much. Something that does stand out to me is the bottom half of this list. Many of the words appear for both schools, but for UC, some words that stand out are city, money, and major. These words stand out because they are all ways in which UC differentiates itself from Xavier. With UC being closer to the city, costing less money, and offering different majors than Xavier, it is interesting to see those words show up on the list.
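The word-count step described for this question (split each review into words, drop uninformative words, count what is left) can be sketched roughly as follows. This is an illustrative Python version, not the code used for the report; the stop-word list, the function name `top_words`, and the sample sentences are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; "school" is excluded as described above.
STOP_WORDS = {"the", "a", "and", "is", "to", "of", "school"}


def top_words(texts, n=30, stop_words=STOP_WORDS):
    """Count words across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(n)


# Hypothetical sample reviews, not real niche.com text.
sample = ["The city is close to the school.", "Money and the city matter."]
result = top_words(sample, n=2)
print(result)
```

Running the same function once per school would give two lists that can be compared side by side.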

Question 2

Which school has more positive words and which has more negative?

For this question, the data I am interested in is again the content of each review for the respective school. To get it, I scraped the reviews from each school's page on niche.com and put all of the reviews into a data frame. I then used the tidy_reviews data frame from the question above to find the sentiment of each word using the Bing sentiment lexicon.

The only thing that stands out to me here is how similar the results are. UC has slightly more negativity, but the schools have the same number for positive sentiment, which is interesting. On its own, this chart suggests that reviewers have very similar feelings about both schools. I find this interesting because, although the schools are in close proximity to each other, they are very different universities, so such similar results might tell us that each school has its own pros and cons.
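The lexicon lookup behind this question can be sketched as follows: each word is matched against a dictionary that labels it positive or negative, and the labels are counted per school. This Python version is only an illustration. The four-word `LEXICON` is a hypothetical stand-in for the much larger Bing lexicon, and the rows are made up.

```python
from collections import Counter

# Hypothetical stand-in for the Bing lexicon, which is far larger.
LEXICON = {
    "great": "positive",
    "friendly": "positive",
    "expensive": "negative",
    "boring": "negative",
}


def sentiment_counts(rows, lexicon=LEXICON):
    """Count positive and negative words per school.

    rows is an iterable of (school, word) pairs, one pair per word,
    similar in spirit to one row of tidy_reviews.
    """
    counts = Counter()
    for school, word in rows:
        label = lexicon.get(word)
        if label is not None:  # words missing from the lexicon are skipped
            counts[(school, label)] += 1
    return counts


# Hypothetical rows, not real review data.
tidy_rows = [
    ("Xavier", "great"),
    ("Xavier", "campus"),
    ("UC", "great"),
    ("UC", "expensive"),
]
counts = sentiment_counts(tidy_rows)
print(counts)
```

Words that do not appear in the lexicon, such as "campus" here, contribute to neither total, so the positive and negative counts cover only part of each review.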

Question 3

How does the positivity score fluctuate with school year (e.g., senior, junior…)?

For this question, I am interested not only in the content of each review but also in the school year of the reviewer. Niche.com includes the reviewer's school year at the bottom of each review, so the only extra thing I had to do here was pull the HTML nodes for the school year and include them in the data frame. This is my question that is meant to include a chronological component. I wondered: as the years go by for students, do their sentiments change? The other way I could have added a chronological component would have been to use the date on which the review was posted. My issue with this was that niche.com displays how long ago a review was posted, but not the date. I could have scraped that, but some reviews say “a day ago” while others say “4 months ago” and others say “2 years ago”. I just could not think of a way to clean and format that data.
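For reference, one possible way to handle relative labels like the ones quoted above is to turn each into an approximate date by subtracting the stated amount of time from the day the page was scraped. The sketch below is in Python and rests on loud assumptions: a month is treated as 30 days and a year as 365, the function name `approximate_post_date` is made up, and the scrape date in the example is hypothetical. The results are estimates, not true posting dates.

```python
import re
from datetime import date, timedelta

# Assumption: rough day counts per unit, good enough for ordering reviews.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

# Matches "a day ago", "4 months ago", and "2years ago" (missing space).
PATTERN = re.compile(r"(\d+|an?)\s*(day|week|month|year)s?\s+ago")


def approximate_post_date(label, scraped_on):
    """Estimate a posting date from a label like '4 months ago'.

    Returns None when the label does not match the expected pattern.
    """
    match = PATTERN.search(label.lower())
    if match is None:
        return None
    amount, unit = match.groups()
    # "a" or "an" means one unit; otherwise the amount is a number.
    count = 1 if amount in ("a", "an") else int(amount)
    return scraped_on - timedelta(days=count * DAYS_PER_UNIT[unit])


# Hypothetical scrape date, used only for this example.
scraped_on = date(2024, 4, 30)
for label in ["a day ago", "4 months ago", "2years ago"]:
    print(label, "->", approximate_post_date(label, scraped_on))
```

The estimates get coarser as the labels get older, so they are better suited to grouping reviews by year than to any day-level analysis.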

Several things stand out to me and tell a story. The first is that Xavier has noticeably more positive words coming from freshmen. This could be due to many different factors, such as our freshman orientation, dorm halls, class size, etc. The next interesting thing is that UC has better results from sophomores. Some hypothetical reasons for this could be sophomore dorm halls, off-campus housing for sophomores, class difficulty, etc. What this suggests to me is that Xavier makes a good first impression on its students, but once students ease into college, UC is more enjoyable by the next year.