I decided to run a sentiment analysis on my soon-to-be alma mater, Xavier University, to determine whether the sentiment was good, bad, or meh. This analysis shows how the university is doing from the perspective of its former and current students.
I will be using data that I scraped from a website called Niche.com. The site rates various aspects of the university. I scraped reviews from the Academics, Health and Safety, and Food pages of the website to gauge the sentiment toward the university as a whole.
If you are interested, please check out this link, which will take you to the source of the data: https://www.niche.com/colleges/xavier-university/reviews/
Before diving into the analysis, I want to share the results of the scraping at a high level so you have an idea of what the data looks like.
## # A tibble: 1 × 1
##   `Count of reviews`
##                <int>
## 1                256
## # A tibble: 6 × 3
##   reviewer_year     `Count of Each Group of Reviewer` `Mean Review Rating`
##   <chr>                                         <int>                <dbl>
## 1 College Freshman                                125                 3.47
## 2 College Junior                                   30                 3.9
## 3 College Senior                                   31                 3.42
## 4 College Sophomore                                47                 3.81
## 5 College Student                                   8                 4.38
## 6 Recent Alumnus                                   15                 3.93
As seen above, there are 256 reviews from varying groups of students. Freshmen wrote more reviews than any other group (125 of the 256). The unidentified current students gave the highest mean rating (4.38, though from only 8 reviews), followed by recent alumni (3.93) and juniors (3.9).
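For anyone who wants to build a summary like the one above, here is a minimal sketch using dplyr. The data frame name `reviews`, the column name `rating`, and the four toy rows are assumptions for illustration; only `reviewer_year` and the output column labels come from the tables above.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the scraped reviews.
# `reviews` and `rating` are assumed names, and the rows are made up.
reviews <- tribble(
  ~reviewer_year,     ~rating,
  "College Freshman", 4,
  "College Freshman", 3,
  "Recent Alumnus",   5,
  "College Senior",   3
)

# Overall number of reviews
total_reviews <- reviews %>%
  summarise(`Count of reviews` = n())

# Number of reviews and mean rating for each reviewer group
by_group <- reviews %>%
  group_by(reviewer_year) %>%
  summarise(
    `Count of Each Group of Reviewer` = n(),
    `Mean Review Rating` = mean(rating)
  )

total_reviews
by_group
```

With the real scraped data in place of the toy rows, the same two pipelines should give tables shaped like the ones shown above.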
Next, I want to see the sentiment of the reviews using the NRC lexicon. This should be a good indicator of which emotions people express when talking about Xavier.
First, I want to make a graph showing the top words used when talking about Xavier in Niche.com's reviews.
To do this, I will make a new data frame with reviewer_description broken up by word and remove stop words, which carry little meaning on their own. Then, I will graph the word counts.
The words that appear are what I would have expected. It's good to see
that words like police and safe are appearing. I have heard that
Xavier's campus wasn't safe, but I guess that isn't the case.
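The tokenizing and counting step described above can be sketched roughly as follows with tidytext. The three review texts are invented toy data and the `reviews` data frame name is an assumption; `reviewer_description` is the column named in this post, and `stop_words` is the stop-word list that ships with tidytext.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(ggplot2)

# Toy stand-in for the scraped reviews (made-up text, assumed data frame name)
reviews <- tibble(
  reviewer_description = c(
    "The campus police make me feel safe.",
    "Professors are helpful and the campus is safe.",
    "The food is good but the dorms are old."
  )
)

# One row per word, lower-cased, with stop words removed, then counted
word_counts <- reviews %>%
  unnest_tokens(word, reviewer_description) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

# Horizontal bar chart of the ten most frequent words
top_words_plot <- word_counts %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Count", y = NULL)

word_counts
```

Printing `top_words_plot` draws the chart. Note that `unnest_tokens()` lower-cases words and strips punctuation by default, so "Campus" and "campus" are counted together.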
Now, I want to see what emotions people are using when they write reviews about Xavier.
I will use the NRC lexicon and graph the emotions associated with the most frequently used words in the reviews.
In general, the sentiment for Xavier University seems to be pretty positive. I was a bit concerned to see the word "emergency" there, but a single word can be misleading without its surrounding context.
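Here is a rough sketch of how word counts can be matched to emotions. To keep it self-contained, it uses a tiny hand-made lexicon, `nrc_like`, whose word-emotion pairings are illustrative only and not copied from NRC; the word counts are toy values as well. In a real analysis the lexicon would come from `get_sentiments("nrc")`, which relies on the textdata package to download the NRC data.

```r
library(dplyr)
library(tibble)

# Toy word counts, standing in for the output of the tokenizing step
word_counts <- tribble(
  ~word,       ~n,
  "safe",      10,
  "police",     8,
  "emergency",  7,
  "helpful",    4
)

# Hand-made stand-in for an emotion lexicon: one row per word-emotion pair.
# These pairings are assumptions for illustration, not the real NRC entries.
nrc_like <- tribble(
  ~word,       ~sentiment,
  "safe",      "trust",
  "safe",      "positive",
  "police",    "trust",
  "police",    "fear",
  "emergency", "fear",
  "emergency", "negative"
)

# Keep only words found in the lexicon; a word can map to several emotions
emotion_words <- word_counts %>%
  inner_join(nrc_like, by = "word")

# Total word uses per emotion
emotion_totals <- emotion_words %>%
  group_by(sentiment) %>%
  summarise(total = sum(n)) %>%
  arrange(desc(total))

emotion_totals
```

Because the join is an inner join, words missing from the lexicon (here "helpful") drop out, which is worth remembering when reading the resulting chart.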
Next, I want to find the most negative words used in reviews about Xavier. I will use the AFINN lexicon, which assigns each word a numeric score, to quantify how negative the words in the reviews are. I will make a table of these words.
## # A tibble: 7 × 3
##   word          n value
##   <chr>     <int> <dbl>
## 1 annoying      3    -2
## 2 bad           6    -3
## 3 boring        4    -3
## 4 crime         5    -3
## 5 emergency     7    -2
## 6 sick          3    -2
## 7 stuck         3    -2
I found that the most frequently used negative word is "emergency" (7 uses, score -2), while "bad", "boring", and "crime" carry the most negative score (-3). As mentioned above, "emergency" may look worse than it is because the lexicon ignores context. Moreover, words like "stuck" and "bad" appear, which is a bit concerning for Xavier.
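One way to combine frequency and score is to multiply them, so that a word used often with a strongly negative score ranks first. The sketch below does this for the seven rows of the table above; the `contribution` column and the `negative_words` name are my own additions for illustration. In the full analysis the `value` column would come from joining the word counts with `get_sentiments("afinn")`, which needs the textdata package.

```r
library(dplyr)
library(tibble)

# The seven negative words from the table above
negative_words <- tribble(
  ~word,       ~n, ~value,
  "annoying",   3, -2,
  "bad",        6, -3,
  "boring",     4, -3,
  "crime",      5, -3,
  "emergency",  7, -2,
  "sick",       3, -2,
  "stuck",      3, -2
)

# Weight each word's score by how often it appears (assumed metric),
# then sort so the most negative contribution comes first
ranked <- negative_words %>%
  mutate(contribution = n * value) %>%
  arrange(contribution)

ranked
```

By this measure "bad" contributes the most negativity (6 × -3 = -18), ahead of "crime" (-15) and "emergency" (-14).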
Now, I want to see if there is a month where reviews were more positive or more negative. To answer this, I will use the Bing lexicon, which labels words as positive or negative, and graph the count of positive and negative words by month.
Reviews appear to be the most positive in January and they appear to be
the most negative in March. I wonder if this is because Xavier did not
perform well during March Madness.
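For reference, a monthly breakdown like this can be sketched as follows. The `review_date` column, the `reviews` data frame name, and the three reviews are assumptions made up for the example; `get_sentiments("bing")` is the Bing lexicon bundled with tidytext.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Toy stand-in for the scraped reviews.
# `review_date` is an assumed column name and the rows are made up.
reviews <- tibble(
  review_date = as.Date(c("2015-01-10", "2015-01-22", "2015-03-05")),
  reviewer_description = c(
    "I love the friendly professors.",
    "Great campus and amazing people.",
    "The dorms are terrible and the food is awful."
  )
)

monthly_sentiment <- reviews %>%
  # Month name as an ordered factor so January sorts before March
  mutate(month = factor(
    month.name[as.integer(format(review_date, "%m"))],
    levels = month.name
  )) %>%
  unnest_tokens(word, reviewer_description) %>%
  # Keep only words the Bing lexicon labels positive or negative
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(month, sentiment)

monthly_sentiment
```

Raw counts favor months with more reviews, so dividing by the number of reviews per month might give a fairer comparison than the counts alone.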
In general, it seems that the sentiment toward Xavier University is positive. To improve this analysis, I would recommend collecting more reviews and comparing Xavier to another school, such as Butler University, to see how it matches up.