In this report I perform a text analysis of the publicly available reviews posted on Trip Advisor for the Jury’s Inn hotel in Nottingham, GB. Using R, I scraped the partial entries of 3052 reviews from Trip Advisor’s website and analysed them. Below are some of the key findings.
The hotel in question is Jury’s Inn, located in Nottingham, GB. According to Trip Advisor, it offers the second best value out of 90 places to stay in Nottingham. The hotel offers free Wi-Fi and has a restaurant. It is a popular mid-range hotel, with more than 3000 reviews posted on Trip Advisor. Such a volume of unstructured text is difficult to obtain in a tidy format. In this project I scraped the reviews using SelectorGadget and the rvest package. The starting point is the URL of the hotel’s review page on Trip Advisor, which appears at the top of the code below.
Using the SelectorGadget extension for Chrome, I obtained the CSS selector for the partial entries of the reviews. With the rvest package in R, I then scraped the text contained in the corresponding HTML tags on this page, which holds only 5 reviews.
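```r
library(rvest)  # read_html(), html_nodes(), html_text() and the %>% pipe

# First review page for the hotel
url <- 'https://www.tripadvisor.in/Hotel_Review-g186356-d574675-Reviews-Jurys_Inn_Nottingham-Nottingham_Nottinghamshire_England.html'
html <- read_html(url)

# Text of the tab counters; the first one is used as the review count below
review_count <- html %>%
  html_nodes('.hotels-hotel-review-community-content-TabBar__tabCount--37DbH') %>%
  html_text()

# Partial review texts shown on the first page
reviews <- html %>%
  html_nodes('.common-text-ReadMore__content--2ufdh') %>%
  html_text()

# Drop the thousands separator, then build the page offsets 5, 10, 15, ...
review_count <- as.numeric(gsub(',', '', review_count[1]))
num <- seq(5, review_count, 5)
```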
The next challenge was to scrape all the reviews programmatically and reproducibly, without opening each URL by hand. After examining the URLs of several review pages, I identified a pattern: the number following the characters “OR” in the URL increases by 5 each time we move from page n to page n+1. Using this information I was able to write code that loops over all pages and scrapes all available reviews.
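The loop over pages could look roughly like the sketch below. It is not the original code: the helper names `build_page_urls` and `scrape_page` are hypothetical, and the exact position and casing of the offset token in the URL (here `-Reviews-or<offset>-`) is an assumption that should be checked against the live site. The one-second pause between requests is also my own addition.

```r
# Hypothetical helper: build one URL per page offset (5, 10, 15, ...).
# ASSUMPTION: the offset token sits right after "-Reviews-" as "or<offset>-".
build_page_urls <- function(base_url, offsets) {
  vapply(offsets, function(o) {
    token <- paste0("-Reviews-or", as.integer(o), "-")
    sub("-Reviews-", token, base_url, fixed = TRUE)
  }, character(1))
}

# Hypothetical helper: download one page and return its partial review texts.
# The CSS selector is the one used for the first page.
scrape_page <- function(page_url,
                        css = ".common-text-ReadMore__content--2ufdh") {
  Sys.sleep(1)  # polite pause between requests (an assumption, not in the source)
  page <- rvest::read_html(page_url)
  rvest::html_text(rvest::html_nodes(page, css))
}

# Usage, given `url`, `num` and `reviews` from the first page:
# page_urls   <- build_page_urls(url, num)
# all_reviews <- c(reviews, unlist(lapply(page_urls, scrape_page)))
```

Keeping the URL construction separate from the download step makes it possible to check the generated URLs before sending any requests.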
The scraped text also included the hotel staff’s responses to customer feedback. These had to be removed so that only the customers’ perception of the hotel remained. Once this was done, we were left with the 3052 customer reviews, ready to be analysed. A sample is shown below.
| | reviews |
|---|---|
| 1 | Great hotel, dated bathroom so that was a bit gross, but the rooms and beds were great. The view is great, No parking on site and the closest car park is really difficult to get to. However, everything in Nottingham is really walkable and this hotel is in a good location to walk… |
| 2 | I have stayed there twice myself. This time i booked it for myself and my husband. Rooms are very clean and quiet. Staff are very friendly. Amazing location and great price. Highly recommend to anyone |
| 3 | The hotel is just along from the train station so was a perfect location as my friend had travelled from Devon. The receptionist was lovely as were the breakfast staff. We were surprised with a bottle of Prosecco on ice in our room and a lovely message which was totally… |
| 4 | Great location, great rooms, very friendly staff. There was a mix up with our rooms but this was quickly rectified and we were even brought a bottle of Prosecco as an apology. The prices are really good for a hotel this nice! Thankyou ! |
| 5 | Well impressed, hotel was very clean, quiet, well maintained, breakfast was everything you want, staff very friendly. Close to the centre, no car park tho. 10 m8nute walk to 2 car parks, that would be the only negative thing |
| 6 | Location is superb, being right next to the train and tram station. Most hotel employees were friendly and helpful. Sleep wasn’t amazing as there were people around my room deciding to have a party till the wee hours in the morning. Room came short of a curtain but the… |
The first analysis I performed using NLP techniques was to break the text into n-gram tokens with n ranging from 1 to 2, that is, to split each review into single words (unigrams, n=1) and two-word sequences (bigrams, n=2). Before doing so, I removed words that hold little relevance to the analysis (stop words), such as the auxiliary verbs ‘is’ and ‘are’.
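To make the tokenisation step concrete, here is a minimal base R sketch. The report does not say which tokeniser or stop-word list was used, so the `tokenise` and `ngrams` helpers and the tiny `stop_words` vector are illustrative assumptions only.

```r
# ASSUMPTION: a tiny illustrative stop-word list; a real analysis would use a
# fuller list from a text-mining package.
stop_words <- c("is", "are", "was", "were", "the", "a", "an", "and", "to", "of", "in")

# Lower-case the text, split on anything that is not a letter or apostrophe,
# and drop empty strings and stop words.
tokenise <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]
  words[!words %in% stop_words]
}

# Slide a window of width n over the word vector to build n-grams.
ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

review   <- "Rooms are very clean and quiet. Staff are very friendly."
unigrams <- ngrams(tokenise(review), 1)
bigrams  <- ngrams(tokenise(review), 2)
```

Because stop words are removed before the bigrams are formed, as described above, a bigram such as "clean quiet" can join two words that were separated by "and" in the original sentence.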
Once the reviews had been broken into tokens, I aggregated the tokens to count how many times each occurs in the dataset. I then used the R package ‘udpipe’ for more advanced text analysis. This package provides pre-trained language models, and in this project I use it to annotate the n-grams as nouns, adjectives, verbs or adverbs. With these annotations we can start generating graphics that represent the customers’ perception of the Jury’s Inn.
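The counting and annotation steps might be sketched as follows. The helper names `count_tokens` and `annotate_pos` are hypothetical, and the choice of the English udpipe model is an assumption, since the report does not name the model. Calling `annotate_pos` downloads a model file, so it is defined here but not run.

```r
# Count how often each token occurs, most frequent first.
count_tokens <- function(tokens) {
  counts <- sort(table(tokens), decreasing = TRUE)
  data.frame(token = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper around udpipe: tag each token with its part of speech
# and keep only nouns, adjectives, verbs and adverbs.
# ASSUMPTION: the pre-trained "english" model is the appropriate one.
annotate_pos <- function(texts) {
  dl    <- udpipe::udpipe_download_model(language = "english")
  model <- udpipe::udpipe_load_model(file = dl$file_model)
  ann   <- as.data.frame(udpipe::udpipe_annotate(model, x = texts))
  keep  <- ann$upos %in% c("NOUN", "ADJ", "VERB", "ADV")
  ann[keep, c("doc_id", "token", "lemma", "upos")]
}

# Toy input, not taken from the scraped data
token_counts <- count_tokens(c("clean", "staff", "clean", "friendly"))
```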
When it comes to hotel reviews, a good way to represent customer perception is to look at which adjectives people commonly use in their reviews. For bigrams, we can see which adjective-containing bigrams are the most common.
By looking at commonly occurring bigrams that contain a keyword and an adjective, we can extract the customers’ perception of that aspect of the hotel. Here we look at what people think about the hotel in general, the staff, the breakfast, the bedrooms and the bathrooms. Many other keywords could be used to gather information about other aspects of the hotel.
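As an illustration of the keyword-plus-adjective filter, the sketch below selects bigrams in which one word is the keyword and the other is a known adjective. The function name `keyword_adjectives` is hypothetical, and the counts in the toy data frame are made up for the example; they are not results from the scraped reviews.

```r
# Keep bigrams made of `keyword` plus one adjective, most frequent first.
# `bigram_counts` needs a `token` column ("word1 word2") and a count column `n`;
# `adjectives` would come from the part-of-speech annotation.
keyword_adjectives <- function(bigram_counts, keyword, adjectives) {
  parts  <- strsplit(bigram_counts$token, " ", fixed = TRUE)
  first  <- vapply(parts, `[`, character(1), 1)
  second <- vapply(parts, `[`, character(1), 2)
  hit <- (first %in% adjectives & second %in% keyword) |
         (first %in% keyword & second %in% adjectives)
  out <- bigram_counts[hit, , drop = FALSE]
  out[order(-out$n), , drop = FALSE]
}

# Toy data for illustration only (invented counts)
toy_bigrams <- data.frame(
  token = c("friendly staff", "train station", "helpful staff", "staff friendly"),
  n     = c(12L, 9L, 7L, 3L),
  stringsAsFactors = FALSE
)
staff_words <- keyword_adjectives(toy_bigrams, "staff", c("friendly", "helpful"))
```

The same call with a different keyword, for example "breakfast" or "bathroom", would give the corresponding view for another aspect of the hotel.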
As seen above the most common words used to describe the staff at the Jury’s Inn are friendly, helpful, clean and welcoming. This is great feedback for the staff and will motivate them to keep up the good work that they have been doing.
As seen above the most common words used to describe the bedrooms at the Jury’s Inn are clean, spotless, and comfortable. This is great feedback for the cleaning staff of the hotel and will motivate them to keep up the good work that they have been doing.
As seen above the most common words used to describe the bathrooms at the Jury’s Inn are nice, spotless, and spacious. This is great feedback for the cleaning staff of the hotel and will motivate them to keep up the good work that they have been doing.
As seen above the most common words used to describe the Jury’s Inn are nice, lovely, clean, modern and excellent. This is great feedback for the hotel management and will come in handy in marketing.
We can also score the sentiment of the reviews, rating each one for 6 different emotions. By averaging each of these 6 emotions over all reviews, we can understand how customers perceive the hotel in terms of each emotion.
On average, customers feel trust, joy and anticipation towards the Jury’s Inn, which is a great achievement for the hotel and its staff.
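The averaging of emotion scores described above can be sketched in base R as follows. The report does not name the emotion lexicon or package that was used, so the small `lexicon` table and the `emotion_scores` helper are purely illustrative assumptions, and the two example texts are toy inputs rather than scraped reviews.

```r
# ASSUMPTION: a tiny hand-made word-to-emotion lexicon for illustration only.
lexicon <- data.frame(
  word    = c("friendly", "lovely", "great", "recommend", "surprised", "gross"),
  emotion = c("trust", "joy", "joy", "trust", "surprise", "disgust"),
  stringsAsFactors = FALSE
)

# Return a reviews-by-emotions matrix: for each review, the number of words
# that the lexicon associates with each emotion.
emotion_scores <- function(texts, lexicon) {
  emotions <- sort(unique(lexicon$emotion))
  scores <- t(vapply(texts, function(txt) {
    words <- unlist(strsplit(tolower(txt), "[^a-z]+"))
    idx   <- unlist(lapply(words, function(w) which(lexicon$word == w)))
    as.numeric(table(factor(lexicon$emotion[idx], levels = emotions)))
  }, numeric(length(emotions)), USE.NAMES = FALSE))
  colnames(scores) <- emotions
  scores
}

toy_texts   <- c("Great hotel, friendly staff", "Lovely room but gross bathroom")
scores      <- emotion_scores(toy_texts, lexicon)
avg_emotion <- colMeans(scores)  # average score per emotion across reviews
```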