Executive summary

There is power in a “review.” In today’s digital era, many consumers check reviews before making purchasing decisions. Hearing directly from real users—rather than relying solely on corporate messaging—has significantly changed consumption behavior. Reviews reflect personal experiences, both positive and negative, and benefit not only consumers but also businesses. With the rise of accommodation booking platforms, travelers now have the opportunity to read and write reviews about their stays. Since accommodations often represent a carefully chosen part of one’s limited and valuable travel time, people tend to make more deliberate decisions—making reviews even more influential.

This report analyzes customer reviews of luxury hotels, focusing on the purpose of travel. The primary objective is to explore whether there are significant textual differences between reviews written by ‘business travelers’ and those written by ‘leisure travelers’.

Understanding these differences can benefit all stakeholders. Hotels can gain insight into what different types of guests value most. Travelers can use such insights to better select accommodations aligned with their travel purpose. Moreover, booking platforms can enhance their recommendation algorithms by incorporating travel-purpose-based keyword analysis, ultimately offering more tailored suggestions to users.

The analysis addresses two key questions: Do review scores and sentiment tendencies differ between business and leisure travelers? What specific words or expressions are frequently mentioned in each group’s reviews?

I hypothesized that reviews from business travelers would contain a higher proportion of negative sentiments, while reviews from leisure travelers would be more positive overall. I also expected that business trip reviews would frequently mention practical aspects such as Wi-Fi and bedding, whereas leisure trip reviews would highlight amenities like pools, bars, and other recreational facilities.

After conducting the analysis, the results revealed several notable differences between leisure and business trip reviewers. There are slight but observable differences between the two groups in both review scores and sentiment tendencies. Business travelers tend to give slightly lower scores and express more negative sentiments, while leisure travelers show more positive emotions such as joy and anticipation. These differences, though subtle, suggest that leisure trips are generally perceived more favorably.

Keyword analysis indicates that while both groups share many common terms, their word choices reflect distinct priorities: leisure travelers often employ emotionally rich and celebratory language, whereas business travelers focus on practicality and work-related contexts.

Bigram analysis further shows that leisure reviews emphasize emotional and experiential expressions, while business reviews highlight functional and convenience-oriented phrases. This suggests structural differences in how each group describes their stay.

Data background

The dataset was scraped from Booking.com and contains approximately 515,000 customer reviews and scores for 1,493 luxury hotels across Europe. The data was collected over a two-year period, from August 4, 2015 to August 3, 2017.

The CSV file includes 17 fields, out of which five variables were selected for the analysis. A brief description of each selected field is provided below:
- Hotel_Name: The name of the hotel.
- Negative_Review: The negative review provided by the reviewer. If there is no negative review, the field contains ‘No Negative’.
- Positive_Review: The positive review provided by the reviewer. If there is no positive review, the field contains ‘No Positive’.
- Reviewer_Score: The score assigned to the hotel by the reviewer, based on their overall experience.
- Tags: Descriptive tags selected by the reviewer, often indicating the purpose of stay, type of trip, or room information.

The remaining variables, which are not used in this analysis, are described below:
- Hotel_Address: Address of the hotel.
- Review_Date: Date when the reviewer posted the corresponding review.
- Average_Score: Average score of the hotel, calculated based on the latest comment in the last year.
- Reviewer_Nationality: Nationality of the reviewer.
- Review_Total_Negative_Word_Counts: Total number of words in the negative review.
- Review_Total_Positive_Word_Counts: Total number of words in the positive review.
- Total_Number_of_Reviews_Reviewer_Has_Given: Number of reviews the reviewer has given in the past.
- Total_Number_of_Reviews: Total number of valid reviews the hotel has.
- days_since_review: Duration between the review date and the scrape date.
- Additional_Number_of_Scoring: Some guests submitted a score without writing a review. This number indicates how many such valid scores the hotel has.
- lat: Latitude of the hotel.
- lng: Longitude of the hotel.

Data loading, cleaning and preprocessing

First, load the Hotel_Reviews.csv dataset into the ‘df_hotel’ variable.

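library(tidyverse) # dplyr, stringr and ggplot2 are used in the code below
library(tidytext)  # unnest_tokens(), stop_words, get_sentiments()
df_hotel <- read.csv("Hotel_Reviews.csv")
head(df_hotel)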

##                                               Hotel_Address
## 1  s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 2  s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 3  s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 4  s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 5  s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 6  s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
##   Additional_Number_of_Scoring Review_Date Average_Score  Hotel_Name
## 1                          194    8/3/2017           7.7 Hotel Arena
## 2                          194    8/3/2017           7.7 Hotel Arena
## 3                          194   7/31/2017           7.7 Hotel Arena
## 4                          194   7/31/2017           7.7 Hotel Arena
## 5                          194   7/24/2017           7.7 Hotel Arena
## 6                          194   7/24/2017           7.7 Hotel Arena
##   Reviewer_Nationality
## 1              Russia 
## 2             Ireland 
## 3           Australia 
## 4      United Kingdom 
## 5         New Zealand 
## 6              Poland 
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Negative_Review
## 1  I am so angry that i made this post available via all possible sites i use when planing my trips so no one will make the mistake of booking this place I made my booking via booking com We stayed for 6 nights in this hotel from 11 to 17 July Upon arrival we were placed in a small room on the 2nd floor of the hotel It turned out that this was not the room we booked I had specially reserved the 2 level duplex room so that we would have a big windows and high ceilings The room itself was ok if you don t mind the broken window that can not be closed hello rain and a mini fridge that contained some sort of a bio weapon at least i guessed so by the smell of it I intimately asked to change the room and after explaining 2 times that i booked a duplex btw it costs the same as a simple double but got way more volume due to the high ceiling was offered a room but only the next day SO i had to check out the next day before 11 o clock in order to get the room i waned to Not the best way to begin your holiday So we had to wait till 13 00 in order to check in my new room what a wonderful waist of my time The room 023 i got was just as i wanted to peaceful internal garden view big window We were tired from waiting the room so we placed our belongings and rushed to the city In the evening it turned out that there was a constant noise in the room i guess it was made by vibrating vent tubes or something it was constant and annoying as hell AND it did not stop even at 2 am making it hard to fall asleep for me and my wife I have an audio recording that i can not attach here but if you want i can send it via e mail The next day the technician came but was not able to determine the cause of the disturbing sound so i was offered to change the room once again the hotel was fully booked and they had only 1 room left the one that was smaller but seems newer 
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             No Negative
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Rooms are nice but for elderly a bit difficult as most rooms are two story with narrow steps So ask for single level Inside the rooms are very very basic just tea coffee and boiler and no bar empty fridge 
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   My room was dirty and I was afraid to walk barefoot on the floor which looked as if it was not cleaned in weeks White furniture which looked nice in pictures was dirty too and the door looked like it was attacked by an angry dog My shower drain was clogged and the staff did not respond to my request to clean it On a day with heavy rainfall a pretty common occurrence in Amsterdam the roof in my room was leaking luckily not on the bed you could also see signs of earlier water damage I also saw insects running on the floor Overall the second floor of the property looked dirty and badly kept On top of all of this a repairman who came to fix something in a room next door at midnight was very noisy as were many of the guests I understand the challenges of running a hotel in an old building but this negligence is inconsistent with prices demanded by the hotel On the last night after I complained about water damage the night shift manager offered to move me to a different room but that offer came pretty late around midnight when I was already in bed and ready to sleep 
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   You When I booked with your company on line you showed me pictures of a room I thought I was getting and paying for and then when we arrived that s room was booked and the staff told me we could only book the villa suite theough them directly Which was completely false advertising After being there we realised that you have grouped lots of rooms on the photos together leaving me the consumer confused and extreamly disgruntled especially as its my my wife s 40th birthday present Please make your website more clear through pricing and photos as again I didn t really know what I was paying for and how much it had wnded up being Your photos told me I was getting something I wasn t Not happy and won t be using you again 
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Backyard of the hotel is total mess shouldn t happen in hotel with 4 stars 
##   Review_Total_Negative_Word_Counts Total_Number_of_Reviews
## 1                               397                    1403
## 2                                 0                    1403
## 3                                42                    1403
## 4                               210                    1403
## 5                               140                    1403
## 6                                17                    1403
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Positive_Review
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Only the park outside of the hotel was beautiful 
## 2  No real complaints the hotel was great great location surroundings rooms amenities and service Two recommendations however firstly the staff upon check in are very confusing regarding deposit payments and the staff offer you upon checkout to refund your original payment and you can make a new one Bit confusing Secondly the on site restaurant is a bit lacking very well thought out and excellent quality food for anyone of a vegetarian or vegan background but even a wrap or toasted sandwich option would be great Aside from those minor minor things fantastic spot and will be back when i return to Amsterdam 
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Location was good and staff were ok It is cute hotel the breakfast range is nice Will go back 
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Great location in nice surroundings the bar and restaurant are nice and have a lovely outdoor area The building also has quite some character 
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Amazing location and building Romantic setting 
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Good restaurant with modern design great chill out place Great park nearby the hotel and awesome main stairs 
##   Review_Total_Positive_Word_Counts Total_Number_of_Reviews_Reviewer_Has_Given
## 1                                11                                          7
## 2                               105                                          7
## 3                                21                                          9
## 4                                26                                          1
## 5                                 8                                          3
## 6                                20                                          1
##   Reviewer_Score
## 1            2.9
## 2            7.5
## 3            7.1
## 4            3.8
## 5            6.7
## 6            6.7
##                                                                                                                                  Tags
## 1                                                         [' Leisure trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']
## 2                                                         [' Leisure trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 4 nights ']
## 3 [' Leisure trip ', ' Family with young children ', ' Duplex Double Room ', ' Stayed 3 nights ', ' Submitted from a mobile device ']
## 4                                                  [' Leisure trip ', ' Solo traveler ', ' Duplex Double Room ', ' Stayed 3 nights ']
## 5                                  [' Leisure trip ', ' Couple ', ' Suite ', ' Stayed 2 nights ', ' Submitted from a mobile device ']
## 6                                                           [' Leisure trip ', ' Group ', ' Duplex Double Room ', ' Stayed 1 night ']
##   days_since_review      lat      lng
## 1            0 days 52.36058 4.915968
## 2            0 days 52.36058 4.915968
## 3            3 days 52.36058 4.915968
## 4            3 days 52.36058 4.915968
## 5           10 days 52.36058 4.915968
## 6           10 days 52.36058 4.915968

Upon examining the Tags column in the df_hotel data frame, it was found that each entry contains multiple pieces of information related to the trip, such as: [‘Leisure trip’, ‘Couple’, ‘Duplex Double Room’, ‘Stayed 6 nights’]

Among these, the purpose of the trip—whether it was for leisure, business, or unspecified—is embedded within the string, which resembles a list but is actually stored as a single text value.

For this analysis, only the trip purpose was extracted and stored in a new column called purpose, with three possible values: leisure, business, or NA (purpose unknown).

As a result of this classification:
- 82,939 reviews were tagged as business trips,
- 417,778 reviews were tagged as leisure trips,
- 15,021 reviews did not specify a purpose and were left as unknown (NA).

df_hotel <- df_hotel %>%
  mutate(purpose = case_when(
    str_detect(Tags, "Leisure trip") ~ "leisure",   # checked first, as before
    str_detect(Tags, "Business trip") ~ "business",
    TRUE ~ NA_character_                            # no trip-purpose tag
  ))
df_hotel %>%
  count(purpose)

##    purpose      n
## 1 business  82939
## 2  leisure 417778
## 3     <NA>  15021
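
The matching logic can be checked on a few hand-made tag strings before running it on the full data. The sketch below is illustrative only: the toy_tags data frame is invented for this example, and it assumes the dplyr and stringr packages are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical sample mimicking the format of the Tags column
toy_tags <- data.frame(
  Tags = c(
    "[' Leisure trip ', ' Couple ', ' Stayed 2 nights ']",
    "[' Business trip ', ' Solo traveler ', ' Stayed 1 night ']",
    "[' Couple ', ' Suite ', ' Stayed 3 nights ']"
  )
)

toy_tags <- toy_tags %>%
  mutate(purpose = case_when(
    str_detect(Tags, "Leisure trip") ~ "leisure",
    str_detect(Tags, "Business trip") ~ "business",
    TRUE ~ NA_character_  # no trip-purpose tag present
  ))

toy_tags$purpose
```

The third string carries no trip-purpose tag, so it falls through to NA, which is how the unknown entries arise in the full data.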

The 15,021 entries with an unknown trip purpose were excluded from the analysis. Although the number of leisure reviews was about five times greater than that of business reviews, the business category still contained a substantial number of entries. Therefore, to ensure a fair comparison between the two groups, the analysis focused on examining the relative proportions within each category, rather than relying on raw counts.
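
To illustrate why within-group proportions are preferable to raw counts when group sizes differ, here is a minimal sketch on invented numbers. The toy_counts data frame and its rating column are hypothetical and do not come from the dataset.

```r
library(dplyr)

# Hypothetical counts: one large group and one small group
toy_counts <- data.frame(
  purpose = c("leisure", "leisure", "business", "business"),
  rating  = c("high", "low", "high", "low"),
  n       = c(800, 200, 60, 40)
)

toy_share <- toy_counts %>%
  group_by(purpose) %>%
  mutate(share = n / sum(n) * 100) %>%  # percentage within each purpose
  ungroup()

toy_share
```

In this toy case the raw counts show more low ratings for leisure (200 versus 40), while the within-group shares show the opposite (20% versus 40%).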

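From the df_hotel data frame, rows without a trip purpose were dropped, and the columns needed for the analysis (the five fields described above, plus Average_Score and the new purpose column) were stored in a new variable called “total_review”.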

total_review <- df_hotel %>% 
  filter(!is.na(purpose)) %>% 
  select(Average_Score, Hotel_Name, Negative_Review, Positive_Review, Reviewer_Score, Tags, purpose)
  

total_review %>% 
  count(purpose)

##    purpose      n
## 1 business  82939
## 2  leisure 417778

summary(total_review)

##  Average_Score    Hotel_Name        Negative_Review    Positive_Review   
##  Min.   :5.200   Length:500717      Length:500717      Length:500717     
##  1st Qu.:8.100   Class :character   Class :character   Class :character  
##  Median :8.400   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :8.401                                                           
##  3rd Qu.:8.800                                                           
##  Max.   :9.800                                                           
##  Reviewer_Score       Tags             purpose         
##  Min.   : 2.500   Length:500717      Length:500717     
##  1st Qu.: 7.500   Class :character   Class :character  
##  Median : 8.800   Mode  :character   Mode  :character  
##  Mean   : 8.403                                        
##  3rd Qu.: 9.600                                        
##  Max.   :10.000

Text data analysis

In this dataset, the positive and negative aspects of each review have already been separated and stored in two distinct columns (“Negative_Review” and “Positive_Review”). However, since the reasons for positive or negative feedback can vary depending on the specific circumstances of each hotel, this analysis does not treat them separately. Instead, it combines both columns to examine the most frequently mentioned keywords. These commonly mentioned terms are likely to reflect the factors that guests consider most important when choosing or evaluating a hotel.

To facilitate this analysis, the tokenized words from both positive and negative reviews are stored in a new variable named total_review_word, where each tokenized word is held in the column word, along with the associated travel purpose.

negative_review_word <- total_review %>% 
  select(Negative_Review, purpose) %>% 
  unnest_tokens(word, Negative_Review) %>% # Tokenize the text into words
  anti_join(stop_words) # Remove stop words

## Joining with `by = join_by(word)`

head(negative_review_word)

##   purpose    word
## 1 leisure   angry
## 2 leisure    post
## 3 leisure   sites
## 4 leisure planing
## 5 leisure   trips
## 6 leisure mistake

positive_review_word <- total_review %>% 
  select(Positive_Review, purpose) %>% 
  unnest_tokens(word, Positive_Review) %>% # Tokenize the text into words
  anti_join(stop_words) # Remove stop words

## Joining with `by = join_by(word)`

head(positive_review_word)

##   purpose       word
## 1 leisure       park
## 2 leisure      hotel
## 3 leisure  beautiful
## 4 leisure       real
## 5 leisure complaints
## 6 leisure      hotel

total_review_word <-
  bind_rows(positive_review_word, negative_review_word)
head(total_review_word)

##   purpose       word
## 1 leisure       park
## 2 leisure      hotel
## 3 leisure  beautiful
## 4 leisure       real
## 5 leisure complaints
## 6 leisure      hotel
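
Rows without a comment hold the placeholder text ‘No Negative’ or ‘No Positive’, so their words can end up among the tokens. One possible way to keep them out, shown here only as a sketch on invented data, is to drop placeholder rows before tokenizing. The toy_review data frame is hypothetical, exact matching of the placeholder string is an assumption, and the code above does not include this step.

```r
library(dplyr)
library(tidytext)

# Hypothetical mini data set in the same shape as total_review
toy_review <- data.frame(
  purpose = c("leisure", "business"),
  Negative_Review = c("No Negative", "Slow wifi in the room"),
  stringsAsFactors = FALSE
)

toy_words <- toy_review %>%
  filter(Negative_Review != "No Negative") %>%  # drop placeholder rows first
  unnest_tokens(word, Negative_Review) %>%      # tokenize into lower-case words
  anti_join(stop_words, by = "word")            # remove stop words

toy_words
```

The same filter, with ‘No Positive’, would apply to the Positive_Review column.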

Individual analysis and figures

The analysis addresses two key questions: Do review scores and sentiment tendencies differ between business and leisure travelers? -> Analysis and Figure 1

What specific words or expressions are frequently mentioned in each group’s reviews? -> Analysis and Figures 2 and 3

Analysis and Figure 1: Review Scores & Sentiment Distribution

To understand how review scores differ based on the travel purpose, this plot visualizes the distribution of “Reviewer_Score” for both business and leisure travelers. Since the review score itself can be considered part of the overall review, it is important to examine its distribution before conducting text analysis.

From the resulting graph, we observe that:
- Both types of travelers tend to give high scores overall (scores are concentrated between 7 and 10).
- Business travelers have a relatively higher proportion of scores below 8.
- Leisure travelers are more likely to give scores above 8.

This suggests that business travelers may be slightly more critical in their evaluations compared to leisure travelers.
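
The comparison of scores below and above 8 can also be expressed as a number per group. The following is a minimal sketch on made-up scores: toy_scores is hypothetical, and the threshold of 8 simply mirrors the observation above.

```r
library(dplyr)

# Hypothetical scores, not taken from the data set
toy_scores <- data.frame(
  purpose = c(rep("business", 4), rep("leisure", 4)),
  Reviewer_Score = c(6.5, 7.5, 8.8, 9.6, 7.1, 8.3, 9.2, 10)
)

toy_summary <- toy_scores %>%
  group_by(purpose) %>%
  summarise(
    mean_score  = mean(Reviewer_Score),
    pct_below_8 = mean(Reviewer_Score < 8) * 100  # share of scores under 8
  )

toy_summary
```

Applied to total_review, the same summary would give one row per purpose with the mean score and the share of scores under 8.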

total_review %>% 
  count(purpose, Reviewer_Score) %>% #Counting the number of reviews per score and purpose
  left_join(total_review %>% 
              count(purpose, name = "total"), by = "purpose") %>% #Calculating the total number of reviews per purpose
  mutate(ratio = n / total * 100) %>%  #computing the ratio to obtain the percentage
  ggplot(aes(x = Reviewer_Score, y = ratio, group = purpose, color = purpose)) +
  geom_line(linewidth = 1, show.legend = TRUE) + # linewidth replaces the deprecated size argument for lines
  labs(
    x = "Reviewer Score",
    y = "Percentage of Reviews",
    title = "1-1. Distribution of Reviewer Scores by Purpose",
    color = "Purpose" 
  ) +
  scale_color_manual(
    values = c("leisure" = "tomato", "business" = "steelblue") #custom colors assigned to each purpose
  ) +
  theme_minimal()
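The visual impression that business travelers give more scores below 8 can also be checked numerically. The following is a minimal sketch in base R on made-up data; `toy_review` and its values are hypothetical, and the real check would use `total_review` instead.

```r
# Hypothetical toy data; the real analysis would use total_review instead.
toy_review <- data.frame(
  purpose = c("business", "business", "business", "business",
              "leisure", "leisure", "leisure", "leisure"),
  Reviewer_Score = c(6.5, 7.9, 8.3, 9.6, 7.5, 8.8, 9.2, 10)
)

# Share (%) of reviews scoring below 8 within each purpose
share_below_8 <- tapply(
  toy_review$Reviewer_Score < 8,   # TRUE where the score is below 8
  toy_review$purpose,              # grouping variable
  function(x) mean(x) * 100        # mean of TRUE/FALSE gives the proportion
)
print(share_below_8)
```

Comparing the two percentages gives a single number per group, which is easier to cite than a reading taken from the line chart.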

To complement the analysis of numerical review scores, sentiment analysis was conducted on the textual content of reviews to uncover underlying emotional patterns across different travel purposes.

The sentiment distribution in reviews reveals only subtle differences between business and leisure travelers:

Leisure travelers show slightly higher proportions of joy (15.3%), positive (29%), and anticipation (9.9%), suggesting more emotional engagement and enthusiasm.

Business travelers display a marginally higher share of negative sentiment (12.0%), which may imply a more critical or evaluative attitude.

Other categories such as trust, sadness, and anger show minimal variation between the two groups.

These results indicate that while overall sentiment patterns are similar, leisure travelers tend to express slightly more positive emotions, possibly due to being on vacation, while business travelers may be more pragmatic or critical in tone.

# Load the NRC sentiment lexicon
# Each row pairs a word with one of the sentiment categories listed below
nrc <- get_sentiments("nrc")
table(nrc$sentiment) # Check available sentiment categories

## 
##        anger anticipation      disgust         fear          joy     negative 
##         1245          837         1056         1474          687         3316 
##     positive      sadness     surprise        trust 
##         2308         1187          532         1230

# Count the occurrence of each sentiment per travel purpose
# A word can belong to several NRC categories and appears many times in the
# reviews, so the join is many-to-many by design; stating this explicitly
# avoids the join warning.
sentiment_count <- total_review_word %>%
  group_by(purpose) %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%  # Match words with NRC sentiment categories
  count(sentiment) %>% 
  mutate(total = sum(n)) %>% 
  mutate(ratio = n/total*100) # Convert counts to percentages

head(sentiment_count)

## # A tibble: 6 × 5
## # Groups:   purpose [1]
##   purpose  sentiment        n  total ratio
##   <chr>    <chr>        <int>  <int> <dbl>
## 1 business anger        25693 605299  4.24
## 2 business anticipation 56083 605299  9.27
## 3 business disgust      16966 605299  2.80
## 4 business fear         16777 605299  2.77
## 5 business joy          81532 605299 13.5 
## 6 business negative     72749 605299 12.0
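Because one word can carry several sentiment labels, the percentages above are shares of matched word-sentiment pairs rather than shares of words. The sketch below illustrates this in base R; the mini lexicon and the tokens are invented for illustration and are not actual NRC entries or review data.

```r
# Made-up mini lexicon for illustration only; these are NOT actual NRC entries.
toy_lexicon <- data.frame(
  word = c("lovely", "lovely", "noisy", "noisy", "clean"),
  sentiment = c("joy", "positive", "negative", "anger", "positive")
)

# Hypothetical tokens, one row per word occurrence
toy_tokens <- data.frame(
  purpose = c("leisure", "leisure", "leisure", "business", "business"),
  word = c("lovely", "clean", "hotel", "noisy", "clean")
)

# merge() keeps only matching words, like inner_join(); "hotel" drops out,
# while "lovely" and "noisy" each produce two rows (one per sentiment).
matched <- merge(toy_tokens, toy_lexicon, by = "word")

# Counts per purpose and sentiment, then percentages within each purpose
counts <- table(matched$purpose, matched$sentiment)
ratio <- round(prop.table(counts, margin = 1) * 100, 1)
print(ratio)
```

Five tokens become six matched rows here, which is why the `total` column in `sentiment_count` should be read as the number of matches per purpose, not the number of words.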

# Create a donut chart to visualize sentiment distribution by trip purpose
ggplot(sentiment_count, aes(x = 2, y = ratio, fill = sentiment)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") + # Convert to circular layout
  facet_wrap(~purpose) +  # Split by travel purpose
  geom_text(aes(label = paste0(round(ratio, 1), "%")), 
            position = position_stack(vjust = 0.5),
            size = 3) +
  xlim(0.5, 2.5) + # Leave an empty center so the pie becomes a donut
  labs(title = "1-2. Sentiment Distribution in Reviews by Trip Purpose") +
  theme_light() + # Apply the complete theme first so the tweaks below are not overwritten
  theme(legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5))

Analysis and Figure 2: Keyword and Distinctive Word Analysis

To test the second hypothesis, the frequency of words mentioned in reviews by each traveler type was calculated and visualized as a bar graph. While the most common terms overlap heavily (staff, location, hotel, breakfast, and bed appear in both groups), some words point to each group’s distinct focus.

For instance, business reviews often include wifi, quiet, and location, which align with practical concerns for working travelers. In contrast, leisure reviews contain more emotionally expressive words such as lovely, amazing, and bar, reflecting vacation-related experiences.

word_frequency <- total_review_word %>% 
  count(purpose, word, sort = TRUE) %>% # Count the frequency of each word per purpose
  left_join(total_review_word %>% 
              count(purpose, name = "total")) # Attach the total word count per purpose

## Joining with `by = join_by(purpose)`

word_frequency %>% 
  mutate(freq = n / total * 100,
         purpose = factor(purpose, levels = c("leisure", "business"))) %>% # Set order: leisure first
  group_by(purpose) %>% 
  slice_max(freq, n = 20) %>% 
  ggplot(aes(x = reorder_within(word, n, purpose), y = freq, fill = purpose)) +
  geom_col(show.legend = FALSE) +
  scale_x_reordered() +
  coord_flip() +
  facet_wrap(~purpose, scales = "free_y") +
  labs(x = NULL, y = "Word Frequency (%)", title = "2-1. Top 20 Words in Reviews by Trip Purpose") +
  theme_minimal()

Because many of the top words were neutral or overlapping, we proceeded to a more refined analysis using log odds ratio to identify distinctive keywords that characterize each group’s review language.

Words with positive log ratios (right side) are more common in leisure reviews and are often related to celebration and personal occasions, such as anniversary, honeymoon, birthday, and balloons.

Words with negative log ratios (left side) appear more often in business reviews and include terms like congress, exam, workplace, invoices, and meetings, indicating a professional or work-related context.

This allows us to clearly distinguish between emotional/personal vocabulary used in leisure travel and pragmatic/functional vocabulary used in business travel.

# Step 1: Word frequency and reshaping
word_ratios <- total_review_word %>%
  count(word, purpose) %>%
  group_by(word) %>%
  filter(sum(n) >= 10) %>%
  ungroup() %>%
  pivot_wider(names_from = purpose, values_from = n, values_fill = list(n = 0)) %>%
  mutate(
    leisure = (leisure + 1) / (sum(leisure + 1)),   # Laplace smoothing
    business = (business + 1) / (sum(business + 1)),
    logratio = log(leisure / business),
    category = ifelse(logratio > 0, "Leisure", "Business")  # for better legend
  ) %>%
  arrange(desc(logratio))

# Step 2: Visualization (Top 15 for each)
word_ratios %>%
  group_by(category) %>%
  slice_max(abs(logratio), n = 15, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(word = reorder(word, logratio)) %>%
  ggplot(aes(word, logratio, fill = category)) +
  geom_col(show.legend = TRUE) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Log Odds Ratio (Leisure / Business)",
    title = "2-2. Top Distinctive Words in Reviews by Trip Purpose"
  ) +
  theme_minimal() +
  scale_fill_manual(
    values = c("Leisure" = "tomato", "Business" = "steelblue"),
    name = "Trip Purpose"  # legend title
  )
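To make the sign convention of the log ratio concrete, the smoothing and ratio from Step 1 can be traced by hand on a tiny example. This is a minimal sketch in base R; the counts in `toy_counts` are hypothetical and not taken from the review data. Strictly speaking, the quantity is a log ratio of smoothed relative frequencies.

```r
# Hypothetical word counts per purpose (not taken from the review data)
toy_counts <- data.frame(
  word = c("honeymoon", "invoice", "breakfast"),
  leisure = c(30, 0, 500),
  business = c(0, 20, 480)
)

# Add-one (Laplace) smoothing, then convert to within-group shares.
# Smoothing keeps words with a zero count from producing log(0) or division by zero.
leisure_share  <- (toy_counts$leisure + 1) / sum(toy_counts$leisure + 1)
business_share <- (toy_counts$business + 1) / sum(toy_counts$business + 1)

# Positive values lean leisure, negative values lean business,
# and values near zero mark words used about equally by both groups.
toy_counts$logratio <- log(leisure_share / business_share)
print(toy_counts[order(-toy_counts$logratio), ])
```

In this toy table, "honeymoon" lands far on the positive side, "invoice" far on the negative side, and "breakfast" stays close to zero, which mirrors how shared everyday words drop out of the distinctive-word chart.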

Analysis and Figure 3: Word Co-occurrence Network

To explore not only individual word usage but also contextual meaning, we visualized bigram networks derived from hotel reviews by leisure and business travelers. Each network node represents a word, and edges represent co-occurrence within a two-word phrase (bigram). By analyzing node centrality and cluster grouping, we identify key concepts and thematic structures in each group’s reviews.

In the leisure network, words such as location, friendly, staff, bed, and comfortable emerged as central. Clusters show themes around positive emotional experiences, staff interactions, and convenient access to transportation (station, walk, minutes). Leisure reviews include expressive modifiers like perfect, lovely, and extremely, reflecting a more subjective tone.

In contrast, the business network is more tightly centered around practical terms such as convenient, central, location, staff, and reception. Words like wifi, desk, floor, and credit card appear in smaller clusters, suggesting attention to logistics and amenities. Emotional or descriptive words are relatively sparse.

These network structures support the earlier hypothesis: travel purpose shapes not only word choice but also how those words are connected, highlighting different focuses in how leisure and business travelers describe their hotel experiences.

# Step 1: Create bigrams from both negative and positive reviews
# Tokenize text into bigrams (two-word combinations) for each review type
negative_review_bigrams <- total_review %>% 
  select(Negative_Review, purpose) %>% 
  unnest_tokens(bigram, Negative_Review, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))

positive_review_bigrams <- total_review %>% 
  select(Positive_Review, purpose) %>% 
  unnest_tokens(bigram, Positive_Review, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))

# Step 2: Combine positive and negative bigrams for each travel purpose
# Separate data into leisure and business categories
leisure_bigrams <- bind_rows(
  positive_review_bigrams %>%  filter(purpose =="leisure"),
  negative_review_bigrams %>% filter(purpose=="leisure"))
head(leisure_bigrams)

##   purpose       bigram
## 1 leisure     only the
## 2 leisure     the park
## 3 leisure park outside
## 4 leisure   outside of
## 5 leisure       of the
## 6 leisure    the hotel

business_bigrams <- bind_rows(
  positive_review_bigrams %>%  filter(purpose =="business"),
  negative_review_bigrams %>% filter(purpose=="business"))
head(business_bigrams)

##    purpose          bigram
## 1 business  style location
## 2 business  location rooms
## 3 business      this hotel
## 4 business        hotel is
## 5 business        is being
## 6 business being renovated

# Step 3: Separate the bigram into individual words
# e.g., "comfortable bed" -> "comfortable", "bed"
leisure_bigrams_separated <- leisure_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

leisure_bigrams_filtered <- leisure_bigrams_separated%>% 
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)
head(leisure_bigrams_filtered)

##   purpose    word1        word2
## 1 leisure     real   complaints
## 2 leisure location surroundings
## 3 leisure  deposit     payments
## 4 leisure    staff        offer
## 5 leisure original      payment
## 6 leisure      bit    confusing

# Apply same process to business reviews
business_bigrams_separated <- business_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

business_bigrams_filtered <- business_bigrams_separated%>% 
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)
head(business_bigrams_filtered)

##    purpose  word1     word2
## 1 business  style  location
## 2 business unique structure
## 3 business double     paned
## 4 business  paned     glass
## 5 business  glass    window
## 6 business   lush  greenery

# Step 4: Count the frequency of each filtered bigram pair
leisure_bigram_counts <- leisure_bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

business_bigram_counts <- business_bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

# Step 5: Construct a network graph from top 40 bigrams
# Each node is a word; edges represent co-occurrence as bigram
# Centrality and community grouping are calculated
set.seed(1234)

leisure_graph <- leisure_bigram_counts %>%
  slice_max(n, n = 40) %>% 
  as_tbl_graph(directed = F) %>%
  mutate(centrality = centrality_degree(),    # network centrality
         group = as.factor(group_infomap()))  # community

business_graph <- business_bigram_counts %>%
  slice_max(n, n = 40) %>% 
  as_tbl_graph(directed = F) %>%
  mutate(centrality = centrality_degree(),    # network centrality
         group = as.factor(group_infomap()))  # community
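For a simple undirected graph like the ones built above, degree centrality is the number of edges touching a node. The sketch below reproduces that idea in base R on a hypothetical edge list; `toy_edges` is invented for illustration and stands in for the top rows of a bigram count table.

```r
# Hypothetical bigram pairs standing in for the top rows of a bigram count table
toy_edges <- data.frame(
  word1 = c("friendly", "helpful", "comfortable", "staff"),
  word2 = c("staff", "staff", "bed", "member"),
  stringsAsFactors = FALSE
)

# In an undirected graph, a node's degree is the number of edges touching it,
# so counting how often each word appears in either column gives the degree.
degree <- table(c(toy_edges$word1, toy_edges$word2))
print(sort(degree, decreasing = TRUE))
```

Here "staff" has degree 3 because it pairs with three different words, so it would be drawn as the largest node; a word that appears in only one frequent bigram stays small regardless of how often that bigram occurs.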

# Step 6: Visualize the leisure review network
# Nodes are colored by community and sized by centrality
set.seed(1234)
ggraph(leisure_graph, layout = "kk") +      # Kamada-Kawai layout
  geom_edge_link(color = "gray50",          # edge color
                 alpha = 0.5) +             # edge transparency
  geom_node_point(aes(size = centrality,    # node size by degree centrality
                      color = group),       # node color by community
                  show.legend = FALSE) +    # hide legend
  scale_size(range = c(4, 15)) +            # node size range
  geom_node_text(aes(label = name),         # label each node with its word
                 repel = TRUE,              # nudge labels to avoid overlap
                 size = 5) +                # text size
  theme_graph() +                           # blank background
  labs(title = "3-1. Leisure Purpose Hotel Review Network") # title

# Step 7: Visualize the business review network
set.seed(1234)
ggraph(business_graph, layout = "kk") +
  geom_edge_link(color = "gray50", 
                 alpha = 0.5) +
  geom_node_point(aes(size = centrality,
                      color = group), 
                  show.legend = F) +  
  scale_size(range = c(4, 15)) +      
  geom_node_text(aes(label = name),       
                 repel = T,                 
                 size = 5) +               
  theme_graph() +
  labs(title = "3-2. Business Purpose Hotel Review Network")

Conclusion

This study examined how hotel reviews differ between business and leisure travelers by analyzing review scores, sentiment tendencies, and textual content.

Regarding the first research question, the analysis of review scores and NRC-based sentiment revealed that business travelers tend to give slightly lower ratings and express more negative sentiments, while leisure travelers show more positive emotional responses. Although the differences were not drastic, they consistently indicated that leisure trips are generally perceived more favorably.

For the second question, keyword and bigram analyses highlighted distinctive language patterns in each group. Leisure reviews included more emotionally expressive and celebratory language, often centered on comfort, joy, and memorable experiences. In contrast, business reviews emphasized practicality, efficiency, and functional aspects such as location, wifi, and staff responsiveness. Bigram network structures further demonstrated how the two groups organize and connect words differently when describing their stay.

Overall, the findings suggest that trip purpose meaningfully influences not only what travelers write, but how they express and structure their experiences in hotel reviews. These insights can be valuable for the hospitality industry in developing more targeted services and marketing strategies.

Final_report: Analysis of hotel reviews by trip purpose (leisure/business)

Chaehyun Lee

2025-06-10