There is power in a “review.” In today’s digital era, many consumers check reviews before making purchasing decisions. Hearing directly from real users—rather than relying solely on corporate messaging—has significantly changed consumption behavior. Reviews reflect personal experiences, both positive and negative, and benefit not only consumers but also businesses. With the rise of accommodation booking platforms, travelers now have the opportunity to read and write reviews about their stays. Since accommodations often represent a carefully chosen part of one’s limited and valuable travel time, people tend to make more deliberate decisions—making reviews even more influential.
This report analyzes customer reviews of luxury hotels, focusing on the purpose of travel. The primary objective is to explore whether there are significant textual differences between reviews written by ‘business travelers’ and those written by ‘leisure travelers’.
Understanding these differences can benefit all stakeholders. Hotels can gain insight into what different types of guests value most. Travelers can use such insights to better select accommodations aligned with their travel purpose. Moreover, booking platforms can enhance their recommendation algorithms by incorporating travel-purpose-based keyword analysis, ultimately offering more tailored suggestions to users.
The analysis addresses two key questions: Do review scores and sentiment tendencies differ between business and leisure travelers? What specific words or expressions are frequently mentioned in each group’s reviews?
I hypothesized that reviews from business travelers would contain a higher proportion of negative sentiments, while reviews from leisure travelers would be more positive overall. I also expected that business trip reviews would frequently mention practical aspects such as Wi-Fi and bedding, whereas leisure trip reviews would highlight amenities like pools, bars, and other recreational facilities.
The analysis revealed slight but consistent differences between leisure and business reviewers in both review scores and sentiment tendencies. Business travelers tend to give slightly lower scores and express more negative sentiments, while leisure travelers show more positive emotions such as joy and anticipation. These differences, though subtle, suggest that leisure trips are generally perceived more favorably.
Keyword analysis indicates that while both groups use common terms, their word choices reflect distinct priorities: leisure travelers often employ emotionally rich and celebratory language, whereas business travelers focus on practicality and work-related contexts.
Bigram analysis further shows that leisure reviews emphasize emotional and experiential expressions, while business reviews highlight functional and convenience-oriented phrases. This suggests structural differences in how each group describes their stay.
The dataset was scraped from Booking.com and contains approximately 515,000 customer reviews and scores for 1,493 luxury hotels across Europe. The data was collected over a two-year period, from August 4, 2015 to August 3, 2017.
The CSV file includes 17 fields, of which five were selected for the analysis. A brief description of each selected field is provided below:
- Hotel_Name: The name of the hotel.
- Negative_Review: The negative review provided by the reviewer. If there is no negative review, the field contains 'No Negative'.
- Positive_Review: The positive review provided by the reviewer. If there is no positive review, the field contains 'No Positive'.
- Reviewer_Score: The score assigned to the hotel by the reviewer, based on their overall experience.
- Tags: Descriptive tags selected by the reviewer, often indicating the purpose of stay, type of trip, or room information.
The remaining variables, which are not used in this analysis, are described below.
- Hotel_Address: Address of the hotel.
- Review_Date: Date when the reviewer posted the corresponding review.
- Average_Score: Average score of the hotel, calculated based on the latest comment in the last year.
- Reviewer_Nationality: Nationality of the reviewer.
- Review_Total_Negative_Word_Counts: Total number of words in the negative review.
- Review_Total_Positive_Word_Counts: Total number of words in the positive review.
- Total_Number_of_Reviews_Reviewer_Has_Given: Number of reviews the reviewer has given in the past.
- Total_Number_of_Reviews: Total number of valid reviews the hotel has.
- days_since_review: Duration between the review date and the scrape date.
- Additional_Number_of_Scoring: Some guests gave a score without writing a review. This number indicates how many such valid scores without a review the hotel has.
- lat: Latitude of the hotel.
- lng: Longitude of the hotel.
First, load the Hotel_Reviews.csv dataset into the df_hotel variable.
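# Packages used throughout (assumed; the original setup chunk is not shown)
library(tidyverse); library(tidytext); library(tidygraph); library(ggraph)
df_hotel <- read.csv("Hotel_Reviews.csv")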
head(df_hotel)
## Hotel_Address
## 1 s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 2 s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 3 s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 4 s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 5 s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## 6 s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands
## Additional_Number_of_Scoring Review_Date Average_Score Hotel_Name
## 1 194 8/3/2017 7.7 Hotel Arena
## 2 194 8/3/2017 7.7 Hotel Arena
## 3 194 7/31/2017 7.7 Hotel Arena
## 4 194 7/31/2017 7.7 Hotel Arena
## 5 194 7/24/2017 7.7 Hotel Arena
## 6 194 7/24/2017 7.7 Hotel Arena
## Reviewer_Nationality
## 1 Russia
## 2 Ireland
## 3 Australia
## 4 United Kingdom
## 5 New Zealand
## 6 Poland
## Negative_Review
## 1 I am so angry that i made this post available via all possible sites i use when planing my trips so no one will make the mistake of booking this place I made my booking via booking com We stayed for 6 nights in this hotel from 11 to 17 July Upon arrival we were placed in a small room on the 2nd floor of the hotel It turned out that this was not the room we booked I had specially reserved the 2 level duplex room so that we would have a big windows and high ceilings The room itself was ok if you don t mind the broken window that can not be closed hello rain and a mini fridge that contained some sort of a bio weapon at least i guessed so by the smell of it I intimately asked to change the room and after explaining 2 times that i booked a duplex btw it costs the same as a simple double but got way more volume due to the high ceiling was offered a room but only the next day SO i had to check out the next day before 11 o clock in order to get the room i waned to Not the best way to begin your holiday So we had to wait till 13 00 in order to check in my new room what a wonderful waist of my time The room 023 i got was just as i wanted to peaceful internal garden view big window We were tired from waiting the room so we placed our belongings and rushed to the city In the evening it turned out that there was a constant noise in the room i guess it was made by vibrating vent tubes or something it was constant and annoying as hell AND it did not stop even at 2 am making it hard to fall asleep for me and my wife I have an audio recording that i can not attach here but if you want i can send it via e mail The next day the technician came but was not able to determine the cause of the disturbing sound so i was offered to change the room once again the hotel was fully booked and they had only 1 room left the one that was smaller but seems newer
## 2 No Negative
## 3 Rooms are nice but for elderly a bit difficult as most rooms are two story with narrow steps So ask for single level Inside the rooms are very very basic just tea coffee and boiler and no bar empty fridge
## 4 My room was dirty and I was afraid to walk barefoot on the floor which looked as if it was not cleaned in weeks White furniture which looked nice in pictures was dirty too and the door looked like it was attacked by an angry dog My shower drain was clogged and the staff did not respond to my request to clean it On a day with heavy rainfall a pretty common occurrence in Amsterdam the roof in my room was leaking luckily not on the bed you could also see signs of earlier water damage I also saw insects running on the floor Overall the second floor of the property looked dirty and badly kept On top of all of this a repairman who came to fix something in a room next door at midnight was very noisy as were many of the guests I understand the challenges of running a hotel in an old building but this negligence is inconsistent with prices demanded by the hotel On the last night after I complained about water damage the night shift manager offered to move me to a different room but that offer came pretty late around midnight when I was already in bed and ready to sleep
## 5 You When I booked with your company on line you showed me pictures of a room I thought I was getting and paying for and then when we arrived that s room was booked and the staff told me we could only book the villa suite theough them directly Which was completely false advertising After being there we realised that you have grouped lots of rooms on the photos together leaving me the consumer confused and extreamly disgruntled especially as its my my wife s 40th birthday present Please make your website more clear through pricing and photos as again I didn t really know what I was paying for and how much it had wnded up being Your photos told me I was getting something I wasn t Not happy and won t be using you again
## 6 Backyard of the hotel is total mess shouldn t happen in hotel with 4 stars
## Review_Total_Negative_Word_Counts Total_Number_of_Reviews
## 1 397 1403
## 2 0 1403
## 3 42 1403
## 4 210 1403
## 5 140 1403
## 6 17 1403
## Positive_Review
## 1 Only the park outside of the hotel was beautiful
## 2 No real complaints the hotel was great great location surroundings rooms amenities and service Two recommendations however firstly the staff upon check in are very confusing regarding deposit payments and the staff offer you upon checkout to refund your original payment and you can make a new one Bit confusing Secondly the on site restaurant is a bit lacking very well thought out and excellent quality food for anyone of a vegetarian or vegan background but even a wrap or toasted sandwich option would be great Aside from those minor minor things fantastic spot and will be back when i return to Amsterdam
## 3 Location was good and staff were ok It is cute hotel the breakfast range is nice Will go back
## 4 Great location in nice surroundings the bar and restaurant are nice and have a lovely outdoor area The building also has quite some character
## 5 Amazing location and building Romantic setting
## 6 Good restaurant with modern design great chill out place Great park nearby the hotel and awesome main stairs
## Review_Total_Positive_Word_Counts Total_Number_of_Reviews_Reviewer_Has_Given
## 1 11 7
## 2 105 7
## 3 21 9
## 4 26 1
## 5 8 3
## 6 20 1
## Reviewer_Score
## 1 2.9
## 2 7.5
## 3 7.1
## 4 3.8
## 5 6.7
## 6 6.7
## Tags
## 1 [' Leisure trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']
## 2 [' Leisure trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 4 nights ']
## 3 [' Leisure trip ', ' Family with young children ', ' Duplex Double Room ', ' Stayed 3 nights ', ' Submitted from a mobile device ']
## 4 [' Leisure trip ', ' Solo traveler ', ' Duplex Double Room ', ' Stayed 3 nights ']
## 5 [' Leisure trip ', ' Couple ', ' Suite ', ' Stayed 2 nights ', ' Submitted from a mobile device ']
## 6 [' Leisure trip ', ' Group ', ' Duplex Double Room ', ' Stayed 1 night ']
## days_since_review lat lng
## 1 0 days 52.36058 4.915968
## 2 0 days 52.36058 4.915968
## 3 3 days 52.36058 4.915968
## 4 3 days 52.36058 4.915968
## 5 10 days 52.36058 4.915968
## 6 10 days 52.36058 4.915968
Upon examining the Tags column in the df_hotel data frame, it was found that each entry contains multiple pieces of information related to the trip, such as: [‘Leisure trip’, ‘Couple’, ‘Duplex Double Room’, ‘Stayed 6 nights’]
Among these, the purpose of the trip—whether it was for leisure, business, or unspecified—is embedded within the string, which resembles a list but is actually stored as a single text value.
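Because the value is plain text rather than a real list, it has to be parsed before individual tags can be inspected. The sketch below is an illustration only and is not part of the original analysis: `parse_tags` is a hypothetical helper, written in base R, that assumes no tag contains a comma or an apostrophe.

```r
# Hypothetical helper (not part of the original analysis): split one Tags
# string such as "[' Leisure trip ', ' Couple ']" into a character vector.
# Assumes that no individual tag contains a comma or an apostrophe.
parse_tags <- function(tag_string) {
  inner <- gsub("^\\[|\\]$", "", tag_string)        # drop the surrounding brackets
  parts <- strsplit(inner, ",", fixed = TRUE)[[1]]  # split on commas
  trimws(gsub("'", "", parts, fixed = TRUE))        # remove quotes and padding
}

example_tag <- "[' Leisure trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']"
tags <- parse_tags(example_tag)
print(tags)
```

A full parse like this is only needed when several tags are of interest; detecting a single phrase inside the raw string is enough when only one tag matters.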
For this analysis, only the trip purpose was extracted and stored in a new column called purpose, with three possible values: leisure, business, or unknown (stored as NA).
The classification produced 82,939 reviews tagged as business trips, 417,778 tagged as leisure trips, and 15,021 that did not specify a purpose and were therefore classified as unknown.
df_hotel <- df_hotel %>%
mutate(purpose = ifelse(str_detect(Tags, "Leisure trip"), "leisure",
ifelse(str_detect(Tags, "Business trip"), "business", NA)))
df_hotel %>%
count(purpose)
## purpose n
## 1 business 82939
## 2 leisure 417778
## 3 <NA> 15021
The 15,021 entries with an unknown trip purpose were excluded from the analysis. Although the number of leisure reviews was about five times greater than that of business reviews, the business category still contained a substantial number of entries. Therefore, to ensure a fair comparison between the two groups, the analysis focused on examining the relative proportions within each category, rather than relying on raw counts.
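From the df_hotel data frame, rows with a known purpose were kept and seven columns were selected (the five fields described earlier, plus Average_Score and the derived purpose column), then stored in a new variable called total_review.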
total_review <- df_hotel %>%
filter(!is.na(purpose)) %>%
select(Average_Score, Hotel_Name, Negative_Review, Positive_Review, Reviewer_Score, Tags, purpose)
total_review %>%
count(purpose)
## purpose n
## 1 business 82939
## 2 leisure 417778
summary(total_review)
## Average_Score Hotel_Name Negative_Review Positive_Review
## Min. :5.200 Length:500717 Length:500717 Length:500717
## 1st Qu.:8.100 Class :character Class :character Class :character
## Median :8.400 Mode :character Mode :character Mode :character
## Mean :8.401
## 3rd Qu.:8.800
## Max. :9.800
## Reviewer_Score Tags purpose
## Min. : 2.500 Length:500717 Length:500717
## 1st Qu.: 7.500 Class :character Class :character
## Median : 8.800 Mode :character Mode :character
## Mean : 8.403
## 3rd Qu.: 9.600
## Max. :10.000
In this dataset, the positive and negative aspects of each review have already been separated and stored in two distinct columns (Negative_Review and Positive_Review). However, since the reasons for positive or negative feedback can vary depending on the specific circumstances of each hotel, this analysis does not treat them separately. Instead, it combines both to examine the most frequently mentioned keywords. These commonly mentioned terms are likely to reflect the factors that guests consider most important when choosing or evaluating a hotel.
To facilitate this analysis, the tokenized words from both positive and negative reviews are stored in a new variable named total_review_word, where each tokenized word is held in the column word, along with the associated travel purpose.
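One caveat is worth checking before tokenizing. The placeholders 'No Negative' and 'No Positive' are ordinary text, so after stop-word removal the words negative and positive would likely remain in the word list and be counted. The sketch below is an assumption about how this could be handled, not something the original analysis did. It blanks the placeholders in a toy data frame, and `blank_placeholder` is a hypothetical helper.

```r
# Toy data with the same column names as the dataset (values are made up).
toy <- data.frame(
  Negative_Review = c("No Negative", "Room was noisy"),
  Positive_Review = c("Great location", "No Positive"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace a placeholder with an empty string so that it
# contributes no tokens. trimws() guards against padded values.
blank_placeholder <- function(x, placeholder) {
  ifelse(trimws(x) == placeholder, "", x)
}

toy$Negative_Review <- blank_placeholder(toy$Negative_Review, "No Negative")
toy$Positive_Review <- blank_placeholder(toy$Positive_Review, "No Positive")
print(toy)
```

Applying the same idea to total_review before tokenization would show whether the placeholders affect the word and sentiment counts reported later.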
negative_review_word <- total_review %>%
select(Negative_Review, purpose) %>%
unnest_tokens(word, Negative_Review) %>% # Tokenize the text into words
anti_join(stop_words) # Remove stop words
## Joining with `by = join_by(word)`
head(negative_review_word)
## purpose word
## 1 leisure angry
## 2 leisure post
## 3 leisure sites
## 4 leisure planing
## 5 leisure trips
## 6 leisure mistake
positive_review_word <- total_review %>%
select(Positive_Review, purpose) %>%
unnest_tokens(word, Positive_Review) %>% # Tokenize the text into words
anti_join(stop_words) # Remove stop words
## Joining with `by = join_by(word)`
head(positive_review_word)
## purpose word
## 1 leisure park
## 2 leisure hotel
## 3 leisure beautiful
## 4 leisure real
## 5 leisure complaints
## 6 leisure hotel
total_review_word <-
bind_rows(positive_review_word, negative_review_word)
head(total_review_word)
## purpose word
## 1 leisure park
## 2 leisure hotel
## 3 leisure beautiful
## 4 leisure real
## 5 leisure complaints
## 6 leisure hotel
The analysis addresses two key questions:
Do review scores and sentiment tendencies differ between business and leisure travelers? -> Analysis and Figure 1
What specific words or expressions are frequently mentioned in each group's reviews? -> Analysis and Figures 2 and 3
To understand how review scores differ based on the travel purpose, this plot visualizes the distribution of “Reviewer_Score” for both business and leisure travelers. Since the review score itself can be considered part of the overall review, it is important to examine its distribution before conducting text analysis.
From the resulting graph, we observe that:
- Both types of travelers give high scores overall (scores are concentrated between 7 and 10).
- Business travelers have a relatively higher proportion of scores below 8.
- Leisure travelers are more likely to give scores above 8.
This suggests that business travelers may be slightly more critical in their evaluations compared to leisure travelers.
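The visual impression can be backed by a small numerical summary. The sketch below is illustrative only: it uses made-up toy scores, and `summarise_scores` is a hypothetical helper. It computes the mean score and the share of scores below a cutoff for each purpose.

```r
# Toy scores (made up) with the same column names as total_review.
toy_scores <- data.frame(
  purpose = c("business", "business", "business", "leisure", "leisure", "leisure"),
  Reviewer_Score = c(6.5, 7.9, 9.2, 8.8, 9.6, 7.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean score and share of scores below a cutoff, per purpose.
summarise_scores <- function(df, cutoff = 8) {
  data.frame(
    purpose = sort(unique(df$purpose)),
    mean_score = as.numeric(tapply(df$Reviewer_Score, df$purpose, mean)),
    pct_below = as.numeric(tapply(df$Reviewer_Score < cutoff, df$purpose, mean)) * 100,
    stringsAsFactors = FALSE
  )
}

score_summary <- summarise_scores(toy_scores)
print(score_summary)
```

Calling `summarise_scores(total_review)` on the real data would give the corresponding figures for the two groups; the toy numbers here say nothing about the actual result.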
total_review %>%
  count(purpose, Reviewer_Score) %>% # Count reviews per score and purpose
  left_join(total_review %>%
              count(purpose, name = "total"), by = "purpose") %>% # Total reviews per purpose
  mutate(ratio = n / total * 100) %>% # Share of each score within its purpose group (%)
  ggplot(aes(x = Reviewer_Score, y = ratio, group = purpose, color = purpose)) +
  geom_line(linewidth = 1, show.legend = TRUE) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  labs(
    x = "Reviewer Score",
    y = "Percentage of Reviews",
    title = "1-1. Distribution of Reviewer Scores by Purpose",
    color = "Purpose"
  ) +
  scale_color_manual(
    values = c("leisure" = "tomato", "business" = "steelblue") # Custom colors assigned to each purpose
  ) +
  theme_minimal()
To complement the analysis of numerical review scores, sentiment
analysis was conducted on the textual content of reviews to uncover
underlying emotional patterns across different travel purposes.
The sentiment distribution in reviews reveals only subtle differences between business and leisure travelers:
Leisure travelers show slightly higher proportions of joy (15.3%), positive (29%), and anticipation (9.9%), suggesting more emotional engagement and enthusiasm.
Business travelers display a marginally higher share of negative sentiment (12.0%), which may imply a more critical or evaluative attitude.
Other categories such as trust, sadness, and anger show minimal variation between the two groups.
These results indicate that while overall sentiment patterns are similar, leisure travelers tend to express slightly more positive emotions, possibly due to being on vacation, while business travelers may be more pragmatic or critical in tone.
# Load the NRC sentiment lexicon (get_sentiments("nrc") relies on the textdata package)
# The nrc variable maps each word to one or more of the sentiment categories listed below.
nrc <- get_sentiments("nrc")
table(nrc$sentiment) # Check available sentiment categories
##
## anger anticipation disgust fear joy negative
## 1245 837 1056 1474 687 3316
## positive sadness surprise trust
## 2308 1187 532 1230
# Count the occurrence of each sentiment per travel purpose
sentiment_count <- total_review_word %>%
  group_by(purpose) %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% # A word can carry several NRC sentiments
  count(sentiment) %>%
  mutate(total = sum(n)) %>%
  mutate(ratio = n / total * 100) # Convert counts to percentages
head(sentiment_count)
## # A tibble: 6 × 5
## # Groups: purpose [1]
## purpose sentiment n total ratio
## <chr> <chr> <int> <int> <dbl>
## 1 business anger 25693 605299 4.24
## 2 business anticipation 56083 605299 9.27
## 3 business disgust 16966 605299 2.80
## 4 business fear 16777 605299 2.77
## 5 business joy 81532 605299 13.5
## 6 business negative 72749 605299 12.0
# Create a donut chart to visualize sentiment distribution by trip purpose
ggplot(sentiment_count, aes(x = 2, y = ratio, fill = sentiment)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") + # Convert to circular layout
  facet_wrap(~purpose) + # Split by travel purpose
  geom_text(aes(label = paste0(round(ratio, 1), "%")),
            position = position_stack(vjust = 0.5),
            size = 3) +
  labs(title = "1-2. Sentiment Distribution in Reviews by Trip Purpose") +
  xlim(0.5, 2.5) + # Leave the centre empty to form the donut hole
  theme_void() + # Complete theme first; a complete theme added last (the original theme_light()) would discard the tweaks below
  theme(legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5))
To test the second hypothesis, the frequency of words mentioned in reviews by each traveler type was calculated and visualized as a bar graph. While there is significant overlap in common terms (such as staff, location, hotel, breakfast, and bed), some words suggest each group's unique focus.
For instance, business reviews often include wifi, quiet, and location, which align with practical concerns for working travelers. In contrast, leisure reviews contain more emotionally expressive words such as lovely, amazing, and bar, reflecting vacation-related experiences.
word_frequency <- total_review_word %>%
  count(purpose, word, sort = TRUE) %>% # Count the frequency of each word per purpose
  left_join(total_review_word %>%
              count(purpose, name = "total")) # Add the total number of words per purpose
## Joining with `by = join_by(purpose)`
word_frequency %>%
mutate(freq = n / total * 100,
purpose = factor(purpose, levels = c("leisure", "business"))) %>% # Set order: leisure first
group_by(purpose) %>%
slice_max(freq, n = 20) %>%
ggplot(aes(x = reorder_within(word, n, purpose), y = freq, fill = purpose)) +
geom_col(show.legend = FALSE) +
scale_x_reordered() +
coord_flip() +
facet_wrap(~purpose, scales = "free_y") +
labs(x = NULL, y = "Word Frequency (%)", title = "2-1. Top 20 Words in Reviews by Trip Purpose") +
theme_minimal()
Because many of the top words were neutral or overlapping, we proceeded
to a more refined analysis using log odds ratio to identify distinctive
keywords that characterize each group’s review language.
Words with positive log ratios (right side) are more common in leisure reviews and are often related to celebration and personal occasions, such as anniversary, honeymoon, birthday, and balloons.
Words with negative log ratios (left side) appear more often in business reviews and include terms like congress, exam, workplace, invoices, and meetings, indicating a professional or work-related context.
This allows us to clearly distinguish between emotional/personal vocabulary used in leisure travel and pragmatic/functional vocabulary used in business travel.
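To make the measure concrete, the sketch below works through the same calculation on made-up counts. Note that the quantity is the log of the ratio of smoothed within-group word shares, which behaves like a log odds ratio for rare words. The three words and their counts are assumptions chosen for illustration.

```r
# Toy word counts per purpose (made-up numbers).
counts <- data.frame(
  word = c("honeymoon", "invoice", "staff"),
  leisure = c(30, 0, 500),
  business = c(0, 12, 100),
  stringsAsFactors = FALSE
)

# Add-one (Laplace) smoothing, then normalise to a share within each group.
p_leisure <- (counts$leisure + 1) / sum(counts$leisure + 1)
p_business <- (counts$business + 1) / sum(counts$business + 1)

# Positive values lean leisure, negative values lean business.
counts$logratio <- log(p_leisure / p_business)
print(counts[order(-counts$logratio), ])
```

The add-one term keeps the ratio finite for words that never occur in one group. A word used at a similar rate by both groups, such as staff in this toy table, ends up close to zero even though its raw counts differ widely.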
# Step 1: Word frequency and reshaping
word_ratios <- total_review_word %>%
count(word, purpose) %>%
group_by(word) %>%
filter(sum(n) >= 10) %>%
ungroup() %>%
pivot_wider(names_from = purpose, values_from = n, values_fill = list(n = 0)) %>%
mutate(
leisure = (leisure + 1) / (sum(leisure + 1)), # Laplace smoothing
business = (business + 1) / (sum(business + 1)),
logratio = log(leisure / business),
category = ifelse(logratio > 0, "Leisure", "Business") # for better legend
) %>%
arrange(desc(logratio))
# Step 2: Visualization (Top 15 for each)
word_ratios %>%
group_by(category) %>%
slice_max(abs(logratio), n = 15, with_ties = FALSE) %>%
ungroup() %>%
mutate(word = reorder(word, logratio)) %>%
ggplot(aes(word, logratio, fill = category)) +
geom_col(show.legend = TRUE) +
coord_flip() +
labs(
x = NULL,
y = "Log Odds Ratio (Leisure / Business)",
title = "2-2. Top Distinctive Words in Reviews by Trip Purpose",
fill = "Dominant in"
) +
theme_minimal()+
scale_fill_manual(
values = c("Leisure" = "tomato", "Business" = "steelblue"),
name = "Trip Purpose"
)
To explore not only individual word usage but also contextual meaning, we visualized bigram networks derived from hotel reviews by leisure and business travelers. Each network node represents a word, and edges represent co-occurrence within a two-word phrase (bigram). By analyzing node centrality and cluster grouping, we identify key concepts and thematic structures in each group’s reviews.
In the leisure network, words such as location, friendly, staff, bed, and comfortable emerged as central. Clusters show themes around positive emotional experiences, staff interactions, and convenient access to transportation (station, walk, minutes). Leisure reviews include expressive modifiers like perfect, lovely, and extremely, reflecting a more subjective tone.
In contrast, the business network is more tightly centered around practical terms such as convenient, central, location, staff, and reception. Words like wifi, desk, floor, and credit card appear in smaller clusters, suggesting attention to logistics and amenities. Emotional or descriptive words are relatively sparse.
These network structures support the earlier hypothesis: travel purpose shapes not only word choice but also how those words are connected, highlighting different focuses in how leisure and business travelers describe their hotel experiences.
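The link between bigram counts and node centrality can be seen on a tiny example. The sketch below uses made-up bigram counts and base R only. It treats each bigram as an undirected edge and computes degree centrality as the number of edges touching each word.

```r
# Toy bigram counts (made up): each row is one edge between two words.
edges <- data.frame(
  word1 = c("friendly", "helpful", "comfortable", "train"),
  word2 = c("staff", "staff", "bed", "station"),
  n = c(120, 90, 80, 40),
  stringsAsFactors = FALSE
)

# Degree centrality in an undirected graph: the number of edges touching a word.
degree <- table(c(edges$word1, edges$word2))
degree <- sort(degree, decreasing = TRUE)
print(degree)
```

Here staff is the only word that appears in two distinct bigrams, so it has the highest degree and would be drawn as the largest node. Degree counts distinct neighbours and ignores the bigram frequency n, so a word that pairs with many different words ranks above one that appears in a single very frequent phrase.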
# Step 1: Create bigrams from both negative and positive reviews
# Tokenize text into bigrams (two-word combinations) for each review type
negative_review_bigrams <- total_review %>%
select(Negative_Review, purpose) %>%
unnest_tokens(bigram, Negative_Review, token = "ngrams", n = 2) %>%
filter(!is.na(bigram))
positive_review_bigrams <- total_review %>%
select(Positive_Review, purpose) %>%
unnest_tokens(bigram, Positive_Review, token = "ngrams", n = 2) %>%
filter(!is.na(bigram))
# Step 2: Combine positive and negative bigrams for each travel purpose
# Separate data into leisure and business categories
leisure_bigrams <- bind_rows(
positive_review_bigrams %>% filter(purpose =="leisure"),
negative_review_bigrams %>% filter(purpose=="leisure"))
head(leisure_bigrams)
## purpose bigram
## 1 leisure only the
## 2 leisure the park
## 3 leisure park outside
## 4 leisure outside of
## 5 leisure of the
## 6 leisure the hotel
business_bigrams <- bind_rows(
positive_review_bigrams %>% filter(purpose =="business"),
negative_review_bigrams %>% filter(purpose=="business"))
head(business_bigrams)
## purpose bigram
## 1 business style location
## 2 business location rooms
## 3 business this hotel
## 4 business hotel is
## 5 business is being
## 6 business being renovated
# Step 3: Separate the bigram into individual words
# e.g., "comfortable bed" -> "comfortable", "bed"
leisure_bigrams_separated <- leisure_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ")
leisure_bigrams_filtered <- leisure_bigrams_separated%>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word)
head(leisure_bigrams_filtered)
## purpose word1 word2
## 1 leisure real complaints
## 2 leisure location surroundings
## 3 leisure deposit payments
## 4 leisure staff offer
## 5 leisure original payment
## 6 leisure bit confusing
# Apply same process to business reviews
business_bigrams_separated <- business_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ")
business_bigrams_filtered <- business_bigrams_separated%>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word)
head(business_bigrams_filtered)
## purpose word1 word2
## 1 business style location
## 2 business unique structure
## 3 business double paned
## 4 business paned glass
## 5 business glass window
## 6 business lush greenery
# Step 4: Count the frequency of each filtered bigram pair
leisure_bigram_counts <- leisure_bigrams_filtered %>%
count(word1, word2, sort = TRUE)
business_bigram_counts <- business_bigrams_filtered %>%
count(word1, word2, sort = TRUE)
# Step 5: Construct a network graph from top 40 bigrams
# Each node is a word; edges represent co-occurrence as bigram
# Centrality and community grouping are calculated
set.seed(1234)
leisure_graph <- leisure_bigram_counts %>%
slice_max(n, n = 40) %>%
as_tbl_graph(directed = F) %>%
mutate(centrality = centrality_degree(), # network centrality
group = as.factor(group_infomap())) # community
business_graph <- business_bigram_counts %>%
slice_max(n, n = 40) %>%
as_tbl_graph(directed = F) %>%
mutate(centrality = centrality_degree(), # network centrality
group = as.factor(group_infomap())) # community
# Step 6: Visualize the leisure review network
# Nodes are colored by community and sized by centrality
set.seed(1234)
ggraph(leisure_graph, layout = "kk") + # layout
geom_edge_link(color = "gray50", # edge color
alpha = 0.5) + # edge contrast
geom_node_point(aes(size = centrality, # node size
color = group), # node color
show.legend = F) + # legend removal
scale_size(range = c(4, 15)) + # node size range
geom_node_text(aes(label = name), # text display
repel = T, # off-node display
size = 5) + # text size
theme_graph()+ # no backgrounds
labs(title = "3-1. Leisure Purpose Hotel Review Network") #title
# Step 7: Visualize the business review network
set.seed(1234)
ggraph(business_graph, layout = "kk") +
geom_edge_link(color = "gray50",
alpha = 0.5) +
geom_node_point(aes(size = centrality,
color = group),
show.legend = F) +
scale_size(range = c(4, 15)) +
geom_node_text(aes(label = name),
repel = T,
size = 5) +
theme_graph() +
labs(title = "3-2. Business Purpose Hotel Review Network")
This study examined how hotel reviews differ between business and leisure travelers by analyzing review scores, sentiment tendencies, and textual content.
Regarding the first research question, the analysis of review scores and NRC-based sentiment revealed that business travelers tend to give slightly lower ratings and express more negative sentiments, while leisure travelers show more positive emotional responses. Although the differences were not drastic, they consistently indicated that leisure trips are generally perceived more favorably.
For the second question, keyword and bigram analyses highlighted distinctive language patterns in each group. Leisure reviews included more emotionally expressive and celebratory language, often centered on comfort, joy, and memorable experiences. In contrast, business reviews emphasized practicality, efficiency, and functional aspects such as location, wifi, and staff responsiveness. Bigram network structures further demonstrated how the two groups organize and connect words differently when describing their stay.
Overall, the findings suggest that trip purpose meaningfully influences not only what travelers write, but how they express and structure their experiences in hotel reviews. These insights can be valuable for the hospitality industry in developing more targeted services and marketing strategies.