Data Analysis Report Team 1

Does offering an enological experience impact Airbnb ratings, and is this moderated by Superhost status?

Introduction

This report analyzes how offering an enological experience (e.g., a wine tasting or a complimentary glass of wine) affects an Airbnb listing’s review score ratings in Florence, and whether this relationship is moderated by Superhost status.

Objectives and Expected Results

Analysis Objectives

  • Explore how offering an enological experience impacts an Airbnb listing’s review scores.
  • Examine whether this relationship is moderated by the host’s Superhost status.

Expected Results

  • Listings that offer enological experiences may receive higher review scores.
  • Superhost status might enhance the positive effect of enological experiences.
  • The findings could help hosts optimize their service offerings to improve ratings.

Potential Business Actions

  • Airbnb hosts could leverage enological experiences as a differentiator.
  • Marketing efforts could emphasize wine-related offerings in Airbnb descriptions.
  • Superhosts could maximize their advantage by incorporating wine-related experiences.

Load Necessary Libraries


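``` r
library(readxl)   # read_excel() for the .xlsx inputs
library(dplyr)    # data manipulation (rename, mutate, select)
library(stringr)  # str_detect() for keyword matching
library(ggplot2)  # plots
library(car)      # regression diagnostics
library(jpeg)     # reading JPEG images
```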

Import Airbnb Data


``` r
# Set working directory
setwd("/Users/djemkasahinpasic/Documents/LUISS/Uni/Market Data Analysis/Project 1 - Florence")

# Load Data
reviews <- read_excel("reviews.xlsx")
listings <- read_excel("listings.xlsx")
```

Key Variables

Listings data:

Listing Information: listing_id, listing_url, last_scraped, name, description

Host Information: host_id, host_name, host_since, host_location

Location: neighbourhood_cleansed (renamed to neighborhood)

Pricing: price

Reviews & Ratings: review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value

Amenities: selected_amenities

Superhost Status: host_is_superhost

Reviews data:

Listing Information: listing_id

Review Information: id, comments

Date information: date

Data Analysis

Rename the column for ease of use

listings <- listings %>% rename(neighborhood = neighbourhood_cleansed)

Create column enological_experience based on keywords in reviews

reviews <- reviews %>%
  mutate(
    enological_experience = ifelse(
      str_detect(tolower(comments), "vino|wine|vin|wein|vinho|wino|şarap|wijn|viini"), 
      1, 
      0
    )
  )
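
One thing to watch with this pattern: the alternatives are unanchored, so a short keyword such as "vin" also matches inside unrelated words like "living" or "having". The sketch below compares the pattern above with a hypothetical word-boundary variant on made-up comments. The names `example_comments`, `loose_pattern`, and `strict_pattern` are illustrative assumptions, and the counts reported later in this report come from the original pattern, not from this variant.

``` r
library(stringr)

# Hypothetical example comments (not taken from the dataset)
example_comments <- c(
  "Lovely living room and a great view",
  "The host left us a bottle of wine",
  "Ottimo vino in omaggio",
  "Having breakfast on the terrace was nice"
)

# Pattern as used above: alternatives match anywhere, even inside longer words
loose_pattern <- "vino|wine|vin|wein|vinho|wino|şarap|wijn|viini"

# Assumed alternative: the same keywords wrapped in word boundaries
strict_pattern <- "\\b(vino|wine|vin|wein|vinho|wino|şarap|wijn|viini)\\b"

loose_flag  <- as.integer(str_detect(tolower(example_comments), loose_pattern))
strict_flag <- as.integer(str_detect(tolower(example_comments), strict_pattern))

# Side-by-side comparison of the two flags
data.frame(comment = example_comments, loose_flag, strict_flag)
```

On these toy comments the loose pattern flags all four rows, while the word-boundary variant flags only the two that actually mention wine.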

Merge reviews dataset with listings dataset

reviews_listings <- merge(reviews, listings, by = "listing_id", all.x = TRUE)
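
With `all.x = TRUE` the merge keeps every review, so reviews whose listing is missing from `listings` appear with NA in the joined columns. A minimal sketch on made-up data (all `toy_*` objects are hypothetical) of how the row count and the unmatched keys could be checked:

``` r
library(dplyr)

# Hypothetical toy tables; listing 3 has a review but no listing record
toy_reviews <- data.frame(
  listing_id = c(1, 1, 2, 3),
  comments   = c("a", "b", "c", "d")
)
toy_listings <- data.frame(
  listing_id           = c(1, 2),
  review_scores_rating = c(4.8, 4.6)
)

# Left merge: one output row per review
toy_merged <- merge(toy_reviews, toy_listings, by = "listing_id", all.x = TRUE)

# The row count should equal the number of reviews
nrow(toy_merged) == nrow(toy_reviews)

# Reviews without a matching listing (these get NA ratings after the merge)
toy_unmatched <- anti_join(toy_reviews, toy_listings, by = "listing_id")
toy_unmatched
```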

Create new dataset with key variables

final_dataset <- reviews_listings %>%
  select(listing_id, review_scores_rating, enological_experience, host_is_superhost)

Transform host_is_superhost into a dummy (1 = Superhost, 0 = No)

final_dataset$host_is_superhost <- ifelse(final_dataset$host_is_superhost == "t", 1, 0)

Remove rows with missing values

final_dataset <- na.omit(final_dataset)

Inspect the structure and first rows

str(final_dataset)
## 'data.frame':    209792 obs. of  4 variables:
##  $ listing_id           : num  31840 31840 31840 31840 31840 ...
##  $ review_scores_rating : num  4.66 4.66 4.66 4.66 4.66 4.66 4.88 4.74 4.74 4.74 ...
##  $ enological_experience: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ host_is_superhost    : num  0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "na.action")= 'omit' Named int [1:17119] 403 404 405 406 407 408 409 410 411 412 ...
##   ..- attr(*, "names")= chr [1:17119] "403" "404" "405" "406" ...
head(final_dataset)
##   listing_id review_scores_rating enological_experience host_is_superhost
## 1      31840                 4.66                     0                 0
## 2      31840                 4.66                     0                 0
## 3      31840                 4.66                     0                 0
## 4      31840                 4.66                     0                 0
## 5      31840                 4.66                     0                 0
## 6      31840                 4.66                     0                 0

Summary statistics for Superhost status (1 = Superhost) and ratings

table(final_dataset$host_is_superhost)
## 
##      0      1 
##  72494 137298
host_counts <- table(final_dataset$host_is_superhost)
summary(final_dataset$review_scores_rating)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    4.72    4.85    4.80    4.93    5.00

Statistics on price

reviews_listings$price <- as.numeric(gsub("[^0-9.]", "", reviews_listings$price))
summary(reviews_listings$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    10.0   101.0   142.0   197.3   201.0 92324.0    5324
price_summary <- summary(reviews_listings$price, na.rm = TRUE)
cat("The average price in Florence is", round(mean(reviews_listings$price, na.rm = TRUE), 2), 
    ", the maximum price is", max(reviews_listings$price, na.rm = TRUE), 
    ", the minimum price is", min(reviews_listings$price, na.rm = TRUE), "\n")
## The average price in Florence is 197.3 , the maximum price is 92324 , the minimum price is 10
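
These statistics are computed on `reviews_listings`, which has one row per review, so a listing's price is counted once for every review it received. A minimal sketch, on hypothetical toy data, of how a listing-level summary can differ from the review-level one:

``` r
library(dplyr)

# Hypothetical review-level rows: listing 1 appears three times, listing 2 once
toy_reviews_listings <- data.frame(
  listing_id = c(1, 1, 1, 2),
  price      = c(100, 100, 100, 300)
)

# Review-level mean: listing 1 is counted three times
review_level_mean <- mean(toy_reviews_listings$price)

# Listing-level mean: keep one row per listing first
toy_listing_prices <- toy_reviews_listings %>%
  distinct(listing_id, price)
listing_level_mean <- mean(toy_listing_prices$price)

c(review_level = review_level_mean, listing_level = listing_level_mean)
```

In this toy case the review-level mean is 150 and the listing-level mean is 200, because the heavily reviewed cheaper listing dominates the first figure.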

Plots for each key variable

Distribution of review scores

ggplot(final_dataset, aes(x = review_scores_rating)) +
  geom_histogram(binwidth = 0.5, fill = "blue", alpha = 0.7, color = "black") +
  theme_minimal() +
  labs(title = "Distribution of Review Scores", x = "Review Score", y = "Frequency")

# The chart displays the distribution of Airbnb listing review scores. Ratings are concentrated near the top of the scale (median 4.85, first quartile 4.72), with a small number of low scores down to the minimum of 1.00. Perceived quality is therefore generally high, with a few exceptions that may reflect factors such as the offered experience or service quality.

Percentage of comments containing “wine”

final_dataset$enological_experience <- as.numeric(final_dataset$enological_experience)
total_comments <- nrow(final_dataset)
wine_comments <- sum(final_dataset$enological_experience == 1)
non_wine_comments <- sum(final_dataset$enological_experience == 0)
percentage_wine <- wine_comments / total_comments * 100

cat("The total number of comments is", total_comments, "\n")
## The total number of comments is 209792
cat("The number of comments mentioning 'wine' is", wine_comments, "\n")
## The number of comments mentioning 'wine' is 12362
cat("The number of comments that do not mention an enological experience is", non_wine_comments, "\n")
## The number of comments that do not mention an enological experience is 197430
cat("The percentage of comments mentioning 'wine' is", round(percentage_wine, 2), "%\n")
## The percentage of comments mentioning 'wine' is 5.89 %
ggplot(data.frame(category = c("Contains 'wine'", "Does not contain 'wine'"), 
                  count = c(wine_comments, total_comments - wine_comments)), 
       aes(x = category, y = count, fill = category)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  theme_minimal() +
  labs(title = "Proportion of Comments Mentioning 'Wine'", x = "Comment Type", y = "Count")

Difference between hosts and Superhosts

ggplot(final_dataset, aes(x = factor(host_is_superhost, labels = c("Host", "Superhost")), fill = factor(host_is_superhost))) +
  geom_bar(alpha = 0.7) +
  theme_minimal() +
  labs(title = "Number of Hosts vs Superhosts", x = "Superhost Status", y = "Count") +
  scale_fill_manual(values = c("red", "blue"), labels = c("Host", "Superhost"))

cat("The number of reviews for Hosts is", host_counts["0"], ", the number of reviews for Superhosts is", host_counts["1"], "\n")
## The number of reviews for Hosts is 72494 , the number of reviews for Superhosts is 137298

Comparison: enological experience among Hosts vs Superhosts

experience_counts <- table(final_dataset$enological_experience, final_dataset$host_is_superhost)
ggplot(final_dataset, aes(x = factor(enological_experience, labels = c("Does Not Offer Experience", "Offers Experience")), 
                          fill = factor(host_is_superhost, labels = c("Host", "Superhost")))) +
  geom_bar(position = "dodge", alpha = 0.7) +
  theme_minimal() +
  labs(title = "Enological Experience by Host Type", 
       x = "Enological Experience", 
       y = "Count", 
       fill = "Host Type") +
  scale_fill_manual(values = c("red", "blue"))

cat("The number of reviews that do not mention an enological experience:\n",
    "- Hosts:", experience_counts["0", "0"], "\n",
    "- Superhosts:", experience_counts["0", "1"], "\n")
## The number of reviews that do not mention an enological experience:
##  - Hosts: 68982 
##  - Superhosts: 128448
cat("The number of reviews that mention an enological experience:\n",
    "- Hosts:", experience_counts["1", "0"], "\n",
    "- Superhosts:", experience_counts["1", "1"], "\n")
## The number of reviews that mention an enological experience:
##  - Hosts: 3512 
##  - Superhosts: 8850
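
Counts alone do not show whether ratings differ across these four groups. As a hedged sketch on hypothetical toy data (same column names as `final_dataset`, values invented), group means could be tabulated like this:

``` r
library(dplyr)

# Hypothetical toy data with the same column names as final_dataset
toy_dataset <- data.frame(
  review_scores_rating  = c(4.5, 4.7, 4.6, 4.8, 4.9, 5.0, 4.8, 4.9),
  enological_experience = c(0, 0, 1, 1, 0, 0, 1, 1),
  host_is_superhost     = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Mean rating and row count for each Superhost x experience combination
group_means <- toy_dataset %>%
  group_by(host_is_superhost, enological_experience) %>%
  summarise(
    n_reviews   = n(),
    mean_rating = mean(review_scores_rating),
    .groups     = "drop"
  )

group_means
```

Comparing the difference between the experience and no-experience rows within each host type gives a first descriptive look at moderation before any model is fitted.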

Exploratory Data Analysis (EDA)

Our question:

How does offering an enological experience (e.g., wine tasting, offering a glass of wine, etc.) impact an Airbnb listing’s review score ratings, and how is this relationship moderated by Superhost status?

Data Strengths:

  • Includes key details about listings, hosts, and reviews, allowing for a comprehensive analysis of Airbnb experiences.
  • Provides structured data on Superhost status and enological experiences, enabling comparisons.
  • Standardized rating system (review_scores_rating) facilitates benchmarking across listings.

Data Weaknesses:

  • Some columns were removed or not included (e.g., host response time, room type), which may limit deeper behavioral analysis.
  • Variation in review frequency across listings may introduce bias, requiring careful interpretation.
  • Outliers detected: 685 extreme values were found in review_scores_rating using the IQR (Interquartile Range) method. These could represent unusually high or low ratings that may impact analysis.
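
The code behind the outlier count is not shown in this report. A minimal sketch of the usual IQR rule, on hypothetical ratings and with the conventional 1.5 multiplier assumed, might look like this:

``` r
# Hypothetical ratings, not taken from the dataset
toy_ratings <- c(4.9, 4.8, 4.7, 4.85, 4.95, 5.0, 4.75, 3.0, 1.0)

# Quartiles and interquartile range
q1  <- quantile(toy_ratings, 0.25, names = FALSE)
q3  <- quantile(toy_ratings, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (assumed multiplier)
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Flag values outside the fences
is_outlier <- toy_ratings < lower_fence | toy_ratings > upper_fence
sum(is_outlier)
toy_ratings[is_outlier]
```

With the quartiles reported in the summary above (4.72 and 4.93), the upper fence would be about 5.25, which is above the maximum rating of 5.00, so on that distribution the rule could only flag low ratings.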

Objectives:

This analysis aims to determine how offering an enological experience (e.g., wine tasting, offering a glass of wine) impacts an Airbnb listing’s review scores, and whether this relationship is moderated by the host’s Superhost status. By analyzing these factors, we aim to understand how hosts (both Superhosts and non-Superhosts) can leverage wine-related experiences to optimize guest satisfaction and ratings.
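
A moderation question of this kind is commonly tested with an interaction term in a linear regression. The sketch below is an assumption about how such a model could be specified, not the model used for this report. It runs on simulated data with the same column names as `final_dataset`, and the effect sizes are arbitrary.

``` r
# Hypothetical simulated data; effect sizes below are arbitrary assumptions
set.seed(1)
n <- 400
toy_dataset <- data.frame(
  enological_experience = rbinom(n, 1, 0.3),
  host_is_superhost     = rbinom(n, 1, 0.5)
)
toy_dataset$review_scores_rating <- 4.5 +
  0.05 * toy_dataset$enological_experience +
  0.20 * toy_dataset$host_is_superhost +
  0.03 * toy_dataset$enological_experience * toy_dataset$host_is_superhost +
  rnorm(n, sd = 0.2)

# `a * b` expands to a + b + a:b; the a:b coefficient is the moderation term
moderation_model <- lm(
  review_scores_rating ~ enological_experience * host_is_superhost,
  data = toy_dataset
)

# Estimates, standard errors, t values, and p values for all four terms
summary(moderation_model)$coefficients
```

The coefficient on `enological_experience:host_is_superhost` estimates how much the association between the experience and the rating changes for Superhosts. Because `review_scores_rating` repeats across all reviews of the same listing, standard errors from a review-level fit would need care.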

Findings suggest that Superhost status has a stronger association with ratings than wine offerings, highlighting the importance of consistently high-quality service. However, non-Superhosts can use enological experiences to enhance guest satisfaction, improve ratings, and increase their chances of becoming Superhosts.

Airbnb hosts can use enological experiences as a key differentiator to boost their ratings and attract more guests. Highlighting wine-related offerings in listing descriptions can make properties more appealing and improve booking rates. While Superhosts already have a credibility advantage, they can further enhance their guest experience by incorporating wine-related experiences, reinforcing their status and maintaining a competitive edge.