Variables: price_per_min_night (calculated response) vs. minimum_nights (explanatory).
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read_csv("AB_NYC_2019.csv")
## Rows: 48895 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, host_name, neighbourhood_group, neighbourhood, room_type
## dbl (10): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## date (1): last_review
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Create the calculated column 'price_per_min_night'
data <- data %>% mutate(price_per_min_night = price / minimum_nights)
# Display a quick view of the dataset to ensure the new column is created
head(data)
## # A tibble: 6 × 17
## id name host_id host_name neighbourhood_group neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 2539 Clean & qu… 2787 John Brooklyn Kensington 40.6
## 2 2595 Skylit Mid… 2845 Jennifer Manhattan Midtown 40.8
## 3 3647 THE VILLAG… 4632 Elisabeth Manhattan Harlem 40.8
## 4 3831 Cozy Entir… 4869 LisaRoxa… Brooklyn Clinton Hill 40.7
## 5 5022 Entire Apt… 7192 Laura Manhattan East Harlem 40.8
## 6 5099 Large Cozy… 7322 Chris Manhattan Murray Hill 40.7
## # ℹ 10 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## # minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
## # reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## # availability_365 <dbl>, price_per_min_night <dbl>
Insight: Dividing price by minimum_nights gives price_per_min_night, a nightly-rate measure that is adjusted for the minimum stay a listing requires. It can show whether properties with higher minimum stay requirements typically charge more or less per required night than those with lower requirements.
Significance: Visitors with specific stay-length requirements can benefit from understanding this relationship, because it lets them anticipate the nightly cost implied by a listing's minimum stay.
Further Questions:
How does price per minimum night differ between neighborhoods?
Do different property types show different pricing patterns with
respect to minimum-night requirements?
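The two questions above could be explored with grouped summaries. The following is a minimal sketch in base R on a small made-up data frame (the `listings` object and all of its values are hypothetical stand-ins, not rows from AB_NYC_2019.csv); only the column names match the dataset.

```r
# Toy stand-in for the Airbnb data; every value here is made up for illustration.
listings <- data.frame(
  neighbourhood_group = c("Brooklyn", "Brooklyn", "Manhattan", "Manhattan", "Queens"),
  room_type           = c("Private room", "Entire home/apt", "Entire home/apt",
                          "Private room", "Private room"),
  price               = c(100, 180, 300, 120, 60),
  minimum_nights      = c(1, 3, 2, 30, 2)
)

# Same calculated column as in the analysis above
listings$price_per_min_night <- listings$price / listings$minimum_nights

# Median price per minimum night for each neighbourhood group
by_group <- aggregate(price_per_min_night ~ neighbourhood_group,
                      data = listings, FUN = median)

# Same summary, split further by room type
by_group_room <- aggregate(price_per_min_night ~ neighbourhood_group + room_type,
                           data = listings, FUN = median)

print(by_group)
print(by_group_room)
```

The median is used here rather than the mean because a few listings with very long minimum stays can pull the mean around; that choice is an assumption of this sketch, not something the analysis prescribes.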
Pair 2: Price per Review vs. Number of Reviews
Variables: number_of_reviews (explanatory) vs. price_per_review (calculated response), computed as price / number_of_reviews.
# Treat listings with zero reviews as missing (NA) to avoid dividing by zero
data <- data %>% mutate(number_of_reviews = ifelse(number_of_reviews == 0, NA, number_of_reviews))
# Create the calculated column 'price_per_review'
data <- data %>% mutate(price_per_review = price / number_of_reviews)
head(data)
## # A tibble: 6 × 18
## id name host_id host_name neighbourhood_group neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 2539 Clean & qu… 2787 John Brooklyn Kensington 40.6
## 2 2595 Skylit Mid… 2845 Jennifer Manhattan Midtown 40.8
## 3 3647 THE VILLAG… 4632 Elisabeth Manhattan Harlem 40.8
## 4 3831 Cozy Entir… 4869 LisaRoxa… Brooklyn Clinton Hill 40.7
## 5 5022 Entire Apt… 7192 Laura Manhattan East Harlem 40.8
## 6 5099 Large Cozy… 7322 Chris Manhattan Murray Hill 40.7
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## # minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
## # reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## # availability_365 <dbl>, price_per_min_night <dbl>, price_per_review <dbl>
Insights: Using this pair, we can investigate the relationship between a listing’s price and its popularity, as measured by the number of reviews. A high price per review might be indicative of either an expensive property or one that attracts few guests, possibly because of lower quality.
Significance: Analyzing this relationship makes it easier to distinguish listings that guests see as good value and review often from listings that are expensive or attract little guest feedback.
Further questions:
What is the average price per review across different neighborhoods, and
how does it relate to the listed price of the property itself?
Are there seasonal effects in the relationship between price per
review and number of reviews over time?
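The reason zero-review listings are set to NA before price_per_review is computed can be seen in a small example. This is a sketch with made-up numbers (the three toy listings are hypothetical); it shows that R returns Inf when dividing by zero, while the guarded version yields NA, which summary functions can skip.

```r
# Toy values, made up for illustration
price             <- c(150, 80, 200)
number_of_reviews <- c(10, 0, 4)

# Naive ratio: dividing a positive price by zero reviews yields Inf
naive <- price / number_of_reviews

# Guarded ratio: treat zero reviews as missing first, as in the chunk above
reviews_or_na <- ifelse(number_of_reviews == 0, NA, number_of_reviews)
guarded <- price / reviews_or_na

print(naive)
print(guarded)

# An Inf value makes the mean infinite; an NA can be dropped with na.rm = TRUE
print(mean(naive))
print(mean(guarded, na.rm = TRUE))
```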
Plot 1: price_per_min_night vs. minimum_nights
# Scatter plot for price_per_min_night vs minimum_nights
library(ggplot2)
ggplot(data, aes(x = minimum_nights, y = price_per_min_night)) +
geom_point(alpha = 0.5) +
labs(title = "Price per Minimum Night vs Minimum Nights",
x = "Minimum Nights",
y = "Price per Minimum Night") +
theme_minimal()
Observation: The distribution of nightly rates based on different minimum stay requirements is displayed in the scatter plot. Listings with higher or lower nightly rates that deviate from typical pricing trends may be identified as outliers.
Conclusion: We might infer that longer minimum stays typically have either higher or lower rates per night if a strong trend shows up. The weak correlation, however, may suggest that nightly rates are not significantly impacted by minimum stay requirements alone.
Plot 2: Price per Review vs Number of Reviews
# Replace NA values in number_of_reviews and price_per_review with 0
# (listings without reviews then count as price_per_review = 0 in everything computed below)
data <- data %>%
replace_na(list(number_of_reviews = 0, price_per_review = 0))
# Now plot again
ggplot(data, aes(x = number_of_reviews, y = price_per_review)) +
geom_point(alpha = 0.5) +
labs(title = "Price per Review vs Number of Reviews",
x = "Number of Reviews",
y = "Price per Review") +
theme_minimal()
Observation: An increasing trend would indicate that properties with
more reviews command higher prices per review, suggesting stronger
demand. Because price_per_review divides by number_of_reviews, however,
the ratio tends to shrink as reviews accumulate.
Conclusion: Outliers could point to listings with a low number of
reviews but high prices, which calls for further investigation into the
reasons behind those properties’ asking prices.
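One way to follow up on such outliers is to flag them with a simple rule before inspecting them by hand. The sketch below uses the common 1.5 × IQR convention on made-up values; both the rule and the numbers are assumptions of this example, not part of the original analysis.

```r
# Toy values, made up for illustration
price_per_review <- c(2, 3, 4, 5, 6, 7, 8, 150)

# First and third quartiles (R's default quantile method)
q <- unname(quantile(price_per_review, probs = c(0.25, 0.75), na.rm = TRUE))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Flag values outside the fences
is_outlier <- price_per_review < lower | price_per_review > upper
print(price_per_review[is_outlier])
```

Applied to the real data, the flagged rows could then be joined back to columns such as neighbourhood_group or room_type to see what the unusual listings have in common.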
Correlation for Price per minimum night vs. Minimum Nights
cor_price_per_min_night <- cor(data$price_per_min_night, data$minimum_nights, use = "complete.obs")
cor_price_per_min_night
## [1] -0.1053576
Insights
Weak Negative Relationship: This slightly negative correlation suggests
that the nightly rate tends to drop a little as the minimum stay
required rises. This might be the result of hosts lowering the nightly
charge in an effort to attract longer-term visitors and encourage
longer reservations.
Significance
Budgeting for Guests: Guests who are planning longer stays may find
that listings with longer minimum stays offer a slightly better nightly
rate.
Host Pricing Strategy: To stay competitive and attract longer
bookings, hosts may use this information to adjust their pricing
strategy, for example by offering discounted nightly rates for
properties with higher minimum night requirements.
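A Pearson coefficient only captures linear association, so a weak value does not rule out a consistent but curved relationship. As a hedged sketch on made-up numbers (the two toy vectors are hypothetical), the rank-based Spearman coefficient and `cor.test()` can be used alongside `cor()` to check this and to attach a confidence interval to the estimate.

```r
# Toy values, made up for illustration: the rate falls steadily but not linearly
minimum_nights      <- c(1, 1, 2, 3, 5, 7, 14, 30)
price_per_min_night <- c(120, 90, 60, 45, 30, 20, 9, 4)

# Pearson measures linear association; Spearman measures monotone association
pearson  <- cor(minimum_nights, price_per_min_night, method = "pearson")
spearman <- cor(minimum_nights, price_per_min_night, method = "spearman")
print(c(pearson = pearson, spearman = spearman))

# cor.test() adds a p-value and a confidence interval for the Pearson coefficient
pearson_test <- cor.test(minimum_nights, price_per_min_night, method = "pearson")
print(pearson_test$conf.int)
print(pearson_test$p.value)
```

In this toy case the Spearman coefficient is much closer to -1 than the Pearson coefficient, which is the pattern to look for if the real relationship is monotone but not linear.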
Correlation for Price per Review vs Number of Reviews
cor_price_per_review_reviews <- cor(data$price_per_review, data$number_of_reviews, use = "complete.obs")
cor_price_per_review_reviews
## [1] -0.128673
Insights: Price per review and number of reviews have a weak negative
relationship, as indicated by the correlation coefficient of -0.1287.
This implies that the price per review tends to fall a little as the
number of reviews increases. Part of this is expected, since the number
of reviews is the denominator of price per review. It may also suggest
that listings with a higher number of reviews are seen as offering
better value.
Significance: This finding raises questions about the relationship
between cost and guest satisfaction. Listings with a high number of
reviews yet a low price per review could be more popular or offer
better value, attracting more visitors and reviews in the process.
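Because number_of_reviews appears in the denominator of price_per_review, some negative correlation between the two is expected even when price and review count are unrelated. The simulation below is a sketch of that effect under stated assumptions: the sample size, the uniform review counts, and the log-normal prices are all arbitrary choices made for illustration, not estimates from the dataset.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n <- 5000
# Simulated review counts and prices, drawn independently of each other
number_of_reviews <- sample(1:200, n, replace = TRUE)
price <- rlnorm(n, meanlog = log(120), sdlog = 0.5)

# Price and review count are unrelated by construction
r_raw <- cor(price, number_of_reviews)

# Dividing by the review count induces a negative correlation with it
price_per_review <- price / number_of_reviews
r_ratio <- cor(price_per_review, number_of_reviews)

print(c(raw = r_raw, ratio = r_ratio))
```

A correlation between a ratio and its own denominator should therefore be compared against this kind of baseline before it is read as evidence about value or guest satisfaction.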
mean_price_per_min_night <- mean(data$price_per_min_night, na.rm = TRUE)
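# Standard error uses the count of non-missing values, matching na.rm = TRUE above
se_price_per_min_night <- sd(data$price_per_min_night, na.rm = TRUE) / sqrt(sum(!is.na(data$price_per_min_night)))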
# 95% Confidence Interval
ci_price_per_min_night <- c(
mean_price_per_min_night - 1.96 * se_price_per_min_night,
mean_price_per_min_night + 1.96 * se_price_per_min_night
)
ci_price_per_min_night
## [1] 68.77712 71.57138
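The interval above uses the normal critical value 1.96. As a hedged alternative, the sketch below wraps the calculation in a helper that counts only non-missing values and uses a t critical value, then checks it against `t.test()`. The name `ci_mean` and the toy vector are hypothetical and not part of the original analysis; with tens of thousands of rows the t and normal intervals are practically identical.

```r
# Confidence interval for a mean, ignoring missing values.
# ci_mean is a hypothetical helper name introduced for this sketch.
ci_mean <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                            # drop missing values first
  n <- length(x)                               # n counts non-missing values only
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  crit <- qt(1 - (1 - level) / 2, df = n - 1)  # t critical value
  c(lower = mean(x) - crit * se, upper = mean(x) + crit * se)
}

# Toy values, made up for illustration (one missing)
x <- c(50, 60, NA, 70, 80, 90)

print(ci_mean(x))

# Cross-check against the interval reported by t.test()
print(t.test(x[!is.na(x)])$conf.int)
```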
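Insights: The 95% confidence interval for the mean price per minimum night runs from about 68.78 to 71.57, so the average is estimated at roughly 70.2.
Significance: The interval is narrow because the sample is large (48,895 listings). It describes the uncertainty in the mean, not the spread of individual listings.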
# Calculate mean and standard error
mean_price_per_review <- mean(data$price_per_review, na.rm = TRUE)
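# Standard error uses the count of non-missing values, matching na.rm = TRUE above
se_price_per_review <- sd(data$price_per_review, na.rm = TRUE) / sqrt(sum(!is.na(data$price_per_review)))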
# Confidence interval (95%)
ci_price_per_review <- c(mean_price_per_review - 1.96 * se_price_per_review, mean_price_per_review + 1.96 * se_price_per_review)
ci_price_per_review
## [1] 29.42625 31.26067
Insights:
Perceived Value: The mean price per review lies between about 29.43
and 31.26 with 95% confidence. A lower cost per review can mean that
visitors feel they are receiving a good experience for their money.
Quality Indicator: Listings with a higher number of reviews and a lower
price per review might be drawing in more customers through
competitive pricing and high-quality service, creating a positive
feedback loop of further bookings and reviews.
Significance:
Guest Expectations: The price per review indicator gives guests an
idea of what to anticipate in terms of accommodation quality for the
amount they are paying.
Competitive Pricing: To determine whether their prices are reasonable
given the quality of service they offer, hosts should compare their
price per review to the average.