Correlation is a statistical measure of the extent to which two variables vary together. A positive correlation means that as one variable increases, the other tends to increase as well; a negative correlation means that as one increases, the other tends to decrease. The strength and direction of the relationship are summarized by a correlation coefficient (here Pearson's, which R's cor() computes by default), which ranges from -1 to +1. The closer the coefficient is to +1 or -1, the stronger the linear relationship; a value near 0 indicates little or no linear relationship.
Covariance, by contrast, indicates the direction of the linear relationship between two variables, but its magnitude is hard to interpret on its own. Unlike correlation, covariance is not standardized: its value can be any real number, and it scales with the units of the variables. This makes it less informative for comparing the strength of relationships across different pairs of variables.
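The link between the two measures can be illustrated with a small sketch. The vectors below are invented toy values, not taken from the Airbnb data, and the variable names are only illustrative. The sketch shows that Pearson's correlation equals the covariance divided by the product of the two standard deviations, and that changing the units of one variable rescales the covariance but leaves the correlation unchanged.

```r
# Toy data, invented purely for illustration (not from the Airbnb dataset)
x <- c(50, 80, 120, 200, 310)   # e.g. a nightly price in dollars
y <- c(90, 92, 91, 96, 95)      # e.g. a rating on a 0-100 scale

# Pearson correlation is the covariance standardized by both standard deviations
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)
print(all.equal(r_manual, r_builtin))

# Changing units (dollars -> cents) multiplies the covariance by 100 ...
x_cents   <- x * 100
cov_ratio <- cov(x_cents, y) / cov(x, y)
print(cov_ratio)

# ... while the correlation stays the same
r_cents <- cor(x_cents, y)
print(all.equal(r_cents, r_builtin))
```

This is why a large covariance by itself says little about how strong a relationship is.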
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# Merge by listing_id
merged_data <- merge(listings, reviews, by = "listing_id")
head(merged_data)
## listing_id name host_id host_since
## 1 2577 Loft for 4 by Canal Saint Martin 2827 2008-09-09
## 2 2595 Skylit Midtown Castle 2845 2008-09-09
## 3 2595 Skylit Midtown Castle 2845 2008-09-09
## 4 2595 Skylit Midtown Castle 2845 2008-09-09
## 5 2595 Skylit Midtown Castle 2845 2008-09-09
## 6 2595 Skylit Midtown Castle 2845 2008-09-09
## host_location host_response_time host_response_rate
## 1 Casablanca, Grand Casablanca, Morocco a few days or more 0.00
## 2 New York, New York, United States within a few hours 0.93
## 3 New York, New York, United States within a few hours 0.93
## 4 New York, New York, United States within a few hours 0.93
## 5 New York, New York, United States within a few hours 0.93
## 6 New York, New York, United States within a few hours 0.93
## host_acceptance_rate host_is_superhost host_total_listings_count
## 1 0.67 f 2
## 2 0.26 f 6
## 3 0.26 f 6
## 4 0.26 f 6
## 5 0.26 f 6
## 6 0.26 f 6
## host_has_profile_pic host_identity_verified neighbourhood district
## 1 t t Enclos-St-Laurent
## 2 t t Midtown Manhattan
## 3 t t Midtown Manhattan
## 4 t t Midtown Manhattan
## 5 t t Midtown Manhattan
## 6 t t Midtown Manhattan
## city latitude longitude property_type room_type accommodates
## 1 Paris 48.86993 2.36251 Entire loft Entire place 4
## 2 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 3 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 4 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 5 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 6 New York 40.75362 -73.98377 Entire apartment Entire place 2
## bedrooms
## 1 2
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## amenities
## 1 ["Heating", "TV", "Iron", "Kitchen", "Essentials", "Washer", "Dryer", "Hot water", "Hangers", "Wifi", "Long term stays allowed", "Dedicated workspace", "Host greets you"]
## 2 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 3 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 4 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 5 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 6 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## price minimum_nights maximum_nights review_scores_rating
## 1 125 3 1125 100
## 2 100 30 1125 94
## 3 100 30 1125 94
## 4 100 30 1125 94
## 5 100 30 1125 94
## 6 100 30 1125 94
## review_scores_accuracy review_scores_cleanliness review_scores_checkin
## 1 10 10 10
## 2 9 9 10
## 3 9 9 10
## 4 9 9 10
## 5 9 9 10
## 6 9 9 10
## review_scores_communication review_scores_location review_scores_value
## 1 10 10 10
## 2 10 10 9
## 3 10 10 9
## 4 10 10 9
## 5 10 10 9
## 6 10 10 9
## instant_bookable review_id date reviewer_id
## 1 t 366217274 2019-01-02 28047930
## 2 f 2022498 2012-08-18 2124102
## 3 f 334253940 2018-10-08 56872516
## 4 f 46312 2010-05-25 117113
## 5 f 487972917 2019-07-14 60181725
## 6 f 328954829 2018-09-27 203936538
str(merged_data)
## 'data.frame': 5373143 obs. of 36 variables:
## $ listing_id : int 2577 2595 2595 2595 2595 2595 2595 2595 2595 2595 ...
## $ name : chr "Loft for 4 by Canal Saint Martin" "Skylit Midtown Castle" "Skylit Midtown Castle" "Skylit Midtown Castle" ...
## $ host_id : int 2827 2845 2845 2845 2845 2845 2845 2845 2845 2845 ...
## $ host_since : chr "2008-09-09" "2008-09-09" "2008-09-09" "2008-09-09" ...
## $ host_location : chr "Casablanca, Grand Casablanca, Morocco" "New York, New York, United States" "New York, New York, United States" "New York, New York, United States" ...
## $ host_response_time : chr "a few days or more" "within a few hours" "within a few hours" "within a few hours" ...
## $ host_response_rate : num 0 0.93 0.93 0.93 0.93 0.93 0.93 0.93 0.93 0.93 ...
## $ host_acceptance_rate : num 0.67 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 ...
## $ host_is_superhost : chr "f" "f" "f" "f" ...
## $ host_total_listings_count : int 2 6 6 6 6 6 6 6 6 6 ...
## $ host_has_profile_pic : chr "t" "t" "t" "t" ...
## $ host_identity_verified : chr "t" "t" "t" "t" ...
## $ neighbourhood : chr "Enclos-St-Laurent" "Midtown" "Midtown" "Midtown" ...
## $ district : chr "" "Manhattan" "Manhattan" "Manhattan" ...
## $ city : chr "Paris" "New York" "New York" "New York" ...
## $ latitude : num 48.9 40.8 40.8 40.8 40.8 ...
## $ longitude : num 2.36 -73.98 -73.98 -73.98 -73.98 ...
## $ property_type : chr "Entire loft" "Entire apartment" "Entire apartment" "Entire apartment" ...
## $ room_type : chr "Entire place" "Entire place" "Entire place" "Entire place" ...
## $ accommodates : int 4 2 2 2 2 2 2 2 2 2 ...
## $ bedrooms : int 2 NA NA NA NA NA NA NA NA NA ...
## $ amenities : chr "[\"Heating\", \"TV\", \"Iron\", \"Kitchen\", \"Essentials\", \"Washer\", \"Dryer\", \"Hot water\", \"Hangers\","| __truncated__ "[\"Refrigerator\", \"Air conditioning\", \"Baking sheet\", \"Free street parking\", \"Bathtub\", \"Kitchen\", \"| __truncated__ "[\"Refrigerator\", \"Air conditioning\", \"Baking sheet\", \"Free street parking\", \"Bathtub\", \"Kitchen\", \"| __truncated__ "[\"Refrigerator\", \"Air conditioning\", \"Baking sheet\", \"Free street parking\", \"Bathtub\", \"Kitchen\", \"| __truncated__ ...
## $ price : int 125 100 100 100 100 100 100 100 100 100 ...
## $ minimum_nights : int 3 30 30 30 30 30 30 30 30 30 ...
## $ maximum_nights : int 1125 1125 1125 1125 1125 1125 1125 1125 1125 1125 ...
## $ review_scores_rating : int 100 94 94 94 94 94 94 94 94 94 ...
## $ review_scores_accuracy : int 10 9 9 9 9 9 9 9 9 9 ...
## $ review_scores_cleanliness : int 10 9 9 9 9 9 9 9 9 9 ...
## $ review_scores_checkin : int 10 10 10 10 10 10 10 10 10 10 ...
## $ review_scores_communication: int 10 10 10 10 10 10 10 10 10 10 ...
## $ review_scores_location : int 10 10 10 10 10 10 10 10 10 10 ...
## $ review_scores_value : int 10 9 9 9 9 9 9 9 9 9 ...
## $ instant_bookable : chr "t" "f" "f" "f" ...
## $ review_id : int 366217274 2022498 334253940 46312 487972917 328954829 20937971 15515108 17857 1293632 ...
## $ date : chr "2019-01-02" "2012-08-18" "2018-10-08" "2010-05-25" ...
## $ reviewer_id : int 28047930 2124102 56872516 117113 60181725 203936538 13460520 10781357 50679 1870771 ...
# Select variables
selected_vars <- c("price", "review_scores_rating")
# Remove NAs
analysis_dataset <- na.omit(merged_data[selected_vars])
stargazer(listings, reviews, merged_data, type = "text", title = "Summary Statistics of Listings, Reviews, and Merged Dataset")
##
## Summary Statistics of Listings, Reviews, and Merged Dataset
## =========================================================================================
## Statistic N Mean St. Dev. Min Max
## -----------------------------------------------------------------------------------------
## listing_id 279,712 26,381,955.000 14,425,759.000 2,577 48,343,530
## host_id 279,712 108,165,773.000 110,856,993.000 1,822 390,187,445
## host_response_rate 150,930 0.866 0.284 0.000 1.000
## host_acceptance_rate 166,625 0.827 0.289 0.000 1.000
## host_total_listings_count 279,547 24.582 284.041 0 7,235
## latitude 279,712 18.762 32.560 -34.264 48.905
## longitude 279,712 12.595 73.081 -99.340 151.340
## accommodates 279,712 3.289 2.133 0 16
## bedrooms 250,277 1.516 1.153 1 50
## price 279,712 608.793 3,441.827 0 625,216
## minimum_nights 279,712 8.051 31.519 1 9,999
## maximum_nights 279,712 27,558.600 7,282,875.000 1 2,147,483,647
## review_scores_rating 188,307 93.405 10.070 20 100
## review_scores_accuracy 187,999 9.565 0.991 2 10
## review_scores_cleanliness 188,047 9.313 1.146 2 10
## review_scores_checkin 187,941 9.702 0.867 2 10
## review_scores_communication 188,025 9.699 0.887 2 10
## review_scores_location 187,937 9.634 0.833 2 10
## review_scores_value 187,927 9.335 1.043 2 10
## -----------------------------------------------------------------------------------------
##
## Summary Statistics of Listings, Reviews, and Merged Dataset
## =======================================================================
## Statistic N Mean St. Dev. Min Max
## -----------------------------------------------------------------------
## listing_id 5,373,143 16,029,886.000 11,986,765.000 2,577 48,263,869
## review_id 5,373,143 348,675,319.000 206,101,858.000 282 735,623,741
## reviewer_id 5,373,143 98,081,330.000 90,805,956.000 1 390,338,478
## -----------------------------------------------------------------------
##
## Summary Statistics of Listings, Reviews, and Merged Dataset
## ===========================================================================================
## Statistic N Mean St. Dev. Min Max
## -------------------------------------------------------------------------------------------
## listing_id 5,373,143 16,029,886.000 11,986,765.000 2,577 48,263,869
## host_id 5,373,143 64,699,944.000 78,260,770.000 1,822 389,316,854
## host_response_rate 3,897,510 0.935 0.189 0.000 1.000
## host_acceptance_rate 4,624,706 0.905 0.199 0.000 1.000
## host_total_listings_count 5,369,204 6.994 25.983 0 7,235
## latitude 5,373,143 24.301 29.840 -34.264 48.904
## longitude 5,373,143 3.872 69.504 -99.340 151.340
## accommodates 5,373,143 3.449 2.023 1 16
## bedrooms 4,831,730 1.450 1.085 1 50
## price 5,373,143 395.841 2,423.261 8 300,177
## minimum_nights 5,373,143 5.927 33.363 1 9,999
## maximum_nights 5,373,143 136,727.000 16,706,862.000 1 2,147,483,647
## review_scores_rating 5,367,038 94.581 4.603 20 100
## review_scores_accuracy 5,333,880 9.729 0.516 2 10
## review_scores_cleanliness 5,334,072 9.498 0.669 2 10
## review_scores_checkin 5,333,805 9.835 0.426 2 10
## review_scores_communication 5,333,909 9.828 0.436 2 10
## review_scores_location 5,333,797 9.751 0.491 2 10
## review_scores_value 5,333,784 9.470 0.599 2 10
## review_id 5,373,143 348,675,319.000 206,101,858.000 282 735,623,741
## reviewer_id 5,373,143 98,081,330.000 90,805,956.000 1 390,338,478
## -------------------------------------------------------------------------------------------
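The tables above show 279,712 listings but 5,373,143 merged rows, because the merge produces one row per review. Price and rating are listing-level values, so in the merged data each listing is counted once per review it received. The following is a minimal sketch, on invented toy tables, of how one might collapse the merged data back to one row per listing before computing a correlation. The names listings_toy, reviews_toy, merged_toy and per_listing are hypothetical, and this is an optional check rather than the method used in this analysis.

```r
# Toy stand-ins for the real tables (all values invented for illustration)
listings_toy <- data.frame(
  listing_id = 1:5,
  price = c(100, 250, 80, 400, 150),
  review_scores_rating = c(94, 98, 90, NA, 96)
)
reviews_toy <- data.frame(
  listing_id = c(1, 1, 1, 2, 3, 4, 5, 5),
  review_id = 101:108
)

# Merging yields one row per review, so listing 1 appears three times
merged_toy <- merge(listings_toy, reviews_toy, by = "listing_id")
print(nrow(merged_toy))

# Keep the first row of each listing to get back to one row per listing
per_listing <- merged_toy[!duplicated(merged_toy$listing_id),
                          c("listing_id", "price", "review_scores_rating")]
print(nrow(per_listing))

# use = "complete.obs" drops rows with an NA in either variable
r_listing <- cor(per_listing$price, per_listing$review_scores_rating,
                 use = "complete.obs")
# Same statistic on the review-level rows, where listings are repeated
r_review <- cor(merged_toy$price, merged_toy$review_scores_rating,
                use = "complete.obs")
print(c(listing_level = r_listing, review_level = r_review))
```

Whether the listing-level or the review-level figure is the right one depends on the question being asked; the review-level version gives more weight to frequently reviewed listings.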
# Calculate correlation
correlation_result <- cor(analysis_dataset$price, analysis_dataset$review_scores_rating)
# Calculate covariance
covariance_result <- cov(analysis_dataset$price, analysis_dataset$review_scores_rating)
cat("Correlation between price and review scores rating: ", correlation_result, "\n")
## Correlation between price and review scores rating: 0.02896872
cat("Covariance between price and review scores rating: ", covariance_result, "\n")
## Covariance between price and review scores rating: 323.1574
The covariance (323.16) is positive, which indicates a positive direction, but its size mostly reflects the wide spread of prices rather than a strong relationship. The correlation of about 0.029 is the more informative figure: the linear relationship between price and review scores rating is minimal. Price alone therefore tells us very little about a listing's rating, and other factors likely play much more significant roles in the review scores of listings on platforms like Airbnb.
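The gap between the two numbers can be checked with a short calculation, since covariance equals correlation multiplied by the two standard deviations. The sketch below plugs in the correlation printed above and the rounded standard deviations of price and review_scores_rating from the merged-data summary table. Those standard deviations were computed before rows with missing values were dropped, so the result should only approximately match the reported covariance.

```r
# Values copied from the output above
r_reported   <- 0.02896872   # cor(price, review_scores_rating)
cov_reported <- 323.1574     # cov(price, review_scores_rating)
sd_price     <- 2423.261     # St. Dev. of price in the merged summary table
sd_rating    <- 4.603        # St. Dev. of review_scores_rating (rounded)

# Covariance implied by the correlation and the two standard deviations
implied_cov <- r_reported * sd_price * sd_rating
print(implied_cov)

# Difference from the reported covariance (small, due to rounding and NA handling)
print(implied_cov - cov_reported)
```

Almost all of the covariance's magnitude comes from the price standard deviation of roughly 2,400, not from any strong association between the two variables.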