Correlation: A measure of the strength and direction of the linear relationship between two variables. It is positive when the variables tend to move in the same direction and negative when they tend to move in opposite directions. The coefficient ranges from -1 to +1: values near -1 or +1 indicate a strong relationship, and values near 0 indicate a weak or absent linear relationship. Correlation describes association and does not imply causation.
Covariance: A measure of how two variables vary together. Its sign gives the direction of the relationship: positive when the variables tend to move in the same direction, negative when they tend to move in opposite directions. Just as variance measures the variability of a single variable, covariance measures the joint variability of two. Unlike correlation, its magnitude depends on the units of the variables, so it is not directly comparable across different pairs of variables.
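The link between the two measures can be illustrated with a minimal sketch. The vectors `x` and `y` below are made-up toy values, not taken from the Airbnb files. The sketch relies on the standard identity that the Pearson correlation equals the covariance divided by the product of the two standard deviations, and it shows that rescaling one variable changes the covariance but not the correlation.

```r
# Toy data (hypothetical values, not taken from Listings.csv or Reviews.csv)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

cov_xy <- cov(x, y)   # joint variability, in units of x times units of y
cor_xy <- cor(x, y)   # unit-free Pearson correlation

# Pearson correlation is the covariance divided by both standard deviations
cor_from_cov <- cov_xy / (sd(x) * sd(y))

# Rescaling x by 100 multiplies the covariance by 100
# but leaves the correlation unchanged
cov_scaled <- cov(100 * x, y)
cor_scaled <- cor(100 * x, y)

print(c(cov_xy, cov_scaled))
print(c(cor_xy, cor_from_cov, cor_scaled))
```

This is why the correlation is usually the easier number to interpret, while the covariance is mainly useful as an intermediate quantity.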
# Load the data
Listings <- read.csv("Listings.csv")
Reviews <- read.csv("Reviews.csv")
# Merge the datasets into one (all = TRUE is a full outer join: unmatched rows from either table are kept and padded with NA)
merged_data <- merge(Listings, Reviews, by = "listing_id", all = TRUE)
summary(merged_data)
## listing_id name host_id host_since
## Min. : 2577 Length:5459299 Min. : 1822 Length:5459299
## 1st Qu.: 5425967 Class :character 1st Qu.: 8939609 Class :character
## Median :14746073 Mode :character Median : 31142417 Mode :character
## Mean :16278585 Mean : 65862881
## 3rd Qu.:24504409 3rd Qu.: 96238999
## Max. :48343530 Max. :390187445
##
## host_location host_response_time host_response_rate host_acceptance_rate
## Length:5459299 Length:5459299 Min. :0.0 Min. :0.0
## Class :character Class :character 1st Qu.:1.0 1st Qu.:0.9
## Mode :character Mode :character Median :1.0 Median :1.0
## Mean :0.9 Mean :0.9
## 3rd Qu.:1.0 3rd Qu.:1.0
## Max. :1.0 Max. :1.0
## NA's :1523184 NA's :798117
## host_is_superhost host_total_listings_count host_has_profile_pic
## Length:5459299 Min. : 0.000 Length:5459299
## Class :character 1st Qu.: 1.000 Class :character
## Mode :character Median : 2.000 Mode :character
## Mean : 7.804
## 3rd Qu.: 5.000
## Max. :7235.000
## NA's :3992
## host_identity_verified neighbourhood district
## Length:5459299 Length:5459299 Length:5459299
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## city latitude longitude property_type
## Length:5459299 Min. :-34.26 Min. :-99.340 Length:5459299
## Class :character 1st Qu.: 13.76 1st Qu.:-43.377 Class :character
## Mode :character Median : 40.77 Median : 2.377 Mode :character
## Mean : 24.19 Mean : 4.104
## 3rd Qu.: 41.91 3rd Qu.: 18.394
## Max. : 48.90 Max. :151.340
##
## room_type accommodates bedrooms amenities
## Length:5459299 Min. : 0.000 Min. : 1.0 Length:5459299
## Class :character 1st Qu.: 2.000 1st Qu.: 1.0 Class :character
## Mode :character Median : 3.000 Median : 1.0 Mode :character
## Mean : 3.445 Mean : 1.5
## 3rd Qu.: 4.000 3rd Qu.: 2.0
## Max. :16.000 Max. :50.0
## NA's :550476
## price minimum_nights maximum_nights review_scores_rating
## Min. : 0.0 Min. : 1.000 Min. :1.000e+00 Min. : 20.00
## 1st Qu.: 67.0 1st Qu.: 1.000 1st Qu.:4.500e+01 1st Qu.: 93.00
## Median : 116.0 Median : 2.000 Median :1.125e+03 Median : 96.00
## Mean : 403.9 Mean : 5.989 Mean :1.346e+05 Mean : 94.58
## 3rd Qu.: 334.0 3rd Qu.: 3.000 3rd Qu.:1.125e+03 3rd Qu.: 98.00
## Max. :625216.0 Max. :9999.000 Max. :2.147e+09 Max. :100.00
## NA's :92260
## review_scores_accuracy review_scores_cleanliness review_scores_checkin
## Min. : 2.00 Min. : 2.0 Min. : 2.00
## 1st Qu.:10.00 1st Qu.: 9.0 1st Qu.:10.00
## Median :10.00 Median :10.0 Median :10.00
## Mean : 9.73 Mean : 9.5 Mean : 9.83
## 3rd Qu.:10.00 3rd Qu.:10.0 3rd Qu.:10.00
## Max. :10.00 Max. :10.0 Max. :10.00
## NA's :125419 NA's :125227 NA's :125494
## review_scores_communication review_scores_location review_scores_value
## Min. : 2.00 Min. : 2.00 Min. : 2.00
## 1st Qu.:10.00 1st Qu.:10.00 1st Qu.: 9.00
## Median :10.00 Median :10.00 Median :10.00
## Mean : 9.83 Mean : 9.75 Mean : 9.47
## 3rd Qu.:10.00 3rd Qu.:10.00 3rd Qu.:10.00
## Max. :10.00 Max. :10.00 Max. :10.00
## NA's :125390 NA's :125502 NA's :125515
## instant_bookable review_id date reviewer_id
## Length:5459299 Min. : 282 Length:5459299 Min. : 1
## Class :character 1st Qu.:166643479 Class :character 1st Qu.: 23902058
## Mode :character Median :342572666 Mode :character Median : 66978139
## Mean :348675319 Mean : 98081330
## 3rd Qu.:533404482 3rd Qu.:152893599
## Max. :735623741 Max. :390338478
## NA's :86156 NA's :86156
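The merged table has one row per review rather than one row per listing, which affects how the summary should be read. The sketch below uses two hypothetical miniature tables (`listings_toy` and `reviews_toy`, with made-up values) to show what `merge(..., all = TRUE)` does: listings with several reviews are repeated, and listings with no review are kept with `NA` in the review columns. The missing `review_id` values in the summary above are consistent with that second case.

```r
# Hypothetical miniature tables; column names mirror the real files,
# the values are made up for illustration
listings_toy <- data.frame(listing_id = c(1, 2, 3),
                           price = c(50, 80, 120))
reviews_toy <- data.frame(listing_id = c(1, 1, 1, 2),
                          review_id = c(101, 102, 103, 104))

# all = TRUE keeps unmatched rows from both sides (full outer join)
merged_toy <- merge(listings_toy, reviews_toy, by = "listing_id", all = TRUE)
print(merged_toy)

# Listing 1 now appears three times, so listing-level statistics
# computed on the merged table are weighted by review count
print(mean(listings_toy$price))  # one row per listing
print(mean(merged_toy$price))    # one row per review, plus unreviewed listings
```

Under this assumption, listing-level quantities such as `price` are better summarised from `Listings` directly, while the merged table suits review-level questions.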
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(merged_data, type = "text",
title = "Summary Statistics")
##
## Summary Statistics
## ===========================================================================================
## Statistic N Mean St. Dev. Min Max
## -------------------------------------------------------------------------------------------
## listing_id 5,459,299 16,278,585.000 12,188,827.000 2,577 48,343,530
## host_id 5,459,299 65,862,881.000 79,734,618.000 1,822 390,187,445
## host_response_rate 3,936,115 0.934 0.192 0.000 1.000
## host_acceptance_rate 4,661,182 0.903 0.201 0.000 1.000
## host_total_listings_count 5,455,307 7.804 68.330 0 7,235
## latitude 5,459,299 24.187 29.890 -34.264 48.905
## longitude 5,459,299 4.104 69.585 -99.340 151.340
## accommodates 5,459,299 3.445 2.027 0 16
## bedrooms 4,908,823 1.452 1.090 1 50
## price 5,459,299 403.920 2,495.727 0 625,216
## minimum_nights 5,459,299 5.989 33.308 1 9,999
## maximum_nights 5,459,299 134,584.200 16,574,518.000 1 2,147,483,647
## review_scores_rating 5,367,039 94.581 4.603 20 100
## review_scores_accuracy 5,333,880 9.729 0.516 2 10
## review_scores_cleanliness 5,334,072 9.498 0.669 2 10
## review_scores_checkin 5,333,805 9.835 0.426 2 10
## review_scores_communication 5,333,909 9.828 0.436 2 10
## review_scores_location 5,333,797 9.751 0.491 2 10
## review_scores_value 5,333,784 9.470 0.599 2 10
## review_id 5,373,143 348,675,319.000 206,101,858.000 282 735,623,741
## reviewer_id 5,373,143 98,081,330.000 90,805,956.000 1 390,338,478
## -------------------------------------------------------------------------------------------
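# Pearson correlation and covariance between the two host rates, computed on
# Listings (one row per listing); rows with an NA in either column are dropped
correlation <- cor(Listings$host_response_rate, Listings$host_acceptance_rate, use = "complete.obs")
covariance <- cov(Listings$host_response_rate, Listings$host_acceptance_rate, use = "complete.obs")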
print(correlation)
## [1] 0.3215103
print(covariance)
## [1] 0.02118452
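Both rate columns contain missing values, so the `use = "complete.obs"` argument matters. The following sketch, built on hypothetical vectors (`response_rate` and `acceptance_rate`, with made-up values), shows what the argument does: without it a single `NA` makes the result `NA`, and with it only rows where both values are present enter the calculation.

```r
# Hypothetical rates with missing values (not taken from Listings.csv)
response_rate   <- c(1.0, 0.9, NA, 0.5, 1.0, 0.8)
acceptance_rate <- c(0.9, 1.0, 0.7, NA, 1.0, 0.6)

# With the default use = "everything", any NA makes the result NA
cor_default <- cor(response_rate, acceptance_rate)
print(cor_default)

# use = "complete.obs" keeps only rows where both values are present
cor_complete <- cor(response_rate, acceptance_rate, use = "complete.obs")

# The same calculation done by hand with complete.cases()
keep <- complete.cases(response_rate, acceptance_rate)
cor_manual <- cor(response_rate[keep], acceptance_rate[keep])

print(sum(keep))                           # number of rows actually used
print(all.equal(cor_complete, cor_manual)) # both approaches agree
```

Reporting `sum(keep)` alongside the coefficient makes it clear how many listings the estimate is based on.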