Q1.

Correlation is a statistical measure of the extent to which two variables move together. A positive correlation means that as one variable increases, the other tends to increase as well. Conversely, a negative correlation means that as one variable increases, the other tends to decrease. The strength and direction of a linear relationship are typically measured by the Pearson correlation coefficient, which ranges from -1 to +1. The closer the coefficient is to +1 or -1, the stronger the linear relationship; a value near 0 indicates little or no linear relationship.
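As a minimal illustration of the two endpoints of that range, the sketch below uses small made-up vectors (`x`, `y_up`, and `y_down` are hypothetical and not taken from the Airbnb data). An exactly linear increasing relationship should give a coefficient of +1 and an exactly linear decreasing one should give -1, up to floating-point rounding.

```r
# Toy vectors, invented for illustration only
x      <- 1:10
y_up   <- 2 * x + 1    # rises exactly linearly with x
y_down <- -3 * x + 5   # falls exactly linearly with x

cor(x, y_up)    # perfect positive linear relationship
cor(x, y_down)  # perfect negative linear relationship
```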

Q2.

Covariance, on the other hand, is a measure that indicates the direction of the linear relationship between two variables. Unlike correlation, covariance is not standardized. Therefore, its value can range from negative to positive infinity, and it depends on the units of the variables. This makes it less informative for comparing the relationship strengths across different pairs of variables.
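The unit dependence can be seen in a small sketch. The vectors below are invented for illustration (they are not the Airbnb data): expressing the same prices in cents instead of dollars multiplies the covariance by 100, while the correlation stays the same.

```r
# Toy data, invented for illustration only
price_dollars <- c(100, 150, 200, 250, 300)
rating        <- c(90, 92, 95, 94, 98)

price_cents <- price_dollars * 100  # same prices, different unit

cov_dollars <- cov(price_dollars, rating)
cov_cents   <- cov(price_cents, rating)  # 100 times cov_dollars

cor_dollars <- cor(price_dollars, rating)
cor_cents   <- cor(price_cents, rating)  # same as cor_dollars

c(cov_dollars = cov_dollars, cov_cents = cov_cents)
c(cor_dollars = cor_dollars, cor_cents = cor_cents)
```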

Q3.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# Merge by listing_id (inner join: one row per review, with the listing columns repeated)
merged_data <- merge(listings, reviews, by = "listing_id")

head(merged_data)
##   listing_id                             name host_id host_since
## 1       2577 Loft for 4 by Canal Saint Martin    2827 2008-09-09
## 2       2595            Skylit Midtown Castle    2845 2008-09-09
## 3       2595            Skylit Midtown Castle    2845 2008-09-09
## 4       2595            Skylit Midtown Castle    2845 2008-09-09
## 5       2595            Skylit Midtown Castle    2845 2008-09-09
## 6       2595            Skylit Midtown Castle    2845 2008-09-09
##                           host_location host_response_time host_response_rate
## 1 Casablanca, Grand Casablanca, Morocco a few days or more               0.00
## 2     New York, New York, United States within a few hours               0.93
## 3     New York, New York, United States within a few hours               0.93
## 4     New York, New York, United States within a few hours               0.93
## 5     New York, New York, United States within a few hours               0.93
## 6     New York, New York, United States within a few hours               0.93
##   host_acceptance_rate host_is_superhost host_total_listings_count
## 1                 0.67                 f                         2
## 2                 0.26                 f                         6
## 3                 0.26                 f                         6
## 4                 0.26                 f                         6
## 5                 0.26                 f                         6
## 6                 0.26                 f                         6
##   host_has_profile_pic host_identity_verified     neighbourhood  district
## 1                    t                      t Enclos-St-Laurent          
## 2                    t                      t           Midtown Manhattan
## 3                    t                      t           Midtown Manhattan
## 4                    t                      t           Midtown Manhattan
## 5                    t                      t           Midtown Manhattan
## 6                    t                      t           Midtown Manhattan
##       city latitude longitude    property_type    room_type accommodates
## 1    Paris 48.86993   2.36251      Entire loft Entire place            4
## 2 New York 40.75362 -73.98377 Entire apartment Entire place            2
## 3 New York 40.75362 -73.98377 Entire apartment Entire place            2
## 4 New York 40.75362 -73.98377 Entire apartment Entire place            2
## 5 New York 40.75362 -73.98377 Entire apartment Entire place            2
## 6 New York 40.75362 -73.98377 Entire apartment Entire place            2
##   bedrooms
## 1        2
## 2       NA
## 3       NA
## 4       NA
## 5       NA
## 6       NA
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            amenities
## 1                                                                                                                                                                                                                                                                                                                                                                         ["Heating", "TV", "Iron", "Kitchen", "Essentials", "Washer", "Dryer", "Hot water", "Hangers", "Wifi", "Long term stays allowed", "Dedicated workspace", "Host greets you"]
## 2 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 3 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 4 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 5 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 6 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
##   price minimum_nights maximum_nights review_scores_rating
## 1   125              3           1125                  100
## 2   100             30           1125                   94
## 3   100             30           1125                   94
## 4   100             30           1125                   94
## 5   100             30           1125                   94
## 6   100             30           1125                   94
##   review_scores_accuracy review_scores_cleanliness review_scores_checkin
## 1                     10                        10                    10
## 2                      9                         9                    10
## 3                      9                         9                    10
## 4                      9                         9                    10
## 5                      9                         9                    10
## 6                      9                         9                    10
##   review_scores_communication review_scores_location review_scores_value
## 1                          10                     10                  10
## 2                          10                     10                   9
## 3                          10                     10                   9
## 4                          10                     10                   9
## 5                          10                     10                   9
## 6                          10                     10                   9
##   instant_bookable review_id       date reviewer_id
## 1                t 366217274 2019-01-02    28047930
## 2                f   2022498 2012-08-18     2124102
## 3                f 334253940 2018-10-08    56872516
## 4                f     46312 2010-05-25      117113
## 5                f 487972917 2019-07-14    60181725
## 6                f 328954829 2018-09-27   203936538
str(merged_data)
## 'data.frame':    5373143 obs. of  36 variables:
##  $ listing_id                 : int  2577 2595 2595 2595 2595 2595 2595 2595 2595 2595 ...
##  $ name                       : chr  "Loft for 4 by Canal Saint Martin" "Skylit Midtown Castle" "Skylit Midtown Castle" "Skylit Midtown Castle" ...
##  $ host_id                    : int  2827 2845 2845 2845 2845 2845 2845 2845 2845 2845 ...
##  $ host_since                 : chr  "2008-09-09" "2008-09-09" "2008-09-09" "2008-09-09" ...
##  $ host_location              : chr  "Casablanca, Grand Casablanca, Morocco" "New York, New York, United States" "New York, New York, United States" "New York, New York, United States" ...
##  $ host_response_time         : chr  "a few days or more" "within a few hours" "within a few hours" "within a few hours" ...
##  $ host_response_rate         : num  0 0.93 0.93 0.93 0.93 0.93 0.93 0.93 0.93 0.93 ...
##  $ host_acceptance_rate       : num  0.67 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 ...
##  $ host_is_superhost          : chr  "f" "f" "f" "f" ...
##  $ host_total_listings_count  : int  2 6 6 6 6 6 6 6 6 6 ...
##  $ host_has_profile_pic       : chr  "t" "t" "t" "t" ...
##  $ host_identity_verified     : chr  "t" "t" "t" "t" ...
##  $ neighbourhood              : chr  "Enclos-St-Laurent" "Midtown" "Midtown" "Midtown" ...
##  $ district                   : chr  "" "Manhattan" "Manhattan" "Manhattan" ...
##  $ city                       : chr  "Paris" "New York" "New York" "New York" ...
##  $ latitude                   : num  48.9 40.8 40.8 40.8 40.8 ...
##  $ longitude                  : num  2.36 -73.98 -73.98 -73.98 -73.98 ...
##  $ property_type              : chr  "Entire loft" "Entire apartment" "Entire apartment" "Entire apartment" ...
##  $ room_type                  : chr  "Entire place" "Entire place" "Entire place" "Entire place" ...
##  $ accommodates               : int  4 2 2 2 2 2 2 2 2 2 ...
##  $ bedrooms                   : int  2 NA NA NA NA NA NA NA NA NA ...
##  $ amenities                  : chr  "[\"Heating\", \"TV\", \"Iron\", \"Kitchen\", \"Essentials\", \"Washer\", \"Dryer\", \"Hot water\", \"Hangers\","| __truncated__ "[\"Refrigerator\", \"Air conditioning\", \"Baking sheet\", \"Free street parking\", \"Bathtub\", \"Kitchen\", \"| __truncated__ "[\"Refrigerator\", \"Air conditioning\", \"Baking sheet\", \"Free street parking\", \"Bathtub\", \"Kitchen\", \"| __truncated__ "[\"Refrigerator\", \"Air conditioning\", \"Baking sheet\", \"Free street parking\", \"Bathtub\", \"Kitchen\", \"| __truncated__ ...
##  $ price                      : int  125 100 100 100 100 100 100 100 100 100 ...
##  $ minimum_nights             : int  3 30 30 30 30 30 30 30 30 30 ...
##  $ maximum_nights             : int  1125 1125 1125 1125 1125 1125 1125 1125 1125 1125 ...
##  $ review_scores_rating       : int  100 94 94 94 94 94 94 94 94 94 ...
##  $ review_scores_accuracy     : int  10 9 9 9 9 9 9 9 9 9 ...
##  $ review_scores_cleanliness  : int  10 9 9 9 9 9 9 9 9 9 ...
##  $ review_scores_checkin      : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ review_scores_communication: int  10 10 10 10 10 10 10 10 10 10 ...
##  $ review_scores_location     : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ review_scores_value        : int  10 9 9 9 9 9 9 9 9 9 ...
##  $ instant_bookable           : chr  "t" "f" "f" "f" ...
##  $ review_id                  : int  366217274 2022498 334253940 46312 487972917 328954829 20937971 15515108 17857 1293632 ...
##  $ date                       : chr  "2019-01-02" "2012-08-18" "2018-10-08" "2010-05-25" ...
##  $ reviewer_id                : int  28047930 2124102 56872516 117113 60181725 203936538 13460520 10781357 50679 1870771 ...
# Select variables
selected_vars <- c("price", "review_scores_rating")

# Remove NAs
analysis_dataset <- na.omit(merged_data[selected_vars])
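Because the merge produces one row per review, each listing's price and rating appear once for every review it has, so heavily reviewed listings carry more weight in any statistic computed on `analysis_dataset`. If a listing-level analysis were wanted instead, one possible approach is to keep a single row per `listing_id` before dropping NAs. The sketch below shows the idea on toy data; `toy_listings` and `toy_reviews` are hypothetical stand-ins, not the real data frames.

```r
# Toy stand-ins for listings and reviews (hypothetical values)
toy_listings <- data.frame(
  listing_id           = c(1, 2, 3),
  price                = c(100, 200, 50),
  review_scores_rating = c(94, NA, 88)
)
toy_reviews <- data.frame(
  listing_id = c(1, 1, 1, 3),
  review_id  = c(11, 12, 13, 14)
)

# Inner join: one row per review, listing columns repeated
toy_merged <- merge(toy_listings, toy_reviews, by = "listing_id")
nrow(toy_merged)

# Keep the first row for each listing_id, then drop NAs as before
toy_listing_level <- toy_merged[!duplicated(toy_merged$listing_id), ]
toy_analysis <- na.omit(toy_listing_level[c("price", "review_scores_rating")])
nrow(toy_analysis)
```

Whether the review-level or the listing-level view is the right one depends on the question being asked; this is only an alternative to consider.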

Q4.

stargazer(listings, reviews, merged_data, type = "text", title = "Summary Statistics of Listings, Reviews, and Merged Dataset")
## 
## Summary Statistics of Listings, Reviews, and Merged Dataset
## =========================================================================================
## Statistic                      N         Mean          St. Dev.       Min        Max     
## -----------------------------------------------------------------------------------------
## listing_id                  279,712 26,381,955.000  14,425,759.000   2,577   48,343,530  
## host_id                     279,712 108,165,773.000 110,856,993.000  1,822   390,187,445 
## host_response_rate          150,930      0.866           0.284       0.000      1.000    
## host_acceptance_rate        166,625      0.827           0.289       0.000      1.000    
## host_total_listings_count   279,547     24.582          284.041        0        7,235    
## latitude                    279,712     18.762          32.560      -34.264    48.905    
## longitude                   279,712     12.595          73.081      -99.340    151.340   
## accommodates                279,712      3.289           2.133         0         16      
## bedrooms                    250,277      1.516           1.153         1         50      
## price                       279,712     608.793        3,441.827       0       625,216   
## minimum_nights              279,712      8.051          31.519         1        9,999    
## maximum_nights              279,712   27,558.600     7,282,875.000     1    2,147,483,647
## review_scores_rating        188,307     93.405          10.070        20         100     
## review_scores_accuracy      187,999      9.565           0.991         2         10      
## review_scores_cleanliness   188,047      9.313           1.146         2         10      
## review_scores_checkin       187,941      9.702           0.867         2         10      
## review_scores_communication 188,025      9.699           0.887         2         10      
## review_scores_location      187,937      9.634           0.833         2         10      
## review_scores_value         187,927      9.335           1.043         2         10      
## -----------------------------------------------------------------------------------------
## 
## Summary Statistics of Listings, Reviews, and Merged Dataset
## =======================================================================
## Statistic       N          Mean          St. Dev.      Min      Max    
## -----------------------------------------------------------------------
## listing_id  5,373,143 16,029,886.000  11,986,765.000  2,577 48,263,869 
## review_id   5,373,143 348,675,319.000 206,101,858.000  282  735,623,741
## reviewer_id 5,373,143 98,081,330.000  90,805,956.000    1   390,338,478
## -----------------------------------------------------------------------
## 
## Summary Statistics of Listings, Reviews, and Merged Dataset
## ===========================================================================================
## Statistic                       N          Mean          St. Dev.       Min        Max     
## -------------------------------------------------------------------------------------------
## listing_id                  5,373,143 16,029,886.000  11,986,765.000   2,577   48,263,869  
## host_id                     5,373,143 64,699,944.000  78,260,770.000   1,822   389,316,854 
## host_response_rate          3,897,510      0.935           0.189       0.000      1.000    
## host_acceptance_rate        4,624,706      0.905           0.199       0.000      1.000    
## host_total_listings_count   5,369,204      6.994          25.983         0        7,235    
## latitude                    5,373,143     24.301          29.840      -34.264    48.904    
## longitude                   5,373,143      3.872          69.504      -99.340    151.340   
## accommodates                5,373,143      3.449           2.023         1         16      
## bedrooms                    4,831,730      1.450           1.085         1         50      
## price                       5,373,143     395.841        2,423.261       8       300,177   
## minimum_nights              5,373,143      5.927          33.363         1        9,999    
## maximum_nights              5,373,143   136,727.000   16,706,862.000     1    2,147,483,647
## review_scores_rating        5,367,038     94.581           4.603        20         100     
## review_scores_accuracy      5,333,880      9.729           0.516         2         10      
## review_scores_cleanliness   5,334,072      9.498           0.669         2         10      
## review_scores_checkin       5,333,805      9.835           0.426         2         10      
## review_scores_communication 5,333,909      9.828           0.436         2         10      
## review_scores_location      5,333,797      9.751           0.491         2         10      
## review_scores_value         5,333,784      9.470           0.599         2         10      
## review_id                   5,373,143 348,675,319.000 206,101,858.000   282    735,623,741 
## reviewer_id                 5,373,143 98,081,330.000  90,805,956.000     1     390,338,478 
## -------------------------------------------------------------------------------------------

Q5.

# Calculate correlation
correlation_result <- cor(analysis_dataset$price, analysis_dataset$review_scores_rating)

# Calculate covariance
covariance_result <- cov(analysis_dataset$price, analysis_dataset$review_scores_rating)

cat("Correlation between price and review scores rating: ", correlation_result, "\n")
## Correlation between price and review scores rating:  0.02896872
cat("Covariance between price and review scores rating: ", covariance_result, "\n")
## Covariance between price and review scores rating:  323.1574

Correlation Interpretation

  • Value: The correlation coefficient of 0.029 is positive but very close to zero, indicating almost no linear association between the price of a listing and its review scores rating.
  • Meaning: Knowing a listing's price tells us very little about how highly it is rated. In this data, higher-priced listings are not rated noticeably higher than lower-priced ones. Note that the coefficient only captures linear association; it says nothing about causation.

Covariance Interpretation

  • Value: The positive covariance of 323.1574 indicates only the direction of the relationship: prices and review scores tend to lie on the same side of their respective means slightly more often than on opposite sides.
  • Meaning: The magnitude of the covariance cannot be read as strength, because it is not standardized and depends on the units and spread of both variables. Price is extremely dispersed (its standard deviation in the merged data is about 2,423), so even a negligible association produces a covariance in the hundreds. The correlation coefficient, which removes this scale effect, shows that the relationship is very weak.
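As a rough consistency check, the Pearson correlation equals the covariance divided by the product of the two standard deviations. The sketch below plugs in the figures reported above. The standard deviations are copied from the merged-data summary table in Q4, which was computed before rows with missing values were dropped, so the result is only approximate.

```r
# Values copied from the output above
covariance_reported <- 323.1574
sd_price  <- 2423.261  # St. Dev. of price in the merged summary table
sd_rating <- 4.603     # St. Dev. of review_scores_rating in the merged summary table

# Pearson correlation = covariance / (sd_x * sd_y)
implied_correlation <- covariance_reported / (sd_price * sd_rating)
implied_correlation  # approximately 0.029
```

The result is close to the value returned by `cor()`, which suggests that the large covariance is driven by the spread of price and not by a strong relationship.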

Overall Interpretation

While the positive covariance points to a positive direction, the correlation of about 0.029 shows that the linear relationship between price and review scores rating is minimal. This suggests that factors other than price likely play a much larger role in the review scores of listings on platforms like Airbnb.