#Clearing the environment
rm(list = ls())
Correlation is a statistical measure of how closely two variables move together. A correlation coefficient of +1 indicates a perfect positive correlation: as one variable increases, so does the other. A coefficient of -1 indicates a perfect negative correlation: as one variable increases, the other decreases. A value of 0 indicates that there is no linear relationship between the variables.
However, correlation does not imply causation: even if two variables are correlated, one does not necessarily cause the other to change.
Covariance is a statistical measure that describes the extent to which two random variables change together. If the covariance is positive, it indicates that as one variable increases, the other tends to increase as well. If the covariance is negative, one variable tends to decrease as the other increases.
Covariance is not scaled and can take any value, positive or negative. The magnitude of the covariance is not easily interpretable in terms of the strength of the relationship between the variables.
Correlation, on the other hand, is a standardized measure. It scales the covariance by the product of the standard deviations of the variables, resulting in a value between -1 and 1. This allows for a more interpretable comparison of the strength and direction of the relationship.
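The relationship between the two measures can be checked numerically. The sketch below uses small made-up vectors (x and y are illustrative values, not taken from the Airbnb data) to show that dividing the covariance by the product of the standard deviations reproduces cor(), and that rescaling a variable changes the covariance but leaves the correlation unchanged.

```r
# Toy data: hypothetical values chosen only for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

cov_xy <- cov(x, y)
cor_xy <- cor(x, y)

# Correlation is the covariance divided by the product of the standard deviations
cor_manual <- cov_xy / (sd(x) * sd(y))

# Rescaling x (for example, a change of units) multiplies the covariance
# by the same factor but does not change the correlation
cov_scaled <- cov(100 * x, y)
cor_scaled <- cor(100 * x, y)

cat("cov:", cov_xy, " cor:", cor_xy, " manual cor:", cor_manual, "\n")
cat("x rescaled by 100 -> cov:", cov_scaled, " cor:", cor_scaled, "\n")
```

This scale dependence is why the magnitude of a covariance is hard to interpret on its own.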
#Importing the necessary libraries
library(psych)
# Load the CSV file into R
df_main <- read.csv("/Users/aritraray/Desktop/Airbnb Data/Listings.csv")
head(df_main)
## listing_id name host_id
## 1 281420 Beautiful Flat in le Village Montmartre, Paris 1466919
## 2 3705183 39 m² Paris (Sacre Cœur) 10328771
## 3 4082273 Lovely apartment with Terrace, 60m2 19252768
## 4 4797344 Cosy studio (close to Eiffel tower) 10668311
## 5 4823489 Close to Eiffel Tower - Beautiful flat : 2 rooms 24837558
## 6 4898654 NEW - Charming apartment Le Marais 505535
## host_since host_location host_response_time host_response_rate
## 1 2011-12-03 Paris, Ile-de-France, France NA
## 2 2013-11-29 Paris, Ile-de-France, France NA
## 3 2014-07-31 Paris, Ile-de-France, France NA
## 4 2013-12-17 Paris, Ile-de-France, France NA
## 5 2014-12-14 Paris, Ile-de-France, France NA
## 6 2011-04-13 Paris, Ile-de-France, France NA
## host_acceptance_rate host_is_superhost host_total_listings_count
## 1 NA f 1
## 2 NA f 1
## 3 NA f 1
## 4 NA f 1
## 5 NA f 1
## 6 NA f 1
## host_has_profile_pic host_identity_verified neighbourhood district city
## 1 t f Buttes-Montmartre Paris
## 2 t t Buttes-Montmartre Paris
## 3 t f Elysee Paris
## 4 t t Vaugirard Paris
## 5 t f Passy Paris
## 6 t t Temple Paris
## latitude longitude property_type room_type accommodates bedrooms
## 1 48.88668 2.33343 Entire apartment Entire place 2 1
## 2 48.88617 2.34515 Entire apartment Entire place 2 1
## 3 48.88112 2.31712 Entire apartment Entire place 2 1
## 4 48.84571 2.30584 Entire apartment Entire place 2 1
## 5 48.85500 2.26979 Entire apartment Entire place 2 1
## 6 48.86428 2.35370 Entire apartment Entire place 2 1
## amenities
## 1 ["Heating", "Kitchen", "Washer", "Wifi", "Long term stays allowed"]
## 2 ["Shampoo", "Heating", "Kitchen", "Essentials", "Washer", "Dryer", "Wifi", "Long term stays allowed"]
## 3 ["Heating", "TV", "Kitchen", "Washer", "Wifi", "Long term stays allowed"]
## 4 ["Heating", "TV", "Kitchen", "Wifi", "Long term stays allowed"]
## 5 ["Heating", "TV", "Kitchen", "Essentials", "Hair dryer", "Washer", "Dryer", "Bathtub", "Wifi", "Elevator", "Long term stays allowed", "Cable TV"]
## 6 ["Heating", "TV", "Kitchen", "Essentials", "Washer", "Smoke alarm", "Wifi", "Long term stays allowed", "Cable TV"]
## price minimum_nights maximum_nights review_scores_rating
## 1 53 2 1125 100
## 2 120 2 1125 100
## 3 89 2 1125 100
## 4 58 2 1125 100
## 5 60 2 1125 100
## 6 95 2 1125 100
## review_scores_accuracy review_scores_cleanliness review_scores_checkin
## 1 10 10 10
## 2 10 10 10
## 3 10 10 10
## 4 10 10 10
## 5 10 10 10
## 6 10 10 10
## review_scores_communication review_scores_location review_scores_value
## 1 10 10 10
## 2 10 10 10
## 3 10 10 10
## 4 10 10 10
## 5 10 10 10
## 6 10 10 10
## instant_bookable
## 1 f
## 2 f
## 3 f
## 4 f
## 5 f
## 6 f
# Check for duplicates in each column
duplicate_columns <- sapply(df_main, function(x) any(duplicated(x)))
# Print the names of columns with duplicates
cat("Columns with duplicate values: ", names(duplicate_columns[duplicate_columns]), "\n")
## Columns with duplicate values: name host_id host_since host_location host_response_time host_response_rate host_acceptance_rate host_is_superhost host_total_listings_count host_has_profile_pic host_identity_verified neighbourhood district city latitude longitude property_type room_type accommodates bedrooms amenities price minimum_nights maximum_nights review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value instant_bookable
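The per-column check above flags every column except listing_id, because any repeated value in a column counts as a duplicate. A check on whole rows or on the key column is usually more informative. The following is a minimal sketch on a made-up data frame (the listings object and its values are hypothetical) rather than on df_main itself.

```r
# Hypothetical toy data frame standing in for a listings table
listings <- data.frame(
  listing_id = c(101, 102, 102, 103),
  price      = c(50, 80, 80, 50)
)

# Number of rows that are exact copies of an earlier row
n_dup_rows <- sum(duplicated(listings))

# Key values that appear more than once
dup_ids <- unique(listings$listing_id[duplicated(listings$listing_id)])

cat("Fully duplicated rows:", n_dup_rows, "\n")
cat("Repeated listing_id values:", dup_ids, "\n")
```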
describe(df_main)
## vars n mean sd median
## listing_id 1 279712 26381955.49 14425758.69 27670985.00
## name* 2 279711 132674.65 76747.41 132473.00
## host_id 3 279712 108165773.09 110856993.22 58269113.50
## host_since* 4 279712 2458.32 891.26 2402.00
## host_location* 5 279704 3971.60 1825.76 4455.00
## host_response_time* 6 279712 2.73 1.77 2.00
## host_response_rate 7 150930 0.87 0.28 1.00
## host_acceptance_rate 8 166625 0.83 0.29 0.98
## host_is_superhost* 9 279712 2.18 0.38 2.00
## host_total_listings_count 10 279547 24.58 284.04 1.00
## host_has_profile_pic* 11 279712 3.00 0.07 3.00
## host_identity_verified* 12 279712 2.72 0.45 3.00
## neighbourhood* 13 279712 311.00 188.65 290.00
## district* 14 279712 1.36 0.95 1.00
## city* 15 279712 6.26 2.61 7.00
## latitude 16 279712 18.76 32.56 40.71
## longitude 17 279712 12.60 73.08 2.38
## property_type* 18 279712 29.77 25.79 16.00
## room_type* 19 279712 1.70 0.97 1.00
## accommodates 20 279712 3.29 2.13 2.00
## bedrooms 21 250277 1.52 1.15 1.00
## amenities* 22 279712 123206.80 70507.63 122933.50
## price 23 279712 608.79 3441.83 150.00
## minimum_nights 24 279712 8.05 31.52 2.00
## maximum_nights 25 279712 27558.60 7282875.16 1125.00
## review_scores_rating 26 188307 93.41 10.07 96.00
## review_scores_accuracy 27 187999 9.57 0.99 10.00
## review_scores_cleanliness 28 188047 9.31 1.15 10.00
## review_scores_checkin 29 187941 9.70 0.87 10.00
## review_scores_communication 30 188025 9.70 0.89 10.00
## review_scores_location 31 187937 9.63 0.83 10.00
## review_scores_value 32 187927 9.34 1.04 10.00
## instant_bookable* 33 279712 1.41 0.49 1.00
## trimmed mad min max
## listing_id 26826801.12 18961737.16 2577.00 4.834353e+07
## name* 132623.17 98477.26 1.00 2.658630e+05
## host_id 92587569.53 75850352.70 1822.00 3.901874e+08
## host_since* 2465.33 987.41 1.00 4.241000e+03
## host_location* 4061.99 1399.57 1.00 7.160000e+03
## host_response_time* 2.66 1.48 1.00 5.000000e+00
## host_response_rate 0.95 0.00 0.00 1.000000e+00
## host_acceptance_rate 0.90 0.03 0.00 1.000000e+00
## host_is_superhost* 2.10 0.00 1.00 3.000000e+00
## host_total_listings_count 2.62 1.48 0.00 7.235000e+03
## host_has_profile_pic* 3.00 0.00 1.00 3.000000e+00
## host_identity_verified* 2.77 0.00 1.00 3.000000e+00
## neighbourhood* 303.81 234.25 1.00 6.600000e+02
## district* 1.08 0.00 1.00 6.000000e+00
## city* 6.41 2.97 1.00 1.000000e+01
## latitude 21.59 12.11 -34.26 4.890000e+01
## longitude 8.35 67.55 -99.34 1.513400e+02
## property_type* 24.65 7.41 1.00 1.440000e+02
## room_type* 1.60 0.00 1.00 4.000000e+00
## accommodates 2.98 1.48 0.00 1.600000e+01
## bedrooms 1.29 0.00 1.00 5.000000e+01
## amenities* 123410.57 90894.50 1.00 2.450030e+05
## price 270.61 146.78 0.00 6.252160e+05
## minimum_nights 4.09 1.48 1.00 9.999000e+03
## maximum_nights 722.04 0.00 1.00 2.147484e+09
## review_scores_rating 95.43 5.93 20.00 1.000000e+02
## review_scores_accuracy 9.78 0.00 2.00 1.000000e+01
## review_scores_cleanliness 9.55 0.00 2.00 1.000000e+01
## review_scores_checkin 9.89 0.00 2.00 1.000000e+01
## review_scores_communication 9.90 0.00 2.00 1.000000e+01
## review_scores_location 9.81 0.00 2.00 1.000000e+01
## review_scores_value 9.54 0.00 2.00 1.000000e+01
## instant_bookable* 1.39 0.00 1.00 2.000000e+00
## range skew kurtosis se
## listing_id 4.834095e+07 -0.19 -1.25 27276.15
## name* 2.658620e+05 0.01 -1.20 145.11
## host_id 3.901856e+08 0.96 -0.36 209607.85
## host_since* 4.240000e+03 -0.02 -0.72 1.69
## host_location* 7.159000e+03 -0.50 -0.78 3.45
## host_response_time* 4.000000e+00 0.24 -1.73 0.00
## host_response_rate 1.000000e+00 -2.27 3.83 0.00
## host_acceptance_rate 1.000000e+00 -1.86 2.35 0.00
## host_is_superhost* 2.000000e+00 1.64 0.81 0.00
## host_total_listings_count 7.235000e+03 23.49 586.29 0.54
## host_has_profile_pic* 2.000000e+00 -18.82 395.80 0.00
## host_identity_verified* 2.000000e+00 -0.99 -0.96 0.00
## neighbourhood* 6.590000e+02 0.23 -1.12 0.36
## district* 5.000000e+00 2.53 5.11 0.00
## city* 9.000000e+00 -0.47 -0.63 0.00
## latitude 8.317000e+01 -0.71 -1.21 0.06
## longitude 2.506800e+02 0.49 -0.52 0.14
## property_type* 1.430000e+02 1.46 1.54 0.05
## room_type* 3.000000e+00 0.75 -1.25 0.00
## accommodates 1.600000e+01 2.20 7.44 0.00
## bedrooms 4.900000e+01 13.21 428.88 0.00
## amenities* 2.450020e+05 -0.02 -1.18 133.32
## price 6.252160e+05 61.62 7173.15 6.51
## minimum_nights 9.998000e+03 123.17 36316.63 0.06
## maximum_nights 2.147484e+09 284.22 82344.32 13770.42
## review_scores_rating 8.000000e+01 -3.76 20.00 0.02
## review_scores_accuracy 8.000000e+00 -4.21 24.05 0.00
## review_scores_cleanliness 8.000000e+00 -3.03 13.02 0.00
## review_scores_checkin 8.000000e+00 -5.26 36.58 0.00
## review_scores_communication 8.000000e+00 -5.23 35.55 0.00
## review_scores_location 8.000000e+00 -4.32 28.39 0.00
## review_scores_value 8.000000e+00 -3.25 16.30 0.00
## instant_bookable* 1.000000e+00 0.35 -1.88 0.00
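In the describe() output above, rows marked with an asterisk are non-numeric columns that psych has converted to numeric codes, so their means and standard deviations are not meaningful. One option is to restrict the summary to numeric columns first. The sketch below does this with base R on a small made-up data frame (toy and its values are assumptions for illustration); the same selection could be applied to df_main before calling describe().

```r
# Hypothetical toy data frame with one character and two numeric columns
toy <- data.frame(
  city         = c("Paris", "Rome", "Paris"),
  price        = c(53, 120, 89),
  accommodates = c(2, 4, 2),
  stringsAsFactors = FALSE
)

# Keep only the columns whose values are numeric
numeric_cols <- names(toy)[vapply(toy, is.numeric, logical(1))]
toy_numeric  <- toy[, numeric_cols, drop = FALSE]

summary(toy_numeric)
```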
# Load the reviews CSV file into R
df_reviews <- read.csv("/Users/aritraray/Desktop/Airbnb Data/Reviews.csv")
Merging the above datasets
# Merge the datasets based on listing_id
merged_data <- merge(df_main, df_reviews, by = "listing_id", all.x = TRUE)
# Displaying the first few rows of the merged dataset
head(merged_data)
## listing_id name host_id host_since
## 1 2577 Loft for 4 by Canal Saint Martin 2827 2008-09-09
## 2 2595 Skylit Midtown Castle 2845 2008-09-09
## 3 2595 Skylit Midtown Castle 2845 2008-09-09
## 4 2595 Skylit Midtown Castle 2845 2008-09-09
## 5 2595 Skylit Midtown Castle 2845 2008-09-09
## 6 2595 Skylit Midtown Castle 2845 2008-09-09
## host_location host_response_time host_response_rate
## 1 Casablanca, Grand Casablanca, Morocco a few days or more 0.00
## 2 New York, New York, United States within a few hours 0.93
## 3 New York, New York, United States within a few hours 0.93
## 4 New York, New York, United States within a few hours 0.93
## 5 New York, New York, United States within a few hours 0.93
## 6 New York, New York, United States within a few hours 0.93
## host_acceptance_rate host_is_superhost host_total_listings_count
## 1 0.67 f 2
## 2 0.26 f 6
## 3 0.26 f 6
## 4 0.26 f 6
## 5 0.26 f 6
## 6 0.26 f 6
## host_has_profile_pic host_identity_verified neighbourhood district
## 1 t t Enclos-St-Laurent
## 2 t t Midtown Manhattan
## 3 t t Midtown Manhattan
## 4 t t Midtown Manhattan
## 5 t t Midtown Manhattan
## 6 t t Midtown Manhattan
## city latitude longitude property_type room_type accommodates
## 1 Paris 48.86993 2.36251 Entire loft Entire place 4
## 2 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 3 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 4 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 5 New York 40.75362 -73.98377 Entire apartment Entire place 2
## 6 New York 40.75362 -73.98377 Entire apartment Entire place 2
## bedrooms
## 1 2
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## amenities
## 1 ["Heating", "TV", "Iron", "Kitchen", "Essentials", "Washer", "Dryer", "Hot water", "Hangers", "Wifi", "Long term stays allowed", "Dedicated workspace", "Host greets you"]
## 2 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 3 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 4 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 5 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## 6 ["Refrigerator", "Air conditioning", "Baking sheet", "Free street parking", "Bathtub", "Kitchen", "Keypad", "Coffee maker", "Oven", "Iron", "Hangers", "Smoke alarm", "Dedicated workspace", "Fire extinguisher", "Hot water", "Long term stays allowed", "Extra pillows and blankets", "Hair dryer", "Bed linens", "Essentials", "Dishes and silverware", "TV", "Wifi", "Heating", "Paid parking off premises", "Cooking basics", "Stove", "Luggage dropoff allowed", "Cleaning before checkout", "Carbon monoxide alarm", "Ethernet connection"]
## price minimum_nights maximum_nights review_scores_rating
## 1 125 3 1125 100
## 2 100 30 1125 94
## 3 100 30 1125 94
## 4 100 30 1125 94
## 5 100 30 1125 94
## 6 100 30 1125 94
## review_scores_accuracy review_scores_cleanliness review_scores_checkin
## 1 10 10 10
## 2 9 9 10
## 3 9 9 10
## 4 9 9 10
## 5 9 9 10
## 6 9 9 10
## review_scores_communication review_scores_location review_scores_value
## 1 10 10 10
## 2 10 10 9
## 3 10 10 9
## 4 10 10 9
## 5 10 10 9
## 6 10 10 9
## instant_bookable review_id date reviewer_id
## 1 t 366217274 2019-01-02 28047930
## 2 f 2022498 2012-08-18 2124102
## 3 f 334253940 2018-10-08 56872516
## 4 f 46312 2010-05-25 117113
## 5 f 487972917 2019-07-14 60181725
## 6 f 328954829 2018-09-27 203936538
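Note that the merged output repeats listing 2595 once per review. Merging a one-row-per-listing table with a one-row-per-review table produces one row per listing-review pair, and all.x = TRUE keeps listings that have no reviews. The sketch below illustrates this on made-up tables (the listings and reviews objects and their values are hypothetical).

```r
# Hypothetical toy tables: three listings, four reviews
listings <- data.frame(listing_id = c(1, 2, 3),
                       accommodates = c(2, 4, 6))
reviews  <- data.frame(listing_id = c(1, 1, 1, 2),
                       review_id  = c(11, 12, 13, 21))

# Left join: every listing is kept, and repeated once per matching review
merged <- merge(listings, reviews, by = "listing_id", all.x = TRUE)

nrow(merged)              # more rows than listings
table(merged$listing_id)  # rows contributed by each listing
```

Listing 3 has no reviews, so it appears once with NA in review_id. Any statistic computed on the merged table therefore weights each listing by its number of reviews.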
Choosing x and y
Independent Variable (x): accommodates represents the number of guests a property can accommodate.
Outcome Variable (y): review_scores_rating represents the average review score for a property.
# Displaying summary statistics
summary(merged_data)
## listing_id name host_id host_since
## Min. : 2577 Length:5459299 Min. : 1822 Length:5459299
## 1st Qu.: 5425967 Class :character 1st Qu.: 8939609 Class :character
## Median :14746073 Mode :character Median : 31142417 Mode :character
## Mean :16278585 Mean : 65862881
## 3rd Qu.:24504409 3rd Qu.: 96238999
## Max. :48343530 Max. :390187445
##
## host_location host_response_time host_response_rate host_acceptance_rate
## Length:5459299 Length:5459299 Min. :0.0 Min. :0.0
## Class :character Class :character 1st Qu.:1.0 1st Qu.:0.9
## Mode :character Mode :character Median :1.0 Median :1.0
## Mean :0.9 Mean :0.9
## 3rd Qu.:1.0 3rd Qu.:1.0
## Max. :1.0 Max. :1.0
## NA's :1523184 NA's :798117
## host_is_superhost host_total_listings_count host_has_profile_pic
## Length:5459299 Min. : 0.000 Length:5459299
## Class :character 1st Qu.: 1.000 Class :character
## Mode :character Median : 2.000 Mode :character
## Mean : 7.804
## 3rd Qu.: 5.000
## Max. :7235.000
## NA's :3992
## host_identity_verified neighbourhood district
## Length:5459299 Length:5459299 Length:5459299
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## city latitude longitude property_type
## Length:5459299 Min. :-34.26 Min. :-99.340 Length:5459299
## Class :character 1st Qu.: 13.76 1st Qu.:-43.377 Class :character
## Mode :character Median : 40.77 Median : 2.377 Mode :character
## Mean : 24.19 Mean : 4.104
## 3rd Qu.: 41.91 3rd Qu.: 18.394
## Max. : 48.90 Max. :151.340
##
## room_type accommodates bedrooms amenities
## Length:5459299 Min. : 0.000 Min. : 1.0 Length:5459299
## Class :character 1st Qu.: 2.000 1st Qu.: 1.0 Class :character
## Mode :character Median : 3.000 Median : 1.0 Mode :character
## Mean : 3.445 Mean : 1.5
## 3rd Qu.: 4.000 3rd Qu.: 2.0
## Max. :16.000 Max. :50.0
## NA's :550476
## price minimum_nights maximum_nights review_scores_rating
## Min. : 0.0 Min. : 1.000 Min. :1.000e+00 Min. : 20.00
## 1st Qu.: 67.0 1st Qu.: 1.000 1st Qu.:4.500e+01 1st Qu.: 93.00
## Median : 116.0 Median : 2.000 Median :1.125e+03 Median : 96.00
## Mean : 403.9 Mean : 5.989 Mean :1.346e+05 Mean : 94.58
## 3rd Qu.: 334.0 3rd Qu.: 3.000 3rd Qu.:1.125e+03 3rd Qu.: 98.00
## Max. :625216.0 Max. :9999.000 Max. :2.147e+09 Max. :100.00
## NA's :92260
## review_scores_accuracy review_scores_cleanliness review_scores_checkin
## Min. : 2.00 Min. : 2.0 Min. : 2.00
## 1st Qu.:10.00 1st Qu.: 9.0 1st Qu.:10.00
## Median :10.00 Median :10.0 Median :10.00
## Mean : 9.73 Mean : 9.5 Mean : 9.83
## 3rd Qu.:10.00 3rd Qu.:10.0 3rd Qu.:10.00
## Max. :10.00 Max. :10.0 Max. :10.00
## NA's :125419 NA's :125227 NA's :125494
## review_scores_communication review_scores_location review_scores_value
## Min. : 2.00 Min. : 2.00 Min. : 2.00
## 1st Qu.:10.00 1st Qu.:10.00 1st Qu.: 9.00
## Median :10.00 Median :10.00 Median :10.00
## Mean : 9.83 Mean : 9.75 Mean : 9.47
## 3rd Qu.:10.00 3rd Qu.:10.00 3rd Qu.:10.00
## Max. :10.00 Max. :10.00 Max. :10.00
## NA's :125390 NA's :125502 NA's :125515
## instant_bookable review_id date reviewer_id
## Length:5459299 Min. : 282 Length:5459299 Min. : 1
## Class :character 1st Qu.:166643479 Class :character 1st Qu.: 23902058
## Mode :character Median :342572666 Mode :character Median : 66978139
## Mean :348675319 Mean : 98081330
## 3rd Qu.:533404482 3rd Qu.:152893599
## Max. :735623741 Max. :390338478
## NA's :86156 NA's :86156
# Extracting the two quantitative variables of interest
accommodates <- merged_data$accommodates
review_scores <- merged_data$review_scores_rating
#Checking the lengths
length(accommodates)
## [1] 5459299
length(review_scores)
## [1] 5459299
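Equal lengths do not mean every pair is usable: the summary above shows missing values in review_scores_rating. With use = "complete.obs", cor() and cov() drop any pair in which either value is NA. The sketch below, on made-up vectors (x and y are hypothetical), shows one way to count the pairs that actually enter the calculation.

```r
# Hypothetical vectors with missing values in different positions
x <- c(2, 4, NA, 6, 8)
y <- c(90, NA, 95, 96, 100)

# TRUE where both values of a pair are present
ok <- complete.cases(x, y)
n_used <- sum(ok)

# cor() with use = "complete.obs" matches cor() on the complete pairs only
r_complete <- cor(x, y, use = "complete.obs")
r_manual   <- cor(x[ok], y[ok])

cat("Pairs used:", n_used, "of", length(x), "\n")
```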
# Calculating the correlation
correlation <- cor(accommodates, review_scores, use = "complete.obs")
# Displaying the results
cat("Correlation between Accommodates and Review Scores:", correlation, "\n")
## Correlation between Accommodates and Review Scores: -0.001033417
# Calculating the covariance
covariance <- cov(accommodates, review_scores, use = "complete.obs")
# Displaying the results
cat("Covariance between Accommodates and Review Scores:", covariance, "\n")
## Covariance between Accommodates and Review Scores: -0.009623657
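Because the merged table has one row per review, both statistics above weight each listing by how many reviews it has. Since accommodates and review_scores_rating are listing-level attributes, it may be worth comparing against a version computed on one row per listing. The following is a minimal sketch of that comparison on made-up data (merged_toy and its values are hypothetical), not a rerun of the analysis.

```r
# Hypothetical merged table: listing 1 has three reviews, listing 3 has two
merged_toy <- data.frame(
  listing_id           = c(1, 1, 1, 2, 3, 3),
  accommodates         = c(2, 2, 2, 4, 6, 6),
  review_scores_rating = c(90, 90, 90, 100, 95, 95)
)

# Keep the first row for each listing
per_listing <- merged_toy[!duplicated(merged_toy$listing_id), ]

# Correlation over all review rows versus one row per listing
cor_rows <- cor(merged_toy$accommodates,
                merged_toy$review_scores_rating, use = "complete.obs")
cor_listings <- cor(per_listing$accommodates,
                    per_listing$review_scores_rating, use = "complete.obs")

cat("Row-level:", cor_rows, " Listing-level:", cor_listings, "\n")
```

The two values differ in this toy case because heavily reviewed listings count more in the row-level version.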
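Interpretation: The correlation coefficient (about -0.001) and the covariance (about -0.0096) are both very close to zero, so there appears to be little to no meaningful linear relationship between the number of guests a listing accommodates and its review score rating in this dataset.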