1.
Correlation quantifies the strength and direction of the linear relationship between two variables. The most commonly used measure is the Pearson correlation coefficient, usually denoted “r,” which ranges from -1 to 1. Correlation does not imply causation: a correlation between two variables does not mean that one causes the other. Correlation coefficients can also be disproportionately affected by outliers, so it is crucial to check the data for influential points.
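To illustrate the point about outliers, here is a minimal sketch using made-up toy values (they are not from the Airbnb files, and the variable names are my own). It computes the Pearson coefficient with base R's cor() before and after adding one extreme point.

```r
# Hypothetical toy data, not taken from the Airbnb files.
x <- 1:10
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)

# cor() uses the Pearson method by default.
r_clean <- cor(x, y)

# Add a single extreme point far away from the rest of the data.
x_out <- c(x, 11)
y_out <- c(y, -50)
r_outlier <- cor(x_out, y_out)

print(c(clean = r_clean, with_outlier = r_outlier))
```

In this toy case the coefficient is strongly positive without the extra point and turns negative with it, so a single observation can change the sign of r.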
2.
Covariance measures the degree to which two variables vary together, that is, how deviations of one variable from its mean are related to deviations of the other variable from its mean. The sign of the covariance shows whether the relationship is positive or negative, but because its size depends on the units of the variables, it cannot reveal how strong the relationship is. A negative covariance indicates that when one variable is above its mean, the other tends to be below its mean; a positive covariance indicates that the two tend to be above or below their means together.
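The scale dependence of covariance can be sketched with a few made-up numbers (hypothetical values and names, not the Airbnb data). Converting a price from dollars to cents should multiply the covariance by 100 while leaving the correlation unchanged.

```r
# Hypothetical toy data: nightly price in dollars and guest capacity.
price_usd <- c(50, 80, 120, 200, 150)
guests    <- c(1, 2, 4, 6, 3)

# Same information expressed in a different unit.
price_cents <- price_usd * 100

cov_usd   <- cov(price_usd, guests)
cov_cents <- cov(price_cents, guests)  # scales with the unit of price
cor_usd   <- cor(price_usd, guests)
cor_cents <- cor(price_cents, guests)  # unaffected by the unit of price

print(c(cov_usd = cov_usd, cov_cents = cov_cents))
print(c(cor_usd = cor_usd, cor_cents = cor_cents))
```

Only the sign of the covariance carries over between the two versions, which is why its size alone says nothing about strength.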
3.
I picked the Airbnb data set.
mydata4<-read.csv("/Users/timyang/Downloads/Listings.csv")
mydata3<-read.csv("/Users/timyang/Downloads/Reviews.csv")
mydata1<-read.csv("/Users/timyang/Downloads/Reviews_data_dictionary.csv")
mydata2<-read.csv("/Users/timyang/Downloads/Listings_data_dictionary.csv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
inner_merged_data <- inner_join(mydata4, mydata3, by = "listing_id")
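As a rough sketch of what this join does, the example below uses two tiny made-up tables (hypothetical stand-ins for Listings.csv and Reviews.csv, with invented values). An inner join keeps only listings that have at least one review and repeats each listing's columns once per matching review, so statistics computed on the merged data are weighted toward listings with many reviews.

```r
library(dplyr)

# Hypothetical toy tables standing in for the listings and reviews files.
listings <- data.frame(listing_id = c(1, 2, 3), price = c(100, 60, 250))
reviews  <- data.frame(listing_id = c(1, 1, 1, 2), review_id = 101:104)

joined <- inner_join(listings, reviews, by = "listing_id")

# One row per matching review; listing 3 has no reviews and is dropped.
print(nrow(joined))

# The mean price differs because listing 1 is counted three times.
print(mean(joined$price))
print(mean(listings$price))
```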
4.
library("stargazer")
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# Example regression fitted on R's built-in mtcars data (not the Airbnb data)
model <- lm(mpg ~ hp + wt, data = mtcars)
stargazer(inner_merged_data, type = "text",
title = "Summary Statistics")
##
## Summary Statistics
## ================================================================================================
## Statistic                            N             Mean         St. Dev.      Min            Max
## ------------------------------------------------------------------------------------------------
## listing_id                   5,373,143   16,029,886.000   11,986,765.000    2,577     48,263,869
## host_id                      5,373,143   64,699,944.000   78,260,770.000    1,822    389,316,854
## host_response_rate           3,897,510            0.935            0.189    0.000          1.000
## host_acceptance_rate         4,624,706            0.905            0.199    0.000          1.000
## host_total_listings_count    5,369,204            6.994           25.983        0          7,235
## latitude                     5,373,143           24.301           29.840  -34.264         48.904
## longitude                    5,373,143            3.872           69.504  -99.340        151.340
## accommodates                 5,373,143            3.449            2.023        1             16
## bedrooms                     4,831,730            1.450            1.085        1             50
## price                        5,373,143          395.841        2,423.261        8        300,177
## minimum_nights               5,373,143            5.927           33.363        1          9,999
## maximum_nights               5,373,143      136,727.000   16,706,862.000        1  2,147,483,647
## review_scores_rating         5,367,038           94.581            4.603       20            100
## review_scores_accuracy       5,333,880            9.729            0.516        2             10
## review_scores_cleanliness    5,334,072            9.498            0.669        2             10
## review_scores_checkin        5,333,805            9.835            0.426        2             10
## review_scores_communication  5,333,909            9.828            0.436        2             10
## review_scores_location       5,333,797            9.751            0.491        2             10
## review_scores_value          5,333,784            9.470            0.599        2             10
## review_id                    5,373,143  348,675,319.000  206,101,858.000      282    735,623,741
## reviewer_id                  5,373,143   98,081,330.000   90,805,956.000        1    390,338,478
## ------------------------------------------------------------------------------------------------
table_html <- stargazer(model, title = "Linear Regression Results", align = TRUE, type = "html", out = NULL)
##
## <table style="text-align:center"><caption><strong>Linear Regression Results</strong></caption>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td>mpg</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">hp</td><td>-0.032<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.009)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">wt</td><td>-3.878<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.633)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">Constant</td><td>37.227<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(1.599)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>32</td></tr>
## <tr><td style="text-align:left">R<sup>2</sup></td><td>0.827</td></tr>
## <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.815</td></tr>
## <tr><td style="text-align:left">Residual Std. Error</td><td>2.593 (df = 29)</td></tr>
## <tr><td style="text-align:left">F Statistic</td><td>69.211<sup>***</sup> (df = 2; 29)</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>
5.
I picked the variables price and accommodates.
correlation_result <- cor(inner_merged_data$price, inner_merged_data$accommodates)
print(correlation_result)
## [1] 0.08724258
covariance_result <- cov(inner_merged_data$price, inner_merged_data$accommodates)
# Print the covariance (a single number, because both inputs are vectors)
print(covariance_result)
## [1] 427.7052
The correlation coefficient of about 0.087 shows that there is a positive but very weak linear relationship between price and accommodates. The correlation coefficient captures both the direction and the strength of a linear relationship between two variables. Here the positive sign indicates that as price increases, accommodates tends to increase as well, and as one decreases, the other tends to decrease. Because the value is so close to 0, however, the linear association is slight.
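The covariance of 427.705 is positive, so it points in the same direction as the correlation: price and accommodates tend to be above or below their means together. Its magnitude, however, is not interpretable on its own, because covariance values are not standardized and rely heavily on the scales of the variables involved. Covariance quantifies the degree to which two variables vary jointly, and here the large value mostly reflects the wide spread of price rather than a strong relationship, so the correlation coefficient is the better measure of strength.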
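As a consistency check, the correlation can be recovered from the covariance by dividing by the product of the two standard deviations. The sketch below copies the values from the output above (the standard deviations are the rounded figures from the summary table, and the variable names are my own), so the result should match the reported correlation only approximately.

```r
# Values copied from the output above; the standard deviations are rounded.
cov_xy   <- 427.7052  # cov(price, accommodates)
sd_price <- 2423.261  # St. Dev. of price
sd_acc   <- 2.023     # St. Dev. of accommodates

# Pearson r = covariance / (sd of x * sd of y)
r_from_cov <- cov_xy / (sd_price * sd_acc)
print(round(r_from_cov, 3))
## [1] 0.087
```

This agrees with the value of 0.08724258 returned by cor(), which shows that the correlation is simply the covariance rescaled to remove the units.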