1.

Correlation quantifies the strength and direction of the linear relationship between two variables. The most commonly used measure is the Pearson correlation coefficient, usually denoted “r,” which ranges from -1 to 1. Correlation does not imply causation: a correlation between two variables does not mean that one causes the other. Correlation coefficients can also be disproportionately affected by outliers, so it is important to check the data for influential points.
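
To illustrate the sensitivity to outliers, here is a minimal sketch with made-up numbers (not the Airbnb data). The vectors and the extreme point are assumptions chosen for illustration; a single such point is enough to turn a strong positive correlation negative.

```r
# Made-up data with a strong positive linear relationship
x <- 1:10
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)
r_clean <- cor(x, y)

# Append one extreme point far from the rest of the data
x_out <- c(x, 30)
y_out <- c(y, -20)
r_outlier <- cor(x_out, y_out)

# Compare Pearson's r with and without the outlier
print(c(clean = r_clean, with_outlier = r_outlier))
```

Plotting the data (for example with plot(x_out, y_out)) before computing r is a simple way to spot such points.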

2.

Covariance measures the degree to which deviations of one variable from its mean are related to deviations of another variable from its mean. Its sign shows whether the relationship between two variables is positive or negative, but its magnitude depends on the units of the variables, so it cannot reveal how strong the relationship is. A negative covariance, for example, means that when one variable is above its mean, the other tends to be below its mean.
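
The definition can be checked by hand on a small example. The sketch below uses made-up vectors (not the Airbnb data) to compute the sample covariance from deviations about the means, compare it with R's cov(), and show that rescaling one variable changes the covariance but not the correlation.

```r
# Made-up example data (not from the Airbnb data set)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Sample covariance from the definition: sum of products of deviations
# from the means, divided by n - 1 (the denominator cov() also uses)
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
stopifnot(isTRUE(all.equal(cov_manual, cov(x, y))))

# Rescaling x (for example, changing its units) rescales the covariance
cov_scaled <- cov(100 * x, y)

# The correlation is unaffected by the change of scale
cor_original <- cor(x, y)
cor_scaled   <- cor(100 * x, y)

print(c(cov_manual, cov_scaled, cor_original, cor_scaled))
```

This is why covariance is read for its sign only, while correlation is used to judge strength.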

3.

I picked the Airbnb data set.

mydata4<-read.csv("/Users/timyang/Downloads/Listings.csv")
mydata3<-read.csv("/Users/timyang/Downloads/Reviews.csv")
mydata1<-read.csv("/Users/timyang/Downloads/Reviews_data_dictionary.csv") 
mydata2<-read.csv("/Users/timyang/Downloads/Listings_data_dictionary.csv")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
inner_merged_data <- inner_join(mydata4, mydata3, by = "listing_id")

4.

library("stargazer")
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# Note: this regression uses R's built-in mtcars data, not the Airbnb data
model <- lm(mpg ~ hp + wt, data = mtcars)
stargazer(inner_merged_data, type = "text", 
          title = "Summary Statistics")
## 
## Summary Statistics
## ===========================================================================================
## Statistic                       N          Mean          St. Dev.       Min        Max     
## -------------------------------------------------------------------------------------------
## listing_id                  5,373,143 16,029,886.000  11,986,765.000   2,577   48,263,869  
## host_id                     5,373,143 64,699,944.000  78,260,770.000   1,822   389,316,854 
## host_response_rate          3,897,510      0.935           0.189       0.000      1.000    
## host_acceptance_rate        4,624,706      0.905           0.199       0.000      1.000    
## host_total_listings_count   5,369,204      6.994          25.983         0        7,235    
## latitude                    5,373,143     24.301          29.840      -34.264    48.904    
## longitude                   5,373,143      3.872          69.504      -99.340    151.340   
## accommodates                5,373,143      3.449           2.023         1         16      
## bedrooms                    4,831,730      1.450           1.085         1         50      
## price                       5,373,143     395.841        2,423.261       8       300,177   
## minimum_nights              5,373,143      5.927          33.363         1        9,999    
## maximum_nights              5,373,143   136,727.000   16,706,862.000     1    2,147,483,647
## review_scores_rating        5,367,038     94.581           4.603        20         100     
## review_scores_accuracy      5,333,880      9.729           0.516         2         10      
## review_scores_cleanliness   5,334,072      9.498           0.669         2         10      
## review_scores_checkin       5,333,805      9.835           0.426         2         10      
## review_scores_communication 5,333,909      9.828           0.436         2         10      
## review_scores_location      5,333,797      9.751           0.491         2         10      
## review_scores_value         5,333,784      9.470           0.599         2         10      
## review_id                   5,373,143 348,675,319.000 206,101,858.000   282    735,623,741 
## reviewer_id                 5,373,143 98,081,330.000  90,805,956.000     1     390,338,478 
## -------------------------------------------------------------------------------------------
table_html <- stargazer(model, title = "Linear Regression Results", align = TRUE, type = "html", out = NULL)
## 
## <table style="text-align:center"><caption><strong>Linear Regression Results</strong></caption>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td>mpg</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">hp</td><td>-0.032<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.009)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">wt</td><td>-3.878<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.633)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">Constant</td><td>37.227<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(1.599)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>32</td></tr>
## <tr><td style="text-align:left">R<sup>2</sup></td><td>0.827</td></tr>
## <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.815</td></tr>
## <tr><td style="text-align:left">Residual Std. Error</td><td>2.593 (df = 29)</td></tr>
## <tr><td style="text-align:left">F Statistic</td><td>69.211<sup>***</sup> (df = 2; 29)</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>

5.

I picked the variables price and accommodates.

correlation_result <- cor(inner_merged_data$price, inner_merged_data$accommodates)
print(correlation_result)
## [1] 0.08724258
covariance_result <- cov(inner_merged_data$price, inner_merged_data$accommodates)

# Print the covariance result (a single number, since both inputs are vectors)
print(covariance_result)
## [1] 427.7052

The correlation coefficient of 0.087 shows a positive but very weak linear relationship between price and accommodates. The correlation coefficient measures both the strength and the direction of a linear relationship between two variables. The positive sign indicates that as accommodates increases, price tends to increase as well, and as one decreases, the other tends to decrease. A value this close to 0, however, means that the linear association is weak.

The covariance of 427.705 is positive, which indicates the same direction as the correlation: price and accommodates tend to vary together. However, covariance is not standardized and depends heavily on the scales of the variables involved, so its magnitude by itself is not interpretable and does not indicate the strength of the relationship.
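
As a rough consistency check, dividing the covariance by the product of the two standard deviations should recover the correlation coefficient. The sketch below uses the covariance printed above and the rounded standard deviations from the summary table (2,423.261 for price and 2.023 for accommodates), so it can only match cor() approximately. The variable names are illustrative and not part of the original analysis.

```r
# Values copied from the output above (standard deviations are rounded)
cov_xy   <- 427.7052   # cov(price, accommodates)
sd_price <- 2423.261   # St. Dev. of price from the summary table
sd_accom <- 2.023      # St. Dev. of accommodates from the summary table

# Pearson r = cov(x, y) / (sd(x) * sd(y))
r_check <- cov_xy / (sd_price * sd_accom)
print(r_check)  # should be close to the reported 0.08724258
```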