Haiding Luo
2023-11-27
1. Do a few Google searches and tell us what correlation is (5 lines max).
Correlation measures the strength and direction of the linear relationship between two variables. It describes how closely the variables move together, not whether a change in one causes a change in the other. A correlation can be positive, negative, or zero (no linear relationship). The correlation coefficient ranges from -1 to +1.
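As a small illustration of the sign and range of the coefficient, the sketch below uses made-up vectors (not values from the playoff files) and the base R function cor(), which computes the Pearson correlation by default.

```r
# Toy data, invented for illustration only
x      <- c(1, 2, 3, 4, 5)
y_up   <- c(2, 4, 5, 4, 6)   # tends to rise as x rises
y_down <- c(9, 7, 6, 4, 1)   # tends to fall as x rises

r_up   <- cor(x, y_up)    # positive correlation
r_down <- cor(x, y_down)  # negative correlation
r_self <- cor(x, x)       # a variable is perfectly correlated with itself

print(c(up = r_up, down = r_down, self = r_self))
```

Each value falls between -1 and +1, and the sign matches the direction in which the two vectors move together.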
2. Do a few Google searches and tell us what covariance is (5 lines max).
Covariance is a statistical measure of how two variables vary together: it is the expected value of the product of their deviations from their respective means. If the two variables tend to vary in the same direction, meaning one is above its expected value when the other is also above its expected value, the covariance is positive. Conversely, if they vary in opposite directions, where one is above its expected value when the other is below its own, the covariance is negative. Unlike correlation, covariance is not standardized, so its magnitude depends on the units of the variables.
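The definition can be checked by hand. The sketch below uses made-up vectors and compares a manual calculation with base R's cov(), which returns the sample covariance (dividing by n - 1).

```r
# Toy data, invented for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
n <- length(x)

# Sum of the products of deviations from the means, divided by n - 1
cov_manual  <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
cov_builtin <- cov(x, y)

print(c(manual = cov_manual, builtin = cov_builtin))
```

Both values agree, and the result is positive because x and y tend to sit above or below their means at the same time.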
3.
getwd()
## [1] "C:/Users/pokem/OneDrive/文档"
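# Load each player's playoff game log
Lebron <- read.csv("C:/Users/pokem/OneDrive/文档/lebron_playoffs.csv")
Jordan <- read.csv("C:/Users/pokem/OneDrive/文档/jordan_playoffs.csv")
# Full outer join on game; columns ending in .x come from Lebron, .y from Jordan
dataset <- merge(Lebron, Jordan, by = "game", all = TRUE)
summary(dataset)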
## game date.x series.x series_game.x
## Min. : 1.000 Length:2299 Length:2299 Min. :1.000
## 1st Qu.: 4.000 Class :character Class :character 1st Qu.:2.000
## Median : 8.000 Mode :character Mode :character Median :3.000
## Mean : 8.286 Mean :3.052
## 3rd Qu.:12.000 3rd Qu.:4.000
## Max. :23.000 Max. :7.000
##
## team.x opp.x result.x mp.x
## Length:2299 Length:2299 Length:2299 Min. :24.00
## Class :character Class :character Class :character 1st Qu.:39.00
## Mode :character Mode :character Mode :character Median :42.00
## Mean :41.29
## 3rd Qu.:44.00
## Max. :53.00
##
## fg.x fga.x fgp.x three.x
## Min. : 2.00 Min. :10.00 Min. :0.1110 Min. :0.000
## 1st Qu.: 8.00 1st Qu.:17.00 1st Qu.:0.4290 1st Qu.:1.000
## Median :10.00 Median :20.00 Median :0.5000 Median :1.000
## Mean :10.18 Mean :20.42 Mean :0.4998 Mean :1.567
## 3rd Qu.:12.00 3rd Qu.:24.00 3rd Qu.:0.5830 3rd Qu.:2.000
## Max. :20.00 Max. :38.00 Max. :0.8460 Max. :7.000
##
## threeatt.x threep.x ft.x fta.x
## Min. : 0.000 Min. :0.0000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 3.000 1st Qu.:0.1670 1st Qu.: 4.00 1st Qu.: 6.000
## Median : 5.000 Median :0.3330 Median : 7.00 Median : 9.000
## Mean : 4.677 Mean :0.3109 Mean : 6.85 Mean : 9.243
## 3rd Qu.: 6.000 3rd Qu.:0.5000 3rd Qu.: 9.00 3rd Qu.:12.000
## Max. :12.000 Max. :1.0000 Max. :18.00 Max. :24.000
## NA's :36
## ftp.x orb.x drb.x trb.x
## Min. :0.0000 Min. :0.000 Min. : 1.000 Min. : 1.000
## 1st Qu.:0.6250 1st Qu.:0.000 1st Qu.: 5.000 1st Qu.: 6.000
## Median :0.7500 Median :1.000 Median : 7.000 Median : 8.000
## Mean :0.7301 Mean :1.424 Mean : 7.469 Mean : 8.893
## 3rd Qu.:0.8570 3rd Qu.:2.000 3rd Qu.: 9.000 3rd Qu.:11.000
## Max. :1.0000 Max. :8.000 Max. :16.000 Max. :19.000
## NA's :12
## ast.x stl.x blk.x tov.x
## Min. : 1.000 Min. :0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 5.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.: 2.000
## Median : 7.000 Median :2.000 Median :1.0000 Median : 3.000
## Mean : 7.114 Mean :1.716 Mean :0.9717 Mean : 3.603
## 3rd Qu.: 9.000 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.: 5.000
## Max. :16.000 Max. :6.000 Max. :5.0000 Max. :10.000
##
## pts.x game_score.x plus_minus.x date.y
## Min. : 7.00 Min. :-0.70 Min. :-32.000 Length:2299
## 1st Qu.:23.00 1st Qu.:18.70 1st Qu.: -3.000 Class :character
## Median :29.00 Median :23.80 Median : 6.000 Mode :character
## Mean :28.77 Mean :23.67 Mean : 6.124
## 3rd Qu.:34.00 3rd Qu.:29.20 3rd Qu.: 15.000
## Max. :51.00 Max. :44.70 Max. : 46.000
##
## series.y series_game.y team.y opp.y
## Length:2299 Min. :1.000 Length:2299 Length:2299
## Class :character 1st Qu.:2.000 Class :character Class :character
## Mode :character Median :3.000 Mode :character Mode :character
## Mean :2.985
## 3rd Qu.:4.000
## Max. :7.000
## NA's :2
## result.y mp.y fg.y fga.y
## Length:2299 Min. :29.00 Min. : 3.00 Min. : 8.00
## Class :character 1st Qu.:40.00 1st Qu.: 9.00 1st Qu.:21.00
## Mode :character Median :42.00 Median :12.00 Median :25.00
## Mean :41.66 Mean :12.23 Mean :25.08
## 3rd Qu.:44.00 3rd Qu.:15.00 3rd Qu.:29.00
## Max. :57.00 Max. :24.00 Max. :45.00
## NA's :2 NA's :2 NA's :2
## fgp.y three.y threeatt.y threep.y
## Min. :0.1670 Min. :0.0000 Min. : 0.000 Min. :0.0000
## 1st Qu.:0.4290 1st Qu.:0.0000 1st Qu.: 1.000 1st Qu.:0.0000
## Median :0.4860 Median :0.0000 Median : 2.000 Median :0.2860
## Mean :0.4854 Mean :0.7993 Mean : 2.434 Mean :0.3082
## 3rd Qu.:0.5420 3rd Qu.:1.0000 3rd Qu.: 4.000 3rd Qu.:0.5000
## Max. :0.8330 Max. :6.0000 Max. :11.000 Max. :1.0000
## NA's :2 NA's :2 NA's :2 NA's :435
## ft.y fta.y ftp.y orb.y
## Min. : 0.000 Min. : 0.000 Min. :0.2500 Min. :0.00
## 1st Qu.: 5.000 1st Qu.: 6.000 1st Qu.:0.7500 1st Qu.:1.00
## Median : 8.000 Median :10.000 Median :0.8460 Median :1.00
## Mean : 8.201 Mean : 9.882 Mean :0.8235 Mean :1.72
## 3rd Qu.:11.000 3rd Qu.:13.000 3rd Qu.:0.9230 3rd Qu.:2.00
## Max. :23.000 Max. :28.000 Max. :1.0000 Max. :8.00
## NA's :2 NA's :2 NA's :26 NA's :2
## drb.y trb.y ast.y stl.y
## Min. : 0.000 Min. : 0.000 Min. : 1.000 Min. :0.000
## 1st Qu.: 3.000 1st Qu.: 4.000 1st Qu.: 4.000 1st Qu.:1.000
## Median : 4.000 Median : 6.000 Median : 5.000 Median :2.000
## Mean : 4.768 Mean : 6.488 Mean : 5.702 Mean :2.125
## 3rd Qu.: 6.000 3rd Qu.: 8.000 3rd Qu.: 7.000 3rd Qu.:3.000
## Max. :15.000 Max. :19.000 Max. :14.000 Max. :6.000
## NA's :2 NA's :2 NA's :2 NA's :2
## blk.y tov.y pts.y game_score.y
## Min. :0.0000 Min. :0.000 Min. :15.00 Min. : 2.60
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:27.00 1st Qu.:19.90
## Median :1.0000 Median :3.000 Median :32.00 Median :24.60
## Mean :0.9094 Mean :3.057 Mean :33.47 Mean :25.26
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:39.00 3rd Qu.:30.80
## Max. :5.0000 Max. :8.000 Max. :63.00 Max. :49.80
## NA's :2 NA's :2 NA's :2 NA's :2
## plus_minus.y
## Mode:logical
## NA's:2299
##
##
##
##
##
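The merged table has 2,299 rows, and game only ranges from 1 to 23. One possible explanation, assuming game values repeat within each file (for example, a game number that restarts every season), is that merge() pairs every matching row from one file with every matching row from the other. The sketch below uses two small hypothetical data frames (a and b are not the real files) to show this behavior and a quick check for repeated keys.

```r
# Hypothetical mini game logs; game 1 appears twice in each
a <- data.frame(game = c(1, 1, 2), pts = c(30, 25, 28))
b <- data.frame(game = c(1, 1, 2), pts = c(35, 40, 33))

merged <- merge(a, b, by = "game", all = TRUE)

# game 1 contributes 2 x 2 = 4 rows, game 2 contributes 1 x 1 = 1 row
print(nrow(merged))

# anyDuplicated() returns 0 only when the key identifies each row uniquely
has_dup_keys <- anyDuplicated(a$game) > 0
print(has_dup_keys)
```

If the same check on the real files also reports repeated keys, summary statistics on the merged table count some games more often than others, so it may be worth comparing them with statistics computed on each file separately.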
4. Create a summary statistics table of the merged dataset. Use the stargazer package (you will have to install and load it first if you have not already done so). This gives the reader some idea about the variables in your data and their distributions.
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(dataset, type = "text",
title = "Summary")
##
## Summary
## =================================================
## Statistic N Mean St. Dev. Min Max
## -------------------------------------------------
## game 2,299 8.286 5.241 1 23
## series_game.x 2,299 3.052 1.612 1 7
## mp.x 2,299 41.294 4.726 24 53
## fg.x 2,299 10.177 3.288 2 20
## fga.x 2,299 20.416 5.060 10 38
## fgp.x 2,299 0.500 0.118 0.111 0.846
## three.x 2,299 1.567 1.404 0 7
## threeatt.x 2,299 4.677 2.244 0 12
## threep.x 2,263 0.311 0.232 0.000 1.000
## ft.x 2,299 6.850 3.818 0 18
## fta.x 2,299 9.243 4.543 0 24
## ftp.x 2,287 0.730 0.182 0.000 1.000
## orb.x 2,299 1.424 1.340 0 8
## drb.x 2,299 7.469 2.820 1 16
## trb.x 2,299 8.893 3.175 1 19
## ast.x 2,299 7.114 2.782 1 16
## stl.x 2,299 1.716 1.203 0 6
## blk.x 2,299 0.972 1.021 0 5
## tov.x 2,299 3.603 2.087 0 10
## pts.x 2,299 28.772 8.031 7 51
## game_score.x 2,299 23.673 8.227 -0.700 44.700
## plus_minus.x 2,299 6.124 13.653 -32 46
## series_game.y 2,297 2.985 1.579 1 7
## mp.y 2,297 41.656 3.840 29 57
## fg.y 2,297 12.234 3.891 3 24
## fga.y 2,297 25.080 5.992 8 45
## fgp.y 2,297 0.485 0.095 0.167 0.833
## three.y 2,297 0.799 1.069 0 6
## threeatt.y 2,297 2.434 2.182 0 11
## threep.y 1,864 0.308 0.316 0.000 1.000
## ft.y 2,297 8.201 4.249 0 23
## fta.y 2,297 9.882 4.882 0 28
## ftp.y 2,273 0.823 0.150 0.250 1.000
## orb.y 2,297 1.720 1.398 0 8
## drb.y 2,297 4.768 2.329 0 15
## trb.y 2,297 6.488 2.839 0 19
## ast.y 2,297 5.702 2.883 1 14
## stl.y 2,297 2.125 1.479 0 6
## blk.y 2,297 0.909 1.055 0 5
## tov.y 2,297 3.057 1.891 0 8
## pts.y 2,297 33.468 8.897 15 63
## game_score.y 2,297 25.255 8.229 2.600 49.800
## -------------------------------------------------
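The N column differs across rows (for example, 2,263 for threep.x against 2,299 for most .x variables) because each statistic is computed on the non-missing values of that variable. The base R sketch below reproduces the same five statistics on a made-up data frame; toy and summarise_col are hypothetical names used only for illustration.

```r
# Toy data with missing values in one column
toy <- data.frame(
  pts    = c(30, 25, 28, 41),
  threep = c(0.5, NA, 0.25, NA)
)

# N counts non-missing values; the other statistics ignore NAs
summarise_col <- function(v) {
  c(N    = sum(!is.na(v)),
    Mean = mean(v, na.rm = TRUE),
    SD   = sd(v, na.rm = TRUE),
    Min  = min(v, na.rm = TRUE),
    Max  = max(v, na.rm = TRUE))
}

# One row per variable, one column per statistic
tab <- t(sapply(toy, summarise_col))
print(round(tab, 3))
```

Here threep has N = 2 while pts has N = 4, mirroring how the table reports fewer observations for variables with missing values.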
5.
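# Pearson correlation between minutes played (mp) and field goals made (fg)
correlation <- cor(Lebron$mp, Lebron$fg)
correlation
## [1] 0.2034239
# Sample covariance between the same two variables
covariance <- cov(Lebron$mp, Lebron$fg)
covariance
## [1] 3.185818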
The correlation of 0.2034 indicates a weak positive linear relationship between LeBron's minutes played (mp) and field goals made (fg): in games where he plays more minutes, he tends to make slightly more field goals. The covariance of 3.19 is also positive, which confirms the direction of the relationship, but its magnitude depends on the units of the two variables and so does not indicate strength on its own. Overall, the two variables tend to increase together, but the relationship is not strong.
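The link between the two statistics is that correlation is covariance divided by the product of the two standard deviations. The sketch below uses made-up minutes and field goal values (not LeBron's actual numbers) to show that changing the units rescales the covariance but leaves the correlation unchanged.

```r
# Toy data, invented for illustration only
mp <- c(38, 42, 45, 40, 44)   # minutes played
fg <- c(9, 11, 10, 8, 13)     # field goals made

# Correlation is covariance standardized by both standard deviations
r_from_cov <- cov(mp, fg) / (sd(mp) * sd(fg))
r_builtin  <- cor(mp, fg)

# Express playing time in seconds instead of minutes
mp_seconds <- mp * 60
cov_min <- cov(mp, fg)
cov_sec <- cov(mp_seconds, fg)   # 60 times larger than cov_min
cor_sec <- cor(mp_seconds, fg)   # identical to r_builtin

print(c(cov_min = cov_min, cov_sec = cov_sec))
print(c(cor_min = r_builtin, cor_sec = cor_sec))
```

This is why the correlation coefficient, rather than the covariance, is the figure to read when judging how strong the relationship is.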