library(rmarkdown); library(knitr); library(readxl)
set.seed(37)
library(cats)
ascend <- c(1,4,6,7)
descend <- c(7,6,4,1)
asmean <- (ascend[1]+ascend[2]+ascend[3]+ascend[4])/length(ascend)
desmean <- (descend[1]+descend[2]+descend[3]+descend[4])/length(descend)
asmean
## [1] 4.5
desmean
## [1] 4.5
assd <- sqrt(((ascend[1]-asmean)^2+(ascend[2]-asmean)^2+(ascend[3]-asmean)^2+(ascend[4]-asmean)^2)/(length(ascend)-1))
dessd <- sqrt(((descend[1]-desmean)^2+(descend[2]-desmean)^2+(descend[3]-desmean)^2+(descend[4]-desmean)^2)/(length(descend)-1))
assd
## [1] 2.645751
dessd
## [1] 2.645751
SS <- (ascend[1] - asmean)*(descend[1] - desmean)+(ascend[2] - asmean)*(descend[2] - desmean)+(ascend[3] - asmean)*(descend[3] - desmean)+(ascend[4] - asmean)*(descend[4] - desmean)
SS
## [1] -19
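ad_cov <- SS/(length(ascend)-1)  # sample covariance; named ad_cov so the built-in cov() is not masked
ad_cov
## [1] -6.333333
ad_cor <- ad_cov/(assd*dessd)  # correlation = covariance / (sd_x * sd_y); named ad_cor so cor() is not masked
ad_cor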
## [1] -0.9047619
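As a cross-check on the hand computations above, the same quantities can be obtained from R's built-in `mean()`, `sd()`, `cov()`, and `cor()`. This is only a sketch for verification; the name `ss_check` is introduced here for illustration and is not part of the original work.

```r
ascend <- c(1, 4, 6, 7)
descend <- c(7, 6, 4, 1)

# Built-in equivalents of the element-by-element formulas
mean(ascend)          # 4.5
sd(ascend)            # 2.645751
cov(ascend, descend)  # -6.333333
cor(ascend, descend)  # -0.9047619

# Vectorized sum of cross-products (ss_check is a hypothetical name)
ss_check <- sum((ascend - mean(ascend)) * (descend - mean(descend)))
ss_check              # -19
```

Writing the sum of cross-products with vector arithmetic avoids indexing each element by hand, so the same line works for vectors of any length.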
pres <- data.frame(Year = c(2020,2012,2004,1996,1992,1984,1980,1976,1972,1964,1956),
Candidate = c("Trump", "Obama", "Bush", "Clinton", "Bush", "Reagan", "Carter", "Ford", "Nixon", "Johnson", "Eisenhower"),
Approval = c(45,52,48,54,34,58,37,45,59,74,68),
Margin = c(-4.4,3.9,2.4,8.5,-5.5,18.2,-9.7,-2.1,23.2,22.6,15.4))
plot(pres[,3], pres[,4], pch = 16, xlab = "Approval Rating", ylab = "Margin of Victory")
The association appears to be positive, roughly linear, and strong.
The correlation is about 0.905, which indicates a strong positive relationship and matches what I saw in the scatterplot. The covariance is also positive, which agrees with that direction.
cov(pres[,3],pres[,4])
## [1] 129.9418
cor(pres[,3],pres[,4])
## [1] 0.9053949
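The link between the two statistics above is that the correlation is the covariance rescaled by both standard deviations. A minimal sketch of that check follows. It copies the Approval and Margin values into plain vectors so it runs on its own, and the names `approval`, `margin`, and `r_from_cov` are assumptions made for this example.

```r
# Approval and Margin values copied from the pres data frame
approval <- c(45, 52, 48, 54, 34, 58, 37, 45, 59, 74, 68)
margin <- c(-4.4, 3.9, 2.4, 8.5, -5.5, 18.2, -9.7, -2.1, 23.2, 22.6, 15.4)

# Correlation rebuilt from the covariance and the two standard deviations
r_from_cov <- cov(approval, margin) / (sd(approval) * sd(margin))
r_from_cov

# TRUE when the rebuilt value agrees with cor()
all.equal(r_from_cov, cor(approval, margin))
```

Because the standard deviations are always positive, the covariance and the correlation must share the same sign, which is why a positive covariance goes with a positive correlation here.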
library(readxl)
MooseWolvesData <- read_excel("C:/Users/Sarah Chock/OneDrive - University of St. Thomas/Senior Year/STAT 360 Comp Stat and Data Analysis/Exploratory Data Analysis/MooseWolvesData.xlsx", sheet = "1. population level data")
mwd <- as.data.frame(MooseWolvesData)
The centroid is at about (21.066, 1020.628).
mean(mwd[,2])
## [1] 21.06557
mean(mwd[,3])
## [1] 1020.628
plot(mwd[,2], mwd[,3], pch = 10, xlab = "Wolves", ylab = "Moose")
abline(v = mean(mwd[,2]), h = mean(mwd[,3]))
The covariability is likely negative, because there appear to be more data points in quadrants 2 and 4 than in quadrants 1 and 3. But I think the correlation is closer to 0 than to -1, because there are still many data points in the other quadrants.
cov(mwd[,2], mwd[,3])
## [1] -1487.304
cor(mwd[,2], mwd[,3])
## [1] -0.3423859
Based on these statistics, the covariability is negative, but the relationship is weak to moderate, since the correlation is only about -0.34.
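The quadrant reasoning used above can be made concrete by counting the points in each quadrant around the centroid. The sketch below uses small made-up vectors, not the moose and wolf data, and the names `x`, `y`, `quadrant`, and `quad_counts` are hypothetical.

```r
# Hypothetical toy data (NOT the moose/wolf data set)
x <- c(2, 4, 6, 8, 10, 12)
y <- c(9, 11, 4, 10, 5, 3)

# Deviations from the centroid (mean(x), mean(y))
dx <- x - mean(x)
dy <- y - mean(y)

# Quadrants are numbered counterclockwise, starting from the upper right
quadrant <- ifelse(dx > 0 & dy > 0, "Q1",
            ifelse(dx < 0 & dy > 0, "Q2",
            ifelse(dx < 0 & dy < 0, "Q3",
            ifelse(dx > 0 & dy < 0, "Q4", "on a centroid line"))))
quad_counts <- table(quadrant)
quad_counts

# Q1 and Q3 points add positive cross-products; Q2 and Q4 points add negative ones
cross_products <- dx * dy
sum(cross_products) / (length(x) - 1)  # equals cov(x, y)
```

With these toy values, quadrants 2 and 4 each hold two points and quadrants 1 and 3 hold one each, so the covariance comes out negative. The counts alone do not settle the sign, though, because points far from the centroid contribute larger cross-products than points near it.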
ans <- datasets::anscombe
head(ans)
## x1 x2 x3 x4 y1 y2 y3 y4
## 1 10 10 10 8 8.04 9.14 7.46 6.58
## 2 8 8 8 8 6.95 8.14 6.77 5.76
## 3 13 13 13 8 7.58 8.74 12.74 7.71
## 4 9 9 9 8 8.81 8.77 7.11 8.84
## 5 11 11 11 8 8.33 9.26 7.81 8.47
## 6 14 14 14 8 9.96 8.10 8.84 7.04
Whoa! All of the correlations are nearly the same.
cor(ans[,1],ans[,5])
## [1] 0.8164205
cor(ans[,2],ans[,6])
## [1] 0.8162365
cor(ans[,3],ans[,7])
## [1] 0.8162867
cor(ans[,4],ans[,8])
## [1] 0.8165214
Based on these statistics, all four pairs appear to have very similar covariability, with each correlation at about 0.82.
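The four correlations above can also be computed in one pass instead of four separate calls. This is a sketch that assumes the built-in `datasets::anscombe` column names `x1` to `x4` and `y1` to `y4`; `pair_cor` is a name introduced for this example.

```r
ans <- datasets::anscombe

# Correlation of each (x_i, y_i) pair, selected by column name
pair_cor <- sapply(1:4, function(i) {
  cor(ans[[paste0("x", i)]], ans[[paste0("y", i)]])
})
names(pair_cor) <- paste0("pair", 1:4)

round(pair_cor, 3)
```

Selecting columns by name rather than by position keeps each x matched with its own y even if the column order changes.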
plot(ans[,1], ans[,5], pch = 16, xlab = "x1", ylab = "y1")
plot(ans[,2], ans[,6], pch = 16, xlab = "x2", ylab = "y2")
plot(ans[,3], ans[,7], pch = 16, xlab = "x3", ylab = "y3")
plot(ans[,4], ans[,8], pch = 16, xlab = "x4", ylab = "y4")
Presenting only summary statistics can be very misleading. If you looked only at the summary statistics for these data, you might think the four data sets were all the same. BUT if we look at the graphs, the data are WILDLY, INSANELY, ABSURDLY different.
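To see how closely the summary statistics agree beyond the correlations, the sketch below collects the means, standard deviations, and least-squares line for each pair. It is an illustration and not part of the original analysis; `pair_stats` is a hypothetical name.

```r
ans <- datasets::anscombe

# One column of summary statistics per (x_i, y_i) pair
pair_stats <- sapply(1:4, function(i) {
  x <- ans[[paste0("x", i)]]
  y <- ans[[paste0("y", i)]]
  fit <- coef(lm(y ~ x))  # intercept and slope of the least-squares line
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y),
    intercept = unname(fit[1]), slope = unname(fit[2]))
})
colnames(pair_stats) <- paste0("pair", 1:4)

round(pair_stats, 2)
```

Every row of the table is nearly constant across the four pairs, so none of these numbers can reveal the differences that the scatterplots show.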
here_kitty()
## meow