STAT 360: Computational Statistics and Data Analysis

Load R Libraries, Import and Attach Relevant Data, and Specify Seed

library(rmarkdown); library(knitr); library(readxl)
set.seed(37)
library(cats)

EXERCISE 01

Part (a)

ascend <- c(1,4,6,7)
descend <- c(7,6,4,1)
asmean <- (ascend[1]+ascend[2]+ascend[3]+ascend[4])/length(ascend)
desmean <- (descend[1]+descend[2]+descend[3]+descend[4])/length(descend)
asmean
## [1] 4.5
desmean
## [1] 4.5
assd <- sqrt(((ascend[1]-asmean)^2+(ascend[2]-asmean)^2+(ascend[3]-asmean)^2+(ascend[4]-asmean)^2)/(length(ascend)-1))
dessd <- sqrt(((descend[1]-desmean)^2+(descend[2]-desmean)^2+(descend[3]-desmean)^2+(descend[4]-desmean)^2)/(length(descend)-1))
assd
## [1] 2.645751
dessd
## [1] 2.645751
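
As a quick sanity check on the hand computations above, the same quantities can be obtained with R's built-in `mean()` and `sd()`. This is only a sketch for verification; the names `check_means` and `check_sds` are mine and not part of the exercise.

```r
# Same vectors as in Part (a)
ascend <- c(1, 4, 6, 7)
descend <- c(7, 6, 4, 1)

# Built-in mean and sample standard deviation (n - 1 denominator)
check_means <- c(ascend = mean(ascend), descend = mean(descend))
check_sds <- c(ascend = sd(ascend), descend = sd(descend))

check_means
check_sds
```

Both vectors contain the same four values in opposite order, so their means and standard deviations have to agree.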

Part (b)

SS <- (ascend[1] - asmean)*(descend[1] - desmean)+(ascend[2] - asmean)*(descend[2] - desmean)+(ascend[3] - asmean)*(descend[3] - desmean)+(ascend[4] - asmean)*(descend[4] - desmean)
SS
## [1] -19

Part (c)

cov_ad <- SS/(length(ascend)-1)
cov_ad
## [1] -6.333333

Part (d)

cor_ad <- cov_ad/(assd*dessd)
cor_ad
## [1] -0.9047619
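
The hand-built sum of cross products, covariance, and correlation can be double-checked with vectorized arithmetic and the built-in `cov()` and `cor()` functions. A minimal sketch, assuming the same two vectors; the `*_check` names are hypothetical and only used here.

```r
# Same vectors as in Part (a)
ascend <- c(1, 4, 6, 7)
descend <- c(7, 6, 4, 1)

# Sum of cross products of the deviations from each mean
ss_check <- sum((ascend - mean(ascend)) * (descend - mean(descend)))

# Built-in sample covariance and Pearson correlation
cov_check <- stats::cov(ascend, descend)
cor_check <- stats::cor(ascend, descend)

c(SS = ss_check, cov = cov_check, cor = cor_check)
```

Calling the functions as `stats::cov()` and `stats::cor()` keeps the check unambiguous even if objects with the same names exist in the workspace.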

EXERCISE 02

Part (a)

pres <- data.frame(Year = c(2020,2012,2004,1996,1992,1984,1980,1976,1972,1964,1956),
Candidate = c("Trump", "Obama", "Bush", "Clinton", "Bush", "Reagan", "Carter", "Ford", "Nixon", "Johnson", "Eisenhower"),
Approval = c(45,52,48,54,34,58,37,45,59,74,68),
Margin = c(-4.4,3.9,2.4,8.5,-5.5,18.2,-9.7,-2.1,23.2,22.6,15.4))

Part (b)

plot(pres[,3], pres[,4], pch = 16, xlab = "Approval Rating", ylab = "Margin of Victory")

Part (c)

The association appears to be positive, roughly linear, and strong.

Part (d)

The correlation is about 0.905, which indicates a strong positive linear relationship and matches what I saw in the scatterplot. The covariance is also positive, which agrees with the positive direction.

cov(pres[,3],pres[,4])
## [1] 129.9418
cor(pres[,3],pres[,4])
## [1] 0.9053949
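
The covariance of about 130 is hard to interpret on its own because it depends on the units of both variables, while the correlation does not. The sketch below, which re-enters the Approval and Margin columns as plain vectors, rebuilds the correlation from the covariance and the two standard deviations and then rescales approval to a proportion to show the difference. The variable names are mine.

```r
# Approval and Margin columns from the pres data frame
approval <- c(45, 52, 48, 54, 34, 58, 37, 45, 59, 74, 68)
margin <- c(-4.4, 3.9, 2.4, 8.5, -5.5, 18.2, -9.7, -2.1, 23.2, 22.6, 15.4)

# Correlation is the covariance divided by the product of the SDs
s_xy <- stats::cov(approval, margin)
r_manual <- s_xy / (sd(approval) * sd(margin))

# Express approval as a proportion instead of a percentage
approval_prop <- approval / 100
s_xy_prop <- stats::cov(approval_prop, margin)   # shrinks by a factor of 100
r_prop <- stats::cor(approval_prop, margin)      # unchanged

c(cov = s_xy, cor = r_manual, cov_prop = s_xy_prop, cor_prop = r_prop)
```

Only the sign of the covariance carries over directly; the strength of the relationship should be read from the correlation.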

EXERCISE 03

Part (a)

library(readxl)
MooseWolvesData <- read_excel("C:/Users/Sarah Chock/OneDrive - University of St. Thomas/Senior Year/STAT 360 Comp Stat and Data Analysis/Exploratory Data Analysis/MooseWolvesData.xlsx", sheet = "1. population level data")
mwd <- as.data.frame(MooseWolvesData)

Part (b)

The centroid (mean wolves, mean moose) is at about (21.066, 1020.628).

mean(mwd[,2])
## [1] 21.06557
mean(mwd[,3])
## [1] 1020.628

Part (c)

plot(mwd[,2], mwd[,3], pch = 10, xlab = "Wolves", ylab = "Moose")
abline(v = mean(mwd[,2]), h = mean(mwd[,3]))

Part (d)

The covariability appears to be negative, because more of the data points seem to fall in quadrants 2 and 4 (upper left and lower right of the centroid) than in quadrants 1 and 3. However, I think the correlation is closer to 0 than to -1, because there are still many data points in the other two quadrants.
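
The quadrant argument can be made concrete by counting the points in each quadrant around the centroid instead of eyeballing the plot. The helper `quadrant_counts()` below is a hypothetical function written for this illustration, and the vectors `x` and `y` are made-up toy data, not the moose and wolf data.

```r
# Hypothetical helper: count points in each quadrant around the centroid.
# Points lying exactly on a centroid line are not counted in any quadrant.
quadrant_counts <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  c(Q1 = sum(dx > 0 & dy > 0),   # upper right: positive cross product
    Q2 = sum(dx < 0 & dy > 0),   # upper left:  negative cross product
    Q3 = sum(dx < 0 & dy < 0),   # lower left:  positive cross product
    Q4 = sum(dx > 0 & dy < 0))   # lower right: negative cross product
}

# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(9, 7, 8, 3, 4, 1)

counts <- quadrant_counts(x, y)
counts
```

With the data from Part (a) loaded, the same helper could be called as `quadrant_counts(mwd[,2], mwd[,3])`. More points in Q2 and Q4 than in Q1 and Q3 suggests a negative covariance, although the counts ignore how far each point sits from the centroid.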

Part (e)

cov(mwd[,2], mwd[,3])
## [1] -1487.304
cor(mwd[,2], mwd[,3])
## [1] -0.3423859

Part (f)

Based on these statistics, the covariability is negative, but the relationship is weak to moderate, since the correlation is only about -0.34.

EXERCISE 04

Part (a)

ans <- datasets::anscombe
head(ans)
##   x1 x2 x3 x4   y1   y2    y3   y4
## 1 10 10 10  8 8.04 9.14  7.46 6.58
## 2  8  8  8  8 6.95 8.14  6.77 5.76
## 3 13 13 13  8 7.58 8.74 12.74 7.71
## 4  9  9  9  8 8.81 8.77  7.11 8.84
## 5 11 11 11  8 8.33 9.26  7.81 8.47
## 6 14 14 14  8 9.96 8.10  8.84 7.04

Part (b)

Whoa! All four correlations are nearly the same, at about 0.816.

cor(ans[,1],ans[,5])
## [1] 0.8164205
cor(ans[,2],ans[,6])
## [1] 0.8162365
cor(ans[,3],ans[,7])
## [1] 0.8162867
cor(ans[,4],ans[,8])
## [1] 0.8165214

Part (c)

Based on these statistics alone, all four pairs appear to have very similar covariability.

Part (d)

plot(ans[,1], ans[,5], pch = 16, xlab = "x1", ylab = "y1")

plot(ans[,2], ans[,6], pch = 16, xlab = "x2", ylab = "y2")

plot(ans[,3], ans[,7], pch = 16, xlab = "x3", ylab = "y3")

plot(ans[,4], ans[,8], pch = 16, xlab = "x4", ylab = "y4")

Part (e)

Presenting only summary statistics can be very misleading. If you looked only at the statistics for these data, you might think the four data sets were all the same. BUT if we look at the graphs, the data are WILDLY, INSANELY, ABSURDLY different.
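
To see how similar the summary statistics really are, the means, standard deviations, and correlations for all four pairs can be put side by side. A minimal sketch using the built-in `anscombe` data; the object name `pair_stats` and the row labels are my own choices.

```r
ans <- datasets::anscombe

# One column of summary statistics per (x, y) pair
pair_stats <- sapply(1:4, function(i) {
  x <- ans[[paste0("x", i)]]
  y <- ans[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y),
    cor = cor(x, y))
})
colnames(pair_stats) <- paste0("pair", 1:4)

round(pair_stats, 2)
```

Rounded to two decimals the four columns are essentially indistinguishable, even though the scatterplots in Part (d) look nothing alike.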

CATS

here_kitty()

## meow