---
title: "ANLY 512 - Problem Set 2"
subtitle: "Anscombe's quartet"
author: "Rischav Verma"
date: "1/30/2019"
output: html_document
---
The objectives of this problem set are to orient you to a number of activities in R and to conduct a thoughtful exercise in appreciating the importance of data visualization. For each question, create a code chunk or text response that completes or answers the activity or question requested. Finally, upon completion, name your final output .html file YourName_ANLY512-Section-Year-Semester.html, publish it to your RPubs account, and submit the link to the “Problem Set 2” assignment on Moodle. Points will be deducted for uploading in an improper format.
Load the anscombe data that is part of the library(datasets) in R, and assign that data to a new object called data.
data <- anscombe
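Summarise the data by calculating the mean and variance of each column and the correlation between each pair x1 and y1, x2 and y2, etc. (Hint: use the fBasics() package!)
#Calculating Mean
mean(data$x1)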
## [1] 9
mean(data$x2)
## [1] 9
mean(data$x3)
## [1] 9
mean(data$x4)
## [1] 9
mean(data$y1)
## [1] 7.500909
mean(data$y2)
## [1] 7.500909
mean(data$y3)
## [1] 7.5
mean(data$y4)
## [1] 7.500909
#Calculating Variance
var(data$x1)
## [1] 11
var(data$x2)
## [1] 11
var(data$x3)
## [1] 11
var(data$x4)
## [1] 11
var(data$y1)
## [1] 4.127269
var(data$y2)
## [1] 4.127629
var(data$y3)
## [1] 4.12262
var(data$y4)
## [1] 4.123249
#Calculating Correlation
cor(data$x1,data$y1)
## [1] 0.8164205
cor(data$x2,data$y2)
## [1] 0.8162365
cor(data$x3,data$y3)
## [1] 0.8162867
cor(data$x4,data$y4)
## [1] 0.8165214
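The individual calls above can also be condensed with base R's sapply() and mapply(). This is only a minimal sketch, not the approach the assignment requires; the names means, variances, and correlations are introduced here for illustration.

```r
# Load the quartet so the example is self-contained
data <- datasets::anscombe

# Mean and variance of every column, one call each
means <- sapply(data, mean)
variances <- sapply(data, var)

# Pearson correlation for each (x_i, y_i) pair
correlations <- mapply(function(x, y) cor(data[[x]], data[[y]]),
                       paste0("x", 1:4), paste0("y", 1:4))

print(round(means, 4))
print(round(variances, 4))
print(round(correlations, 4))
```

Seeing the three vectors side by side makes it easier to notice how closely the four data sets agree on every summary statistic.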
plot(data$x1, data$y1, main="Comparing x1 and y1",
     xlab="x1", ylab="y1", pch=20)
plot(data$x2, data$y2, main="Comparing x2 and y2",
     xlab="x2", ylab="y2", pch=21)
plot(data$x3, data$y3, main="Comparing x3 and y3",
     xlab="x3", ylab="y3", pch=22)
plot(data$x4, data$y4, main="Comparing x4 and y4",
     xlab="x4", ylab="y4", pch=23)
par(mfrow=c(2,2))
plot(data$x1, data$y1, pch=20)
plot(data$x2, data$y2, pch=20)
plot(data$x3, data$y3, pch=20)
plot(data$x4, data$y4, pch=20)
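The four-panel figure can also be drawn in a loop. The sketch below assumes that shared axis limits are wanted so the panels can be compared directly, which the original plots do not do. The names x_range, y_range, and old_par are illustrative.

```r
data <- datasets::anscombe

# Common axis limits across all four data sets
x_range <- range(unlist(data[paste0("x", 1:4)]))
y_range <- range(unlist(data[paste0("y", 1:4)]))

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid; remember the previous settings
for (i in 1:4) {
  plot(data[[paste0("x", i)]], data[[paste0("y", i)]],
       xlim = x_range, ylim = y_range, pch = 20,
       main = paste("Dataset", i),
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(old_par)  # restore the previous layout
```

Restoring par() afterwards keeps later plots from being drawn into the 2 x 2 grid by accident.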
Now fit a linear model to each data set using the lm() function.
fit1 <- lm(data$y1~data$x1, data=data)
fit2 <- lm(data$y2~data$x2, data=data)
fit3 <- lm(data$y3~data$x3, data=data)
fit4 <- lm(data$y4~data$x4, data=data)
par(mfrow=c(2,2))
plot(data$x1, data$y1, pch=20)
abline(fit1)
plot(data$x2, data$y2, pch=20)
abline(fit2)
plot(data$x3, data$y3, pch=20)
abline(fit3)
plot(data$x4, data$y4, pch=20)
abline(fit4)
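The four lm() calls above can be written as a loop that builds each formula from the column names, which also makes it easy to compare the fitted coefficients. This is a sketch under that assumption; fits and coefs are illustrative names, and reformulate() is the base R helper that builds a formula from strings.

```r
data <- datasets::anscombe

# Fit y_i ~ x_i for i = 1..4, referring to columns by name inside the formula
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = data)
})

# One row per data set: intercept and slope
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 3))
```

All four rows come out at roughly intercept 3 and slope 0.5, so the regression lines drawn by abline() are nearly the same even though the point clouds differ.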
library(rcompanion)
compareLM(fit1,fit2,fit3,fit4)
## $Models
## Formula
## 1 "data$y1 ~ data$x1"
## 2 "data$y2 ~ data$x2"
## 3 "data$y3 ~ data$x3"
## 4 "data$y4 ~ data$x4"
##
## $Fit.criteria
##   Rank Df.res   AIC  AICc   BIC R.squared Adj.R.sq  p.value Shapiro.W Shapiro.p
## 1    2      9 39.68 43.11 40.88    0.6665   0.6295 0.002170    0.9421  0.545600
## 2    2      9 39.69 43.12 40.89    0.6662   0.6292 0.002179    0.8762  0.092950
## 3    2      9 39.68 43.10 40.87    0.6663   0.6292 0.002176    0.7407  0.001574
## 4    2      9 39.67 43.09 40.86    0.6667   0.6297 0.002165    0.9607  0.780000
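If rcompanion is not available, a few of the same criteria can be collected with base R alone. The sketch below is not a replacement for compareLM() and does not reproduce its full table. The name summary_table is illustrative, and the Shapiro-Wilk test is applied here to the model residuals.

```r
data <- datasets::anscombe

fits <- list(
  lm(y1 ~ x1, data = data),
  lm(y2 ~ x2, data = data),
  lm(y3 ~ x3, data = data),
  lm(y4 ~ x4, data = data)
)

# R-squared, AIC, and a normality check on the residuals for each model
summary_table <- data.frame(
  R.squared = sapply(fits, function(f) summary(f)$r.squared),
  AIC       = sapply(fits, AIC),
  Shapiro.p = sapply(fits, function(f) shapiro.test(residuals(f))$p.value)
)
print(round(summary_table, 4))
```

The fit statistics are nearly identical across the four models, so only a residual-based check or a plot separates them.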
Data visualization is a great way to find insights that descriptive statistics alone would not reveal. Even though the means, variances, and correlations of the four data sets were nearly identical, the scatterplots told a different story: x2 and y2 follow a curve rather than a straight line, x3 and y3 are linear apart from a single outlier, and in x4 and y4 a single extreme point determines the fitted line.
Data visualization therefore helps us make important decisions about outliers and model choice that we would not have reached through descriptive statistics alone.