---
title: "ANLY 512 - Problem Set 2"
subtitle: "Anscombe's quartet"
author: "Shridhar Kulkarni"
date: "11/01/2019"
output: html_document
---
The objectives of this problem set are to orient you to a number of activities in R and to conduct a thoughtful exercise in appreciating the importance of data visualization. For each question, create a code chunk or text response that completes/answers the activity or question requested. Finally, upon completion, name your final output .html file as `YourName_ANLY512-Section-Year-Semester.html`, upload it to your R Pubs account, and submit the link to the "Problem Set 2" assignment on Moodle. Points will be deducted for uploading the improper format.
1. Load the `anscombe` data that is part of `library(datasets)` in R, and assign that data to a new object called `data`.

library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(datasets)
data <- anscombe
print(data)
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89
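2. Summarise the data by calculating the mean and variance of each column and the correlation between each x, y pair. (Hint: use the `fBasics` package!)

Mean <- apply(data, 2, mean)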
print(Mean)
##       x1       x2       x3       x4       y1       y2       y3       y4
## 9.000000 9.000000 9.000000 9.000000 7.500909 7.500909 7.500000 7.500909
Variance <- apply(data, 2, var)
print(Variance)
##        x1        x2        x3        x4        y1        y2        y3        y4
## 11.000000 11.000000 11.000000 11.000000  4.127269  4.127629  4.122620  4.123249
# Correlate each x column with its matching y column:
# the diagonal of the 4 x 4 cross-correlation matrix holds cor(x1, y1), ..., cor(x4, y4)
Correlation <- diag(cor(data[, 1:4], data[, 5:8]))
print(Correlation)
## [1] 0.8164205 0.8162365 0.8162867 0.8165214
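The three summaries above can also be gathered pair by pair into one table, which makes the near-identical values easier to compare. The following is a minimal sketch, assuming only base R and the built-in `anscombe` data; the helper name `pair_summary` is hypothetical and not part of the original solution.

```r
# Built-in Anscombe data (the datasets package ships with R)
data <- datasets::anscombe

# Hypothetical helper: summary statistics for the i-th (x, y) pair
pair_summary <- function(i, df) {
  x <- df[[paste0("x", i)]]
  y <- df[[paste0("y", i)]]
  c(mean_x = mean(x), var_x = var(x),
    mean_y = mean(y), var_y = var(y),
    cor_xy = cor(x, y))
}

# One row per pair, one column per statistic
stats <- t(sapply(1:4, pair_summary, df = data))
rownames(stats) <- paste0("pair", 1:4)
print(round(stats, 3))
```

Each row should agree with the separate outputs above: a mean of 9 and variance of 11 for x, a mean near 7.5 for y, and a correlation near 0.816.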
plot(data$x1, data$y1, xlab = "x1", ylab = "y1")
plot(data$x2, data$y2, xlab = "x2", ylab = "y2")
plot(data$x3, data$y3, xlab = "x3", ylab = "y3")
plot(data$x4, data$y4, xlab = "x4", ylab = "y4")
4. Now change the symbols on the scatter plots to solid circles and plot them together as a 4 panel graphic

```r
# Arrange the next four plots in a 2 x 2 grid; pch = 16 draws solid circles
par(mfrow = c(2, 2))
plot(data$x1, data$y1, xlab = "x1", ylab = "y1", pch = 16)
plot(data$x2, data$y2, xlab = "x2", ylab = "y2", pch = 16)
plot(data$x3, data$y3, xlab = "x3", ylab = "y3", pch = 16)
plot(data$x4, data$y4, xlab = "x4", ylab = "y4", pch = 16)
```
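5. Now fit a linear model to each data set using the `lm()` function.

# Fit y_i ~ x_i for each pair; the data argument avoids repeating data$ in the formula
lm1 <- lm(y1 ~ x1, data = data)
lm2 <- lm(y2 ~ x2, data = data)
lm3 <- lm(y3 ~ x3, data = data)
lm4 <- lm(y4 ~ x4, data = data)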
par(mfrow = c(2, 2))
with(data, plot(x1, y1, pch = 16))
abline(lm1)
with(data, plot(x2, y2, pch = 16))
abline(lm2)
with(data, plot(x3, y3, pch = 16))
abline(lm3)
with(data, plot(x4, y4, pch = 16))
abline(lm4)
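The four fit-and-plot steps repeat the same pattern, so they can also be written as a loop. This is a minimal sketch, assuming base R graphics and the built-in `anscombe` data; the names `models` and `old_par` are illustrative and not part of the original solution.

```r
data <- datasets::anscombe

# Keep the fitted models so they can be inspected later
models <- vector("list", 4)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 panel layout; remember previous settings
for (i in 1:4) {
  # Build the formula y_i ~ x_i from the column names
  f <- reformulate(paste0("x", i), response = paste0("y", i))
  models[[i]] <- lm(f, data = data)
  plot(f, data = data, pch = 16)  # scatter plot with solid circles
  abline(models[[i]])             # overlay the fitted regression line
}
par(old_par)  # restore the previous layout
```

Building the formula with `reformulate()` keeps the axis labels as plain column names and stores each model in one list instead of four separate objects.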
summary(lm1)$adj.r.squared
## [1] 0.6294916
summary(lm2)$adj.r.squared
## [1] 0.6291578
summary(lm3)$adj.r.squared
## [1] 0.6292489
summary(lm4)$adj.r.squared
## [1] 0.6296747
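Adjusted R-squared is not the only fit summary that nearly coincides across the four models. The sketch below, assuming base R and the built-in `anscombe` data, collects the intercept, slope, and adjusted R-squared of each fit in one table; the name `fit_table` is illustrative.

```r
data <- datasets::anscombe

# One row per model: intercept, slope, and adjusted R-squared
fit_table <- t(sapply(1:4, function(i) {
  f <- reformulate(paste0("x", i), response = paste0("y", i))
  fit <- lm(f, data = data)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    adj_r2    = summary(fit)$adj.r.squared)
}))
rownames(fit_table) <- paste0("lm", 1:4)
print(round(fit_table, 4))
```

All four rows should be close to an intercept of 3 and a slope of 0.5, which is why the fitted lines look alike even though the point clouds do not.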
Anscombe's Quartet shows that datasets can be very similar in certain statistical properties (here the means, variances, correlations, and adjusted R-squared values nearly coincide) while differing vastly in others, such as the shape of the relationship between x and y. Visualization tools let us identify these differences and gain a better understanding when comparing datasets.