The objective of this problem set is to orient you to a number of activities in R and to conduct a thoughtful exercise in appreciating the importance of data visualization. For each question, create a code chunk or text response that completes the activity or answers the question. Finally, upon completion, name your final output .html file YourName_ANLY512-Section-Year-Semester.html, upload it to your RPubs account, and submit the link to the “Problem Set 2” assignment on Moodle. Points will be deducted for uploading in an improper format.
Load the anscombe data that is part of library(datasets) in R, and assign that data to a new object called data.
library(datasets)
data <- anscombe
View(data)
x1 <- data$x1
x2 <- data$x2
x3 <- data$x3
x4 <- data$x4
y1 <- data$y1
y2 <- data$y2
y3 <- data$y3
y4 <- data$y4
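Calculate summary statistics for each column and the correlation between each pair (x1 and y1, x2 and y2, etc.), using the fBasics package.
library(fBasics)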
## Loading required package: timeDate
## Loading required package: timeSeries
basicStats(data)
##                    x1        x2        x3        x4        y1        y2
## nobs        11.000000 11.000000 11.000000 11.000000 11.000000 11.000000
## NAs          0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## Minimum      4.000000  4.000000  4.000000  8.000000  4.260000  3.100000
## Maximum     14.000000 14.000000 14.000000 19.000000 10.840000  9.260000
## 1. Quartile  6.500000  6.500000  6.500000  8.000000  6.315000  6.695000
## 3. Quartile 11.500000 11.500000 11.500000  8.000000  8.570000  8.950000
## Mean         9.000000  9.000000  9.000000  9.000000  7.500909  7.500909
## Median       9.000000  9.000000  9.000000  8.000000  7.580000  8.140000
## Sum         99.000000 99.000000 99.000000 99.000000 82.510000 82.510000
## SE Mean      1.000000  1.000000  1.000000  1.000000  0.612541  0.612568
## LCL Mean     6.771861  6.771861  6.771861  6.771861  6.136083  6.136024
## UCL Mean    11.228139 11.228139 11.228139 11.228139  8.865735  8.865795
## Variance    11.000000 11.000000 11.000000 11.000000  4.127269  4.127629
## Stdev        3.316625  3.316625  3.316625  3.316625  2.031568  2.031657
## Skewness     0.000000  0.000000  0.000000  2.466911 -0.048374 -0.978693
## Kurtosis    -1.528926 -1.528926 -1.528926  4.520661 -1.199123 -0.514319
##                    y3        y4
## nobs        11.000000 11.000000
## NAs          0.000000  0.000000
## Minimum      5.390000  5.250000
## Maximum     12.740000 12.500000
## 1. Quartile  6.250000  6.170000
## 3. Quartile  7.980000  8.190000
## Mean         7.500000  7.500909
## Median       7.110000  7.040000
## Sum         82.500000 82.510000
## SE Mean      0.612196  0.612242
## LCL Mean     6.135943  6.136748
## UCL Mean     8.864057  8.865070
## Variance     4.122620  4.123249
## Stdev        2.030424  2.030579
## Skewness     1.380120  1.120774
## Kurtosis     1.240044  0.628751
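As a cross-check that does not depend on fBasics, the mean and variance of every column could also be computed with base R. This is only a minimal sketch; the object name `stats` is my own choice and not part of the assignment.

```r
library(datasets)

# One row per column of anscombe, one column per statistic.
# `stats` is a hypothetical name used only for this illustration.
stats <- data.frame(
  mean     = sapply(anscombe, mean),
  variance = sapply(anscombe, var)
)

print(round(stats, 3))
```

The four x columns share the same mean and variance, and the four y columns agree to about two decimal places, which matches the Mean and Variance rows in the basicStats table.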
cor(x1, y1)
## [1] 0.8164205
cor(x2, y2)
## [1] 0.8162365
cor(x3, y3)
## [1] 0.8162867
cor(x4, y4)
## [1] 0.8165214
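The four cor() calls above follow the same pattern, so they could probably be collapsed into one loop over the set index. The sketch below assumes the columns are named x1..x4 and y1..y4, as they are in anscombe; `pair_cor` is a name I made up for the result.

```r
library(datasets)

# Pearson correlation for each (x_i, y_i) pair, i = 1..4.
# `pair_cor` is a hypothetical name, not part of the original code.
pair_cor <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
names(pair_cor) <- paste0("set", 1:4)

print(round(pair_cor, 4))
```

Printing the values side by side makes it easier to see that all four correlations are roughly 0.816.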
plot(x1, y1)
plot(x2, y2)
plot(x3, y3)
plot(x4, y4)
par(mfrow=c(2,2))
plot(x1, y1, pch =20)
plot(x2, y2, pch =20)
plot(x3, y3, pch =20)
plot(x4, y4, pch =20)
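By default each panel picks its own axis range, which can hide how different the four sets are. One possible variation, shown here only as a sketch, is to force all panels onto shared limits. The names `x_lim`, `y_lim`, and `old_par` are my own.

```r
library(datasets)

# Common limits so that every panel uses the same scale.
x_lim <- range(unlist(anscombe[, paste0("x", 1:4)]))
y_lim <- range(unlist(anscombe[, paste0("y", 1:4)]))

old_par <- par(mfrow = c(2, 2))   # remember the previous layout
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       pch = 20, xlim = x_lim, ylim = y_lim,
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Data set", i))
}
par(old_par)                      # restore the previous layout
```

With shared limits, the single outlying x value in the fourth set and the outlying y value in the third set stand out more clearly.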
Fit a linear model to each data set using the lm() function.
lm(y1 ~ x1)
##
## Call:
## lm(formula = y1 ~ x1)
##
## Coefficients:
## (Intercept)           x1
##      3.0001       0.5001
lm(y2 ~ x2)
##
## Call:
## lm(formula = y2 ~ x2)
##
## Coefficients:
## (Intercept)           x2
##       3.001        0.500
lm(y3 ~ x3)
##
## Call:
## lm(formula = y3 ~ x3)
##
## Coefficients:
## (Intercept)           x3
##      3.0025       0.4997
lm(y4 ~ x4)
##
## Call:
## lm(formula = y4 ~ x4)
##
## Coefficients:
## (Intercept)           x4
##      3.0017       0.4999
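To compare the four fits at a glance, the coefficients could be gathered into one small table instead of being read off four separate printouts. This is a sketch under the assumption that the models are simple regressions of y_i on x_i; `fits` and `coef_table` are hypothetical names, and the R-squared column is an extra I added for comparison.

```r
library(datasets)

# Fit y_i ~ x_i for i = 1..4; reformulate() builds each formula from strings.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# One row per data set: intercept, slope, and R-squared.
coef_table <- t(sapply(fits, coef))
dimnames(coef_table) <- list(paste0("set", 1:4), c("intercept", "slope"))
coef_table <- cbind(
  coef_table,
  r_squared = sapply(fits, function(f) summary(f)$r.squared)
)

print(round(coef_table, 3))
```

All four rows should show an intercept near 3 and a slope near 0.5, in line with the lm() output above.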
par(mfrow=c(2,2))
plot(x1, y1, pch =20)
abline(lm(y1~x1))
plot(x2, y2, pch =20)
abline(lm(y2~x2))
plot(x3, y3, pch =20)
abline(lm(y3~x3))
plot(x4, y4, pch =20)
abline(lm(y4~x4))
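Since the fitted lines are almost identical, plotting the residuals is one way to see how differently each line fits its data. The following is a rough sketch rather than part of the assignment; `res` and `old_par` are names I chose.

```r
library(datasets)

res <- vector("list", 4)          # residuals of each fit (hypothetical name)
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  res[[i]] <- resid(lm(y ~ x))
  plot(x, res[[i]], pch = 20,
       xlab = paste0("x", i), ylab = "residual",
       main = paste("Data set", i))
  abline(h = 0, lty = 2)          # zero line for reference
}
par(old_par)
```

A curved residual pattern or a single residual far from zero suggests that a straight line is not an adequate description of that set, even though the coefficients look the same.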
anova(lm(y1 ~ x1))
## Analysis of Variance Table
##
## Response: y1
##           Df Sum Sq Mean Sq F value  Pr(>F)
## x1         1 27.510 27.5100   17.99 0.00217 **
## Residuals  9 13.763  1.5292
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(lm(y2 ~ x2))
## Analysis of Variance Table
##
## Response: y2
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x2         1 27.500 27.5000  17.966 0.002179 **
## Residuals  9 13.776  1.5307
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(lm(y3 ~ x3))
## Analysis of Variance Table
##
## Response: y3
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x3         1 27.470 27.4700  17.972 0.002176 **
## Residuals  9 13.756  1.5285
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(lm(y4 ~ x4))
## Analysis of Variance Table
##
## Response: y4
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x4         1 27.490 27.4900  18.003 0.002165 **
## Residuals  9 13.742  1.5269
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
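The F statistics and p-values from the four ANOVA tables could also be pulled into a single table for direct comparison. This is a sketch only; `f_table` is a hypothetical name, and the column labels "F value" and "Pr(>F)" are the ones anova() uses for a single lm fit.

```r
library(datasets)

# Extract the F statistic and its p-value from the first row of each ANOVA table.
f_table <- t(sapply(1:4, function(i) {
  fit <- lm(reformulate(paste0("x", i), response = paste0("y", i)),
            data = anscombe)
  tab <- anova(fit)
  c(F_value = tab[1, "F value"], p_value = tab[1, "Pr(>F)"])
}))
rownames(f_table) <- paste0("set", 1:4)

print(signif(f_table, 4))
```

The four rows come out nearly the same, so the ANOVA results alone give no hint that the underlying data sets differ.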
The lesson of Anscombe’s Quartet gave me a better understanding of the power of data visualization. It shows that datasets with similar, or even nearly identical, summary statistics can look completely different when plotted. It is also clear that different kinds of visualization reveal different things, so it is important to choose the most appropriate one in order to represent the data accurately.