The objectives of this problem set are to orient you to a number of activities in R and to conduct a thoughtful exercise in appreciating the importance of data visualization. For each question, create a code chunk or text response that completes the activity or answers the question. Finally, upon completion, name your final output .html file YourName_ANLY512-Section-Year-Semester.html and upload it to the “Problem Set 2” assignment on Moodle.
Load the anscombe data that is part of library(datasets) in R, and assign that data to a new object called data.
data <- anscombe
fBasics() package!)
library(fBasics)
## Warning: package 'fBasics' was built under R version 3.3.3
## Loading required package: timeDate
## Loading required package: timeSeries
colStats(anscombe,FUN = mean)
##       x1       x2       x3       x4       y1       y2       y3       y4
## 9.000000 9.000000 9.000000 9.000000 7.500909 7.500909 7.500000 7.500909
colStats(anscombe, FUN= var)
##        x1        x2        x3        x4        y1        y2        y3
## 11.000000 11.000000 11.000000 11.000000  4.127269  4.127629  4.122620
##        y4
##  4.123249
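The same column summaries can also be reproduced without fBasics. The following is a minimal base-R sketch, assuming only the built-in anscombe data frame; the names col_means and col_vars are illustrative and do not appear in the original answer.

```r
# Base-R sketch: column means and variances without the fBasics package.
# `col_means` and `col_vars` are illustrative names, not from the original answer.
data(anscombe)                       # built-in data set from the datasets package

col_means <- sapply(anscombe, mean)  # one mean per column, named by column
col_vars  <- sapply(anscombe, var)   # one sample variance per column

print(round(col_means, 6))
print(round(col_vars, 6))
```

sapply() applies the function to each column of the data frame and returns a named numeric vector, so the result can be compared directly with the colStats() output.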
x1<-data[,1]
x2<-data[,2]
x3<-data[,3]
x4<-data[,4]
y1<-data[,5]
y2<-data[,6]
y3<-data[,7]
y4<-data[,8]
cor(x1,y1)
## [1] 0.8164205
cor(x2,y2)
## [1] 0.8162365
cor(x3,y3)
## [1] 0.8162867
cor(x4,y4)
## [1] 0.8165214
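The four cor() calls above can be collapsed into one step. This is only a sketch, assuming the column names x1..x4 and y1..y4 of the built-in anscombe data frame; xs, ys and pair_cor are illustrative names.

```r
# Sketch: compute the four pairwise correlations in a single call.
# `xs`, `ys` and `pair_cor` are illustrative names, not from the original answer.
xs <- anscombe[, paste0("x", 1:4)]   # columns x1..x4
ys <- anscombe[, paste0("y", 1:4)]   # columns y1..y4

# mapply() walks the two data frames column by column: cor(x1, y1), cor(x2, y2), ...
pair_cor <- mapply(cor, xs, ys)
names(pair_cor) <- paste0("set", 1:4)

print(round(pair_cor, 4))
```

Keeping the results in one named vector makes it easier to see that all four correlations agree to about three decimal places.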
plot(x1,y1, main="Scatterplot for X1 and Y1")
plot(x2,y2, main="Scatterplot for X2 and Y2")
plot(x3,y3, main="Scatterplot for X3 and Y3")
plot(x4,y4, main="Scatterplot for X4 and Y4")
par(mfrow=c(2,2))
plot(x1,y1,pch=16)
plot(x2,y2,pch=16)
plot(x3,y3,pch=16)
plot(x4,y4,pch=16)
lm() function.
fit1 <- lm(y1~x1)
fit2 <- lm(y2~x2)
fit3 <- lm(y3~x3)
fit4 <- lm(y4~x4)
par(mfrow=c(2,2))
plot(x1,y1,pch=16)
abline(fit1)
plot(x2,y2,pch=16)
abline(fit2)
plot(x3,y3,pch=16)
abline(fit3)
plot(x4,y4,pch=16)
abline(fit4)
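Because the four fitted lines look almost the same in the panels, it can help to print their coefficients side by side. The sketch below refits the models from the anscombe data frame rather than reusing fit1..fit4; fits and coef_table are illustrative names.

```r
# Sketch: refit the four models and tabulate intercept and slope.
# `fits` and `coef_table` are illustrative names, not from the original answer.
fits <- lapply(1:4, function(i) {
  # reformulate() builds the formula y<i> ~ x<i> from the column names
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

coef_table <- t(sapply(fits, coef))  # one row per model
colnames(coef_table) <- c("intercept", "slope")
rownames(coef_table) <- paste0("fit", 1:4)

print(round(coef_table, 3))
```

Each row should show an intercept close to 3 and a slope close to 0.5, which is why the regression lines alone cannot distinguish the four data sets.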
anova(fit1)
## Analysis of Variance Table
##
## Response: y1
##           Df Sum Sq Mean Sq F value  Pr(>F)
## x1         1 27.510 27.5100   17.99 0.00217 **
## Residuals  9 13.763  1.5292
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit2)
## Analysis of Variance Table
##
## Response: y2
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x2         1 27.500 27.5000  17.966 0.002179 **
## Residuals  9 13.776  1.5307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit3)
## Analysis of Variance Table
##
## Response: y3
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x3         1 27.470 27.4700  17.972 0.002176 **
## Residuals  9 13.756  1.5285
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit4)
## Analysis of Variance Table
##
## Response: y4
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x4         1 27.490 27.4900  18.003 0.002165 **
## Residuals  9 13.742  1.5269
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
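The four ANOVA tables can be condensed into one comparison. The following sketch, assuming the built-in anscombe data frame, pulls R-squared and the F statistic out of summary() for each model; fit_stats and stats_table are hypothetical helper names introduced only for this illustration.

```r
# Sketch: collect R-squared and the F statistic from each of the four fits.
# `fit_stats` and `stats_table` are hypothetical names, not from the original answer.
fit_stats <- function(i) {
  f <- reformulate(paste0("x", i), response = paste0("y", i))  # y<i> ~ x<i>
  s <- summary(lm(f, data = anscombe))
  c(r_squared = s$r.squared,
    f_value   = unname(s$fstatistic["value"]))
}

stats_table <- t(sapply(1:4, fit_stats))  # one row per model
rownames(stats_table) <- paste0("fit", 1:4)

print(round(stats_table, 3))
```

The F values in this table should match the F value column of the anova() output above.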
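The summary statistics and the ANOVA tables are nearly identical for all four fitted models: each pair has a correlation of about 0.816 and an F value of about 18. The visualization, however, shows that the four data sets are very different. The points in set 1 scatter roughly linearly around the fitted line, whereas the points in set 2 follow a smooth curve that a straight line does not describe well, so fit 2 is less appropriate than fit 1.
Fit 3 and fit 4 are very vulnerable to outliers, as shown in the visualization: in set 3 a single point with an unusually high y value pulls the line away from an otherwise almost perfectly linear pattern, and in set 4 the slope is determined entirely by the one point with x4 = 19, since every other point has x4 = 8.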
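The remark about outliers in fit 3 and fit 4 can be checked numerically. This is a rough sketch using the standard diagnostics resid() and hatvalues() on models refitted from the anscombe data frame; worst3, lev4 and top4 are illustrative names.

```r
# Sketch: locate the observations that dominate fit 3 and fit 4.
# `worst3`, `lev4` and `top4` are illustrative names, not from the original answer.
fit3 <- lm(y3 ~ x3, data = anscombe)
fit4 <- lm(y4 ~ x4, data = anscombe)

# Set 3: the observation with the largest absolute residual is the outlier in y.
worst3 <- unname(which.max(abs(resid(fit3))))
print(anscombe[worst3, c("x3", "y3")])

# Set 4: leverage (hat value) measures how strongly each x value drives the fit.
lev4 <- hatvalues(fit4)
top4 <- unname(which.max(lev4))
print(anscombe[top4, c("x4", "y4")])
print(round(lev4, 3))
```

A large residual flags a point that sits far from the line, while a leverage close to 1 flags a point that effectively fixes the line by itself, so the two checks capture the different ways in which sets 3 and 4 are fragile.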