The objectives of this problem set is to orient you to a number of activities in R. And to conduct a thoughtful exercise in appreciating the importance of data visualization. For each question create a code chunk or text response that completes/answers the activity or question requested. Finally, upon completion name your final output .html file as: YourName_ANLY512-Section-Year-Semester.html and upload it to the “Problem Set 2” assignmenet on Moodle.
anscombe data that is part of the library(datasets) in R. And assign that data to a new object called data.#Install the anscombe data from library(datasets) and assign as request:
library(datasets)
data<-anscombe
x1<-data[,1]
x2<-data[,2]
x3<-data[,3]
x4<-data[,4]
y1<-data[,5]
y2<-data[,6]
y3<-data[,7]
y4<-data[,8]
fBasics() package!)library(fBasics)
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.3
## Loading required package: timeSeries
summary(data)
## x1 x2 x3 x4
## Min. : 4.0 Min. : 4.0 Min. : 4.0 Min. : 8
## 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 8
## Median : 9.0 Median : 9.0 Median : 9.0 Median : 8
## Mean : 9.0 Mean : 9.0 Mean : 9.0 Mean : 9
## 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.: 8
## Max. :14.0 Max. :14.0 Max. :14.0 Max. :19
## y1 y2 y3 y4
## Min. : 4.260 Min. :3.100 Min. : 5.39 Min. : 5.250
## 1st Qu.: 6.315 1st Qu.:6.695 1st Qu.: 6.25 1st Qu.: 6.170
## Median : 7.580 Median :8.140 Median : 7.11 Median : 7.040
## Mean : 7.501 Mean :7.501 Mean : 7.50 Mean : 7.501
## 3rd Qu.: 8.570 3rd Qu.:8.950 3rd Qu.: 7.98 3rd Qu.: 8.190
## Max. :10.840 Max. :9.260 Max. :12.74 Max. :12.500
#mean and variance for each column
mean(x1)
## [1] 9
var(x1)
## [1] 11
mean(x2)
## [1] 9
var(x2)
## [1] 11
mean(x3)
## [1] 9
var(x3)
## [1] 11
mean(x4)
## [1] 9
var(x4)
## [1] 11
mean(y1)
## [1] 7.500909
var(y1)
## [1] 4.127269
mean(y2)
## [1] 7.500909
var(y2)
## [1] 4.127629
mean(y3)
## [1] 7.5
var(y3)
## [1] 4.12262
mean(y4)
## [1] 7.500909
var(y4)
## [1] 4.123249
#correlation between x and y
correlationTest(x1,y1)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8164
## STATISTIC:
## t: 4.2415
## P VALUE:
## Alternative Two-Sided: 0.00217
## Alternative Less: 0.9989
## Alternative Greater: 0.001085
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4244, 0.9507
## Less: -1, 0.9388
## Greater: 0.5113, 1
##
## Description:
## Wed May 2 22:21:27 2018
correlationTest(x2,y2)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8162
## STATISTIC:
## t: 4.2386
## P VALUE:
## Alternative Two-Sided: 0.002179
## Alternative Less: 0.9989
## Alternative Greater: 0.001089
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4239, 0.9506
## Less: -1, 0.9387
## Greater: 0.5109, 1
##
## Description:
## Wed May 2 22:21:27 2018
correlationTest(x3,y3)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8163
## STATISTIC:
## t: 4.2394
## P VALUE:
## Alternative Two-Sided: 0.002176
## Alternative Less: 0.9989
## Alternative Greater: 0.001088
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4241, 0.9507
## Less: -1, 0.9387
## Greater: 0.511, 1
##
## Description:
## Wed May 2 22:21:27 2018
correlationTest(x4,y4)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8165
## STATISTIC:
## t: 4.243
## P VALUE:
## Alternative Two-Sided: 0.002165
## Alternative Less: 0.9989
## Alternative Greater: 0.001082
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4246, 0.9507
## Less: -1, 0.9388
## Greater: 0.5115, 1
##
## Description:
## Wed May 2 22:21:27 2018
#Create scatter plots:
plot(x1,y1, main="Scatterplot for x1 and y1")
plot(x2,y2, main="Scatterplot for x2 and y2")
plot(x3,y3, main="Scatterplot for x3 and y4")
plot(x4,y4, main="Scatterplot for x4 and y4")
#Reformat the plots:
par(mfrow=c(2,2))
plot(x1,y1, main="Scatterplot between x1,y1",pch=19)
plot(x2,y2, main="Scatterplot between x2,y2",pch=19)
plot(x3,y3, main="Scatterplot between x3,y3",pch=19)
plot(x4,y4, main="Scatterplot between x4,y4",pch=19)
lm() function.#Using liinear regression model:
fit1<-lm(y1~x1)
fit2<-lm(y2~x2)
fit3<-lm(y3~x3)
fit4<-lm(y4~x4)
# add regression line (y~x)
par(mfrow=c(2,2))
plot(x1,y1, main="Scatterplot between x1,y1",pch=19)
abline(fit1, col="red")
plot(x2,y2, main="Scatterplot between x2,y2",pch=19)
abline(fit2, col="red")
plot(x3,y3, main="Scatterplot between x3,y3",pch=19)
abline(fit3, col="red")
plot(x4,y4, main="Scatterplot between x4,y4",pch=19)
abline(fit4, col="red")
#Use ANOVA to compare each model
anova(fit1)
Analysis of Variance Table
Response: y1 Df Sum Sq Mean Sq F value Pr(>F)
x1 1 27.510 27.5100 17.99 0.00217 ** Residuals 9 13.763 1.5292
— Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1
anova(fit2)
Analysis of Variance Table
Response: y2 Df Sum Sq Mean Sq F value Pr(>F)
x2 1 27.500 27.5000 17.966 0.002179 ** Residuals 9 13.776 1.5307
— Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1
anova(fit3)
Analysis of Variance Table
Response: y3 Df Sum Sq Mean Sq F value Pr(>F)
x3 1 27.470 27.4700 17.972 0.002176 ** Residuals 9 13.756 1.5285
— Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1
anova(fit4)
Analysis of Variance Table
Response: y4 Df Sum Sq Mean Sq F value Pr(>F)
x4 1 27.490 27.4900 18.003 0.002165 ** Residuals 9 13.742 1.5269
— Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1