Load the anscombe data that is part of library(datasets) in R, and assign it to a new object called data.
data<-anscombe
x1<-data[,1]
x2<-data[,2]
x3<-data[,3]
x4<-data[,4]
y1<-data[,5]
y2<-data[,6]
y3<-data[,7]
y4<-data[,8]
Summarize each column; the correlation tests further down use the fBasics package.
summary(data)
##        x1             x2             x3             x4
##  Min.   : 4.0   Min.   : 4.0   Min.   : 4.0   Min.   : 8
##  1st Qu.: 6.5   1st Qu.: 6.5   1st Qu.: 6.5   1st Qu.: 8
##  Median : 9.0   Median : 9.0   Median : 9.0   Median : 8
##  Mean   : 9.0   Mean   : 9.0   Mean   : 9.0   Mean   : 9
##  3rd Qu.:11.5   3rd Qu.:11.5   3rd Qu.:11.5   3rd Qu.: 8
##  Max.   :14.0   Max.   :14.0   Max.   :14.0   Max.   :19
##        y1               y2              y3              y4
##  Min.   : 4.260   Min.   :3.100   Min.   : 5.39   Min.   : 5.250
##  1st Qu.: 6.315   1st Qu.:6.695   1st Qu.: 6.25   1st Qu.: 6.170
##  Median : 7.580   Median :8.140   Median : 7.11   Median : 7.040
##  Mean   : 7.501   Mean   :7.501   Mean   : 7.50   Mean   : 7.501
##  3rd Qu.: 8.570   3rd Qu.:8.950   3rd Qu.: 7.98   3rd Qu.: 8.190
##  Max.   :10.840   Max.   :9.260   Max.   :12.74   Max.   :12.500
var(x1)
## [1] 11
var(y1)
## [1] 4.127269
var(x2)
## [1] 11
var(y2)
## [1] 4.127629
var(x3)
## [1] 11
var(y3)
## [1] 4.12262
var(x4)
## [1] 11
var(y4)
## [1] 4.123249
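The eight var() calls above can also be collapsed into a single table. The following is a minimal sketch, assuming only the built-in anscombe data frame; the object name stats is illustrative and not part of the original analysis.

```r
# Apply mean() and var() to every column of the built-in anscombe data frame
# and stack the two results into one matrix (rows: statistic, columns: variable).
stats <- rbind(mean = sapply(anscombe, mean),
               var  = sapply(anscombe, var))

# Rounded for display; the x columns share one variance and the
# y columns agree to about two decimal places.
print(round(stats, 3))
```

Seeing all columns side by side makes it easier to notice how closely the four data sets agree on these two statistics.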
library(fBasics)
## Warning: package 'fBasics' was built under R version 3.4.2
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.2
## Loading required package: timeSeries
## Warning: package 'timeSeries' was built under R version 3.4.2
correlationTest(x1,y1)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8164
## STATISTIC:
## t: 4.2415
## P VALUE:
## Alternative Two-Sided: 0.00217
## Alternative Less: 0.9989
## Alternative Greater: 0.001085
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4244, 0.9507
## Less: -1, 0.9388
## Greater: 0.5113, 1
##
## Description:
## Sun Apr 8 16:33:17 2018
correlationTest(x2,y2)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8162
## STATISTIC:
## t: 4.2386
## P VALUE:
## Alternative Two-Sided: 0.002179
## Alternative Less: 0.9989
## Alternative Greater: 0.001089
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4239, 0.9506
## Less: -1, 0.9387
## Greater: 0.5109, 1
##
## Description:
## Sun Apr 8 16:33:17 2018
correlationTest(x3,y3)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8163
## STATISTIC:
## t: 4.2394
## P VALUE:
## Alternative Two-Sided: 0.002176
## Alternative Less: 0.9989
## Alternative Greater: 0.001088
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4241, 0.9507
## Less: -1, 0.9387
## Greater: 0.511, 1
##
## Description:
## Sun Apr 8 16:33:18 2018
correlationTest(x4,y4)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8165
## STATISTIC:
## t: 4.243
## P VALUE:
## Alternative Two-Sided: 0.002165
## Alternative Less: 0.9989
## Alternative Greater: 0.001082
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4246, 0.9507
## Less: -1, 0.9388
## Greater: 0.5115, 1
##
## Description:
## Sun Apr 8 16:33:18 2018
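The four correlation tests above can be condensed into one comparison table. This is a sketch that swaps in cor.test() from base R's stats package instead of fBasics::correlationTest(); the object name res and the row labels are illustrative assumptions.

```r
# Run Pearson's correlation test on each (x_i, y_i) pair of the built-in
# anscombe data and keep the estimate, t statistic and two-sided p-value.
res <- t(sapply(1:4, function(i) {
  ct <- cor.test(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  c(r = unname(ct$estimate), t = unname(ct$statistic), p = ct$p.value)
}))
rownames(res) <- paste0("set", 1:4)

# One row per data set, four significant digits for display.
print(signif(res, 4))
```

Laid out this way, the near-identical correlation, t statistic and p-value across the four pairs are visible at a glance.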
plot(x1, y1, main="Scatterplot between x1,y1", pch=8, col="red")
plot(x2, y2, main="Scatterplot between x2,y2", pch=8, col="green")
plot(x3, y3, main="Scatterplot between x3,y3", pch=8, col="orange")
plot(x4, y4, main="Scatterplot between x4,y4", pch=8, col="blue")
par(mfrow=c(2,2))
plot(x1,y1, main="Scatterplot between x1,y1",pch=19)
plot(x2,y2, main="Scatterplot between x2,y2",pch=19)
plot(x3,y3, main="Scatterplot between x3,y3",pch=19)
plot(x4,y4, main="Scatterplot between x4,y4",pch=19)
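Because each plot() call above picks its own axis limits, the panels are not drawn on the same scale. The sketch below is one possible variation, not the original author's method: it draws the same 2x2 grid in a loop with shared limits, assuming only the built-in anscombe data frame. The names xs, ys, xlim, ylim and old are illustrative.

```r
# Column names of the four x/y pairs in the built-in anscombe data frame.
xs <- paste0("x", 1:4)
ys <- paste0("y", 1:4)

# Common axis limits computed over all four data sets.
xlim <- range(unlist(anscombe[xs]))
ylim <- range(unlist(anscombe[ys]))

old <- par(mfrow = c(2, 2))          # 2x2 grid; remember the previous settings
for (i in 1:4) {
  plot(anscombe[[xs[i]]], anscombe[[ys[i]]],
       xlim = xlim, ylim = ylim, pch = 19,
       xlab = xs[i], ylab = ys[i],
       main = paste0("Scatterplot between ", xs[i], ",", ys[i]))
}
par(old)                             # restore the previous graphics settings
```

Shared limits keep the visual comparison between panels fair, and restoring par() avoids leaving the device in grid mode for later plots.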
Fit a linear model to each pair of variables using the lm() function.
fit1<-lm(y1~x1)
fit2<-lm(y2~x2)
fit3<-lm(y3~x3)
fit4<-lm(y4~x4)
par(mfrow=c(2,2))
plot(x1,y1, main="Scatterplot between x1,y1",pch=19)
abline(fit1, col="red") # regression line (y~x)
plot(x2,y2, main="Scatterplot between x2,y2",pch=19)
abline(fit2, col="green") # regression line (y~x)
plot(x3,y3, main="Scatterplot between x3,y3",pch=19)
abline(fit3, col="orange") # regression line (y~x)
plot(x4,y4, main="Scatterplot between x4,y4",pch=19)
abline(fit4, col="blue") # regression line (y~x)
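The regression lines drawn above look alike because their coefficients are nearly the same. As a minimal sketch, the fits can be rebuilt in a loop directly from the built-in anscombe data and their coefficients tabulated; the names fits and coefs and the row and column labels are illustrative assumptions.

```r
# Fit y_i ~ x_i for i = 1..4; reformulate() builds each formula from strings.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# Collect intercept and slope of every fit into a 4x2 matrix.
coefs <- t(vapply(fits, function(f) unname(coef(f)), numeric(2)))
dimnames(coefs) <- list(paste0("set", 1:4), c("intercept", "slope"))

print(round(coefs, 3))
```

All four models come out with an intercept close to 3 and a slope close to 0.5, which is why the fitted lines are almost indistinguishable even though the point clouds are not.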
anova(fit1)
## Analysis of Variance Table
##
## Response: y1
##           Df Sum Sq Mean Sq F value  Pr(>F)
## x1         1 27.510 27.5100   17.99 0.00217 **
## Residuals  9 13.763  1.5292
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(fit2)
## Analysis of Variance Table
##
## Response: y2
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x2         1 27.500 27.5000  17.966 0.002179 **
## Residuals  9 13.776  1.5307
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(fit3)
## Analysis of Variance Table
##
## Response: y3
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x3         1 27.470 27.4700  17.972 0.002176 **
## Residuals  9 13.756  1.5285
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(fit4)
## Analysis of Variance Table
##
## Response: y4
##           Df Sum Sq Mean Sq F value   Pr(>F)
## x4         1 27.490 27.4900  18.003 0.002165 **
## Residuals  9 13.742  1.5269
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
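The ANOVA tables and the correlation tests report the same information in two forms: for a simple linear regression, the share of the total sum of squares explained by x equals the squared correlation, and the F value equals the squared t statistic. A minimal check for the first data set follows, assuming only the built-in anscombe data; the names fit, tab, r2_from_anova, r and t_stat are illustrative.

```r
# Refit the first model directly from the built-in data so the block is self-contained.
fit <- lm(y1 ~ x1, data = anscombe)
tab <- anova(fit)

# R^2 = regression sum of squares / total sum of squares.
r2_from_anova <- tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])

# Pearson correlation and its t statistic for the same pair.
r      <- cor(anscombe$x1, anscombe$y1)
t_stat <- unname(cor.test(anscombe$x1, anscombe$y1)$statistic)

# Each comparison prints TRUE when the identity holds up to numerical tolerance.
print(all.equal(r2_from_anova, r^2))
print(all.equal(r2_from_anova, summary(fit)$r.squared))
print(all.equal(tab[["F value"]][1], t_stat^2))
```

This is why the four ANOVA tables agree as closely as the four correlation tests do: neither summary can distinguish the data sets.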
Anscombe’s Quartet is a classic example of why data visualization is imperative. The data contains four datasets whose simple summary statistics (means, variances, correlations, and ANOVA tables) are nearly identical. However, the graphs of the four datasets are completely different, which becomes obvious as soon as the plotting functions are run. There are many ways to visualize the data for a given audience; here we drew scatter plots and fit regression lines to see the relationships between the variables. Data visualization thus gives us exploratory tools to better understand data that may seem similar at first but differs in many ways, as our analysis shows.