The objective of this problem set is to orient you to a number of activities in R and to conduct a thoughtful exercise in appreciating the importance of data visualization. For each question, create a code chunk or text response that completes or answers the activity or question requested. Finally, upon completion, name your final output .html file as YourName_ANLY512-Section-Year-Semester.html and upload it to the “Problem Set 2” assignment on Moodle.
Load the anscombe data that is part of library(datasets) in R, and assign that data to a new object called data.
library(datasets)
data <- anscombe
Summarize each column and each (x, y) pair using the fBasics package.
library(fBasics)
## Warning: package 'fBasics' was built under R version 3.4.2
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.2
## Loading required package: timeSeries
## Warning: package 'timeSeries' was built under R version 3.4.2
colAvgs(data)
## x1 x2 x3 x4 y1 y2 y3 y4
## 9.000000 9.000000 9.000000 9.000000 7.500909 7.500909 7.500000 7.500909
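The column means can be cross-checked in base R, which also makes it easy to add the column variances. This is a minimal sketch, not the method the assignment requires; the names `col_means`, `col_vars`, and `summary_table` are illustrative and do not appear in the original.

```r
# Base-R cross-check of the column summaries (no extra packages needed).
library(datasets)

data <- anscombe

col_means <- sapply(data, mean)  # one mean per column
col_vars  <- sapply(data, var)   # one sample variance per column

# Combine into a single table: one row per column of the data set.
summary_table <- cbind(mean = col_means, variance = col_vars)
print(round(summary_table, 4))
```

The means printed here should agree with the colAvgs(data) output above.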
correlationTest(data$x1,data$y1)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8164
## STATISTIC:
## t: 4.2415
## P VALUE:
## Alternative Two-Sided: 0.00217
## Alternative Less: 0.9989
## Alternative Greater: 0.001085
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4244, 0.9507
## Less: -1, 0.9388
## Greater: 0.5113, 1
##
## Description:
## Mon Nov 20 23:52:07 2017
correlationTest(data$x2,data$y2)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8162
## STATISTIC:
## t: 4.2386
## P VALUE:
## Alternative Two-Sided: 0.002179
## Alternative Less: 0.9989
## Alternative Greater: 0.001089
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4239, 0.9506
## Less: -1, 0.9387
## Greater: 0.5109, 1
##
## Description:
## Mon Nov 20 23:52:07 2017
correlationTest(data$x3,data$y3)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8163
## STATISTIC:
## t: 4.2394
## P VALUE:
## Alternative Two-Sided: 0.002176
## Alternative Less: 0.9989
## Alternative Greater: 0.001088
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4241, 0.9507
## Less: -1, 0.9387
## Greater: 0.511, 1
##
## Description:
## Mon Nov 20 23:52:07 2017
correlationTest(data$x4,data$y4)
##
## Title:
## Pearson's Correlation Test
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8165
## STATISTIC:
## t: 4.243
## P VALUE:
## Alternative Two-Sided: 0.002165
## Alternative Less: 0.9989
## Alternative Greater: 0.001082
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4246, 0.9507
## Less: -1, 0.9388
## Greater: 0.5115, 1
##
## Description:
## Mon Nov 20 23:52:07 2017
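The four correlation tests above can also be condensed with base R. The sketch below is one possible shortcut, assuming only the built-in `cor()` and `cor.test()` functions; the name `pair_cor` is illustrative and not part of the original.

```r
library(datasets)

data <- anscombe

# Pearson correlation for each (x_i, y_i) pair, i = 1..4.
pair_cor <- sapply(1:4, function(i) {
  cor(data[[paste0("x", i)]], data[[paste0("y", i)]])
})
names(pair_cor) <- paste0("set", 1:4)
print(round(pair_cor, 4))

# cor.test() reports the t statistic, p-value and confidence interval
# for a single pair, here the first one.
print(cor.test(data$x1, data$y1))
```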
plot(data$x1,data$y1)
plot(data$x2,data$y2)
plot(data$x3,data$y3)
plot(data$x4,data$y4)
par(mfrow= c(2,2))
plot(data$x1, data$y1, type = 'p', col = 'red', pch=16)
plot(data$x2, data$y2, type = 'p', col = 'red', pch=16)
plot(data$x3, data$y3, type = 'p', col = 'red', pch=16)
plot(data$x4, data$y4, type = 'p', col = 'red', pch=16)
Fit a linear model to each data set using the lm() function.
lm(data$y1 ~ data$x1)
##
## Call:
## lm(formula = data$y1 ~ data$x1)
##
## Coefficients:
## (Intercept) data$x1
## 3.0001 0.5001
lm(data$y2 ~ data$x2)
##
## Call:
## lm(formula = data$y2 ~ data$x2)
##
## Coefficients:
## (Intercept) data$x2
## 3.001 0.500
lm(data$y3 ~ data$x3)
##
## Call:
## lm(formula = data$y3 ~ data$x3)
##
## Coefficients:
## (Intercept) data$x3
## 3.0025 0.4997
lm(data$y4 ~ data$x4)
##
## Call:
## lm(formula = data$y4 ~ data$x4)
##
## Coefficients:
## (Intercept) data$x4
## 3.0017 0.4999
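To compare the four sets of coefficients side by side, the fits can be collected in a list. This is a sketch under the assumption that passing the data frame through `data =` is acceptable for the assignment; `fits` and `coef_table` are illustrative names not used in the original.

```r
library(datasets)

data <- anscombe

# Fit y_i ~ x_i for i = 1..4, passing the data frame via `data =`
# so coefficient names are plain "x1", "x2", ... rather than "data$x1".
fits <- lapply(1:4, function(i) {
  f <- reformulate(paste0("x", i), response = paste0("y", i))
  lm(f, data = data)
})

# One row per model: intercept and slope.
coef_table <- t(sapply(fits, function(fit) unname(coef(fit))))
dimnames(coef_table) <- list(paste0("model", 1:4), c("intercept", "slope"))
print(round(coef_table, 4))
```

The rows should match the coefficients printed by the individual lm() calls above.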
summary(lm(data$y1 ~ data$x1))
##
## Call:
## lm(formula = data$y1 ~ data$x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.92127 -0.45577 -0.04136 0.70941 1.83882
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0001 1.1247 2.667 0.02573 *
## data$x1 0.5001 0.1179 4.241 0.00217 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
## F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
summary(lm(data$y2 ~ data$x2))
##
## Call:
## lm(formula = data$y2 ~ data$x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9009 -0.7609 0.1291 0.9491 1.2691
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.001 1.125 2.667 0.02576 *
## data$x2 0.500 0.118 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared: 0.6662, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002179
summary(lm(data$y3 ~ data$x3))
##
## Call:
## lm(formula = data$y3 ~ data$x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1586 -0.6146 -0.2303 0.1540 3.2411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0025 1.1245 2.670 0.02562 *
## data$x3 0.4997 0.1179 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared: 0.6663, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002176
summary(lm(data$y4 ~ data$x4))
##
## Call:
## lm(formula = data$y4 ~ data$x4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.751 -0.831 0.000 0.809 1.839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0017 1.1239 2.671 0.02559 *
## data$x4 0.4999 0.1178 4.243 0.00216 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared: 0.6667, Adjusted R-squared: 0.6297
## F-statistic: 18 on 1 and 9 DF, p-value: 0.002165
par(mfrow= c(2,2))
plot(data$x1, data$y1, type = 'p', col = 'red', pch=16)
abline(lm(data$y1 ~ data$x1))
plot(data$x2, data$y2, type = 'p', col = 'red', pch=16)
abline(lm(data$y2 ~ data$x2))
plot(data$x3, data$y3, type = 'p', col = 'red', pch=16)
abline(lm(data$y3 ~ data$x3))
plot(data$x4, data$y4, type = 'p', col = 'red', pch=16)
abline(lm(data$y4 ~ data$x4))
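The eight plotting calls above repeat the same pattern, so they can be wrapped in a loop. The following is a sketch only; the helper `plot_quartet()` and the panel titles are hypothetical additions, and the colour and point style simply mirror the calls above.

```r
library(datasets)

data <- anscombe

# Draw the four scatter plots with their fitted lines in a 2 x 2 grid.
plot_quartet <- function(df) {
  old_par <- par(mfrow = c(2, 2))  # remember the previous layout
  on.exit(par(old_par))            # restore it when the function exits
  for (i in 1:4) {
    x <- df[[paste0("x", i)]]
    y <- df[[paste0("y", i)]]
    plot(x, y, col = "red", pch = 16,
         xlab = paste0("x", i), ylab = paste0("y", i),
         main = paste("Data set", i))
    abline(lm(y ~ x))  # least-squares line for this panel
  }
  invisible(NULL)
}

plot_quartet(data)
```

Restoring the previous `par()` settings on exit keeps later plots in the same session from inheriting the 2 x 2 layout.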
anova(lm(data$y1 ~ data$x1))
## Analysis of Variance Table
##
## Response: data$y1
##           Df Sum Sq Mean Sq F value  Pr(>F)
## data$x1    1 27.510 27.5100   17.99 0.00217 **
## Residuals  9 13.763  1.5292
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(data$y2 ~ data$x2))
## Analysis of Variance Table
##
## Response: data$y2
##           Df Sum Sq Mean Sq F value   Pr(>F)
## data$x2    1 27.500 27.5000  17.966 0.002179 **
## Residuals  9 13.776  1.5307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(data$y3 ~ data$x3))
## Analysis of Variance Table
##
## Response: data$y3
##           Df Sum Sq Mean Sq F value   Pr(>F)
## data$x3    1 27.470 27.4700  17.972 0.002176 **
## Residuals  9 13.756  1.5285
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(data$y4 ~ data$x4))
## Analysis of Variance Table
##
## Response: data$y4
##           Df Sum Sq Mean Sq F value   Pr(>F)
## data$x4    1 27.490 27.4900  18.003 0.002165 **
## Residuals  9 13.742  1.5269
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
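Anscombe’s Quartet is a group of four datasets that appear to be similar when using typical summary statistics, yet tell four different stories when graphed. It displays the importance of data visualization: the column means, the correlations (about 0.816), and the fitted regression lines (intercept about 3.00, slope about 0.50) are nearly identical across the four data sets, so only the plots reveal how differently the data are structured.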
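One way to see this numerically is to put a few fit statistics for the four models in one table. The sketch below is illustrative rather than part of the assignment; `fit_stats` and its column names are assumptions introduced here. R-squared and the residual standard error come out nearly equal, while the largest absolute residual differs noticeably (data set 3 has a maximum residual of about 3.24 in the summary output above).

```r
library(datasets)

data <- anscombe

# For each data set, fit y_i ~ x_i and record three fit statistics.
fit_stats <- t(sapply(1:4, function(i) {
  fit <- lm(reformulate(paste0("x", i), response = paste0("y", i)),
            data = data)
  s <- summary(fit)
  c(r_squared     = s$r.squared,               # share of variance explained
    resid_se      = s$sigma,                   # residual standard error
    max_abs_resid = max(abs(residuals(fit))))  # largest miss of the line
}))
rownames(fit_stats) <- paste0("set", 1:4)
print(round(fit_stats, 3))
```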