The objectives of this problem set are to orient you to a number of activities in R and to conduct a thoughtful exercise in appreciating the importance of data visualization. For each question, create a code chunk or text response that completes the activity or answers the question. Finally, upon completion, name your final output .html file YourName_ANLY512-Section-Year-Semester.html, publish it to your R Pubs account, and submit the link to the “Problem Set 2” assignment on Moodle. Points will be deducted for uploading in an improper format.
library(fBasics)
## Loading required package: timeDate
## Loading required package: timeSeries
Load the anscombe data that is part of the library(datasets) in R, and assign that data to a new object called data.
# Load the dataset and assign it to data
data("anscombe")
data <- anscombe
fBasics() package!)
# Table of the mean and variance of each variable
data.frame(
  Variable = c("x1","x2","x3","x4","y1","y2","y3","y4"),
  Mean = c(mean(data$x1),mean(data$x2),mean(data$x3),mean(data$x4),
           mean(data$y1),mean(data$y2),mean(data$y3),mean(data$y4)),
  Variance = c(var(data$x1),var(data$x2),var(data$x3),var(data$x4),
               var(data$y1),var(data$y2),var(data$y3),var(data$y4))
)
##   Variable     Mean  Variance
## 1       x1 9.000000 11.000000
## 2       x2 9.000000 11.000000
## 3       x3 9.000000 11.000000
## 4       x4 9.000000 11.000000
## 5       y1 7.500909  4.127269
## 6       y2 7.500909  4.127629
## 7       y3 7.500000  4.122620
## 8       y4 7.500909  4.123249
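The same table can be built without listing every column by hand. The following is a minimal sketch, assuming only base R and the built-in anscombe dataset; the name summary_tbl is illustrative and not part of the original answer.

```r
# Load the built-in Anscombe quartet
data("anscombe")

# Apply mean() and var() to every column, one row per variable
summary_tbl <- data.frame(
  Variable  = names(anscombe),
  Mean      = sapply(anscombe, mean),
  Variance  = sapply(anscombe, var),
  row.names = NULL
)

print(summary_tbl)
```

Because the variable names come from names(anscombe), a label can no longer drift out of sync with the column it describes.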
# Pearson correlation test for each (x, y) pair
fBasics::correlationTest(data$x1,data$y1,method = c("pearson"),title="x1 vs y1")
##
## Title:
## x1 vs y1
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8164
## STATISTIC:
## t: 4.2415
## P VALUE:
## Alternative Two-Sided: 0.00217
## Alternative Less: 0.9989
## Alternative Greater: 0.001085
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4244, 0.9507
## Less: -1, 0.9388
## Greater: 0.5113, 1
##
## Description:
## Sun Apr 21 21:45:51 2019
fBasics::correlationTest(data$x2,data$y2,method = c("pearson"),title="x2 vs y2")
##
## Title:
## x2 vs y2
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8162
## STATISTIC:
## t: 4.2386
## P VALUE:
## Alternative Two-Sided: 0.002179
## Alternative Less: 0.9989
## Alternative Greater: 0.001089
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4239, 0.9506
## Less: -1, 0.9387
## Greater: 0.5109, 1
##
## Description:
## Sun Apr 21 21:45:51 2019
fBasics::correlationTest(data$x3,data$y3,method = c("pearson"),title="x3 vs y3")
##
## Title:
## x3 vs y3
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8163
## STATISTIC:
## t: 4.2394
## P VALUE:
## Alternative Two-Sided: 0.002176
## Alternative Less: 0.9989
## Alternative Greater: 0.001088
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4241, 0.9507
## Less: -1, 0.9387
## Greater: 0.511, 1
##
## Description:
## Sun Apr 21 21:45:51 2019
fBasics::correlationTest(data$x4,data$y4,method = c("pearson"),title="x4 vs y4")
##
## Title:
## x4 vs y4
##
## Test Results:
## PARAMETER:
## Degrees of Freedom: 9
## SAMPLE ESTIMATES:
## Correlation: 0.8165
## STATISTIC:
## t: 4.243
## P VALUE:
## Alternative Two-Sided: 0.002165
## Alternative Less: 0.9989
## Alternative Greater: 0.001082
## CONFIDENCE INTERVAL:
## Two-Sided: 0.4246, 0.9507
## Less: -1, 0.9388
## Greater: 0.5115, 1
##
## Description:
## Sun Apr 21 21:45:51 2019
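To compare the four coefficients side by side rather than across four separate printouts, the correlations can be collected into one vector. This is a minimal sketch using base R's cor() instead of fBasics::correlationTest(); the name r is illustrative.

```r
# Load the built-in Anscombe quartet
data("anscombe")

# Pair x1 with y1, x2 with y2, ... and compute Pearson's r for each pair
r <- mapply(
  function(x, y) cor(x, y, method = "pearson"),
  anscombe[paste0("x", 1:4)],
  anscombe[paste0("y", 1:4)]
)
names(r) <- paste0("x", 1:4, "_y", 1:4)

print(round(r, 4))
```

All four values agree to about three decimal places, which matches the correlation estimates reported by the tests above.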
# Scatter plot of each (x, y) pair
plot(data$x1,data$y1,main="Scatter Plot of x1 vs y1")
plot(data$x2,data$y2,main="Scatter Plot of x2 vs y2")
plot(data$x3,data$y3,main="Scatter Plot of x3 vs y3")
plot(data$x4,data$y4,main="Scatter Plot of x4 vs y4")
par(mfrow=c(2,2))
plot(data$x1,data$y1,main="Scatter Plot of x1 vs y1", pch = 16)
plot(data$x2,data$y2,main="Scatter Plot of x2 vs y2", pch = 16)
plot(data$x3,data$y3,main="Scatter Plot of x3 vs y3", pch = 16)
plot(data$x4,data$y4,main="Scatter Plot of x4 vs y4", pch = 16)
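The four panels are easier to compare when they share axis limits, since by default each plot() call scales its axes to its own data. A minimal sketch, assuming base graphics; the names xlim, ylim and old_par are illustrative.

```r
# Load the built-in Anscombe quartet
data("anscombe")

# Common axis limits across all four x columns and all four y columns
xlim <- range(unlist(anscombe[paste0("x", 1:4)]))
ylim <- range(unlist(anscombe[paste0("y", 1:4)]))

# Draw the four scatter plots in a 2 x 2 grid, then restore the old settings
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlim = xlim, ylim = ylim, pch = 16,
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste0("Scatter Plot of x", i, " vs y", i))
}
par(old_par)
```

Saving the value returned by par() and restoring it afterwards keeps the 2 x 2 layout from leaking into later plots.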
lm() function.
# Save the regression model for each pair as an object
#x1 vs y1
lm1 <- lm(x1~y1,data=data)
#x2 vs y2
lm2 <- lm(x2~y2,data=data)
#x3 vs y3
lm3 <- lm(x3~y3,data=data)
#x4 vs y4
lm4 <- lm(x4~y4,data=data)
par(mfrow=c(2,2))
# abline() draws y = a + b*x, so each plotted line must come from a fit of y on x
plot(data$x1,data$y1,main="Scatter Plot of x1 vs y1 with regression line.", pch = 16)
abline(lm(y1~x1,data=data))
plot(data$x2,data$y2,main="Scatter Plot of x2 vs y2 with regression line.", pch = 16)
abline(lm(y2~x2,data=data))
plot(data$x3,data$y3,main="Scatter Plot of x3 vs y3 with regression line.", pch = 16)
abline(lm(y3~x3,data=data))
plot(data$x4,data$y4,main="Scatter Plot of x4 vs y4 with regression line.", pch = 16)
abline(lm(y4~x4,data=data))
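abline() takes the intercept and slope from a fitted model and treats the vertical-axis variable as the response, so a line drawn over a plot with x on the horizontal axis should come from a regression of y on x. A minimal sketch of those four fits, assuming base R; the names fits and coefs are illustrative.

```r
# Load the built-in Anscombe quartet
data("anscombe")

# Fit y_i ~ x_i for i = 1..4; reformulate() builds each formula from strings
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# One row per dataset: intercept and slope of the fitted line
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
rownames(coefs) <- paste("set", 1:4)

print(round(coefs, 3))
```

The four lines are nearly the same (intercept close to 3, slope close to 0.5) even though the scatter plots look very different, which is the point of the exercise.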
# print summary to compare models
#x1 vs y1
summary(lm1)
## 
## Call:
## lm(formula = x1 ~ y1, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6522 -1.5117 -0.2657  1.2341  3.8946 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -0.9975     2.4344  -0.410  0.69156   
## y1            1.3328     0.3142   4.241  0.00217 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.019 on 9 degrees of freedom
## Multiple R-squared:  0.6665, Adjusted R-squared:  0.6295 
## F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.00217
#x2 vs y2
summary(lm2)
## 
## Call:
## lm(formula = x2 ~ y2, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8516 -1.4315 -0.3440  0.8467  4.2017 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -0.9948     2.4354  -0.408  0.69246   
## y2            1.3325     0.3144   4.239  0.00218 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.02 on 9 degrees of freedom
## Multiple R-squared:  0.6662, Adjusted R-squared:  0.6292 
## F-statistic: 17.97 on 1 and 9 DF,  p-value: 0.002179
#x3 vs y3
summary(lm3)
## 
## Call:
## lm(formula = x3 ~ y3, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9869 -1.3733 -0.0266  1.3200  3.2133 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -1.0003     2.4362  -0.411  0.69097   
## y3            1.3334     0.3145   4.239  0.00218 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.019 on 9 degrees of freedom
## Multiple R-squared:  0.6663, Adjusted R-squared:  0.6292 
## F-statistic: 17.97 on 1 and 9 DF,  p-value: 0.002176
#x4 vs y4
summary(lm4)
## 
## Call:
## lm(formula = x4 ~ y4, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7859 -1.4122 -0.1853  1.4551  3.3329 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -1.0036     2.4349  -0.412  0.68985   
## y4            1.3337     0.3143   4.243  0.00216 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.018 on 9 degrees of freedom
## Multiple R-squared:  0.6667, Adjusted R-squared:  0.6297 
## F-statistic: 18 on 1 and 9 DF,  p-value: 0.002165
The R-squared values of the four models are nearly identical (about 0.67), so each model explains roughly two thirds of the variance: a moderate fit rather than a strong one.
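The R-squared comparison can also be made in one step instead of reading it off four summaries. A minimal sketch, assuming the same x-on-y models as lm1 to lm4 are refit here so the block is self-contained; the names r2 and r are illustrative.

```r
# Load the built-in Anscombe quartet
data("anscombe")

# Refit x_i ~ y_i for i = 1..4 and extract R-squared from each summary
r2 <- sapply(1:4, function(i) {
  fit <- lm(reformulate(paste0("y", i), response = paste0("x", i)),
            data = anscombe)
  summary(fit)$r.squared
})
names(r2) <- paste0("lm", 1:4)
print(round(r2, 4))

# In simple linear regression, R-squared equals the squared Pearson correlation
r <- mapply(cor, anscombe[paste0("x", 1:4)], anscombe[paste0("y", 1:4)])
print(all.equal(unname(r2), unname(r^2)))
```

Since R-squared here is just the squared correlation, it cannot distinguish the four datasets any better than the correlation tests did; only the plots reveal how different they are.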