Students will learn to create scatterplots and describe the relationship between two numeric variables.
# Body weight and backpack weight (lb) for eight people
body_wgt <- c(120, 187, 109, 103, 131, 165, 158, 116)
backpack_wgt <- c(26, 30, 26, 24, 29, 35, 31, 28)
backpack_df <- data.frame(body_wgt, backpack_wgt)
Which variable should be the response and which should be the explanatory variable?
## YOUR ANSWER HERE ##
library(tidyverse)
ggplot(backpack_df, aes(x = body_wgt, y = backpack_wgt)) +
  geom_point() +
  xlab("Body Weight (lb)") +
  ylab("Backpack Weight (lb)") +
  ggtitle("Scatterplot of Backpack Weight vs Body Weight")
When looking at a scatterplot, you want to be able to describe the overall pattern and identify any striking departures from that pattern.
You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship.
How would you describe the above scatterplot?
### YOUR ANSWER HERE ###
cor(body_wgt, backpack_wgt)
## [1] 0.7946927
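To see where this number comes from, the sketch below rebuilds the correlation by hand: standardize each variable, multiply the paired z-scores, and divide the sum by n - 1. The names `z_body`, `z_pack`, and `r_by_hand` are made up for this illustration and are not part of the lab.

```r
# Same eight observations used above
body_wgt <- c(120, 187, 109, 103, 131, 165, 158, 116)
backpack_wgt <- c(26, 30, 26, 24, 29, 35, 31, 28)

n <- length(body_wgt)

# Standardize each variable (z-scores use the sample sd)
z_body <- (body_wgt - mean(body_wgt)) / sd(body_wgt)
z_pack <- (backpack_wgt - mean(backpack_wgt)) / sd(backpack_wgt)

# Correlation = sum of products of paired z-scores, divided by n - 1
r_by_hand <- sum(z_body * z_pack) / (n - 1)
r_by_hand

# Should agree with the built-in function
all.equal(r_by_hand, cor(body_wgt, backpack_wgt))
```

Because the z-scores have no units, the correlation has no units either.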
What happens when you switch the order of the variables?
## YOU TRY IT
First: Load in the data
data("anscombe")
str(anscombe)
## 'data.frame': 11 obs. of 8 variables:
## $ x1: num 10 8 13 9 11 14 6 4 12 7 ...
## $ x2: num 10 8 13 9 11 14 6 4 12 7 ...
## $ x3: num 10 8 13 9 11 14 6 4 12 7 ...
## $ x4: num 8 8 8 8 8 8 8 19 8 8 ...
## $ y1: num 8.04 6.95 7.58 8.81 8.33 ...
## $ y2: num 9.14 8.14 8.74 8.77 9.26 8.1 6.13 3.1 9.13 7.26 ...
## $ y3: num 7.46 6.77 12.74 7.11 7.81 ...
## $ y4: num 6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.5 5.56 7.91 ...
Directions:
Choose your pair of variables based on your birthday:
x1 and y1
x2 and y2
x3 and y3
x4 and y4
Complete the following tasks for your x and y variables:
cor(anscombe$x2, anscombe$y2)
## [1] 0.8162365
mean(anscombe$x2)
## [1] 9
sd(anscombe$x2)
## [1] 3.316625
mean(anscombe$y2)
## [1] 7.500909
sd(anscombe$y2)
## [1] 2.031657
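The separate calls above can also be collected into one small table, which makes it easier to rerun the same summary for whichever pair you were assigned. This is only a sketch: the names `pair`, `stats`, and `r_pair` are hypothetical, and `x2`/`y2` is used just to match the example output above.

```r
data("anscombe")

# Swap in your own pair here, e.g. c("x1", "y1")
pair <- c("x2", "y2")

# Mean and standard deviation of each variable in the pair
stats <- sapply(anscombe[pair], function(v) c(mean = mean(v), sd = sd(v)))
stats

# Correlation between the two variables in the pair
r_pair <- cor(anscombe[[pair[1]]], anscombe[[pair[2]]])
r_pair
```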
## SPACE FOR YOUR WORK ##
ggplot(anscombe, aes(x2, y2)) +
  geom_point()
lm(y2~x2, data=anscombe)
##
## Call:
## lm(formula = y2 ~ x2, data = anscombe)
##
## Coefficients:
## (Intercept) x2
## 3.001 0.500
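The two coefficients define a line: predicted y2 = intercept + slope * x2. As a rough illustration, the sketch below computes one prediction by hand and checks it against `predict()`. The value `x_new <- 10` and the names `fit2`, `by_hand`, and `by_predict` are assumptions chosen for this example.

```r
data("anscombe")

# Refit the same model shown above
fit2 <- lm(y2 ~ x2, data = anscombe)
b <- coef(fit2)

# Hypothetical x value at which to predict
x_new <- 10

# Prediction by hand: intercept + slope * x
by_hand <- b[["(Intercept)"]] + b[["x2"]] * x_new

# Same prediction using predict(); newdata must use the model's variable name
by_predict <- predict(fit2, newdata = data.frame(x2 = x_new))

by_hand
all.equal(by_hand, unname(by_predict))
```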
ggplot(backpack_df, aes(x = body_wgt, y = backpack_wgt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red", lty = 2) +
  xlab("Body Weight (lb)") +
  ylab("Backpack Weight (lb)") +
  ggtitle("Scatterplot of Backpack Weight vs Body Weight")
## `geom_smooth()` using formula 'y ~ x'
What is this line?
# Formula syntax is Y ~ X: response on the left, explanatory variable on the right
mod<-lm(backpack_wgt~body_wgt, data=backpack_df)
summary(mod)
##
## Call:
## lm(formula = backpack_wgt ~ body_wgt, data = backpack_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2444 -1.2750 0.1133 0.9308 3.7532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.26493 3.93692 4.131 0.00614 **
## body_wgt 0.09080 0.02831 3.207 0.01844 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.27 on 6 degrees of freedom
## Multiple R-squared: 0.6315, Adjusted R-squared: 0.5701
## F-statistic: 10.28 on 1 and 6 DF, p-value: 0.01844
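The estimates in this output are tied to the summary statistics from earlier in the lab. A minimal sketch, assuming the usual least-squares formulas for simple linear regression (slope = r * sd(y) / sd(x), and a line that passes through the point of means), is given below. The names `slope_by_hand` and `intercept_by_hand` are made up for this illustration.

```r
# Same eight observations and model used above
body_wgt <- c(120, 187, 109, 103, 131, 165, 158, 116)
backpack_wgt <- c(26, 30, 26, 24, 29, 35, 31, 28)
backpack_df <- data.frame(body_wgt, backpack_wgt)
mod <- lm(backpack_wgt ~ body_wgt, data = backpack_df)

r <- cor(body_wgt, backpack_wgt)

# Slope: correlation scaled by the ratio of standard deviations (y over x)
slope_by_hand <- r * sd(backpack_wgt) / sd(body_wgt)

# Intercept: chosen so the line passes through (mean of x, mean of y)
intercept_by_hand <- mean(backpack_wgt) - slope_by_hand * mean(body_wgt)

c(intercept = intercept_by_hand, slope = slope_by_hand)
coef(mod)

# With one explanatory variable, Multiple R-squared is the squared correlation
r^2
summary(mod)$r.squared
```

The hand-computed values should agree with the Estimate column and the Multiple R-squared line in the summary output.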
Residual Plot:
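# Attach each observation's residual to the original data
res_df <- backpack_df %>%
  cbind(res = mod$residuals)
# Plot residuals against the explanatory variable, with a reference line at 0
ggplot(res_df, aes(body_wgt, res)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red", lwd = 1, lty = 2) +
  ggtitle("Residual Plot")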
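Each point in the residual plot is an observed backpack weight minus the weight the fitted line predicts for that body weight. The sketch below recomputes the residuals that way and compares them with the values stored in the model. The name `res_by_hand` is hypothetical.

```r
# Same eight observations and model used above
body_wgt <- c(120, 187, 109, 103, 131, 165, 158, 116)
backpack_wgt <- c(26, 30, 26, 24, 29, 35, 31, 28)
backpack_df <- data.frame(body_wgt, backpack_wgt)
mod <- lm(backpack_wgt ~ body_wgt, data = backpack_df)

# Residual = observed response - fitted (predicted) response
res_by_hand <- backpack_df$backpack_wgt - fitted(mod)

# Should match the residuals stored in the model object
all.equal(unname(res_by_hand), unname(residuals(mod)))

# Least-squares residuals from a model with an intercept sum to (nearly) zero
sum(res_by_hand)
```

A positive residual means the point sits above the line, and a negative residual means it sits below.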
QQ Plot:
qqnorm(mod$residuals)
qqline(mod$residuals)
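The same QQ plot can also be drawn with ggplot2 so that it matches the style of the other plots in this lab. This is one possible sketch using `stat_qq()` and `stat_qq_line()`. The object names `qq_df` and `qq_plot`, the axis labels, and the title are choices made for this example.

```r
library(ggplot2)

# Same eight observations and model used above
body_wgt <- c(120, 187, 109, 103, 131, 165, 158, 116)
backpack_wgt <- c(26, 30, 26, 24, 29, 35, 31, 28)
backpack_df <- data.frame(body_wgt, backpack_wgt)
mod <- lm(backpack_wgt ~ body_wgt, data = backpack_df)

# ggplot2 needs the residuals in a data frame
qq_df <- data.frame(res = residuals(mod))

# The sample aesthetic tells stat_qq() which values to compare to the normal
qq_plot <- ggplot(qq_df, aes(sample = res)) +
  stat_qq() +
  stat_qq_line(color = "red", linetype = 2) +
  xlab("Theoretical Quantiles") +
  ylab("Sample Quantiles") +
  ggtitle("Normal Q-Q Plot of Residuals")

qq_plot
```

Points that fall close to the dashed line suggest the residuals are roughly normal. With only eight observations, some wobble around the line is expected.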