These commands help visualize relationships between two quantitative variables, calculate correlation coefficients, and model relationships using least-squares regression. Some commands require the mosaic package (noted in comments), so install it first with install.packages("mosaic").
library(mosaic)
We will use the KidsFeet data from the mosaicData package.
# Foot length and width (cm) for 39 children
data(KidsFeet)
head(KidsFeet)  # show the first six rows
## name birthmonth birthyear length width sex biggerfoot domhand
## 1 David 5 88 24.4 8.4 B L R
## 2 Lars 10 87 25.4 8.8 B L L
## 3 Zach 12 87 24.5 9.7 B R R
## 4 Josh 1 88 25.2 9.8 B L R
## 5 Lang 2 88 25.1 8.9 B L R
## 6 Scotty 3 88 25.7 9.7 B R R
Make a figure of width as a function of length.
ggplot(KidsFeet, aes(x=length, y=width)) +
  geom_point(col="darkblue")
Add a least-squares regression line, method one: geom_lm().
ggplot(KidsFeet, aes(x=length, y=width)) +
  geom_point(col="darkblue") +
  geom_lm(col="red")  # available after library(mosaic)
Add a least-squares regression line, method two: geom_smooth() from ggplot2, which by default also shades a 95% confidence band around the line.
ggplot(KidsFeet, aes(x=length, y=width)) +
  geom_point(col="darkblue") +
  geom_smooth(method="lm", col="red")
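The shaded band is a pointwise confidence interval for the mean width at each length. A minimal sketch of computing such intervals directly with predict(), assuming mosaicData is installed; the names fit, new_feet, and band, and the three lengths, are illustrative choices, not part of the original tutorial.

```r
library(mosaicData)  # provides KidsFeet

# 'fit' is an illustrative name; later in this tutorial the same model is called feet.lm
fit <- lm(width ~ length, data = KidsFeet)

# Hypothetical foot lengths (cm), chosen only for illustration
new_feet <- data.frame(length = c(22, 24, 26))

# 95% confidence interval for the mean width at each length:
# columns are fit (point on the line), lwr and upr (band limits)
band <- predict(fit, newdata = new_feet, interval = "confidence", level = 0.95)
band
```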
Examine the correlation coefficient between length and width.
# formula syntax: requires mosaic
cor(width~length, data=KidsFeet)
## [1] 0.6410961
# base R syntax: works with or without mosaic
cor(KidsFeet$length, KidsFeet$width)
## [1] 0.6410961
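To see what cor() is computing, here is a minimal sketch of Pearson's correlation written out by hand: convert each variable to z-scores, multiply them pairwise, sum, and divide by n - 1. It assumes mosaicData is installed; x, y, zx, zy, and r_manual are illustrative names.

```r
library(mosaicData)  # provides KidsFeet

x <- KidsFeet$length
y <- KidsFeet$width
n <- length(x)

# Standardize each variable (subtract the mean, divide by the SD)
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Pearson's r: sum of products of z-scores divided by n - 1
r_manual <- sum(zx * zy) / (n - 1)

r_manual
cor(x, y)  # should agree with the hand calculation
```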
Run a linear regression model with lm() and view the coefficient point estimates.
# lm() fits a linear model for a quantitative response variable
# if the predictor is a categorical factor, this is ANOVA
# if the predictor is quantitative, this is regression
feet.lm <- lm(width~length, data=KidsFeet)
# lm() only fits the model; printing it shows the coefficients, and anova(feet.lm) gives an ANOVA table
feet.lm
##
## Call:
## lm(formula = width ~ length, data = KidsFeet)
##
## Coefficients:
## (Intercept) length
## 2.8623 0.2479
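The least-squares slope and intercept have closed forms: the slope is \(b_1 = r \, s_y / s_x\) and the intercept is \(b_0 = \bar{y} - b_1 \bar{x}\). A minimal sketch checking these against lm(), assuming mosaicData is installed; b0 and b1 are illustrative names.

```r
library(mosaicData)  # provides KidsFeet

x <- KidsFeet$length
y <- KidsFeet$width

# Slope: correlation times the ratio of standard deviations
b1 <- cor(x, y) * sd(y) / sd(x)
# Intercept: the line passes through the point of means (xbar, ybar)
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)
coef(lm(width ~ length, data = KidsFeet))  # same numbers from lm()
```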
The summary() command provides both the \(R^2\) value (“Multiple R-squared”) and \(t\)-tests of the null hypotheses that the intercept \(\beta_0 = 0\) and that the slope \(\beta_1 = 0\):
summary(feet.lm)
##
## Call:
## lm(formula = width ~ length, data = KidsFeet)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.83864 -0.31056 -0.00892 0.27622 0.76300
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8623 1.2081 2.369 0.0232 *
## length 0.2480 0.0488 5.081 1.1e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3963 on 37 degrees of freedom
## Multiple R-squared: 0.411, Adjusted R-squared: 0.3951
## F-statistic: 25.82 on 1 and 37 DF, p-value: 1.097e-05
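Several numbers in this table can be reproduced from one another. A minimal sketch, assuming mosaicData is installed (fit, coefs, df_res, t_manual, and p_manual are illustrative names): each \(t\) value is the estimate divided by its standard error, its two-sided p-value comes from a \(t\) distribution with \(n - 2\) degrees of freedom, and with a single predictor \(R^2 = r^2\) and the overall \(F\) statistic equals the slope's \(t\) squared.

```r
library(mosaicData)  # provides KidsFeet

fit <- lm(width ~ length, data = KidsFeet)  # illustrative name; same model as feet.lm
coefs <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
df_res <- df.residual(fit)   # n - 2 residual degrees of freedom

# t statistic = estimate / standard error
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
# two-sided p-value from a t distribution with df_res degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = df_res)
cbind(t_manual, p_manual)

# With one predictor, Multiple R-squared equals the squared correlation
c(r_squared = summary(fit)$r.squared,
  r_from_cor_squared = cor(KidsFeet$length, KidsFeet$width)^2)

# The F statistic in the ANOVA table equals the slope's t statistic squared
anova(fit)
t_manual["length"]^2
```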
Get 95% confidence intervals for the coefficients using confint():
confint(feet.lm, level=0.95)
## 2.5 % 97.5 %
## (Intercept) 0.4144776 5.3100746
## length 0.1490758 0.3468197
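Each interval is the estimate plus or minus a critical value times its standard error, \(b_j \pm t^*_{0.975,\,n-2}\,\mathrm{SE}(b_j)\). A minimal sketch reproducing confint() by hand, assuming mosaicData is installed; fit, est, se, t_crit, and ci_manual are illustrative names.

```r
library(mosaicData)  # provides KidsFeet

fit <- lm(width ~ length, data = KidsFeet)  # illustrative name; same model as feet.lm
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))                # standard errors of the coefficients
t_crit <- qt(0.975, df = df.residual(fit))  # critical value for a 95% interval

# estimate +/- t* x SE, one row per coefficient
ci_manual <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci_manual
confint(fit, level = 0.95)  # should match
```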
Find residual values for all the data points using residuals(), then make a histogram, a normal q-q plot, and a plot of residuals against length (look for curvature or changing spread around the dashed zero line).
KidsFeet$res.width <- residuals(feet.lm)
histogram(~res.width, data=KidsFeet)  # lattice histogram, available after library(mosaic)
xqqmath(~res.width, data=KidsFeet)    # mosaic q-q plot; the formula form avoids the 'data' warning
ggplot(KidsFeet, aes(x=length, y=res.width)) +
  geom_point() +
  geom_hline(yintercept=0, lty="dashed")
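A residual is the observed response minus the fitted value, \(e_i = y_i - \hat{y}_i\). A minimal sketch confirming that residuals() returns exactly this, assuming mosaicData is installed; fit and res_manual are illustrative names. Because the model includes an intercept, the least-squares residuals also sum to zero (up to rounding error).

```r
library(mosaicData)  # provides KidsFeet

fit <- lm(width ~ length, data = KidsFeet)  # illustrative name; same model as feet.lm

# observed minus fitted
res_manual <- KidsFeet$width - fitted(fit)

head(cbind(observed = KidsFeet$width, fitted = fitted(fit), residual = res_manual))

# With an intercept in the model, residuals sum to (numerically) zero
sum(res_manual)
```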
plot() on a fitted lm object produces four diagnostic plots.
# the first two are residuals vs. fitted values and a normal q-q plot of the standardized residuals
plot(feet.lm)
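To see all four plots at once, and the quantities behind the later panels, here is a minimal sketch using base R graphics and stats functions, assuming mosaicData is installed; fit, op, and diag_df are illustrative names. The scale-location plot uses standardized residuals, and the residuals-vs-leverage plot uses hat values with Cook's distance contours.

```r
library(mosaicData)  # provides KidsFeet

fit <- lm(width ~ length, data = KidsFeet)  # illustrative name; same model as feet.lm

# Show the four default diagnostic plots in one 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous layout
# plot(fit, which = 1) would draw only residuals vs. fitted values

# Per-observation quantities used by the diagnostic plots
diag_df <- data.frame(
  std_resid = rstandard(fit),      # standardized residuals
  leverage  = hatvalues(fit),      # diagonal of the hat matrix
  cooks_d   = cooks.distance(fit)  # Cook's distance
)
head(diag_df)

# Leverages sum to the number of coefficients (2 here: intercept and slope)
sum(diag_df$leverage)
```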