These commands help visualize relationships between two quantitative variables, calculate correlation coefficients, and model relationships using least-squares regression. Some commands require the mosaic package (noted in comments), so install it before running them.

library(mosaic)

We will use the KidsFeet data from the mosaicData package, which is loaded along with mosaic.

# Foot length and width on a bunch of kids
data(KidsFeet)
head(KidsFeet)  # optional: preview the first six rows
##     name birthmonth birthyear length width sex biggerfoot domhand
## 1  David          5        88   24.4   8.4   B          L       R
## 2   Lars         10        87   25.4   8.8   B          L       L
## 3   Zach         12        87   24.5   9.7   B          R       R
## 4   Josh          1        88   25.2   9.8   B          L       R
## 5   Lang          2        88   25.1   8.9   B          L       R
## 6 Scotty          3        88   25.7   9.7   B          R       R

Visualizing the relationship between two quantitative variables using scatterplots

Make a figure of width as a function of length.

# scatterplot: length on the x-axis, width on the y-axis
ggplot(KidsFeet, aes(x=length, y=width)) +
  geom_point(col="darkblue")

Add a least squares regression line - method one.

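# scatterplot plus the fitted least-squares line (no confidence band)
ggplot(KidsFeet, aes(x=length, y=width)) +
  geom_point(col="darkblue") +
  geom_lm(col="red")  # geom_lm() is available once mosaic is loaded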

Add a least squares regression line - method two - this also draws a confidence band around the line.

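# geom_smooth() with method=lm fits the same line;
# the shaded band is a 95% confidence band by default
ggplot(KidsFeet, aes(x=length, y=width)) +
  geom_point(col="darkblue") +
  geom_smooth(method=lm, col="red")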

Calculate correlation coefficient

Examine the correlation coefficient between length and width

# this syntax depends on mosaic
cor(width~length, data=KidsFeet)
## [1] 0.6410961
# this syntax works with or without mosaic
cor(KidsFeet$length, KidsFeet$width)
## [1] 0.6410961
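
As a check on what cor() is computing, the sketch below rebuilds Pearson's correlation coefficient from the sample covariance and the two sample standard deviations. It uses only base R; the names `kids`, `x`, `y`, `r_manual`, and `r_builtin` are chosen for this illustration and do not appear elsewhere in these notes.

```r
# Illustrative copy of the data; requires mosaicData to be installed
kids <- mosaicData::KidsFeet

x <- kids$length
y <- kids$width

# Pearson's r = sample covariance / (product of the sample standard deviations)
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in calculation for comparison
r_builtin <- cor(x, y)

# TRUE if the two values agree up to floating-point error
all.equal(r_manual, r_builtin)
```

Because r is symmetric in its two arguments, swapping `x` and `y` gives the same value, which is why both cor() calls above return the same number.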

Constructing and analyzing regression model using lm()

Run a linear regression model and view coefficient point estimates

# lm() fits a linear model for a quantitative response variable
# if the predictor is a categorical factor, this is ANOVA
# if the predictor is quantitative, this is regression
feet.lm <- lm(width~length, data=KidsFeet)
# lm() only fits the model; to look at an ANOVA table we can use anova()
# printing the model shows the coefficient point estimates
feet.lm
## 
## Call:
## lm(formula = width ~ length, data = KidsFeet)
## 
## Coefficients:
## (Intercept)       length  
##      2.8623       0.2479
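
For a single predictor, the least-squares estimates can also be recovered from summary statistics: the slope is r times sd(y)/sd(x), and the line passes through the point of means. The following is a minimal sketch of that relationship, followed by a prediction at an arbitrary, hypothetical foot length of 25; the names `kids`, `fit`, `slope`, and `intercept` are illustrative only.

```r
# Illustrative copy of the data and a fresh fit of the same model
kids <- mosaicData::KidsFeet
fit <- lm(width ~ length, data = kids)

# Slope from the correlation and the two standard deviations
r <- cor(kids$length, kids$width)
slope <- r * sd(kids$width) / sd(kids$length)

# The least-squares line passes through (mean(x), mean(y))
intercept <- mean(kids$width) - slope * mean(kids$length)

# Compare the hand calculation with the coefficients stored in the model
c(intercept = intercept, slope = slope)
coef(fit)

# Predicted width at a hypothetical length of 25 (same units as the data)
predict(fit, newdata = data.frame(length = 25))
```

The column name in `newdata` must match the predictor name used in the model formula, here `length`.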

The summary() command provides both the \(R^2\) value (“Multiple R-squared”) and \(t\)-tests of the null hypotheses that the intercept \(\beta_0 = 0\) and that the slope \(\beta_1 = 0\):

summary(feet.lm)
## 
## Call:
## lm(formula = width ~ length, data = KidsFeet)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83864 -0.31056 -0.00892  0.27622  0.76300 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.8623     1.2081   2.369   0.0232 *  
## length        0.2480     0.0488   5.081  1.1e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3963 on 37 degrees of freedom
## Multiple R-squared:  0.411,  Adjusted R-squared:  0.3951 
## F-statistic: 25.82 on 1 and 37 DF,  p-value: 1.097e-05
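
The numbers printed by summary() can also be extracted programmatically. The sketch below pulls out \(R^2\), checks that it equals the squared correlation (true for regression with one predictor), and reconstructs the slope's \(t\) statistic and two-sided p-value from the estimate and its standard error. Object names such as `kids`, `fit`, and `fit_summary` are illustrative.

```r
# Illustrative copy of the data and a fresh fit of the same model
kids <- mosaicData::KidsFeet
fit <- lm(width ~ length, data = kids)
fit_summary <- summary(fit)

# R-squared, and the squared correlation for comparison
fit_summary$r.squared
cor(kids$length, kids$width)^2

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(fit_summary)
coef_table

# t statistic for the slope = estimate / standard error
t_slope <- coef_table["length", "Estimate"] / coef_table["length", "Std. Error"]

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_slope <- 2 * pt(abs(t_slope), df = df.residual(fit), lower.tail = FALSE)
c(t = t_slope, p = p_slope)

# ANOVA table for the same model; with one predictor, F equals t_slope^2
anova(fit)
```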

Get confidence intervals for the coefficients using confint()

confint(feet.lm, level=0.95)
##                 2.5 %    97.5 %
## (Intercept) 0.4144776 5.3100746
## length      0.1490758 0.3468197
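
Each interval is the estimate plus or minus a critical \(t\) value times the standard error, using the residual degrees of freedom. The following sketch rebuilds the 95% intervals by hand as a check on confint(); `kids`, `fit`, `est`, `se`, `t_crit`, and `ci_manual` are names made up for this example.

```r
# Illustrative copy of the data and a fresh fit of the same model
kids <- mosaicData::KidsFeet
fit <- lm(width ~ length, data = kids)

# Point estimates and standard errors from the coefficient table
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# Critical value for a 95% interval: 0.975 quantile of t with n - 2 df
t_crit <- qt(0.975, df = df.residual(fit))

# estimate +/- t* x SE for the intercept and the slope
ci_manual <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)
ci_manual

# Built-in calculation for comparison
confint(fit, level = 0.95)
```

For a different confidence level, say 90%, the quantile changes to `qt(0.95, ...)` and the `level` argument to `0.90`.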

Assessing model conditions

Check normality of residuals

Find residual values for all the data points using residuals(), then make a histogram and a q-q plot.

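# store the residuals as a new column so they can be plotted with the data
KidsFeet$res.width <- residuals(feet.lm)
# histogram() and xqqmath() are available once mosaic is loaded
histogram(KidsFeet$res.width)

xqqmath(KidsFeet$res.width)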
## Warning in qqmath.numeric(x, data = data, panel = panel, ...): explicit 'data'
## specification ignored
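
If mosaic is not loaded, roughly equivalent plots can be drawn with base R graphics. This is a sketch of one alternative, not a replacement for the commands above; it works on a separate copy of the data (`kids`) and a separate model object (`fit`), both names chosen for illustration, so it leaves KidsFeet and feet.lm untouched.

```r
# Illustrative copy of the data and a fresh fit of the same model
kids <- mosaicData::KidsFeet
fit <- lm(width ~ length, data = kids)

# One residual per observation
res <- residuals(fit)

# Histogram of the residuals
hist(res, main = "Residuals of width ~ length", xlab = "Residual")

# Normal Q-Q plot with a reference line through the quartiles
qqnorm(res)
qqline(res)
```

Points that fall close to the reference line in the Q-Q plot are consistent with approximately normal residuals.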

Make a residual plot to check for linearity, homoscedasticity, and outliers

ggplot(KidsFeet, aes(x=length, y=res.width)) +
  geom_point() +
  geom_hline(yintercept=0, lty="dashed")

A bunch of diagnostic plots using plot()

# the first two plots are residuals vs. fitted values and a normal Q-Q plot
plot(feet.lm)
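
By default, plot() on an lm object draws four plots one after another. The sketch below shows two common ways to control that: arranging all four in a 2-by-2 grid, and using the `which` argument to request a single plot. The names `kids`, `fit`, and `old_par` are illustrative.

```r
# Illustrative copy of the data and a fresh fit of the same model
kids <- mosaicData::KidsFeet
fit <- lm(width ~ length, data = kids)

# Show all four default diagnostic plots in one 2 x 2 panel
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout

# Request individual plots with the `which` argument
plot(fit, which = 1)  # residuals vs. fitted values
plot(fit, which = 2)  # normal Q-Q plot of the standardized residuals
```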