We load the ‘MASS’ package for the dataset named ‘cats’.
The ‘cats’ data set is a data frame object in R. It have 144 rows and three columns. Each column is a variable. The first is a variable of factor type: F for female cats and M for male cats, the second and the third variables are of numeric type. They are body weight (in kilograms) and heart weight (in grams).
'data.frame': 144 obs. of 3 variables:
$ Sex: Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ Bwt: num 2 2 2 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ...
$ Hwt: num 7 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ...
Have a look at the first 6 lines of the data frame. We will use this data example to illustrate how to compare whether the slopes of two linear regressions are the same or not.
Sex Bwt Hwt
1 F 2.0 7.0
2 F 2.0 7.4
3 F 2.0 9.5
4 F 2.1 7.2
5 F 2.1 7.3
6 F 2.1 7.6
We create a plot in the trellis graphical style. The left panel is for the relationship between heart weight and body weight of female cats, the right, that for male cats.
# conditional by gender - panels
xyplot(Hwt ~ Bwt | Sex, data=cats, type=c("p","g","r"),
xlab="Body weight (kg)",
ylab="Heart weight (g)")
It might be easier to compare slopes if they share the same frame. The impression is that the regression line for female cats (by comparison) has a larger intercept but a shallower slope, while that for male cats has a steeper slope but a smaller intercept.
# grouped by sex - in one panel
xyplot(Hwt ~ Bwt, groups=Sex, data=cats, type=c("p","g","r"),
xlab="Body weight (Kilograms)",
ylab="Heart weight (Grams)",
auto.key=TRUE)
First, we fit a simple linear model ignoring cats gender. In effect, we assume that cats regardless of gender share the same regression line.
Call:
lm(formula = Hwt ~ Bwt, data = cats)
Residuals:
Min 1Q Median 3Q Max
-3.569 -0.963 -0.092 1.043 5.124
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.357 0.692 -0.52 0.61
Bwt 4.034 0.250 16.12 <2e-16
Residual standard error: 1.45 on 142 degrees of freedom
Multiple R-squared: 0.647, Adjusted R-squared: 0.644
F-statistic: 260 on 1 and 142 DF, p-value: <2e-16
The fitted model is
Hwt = -0.36 + 4.03 Bwt
Next we include cats gender into the regression model. The term ‘Sex’ is a factor variable having two levels. The default coding in R follows the alphanumeric rule by treating ‘F’ in (‘F’ and ‘M’ for Sex) as the reference level.
Call:
lm(formula = Hwt ~ Bwt + Sex, data = cats)
Residuals:
Min 1Q Median 3Q Max
-3.583 -0.970 -0.095 1.043 5.102
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.4150 0.7273 -0.57 0.57
Bwt 4.0758 0.2948 13.83 <2e-16
SexM -0.0821 0.3040 -0.27 0.79
Residual standard error: 1.46 on 141 degrees of freedom
Multiple R-squared: 0.647, Adjusted R-squared: 0.642
F-statistic: 129 on 2 and 141 DF, p-value: <2e-16
The fitted model is
Hwt = -0.42 + 4.08 Bwt for female cats,
Hwt = (-0.42 - 0.08) + 4.08 Bwt for male cats
The above model assumes different intercepts for female and male cats but the same slope for both. To assume also different slopes between cats gender, we add the ‘interaction’ term ‘Sex:Bwt’ to the model.
# For convenience: lm(Hwt ~ Bwt*Sex, data=cats)
# = summary(cats_3 <- lm(Hwt ~ Bwt*Sex, data=cats))
# a*b interpreted as a+b+a:b
summary(cats_3 <- lm(Hwt ~ Bwt + Sex + Bwt:Sex, data=cats))
Call:
lm(formula = Hwt ~ Bwt + Sex + Bwt:Sex, data = cats)
Residuals:
Min 1Q Median 3Q Max
-3.773 -1.012 -0.120 0.927 4.865
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.981 1.843 1.62 0.10796
Bwt 2.636 0.776 3.40 0.00088
SexM -4.165 2.062 -2.02 0.04526
Bwt:SexM 1.676 0.837 2.00 0.04722
Residual standard error: 1.44 on 140 degrees of freedom
Multiple R-squared: 0.657, Adjusted R-squared: 0.649
F-statistic: 89.2 on 3 and 140 DF, p-value: <2e-16
Call:
lm(formula = Hwt ~ Bwt:Sex, data = cats)
Residuals:
Min 1Q Median 3Q Max
-3.569 -0.963 -0.094 1.042 5.125
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.346 0.835 -0.41 0.68
Bwt:SexF 4.028 0.361 11.17 <2e-16
Bwt:SexM 4.031 0.285 14.13 <2e-16
Residual standard error: 1.46 on 141 degrees of freedom
Multiple R-squared: 0.647, Adjusted R-squared: 0.642
F-statistic: 129 on 2 and 141 DF, p-value: <2e-16
The fitted model is
Hwt = 2.98 + 2.64 Bwt for female cats,
Hwt = (2.98 - 4.17) + (2.64+1.68) Bwt for male cats
The ‘interaction’ term ‘Sex:Bwt’ is statistically significant translates to that rate of change in heart weight in relation body weight for male cats is larger than that for female cats.
The above three models are nested models, i.e., the latter models can be simplified to former models by removing terms in them. We can compare these models by the F-test for the full-reduced model framework via the ‘anova’ command in R.
```r
anova(cats_1, cats_2, cats_3)
Analysis of Variance Table
Model 1: Hwt ~ Bwt
Model 2: Hwt ~ Bwt + Sex
Model 3: Hwt ~ Bwt + Sex + Bwt:Sex
Res.Df RSS Df Sum of Sq F Pr(>F)
1 142 299.5
2 141 299.4 1 0.155 0.074 0.7853
3 140 291.1 1 8.332 4.008 0.0472
We see that model 2 is not different from model 1 but model 3 is different from model 2. The difference is in the gender by body weight term. The F-test gives the same p-value as in the output following the t-value of the model 3 summary. This is not a coincidence.
For the cats data example, the linear relationship between heart weight and body weight is positive and stronger for male cats than for female cats. For each additional increase in body weight by one kilogram, the increase in heart weight is about \(4.32 \pm 0.31\) grams for male cats and about \(2.64 \pm 0.78\) grams for female cats.
Note that the standard error for the slope coefficient estimate for male cats is smaller than that of the female cats. It can be obtained as follows:
Bwt Bwt:SexM
Bwt 0.602024 -0.602024
Bwt:SexM -0.602024 0.701114
[1] 0.314785
We can also get a rough estimate of this standard error by considering the data for male cats alone.
Call:
lm(formula = Hwt ~ Bwt, data = cats, subset = Sex == "M")
Residuals:
Min 1Q Median 3Q Max
-3.773 -1.048 -0.298 0.984 4.865
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.184 0.998 -1.19 0.24
Bwt 4.313 0.340 12.69 <2e-16
Residual standard error: 1.56 on 95 degrees of freedom
Multiple R-squared: 0.629, Adjusted R-squared: 0.625
F-statistic: 161 on 1 and 95 DF, p-value: <2e-16
Ans: Bwt slope in [lm(Hwt ~ Bwt, data=cats, subset=Sex==‘M’)] (4.313) = Bwt slope (2.636) + Bwt:SexM slope (1.676) in Model 3. Bwt:SexM is the interation of sex among Bwt.
Ans: The female cat’s sample size smaller than male cat