1 Data

We load the ‘MASS’ package for the dataset named ‘cats’.

library(MASS)
data(cats, package="MASS")

The ‘cats’ data set is a data frame object in R. It have 144 rows and three columns. Each column is a variable. The first is a variable of factor type: F for female cats and M for male cats, the second and the third variables are of numeric type. They are body weight (in kilograms) and heart weight (in grams).

# structure of data 
str(cats)

'data.frame':   144 obs. of  3 variables:
 $ Sex: Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
 $ Bwt: num  2 2 2 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ...
 $ Hwt: num  7 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ...

Have a look at the first 6 lines of the data frame. We will use this data example to illustrate how to compare whether the slopes of two linear regressions are the same or not.

# as oppose to the tail end of the data
head(cats)

  Sex Bwt Hwt
1   F 2.0 7.0
2   F 2.0 7.4
3   F 2.0 9.5
4   F 2.1 7.2
5   F 2.1 7.3
6   F 2.1 7.6

2 Visualization

# load the built-in package
library(lattice)

We create a plot in the trellis graphical style. The left panel is for the relationship between heart weight and body weight of female cats, the right, that for male cats.

# conditional by gender - panels
xyplot(Hwt ~ Bwt | Sex, data=cats, type=c("p","g","r"),
       xlab="Body weight (kg)",
       ylab="Heart weight (g)")

It might be easier to compare slopes if they share the same frame. The impression is that the regression line for female cats (by comparison) has a larger intercept but a shallower slope, while that for male cats has a steeper slope but a smaller intercept.

# grouped by sex - in one panel
xyplot(Hwt ~ Bwt, groups=Sex, data=cats, type=c("p","g","r"),
       xlab="Body weight (Kilograms)",
       ylab="Heart weight (Grams)",
       auto.key=TRUE)

3 Linear regression models

First, we fit a simple linear model ignoring cats gender. In effect, we assume that cats regardless of gender share the same regression line.

summary(cats_1 <- lm(Hwt ~ Bwt, data=cats))


Call:
lm(formula = Hwt ~ Bwt, data = cats)

Residuals:
   Min     1Q Median     3Q    Max 
-3.569 -0.963 -0.092  1.043  5.124 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.357      0.692   -0.52     0.61
Bwt            4.034      0.250   16.12   <2e-16

Residual standard error: 1.45 on 142 degrees of freedom
Multiple R-squared:  0.647, Adjusted R-squared:  0.644 
F-statistic:  260 on 1 and 142 DF,  p-value: <2e-16

The fitted model is

Hwt = -0.36 + 4.03 Bwt

Next we include cats gender into the regression model. The term ‘Sex’ is a factor variable having two levels. The default coding in R follows the alphanumeric rule by treating ‘F’ in (‘F’ and ‘M’ for Sex) as the reference level.

summary(cats_2 <- lm(Hwt ~ Bwt + Sex, data=cats))


Call:
lm(formula = Hwt ~ Bwt + Sex, data = cats)

Residuals:
   Min     1Q Median     3Q    Max 
-3.583 -0.970 -0.095  1.043  5.102 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.4150     0.7273   -0.57     0.57
Bwt           4.0758     0.2948   13.83   <2e-16
SexM         -0.0821     0.3040   -0.27     0.79

Residual standard error: 1.46 on 141 degrees of freedom
Multiple R-squared:  0.647, Adjusted R-squared:  0.642 
F-statistic:  129 on 2 and 141 DF,  p-value: <2e-16

The fitted model is

Hwt = -0.42 + 4.08 Bwt for female cats,

Hwt = (-0.42 - 0.08) + 4.08 Bwt for male cats

The above model assumes different intercepts for female and male cats but the same slope for both. To assume also different slopes between cats gender, we add the ‘interaction’ term ‘Sex:Bwt’ to the model.

# For convenience: lm(Hwt ~ Bwt*Sex, data=cats)
# = summary(cats_3 <- lm(Hwt ~ Bwt*Sex, data=cats))
# a*b interpreted as a+b+a:b 
summary(cats_3 <- lm(Hwt ~ Bwt + Sex + Bwt:Sex, data=cats))


Call:
lm(formula = Hwt ~ Bwt + Sex + Bwt:Sex, data = cats)

Residuals:
   Min     1Q Median     3Q    Max 
-3.773 -1.012 -0.120  0.927  4.865 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.981      1.843    1.62  0.10796
Bwt            2.636      0.776    3.40  0.00088
SexM          -4.165      2.062   -2.02  0.04526
Bwt:SexM       1.676      0.837    2.00  0.04722

Residual standard error: 1.44 on 140 degrees of freedom
Multiple R-squared:  0.657, Adjusted R-squared:  0.649 
F-statistic: 89.2 on 3 and 140 DF,  p-value: <2e-16

summary(lm(Hwt ~ Bwt:Sex, data=cats))


Call:
lm(formula = Hwt ~ Bwt:Sex, data = cats)

Residuals:
   Min     1Q Median     3Q    Max 
-3.569 -0.963 -0.094  1.042  5.125 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.346      0.835   -0.41     0.68
Bwt:SexF       4.028      0.361   11.17   <2e-16
Bwt:SexM       4.031      0.285   14.13   <2e-16

Residual standard error: 1.46 on 141 degrees of freedom
Multiple R-squared:  0.647, Adjusted R-squared:  0.642 
F-statistic:  129 on 2 and 141 DF,  p-value: <2e-16

The fitted model is

Hwt = 2.98 + 2.64 Bwt for female cats,

Hwt = (2.98 - 4.17) + (2.64+1.68) Bwt for male cats

The ‘interaction’ term ‘Sex:Bwt’ is statistically significant translates to that rate of change in heart weight in relation body weight for male cats is larger than that for female cats.

The above three models are nested models, i.e., the latter models can be simplified to former models by removing terms in them. We can compare these models by the F-test for the full-reduced model framework via the ‘anova’ command in R.



```r
anova(cats_1, cats_2, cats_3)

Analysis of Variance Table

Model 1: Hwt ~ Bwt
Model 2: Hwt ~ Bwt + Sex
Model 3: Hwt ~ Bwt + Sex + Bwt:Sex
  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1    142 299.5                          
2    141 299.4  1     0.155 0.074 0.7853
3    140 291.1  1     8.332 4.008 0.0472

We see that model 2 is not different from model 1 but model 3 is different from model 2. The difference is in the gender by body weight term. The F-test gives the same p-value as in the output following the t-value of the model 3 summary. This is not a coincidence.

4 Conclusions

For the cats data example, the linear relationship between heart weight and body weight is positive and stronger for male cats than for female cats. For each additional increase in body weight by one kilogram, the increase in heart weight is about \(4.32 \pm 0.31\) grams for male cats and about \(2.64 \pm 0.78\) grams for female cats.

Note that the standard error for the slope coefficient estimate for male cats is smaller than that of the female cats. It can be obtained as follows:

show(vm <- vcov(cats_3)[c(2,4), c(2,4)])

               Bwt  Bwt:SexM
Bwt       0.602024 -0.602024
Bwt:SexM -0.602024  0.701114

sqrt(vm[1,1]+vm[2,2]+2*vm[1,2])

[1] 0.314785

We can also get a rough estimate of this standard error by considering the data for male cats alone.

summary(lm(Hwt ~ Bwt, data=cats, subset=Sex=='M'))


Call:
lm(formula = Hwt ~ Bwt, data = cats, subset = Sex == "M")

Residuals:
   Min     1Q Median     3Q    Max 
-3.773 -1.048 -0.298  0.984  4.865 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -1.184      0.998   -1.19     0.24
Bwt            4.313      0.340   12.69   <2e-16

Residual standard error: 1.56 on 95 degrees of freedom
Multiple R-squared:  0.629, Adjusted R-squared:  0.625 
F-statistic:  161 on 1 and 95 DF,  p-value: <2e-16

5 Further questions

What is the difference between the slope estimate for male cats in Model 3 we have used earlier and that in the analysis of the male cats alone?

Ans: Bwt slope in [lm(Hwt ~ Bwt, data=cats, subset=Sex==‘M’)] (4.313) = Bwt slope (2.636) + Bwt:SexM slope (1.676) in Model 3. Bwt:SexM is the interation of sex among Bwt.

Why is that the standard error for the slope estimate of female cats is much larger than that for male cats?

Ans: The female cat’s sample size smaller than male cat

Cats - Things you should have known

Chang-Chun Chen

2020-09-27

1 Data

2 Visualization

3 Linear regression models

4 Conclusions

5 Further questions