Data Analysis

Meowtrix Analyst: Cats Data

Introduction

Many hypothesize that a cat’s body weight directly influences its heart size. To quantify this relationship and examine potential moderating factors, I analyzed the cats dataset from the MASS package in R. This dataset records body weight (Bwt, kg), heart weight (Hwt, g), and sex (Sex) for 144 domestic cats, providing a robust sample for statistical exploration.

Visualization

I began by visualizing the raw association between body weight and heart weight using a scatterplot.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(MASS)

Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select
library(ggfortify)

data("cats")

raw_plot <- cats %>%
  ggplot(aes(x = Bwt, y = Hwt)) +
  geom_point(size = 3, alpha = 0.6) +
  labs(
    x = "Body Weight (kg)",
    y = "Heart Weight (g)",
    title = "Association between Body Weight and Heart Weight in Cats"
  ) +
  theme_bw()

raw_plot

The plot revealed a clear positive trend: heavier cats tend to have larger hearts. However, the relationship appeared slightly curvilinear, and variance increased with body weight.

Model Fit

1. Simple Linear Model
I first fit an ordinary least squares (OLS) regression model:

fit_linear <- lm(Hwt ~ Bwt, data = cats)
summary(fit_linear)

Call:
lm(formula = Hwt ~ Bwt, data = cats)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5694 -0.9634 -0.0921  1.0426  5.1238 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.3567     0.6923  -0.515    0.607    
Bwt           4.0341     0.2503  16.119   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.452 on 142 degrees of freedom
Multiple R-squared:  0.6466,    Adjusted R-squared:  0.6441 
F-statistic: 259.8 on 1 and 142 DF,  p-value: < 2.2e-16

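The linear model was highly significant (p < 0.001) with R² = 0.6466: each additional kilogram of body weight was associated with an increase of about 4.03 g in heart weight.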

2. Polynomial Model
To address the curvature, I fit a second-degree polynomial model:

fit_poly <- lm(Hwt ~ poly(Bwt, 2), data = cats)
summary(fit_poly)

Call:
lm(formula = Hwt ~ poly(Bwt, 2), data = cats)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9935 -1.0341 -0.1467  1.0140  4.3705 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    10.6306     0.1202  88.451   <2e-16 ***
poly(Bwt, 2)1  23.4114     1.4422  16.233   <2e-16 ***
poly(Bwt, 2)2   2.4996     1.4422   1.733   0.0853 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.442 on 141 degrees of freedom
Multiple R-squared:  0.654, Adjusted R-squared:  0.6491 
F-statistic: 133.3 on 2 and 141 DF,  p-value: < 2.2e-16
autoplot(fit_poly)
Warning: `fortify(<lm>)` was deprecated in ggplot2 4.0.0.
ℹ Please use `broom::augment(<lm>)` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.

The quadratic term was not statistically significant at α = 0.05 (p = 0.085), so the evidence for curvature is weak. The model explained 65.4% of the variance, a negligible increase over the linear model’s 64.7%. Overlaying the polynomial fit on the scatterplot shows how slight the curvature is:

raw_plot + 
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), 
              color = "hotpink", se = FALSE)

The overlaid curve is nearly linear, which reflects the weak quadratic effect estimated by the model (p = 0.085).
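The comparison between the linear and quadratic fits can also be made formally with a nested-model F test. The following is a minimal sketch, not part of the original analysis; it refits both models so that it runs on its own, and the names `cmp` and `p_quadratic` are introduced here purely for illustration.

```r
library(MASS)  # provides the cats dataset

data("cats")

# Refit both models so the block is self-contained
fit_linear <- lm(Hwt ~ Bwt, data = cats)
fit_poly   <- lm(Hwt ~ poly(Bwt, 2), data = cats)

# Nested-model F test: does adding the quadratic term
# significantly reduce the residual sum of squares?
cmp <- anova(fit_linear, fit_poly)
print(cmp)

# p-value for the single added term (second row of the table)
p_quadratic <- cmp[["Pr(>F)"]][2]
p_quadratic
```

Because only one term is added, this F test is equivalent to the t test on the quadratic coefficient reported in the summary above, so it should lead to the same conclusion.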

3. Exploring Sex as a Moderator
Visual inspection by sex suggested different relationships:

sex_plot <- cats %>%
  ggplot(aes(x = Bwt, y = Hwt, color = Sex)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_manual(values = c("darkred", "darkblue")) +
  labs(
    x = "Body Weight (kg)",
    y = "Heart Weight (g)",
    title = "Body vs Heart Weight by Sex",
    color = "Sex"
  ) +
  theme_bw() +
  theme(legend.position = "bottom")

sex_plot
`geom_smooth()` using formula = 'y ~ x'

The fitted line for male cats was steeper than the line for females, indicating a potential interaction effect. Because the regression lines are not parallel, the effect of body weight on heart weight appears to differ between the sexes.

Final Model
I fit an interaction model to formally test whether the body weight-heart weight relationship differs by sex:

fit_final <- lm(Hwt ~ Bwt * Sex, data = cats)
summary(fit_final)

Call:
lm(formula = Hwt ~ Bwt * Sex, data = cats)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7728 -1.0118 -0.1196  0.9272  4.8646 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.9813     1.8428   1.618 0.107960    
Bwt           2.6364     0.7759   3.398 0.000885 ***
SexM         -4.1654     2.0618  -2.020 0.045258 *  
Bwt:SexM      1.6763     0.8373   2.002 0.047225 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.442 on 140 degrees of freedom
Multiple R-squared:  0.6566,    Adjusted R-squared:  0.6493 
F-statistic: 89.24 on 3 and 140 DF,  p-value: < 2.2e-16

The interaction term was statistically significant, though only marginally so (p = 0.047), which is consistent with the visual pattern.

Interpretation:

  • Intercept (2.98 g): Estimated heart weight for a female cat at 0 kg body weight (theoretical and non-significant, p = 0.108).

  • Bwt (2.64 g/kg): For female cats, each additional kg of body weight is associated with a 2.64 g increase in heart weight (p < 0.001).

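  • SexM (-4.17 g): The difference in intercepts: at Bwt = 0 kg, the fitted male line lies 4.17 g below the female line (p = 0.045). This counterintuitive value is an extrapolation far outside the observed body weights. Because the male slope is steeper, the two fitted lines cross at about 4.17 / 1.68 ≈ 2.5 kg, above which males have the larger predicted heart weight.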

  • Bwt:SexM (1.68 g/kg): The effect of body weight is stronger in males; their total slope is 2.636 + 1.676 = 4.313, or about 4.31 g/kg (p = 0.047 for the difference in slopes).

Thus, the relationship differs substantially by sex:

  • Female slope: 2.64 g heart weight per kg body weight

  • Male slope: 4.31 g heart weight per kg body weight
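The sex-specific slopes listed above follow directly from the fitted coefficients. As a minimal sketch (the variable names `slope_female`, `slope_male`, and `crossover_kg` are introduced here for illustration), they can be recovered programmatically, together with the body weight at which the two fitted lines cross:

```r
library(MASS)  # provides the cats dataset

data("cats")

# Refit the interaction model so the block is self-contained
fit_final <- lm(Hwt ~ Bwt * Sex, data = cats)
b <- coef(fit_final)

# Female slope is the Bwt coefficient (F is the reference level)
slope_female <- b[["Bwt"]]

# Male slope adds the interaction coefficient
slope_male <- b[["Bwt"]] + b[["Bwt:SexM"]]

# Body weight at which the male and female fitted lines intersect:
# SexM + Bwt:SexM * Bwt = 0  =>  Bwt = -SexM / Bwt:SexM
crossover_kg <- -b[["SexM"]] / b[["Bwt:SexM"]]

round(c(female = slope_female, male = slope_male,
        crossover_kg = crossover_kg), 2)
```

Working from `coef()` rather than from rounded printed values avoids small rounding discrepancies when the two coefficients are summed.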

The model explained 65.7% of variance (R² = 0.657), a marginal improvement over previous specifications. Residual diagnostics showed acceptable patterns with no severe OLS violations.

confint(fit_final)
                 2.5 %      97.5 %
(Intercept) -0.6620801  6.62470490
Bwt          1.1024137  4.17041438
SexM        -8.2416012 -0.08919944
Bwt:SexM     0.0208271  3.33170228

While all coefficients except the intercept were statistically significant at α = 0.05, confidence intervals revealed substantial uncertainty. The interaction effect (Bwt:SexM) spanned from nearly zero (0.02) to substantial (3.33 g/kg), indicating that the male-specific slope increase, though likely present, is not precisely estimated.
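The `confint()` output above gives an interval for the female slope and for the difference in slopes, but not for the male slope itself. One way to obtain it, sketched here under the assumption that refitting with a different reference level is acceptable, is to make M the reference level of `Sex`; the names `cats_m` and `fit_male_ref` are hypothetical and used only in this example.

```r
library(MASS)  # provides the cats dataset

data("cats")

# Copy of the data with male as the reference level of Sex
cats_m <- transform(cats, Sex = relevel(Sex, ref = "M"))

# Same model, reparameterised: the Bwt coefficient is now the male slope
fit_male_ref <- lm(Hwt ~ Bwt * Sex, data = cats_m)

# 95% confidence interval for the male slope
male_slope_ci <- confint(fit_male_ref)["Bwt", ]
male_slope_ci
```

The reparameterised model has the same fitted values and R² as the original; only the meaning of the individual coefficients changes.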

Conclusion

This analysis confirmed a positive relationship between body weight and heart weight in domestic cats. The key finding, however, was that this relationship differs by sex: in male cats, the increase in heart weight per kilogram of body weight (approximately 4.3 g/kg) was about 1.6 times that in female cats (approximately 2.6 g/kg). Although the polynomial model hinted at slight non-linearity, the more interpretable and biologically meaningful model was the one including the interaction between sex and body weight, which explained about 65.7% of the variability in heart weight.

Practical conclusions

Veterinary diagnostics and standards: When assessing heart health and establishing reference values for heart weight, it is necessary to take into account not only the overall size of the animal but also its sex. Under the fitted model, the expected heart weight of a male cat exceeds that of a female of the same body weight above roughly 2.5 kg (where the two fitted lines cross, 4.17 / 1.68 ≈ 2.5), and the gap widens as body weight increases. This may help in more accurate early diagnosis of cardiomyopathies.

Biological studies: Differences in allometric scales (the ratio of organ and body size) between the sexes may indicate the effect of sex hormones on the development of the cardiovascular system or be related to differences in muscle mass. This is a direction for further physiological research.

Experiment planning: In studies where heart weight is the dependent variable, sex should be included in the model as an interaction with body weight, not merely as an additive covariate, in order to avoid biased slope estimates.
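To illustrate the point about sex-specific reference values, the interaction model can be used to predict heart weight for a female and a male cat of the same body weight. This is only a sketch: the 3 kg body weight is an arbitrary illustrative value, and `new_cats` and `pred` are names introduced for this example.

```r
library(MASS)  # provides the cats dataset

data("cats")

# Refit the interaction model so the block is self-contained
fit_final <- lm(Hwt ~ Bwt * Sex, data = cats)

# One female and one male cat, both at an illustrative 3 kg
new_cats <- data.frame(
  Bwt = c(3, 3),
  Sex = factor(c("F", "M"), levels = levels(cats$Sex))
)

# Point predictions with 95% prediction intervals for individual cats
pred <- predict(fit_final, newdata = new_cats,
                interval = "prediction", level = 0.95)

cbind(new_cats, round(pred, 2))
```

Prediction intervals (rather than confidence intervals for the mean) are used here because a reference range concerns individual animals; they will be considerably wider than the difference between the two point predictions.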

Limitations of the study

Correlation, not causation: The study is based on observational data, which does not allow establishing a causal relationship. A larger heart can be either a cause or a consequence of a larger body size or overall physical activity level.

Uncertainty of the interaction estimate: Although the interaction effect (a different slope for each sex) is statistically significant, its confidence interval is wide (0.02–3.33 g/kg). The true magnitude of the effect could therefore be anywhere from negligible to pronounced. A more precise estimate requires a larger sample.

Limited sample and covariates: The sample comprises 144 domestic cats, with no information on breed, age, neuter status, or housing conditions. These factors can affect the ratio of heart weight to body weight. The results cannot be automatically extrapolated to wild members of the family Felidae or to other species.

Imperfect residuals: The residual diagnostic plots show slight signs of heteroskedasticity (non-constant variance) and deviations from normality. Although the model remains informative, this indicates that it does not capture every pattern in the data.
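The residual concerns noted above can be examined numerically as well as visually. The sketch below is an assumption about how one might do this with base R only; it was not part of the original analysis. It applies a Shapiro-Wilk test to the residuals of the interaction model and fits an informal auxiliary regression of squared residuals on fitted values (in the spirit of a Breusch-Pagan check, not the formal test itself). The names `res`, `sw`, `diag_df`, and `aux` are hypothetical.

```r
library(MASS)  # provides the cats dataset

data("cats")

# Refit the interaction model so the block is self-contained
fit_final <- lm(Hwt ~ Bwt * Sex, data = cats)
res <- residuals(fit_final)

# Normality of residuals: Shapiro-Wilk test
sw <- shapiro.test(res)
print(sw)

# Informal heteroskedasticity check: do squared residuals
# increase with the fitted values?
diag_df <- data.frame(sq_res = res^2, fitted = fitted(fit_final))
aux <- lm(sq_res ~ fitted, data = diag_df)
print(summary(aux)$coefficients)
```

A small p-value for the `fitted` coefficient in the auxiliary regression would point to variance that grows with predicted heart weight; in that case, heteroskedasticity-robust standard errors or a transformation of the response could be considered.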

Directions for future research

  • Including additional covariates in the model: age, neuter status, breed.

  • Using more complex models (for example, mixed models) if the data are collected from several clinics or catteries.

  • Verifying that the interaction finding holds in independent samples.

Despite these limitations, the analysis suggests that the relationship between body size and heart size in cats is not uniform but is modified by the sex of the animal. It illustrates how simply including an interaction term in a regression model can reveal deeper biological patterns.