Background and Introduction

Urinary obstruction is a serious condition that puts cats’ life under risk. Recurrence of urinary obstruction after treatment is a major concern for pet owners and healthcare professionals. The purpose of this study is to find the factors contributing the most to cats needing more care. The data collected on cats before and after treatment consists of 105 instances with 93 attributes. We explore the range of predictors. Based on the available data, 72.38% of cats do not need further care, while 10.48% return to the hospital after treatment. The rest 17% do not contain information regarding further medical history due to death, euthanasia, or absence of follow-up. Since 17% is a sizeable portion, we randomly imputed the missing values while maintaining the original proportions of readmitted cats.

library(readxl)
cats <- read_excel("Data/dataset.xls")
cats <- cats |>
  rename(readmit = `Readmit?`, outdoors = `Access to outdoors?`)

Analysis

library(ggformula)
#Changing all string "NA" and "Unknown" values with actual NA
cats <- cats %>%
  mutate(across(where(is.character), ~na_if(.x, "NA"))) %>% 
  mutate(across(where(is.character), ~na_if(.x, "Unknown")))

#imputing missing NA values for cats readmission
prop_table <- cats %>%
  filter(!is.na(readmit)) %>%
  count(readmit) %>%
  mutate(prop = n / sum(n))

prob_yes <- prop_table %>%
  filter(readmit == "Yes") %>%
  pull(prop)

prob_no <- prop_table %>%
  filter(readmit == "No") %>%
  pull(prop)

set.seed(123)
cats1 <- cats %>%
  mutate(readmit = ifelse(is.na(readmit), 
                          sample(c("Yes", "No"), sum(is.na(readmit)), 
                                 replace = TRUE, prob = c(prob_yes, prob_no)), 
                          readmit))
plot <- gf_bar(~ readmit, data = cats1, format = "proportion", 
               fill = c("blue", "red")) +
   geom_bar_pattern(position = "dodge", pattern_density = 0.07) +
  labs(title = "Distribution of Cats", y = "Count", 
       x = "Recurring Urinary Obstruction") +
  theme_minimal() +
  scale_pattern_manual(values = c("No" = "stripe", "Yes" = "crosshatch"))

wrapped_text <- str_wrap("Data summaries show significant differences in certain factors between cats that required additional treatment and those that fully recovered. In particular, we focused on such components as age, breed, weight, outdoor access, size and duration of catheterization, diet, blood glucose, temperature, and water access.", width = 55)
text <- ggdraw() + 
  draw_text(wrapped_text, size = 12, hjust = 0, x = 0.05,
            gp = gpar(fontsize = 12))
plot_grid(text, plot, ncol = 2, rel_widths = c(1.5, 1))

Numerical variables analysis

Based on the graphs of numerical features, older cats tend to need additional treatment for urinary obstruction, longer catheterization is associated with lower chance of reoccuring condition, low or high weights are associated with higher risk for ongoing problem. Both low temperature and low blood glucose are linked to cat readmission. Although it is not clear whether blood glucose and temperature are causal or resulting factors in cats requiring additional treatment, we decided to include them in the study to draw healthcare providers’ attention to what first signs signal initial problems.

#Creating a binary water variable
cats1$water_binary <- ifelse(cats1$Water == "No", 0, 1)

#Calculating proportions for water access variable
prop_table <- cats1 %>%
  filter(!is.na(water_binary)) %>%
  count(water_binary) %>%
  mutate(prop = n / sum(n))

prob_yes <- prop_table %>%
  filter(water_binary == "1") %>%
  pull(prop)

prob_no <- prop_table %>%
  filter(water_binary == "0") %>%
  pull(prop)

#Imputing missing values for water access variable
set.seed(123)
cats1 <- cats1 %>%
  mutate(water_binary = ifelse(is.na(water_binary), 
                          sample(c("1", "0"), sum(is.na(water_binary)), 
                                 replace = TRUE, prob = c(prob_yes, prob_no)), 
                          water_binary))
cats1$`Size of u-cath`[cats1$`Size of u-cath` == "Slippery sam"] <- 3.5
cats1$`Size of u-cath` <- as.numeric(cats1$`Size of u-cath`)
cats1$`BG` <- as.numeric(cats1$`BG`)
#Selecting variables of interest
cats2 <- cats1 %>%
  dplyr::select(readmit, Age, Breed, `Weight (lbs)`, `outdoors`, 
                water_binary, `Hrs of cath`, `Size of u-cath`, BG, Temp, Diet) 

#Creating a graph
cats2 %>% 
  dplyr::select(Age, `Weight (lbs)`, `Hrs of cath`, BG, Temp, readmit) %>% 
  drop_na() %>% 
  pivot_longer(cols = -readmit, names_to = "type", values_to = "value") %>%
  dplyr::mutate(type =  dplyr::recode(type, 
                       "Hrs of cath" = "Catheter Duration (hrs)", 
                       "Weight (lbs)" = "Weight (lbs)", 
                       "Age" = "Age (years)",
                       "BG" = "Blood glucose",
                       "Temp" = "Temperature")) %>%
  ggplot(aes(value, fill = readmit, pattern = readmit)) + 
  geom_density_pattern(alpha = 0.5, 
                       pattern_density = 0.07, 
                       pattern_fill = "black",
                       pattern_key_scale_factor = 0.6) +  
  facet_wrap(~ type, scales = "free") +  
  theme_minimal() +  
  theme(legend.position = "right") + 
  labs(title = "Distribution of Numerical Features by Readmission Status",
       fill = "Readmission Status", pattern = "Readmission Status") +
  scale_x_continuous(name = "Measured Values") +
  scale_y_continuous(name = "Density") +
  scale_fill_manual(values = c("No" = "grey70", "Yes" = "grey40"))

Categorical variables analysis

Qualitative output reveal that cats with access to outdoors, water, and smaller catheter size don’t need additional treatment frequently. DSH breed, although constituting the biggest proportion of all observations, tends to encounter urinary obstruction again. It is not clear how diet can influence additional treatment, but it can be connected to cats’ blood glucose, temperature, or weight, so we will proceed with including it in the model.

cats2 <- cats2 %>%
  mutate(across(where(is.character), ~na_if(.x, "NA"))) %>% 
  mutate(across(where(is.character), ~na_if(.x, "Unknown")))%>% 
  mutate(across(where(is.character), ~na_if(.x, "Unknonw")))

cats2$Diet <- tolower(cats2$Diet)  #Convert to lowercase
cats2$Diet <- factor(cats2$Diet)
cats2$Diet <- factor(cats2$Diet, 
                     levels = setdiff(levels(cats2$Diet), "unknown"))

cats2$`Size of u-cath` <- factor(cats1$`Size of u-cath`)

cats2 %>% 
  drop_na() %>% 
  dplyr::select(Breed, outdoors, water_binary, `Size of u-cath`, 
                Diet, readmit) %>%
  count(Breed) %>% 
  top_n(3, n) %>% 
  pull(Breed) -> top_breeds

cats2 %>% 
  drop_na() %>% 
  dplyr::select(Breed, outdoors, water_binary, `Size of u-cath`, 
                Diet, readmit) %>% 
  filter(Breed %in% top_breeds) %>% 
  pivot_longer(c(-readmit), names_to = "type", values_to = "value") %>%
  mutate(type =  dplyr::recode(type, 
                       "Breed" = "Breed", 
                       "outdoors" = "Access to outdoors", 
                       "Size of u-cath" = "Catheter size",
                       "water_binary" = "Access to water",
                       "Diet" = "Diet"),
    value =  dplyr::recode(value, `1` = "Yes", `0` = "No")) %>%
  ggplot(aes(value, fill = readmit, pattern = readmit)) +
  geom_bar_pattern(position = "dodge", pattern_density = 0.07, 
                   pattern_fill = "black", 
                   pattern_key_scale_factor = 0.6) +
  facet_wrap(~ type, scales = "free") +
  theme_minimal() +
   theme(legend.position = "right") + 
  labs(title = "Distribution of Qualitative Features by Readmission Status",
       fill = "Readmission Status", pattern = "Readmission Status") +
  scale_fill_manual(values = c("No" = "grey70", "Yes" = "grey40"))

Fitting the model

In order to predict the likelihood of the cat being readmitted for additional treatment, we constructed a logistic regression model.

cats3 <- cats2 %>%
  drop_na() %>%
  filter(Breed %in% top_breeds) %>%
  mutate(Readmit_numeric = ifelse(readmit == "Yes", 1, 0))

logit_model <- glm(Readmit_numeric ~ `Weight (lbs)` + 
                     `Hrs of cath`+ Age + `Size of u-cath`, data = cats3, 
                   family = binomial)
summary(logit_model)
## 
## Call:
## glm(formula = Readmit_numeric ~ `Weight (lbs)` + `Hrs of cath` + 
##     Age + `Size of u-cath`, family = binomial, data = cats3)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)   
## (Intercept)       -4.55349    1.94847  -2.337  0.01944 * 
## `Weight (lbs)`     0.23077    0.12043   1.916  0.05534 . 
## `Hrs of cath`     -0.04664    0.02317  -2.012  0.04417 * 
## Age                0.29927    0.11173   2.679  0.00739 **
## `Size of u-cath`5 -1.63523    0.78519  -2.083  0.03729 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 77.801  on 89  degrees of freedom
## Residual deviance: 61.713  on 85  degrees of freedom
## AIC: 71.713
## 
## Number of Fisher Scoring iterations: 6
null_model <- glm(Readmit_numeric ~ 1, family = binomial, data = cats3)
log_lik_null <- logLik(null_model)
log_lik_full <- logLik(logit_model)
mcfadden_r2 <- 1 -(log_lik_full/log_lik_null)
mcfadden_r2

Results and discussion

After testing 10 variables for significance, only 4 appeared to be significant. Notably, age, weight, hours of catheterization, and size of catheter appear to be significant in predicting the outcome of further treatment, as they have p-values below the significance level of 0.05. Weight (lbs) has a p-value of 0.0544, which is still below the significance level of 0.1, so we keep it in the model. Water, diet, breed, outdoor access, blood glucose, temperature have high p-values, thus, they don’t add a lot of explanatory power to the model.

After performing a logistic regression on the available data, we created the following logistic equation: \(\pi = \text{Probability of Readmission}\) \[\log(\frac{\widehat{\pi}}{1-\widehat{\pi}}) = −4.74494 + 0.22939 (Weight(lbs)) − 0.04480(CatheterizationDuration) +\] \[ 0.32203(Age) − 1.76548(SizeOfCatheter5)\]

Pseudo R-squared value for this model is 0.2115582, so it explains 21.16% of the variability in the response variable. The results of the logistic model suggest that for each additional pound of weight, the odds of readmission increase by about 25.8%. For each additional hour of catheterization, the odds of readmission decrease by about 4.4%. For each additional year of age, the odds of readmission increase by about 38%. Having a catheter size of 5 reduces the odds of readmission by about 83% compared to cats with other catheter sizes.

While cat owners and healthcare providers can’t influence the risks of cat urinary obstruction associated with age, they should recommend pet owners to control cats’ weight, as additional pounds contribute to risks of the condition. While treating the cat, veterenarians should favor catheter of size 5 and increase the duration of catheterization.