Project 1 Dogs

Author

Ehiggs

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggfortify)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
dogs <- read_csv("Project1dogset.csv")
Rows: 700 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Breed, KC, SEX, NEUTER
dbl (15): ID, BOAS, Age, WEIGHT, BCS, CFR, MUZZLELENGTH, CRANIALLENGTH, CRAN...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dogs2 <- read_csv("Project2dogset.csv")
Rows: 154 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Breed, Pure/Cross, SEX, NEUTER
dbl (15): ID, BOAS, Age, BRED, SHOWN, BCS, CFR, MUZZLELENGTH, CRANIALLENGTH,...
num  (1): WEIGHT

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

write a brief introduction that describes your dataset topic and the variables (you may need to define them for your audience), and establish what you plan to explore. You also MUST identify the source for your dataset

The following analysis is regarding the correlation between Brachycephalic Obstructive Airway Syndrome and smaller craniofacial ratios with different dog breeds. In other words, the chances of a dog having issues breathing due to a flat face. Brachycephalic Obstructive Airway Syndrome, or issues breathing, will be abbreviated as “BOAS”, with 1 meaning that the dog has it and 0 meaning that the dog doesn’t. A flat face is the “CFR”, with the larger the number, the longer their muzzle is. The source for the dataset is the Royal Vererinary College and the link is: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0137496

I got the data from the “Supporting Information” section of the article “Impact of Facial Conformation on Canine Health: Brachycephalic Obstructive Airway Syndrome”.

Filtering out all crossbreeds so the comparison is purely with full bred dogs.

I’m not sure why, but my computer will NOT let me do the filters in a single chunk of code. It stops seeing “Breed” as an object when I try to do it in one chunk. YES, I am removing the outliers. BOAS is based off of other factors such as collapsed nostrils and some breeds have more risk for that and it doesn’t relate to the Cranial Facial Ratio as much. Think of the data as being for dogs with “normal” airflow.

filterdog <- dogs |>
  filter(!str_detect(Breed, "CROSS BREED")) #"str_detect" command is from  https://www.spsanderson.com/steveondata/posts/2024-05-23/
filterdog2 <- filterdog |>
  filter(!str_detect(Breed, "SHIH TZU"))
filterdog3 <- filterdog2 |>
  filter(!str_detect(Breed, "DOGUE DE BORDEAUX"))
filterdog4 <- filterdog3 |>
  filter(!str_detect(Breed, "FRENCH BULLDOG"))

Finding the average of each value and grouping each datapoint for each column by the breed of dog.

newdogfilter <- filterdog4 |> 
      group_by(Breed) |>
      summarise(avgBOAS = mean(BOAS)*100, 
                avgWeight = mean(WEIGHT),
                avgBCS = mean(BCS),
                avgCFR = mean(CFR),
                avgMuzzle = mean(MUZZLELENGTH),
                avgHeadLength = mean(CRANIALLENGTH),
                avgHeadWidth = mean(CRANIALWIDTH),
                avgLowerJawWidth = mean(LOWERJAWWIDTH),
                avgPC1 = mean(PC1),
                avgPC2 = mean(PC2),
                avgNeckLength = mean(NECKLENGTH),
                avgNeckwidth = mean(NECKGIRTH),
                avgNaresRatio = mean(NARESRATIO))

removing breeds that don’t have any signs of BOAS.

newdogfilterfilter <- newdogfilter |>
  filter(avgBOAS != 0)

This is a trial data plot. It does not have enough data points to convincingly display my hypothesis.

plot1 <- ggplot(newdogfilterfilter , aes(x= avgCFR , y= avgBOAS, group = Breed, colour = Breed, size = avgHeadLength)) +
  scale_color_brewer(palette = "Spectral")+
    geom_smooth(method = "lm", formula = y ~ x)+
  geom_point() +
  theme_dark()+
    labs(x = "Cranial Facial Raito", y = "Brachycephalic Obstructive Airway Syndrome")
plot1
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Chance of Brachycephalic Obstructive Airway Syndrome = -194.57(Average Cranial Facial Ratio) + 95.25

cor(newdogfilterfilter$avgBOAS, newdogfilterfilter$avgCFR)
[1] -0.8966039
fit1 <- lm(avgBOAS ~ avgCFR, data = newdogfilterfilter)
summary(fit1)

Call:
lm(formula = avgBOAS ~ avgCFR, data = newdogfilterfilter)

Residuals:
   Min     1Q Median     3Q    Max 
-19.95 -14.54   8.61   9.75  16.30 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    95.25      10.46   9.108 3.95e-05 ***
avgCFR       -194.57      36.32  -5.357  0.00106 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.12 on 7 degrees of freedom
Multiple R-squared:  0.8039,    Adjusted R-squared:  0.7759 
F-statistic:  28.7 on 1 and 7 DF,  p-value: 0.001056
#I kept getting errors about continuous and discrete palettes, so I used the website https://datascientistdude.medium.com/r-coding-errors-suck-continuous-value-supplied-to-discrete-scale-26887112bcc7     to help me turn the data into something I could color.

#I also wanted to have three colors on my graph, so I used this website as a reference for using conditional statements: https://sparkbyexamples.com/r-programming/replace-values-based-on-condition-in-r/

filterdog$BOAS[filterdog$BOAS == 0 & filterdog$NARESRATIO >= 1.00] <- 2

filterdog$Breathing_Problems <- cut(filterdog$BOAS, breaks = c(-Inf,0.5, 1, Inf), labels = c("No", "Yes", "Yes but in green"))
plot2 <- ggplot(filterdog , aes(x= CFR , y= NARESRATIO))+
  geom_point(aes(color = Breathing_Problems)) +
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x, color = "lightyellow")+
  scale_color_brewer(palette = "Pastel1")+
  theme_dark()+
    labs(x = "Cranial Facial Raito", y = "Openness of nose", color = "breathing problems", title = "Ratio of muzzle length to skull and likelihood \nof breathing problems")
plot2 

cor(dogs$NARESRATIO, dogs$CFR)
[1] 0.7752364
fit2 <- lm(NARESRATIO ~ CFR, data = dogs)
summary(fit2)

Call:
lm(formula = NARESRATIO ~ CFR, data = dogs)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.62083 -0.11694 -0.01317  0.09509  0.71204 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.07720    0.02107   3.664 0.000267 ***
CFR          1.26196    0.03892  32.424  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1851 on 698 degrees of freedom
Multiple R-squared:  0.601, Adjusted R-squared:  0.6004 
F-statistic:  1051 on 1 and 698 DF,  p-value: < 2.2e-16

EQUATION: openness of nose = 1.26196 Cranial Facial Ratio + 0.07720

Something interesting about the graph is that the equation is not super relevant, the color is. The equation is only showing that the muzzle length is related to the Cranial Facial ratio. The earlier equation is actually more helpful when looking for correlation between airway problems and muzzle length.

For the current graph, the adjusted R-squared value is 0.6004 because the dataset is showing a moderately strong correlation between an open nose and the ratio between the skull and muzzle length. The color shows whether or not a dog has BOAS. The P value is 2.2 * 10^-16, which is an incredibly small number. The P value is the chance of getting a result that goes against the hypothesis. The hypothesis for the x and y-axis is that the cranial facial ratio is related to how open their nose is. With the current P value, that means that there is a 2.2 to the 10^-16th chance that we will get a value that goes against the hypothesis.

For the equation that shows correlation between the Cranial Facial ratio and chances of airway problems, the equation is: Chance of Brachycephalic Obstructive Airway Syndrome = -194.57(Average Cranial Facial Ratio) + 95.25

As shown in “Fit1”, the adjusted R-squared is :0.7759, meaning that there is a strong correlation between the ratio of muzzle length to skull and the chance of having breathing problems. The P-value is: 0.001056, meaning that the data is relevant and there is only a 0.11% chance that there will be a datapoint that goes against the hypothesis that breathing issues are related to muzzle length.

I did not use the latter equation because BOAS was a variable that naturally had either a value of 0 or 1, to graph the chance of a dog getting BOAS, I would have to average it out amongst the entire breed which meant that I would either have too few data points, or had to use the BOAS variable for color and two other variables for the x and y-axis.

I cleaned up the data using filter commands for the first dataset but ended up needing more datapoints so I decided to use the set that only filtered out crossbreeds and cut the BOAS variable into three groups to show whether or not the dog had breathing issues. The final graph shows that dogs are most likely to have breathing problems if they have a short muzzle with a narrow nose.

The results are not surprising. If you listen to those flat-faced dogs you can easily tell that they can’t breathe.

I wish that the graph had more colors. I wanted to color it by breed but when I did that, the graph ended up looking incredibly messy.