Project Objectives

Project Introduction

Rabies- The closest thing we’d get to zombies, probably. The viral disease transmitted through the bite of an infected animal has long been a global concern. The importance of studying rabies extends beyond the confines of its direct health implications; there is a delicate interplay of societal, behavioral and environmental factors that contribute to its prevalence.

Thanks to a long string of media depicting rabies as the gateway virus to zombies, I wound up worrying about rabies a lot. Now that I’m wrapping up my time at Hollins in Virginia and probably moving back to Texas while I search for a job, I wanted to tackle a project revolving around rabies and the odds of me dealing with it. The best way to do this is was utilizing the data provided by the Texas Department of State Health Services on Rabies cases. Compiled by year and month, I focused on the three first months of 2022 and 2023. In the future, I hope to expand on the full twelve months and complete a four year analysis, along with including temperature to see the relationship between weather and rabies cases.

This project becomes more than a statistical exploration; it’s a personal endeavor to understand and communicate the real risks and societal implications of rabies, separating fact from fiction.

Join me on this journey as we unravel the many layers of rabies data from the first three months of 2022 and 2023 in Texas. Beyond the numbers and statistical trends, we’ll explore the narratives that shape our perceptions of this viral threat and contribute to informed discussions on public health strategies.

Dataset Overview

rabies <- read.csv("/Users/linne/Desktop/coding projects/data files/rabies/RabiesData.csv")
str(rabies)
## 'data.frame':    4324 obs. of  6 variables:
##  $ Year              : int  2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
##  $ Month             : chr  "January" "January" "January" "January" ...
##  $ PublicHealthRegion: int  5 5 5 11 11 11 6 7 7 7 ...
##  $ County            : chr  "Angelina" "Angelina" "Angelina" "Aransas" ...
##  $ Animal            : chr  "Dog" "Dog" "Dog" "Dog" ...
##  $ TestResult        : chr  "Negative" "Negative" "Negative" "Negative" ...
summary(rabies)
##       Year         Month           PublicHealthRegion    County         
##  Min.   :2022   Length:4324        Min.   : 1.000     Length:4324       
##  1st Qu.:2022   Class :character   1st Qu.: 3.000     Class :character  
##  Median :2023   Mode  :character   Median : 6.000     Mode  :character  
##  Mean   :2023                      Mean   : 5.568                       
##  3rd Qu.:2023                      3rd Qu.: 7.000                       
##  Max.   :2023                      Max.   :11.000                       
##     Animal           TestResult       
##  Length:4324        Length:4324       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

Explorartory Data Analysis

2022

By months

summary2022 <- summary(subset(rabies, Year == 2022))
summary2022
##       Year         Month           PublicHealthRegion    County         
##  Min.   :2022   Length:2053        Min.   : 1.000     Length:2053       
##  1st Qu.:2022   Class :character   1st Qu.: 3.000     Class :character  
##  Median :2022   Mode  :character   Median : 6.000     Mode  :character  
##  Mean   :2022                      Mean   : 5.598                       
##  3rd Qu.:2022                      3rd Qu.: 7.000                       
##  Max.   :2022                      Max.   :11.000                       
##     Animal           TestResult       
##  Length:2053        Length:2053       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
rabies22 <- subset(rabies, Year == 2022)
rabies22pos <- subset(rabies, Year == 2022 & TestResult == "Positive")

# Let's create a bar plot for only the year 2022
ggplot(rabies22pos, aes(x = Month)) +
  geom_bar(fill = "red", stat = "count") +
  labs(title = "Distribution of Rabies Positive Cases in 2022 by Month",
       x = "Month",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels 

rabies22neg <- subset(rabies, Year == 2022 & TestResult == "Negative")

# Let's create a bar plot for only the year 2022
ggplot(rabies22neg, aes(x = Month)) +
  geom_bar(fill = "green", stat = "count") +
  labs(title = "Distribution of Rabies Negative Cases in 2022 by Month",
       x = "Month",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels 

Based on these graphs, the month of March had the most negative and positive cases. The February had the least positive cases and January had the least negative cases of 2022. Let’s analyze by county.

County

Let’s start with the negative results.

# Filter the dataset for 2022 and positive test results.
rabies22_positive <- subset(rabies, Year == 2022 & TestResult == "Positive")

# Now we count the number of positive cases for each county
county_rabies_cases <- table(rabies22_positive$County)

# Find the county with the most positive cases
highest_county <- names(which.max(county_rabies_cases))

# Find the county with the least positive cases
lowest_county <- names(which.min(county_rabies_cases))

# Display the result
cat("County with the most positive cases in 2022:", highest_county, "\n")
## County with the most positive cases in 2022: McLennan
cat("County with the least positive cases in 2022:", lowest_county, "\n")
## County with the least positive cases in 2022: Atascosa

Animals

ggplot(rabies22, aes(x = Animal)) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Rabies Cases by Animal Type in 2022",
       x = "Animal Type",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels 

Animals by Months

ggplot(rabies22, aes(x = Animal)) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Rabies Cases by Animal Type by Months of 2022",
       x = "Animal Type",
       y = "Count") +
  facet_wrap(~ Month) +  # Facet by month
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

Keep in mind, these are just tested animals. Let’s do positive cases.

# Create the bar plot for the distribution of positive rabies cases by animal type
ggplot(rabies22_positive, aes(x = Animal)) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Positive Rabies Cases by Animal Type by Months of 2022",
       x = "Animal Type",
       y = "Count") +
  facet_wrap(~ Month) +  # Facet by month
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels

The highest Animal group tested were Skunks from March of 2022. This is a revolving trend throughout our dataset.

Public Health Region

Let’s focus on Public Health Regions instead.

ggplot(rabies22_positive, aes(x = factor(PublicHealthRegion))) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Positive Rabies Cases by Public Health Region",
       x = "Public Health Region",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

phr_positive_cases <- table(rabies22_positive$PublicHealthRegion)

# Find the PublicHealthRegion with the most positive cases
most_positive_phr <- names(which.max(phr_positive_cases))

most_positive_phr
## [1] "7"
# Find the PublicHealthRegion with the least positive cases
least_positive_phr <- names(which.min(phr_positive_cases))

least_positive_phr
## [1] "1"

Who tested the most in general?

phr_cases <- table(rabies$PublicHealthRegion)

# Find the PublicHealthRegion with the most cases
most_cases_phr <- names(which.max(phr_cases))

most_cases_phr
## [1] "3"

Who tested the least?

# Find the PublicHealthRegion with the least cases
least_cases_phr <- names(which.min(phr_cases))

least_cases_phr
## [1] "5"

2023

summary2023 <- summary(subset(rabies, Year == 2023))
summary2023
##       Year         Month           PublicHealthRegion    County         
##  Min.   :2023   Length:2271        Min.   : 1.000     Length:2271       
##  1st Qu.:2023   Class :character   1st Qu.: 3.000     Class :character  
##  Median :2023   Mode  :character   Median : 6.000     Mode  :character  
##  Mean   :2023                      Mean   : 5.542                       
##  3rd Qu.:2023                      3rd Qu.: 7.000                       
##  Max.   :2023                      Max.   :11.000                       
##     Animal           TestResult       
##  Length:2271        Length:2271       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

By months

rabies23pos <- subset(rabies, Year == 2023 & TestResult == "Positive")

# Let's create a bar plot for only the year 2023
ggplot(rabies23pos, aes(x = Month)) +
  geom_bar(fill = "red", stat = "count") +
  labs(title = "Distribution of Rabies Positive Cases in 2023 by Month",
       x = "Month",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels 

rabies23neg <- subset(rabies, Year == 2023 & TestResult == "Negative")

# Let's create a bar plot for only the year 2023
ggplot(rabies23neg, aes(x = Month)) +
  geom_bar(fill = "green", stat = "count") +
  labs(title = "Distribution of Rabies Negative Cases in 2023 by Month",
       x = "Month",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels 

Based on these graphs, the month of March had the most negative and positive cases. The February had the least positive and the least negative cases of 2023. Let’s analyze by county once more.

County

Let’s start with the negative results.

county_rabies_cases23 <- table(rabies23pos$County)

# Find the county with the most positive cases
highest_county23 <- names(which.max(county_rabies_cases23))

# Find the county with the least positive cases
lowest_county23 <- names(which.min(county_rabies_cases23))

# Display the result
cat("County with the most positive cases in 2023:", highest_county23, "\n")
## County with the most positive cases in 2023: Hays
cat("County with the least positive cases in 2023:", lowest_county23, "\n")
## County with the least positive cases in 2023: Atascosa

Animals

rabies23 <- subset(rabies, Year == 2023)
ggplot(rabies23, aes(x = Animal)) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Rabies Cases by Animal Type in 2023",
       x = "Animal Type",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels 

Animals by Months

ggplot(rabies23, aes(x = Animal)) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Rabies Cases by Animal Type by Months of 2023",
       x = "Animal Type",
       y = "Count") +
  facet_wrap(~ Month) +  # Facet by month
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

Let’s do positive results.

# Create the bar plot for the distribution of positive rabies cases by animal type
ggplot(rabies23pos, aes(x = Animal)) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Positive Rabies Cases by Animal Type by Months of 2023",
       x = "Animal Type",
       y = "Count") +
  facet_wrap(~ Month) +  # Facet by month
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels

The highest Animal group tested that tested positive were Skunks from March of 2023.

Public Health Region

Let’s focus on Public Health Regions again.

ggplot(rabies23pos, aes(x = factor(PublicHealthRegion))) +
  geom_bar(fill = "skyblue", stat = "count") +
  labs(title = "Distribution of Positive Rabies Cases by Public Health Region",
       x = "Public Health Region",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

phr_positive_cases23 <- table(rabies23pos$PublicHealthRegion)

# Find the PublicHealthRegion with the most positive cases
most_positive_phr23 <- names(which.max(phr_positive_cases23))

most_positive_phr23
## [1] "7"
# Find the PublicHealthRegion with the most positive cases
least_positive_phr23 <- names(which.min(phr_positive_cases23))

least_positive_phr23
## [1] "9"
rabies_2023 <- subset(rabies, Year == 2023)

# Let's count the number of cases for each PublicHealthRegion in 2023
phr_cases_2023 <- table(rabies_2023$PublicHealthRegion)

print(phr_cases_2023)
## 
##   1   2   3   4   5   6   7   8   9  10  11 
## 118 103 532  82  35 495 479 237  45  56  89

Who tested the least?

# Find the PublicHealthRegion with the least cases
least_cases_phr23 <- names(which.min(phr_cases_2023))

least_cases_phr23
## [1] "5"

Who tested the most?

# Find the PublicHealthRegion with the least cases
most_cases_phr23 <- names(which.max(phr_cases_2023))

most_cases_phr23
## [1] "3"

2022 VS 2023

analysis_positive_negative <- function(data, year) {
  analysis <- data %>%
    group_by(Year, TestResult) %>%
    summarize(Count = n())
  analysis$Year <- as.factor(year)
  return(analysis)
}

# Make a bar graph for each year
positive_negative_plot_2022 <- ggplot(analysis_positive_negative(rabies22, 2022), aes(x = TestResult, y = Count, fill = TestResult)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Distribution of Positive and Negative Rabies Cases in 2022",
       x = "Test Result",
       y = "Count") +
  scale_fill_manual(values = c("Positive" = "red", "Negative" = "green")) +  # Set colors for positive and negative cases
  theme_minimal()

positive_negative_plot_2023 <- ggplot(analysis_positive_negative(rabies23, 2023), aes(x = TestResult, y = Count, fill = TestResult)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Distribution of Positive and Negative Rabies Cases in 2023",
       x = "Test Result",
       y = "Count") +
  scale_fill_manual(values = c("Positive" = "red", "Negative" = "green")) +  # Set colors for positive and negative cases
  theme_minimal()

# Display the plots
positive_negative_plot_2022

positive_negative_plot_2023

count_cases <- function(data) {
  cases <- nrow(data)
  return(cases)
}

Total number of cases:

cases_2022 <- count_cases(rabies22)
cases_2023 <- count_cases(rabies23)

# Print results
print("Year 2022:")
## [1] "Year 2022:"
print(cases_2022)
## [1] 2053
print("Year 2023:")
## [1] "Year 2023:"
print(cases_2023)
## [1] 2271

Geospatial Analysis

locations <- read.csv("/Users/linne/Desktop/coding projects/data files/rabies/CountyLocations.csv")

Let’s merge our data utilizing the common key: county.

merged_data <- inner_join(rabies, locations, by = "County")

Checking for any missing values.

anyNA
## function (x, recursive = FALSE)  .Primitive("anyNA")
colSums(is.na(merged_data))
##               Year              Month PublicHealthRegion             County 
##                  0                  0                  0                  0 
##             Animal         TestResult           Latitude          Longitude 
##                  0                  0                  0                  0

Seems like we’re good to go!

Let’s do a map of just Texas with highlighted areas of Rabies cases.

positive_cases <- subset(merged_data, TestResult == "Positive")

#Create a Map
rabiesmap <- leaflet(data = positive_cases) %>%
  addTiles() %>%
  addCircleMarkers(
    radius = 7,  
    color = "red",
    fillOpacity = 0.5,
    popup = ~paste("County: ", County, "<br>",
                   "Rabies Cases: ", rabies)
  )

# Display the map
rabiesmap

Time Series Analysis

Let’s move onto a Time Series Analysis! Our goal here is to identify any patterns.

rabies_positive <- filter(rabies, TestResult == "Positive")

# We need to convert Month to a factor with ordered levels to ensure correct ordering in the time series plot, since our excel has data all over
rabies_positive$Month <- factor(rabies_positive$Month, levels = c("January", "February", "March"), ordered = TRUE)

# Let's create a time series plot for positive rabies cases
time_series_plot <- ggplot(rabies_positive, aes(x = Month, group = Year, color = as.factor(Year))) +
  geom_line(stat = "count") +
  labs(title = "Temporal Trends in Positive Rabies Cases",
       x = "Month",
       y = "Count",
       color = "Year") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

# Show plot
time_series_plot

Seemingly, both years dipped in positive rabies cases in February and increased by March. 2023 had a larger yield of positive cases.

Wrapping Up

In this project, we dived into the analysis of rabies cases spanning the first three months of 2022 and 2023. We started by exploring the distribution of cases across various dimensions such as time, county, and animal type. Through exploratory data analysis, we gained insights into the trends and patterns of rabies occurrences. Geospatial analysis provided us with a visual representation of the spatial distribution of cases, highlighting areas of concern and potential factors contributing to variations. In the future, we could potentially move to create a predictive model and use its output to compare with 2024’s actual cases.

Also, this project definitely helped ease my anxiety over rabies.