Explore the Distribution of Rabies Cases: Conduct exploratory data analysis (EDA) to understand the distribution of rabies cases over the first three months of 2022 and 2023.
Identify Temporal Trends Through Time Series Analysis: Utilize time series line plots to identify temporal trends in rabies cases.
Visualize Rabies Over a Map Plot positive rabies cases over a map using location data.
Rabies- The closest thing we’d get to zombies, probably. The viral disease transmitted through the bite of an infected animal has long been a global concern. The importance of studying rabies extends beyond the confines of its direct health implications; there is a delicate interplay of societal, behavioral and environmental factors that contribute to its prevalence.
Thanks to a long string of media depicting rabies as the gateway virus to zombies, I wound up worrying about rabies a lot. Now that I’m wrapping up my time at Hollins in Virginia and probably moving back to Texas while I search for a job, I wanted to tackle a project revolving around rabies and the odds of me dealing with it. The best way to do this is was utilizing the data provided by the Texas Department of State Health Services on Rabies cases. Compiled by year and month, I focused on the three first months of 2022 and 2023. In the future, I hope to expand on the full twelve months and complete a four year analysis, along with including temperature to see the relationship between weather and rabies cases.
This project becomes more than a statistical exploration; it’s a personal endeavor to understand and communicate the real risks and societal implications of rabies, separating fact from fiction.
Join me on this journey as we unravel the many layers of rabies data from the first three months of 2022 and 2023 in Texas. Beyond the numbers and statistical trends, we’ll explore the narratives that shape our perceptions of this viral threat and contribute to informed discussions on public health strategies.
rabies <- read.csv("/Users/linne/Desktop/coding projects/data files/rabies/RabiesData.csv")
str(rabies)
## 'data.frame': 4324 obs. of 6 variables:
## $ Year : int 2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
## $ Month : chr "January" "January" "January" "January" ...
## $ PublicHealthRegion: int 5 5 5 11 11 11 6 7 7 7 ...
## $ County : chr "Angelina" "Angelina" "Angelina" "Aransas" ...
## $ Animal : chr "Dog" "Dog" "Dog" "Dog" ...
## $ TestResult : chr "Negative" "Negative" "Negative" "Negative" ...
summary(rabies)
## Year Month PublicHealthRegion County
## Min. :2022 Length:4324 Min. : 1.000 Length:4324
## 1st Qu.:2022 Class :character 1st Qu.: 3.000 Class :character
## Median :2023 Mode :character Median : 6.000 Mode :character
## Mean :2023 Mean : 5.568
## 3rd Qu.:2023 3rd Qu.: 7.000
## Max. :2023 Max. :11.000
## Animal TestResult
## Length:4324 Length:4324
## Class :character Class :character
## Mode :character Mode :character
##
##
##
2022
By months
summary2022 <- summary(subset(rabies, Year == 2022))
summary2022
## Year Month PublicHealthRegion County
## Min. :2022 Length:2053 Min. : 1.000 Length:2053
## 1st Qu.:2022 Class :character 1st Qu.: 3.000 Class :character
## Median :2022 Mode :character Median : 6.000 Mode :character
## Mean :2022 Mean : 5.598
## 3rd Qu.:2022 3rd Qu.: 7.000
## Max. :2022 Max. :11.000
## Animal TestResult
## Length:2053 Length:2053
## Class :character Class :character
## Mode :character Mode :character
##
##
##
rabies22 <- subset(rabies, Year == 2022)
rabies22pos <- subset(rabies, Year == 2022 & TestResult == "Positive")
# Let's create a bar plot for only the year 2022
ggplot(rabies22pos, aes(x = Month)) +
geom_bar(fill = "red", stat = "count") +
labs(title = "Distribution of Rabies Positive Cases in 2022 by Month",
x = "Month",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
rabies22neg <- subset(rabies, Year == 2022 & TestResult == "Negative")
# Let's create a bar plot for only the year 2022
ggplot(rabies22neg, aes(x = Month)) +
geom_bar(fill = "green", stat = "count") +
labs(title = "Distribution of Rabies Negative Cases in 2022 by Month",
x = "Month",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
Based on these graphs, the month of March had the most negative and positive cases. The February had the least positive cases and January had the least negative cases of 2022. Let’s analyze by county.
County
Let’s start with the negative results.
# Filter the dataset for 2022 and positive test results.
rabies22_positive <- subset(rabies, Year == 2022 & TestResult == "Positive")
# Now we count the number of positive cases for each county
county_rabies_cases <- table(rabies22_positive$County)
# Find the county with the most positive cases
highest_county <- names(which.max(county_rabies_cases))
# Find the county with the least positive cases
lowest_county <- names(which.min(county_rabies_cases))
# Display the result
cat("County with the most positive cases in 2022:", highest_county, "\n")
## County with the most positive cases in 2022: McLennan
cat("County with the least positive cases in 2022:", lowest_county, "\n")
## County with the least positive cases in 2022: Atascosa
Animals
ggplot(rabies22, aes(x = Animal)) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Rabies Cases by Animal Type in 2022",
x = "Animal Type",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
Animals by Months
ggplot(rabies22, aes(x = Animal)) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Rabies Cases by Animal Type by Months of 2022",
x = "Animal Type",
y = "Count") +
facet_wrap(~ Month) + # Facet by month
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
Keep in mind, these are just tested animals. Let’s do positive cases.
# Create the bar plot for the distribution of positive rabies cases by animal type
ggplot(rabies22_positive, aes(x = Animal)) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Positive Rabies Cases by Animal Type by Months of 2022",
x = "Animal Type",
y = "Count") +
facet_wrap(~ Month) + # Facet by month
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
The highest Animal group tested were Skunks from March of 2022. This is a revolving trend throughout our dataset.
Public Health Region
Let’s focus on Public Health Regions instead.
ggplot(rabies22_positive, aes(x = factor(PublicHealthRegion))) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Positive Rabies Cases by Public Health Region",
x = "Public Health Region",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
phr_positive_cases <- table(rabies22_positive$PublicHealthRegion)
# Find the PublicHealthRegion with the most positive cases
most_positive_phr <- names(which.max(phr_positive_cases))
most_positive_phr
## [1] "7"
# Find the PublicHealthRegion with the least positive cases
least_positive_phr <- names(which.min(phr_positive_cases))
least_positive_phr
## [1] "1"
Who tested the most in general?
phr_cases <- table(rabies$PublicHealthRegion)
# Find the PublicHealthRegion with the most cases
most_cases_phr <- names(which.max(phr_cases))
most_cases_phr
## [1] "3"
Who tested the least?
# Find the PublicHealthRegion with the least cases
least_cases_phr <- names(which.min(phr_cases))
least_cases_phr
## [1] "5"
2023
summary2023 <- summary(subset(rabies, Year == 2023))
summary2023
## Year Month PublicHealthRegion County
## Min. :2023 Length:2271 Min. : 1.000 Length:2271
## 1st Qu.:2023 Class :character 1st Qu.: 3.000 Class :character
## Median :2023 Mode :character Median : 6.000 Mode :character
## Mean :2023 Mean : 5.542
## 3rd Qu.:2023 3rd Qu.: 7.000
## Max. :2023 Max. :11.000
## Animal TestResult
## Length:2271 Length:2271
## Class :character Class :character
## Mode :character Mode :character
##
##
##
By months
rabies23pos <- subset(rabies, Year == 2023 & TestResult == "Positive")
# Let's create a bar plot for only the year 2023
ggplot(rabies23pos, aes(x = Month)) +
geom_bar(fill = "red", stat = "count") +
labs(title = "Distribution of Rabies Positive Cases in 2023 by Month",
x = "Month",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
rabies23neg <- subset(rabies, Year == 2023 & TestResult == "Negative")
# Let's create a bar plot for only the year 2023
ggplot(rabies23neg, aes(x = Month)) +
geom_bar(fill = "green", stat = "count") +
labs(title = "Distribution of Rabies Negative Cases in 2023 by Month",
x = "Month",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
Based on these graphs, the month of March had the most negative and positive cases. The February had the least positive and the least negative cases of 2023. Let’s analyze by county once more.
County
Let’s start with the negative results.
county_rabies_cases23 <- table(rabies23pos$County)
# Find the county with the most positive cases
highest_county23 <- names(which.max(county_rabies_cases23))
# Find the county with the least positive cases
lowest_county23 <- names(which.min(county_rabies_cases23))
# Display the result
cat("County with the most positive cases in 2023:", highest_county23, "\n")
## County with the most positive cases in 2023: Hays
cat("County with the least positive cases in 2023:", lowest_county23, "\n")
## County with the least positive cases in 2023: Atascosa
Animals
rabies23 <- subset(rabies, Year == 2023)
ggplot(rabies23, aes(x = Animal)) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Rabies Cases by Animal Type in 2023",
x = "Animal Type",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
Animals by Months
ggplot(rabies23, aes(x = Animal)) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Rabies Cases by Animal Type by Months of 2023",
x = "Animal Type",
y = "Count") +
facet_wrap(~ Month) + # Facet by month
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
Let’s do positive results.
# Create the bar plot for the distribution of positive rabies cases by animal type
ggplot(rabies23pos, aes(x = Animal)) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Positive Rabies Cases by Animal Type by Months of 2023",
x = "Animal Type",
y = "Count") +
facet_wrap(~ Month) + # Facet by month
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
The highest Animal group tested that tested positive were Skunks from
March of 2023.
Public Health Region
Let’s focus on Public Health Regions again.
ggplot(rabies23pos, aes(x = factor(PublicHealthRegion))) +
geom_bar(fill = "skyblue", stat = "count") +
labs(title = "Distribution of Positive Rabies Cases by Public Health Region",
x = "Public Health Region",
y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
phr_positive_cases23 <- table(rabies23pos$PublicHealthRegion)
# Find the PublicHealthRegion with the most positive cases
most_positive_phr23 <- names(which.max(phr_positive_cases23))
most_positive_phr23
## [1] "7"
# Find the PublicHealthRegion with the most positive cases
least_positive_phr23 <- names(which.min(phr_positive_cases23))
least_positive_phr23
## [1] "9"
rabies_2023 <- subset(rabies, Year == 2023)
# Let's count the number of cases for each PublicHealthRegion in 2023
phr_cases_2023 <- table(rabies_2023$PublicHealthRegion)
print(phr_cases_2023)
##
## 1 2 3 4 5 6 7 8 9 10 11
## 118 103 532 82 35 495 479 237 45 56 89
Who tested the least?
# Find the PublicHealthRegion with the least cases
least_cases_phr23 <- names(which.min(phr_cases_2023))
least_cases_phr23
## [1] "5"
Who tested the most?
# Find the PublicHealthRegion with the least cases
most_cases_phr23 <- names(which.max(phr_cases_2023))
most_cases_phr23
## [1] "3"
2022 VS 2023
analysis_positive_negative <- function(data, year) {
analysis <- data %>%
group_by(Year, TestResult) %>%
summarize(Count = n())
analysis$Year <- as.factor(year)
return(analysis)
}
# Make a bar graph for each year
positive_negative_plot_2022 <- ggplot(analysis_positive_negative(rabies22, 2022), aes(x = TestResult, y = Count, fill = TestResult)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Distribution of Positive and Negative Rabies Cases in 2022",
x = "Test Result",
y = "Count") +
scale_fill_manual(values = c("Positive" = "red", "Negative" = "green")) + # Set colors for positive and negative cases
theme_minimal()
positive_negative_plot_2023 <- ggplot(analysis_positive_negative(rabies23, 2023), aes(x = TestResult, y = Count, fill = TestResult)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Distribution of Positive and Negative Rabies Cases in 2023",
x = "Test Result",
y = "Count") +
scale_fill_manual(values = c("Positive" = "red", "Negative" = "green")) + # Set colors for positive and negative cases
theme_minimal()
# Display the plots
positive_negative_plot_2022
positive_negative_plot_2023
count_cases <- function(data) {
cases <- nrow(data)
return(cases)
}
Total number of cases:
cases_2022 <- count_cases(rabies22)
cases_2023 <- count_cases(rabies23)
# Print results
print("Year 2022:")
## [1] "Year 2022:"
print(cases_2022)
## [1] 2053
print("Year 2023:")
## [1] "Year 2023:"
print(cases_2023)
## [1] 2271
locations <- read.csv("/Users/linne/Desktop/coding projects/data files/rabies/CountyLocations.csv")
Let’s merge our data utilizing the common key: county.
merged_data <- inner_join(rabies, locations, by = "County")
Checking for any missing values.
anyNA
## function (x, recursive = FALSE) .Primitive("anyNA")
colSums(is.na(merged_data))
## Year Month PublicHealthRegion County
## 0 0 0 0
## Animal TestResult Latitude Longitude
## 0 0 0 0
Seems like we’re good to go!
Let’s do a map of just Texas with highlighted areas of Rabies cases.
positive_cases <- subset(merged_data, TestResult == "Positive")
#Create a Map
rabiesmap <- leaflet(data = positive_cases) %>%
addTiles() %>%
addCircleMarkers(
radius = 7,
color = "red",
fillOpacity = 0.5,
popup = ~paste("County: ", County, "<br>",
"Rabies Cases: ", rabies)
)
# Display the map
rabiesmap
Let’s move onto a Time Series Analysis! Our goal here is to identify any patterns.
rabies_positive <- filter(rabies, TestResult == "Positive")
# We need to convert Month to a factor with ordered levels to ensure correct ordering in the time series plot, since our excel has data all over
rabies_positive$Month <- factor(rabies_positive$Month, levels = c("January", "February", "March"), ordered = TRUE)
# Let's create a time series plot for positive rabies cases
time_series_plot <- ggplot(rabies_positive, aes(x = Month, group = Year, color = as.factor(Year))) +
geom_line(stat = "count") +
labs(title = "Temporal Trends in Positive Rabies Cases",
x = "Month",
y = "Count",
color = "Year") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
# Show plot
time_series_plot
Seemingly, both years dipped in positive rabies cases in February and increased by March. 2023 had a larger yield of positive cases.
In this project, we dived into the analysis of rabies cases spanning the first three months of 2022 and 2023. We started by exploring the distribution of cases across various dimensions such as time, county, and animal type. Through exploratory data analysis, we gained insights into the trends and patterns of rabies occurrences. Geospatial analysis provided us with a visual representation of the spatial distribution of cases, highlighting areas of concern and potential factors contributing to variations. In the future, we could potentially move to create a predictive model and use its output to compare with 2024’s actual cases.
Also, this project definitely helped ease my anxiety over rabies.