Dengue fever, also known as breakbone fever, is a tropical disease spread by the bite of infected mosquitoes, primarily Aedes aegypti. The disease is caused by the dengue virus, which belongs to the genus Flavivirus and consists of four distinct serotypes. A person can be infected with several different serotypes over their lifetime, because immunity acquired from infection with one serotype does not confer immunity to the others. The disease is endemic in the Caribbean, Central America, and South America, with cases periodically occurring in Florida.
Florida routinely experiences two types of dengue cases. The first are travel-related cases, where a person becomes infected in another country and exhibits symptoms after returning home. The second are locally acquired cases, where a person with no known history of international travel becomes ill after being bitten by an infected mosquito in Florida. While local transmission of dengue is rare, the spread of dengue in Florida is a major public health concern because the vector, Ae. aegypti, is widespread in major population centers with frequent international travel to and from dengue-endemic countries.
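This report wrangles and visualizes data acquired from the Florida Department of Health to illustrate which counties report the most dengue cases, how travel-related and locally acquired cases compare, and how case counts vary by month and by year.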
This section loads the libraries needed for data preparation and analysis. The glimpse() function, which comes from dplyr and is loaded with the tidyverse, is used to quickly assess the structure of the dengue dataset and decide which tidying steps and data transformations are required.
# load library
library(tidyverse)
library(here) # file pathways
library(skimr) # data exploring
library(janitor) # data cleaning
library(kableExtra) # table formatting
library(plotly) # interactive plots
library(hrbrthemes) # plot themes
# read in dengue data
dengue <- read_csv(here('data', 'dengueCasesFL.csv'))
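The dengue dataset is cleaned by converting the column names to lower case and replacing the spaces within them. The columns are then reordered so that year, month, and county come first, and the rows are sorted in ascending order by year and then by month. Because month is still a character column when the rows are sorted, the months within each year appear in alphabetical rather than calendar order, as the output below shows. Year and month are converted to factors afterwards so that the plots display months in calendar order.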
# clean dengue dataset
cleandengue <- dengue %>%
  # standardize column names (lower case, no spaces)
  clean_names() %>%
  # rename serotype and case fields
  rename(serotype = serotype_detected,
         cases = number_of_cases,
         type = case_type) %>%
  # move year, month, and county to the front; keep the remaining columns
  select(year,
         month,
         county,
         everything()) %>%
  # sort by year, then month (month is still character here, so alphabetical)
  arrange(year, month) %>%
  # convert year and month to factors for plots
  mutate(year = factor(year),
         month = factor(month, levels = month.name))
# glimpse
glimpse(cleandengue)
## Rows: 1,750
## Columns: 6
## $ year <fct> 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2…
## $ month <fct> April, April, April, August, August, August, August, December…
## $ county <chr> "Orange", "Sarasota", "Hillsborough", "Orange", "Miami-Dade",…
## $ serotype <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Unknown", "DENV-…
## $ cases <dbl> 1, 1, 1, 1, 1, 1, 6, 2, 1, 1, 1, 3, 1, 1, 2, 1, 2, 1, 1, 1, 1…
## $ type <chr> "travel", "travel", "travel", "travel", "travel", "local", "l…
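The alphabetical month order visible in the glimpse output follows from sorting before the factor conversion. The sketch below uses a small, made-up tibble (`toy` and its values are hypothetical, not drawn from the Florida data) to show how converting month to a factor first would give calendar order instead.

```r
library(dplyr)

# Hypothetical toy records, not taken from the Florida dataset
toy <- tibble(
  year  = c(2010, 2010, 2010, 2010),
  month = c("August", "January", "April", "December"),
  cases = c(3, 1, 2, 1)
)

# Sorting on the character column gives alphabetical month order
alphabetical <- toy %>%
  arrange(year, month)

# Converting to a factor with month.name levels first gives calendar order
calendar <- toy %>%
  mutate(month = factor(month, levels = month.name)) %>%
  arrange(year, month)

alphabetical$month
as.character(calendar$month)
```

Either ordering works for the plots in this report, since ggplot2 uses the factor levels rather than the row order.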
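The dataset contains the following columns: year, month, county, serotype (the serotype detected, or "Unknown"), cases (the number of cases), and type (travel or local).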
First, the 10 counties with the greatest number of dengue cases are identified. Based on the summarized data in the table below, Miami-Dade County reports the greatest number of dengue cases within Florida, with 2,347 total cases reported from 2010 to 2024.
# summarize cases by county
county <- cleandengue %>%
  # group by county
  group_by(county) %>%
  # sum cases
  summarize(Cases = sum(cases)) %>%
  # rename county for table
  rename(County = county) %>%
  # sort from most to fewest cases and keep the top 10
  arrange(desc(Cases)) %>%
  head(10) %>%
  # format as an HTML table
  kable(format = "html", booktabs = TRUE) %>%
  kable_styling(font_size = 18)
# display county data table
county
County | Cases |
---|---|
Miami-Dade | 2347 |
Broward | 384 |
Hillsborough | 245 |
Palm Beach | 225 |
Monroe | 186 |
Orange | 175 |
Lee | 102 |
Osceola | 65 |
Collier | 50 |
Duval | 48 |
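To put the table in proportion, the short sketch below re-enters the ten counts shown above and computes each county's share of the top-ten total. The statewide total is not shown in this report, so these are shares among the ten listed counties only; the object names `top10` and `share_pct` are illustrative.

```r
library(dplyr)

# Counts copied from the table above (top ten counties only)
top10 <- tibble(
  County = c("Miami-Dade", "Broward", "Hillsborough", "Palm Beach", "Monroe",
             "Orange", "Lee", "Osceola", "Collier", "Duval"),
  Cases  = c(2347, 384, 245, 225, 186, 175, 102, 65, 50, 48)
)

# Share of the top-ten total, as a percentage rounded to one decimal place
top10 <- top10 %>%
  mutate(share_pct = round(100 * Cases / sum(Cases), 1))

top10
```

Under this calculation Miami-Dade accounts for roughly 61% of the cases reported by these ten counties.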
Although a case summary is useful, the frequency at which cases are reported each month is another indicator of how common dengue is in a county. Because the dataset tabulates the number of cases by month for each county, a jittered point plot shows not only which counties experience the largest case counts but also how often cases have been reported. Months with no confirmed cases are absent from the dataset, so each point represents a month in which a county reported dengue. A greater density of points indicates that cases were reported more frequently, whereas few or no points indicate that cases were reported less frequently. In addition, the plot is faceted by case type to show whether travel-related and locally acquired cases follow similar patterns.
One important factor to consider when interpreting the plot is the disparity in testing and reporting among local public health agencies, which likely causes the data to under-represent the actual extent of dengue incidence in Florida.
cleandengue %>%
  # order counties by their largest single reported case count
  mutate(county = fct_reorder(county, cases, .fun = max, .desc = FALSE)) %>%
  # plot cases by county and color by year
  ggplot(aes(x = county, y = cases, color = year)) +
  # jittered points
  geom_jitter() +
  # flip coords on plot
  coord_flip() +
  # set theme
  theme_minimal() +
  # facet wrap by type
  facet_wrap(~type) +
  # set main title
  labs(title = 'Monthly Dengue Cases') +
  # center title
  theme(plot.title = element_text(hjust = 0.5)) +
  # set y label
  ylab('Cases') +
  # set x label
  xlab('County')
Figure 2. Travel versus local dengue cases in Florida from 2010 to 2024. Miami-Dade County has reported by far the greatest number of both types of dengue cases.
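The density of points in the jittered plot can also be expressed as a number. The sketch below counts how many distinct months each county reported cases of each type. It runs on a small, made-up tibble (`toy`, with hypothetical county names and counts) that mimics the columns of the cleaned dataset; it assumes that a county can have more than one row per month, for example one per serotype, which is why `distinct()` is applied before counting.

```r
library(dplyr)

# Hypothetical toy records with the same columns as the cleaned dataset
toy <- tibble(
  year     = c(2010, 2010, 2010, 2010, 2011, 2011),
  month    = c("July", "July", "July", "August", "July", "September"),
  county   = c("County A", "County A", "County A", "County A",
               "County B", "County B"),
  serotype = c("DENV-1", "Unknown", "Unknown", "Unknown", "Unknown", "Unknown"),
  type     = c("travel", "travel", "local", "travel", "travel", "travel"),
  cases    = c(2, 1, 1, 4, 1, 1)
)

# Count the distinct year-month combinations with at least one report
months_reported <- toy %>%
  distinct(county, type, year, month) %>%
  count(county, type, name = "months_with_cases") %>%
  arrange(desc(months_with_cases))

months_reported
```

Applied to the real data, the same pipeline would give a per-county tally that complements the visual impression from the plot.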
Next, the months with the greatest number of reported dengue cases are identified. Florida is a popular destination for both domestic and international tourists. Given that Aedes aegypti is most active in the summer months, understanding seasonal trends in travel-related cases is vital to preventing local transmission and spread of dengue. The second visualization is a stacked bar plot that displays the total number of travel-related versus locally acquired dengue cases reported in each calendar month from 2010 to 2024.
cleandengue %>%
  # group by month and type
  group_by(month, type) %>%
  # summarize cases and drop the grouping afterwards
  summarise(sumCases = sum(cases), .groups = "drop") %>%
  # create stacked bar plot for monthly cases by type
  ggplot(aes(x = month,
             y = sumCases,
             fill = type)) +
  # bar plot
  geom_bar(stat = "identity") +
  # set title
  labs(title = 'Monthly Dengue Cases') +
  theme_minimal() +
  # set x label
  xlab('Month') +
  # set y label
  ylab('Total Cases') +
  # center plot title and adjust x-axis text
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(size = 11,     # text size
                                   angle = 30,    # angle
                                   vjust = 0.8,   # vertical justification
                                   hjust = 0.8)) + # horizontal justification
  # set bar fill colors based on case type
  scale_fill_manual(values = c("local" = "#EF6F6C", "travel" = "#DDAE7E"), name = "Case Type")
Figure 3. Total monthly dengue cases by type. Travel-related cases are the largest source of dengue cases in Florida, while a small subset results from local transmission.
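The split between travel-related and local cases shown in the stacked bars can be summarized as a proportion. The following is a minimal sketch on made-up numbers (the `toy` tibble is hypothetical and does not reproduce the Florida counts); it assumes the same `type` and `cases` columns as the cleaned dataset.

```r
library(dplyr)

# Hypothetical toy records, not taken from the Florida dataset
toy <- tibble(
  month = c("July", "July", "August", "September"),
  type  = c("travel", "local", "travel", "travel"),
  cases = c(4, 1, 3, 2)
)

# Total cases per type and each type's share of all cases
type_share <- toy %>%
  group_by(type) %>%
  summarise(cases = sum(cases), .groups = "drop") %>%
  mutate(share = cases / sum(cases))

type_share
```

Adding `month` to the `group_by()` call would give the same breakdown for each calendar month.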
The monthly summary shows that the majority of cases have occurred in July, August, September, and October. However, the yearly case count is another relevant factor for understanding how common the disease has been in recent years. The next block summarizes cases by year and type and displays the number of cases each year in an interactive plot.
# plot yearly time series
ts_dengue <- cleandengue %>%
  # group by year and type
  group_by(year, type) %>%
  # sum cases and drop the grouping afterwards
  summarise(cases = sum(cases), .groups = "drop") %>%
  # plot; set fill based on type
  ggplot(aes(x = year, y = cases, fill = type)) +
  # bar plot
  geom_bar(stat = "identity") +
  # set main title
  labs(title = 'Yearly Dengue Cases') +
  # set y axis label
  ylab("Number of Cases") +
  # set x axis label
  xlab("Year") +
  # set theme
  theme_ipsum() +
  # set bar colors based on type
  scale_fill_manual(values = c("local" = "#69b3a2", "travel" = "#DDAE7E"), name = "Case Type")
# convert plot to interactive plot with ggplotly()
ts_dengue <- ggplotly(ts_dengue)
# display plot
ts_dengue
Figure 4. Total reported dengue cases each year. The number of cases has increased greatly since 2022, following a record low in 2021 that was likely due to increased travel restrictions.
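A year-over-year comparison makes a trend like the one in Figure 4 easier to quantify. The sketch below computes the absolute and percentage change from the previous year using `lag()`. The yearly totals are invented for illustration and are not the Florida figures. Since year is stored as a factor in the cleaned dataset, it would need converting with `as.integer(as.character(year))` before sorting in a real run.

```r
library(dplyr)

# Hypothetical yearly totals, not taken from the Florida dataset
yearly <- tibble(
  year  = 2020:2023,
  cases = c(10, 5, 20, 40)
)

# Change relative to the previous year (NA for the first year)
yearly_change <- yearly %>%
  arrange(year) %>%
  mutate(change     = cases - lag(cases),
         pct_change = 100 * change / lag(cases))

yearly_change
```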