Dengue fever is a mosquito-borne illness typically found in tropical and subtropical regions. The dengue fever data set contains humidity, temperature, and tree cover data for 2000 administrative regions, along with an indicator of whether each region had dengue fever cases between 1961 and 1990. In this project, I explore the geographic prevalence of dengue fever.

Note: the interactive maps did not render in the published HTML. To view them, run the code in RStudio with both mapview() calls uncommented.

#Imports
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(mapview)
#Data
data <- read.table("C:/Users/Carlisle Ferguson/Downloads/dengue.csv", header=TRUE, sep=',')
summary(data)
##        X              humid            humid90            temp       
##  Min.   :   1.0   Min.   : 0.6714   Min.   : 1.066   Min.   :-18.68  
##  1st Qu.: 500.8   1st Qu.:10.0088   1st Qu.:10.307   1st Qu.: 11.10  
##  Median :1000.5   Median :16.1433   Median :16.870   Median : 20.99  
##  Mean   :1000.5   Mean   :16.7013   Mean   :17.244   Mean   : 18.41  
##  3rd Qu.:1500.2   3rd Qu.:23.6184   3rd Qu.:24.131   3rd Qu.: 25.47  
##  Max.   :2000.0   Max.   :30.2665   Max.   :30.539   Max.   : 29.45  
##                   NA's   :2         NA's   :2        NA's   :2       
##      temp90           h10pix          h10pix90          trees     
##  Min.   :-10.07   Min.   : 4.317   Min.   : 5.848   Min.   : 0.0  
##  1st Qu.: 12.76   1st Qu.:14.584   1st Qu.:14.918   1st Qu.: 1.0  
##  Median : 22.03   Median :23.115   Median :24.130   Median :15.0  
##  Mean   : 19.41   Mean   :21.199   Mean   :21.557   Mean   :22.7  
##  3rd Qu.: 25.98   3rd Qu.:28.509   3rd Qu.:28.627   3rd Qu.:37.0  
##  Max.   : 29.66   Max.   :31.134   Max.   :31.134   Max.   :85.0  
##  NA's   :2                                          NA's   :12    
##     trees90          NoYes             Xmin              Xmax        
##  Min.   : 0.00   Min.   :0.0000   Min.   :-179.50   Min.   :-172.00  
##  1st Qu.: 6.00   1st Qu.:0.0000   1st Qu.: -12.00   1st Qu.: -10.00  
##  Median :30.60   Median :0.0000   Median :  16.00   Median :  17.75  
##  Mean   :35.21   Mean   :0.4155   Mean   :  13.31   Mean   :  15.63  
##  3rd Qu.:63.62   3rd Qu.:1.0000   3rd Qu.:  42.62   3rd Qu.:  44.50  
##  Max.   :97.10   Max.   :1.0000   Max.   : 178.00   Max.   : 180.00  
##  NA's   :12                                                          
##       Ymin             Ymax       
##  Min.   :-54.50   Min.   :-55.50  
##  1st Qu.:  6.00   1st Qu.:  5.00  
##  Median : 18.00   Median : 17.00  
##  Mean   : 19.78   Mean   : 18.16  
##  3rd Qu.: 39.00   3rd Qu.: 37.00  
##  Max.   : 82.50   Max.   : 68.50  
## 
#Plots
ggplot(data, aes(x=temp, y=humid, color=factor(NoYes))) + geom_point() + labs(color="NoYes") + ggtitle("Temperature vs Humidity")
## Warning: Removed 2 rows containing missing values (geom_point).

pltdata <- subset(data, select=c(temp,humid, NoYes))
pltdata$NoYes <- as.factor(pltdata$NoYes)
ggplot(pltdata,aes(x=NoYes, y=humid, fill=NoYes)) + geom_boxplot() + ggtitle("Humidity Box Plot")
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).

ggplot(pltdata,aes(x=NoYes, y=temp, fill=NoYes)) + geom_boxplot() + ggtitle("Temperature Box Plot")
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).

histdata <- subset(data, select=c(Ymax, Ymin, NoYes))
histdata$NoYes <- as.character(histdata$NoYes)
ggplot(histdata, aes(x=Ymax, fill=NoYes, color=NoYes)) + geom_histogram() + ggtitle("Maximum Latitude Histogram")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(histdata, aes(x=Ymin, fill=NoYes, color=NoYes)) + geom_histogram() + ggtitle("Minimum Latitude Histogram")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The scatter plot shows that the majority of the “Yes” cases occurred in hot, humid regions. The box plots support this: the median temperature and humidity are both higher in “Yes” regions. The histograms show that the “Yes” regions cluster at latitudes near the equator. However, where exactly are these “Yes” regions? This is difficult to visualize without a map. While the data set has longitude and latitude data, it lacks the names of cities and regions, and it provides only the minimum and maximum longitude and latitude of each region. To visualize the data set, I created new columns holding the midpoint longitude and latitude for each row. Then, I used the mapview library to create a map of all the administrative regions in the data set.
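The box plot comparison described above can also be checked numerically by computing the group medians directly. The following is a minimal sketch in base R. It assumes a small made-up data frame (`toy`) in place of the real `data`, so the printed numbers are illustrative only and are not results from the dengue data set.

```r
# Toy stand-in for `data`; all values are made up for illustration.
toy <- data.frame(
  NoYes = c(0, 0, 0, 1, 1, 1),
  temp  = c(5.0, 12.0, NA, 24.0, 26.0, 27.5),
  humid = c(4.0, 9.0, 11.0, 22.0, NA, 27.0)
)

# Median per group, skipping missing values (the real data has a few NAs).
temp_medians  <- tapply(toy$temp,  toy$NoYes, median, na.rm = TRUE)
humid_medians <- tapply(toy$humid, toy$NoYes, median, na.rm = TRUE)

print(temp_medians)
print(humid_medians)
```

Replacing `toy` with `data` should give the medians that the box plots draw as their center lines.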

#Mapping
df <- data.frame(data)
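# Keep only the bounding-box columns and the dengue indicator
long_lat <- subset(data, select = c(Xmin, Xmax, Ymin, Ymax, NoYes))
# Midpoint of each region's bounding box, used as its location on the map
long_lat$mean_long <- rowMeans(long_lat[,c('Xmin','Xmax')], na.rm=TRUE)
long_lat$mean_lat <- rowMeans(long_lat[,c('Ymin','Ymax')], na.rm=TRUE)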

locations <- st_as_sf(long_lat, coords = c("mean_long", "mean_lat"), crs = 4326)
#mapview(locations, zcol="NoYes")

To examine the prevalence of dengue fever more closely, I filtered the data set and remapped only the regions that had dengue fever cases.

#Mapping - Dengue Only
dengue_yes <- subset(long_lat, NoYes == 1, select = c(Xmin, Xmax, Ymin, Ymax, mean_long, mean_lat, NoYes))

locs_yes <- st_as_sf(dengue_yes, coords = c("mean_long", "mean_lat"), crs = 4326)
#mapview(locs_yes)

Summary statistics for the “Yes” regions:

#Final Stats
yes <-subset(data, NoYes == 1, select = c(temp, humid, Ymin, Ymax))
summary(yes)
##       temp             humid              Ymin              Ymax        
##  Min.   : 0.5083   Min.   : 0.8962   Min.   :-27.000   Min.   :-29.000  
##  1st Qu.:23.0083   1st Qu.:20.2602   1st Qu.:  2.000   1st Qu.: -0.250  
##  Median :25.4083   Median :24.0117   Median : 10.000   Median :  8.500  
##  Mean   :24.3447   Mean   :23.1505   Mean   :  7.539   Mean   :  6.178  
##  3rd Qu.:26.5917   3rd Qu.:26.7706   3rd Qu.: 15.000   3rd Qu.: 14.500  
##  Max.   :29.3583   Max.   :29.7924   Max.   : 36.500   Max.   : 34.500  
##  NA's   :2         NA's   :2

The plots and maps show that dengue fever is most commonly found in hot, humid environments between roughly -29 and 36.5 degrees latitude (the extremes of Ymax and Ymin among the “Yes” regions). The dengue fever belt extends around the globe, spanning North America, South America, Africa, Asia, and Australia/Oceania.
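The latitude span quoted above can be derived in one step rather than read off the summary table. The sketch below is a hedged illustration in base R: `yes_toy` is a made-up stand-in for the `yes` subset, and pooling both latitude columns is my own choice, made because the summary output suggests that Ymin is not always the smaller of the two values.

```r
# Toy stand-in for the "Yes" subset; values are made up for illustration.
yes_toy <- data.frame(
  Ymin = c(-12.0, 3.5, 21.0),
  Ymax = c(-14.0, 1.5, 19.0)
)

# Pool both latitude columns so the result does not depend on
# which column holds the northern edge of each region.
lat_range <- range(c(yes_toy$Ymin, yes_toy$Ymax), na.rm = TRUE)
print(lat_range)
```

Running the same two lines on `yes` instead of `yes_toy` would give the southernmost and northernmost latitudes touched by any “Yes” region.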

#Bonus

library(readr)
urlfile = "https://raw.githubusercontent.com/carlisleferguson/RBridgeFinalProject/main/dengue.csv"
github_data <- read_csv(url(urlfile))
## Warning: Missing column names filled in: 'X1' [1]
## 
## -- Column specification --------------------------------------------------------
## cols(
##   X1 = col_double(),
##   humid = col_double(),
##   humid90 = col_double(),
##   temp = col_double(),
##   temp90 = col_double(),
##   h10pix = col_double(),
##   h10pix90 = col_double(),
##   trees = col_double(),
##   trees90 = col_double(),
##   NoYes = col_double(),
##   Xmin = col_double(),
##   Xmax = col_double(),
##   Ymin = col_double(),
##   Ymax = col_double()
## )
head(github_data)
## # A tibble: 6 x 14
##      X1 humid humid90  temp temp90 h10pix h10pix90 trees trees90 NoYes  Xmin
##   <dbl> <dbl>   <dbl> <dbl>  <dbl>  <dbl>    <dbl> <dbl>   <dbl> <dbl> <dbl>
## 1     1 0.671    4.42  2.04   8.47   17.4     17.8     0   1.5       0  70.5
## 2     2 7.65     8.17 12.3   14.9    11.0     11.7     0   1         0  62.5
## 3     3 6.98     9.56  6.93  14.6    17.5     17.6     0   1.2       0  68.5
## 4     4 1.11     1.83  4.64   6.05   17.4     17.5     0   0.6       0  67  
## 5     5 9.03     9.74 18.2   19.7    13.8     13.8     0   0         0  61  
## 6     6 8.91     9.52 11.9   16.6    11.7     11.7     0   0.200     0  64.5
## # ... with 3 more variables: Xmax <dbl>, Ymin <dbl>, Ymax <dbl>