An Empirical Review of the Data from 1961 to 1990

A Brief Introduction

Our planet’s climate has changed constantly over geological time. Scientists, experts, and politicians from different parts of the world have emphasized the urgent need for greater awareness of climate change. In recent years, epidemiologists have also pointed out the need for a more focused approach towards diseases that are strongly associated with variation in temperature. In this assignment, we look into one such issue: dengue fever and its association with climate change. The data used in this study were originally collected by the Department of Public Health, Wellington School of Medicine and Health Sciences, New Zealand. We use the comprehensive study by Hales et al., "Potential effect of population and climate changes on global distribution of dengue fever: an empirical model", as a reference point. This study was completed in 2002 and is available for review and reference on PubMed (https://www.ncbi.nlm.nih.gov/pubmed/12243917).

Vector-Borne Disease

According to the World Health Organization, vectors are living organisms that can transmit infectious diseases between humans or from animals to humans. Many of these vectors are bloodsucking insects, which ingest disease-producing microorganisms during a blood meal from an infected host (human or animal) and later inject them into a new host during a subsequent blood meal (Source: http://www.who.int). The best-known example is the mosquito. Others include ticks, flies, sandflies, fleas, triatomine bugs and some freshwater aquatic snails. A “vector-borne disease” is therefore a human illness caused by parasites, viruses or bacteria that are transmitted by mosquitoes and other vectors. Our dataset contains reported cases from various populations around the world. These reported cases will be tested in a regression model against temperature and humidity levels.

Impact of Vector-Borne Disease.

First and most importantly, the impact of vector-borne disease is significant. According to the WHO, more than 700,000 deaths occur every year from diseases such as malaria, dengue, and human African trypanosomiasis. The major vector-borne diseases account for 17% of all infectious diseases. The burden of these diseases is highest in tropical and subtropical areas and falls most heavily on the poorest populations.

Humidity.

In the simplest terms, humidity is the amount of water vapour present in the air. Water vapour is a gas, which makes it invisible to the human eye. In this dataset, the variable humid holds the average vapour density of the air, stored as a decimal (not integer) value. In chemistry, the vapour density of a gas is its density relative to the density of hydrogen at the same temperature and pressure. It is given by:

[Vapour density of the gas] = Density of the gas / Density of hydrogen

Because it is a ratio of two densities, vapour density has no unit. An alternative convention uses air (a mixture of gases, not just O2) as the reference, so that air itself has a vapour density of one. Under that convention air is assigned a molecular weight of 28.97 atomic mass units, and the molecular weights of all other gases and vapours are divided by this number to derive their vapour density (wikipedia.org).

Goal of this Assignment

The goal of this assignment is to find a strong predictive model for the occurrence of dengue using the current data. The dataset is very limited in terms of variables, but it can help us understand the nature of the relationship and the potential tipping point at which a vector-borne disease, dengue in our case, begins to be reported.

A Glimpse of the Variables in the Dataset.

This dataset is designed to examine the effects of temperature and humidity on a vector-borne disease, dengue fever. In this assignment we use the variable temp, which is the average temperature in degrees Celsius; the variable humid, which measures the vapour density of the air; and the variable NoYes, a binary variable where 1 = dengue case reported and 0 = no dengue case reported. Lastly, Xmax and Ymax are used to map the dengue cases spatially, according to the temperature zones.

glimpse(dengue)
## Observations: 2,000
## Variables: 14
## $ X        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ humid    <dbl> 0.6713889, 7.6483340, 6.9790556, 1.1104163, 9.0270555...
## $ humid90  <dbl> 4.416667, 8.167500, 9.563058, 1.825361, 9.742751, 9.5...
## $ temp     <dbl> 2.037500, 12.325000, 6.925000, 4.641665, 18.175000, 1...
## $ temp90   <dbl> 8.470835, 14.925000, 14.591660, 6.046669, 19.710000, ...
## $ h10pix   <dbl> 17.35653, 10.98361, 17.50833, 17.41763, 13.84306, 11....
## $ h10pix90 <dbl> 17.80861, 11.69167, 17.62528, 17.51694, 13.84306, 11....
## $ trees    <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
## $ trees90  <dbl> 1.5, 1.0, 1.2, 0.6, 0.0, 0.2, 0.0, 0.0, 0.0, 0.6, 0.0...
## $ NoYes    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Xmin     <dbl> 70.5, 62.5, 68.5, 67.0, 61.0, 64.5, 67.5, 64.0, 63.5,...
## $ Xmax     <dbl> 74.5, 64.5, 69.5, 68.0, 64.5, 65.5, 68.5, 66.5, 65.5,...
## $ Ymin     <dbl> 38.0, 35.5, 36.0, 35.0, 33.5, 36.5, 33.5, 35.0, 33.0,...
## $ Ymax     <dbl> 35.5, 34.5, 35.0, 34.0, 32.0, 35.0, 32.0, 33.0, 29.5,...

Summary of Variable(s)

summary(dengue)
##        X              humid            humid90            temp       
##  Min.   :   1.0   Min.   : 0.6714   Min.   : 1.066   Min.   :-18.68  
##  1st Qu.: 500.8   1st Qu.:10.0088   1st Qu.:10.307   1st Qu.: 11.10  
##  Median :1000.5   Median :16.1433   Median :16.870   Median : 20.99  
##  Mean   :1000.5   Mean   :16.7013   Mean   :17.244   Mean   : 18.41  
##  3rd Qu.:1500.2   3rd Qu.:23.6184   3rd Qu.:24.131   3rd Qu.: 25.47  
##  Max.   :2000.0   Max.   :30.2665   Max.   :30.539   Max.   : 29.45  
##                   NA's   :2         NA's   :2        NA's   :2       
##      temp90           h10pix          h10pix90          trees     
##  Min.   :-10.07   Min.   : 4.317   Min.   : 5.848   Min.   : 0.0  
##  1st Qu.: 12.76   1st Qu.:14.584   1st Qu.:14.918   1st Qu.: 1.0  
##  Median : 22.03   Median :23.115   Median :24.130   Median :15.0  
##  Mean   : 19.41   Mean   :21.199   Mean   :21.557   Mean   :22.7  
##  3rd Qu.: 25.98   3rd Qu.:28.509   3rd Qu.:28.627   3rd Qu.:37.0  
##  Max.   : 29.66   Max.   :31.134   Max.   :31.134   Max.   :85.0  
##  NA's   :2                                          NA's   :12    
##     trees90          NoYes             Xmin              Xmax        
##  Min.   : 0.00   Min.   :0.0000   Min.   :-179.50   Min.   :-172.00  
##  1st Qu.: 6.00   1st Qu.:0.0000   1st Qu.: -12.00   1st Qu.: -10.00  
##  Median :30.60   Median :0.0000   Median :  16.00   Median :  17.75  
##  Mean   :35.21   Mean   :0.4155   Mean   :  13.31   Mean   :  15.63  
##  3rd Qu.:63.62   3rd Qu.:1.0000   3rd Qu.:  42.62   3rd Qu.:  44.50  
##  Max.   :97.10   Max.   :1.0000   Max.   : 178.00   Max.   : 180.00  
##  NA's   :12                                                          
##       Ymin             Ymax       
##  Min.   :-54.50   Min.   :-55.50  
##  1st Qu.:  6.00   1st Qu.:  5.00  
##  Median : 18.00   Median : 17.00  
##  Mean   : 19.78   Mean   : 18.16  
##  3rd Qu.: 39.00   3rd Qu.: 37.00  
##  Max.   : 82.50   Max.   : 68.50  
## 

Understanding the Relationship between “Temperature” and “Humidity”

The relationship between temperature and humidity is approximately linear. However, from the figure below we can see that the humidity, or vapour density, increases substantially once the temperature rises above 15 degrees Celsius.

Temperature vs Humidity and Reported Cases of Dengue Fever.

In the figure below, we can see the relationship between temperature and humidity more precisely, with jittered, coloured points representing dengue cases. The figure makes it much clearer that reported cases are concentrated in the part of the graph where both humidity and temperature are high.

## Warning: Removed 34 rows containing non-finite values (stat_lm).
## Warning: Removed 34 rows containing missing values (geom_point).

2x2 Table: Average Temperature and Cases of Dengue Fever.

In simple terms, the table shows row proportions that can be read as probabilities: when the temperature is high, the probability of dengue being reported is also high.

## 
## -------------------------------------------
##    temp_level       dengue_yes   dengue_no 
## ------------------ ------------ -----------
##  High Temperature     0.5973      0.4027   
## 
##  Low Temperature       0.04        0.96    
## -------------------------------------------
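
A row-proportion table of this kind can be produced in base R with table() and prop.table(). The sketch below is only illustrative: the data frame toy, its values, and the 20-degree cut-off are assumptions, since the report does not state how the high and low temperature groups were defined.

```r
# Toy data standing in for the dengue data frame (values are made up)
toy <- data.frame(
  temp  = c(5, 8, 12, 22, 25, 27, 28, 29),
  NoYes = c(0, 0, 0, 0, 1, 1, 0, 1)
)

# Hypothetical cut-off: the report does not say how "high" and "low" were defined
cutoff <- 20
toy$temp_level <- ifelse(toy$temp > cutoff, "High", "Low")

# margin = 1 divides each row by its row total, so every row sums to 1
tab <- prop.table(table(toy$temp_level, toy$NoYes), margin = 1)
print(round(tab, 3))
```

Each row then gives the share of locations in that temperature group with (column 1) and without (column 0) a reported case, which is how the table above should be read.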

2x2 Table: Average Humidity (Vapour Density) and Cases of Dengue Fever.

The probabilistic representation gives us a better picture of the results. For areas with a high humidity level, the probability of dengue fever being reported is much higher than for low-humidity areas.

## 
## ----------------------------------------
##   humid_level    dengue_Yes   dengue_No 
## --------------- ------------ -----------
##  High Humidity      0.75        0.25    
## 
##  Low Humidity     0.04651      0.9535   
## ----------------------------------------

Model-1: A Linear Model of the Association between Temperature and Humidity.

This model shows the strength and statistical significance of the relationship between temperature and humidity. A one-degree increase in temperature is associated with an increase of 0.783 in the vapour density of the air. This is consistent with the physical expectation that humidity depends heavily on temperature, although a regression of this kind cannot by itself prove it. The obtained R2 is 0.73, which means that 73% of the variance in humidity is explained by our model.

## 
## Call:
## lm(formula = humid ~ temp, data = dengue)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.7469  -1.8696   0.4592   2.5524  12.0928 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.26477    0.21288   10.64   <2e-16 ***
## temp         0.78294    0.01059   73.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.782 on 1984 degrees of freedom
## Multiple R-squared:  0.7337, Adjusted R-squared:  0.7336 
## F-statistic:  5468 on 1 and 1984 DF,  p-value: < 2.2e-16
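
As a reading aid, the reported coefficients can be plugged into the fitted line by hand. The sketch below uses only the estimates printed above; the helper name predict_humid and the two example temperatures are our own choices for illustration.

```r
# Coefficients copied from the lm() summary above
b0 <- 2.26477   # intercept
b1 <- 0.78294   # slope for temp

# Hypothetical helper: fitted vapour density at a given average temperature
predict_humid <- function(temp) b0 + b1 * temp

pred <- predict_humid(c(15, 25))
print(pred)

# With a single predictor, R-squared is the squared correlation, so the
# implied correlation between temp and humid is sqrt(R^2); it is positive
# because the slope is positive
r <- sqrt(0.7337)
print(round(r, 3))
```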

Figure-1: Temperature and Humidity Levels.

Model-2: Logistic Regression between Temperature and Dengue Cases.

We know from the existing literature and from Model-1 that temperature and humidity are strongly correlated. However, we also need to test whether temperature is a strong indicator of dengue. From the table below, we can see that the coefficient for temperature, on the log-odds scale, is 0.291 (the value -6.33 is the intercept). Log odds can be tricky to interpret, so we use the odds-ratio option in our code to exponentiate the coefficients. The odds ratio tells us that each one-degree increase in temperature multiplies the odds of a dengue case being reported by a factor of 1.338. The residual deviance is 1675.7, which we will compare with the other models in order to select the one that explains the data best.

## 
## Call:
## glm(formula = NoYes ~ temp, family = "binomial", data = dengue)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1644  -0.5917  -0.1649   0.7193   3.5173  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.33146    0.30866  -20.51   <2e-16 ***
## temp         0.29109    0.01359   21.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2695.4  on 1985  degrees of freedom
## Residual deviance: 1675.7  on 1984  degrees of freedom
## AIC: 1679.7
## 
## Number of Fisher Scoring iterations: 6
## Waiting for profiling to be done...
##                                2.5 %      97.5 %
## (Intercept) 0.001779431 0.0009519672 0.003194425
## temp        1.337885591 1.3037607773 1.375125137
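
The odds ratios in the last two rows are simply the exponentiated coefficients. The short check below uses the printed estimates only; the example temperatures and the helper odds() are arbitrary choices made for illustration.

```r
# Coefficients copied from the glm() summary above (log-odds scale)
b0 <- -6.33146
b1 <- 0.29109

# Odds ratio for a one-degree increase in temperature
odds_ratio <- exp(b1)
print(round(odds_ratio, 3))

# plogis() is the inverse logit: it turns log odds into a probability
p_hat <- plogis(b0 + b1 * c(10, 20, 28))
print(round(p_hat, 3))

# The ratio of the odds at 21 and 20 degrees equals exp(b1) exactly
odds <- function(p) p / (1 - p)
ratio_21_vs_20 <- odds(plogis(b0 + b1 * 21)) / odds(plogis(b0 + b1 * 20))
print(ratio_21_vs_20)
```

Note that the multiplicative effect applies to the odds, not to the probability itself, so the change in probability per degree differs along the curve.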

Figure-2: Estimation of Dengue with Temperature.

From the figure it can be seen that once the temperature rises above 15 degrees Celsius, the probability of a dengue case being reported gets higher.

Model-3: Humidity Levels and Dengue

For this model, we used the metric variable humid and the binary variable NoYes, which represents a reported dengue case. The obtained odds ratio is approximately 1.40, which can be interpreted as follows: each unit increase in vapour density (humidity) multiplies the odds of a dengue case being reported by a factor of about 1.40. The residual deviance obtained in this model, 1352.6, is lower than that of the previous model; in this dataset, therefore, vapour density is a more appropriate indicator of dengue than temperature.

## 
## Call:
## glm(formula = NoYes ~ humid, family = "binomial", data = dengue)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7863  -0.3967  -0.2263   0.4722   3.4689  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.31550    0.25932  -24.35   <2e-16 ***
## humid        0.33623    0.01353   24.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2695.4  on 1985  degrees of freedom
## Residual deviance: 1352.6  on 1984  degrees of freedom
## AIC: 1356.6
## 
## Number of Fisher Scoring iterations: 5
## Waiting for profiling to be done...
##                              2.5 %      97.5 %
## (Intercept) 0.00180806 0.001070632 0.002961372
## humid       1.39965849 1.364023292 1.438405347

Figure-3: Humidity Levels/Vapour Density and Probability of Dengue Cases

Model-3 is more effective than Model-2, which uses temperature as the measure to estimate dengue cases. In this model, an average vapour density of about 19 gives a 50% chance of a dengue case being reported.
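
The humidity value at which the fitted probability crosses 50% can be read off the Model-3 coefficients: the log odds are zero when humid = -intercept / slope. A small check with the printed estimates (no re-fitting involved):

```r
# Coefficients copied from the Model-3 glm() summary above
b0 <- -6.31550
b1 <- 0.33623

# Log odds are 0 (probability 0.5) where b0 + b1 * humid = 0
humid_50 <- -b0 / b1
print(round(humid_50, 2))

# Fitted probability at a vapour density of exactly 20
p_at_20 <- plogis(b0 + b1 * 20)
print(round(p_at_20, 3))
```

Under this fit the crossing point lies a little below 20, so the fitted probability at a vapour density of 20 is already somewhat above one half.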

Model-4: Multiple Logistic Regression Model.

In this model, we used both temperature and vapour density. The estimated per-unit effect of vapour density stays approximately the same as in Model-3 (odds ratio 1.357), while the effect of temperature shrinks considerably (odds ratio 1.040, compared with 1.338 in Model-2). The residual deviance (1348.4) and AIC (1354.4) are much lower than in Model-2 and slightly lower than in Model-3. From this model it is also fairly clear that, once humidity is accounted for, temperature is a weak predictor of dengue cases and, most importantly, that vapour density is highly associated with the prevalence of dengue.

## 
## Call:
## glm(formula = NoYes ~ humid + temp, family = "binomial", data = dengue)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7262  -0.3914  -0.2017   0.4819   3.5463  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.58948    0.30327 -21.728   <2e-16 ***
## humid        0.30515    0.01991  15.324   <2e-16 ***
## temp         0.03952    0.01936   2.042   0.0412 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2695.4  on 1985  degrees of freedom
## Residual deviance: 1348.4  on 1983  degrees of freedom
## AIC: 1354.4
## 
## Number of Fisher Scoring iterations: 6
## Waiting for profiling to be done...
##                                2.5 %      97.5 %
## (Intercept) 0.001374749 0.0007414193 0.002437061
## humid       1.356826012 1.3060899022 1.412208149
## temp        1.040316054 1.0015794292 1.080627957
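
The kind of workflow that produces output like the above can be sketched on simulated data. Everything in this block is an assumption made for illustration: the simulated data frame sim, the coefficients used to generate it, and the use of Wald intervals via confint.default(), whereas the "Waiting for profiling" message above suggests that the report used profile-likelihood intervals.

```r
set.seed(1)

# Simulated stand-in for the dengue data (NOT the real data)
n     <- 500
temp  <- runif(n, -10, 30)
humid <- pmax(0, 2 + 0.8 * temp + rnorm(n, sd = 4))  # humid tracks temp
y     <- rbinom(n, 1, plogis(-6 + 0.3 * humid + 0.04 * temp))
sim   <- data.frame(y, temp, humid)

# Nested logistic models: humidity only, then humidity plus temperature
m_humid <- glm(y ~ humid,        family = binomial, data = sim)
m_both  <- glm(y ~ humid + temp, family = binomial, data = sim)

# Odds ratios with Wald 95% intervals (exponentiated coefficients)
or_table <- exp(cbind(OR = coef(m_both), confint.default(m_both)))
print(round(or_table, 3))

# Analysis of deviance: does adding temp improve on the humid-only model?
cmp <- anova(m_humid, m_both, test = "Chisq")
print(cmp)
```

Because the two predictors are strongly correlated, the coefficient of temp in the joint model is estimated with less precision than it would be alone, which is the same pattern seen in the Model-4 output.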

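Figure-4: This figure is very similar to Figure-3. However, because the model fits slightly better, the curve looks slightly more textured.

Post Estimation Test Results.

Using post-estimation tests to find the best model, it seems that Model-4, as mentioned earlier, is the best. In the analysis of deviance (ANOVA) table below it appears as "Model 3" (NoYes ~ humid + temp), with a residual deviance of 1348.4 and a significant p-value of 0.041 relative to the humidity-only model. The temperature-only and humidity-only models are not nested, so no p-value is reported for that comparison.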

## Analysis of Deviance Table
## 
## Model 1: NoYes ~ temp
## Model 2: NoYes ~ humid
## Model 3: NoYes ~ humid + temp
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1      1984     1675.7                       
## 2      1984     1352.6  0   323.13           
## 3      1983     1348.4  1     4.17  0.04121 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Likelihood Ratio Test.

Similar to the ANOVA test, the likelihood ratio test gives a clear view of the best model, and again Model 3 in the table (our Model-4) seems to be the best. Its log-likelihood of -674.22 is the highest of the three, only slightly above the -676.30 of the humidity-only model; the improvement comes from adding a second predictor variable. The dataset contains very little information about other potential indicators. However, according to the literature, temperature and vapour pressure are two of the most important predictors of dengue. The results from our regression models are consistent with the current literature on the drivers of dengue.

## Likelihood ratio test
## 
## Model 1: NoYes ~ temp
## Model 2: NoYes ~ humid
## Model 3: NoYes ~ humid + temp
##   #Df  LogLik Df    Chisq Pr(>Chisq)    
## 1   2 -837.87                           
## 2   2 -676.30  0 323.1321    < 2e-16 ***
## 3   3 -674.22  1   4.1672    0.04121 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
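
The test statistic in the last row can be reproduced by hand from the log-likelihoods and deviances reported above, and pchisq() then gives the p-value. This is a check of the printed (rounded) numbers, not a re-fit of the models.

```r
# Log-likelihoods copied from the table above
ll_humid <- -676.30   # NoYes ~ humid
ll_both  <- -674.22   # NoYes ~ humid + temp

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (ll_both - ll_humid)
print(lr_stat)

# The same quantity as a difference in residual deviance
dev_diff <- 1352.60 - 1348.43
print(dev_diff)

# Upper-tail chi-square probability with 1 degree of freedom
# (one extra parameter), using the unrounded statistic from the table
p_value <- pchisq(4.1672, df = 1, lower.tail = FALSE)
print(round(p_value, 4))
```

The two hand-computed statistics differ slightly from 4.1672 only because the inputs are rounded to two decimals.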

A Combined View of All Three Logistic Models

library(visreg)
library(texreg)  # htmlreg() is provided by texreg
htmlreg(list(m2, m3, m4))
Statistical models
                  Model 1     Model 2     Model 3
(Intercept)       -6.33***    -6.32***    -6.59***
                  (0.31)      (0.26)      (0.30)
temp               0.29***                 0.04*
                  (0.01)                  (0.02)
humid                          0.34***     0.31***
                              (0.01)      (0.02)
AIC               1679.73     1356.60     1354.43
BIC               1690.92     1367.79     1371.21
Log Likelihood    -837.87     -676.30     -674.22
Deviance          1675.73     1352.60     1348.43
Num. obs.         1986        1986        1986
***p < 0.001, **p < 0.01, *p < 0.05
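
The AIC and BIC rows can be recovered from the deviance row. For these binary-response models the deviance equals minus twice the log-likelihood, so AIC = deviance + 2k and BIC = deviance + k log(n), where k is the number of estimated coefficients and n = 1986. The vector names below are our own labels.

```r
# Values copied from the combined table above
dev <- c(temp = 1675.73, humid = 1352.60, both = 1348.43)  # residual deviance
k   <- c(temp = 2,       humid = 2,       both = 3)        # coefficients per model
n   <- 1986                                                # observations

aic <- dev + 2 * k
bic <- dev + k * log(n)

print(round(rbind(AIC = aic, BIC = bic), 2))
```

AIC is lowest for the model with both predictors, while BIC, which penalises the extra parameter more heavily, is slightly lower for the humidity-only model. The gain from adding temperature is therefore modest.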

Spatial Mapping of Dengue Cases Based on Reported Cases.

Using the coordinates given in our dataset, we created a spatial map of the dengue cases reported from various regions. The map reveals that dengue cases are reported from both low- and high-temperature zones. In geographical regions with high vapour density, however, reported cases are much more frequent than in regions with low humidity. West Africa and south-eastern China are the most affected regions. This research needs further investigation and more comprehensive data, which could be analysed to pinpoint at-risk areas or regions so that preemptive measures can be taken and effective policy made.