Our planet's climate has changed constantly over geological time. Scientists, experts, and politicians from different parts of the world have emphasized the urgent need for greater awareness of climate change. In recent years, epidemiologists have also pointed out the need for a more focused approach to diseases that are strongly associated with variation in temperature. In this assignment, we look into the relevant issue of dengue fever and its association with climate change. The data used in this study were originally collected by the Department of Public Health, Wellington School of Medicine and Health Sciences, New Zealand. We use the comprehensive study by Hales et al., "Potential effect of population and climate changes on global distribution of dengue fever: an empirical model", as a reference point. This study was published in 2002 and is available for review and reference on PubMed (https://www.ncbi.nlm.nih.gov/pubmed/12243917).
According to the World Health Organization, vectors are living organisms that can transmit infectious diseases between humans or from animals to humans. Many of these vectors are bloodsucking insects, which ingest disease-producing microorganisms during a blood meal from an infected host (human or animal) and later inject them into a new host during a subsequent blood meal (Source: http://www.who.int). The best-known example is the mosquito. Others include ticks, flies, sandflies, fleas, triatomine bugs and some freshwater aquatic snails. A vector-borne disease is therefore a human illness caused by parasites, viruses or bacteria that are transmitted by mosquitoes and other vectors. Our dataset contains reported cases for various populations around the world. These reported cases will be tested in a regression model against temperature and humidity levels.
First and most important, the impact of vector-borne diseases is significant. According to the WHO, more than 700,000 deaths occur every year from diseases such as malaria, dengue, and human African trypanosomiasis. The major vector-borne diseases account for 17% of all infectious diseases. The irony is that the burden of these diseases is highest in tropical and subtropical areas with the poorest populations.
In the simplest terms, humidity is the amount of water vapour present in the air. Water vapour is a gas, which makes it invisible to the human eye. In this dataset, the variable humid holds the average vapour density of the air; as the output below shows, it is stored as a continuous (double) value rather than an integer. In chemistry, the vapour density of a gas is its density relative to the density of hydrogen at the same temperature and pressure. It is given by:
[Vapour density of the gas] = Density of the gas / Density of hydrogen. Because it is a ratio of two densities, this quantity has no unit. An alternative convention takes air (a mixture of gases, not just O2) as the reference, so that air has a vapour density of one. For this use, air is assigned a molecular weight of 28.97 atomic mass units, and the molecular weights of all other gases and vapours are divided by this number to derive their vapour density (wikipedia.org).
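The two conventions described above can be compared with a small worked example. This is only a sketch: the molar masses of water and hydrogen are assumed textbook values, not values taken from the dataset, and the result is a relative density, not the `humid` variable itself.

```r
# Relative vapour density under the two conventions described above.
mw_air   <- 28.97   # molecular weight of air, as quoted in the text
mw_water <- 18.02   # assumed molar mass of H2O (g/mol), not from the dataset
mw_h2    <- 2.016   # assumed molar mass of H2 (g/mol), not from the dataset

# At equal temperature and pressure, the density ratio of two ideal gases
# equals the ratio of their molecular weights.
vd_water_vs_air <- mw_water / mw_air
vd_water_vs_h2  <- mw_water / mw_h2

round(c(air_reference = vd_water_vs_air,
        hydrogen_reference = vd_water_vs_h2), 3)
```

On the air = 1 convention, water vapour comes out lighter than air (a ratio below one), while the hydrogen convention gives a much larger number for the same gas, so it matters which reference is meant.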
The goal of this assignment is to find a strong predictive model for dengue occurrence using the current data. The dataset is very limited in terms of variables, but it can help us understand the nature of the relationship and the potential tipping point at which vector-borne disease, dengue in our particular case, begins to appear.
This dataset was designed to examine the effects of temperature and humidity on vector-borne diseases such as dengue fever. In this assignment we use: the variable temp, which is temperature in degrees Celsius; the variable humid, which is a measure of vapour density in the air; and the variable NoYes, a binary variable where 1 = dengue case reported and 0 = no dengue case reported. Lastly, Xmax and Ymax are used to map the dengue cases spatially across temperature zones.
glimpse(dengue)
## Observations: 2,000
## Variables: 14
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ humid <dbl> 0.6713889, 7.6483340, 6.9790556, 1.1104163, 9.0270555...
## $ humid90 <dbl> 4.416667, 8.167500, 9.563058, 1.825361, 9.742751, 9.5...
## $ temp <dbl> 2.037500, 12.325000, 6.925000, 4.641665, 18.175000, 1...
## $ temp90 <dbl> 8.470835, 14.925000, 14.591660, 6.046669, 19.710000, ...
## $ h10pix <dbl> 17.35653, 10.98361, 17.50833, 17.41763, 13.84306, 11....
## $ h10pix90 <dbl> 17.80861, 11.69167, 17.62528, 17.51694, 13.84306, 11....
## $ trees <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
## $ trees90 <dbl> 1.5, 1.0, 1.2, 0.6, 0.0, 0.2, 0.0, 0.0, 0.0, 0.6, 0.0...
## $ NoYes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Xmin <dbl> 70.5, 62.5, 68.5, 67.0, 61.0, 64.5, 67.5, 64.0, 63.5,...
## $ Xmax <dbl> 74.5, 64.5, 69.5, 68.0, 64.5, 65.5, 68.5, 66.5, 65.5,...
## $ Ymin <dbl> 38.0, 35.5, 36.0, 35.0, 33.5, 36.5, 33.5, 35.0, 33.0,...
## $ Ymax <dbl> 35.5, 34.5, 35.0, 34.0, 32.0, 35.0, 32.0, 33.0, 29.5,...
summary(dengue)
## X humid humid90 temp
## Min. : 1.0 Min. : 0.6714 Min. : 1.066 Min. :-18.68
## 1st Qu.: 500.8 1st Qu.:10.0088 1st Qu.:10.307 1st Qu.: 11.10
## Median :1000.5 Median :16.1433 Median :16.870 Median : 20.99
## Mean :1000.5 Mean :16.7013 Mean :17.244 Mean : 18.41
## 3rd Qu.:1500.2 3rd Qu.:23.6184 3rd Qu.:24.131 3rd Qu.: 25.47
## Max. :2000.0 Max. :30.2665 Max. :30.539 Max. : 29.45
## NA's :2 NA's :2 NA's :2
## temp90 h10pix h10pix90 trees
## Min. :-10.07 Min. : 4.317 Min. : 5.848 Min. : 0.0
## 1st Qu.: 12.76 1st Qu.:14.584 1st Qu.:14.918 1st Qu.: 1.0
## Median : 22.03 Median :23.115 Median :24.130 Median :15.0
## Mean : 19.41 Mean :21.199 Mean :21.557 Mean :22.7
## 3rd Qu.: 25.98 3rd Qu.:28.509 3rd Qu.:28.627 3rd Qu.:37.0
## Max. : 29.66 Max. :31.134 Max. :31.134 Max. :85.0
## NA's :2 NA's :12
## trees90 NoYes Xmin Xmax
## Min. : 0.00 Min. :0.0000 Min. :-179.50 Min. :-172.00
## 1st Qu.: 6.00 1st Qu.:0.0000 1st Qu.: -12.00 1st Qu.: -10.00
## Median :30.60 Median :0.0000 Median : 16.00 Median : 17.75
## Mean :35.21 Mean :0.4155 Mean : 13.31 Mean : 15.63
## 3rd Qu.:63.62 3rd Qu.:1.0000 3rd Qu.: 42.62 3rd Qu.: 44.50
## Max. :97.10 Max. :1.0000 Max. : 178.00 Max. : 180.00
## NA's :12
## Ymin Ymax
## Min. :-54.50 Min. :-55.50
## 1st Qu.: 6.00 1st Qu.: 5.00
## Median : 18.00 Median : 17.00
## Mean : 19.78 Mean : 18.16
## 3rd Qu.: 39.00 3rd Qu.: 37.00
## Max. : 82.50 Max. : 68.50
##
The relationship between temperature and humidity is approximately linear. However, from the figure below, we can see that the vapour density increases substantially once the temperature rises above 15 degrees Celsius.
In the figure below, we can see the relationship between temperature and humidity more precisely, with jittered, coloured points representing dengue cases. The figure makes it much clearer that most reported cases fall in the part of the graph where both humidity and temperature are high.
## Warning: Removed 34 rows containing non-finite values (stat_lm).
## Warning: Removed 34 rows containing missing values (geom_point).
In simple terms, the table below gives the probability of dengue at each temperature level: when temperature is high, the probability of dengue is also high.
##
## ------------------------------------------
## temp_level dengue_yes dengue_no
## ----------------- ------------ -----------
## High Temprature 0.5973 0.4027
##
## Low Temprature 0.04 0.96
## ------------------------------------------
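A table of this kind can be built by splitting temperature into two levels and taking row-wise proportions. The sketch below uses simulated data and a hypothetical cutoff of 15 degrees Celsius, because the report does not state how the "high" and "low" levels were defined; the numbers it produces are therefore not the ones shown above.

```r
set.seed(1)

# Simulated stand-in for the dengue data (values are NOT from the study).
n <- 500
sim <- data.frame(temp = runif(n, min = -10, max = 30))
sim$NoYes <- rbinom(n, size = 1, prob = plogis(-6.3 + 0.29 * sim$temp))

# Hypothetical cutoff: an assumption made only for this illustration.
cutoff <- 15
sim$temp_level <- ifelse(sim$temp > cutoff, "High", "Low")

# Row-wise proportions, i.e. P(dengue status | temperature level).
prob_table <- prop.table(table(sim$temp_level, sim$NoYes), margin = 1)
round(prob_table, 3)
```

Each row sums to one, so the two columns can be read directly as the probability of a case being reported or not reported at that temperature level.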
The probability table gives a clearer picture of the results. For areas with high humidity, the probability of a dengue case being reported is much higher than in low-humidity areas.
##
## ----------------------------------------
## humid_level dengue_Yes dengue_No
## --------------- ------------ -----------
## High Humidity 0.75 0.25
##
## Low Humidity 0.04651 0.9535
## ----------------------------------------
This model quantifies the relationship between temperature and humidity. A one-unit (1 degree Celsius) increase in temperature is associated with an increase of 0.783 in vapour density. This supports the view that humidity depends strongly on temperature, although a regression of this kind shows association rather than proof. The obtained R2 is 0.73, which means that 73% of the variance in humidity is explained by the model.
##
## Call:
## lm(formula = humid ~ temp, data = dengue)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.7469 -1.8696 0.4592 2.5524 12.0928
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.26477 0.21288 10.64 <2e-16 ***
## temp 0.78294 0.01059 73.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.782 on 1984 degrees of freedom
## Multiple R-squared: 0.7337, Adjusted R-squared: 0.7336
## F-statistic: 5468 on 1 and 1984 DF, p-value: < 2.2e-16
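To make the slope concrete, the fitted line can be evaluated by hand at a few temperatures. This is a minimal sketch that copies the two coefficients from the output above; `predict_humid` is a helper name introduced here for illustration only.

```r
# Coefficients copied from the lm(humid ~ temp) output above.
b0 <- 2.26477   # intercept
b1 <- 0.78294   # change in vapour density per degree Celsius

# Hypothetical helper: fitted vapour density at a given temperature.
predict_humid <- function(temp) b0 + b1 * temp

fitted_values <- predict_humid(c(15, 25))
round(fitted_values, 2)

# A 10-degree rise corresponds to 10 * b1 units of vapour density.
diff(fitted_values)
```

With the original model object available, `predict()` on the fitted `lm` would give the same values without retyping the coefficients.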
We know from the existing literature and from our Model#1 that temperature and humidity are strongly correlated. However, we also need to test whether temperature is a strong indicator of dengue. From the output below, the coefficient for temperature is 0.291 on the log-odds scale (the intercept is -6.33). Log odds can be tricky to interpret, so we use the odds ratio option in our code to exponentiate the coefficients. The odds ratio tells us that each one-unit increase in temperature multiplies the odds of a dengue case being reported by a factor of 1.338. The residual deviance is 1675.7, which we will compare with the other models to select the one that explains the data best.
##
## Call:
## glm(formula = NoYes ~ temp, family = "binomial", data = dengue)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1644 -0.5917 -0.1649 0.7193 3.5173
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.33146 0.30866 -20.51 <2e-16 ***
## temp 0.29109 0.01359 21.42 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2695.4 on 1985 degrees of freedom
## Residual deviance: 1675.7 on 1984 degrees of freedom
## AIC: 1679.7
##
## Number of Fisher Scoring iterations: 6
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.001779431 0.0009519672 0.003194425
## temp 1.337885591 1.3037607773 1.375125137
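The step from log odds to odds ratios and probabilities can be reproduced by hand. The sketch below copies the coefficients from the output above; `p_at` is a helper name introduced here for illustration, and the temperatures passed to it are arbitrary examples.

```r
# Coefficients copied from the glm(NoYes ~ temp) output above.
b0 <- -6.33146   # intercept (log odds at 0 degrees Celsius)
b1 <- 0.29109    # change in log odds per degree Celsius

# Exponentiating the slope gives the odds ratio per one-degree increase.
odds_ratio <- exp(b1)

# Hypothetical helper: fitted probability of a reported case at a temperature.
p_at <- function(temp) plogis(b0 + b1 * temp)

round(odds_ratio, 3)
round(p_at(c(10, 20, 25)), 3)

# The odds at t + 1 divided by the odds at t always equals the odds ratio.
odds <- function(p) p / (1 - p)
odds(p_at(21)) / odds(p_at(20))
```

The odds ratio is constant across temperatures, but the change in probability is not: the same one-degree step moves the probability most where the curve is steepest.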
From the figure it can be seen that, as the temperature rises above 15 degrees Celsius, the probability of a dengue case being reported gets higher.
For this model, we used the metric variable humid and the binary variable NoYes, which represents a reported dengue case. The obtained odds ratio is 1.400, which can be interpreted as: each one-unit increase in vapour density (humidity) multiplies the odds of a dengue case being reported by a factor of 1.400. The residual deviance of this model, 1352.6, is lower than that of the previous model (1675.7); therefore, vapour density is a more appropriate indicator of dengue than temperature.
##
## Call:
## glm(formula = NoYes ~ humid, family = "binomial", data = dengue)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7863 -0.3967 -0.2263 0.4722 3.4689
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.31550 0.25932 -24.35 <2e-16 ***
## humid 0.33623 0.01353 24.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2695.4 on 1985 degrees of freedom
## Residual deviance: 1352.6 on 1984 degrees of freedom
## AIC: 1356.6
##
## Number of Fisher Scoring iterations: 5
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.00180806 0.001070632 0.002961372
## humid 1.39965849 1.364023292 1.438405347
The humidity model is more effective than the temperature model at estimating dengue cases. In this model, an average vapour density of about 19 (the point where the fitted log odds, -6.32 + 0.336 × humid, cross zero) gives a 50% chance of a dengue case being reported.
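The 50% point can also be derived from the coefficients rather than read off the figure. In a logistic model the fitted probability is 0.5 exactly where the log odds are zero, so the threshold is minus the intercept divided by the slope. A minimal sketch using the humidity model's reported coefficients:

```r
# Coefficients copied from the glm(NoYes ~ humid) output above.
b0 <- -6.31550
b1 <- 0.33623

# Log odds are zero (probability 0.5) where b0 + b1 * humid = 0.
humid_50 <- -b0 / b1
round(humid_50, 2)

# Check: the fitted probability at that vapour density is one half.
plogis(b0 + b1 * humid_50)
```

The computed value is a little below 20, which is close to what the fitted curve in the figure suggests.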
In this model, we used both temperature and vapour density. The estimated per-unit effect of vapour density is approximately the same as in the humidity-only model (0.305 versus 0.336 on the log-odds scale), whereas the effect of temperature drops from 0.291 to 0.040. The model has a much lower deviance and AIC (1348.4 and 1354.4) than the temperature-only model, and slightly lower values than the humidity-only model. It is also clear from this model that, once humidity is accounted for, temperature is a weak predictor of dengue cases and, most importantly, that vapour density is strongly associated with the prevalence of dengue.
##
## Call:
## glm(formula = NoYes ~ humid + temp, family = "binomial", data = dengue)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7262 -0.3914 -0.2017 0.4819 3.5463
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.58948 0.30327 -21.728 <2e-16 ***
## humid 0.30515 0.01991 15.324 <2e-16 ***
## temp 0.03952 0.01936 2.042 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2695.4 on 1985 degrees of freedom
## Residual deviance: 1348.4 on 1983 degrees of freedom
## AIC: 1354.4
##
## Number of Fisher Scoring iterations: 6
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.001374749 0.0007414193 0.002437061
## humid 1.356826012 1.3060899022 1.412208149
## temp 1.040316054 1.0015794292 1.080627957
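Using post-estimation tests to find the best model, it appears that model#4, as mentioned earlier, is the best. In the analysis of deviance table below, where it is listed as Model 3, it has the lowest residual deviance (1348.4) and a significant p-value (0.041) when compared with the humidity-only model. The first two models have the same number of parameters, so no test is reported for that comparison.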
## Analysis of Deviance Table
##
## Model 1: NoYes ~ temp
## Model 2: NoYes ~ humid
## Model 3: NoYes ~ humid + temp
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 1984 1675.7
## 2 1984 1352.6 0 323.13
## 3 1983 1348.4 1 4.17 0.04121 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
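The p-value in the last row of the table can be checked by hand. For two nested logistic models, the drop in residual deviance is compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters. A minimal sketch using the deviances reported above:

```r
# Residual deviances copied from the model summaries above.
dev_humid <- 1352.60   # NoYes ~ humid
dev_both  <- 1348.43   # NoYes ~ humid + temp

# The larger model adds one parameter (temp), so df = 1.
chisq   <- dev_humid - dev_both
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

round(c(chisq = chisq, p_value = p_value), 4)
```

The temperature-only and humidity-only models are not nested and have the same number of parameters, which is why the table shows a deviance difference but no p-value for that row.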
Like the ANOVA test, the likelihood ratio test gives a clear view of the best model, and again Model 3 comes out best. Its log likelihood of -674.22 is slightly higher than that of Model 2 (-676.30), because an additional predictor is added to the model compared with the previous two. In this dataset there was very little information about other potential indicators. However, according to the literature, temperature and vapour pressure are two of the most important predictors of dengue. The results from our regression models are consistent with the current literature on the factors associated with dengue.
## Likelihood ratio test
##
## Model 1: NoYes ~ temp
## Model 2: NoYes ~ humid
## Model 3: NoYes ~ humid + temp
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 2 -837.87
## 2 2 -676.30 0 323.1321 < 2e-16 ***
## 3 3 -674.22 1 4.1672 0.04121 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(visreg)
library(texreg)  # htmlreg() is provided by texreg, not visreg
htmlreg(list(m2, m3, m4))
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | -6.33*** | -6.32*** | -6.59*** |
| | (0.31) | (0.26) | (0.30) |
| temp | 0.29*** | | 0.04* |
| | (0.01) | | (0.02) |
| humid | | 0.34*** | 0.31*** |
| | | (0.01) | (0.02) |
| AIC | 1679.73 | 1356.60 | 1354.43 |
| BIC | 1690.92 | 1367.79 | 1371.21 |
| Log Likelihood | -837.87 | -676.30 | -674.22 |
| Deviance | 1675.73 | 1352.60 | 1348.43 |
| Num. obs. | 1986 | 1986 | 1986 |

*** p < 0.001, ** p < 0.01, * p < 0.05
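The AIC and BIC rows of the table follow directly from the log likelihoods, the number of estimated coefficients and the number of observations. The sketch below recomputes them from the rounded values in the table, so small differences in the last decimal place are expected.

```r
# Log likelihoods and parameter counts copied from the table above.
loglik <- c(temp = -837.87, humid = -676.30, both = -674.22)
k      <- c(temp = 2,       humid = 2,       both = 3)
n      <- 1986  # number of observations

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + log(n) * k

round(rbind(AIC = aic, BIC = bic), 2)

# Which model each criterion prefers (lower is better).
c(AIC = names(which.min(aic)), BIC = names(which.min(bic)))
```

Note that the two criteria do not fully agree here: AIC is lowest for the model with both predictors, while BIC, which penalises extra parameters more heavily, is lowest for the humidity-only model. This fits the earlier finding that temperature adds only a little once humidity is in the model.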
Using the coordinates given in our dataset, we created a spatial map of the dengue cases reported from various regions. The map reveals that reported dengue cases come from regions in both low and high temperature zones. In geographical regions with high vapour density, however, reported cases are much more frequent than in regions with low humidity. West Africa and south-eastern China are the most affected regions. However, this research needs further investigation and more comprehensive data, which can be analysed to pinpoint potential areas or regions so that pre-emptive measures and effective policy can be put in place.