This is the second in a series of documents that try to explain the correlation between COVID-19 lethality and various factors in Mexico.
This document explores the correlation between COVID-19 lethality and smoking (the TABAQUISMO comorbidity). At first sight one would expect that, because of the effects of smoking on the respiratory system, lethality should be higher among the cases with smoking than among the cases without this comorbidity. Even so, this document tries to get an objective view of the correlation.
As in the first document, the records whose comorbidities and conditions are not entirely known are discarded.
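#Load the packages used in this document.
library(data.table) #fread and the data.table subsetting syntax
library(dplyr)      #filter, mutate and the %>% pipe
library(car)        #vif
#Clean the data, discarding the cases where the predictors are not well defined.
cleanDataCOVID<-function(dataToClean){
invalidCases<-c(97,98,99)
#Return the filtered data directly instead of relying on the invisible value of an assignment.
dataToClean %>% filter(! ASMA %in% invalidCases) %>%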
filter(! OBESIDAD %in% invalidCases) %>%
filter(! NEUMONIA %in% invalidCases) %>%
filter(! DIABETES %in% invalidCases) %>%
filter(! EPOC %in% invalidCases) %>%
filter(! INMUSUPR %in% invalidCases) %>%
filter(! HIPERTENSION %in% invalidCases) %>%
filter(! CARDIOVASCULAR %in% invalidCases) %>%
filter(! TABAQUISMO %in% invalidCases) %>%
filter(! OTRA_COM %in% invalidCases) %>%
filter(! RENAL_CRONICA %in% invalidCases)
}
data.covid<-fread("210921COVID19MEXICO.csv", encoding = "UTF-8")
#The positive cases are those that have been confirmed by ASOCIACIÓN CLÍNICA EPIDEMIOLÓGICA (1), by the COMITÉ DE DICTAMINACIÓN (2), or by test (3).
data.covid<-data.covid[CLASIFICACION_FINAL %in% c(1,2,3),]
data.covid<-cleanDataCOVID(data.covid)
#Formatting of predictors
data.covid<-mutate(data.covid,fallecido=ifelse((FECHA_DEF=="9999-99-99"),0,1))
data.covid$fallecido<-factor(data.covid$fallecido,labels = c("NO","SI"))
data.covid$CLASIFICACION_FINAL<-factor(data.covid$CLASIFICACION_FINAL,
labels = c("CONFIRMADO_ASOCIACION_EPIDEMIOLOGICA",
"CONFIRMADO_COMITE",
"CONFIRMADO"))
data.covid$SECTOR<-factor(data.covid$SECTOR,labels = c("CRUZ ROJA","DIF","ESTATAL","IMSS",
"IMSS-BIENESTAR","ISSSTE","MUNICIPAL",
"PEMEX","PRIVADA","SEDENA","SEMAR",
"SSA","UNIVERSITARIO","NO DEFINIDO",
"NO ESPECIFICADO"))
data.covid$SEXO<-factor(data.covid$SEXO, labels = c("MUJER","HOMBRE"))
data.covid$TIPO_PACIENTE<-factor(data.covid$TIPO_PACIENTE,levels = c(1,2,99),labels = c("AMBULATORIO","HOSPITALIZADO","NO ESPECIFICADO"))
data.covid<-setStates(data.covid,"ENTIDAD_UM")
data.covid<-setStates(data.covid,"ENTIDAD_RES")
data.covid<-setStates(data.covid,"ENTIDAD_NAC")
data.covid<-arrangeYesNOFactor(data.covid,"TABAQUISMO")
data.covid<-arrangeYesNOFactor(data.covid,"ASMA")
data.covid<-arrangeYesNOFactor(data.covid,"OBESIDAD")
data.covid<-arrangeYesNOFactor(data.covid,"INDIGENA")
data.covid<-arrangeYesNOFactor(data.covid,"INTUBADO")
data.covid<-arrangeYesNOFactor(data.covid,"NEUMONIA")
data.covid<-arrangeYesNOFactor(data.covid,"EMBARAZO")
data.covid<-arrangeYesNOFactor(data.covid,"DIABETES")
data.covid<-arrangeYesNOFactor(data.covid,"EPOC")
data.covid<-arrangeYesNOFactor(data.covid,"INMUSUPR")
data.covid<-arrangeYesNOFactor(data.covid,"HIPERTENSION")
data.covid<-arrangeYesNOFactor(data.covid,"OTRA_COM")
data.covid<-arrangeYesNOFactor(data.covid,"CARDIOVASCULAR")
data.covid<-arrangeYesNOFactor(data.covid,"UCI")
data.covid.deceased<-data.covid[fallecido=="SI",]
data.covid.interned<-data.covid[TIPO_PACIENTE=="HOSPITALIZADO",]
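The cleaning step in the chunk above applies the same filter to each comorbidity column in turn. As a minimal sketch of the same idea, the rows can be dropped in one pass over a vector of column names. The toy data frame and the names `toy`, `comorbidity_cols` and `has_invalid` are illustrative assumptions, not part of the original analysis.

```r
# Toy data using the same coding as the report: 1 and 2 are valid answers,
# 97, 98 and 99 are the codes the report treats as invalid.
# (The three-column layout is an illustrative assumption, not the real file.)
toy <- data.frame(
  ASMA       = c(1, 2, 98, 2, 2),
  OBESIDAD   = c(2, 2, 1, 99, 1),
  TABAQUISMO = c(1, 2, 2, 2, 97)
)

invalid_codes    <- c(97, 98, 99)
comorbidity_cols <- c("ASMA", "OBESIDAD", "TABAQUISMO")

# TRUE for rows where at least one comorbidity column holds an invalid code.
has_invalid <- Reduce(`|`, lapply(toy[comorbidity_cols], `%in%`, invalid_codes))

# Keep only the rows where every comorbidity is known.
clean <- toy[!has_invalid, , drop = FALSE]
print(clean)
```

Only the first two toy rows survive, since each of the other three has one invalid code. Listing the columns in a vector makes it harder to forget one when a new comorbidity is added.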
## Data to consider
date.records<-data.covid$FECHA_ACTUALIZACION[[1]]
total.positive.cases<-nrow(data.covid)
total.hospitalized.cases<-nrow(data.covid.interned)
total.deceased.cases<-nrow(data.covid.deceased)
rate.lethality<-total.deceased.cases/total.positive.cases
rate.hospitalization<-total.hospitalized.cases/total.positive.cases
rate.lethality*100
## [1] 7.55
rate.hospitalization*100
## [1] 15.7
As of 2021-09-21, Mexico had a total of 3564818 confirmed cases, with 560058 people admitted to hospital and 269139 deceased, giving a lethality of 7.55%.
The lethality for smokers is:
total.smokers.deceased<-sum(data.covid.deceased$TABAQUISMO=="SI")
total.notsmokers.deceased<-sum(data.covid.deceased$TABAQUISMO=="NO")
total.notsmokers<-sum(data.covid$TABAQUISMO=="NO")
total.smokers<-sum(data.covid$TABAQUISMO=="SI")
rate.smokers.lethality<-total.smokers.deceased/total.smokers
\[ \frac{\text{total.smokers.deceased}}{\text{total.smokers}} \times 100 = \frac{20121}{240202} \times 100 = 8.377\,\% \]
The lethality rate for smokers is closer to the general rate than it is for other factors. Doing the binomial test:
binom.test(total.smokers.deceased,total.smokers,p=rate.lethality)
##
## Exact binomial test
##
## data: total.smokers.deceased and total.smokers
## number of successes = 20121, number of trials = 2e+05, p-value <2e-16
## alternative hypothesis: true probability of success is not equal to 0.0757
## 95 percent confidence interval:
## 0.0827 0.0849
## sample estimates:
## probability of success
## 0.0838
The p-value is below 2e-16 and the general lethality rate (7.55%) lies outside the confidence interval. Even so, the gap with respect to the general rate is smaller than for other comorbidities. Doing the chi-squared test for the lethality:
chisq.test(data.covid$fallecido,data.covid$TABAQUISMO)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data.covid$fallecido and data.covid$TABAQUISMO
## X-squared = 240, df = 1, p-value <2e-16
The chi-squared test suggests an association between lethality and smoking.
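The same kind of test can be rebuilt from the counts quoted in the text, without the full data file. The sketch below derives the non-smoker cells by subtraction and runs `chisq.test` on the resulting 2x2 table. The variable names are illustrative, and because the table is reconstructed from rounded, published figures, the statistic may differ somewhat from the printed output.

```r
# Counts quoted in the text for the 2021-09-21 data.
total_cases      <- 3564818
total_deceased   <- 269139
smokers          <- 240202
smokers_deceased <- 20121

# Derive the non-smoker cells by subtraction.
nonsmokers          <- total_cases - smokers
nonsmokers_deceased <- total_deceased - smokers_deceased

tab <- matrix(
  c(smokers_deceased,    smokers - smokers_deceased,
    nonsmokers_deceased, nonsmokers - nonsmokers_deceased),
  nrow = 2, byrow = TRUE,
  dimnames = list(TABAQUISMO = c("SI", "NO"), fallecido = c("SI", "NO"))
)

# Yates' continuity correction is applied by default to 2x2 tables.
test <- chisq.test(tab)
print(tab)
print(test)

# Lethality (in percent) within each group, to compare with the general rate.
print(round(100 * tab[, "SI"] / rowSums(tab), 3))
```

A very small p-value here only says that the two groups differ. With more than three million records even a small difference is statistically significant, so the size of the gap matters more than the p-value.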
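Let’s analyze the hospitalization rate.
total.smokers.interned<-sum(data.covid.interned$TABAQUISMO=="SI")
\[ \frac{\text{total.smokers.interned}}{\text{total.smokers}} \times 100 = \frac{40171}{240202} \times 100 = 16.724\,\% \]
Doing the binomial and chi-squared tests: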
binom.test(total.smokers.interned,total.smokers,p=rate.hospitalization)
##
## Exact binomial test
##
## data: total.smokers.interned and total.smokers
## number of successes = 40171, number of trials = 2e+05, p-value <2e-16
## alternative hypothesis: true probability of success is not equal to 0.158
## 95 percent confidence interval:
## 0.166 0.169
## sample estimates:
## probability of success
## 0.167
chisq.test(data.covid$fallecido,data.covid$TABAQUISMO)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data.covid$fallecido and data.covid$TABAQUISMO
## X-squared = 240, df = 1, p-value <2e-16
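The general hospitalization rate (15.711%) falls outside the confidence interval. Note that the chi-squared test shown above was run on fallecido and TABAQUISMO, so it repeats the lethality result and does not test hospitalization.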
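A test that does address hospitalization can be sketched from the counts in the text, comparing the hospitalized share of smokers with that of non-smokers through `prop.test`. The non-smoker counts are derived by subtraction and the variable names are illustrative assumptions, so this is a rough check and not the result of the original analysis.

```r
# Counts quoted in the text.
total_cases      <- 3564818
total_interned   <- 560058
smokers          <- 240202
smokers_interned <- 40171

# Hospitalized cases and group sizes for smokers and non-smokers.
hospitalized <- c(smokers    = smokers_interned,
                  nonsmokers = total_interned - smokers_interned)
group_sizes  <- c(smokers    = smokers,
                  nonsmokers = total_cases - smokers)

# Two-sample test of equal proportions (continuity-corrected by default).
res <- prop.test(hospitalized, group_sizes)
print(res)
```

Besides the p-value, `prop.test` reports a confidence interval for the difference between the two proportions, which shows how large the gap in hospitalization is.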
Beyond these results, which could suggest a very clear relation between COVID-19 lethality and smoking, there is another factor to consider. Let’s explore the prevalence of obesity among the smokers.
total.smokers.with.obesity <- nrow(data.covid[OBESIDAD=="SI" & TABAQUISMO=="SI",])
rate.smokers.with.obesity <-total.smokers.with.obesity/total.smokers
rate.smokers<-total.smokers/total.positive.cases
The proportion of smokers who also have obesity is 21.287% and the proportion of smokers among all the positive COVID cases is 6.738%. So there seems to be a relation between obesity and smoking. Is it then reasonable to say that smoking by itself is a factor that increases the lethality rate? Let’s go further and look at the lethality rate among smokers without obesity.
total.smokers.without.obesity<-nrow(data.covid[OBESIDAD=="NO" & TABAQUISMO=="SI",])
total.smokers.without.obesity.deceased<-nrow(data.covid[OBESIDAD=="NO" & TABAQUISMO=="SI" & fallecido=="SI",])
rate.lethality.smokers.without.obesity<-total.smokers.without.obesity.deceased/total.smokers.without.obesity
\[ \frac{\text{total.smokers.without.obesity.deceased}}{\text{total.smokers.without.obesity}} \times 100 = \frac{13799}{188344} \times 100 = 7.326\,\% \]
This is quite close to the general lethality rate of 7.55%. Let’s do a binomial test to see whether the general rate falls inside the confidence interval.
binom.test(total.smokers.without.obesity.deceased,total.smokers.without.obesity,p=rate.lethality)
##
## Exact binomial test
##
## data: total.smokers.without.obesity.deceased and total.smokers.without.obesity
## number of successes = 13799, number of trials = 2e+05, p-value = 2e-04
## alternative hypothesis: true probability of success is not equal to 0.0755
## 95 percent confidence interval:
## 0.0721 0.0745
## sample estimates:
## probability of success
## 0.0733
The p-value is about 0.0002 and the lethality rate for smokers without obesity is below the general lethality.
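The counts above also allow a rough look at the complementary group, smokers with obesity. The sketch below obtains that group by subtraction, which assumes that after the cleaning step every smoker is recorded as either with or without obesity. The variable names are illustrative.

```r
# Counts quoted in the text.
smokers                     <- 240202
smokers_deceased            <- 20121
smokers_no_obesity          <- 188344
smokers_no_obesity_deceased <- 13799

# Smokers with obesity, by subtraction (assumes OBESIDAD is only SI or NO).
smokers_obesity          <- smokers - smokers_no_obesity
smokers_obesity_deceased <- smokers_deceased - smokers_no_obesity_deceased

lethality <- c(
  all_smokers     = smokers_deceased / smokers,
  without_obesity = smokers_no_obesity_deceased / smokers_no_obesity,
  with_obesity    = smokers_obesity_deceased / smokers_obesity
)
print(round(100 * lethality, 3))
```

Under that assumption, lethality among smokers with obesity comes out at roughly 12%, well above the 7.3% of smokers without obesity, so the excess seen for smokers as a whole is concentrated in the obese stratum.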
So it seems there are other factors that affect the lethality, and with the current analysis we can’t conclude whether or not there is a correlation between lethality and smoking. To clarify this, let’s fit a logistic regression and use analysis of variance and the variance inflation factor to examine the relation.
model.lr<-glm(fallecido~TABAQUISMO+ASMA+OBESIDAD+SEXO+DIABETES+INMUSUPR+HIPERTENSION+OTRA_COM+CARDIOVASCULAR,data.covid,family = "binomial")
summary(model.lr)
##
## Call:
## glm(formula = fallecido ~ TABAQUISMO + ASMA + OBESIDAD + SEXO +
## DIABETES + INMUSUPR + HIPERTENSION + OTRA_COM + CARDIOVASCULAR,
## family = "binomial", data = data.covid)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.179 -0.332 -0.332 -0.245 2.809
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.21818 0.02727 44.7 <2e-16 ***
## TABAQUISMONO 0.09886 0.00827 11.9 <2e-16 ***
## ASMANO 0.33989 0.01650 20.6 <2e-16 ***
## OBESIDADNO -0.32666 0.00548 -59.6 <2e-16 ***
## SEXOHOMBRE 0.61921 0.00437 141.6 <2e-16 ***
## DIABETESNO -1.12696 0.00512 -220.2 <2e-16 ***
## INMUSUPRNO -0.79781 0.01714 -46.5 <2e-16 ***
## HIPERTENSIONNO -1.22533 0.00493 -248.4 <2e-16 ***
## OTRA_COMNO -0.99308 0.01119 -88.7 <2e-16 ***
## CARDIOVASCULARNO -0.67433 0.01191 -56.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1908144 on 3564817 degrees of freedom
## Residual deviance: 1670222 on 3564808 degrees of freedom
## AIC: 1670242
##
## Number of Fisher Scoring iterations: 6
We include biological factors from the data provided by the Secretaría de Salud, but not EPOC (COPD) or pneumonia, because it is not clear whether they are a result of COVID. We can, nevertheless, do an analysis that includes these regressors.
model.lr.bis<-glm(fallecido~TABAQUISMO+ASMA+OBESIDAD+EPOC+NEUMONIA+SEXO+DIABETES+INMUSUPR+HIPERTENSION+OTRA_COM+CARDIOVASCULAR,data.covid,family = "binomial")
summary(model.lr.bis)
##
## Call:
## glm(formula = fallecido ~ TABAQUISMO + ASMA + OBESIDAD + EPOC +
## NEUMONIA + SEXO + DIABETES + INMUSUPR + HIPERTENSION + OTRA_COM +
## CARDIOVASCULAR, family = "binomial", data = data.covid)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.697 -0.213 -0.213 -0.169 3.079
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.65069 0.03356 79.0 <2e-16 ***
## TABAQUISMONO 0.15891 0.00972 16.4 <2e-16 ***
## ASMANO 0.32393 0.01868 17.3 <2e-16 ***
## OBESIDADNO -0.13400 0.00638 -21.0 <2e-16 ***
## EPOCNO -0.86714 0.01588 -54.6 <2e-16 ***
## NEUMONIANO -3.15661 0.00496 -636.9 <2e-16 ***
## SEXOHOMBRE 0.47614 0.00505 94.2 <2e-16 ***
## DIABETESNO -0.71343 0.00598 -119.3 <2e-16 ***
## INMUSUPRNO -0.44794 0.02017 -22.2 <2e-16 ***
## HIPERTENSIONNO -0.90900 0.00570 -159.4 <2e-16 ***
## OTRA_COMNO -0.73366 0.01340 -54.8 <2e-16 ***
## CARDIOVASCULARNO -0.41923 0.01441 -29.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1908144 on 3564817 degrees of freedom
## Residual deviance: 1226088 on 3564806 degrees of freedom
## AIC: 1226112
##
## Number of Fisher Scoring iterations: 6
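The coefficients in the two summaries above are on the log-odds scale, so exponentiating them gives odds ratios. The sketch below does this for TABAQUISMONO only, copying the estimates and standard errors from the printed output. Since the coefficient is named TABAQUISMONO, the reference level is "SI" under R's default treatment contrasts. The variable names are illustrative.

```r
# Estimates and standard errors for TABAQUISMONO, copied from the summaries.
coef_tabaquismo_no <- c(model.lr = 0.09886, model.lr.bis = 0.15891)
se_tabaquismo_no   <- c(model.lr = 0.00827, model.lr.bis = 0.00972)

# Odds ratio of death for non-smokers relative to smokers (the reference level).
or_no_vs_si <- exp(coef_tabaquismo_no)

# Wald 95 % interval, computed on the log-odds scale and then exponentiated.
z <- qnorm(0.975)
or_ci <- cbind(
  lower = exp(coef_tabaquismo_no - z * se_tabaquismo_no),
  upper = exp(coef_tabaquismo_no + z * se_tabaquismo_no)
)

# Inverting gives the odds ratio for smokers relative to non-smokers.
or_si_vs_no <- 1 / or_no_vs_si

print(round(cbind(or_no_vs_si, or_ci, or_si_vs_no), 3))
```

Both odds ratios for non-smokers are above 1, with intervals that exclude 1, so with the other regressors held fixed the models estimate slightly lower odds of death for smokers. This is an association adjusted for the listed factors only, not evidence of a protective effect.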
In both models the effect of smoking is similar: the coefficient for TABAQUISMONO is positive, so, with the other factors held fixed, the estimated odds of death are slightly lower for a person who smokes than for one who does not. Let’s do the VIF and ANOVA analysis.
vif(model.lr)
## TABAQUISMO ASMA OBESIDAD SEXO DIABETES
## 1.02 1.01 1.05 1.02 1.23
## INMUSUPR HIPERTENSION OTRA_COM CARDIOVASCULAR
## 1.02 1.27 1.02 1.03
We can see that smoking has a VIF close to 1, so it is nearly orthogonal to the other factors in the model, at least as far as these COVID cases are concerned. Diabetes and hypertension have the highest values (1.23 and 1.27), which points to some correlation between them.
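To make the reading of these values concrete, the sketch below computes variance inflation factors by hand on simulated predictors, where one pair is deliberately correlated. It uses the textbook definition for linear predictors, 1 / (1 - R²) from regressing each predictor on the others. This is an illustration of the idea and is not a reimplementation of `vif` from `car`, which for a fitted glm works from the covariance of the coefficient estimates and so need not match it exactly. All data and names here are made up.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)   # deliberately correlated with x1
x3 <- rnorm(n)              # independent of the other two
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# VIF from the auxiliary regressions: 1 / (1 - R^2_j).
vif_aux <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
vif_inv <- diag(solve(cor(X)))

print(round(rbind(vif_aux, vif_inv), 3))
```

The correlated pair gets values noticeably above 1, while the independent predictor stays close to 1. That is the same pattern as diabetes and hypertension versus smoking in the table above.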
Now let’s see the ANOVA analysis.
anova(model.lr, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: fallecido
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 3564817 1908144
## TABAQUISMO 1 211 3564816 1907932 <2e-16 ***
## ASMA 1 106 3564815 1907826 <2e-16 ***
## OBESIDAD 1 19121 3564814 1888705 <2e-16 ***
## SEXO 1 17761 3564813 1870944 <2e-16 ***
## DIABETES 1 123542 3564812 1747401 <2e-16 ***
## INMUSUPR 1 3477 3564811 1743925 <2e-16 ***
## HIPERTENSION 1 63317 3564810 1680608 <2e-16 ***
## OTRA_COM 1 7396 3564809 1673212 <2e-16 ***
## CARDIOVASCULAR 1 2990 3564808 1670222 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
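Since every term in the table is significant, the p-values alone do not rank the factors. A rough way to compare them is the share of the explained deviance that each term accounts for, as sketched below with the values copied from the table. The terms are added sequentially, so each share depends on the order of entry; TABAQUISMO enters first, so its figure is not adjusted for any other factor. The variable names are illustrative.

```r
# Deviance attributed to each term, copied from the table above.
deviance_by_term <- c(
  TABAQUISMO = 211, ASMA = 106, OBESIDAD = 19121, SEXO = 17761,
  DIABETES = 123542, INMUSUPR = 3477, HIPERTENSION = 63317,
  OTRA_COM = 7396, CARDIOVASCULAR = 2990
)
null_deviance     <- 1908144
residual_deviance <- 1670222

# Total deviance explained by the model.
explained <- null_deviance - residual_deviance

# Share of the explained deviance per term, largest first.
share <- sort(deviance_by_term / explained, decreasing = TRUE)
print(round(100 * share, 2))
```

Diabetes and hypertension account for most of the explained deviance, while smoking accounts for less than 0.1% of it.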
The role of smoking in the lethality of COVID-19 is not clear. With the data provided by the Secretaría de Salud we could say that smoking is one of the less significant factors in the development of the COVID disease. Even so, the relation is worth further study.
It seems that the main factors in the development of severe COVID are:
* Diabetes and hypertension, which are correlated, as can be seen in the VIF analysis.
So we can conclude that the Mexican government should pay special attention to the control of diabetes, hypertension and obesity, and that smoking requires a more detailed analysis.
In the case of asthma, it could be that the medication contributes to a less severe COVID disease.