Abstract

This is the second in a series of documents that try to explain the correlation between COVID-19 lethality and various factors in Mexico.

This document explores the correlation between COVID-19 lethality and smoking (tabaquismo) as a comorbidity. At first sight one would expect that, because of the effects of smoking on the respiratory system, lethality among smokers should be higher than among cases with no comorbidity. Nevertheless, this document tries to take an objective look at this correlation.

Development and calculations

As in the first document, records whose comorbidities and conditions are not fully known are discarded.

library(data.table)  # fread() and data.table subsetting
library(dplyr)       # %>%, filter(), mutate()
library(car)         # vif()

#Clean the data: drop the cases where the predictors are not well defined
#(codes 97, 98 and 99).
cleanDataCOVID<-function(dataToClean){
  invalidCases<-c(97,98,99)
  #Return the filtered data explicitly instead of relying on the value of an assignment
  dataToClean %>% filter(! ASMA %in% invalidCases) %>%
    filter(! OBESIDAD %in% invalidCases) %>%
    filter(! NEUMONIA %in% invalidCases) %>%
    filter(! DIABETES %in% invalidCases) %>%
    filter(! EPOC %in% invalidCases) %>%
    filter(! INMUSUPR %in% invalidCases) %>%
    filter(! HIPERTENSION %in% invalidCases) %>%
    filter(! CARDIOVASCULAR %in% invalidCases) %>%
    filter(! TABAQUISMO %in% invalidCases) %>%
    filter(! OTRA_COM %in% invalidCases) %>%
    filter(! RENAL_CRONICA %in% invalidCases)
}

data.covid<-fread("210921COVID19MEXICO.csv", encoding = "UTF-8")

#The positive cases are those confirmed by ASOCIACIÓN CLÍNICA EPIDEMIOLÓGICA (1), by the COMITÉ DE DICTAMINACIÓN (2), or by test (3).
data.covid<-data.covid[CLASIFICACION_FINAL %in% c(1,2,3),]

data.covid<-cleanDataCOVID(data.covid)

#Formatting of predictors
data.covid<-mutate(data.covid,fallecido=ifelse((FECHA_DEF=="9999-99-99"),0,1))
data.covid$fallecido<-factor(data.covid$fallecido,labels = c("NO","SI"))

data.covid$CLASIFICACION_FINAL<-factor(data.covid$CLASIFICACION_FINAL,
                                 labels = c("CONFIRMADO_ASOCIACION_EPIDEMIOLOGICA",
                                            "CONFIRMADO_COMITE",
                                            "CONFIRMADO"))

data.covid$SECTOR<-factor(data.covid$SECTOR,labels = c("CRUZ ROJA","DIF","ESTATAL","IMSS",
                                           "IMSS-BIENESTAR","ISSSTE","MUNICIPAL",
                                           "PEMEX","PRIVADA","SEDENA","SEMAR",
                                           "SSA","UNIVERSITARIO","NO DEFINIDO",
                                           "NO ESPECIFICADO"))

data.covid$SEXO<-factor(data.covid$SEXO, labels = c("MUJER","HOMBRE"))
data.covid$TIPO_PACIENTE<-factor(data.covid$TIPO_PACIENTE,levels = c(1,2,99),labels = c("AMBULATORIO","HOSPITALIZADO","NO ESPECIFICADO"))
data.covid<-setStates(data.covid,"ENTIDAD_UM")
data.covid<-setStates(data.covid,"ENTIDAD_RES")
data.covid<-setStates(data.covid,"ENTIDAD_NAC")

data.covid<-arrangeYesNOFactor(data.covid,"TABAQUISMO")
data.covid<-arrangeYesNOFactor(data.covid,"ASMA")
data.covid<-arrangeYesNOFactor(data.covid,"OBESIDAD")
data.covid<-arrangeYesNOFactor(data.covid,"INDIGENA")
data.covid<-arrangeYesNOFactor(data.covid,"INTUBADO")
data.covid<-arrangeYesNOFactor(data.covid,"NEUMONIA")
data.covid<-arrangeYesNOFactor(data.covid,"EMBARAZO")
data.covid<-arrangeYesNOFactor(data.covid,"DIABETES")
data.covid<-arrangeYesNOFactor(data.covid,"EPOC")
data.covid<-arrangeYesNOFactor(data.covid,"INMUSUPR")
data.covid<-arrangeYesNOFactor(data.covid,"HIPERTENSION")
data.covid<-arrangeYesNOFactor(data.covid,"OTRA_COM")
data.covid<-arrangeYesNOFactor(data.covid,"CARDIOVASCULAR")
data.covid<-arrangeYesNOFactor(data.covid,"UCI")

data.covid.deceased<-data.covid[fallecido=="SI",]
data.covid.interned<-data.covid[TIPO_PACIENTE=="HOSPITALIZADO",]

## Data to consider
date.records<-data.covid$FECHA_ACTUALIZACION[[1]]
  
total.positive.cases<-nrow(data.covid)

total.hospitalized.cases<-nrow(data.covid.interned)
total.deceased.cases<-nrow(data.covid.deceased)

rate.lethality<-total.deceased.cases/total.positive.cases

rate.hospitalization<-total.hospitalized.cases/total.positive.cases

rate.lethality*100
## [1] 7.55
rate.hospitalization*100
## [1] 15.7
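The helpers setStates and arrangeYesNOFactor are used above but are not defined in this document. The following is a minimal sketch of what a yes/no recoding helper might look like, assuming the data set codes 1 as yes and 2 as no and that SI is the first factor level (which is what the NO-suffixed coefficient names in the regression output suggest). The name arrangeYesNOFactorSketch is hypothetical and the original implementation may differ.

```r
# Hypothetical sketch, not the author's implementation.
# Recodes a 1/2 column into a factor with levels SI (1) and NO (2);
# any other code becomes NA.
arrangeYesNOFactorSketch <- function(data, column) {
  data[[column]] <- factor(data[[column]], levels = c(1, 2),
                           labels = c("SI", "NO"))
  data
}

# Toy example with invented values
toy <- data.frame(TABAQUISMO = c(1, 2, 2, 1))
toy <- arrangeYesNOFactorSketch(toy, "TABAQUISMO")
print(table(toy$TABAQUISMO))
```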

As of 2021-09-21 there had been a total of 3564818 confirmed cases in Mexico, with 560058 people admitted to hospital and 269139 deceased, giving a lethality of 7.55 % and a hospitalization rate of 15.71 %.

The lethality for smokers is:

total.smokers.deceased<-sum(data.covid.deceased$TABAQUISMO=="SI")
total.notsmokers.deceased<-sum(data.covid.deceased$TABAQUISMO=="NO")
total.notsmokers<-sum(data.covid$TABAQUISMO=="NO")
total.smokers<-sum(data.covid$TABAQUISMO=="SI")

rate.smokers.lethality<-total.smokers.deceased/total.smokers

\[ \frac{\text{total.smokers.deceased}}{\text{total.smokers}} \times 100 = \frac{20121}{240202} \times 100 = 8.377\,\% \]

The lethality rate for smokers is closer to the general rate than that of other factors. A binomial test against the general lethality rate gives:

binom.test(total.smokers.deceased,total.smokers,p=rate.lethality)
## 
##  Exact binomial test
## 
## data:  total.smokers.deceased and total.smokers
## number of successes = 20121, number of trials = 2e+05, p-value <2e-16
## alternative hypothesis: true probability of success is not equal to 0.0757
## 95 percent confidence interval:
##  0.0827 0.0849
## sample estimates:
## probability of success 
##                 0.0838
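As a cross-check, the test above can be reproduced from the published counts alone, without loading the full data set. The sketch below assumes the general lethality rate is 269139/3564818, as reported in this document; the variable names are illustrative.

```r
# Counts reported in this document (cleaned data, 2021-09-21 cut)
smokers_deceased <- 20121
smokers_total    <- 240202
general_rate     <- 269139 / 3564818   # general lethality, about 0.0755

# Exact binomial test of the smokers' lethality against the general rate
res <- binom.test(smokers_deceased, smokers_total, p = general_rate)

smokers_rate <- unname(res$estimate)   # observed lethality among smokers
ci           <- res$conf.int           # 95 % confidence interval

print(round(smokers_rate * 100, 3))
print(round(ci * 100, 2))

# Is the general rate inside the interval?
general_inside <- general_rate >= ci[1] && general_rate <= ci[2]
print(general_inside)
```

With samples this large the interval is very narrow, so even a difference of less than one percentage point is statistically significant. Whether it is practically relevant is a separate question.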

The p-value is below 2e-16 and the general lethality rate (7.55 %) lies outside the confidence interval. Even so, the excess lethality among smokers is smaller than for other factors. A chi-squared test of lethality against smoking gives:

chisq.test(data.covid$fallecido,data.covid$TABAQUISMO)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data.covid$fallecido and data.covid$TABAQUISMO
## X-squared = 240, df = 1, p-value <2e-16

The chi-squared test suggests an association between lethality and smoking.
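The same kind of test can be sketched from a 2x2 contingency table. The smokers' counts below come from this document; the non-smokers' counts are obtained by subtraction, which assumes every cleaned record is either a smoker or a non-smoker. Because of that assumption, the statistic may differ somewhat from the output above.

```r
# Counts reported in this document
total_cases      <- 3564818
total_deceased   <- 269139
smokers_total    <- 240202
smokers_deceased <- 20121

# Non-smoker cells derived by subtraction (an assumption, see the lead-in)
nonsmokers_total    <- total_cases - smokers_total
nonsmokers_deceased <- total_deceased - smokers_deceased

# 2x2 table: rows are smoking status, columns are outcome
tab <- matrix(
  c(smokers_deceased,    smokers_total - smokers_deceased,
    nonsmokers_deceased, nonsmokers_total - nonsmokers_deceased),
  nrow = 2, byrow = TRUE,
  dimnames = list(TABAQUISMO = c("SI", "NO"), fallecido = c("SI", "NO"))
)

# Pearson chi-squared test; correct = TRUE applies Yates' continuity
# correction, which is the default for 2x2 tables
chi <- chisq.test(tab, correct = TRUE)
print(chi$statistic)
print(chi$p.value)
```

A significant result only indicates that the two variables are not independent. It says nothing about the strength of the association or about causality.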

Let’s analyze the hospitalization rate.

\[ \frac{\text{total.smokers.interned}}{\text{total.smokers}} \times 100 = \frac{40171}{240202} \times 100 = 16.724\,\% \]

Running the binomial and chi-squared tests:

binom.test(total.smokers.interned,total.smokers,p=rate.hospitalization)
## 
##  Exact binomial test
## 
## data:  total.smokers.interned and total.smokers
## number of successes = 40171, number of trials = 2e+05, p-value <2e-16
## alternative hypothesis: true probability of success is not equal to 0.158
## 95 percent confidence interval:
##  0.166 0.169
## sample estimates:
## probability of success 
##                  0.167
chisq.test(data.covid$fallecido,data.covid$TABAQUISMO)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data.covid$fallecido and data.covid$TABAQUISMO
## X-squared = 240, df = 1, p-value <2e-16

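The general hospitalization rate (15.711 %) falls outside the confidence interval (16.6 % to 16.9 %), so smokers are hospitalized more often than the confirmed cases in general. Note that the chi-squared output above was computed on fallecido and TABAQUISMO, so it repeats the lethality test rather than testing hospitalization.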

These results seem to show a clear relation between COVID-19 lethality and smoking, but there is another factor to consider. Let’s explore the prevalence of obesity among smokers.

total.smokers.with.obesity <- nrow(data.covid[OBESIDAD=="SI" & TABAQUISMO=="SI",])
rate.smokers.with.obesity <-total.smokers.with.obesity/total.smokers

rate.smokers<-total.smokers/total.positive.cases

The proportion of smokers who also have obesity is 21.287 %, while smokers make up 6.738 % of all confirmed COVID cases. A sizeable share of smokers is therefore also obese, so obesity could be confounding the result. Is it really justified to say that smoking increases the lethality rate? Let’s go further and look at the lethality rate among smokers without obesity.

total.smokers.without.obesity<-nrow(data.covid[OBESIDAD=="NO" & TABAQUISMO=="SI",])
total.smokers.without.obesity.deceased<-nrow(data.covid[OBESIDAD=="NO" & TABAQUISMO=="SI" & fallecido=="SI",])
rate.lethality.smokers.without.obesity<-total.smokers.without.obesity.deceased/total.smokers.without.obesity

\[ \frac{\text{total.smokers.without.obesity.deceased}}{\text{total.smokers.without.obesity}} \times 100 = \frac{13799}{188344} \times 100 = 7.326\,\% \]

This is quite close to the general lethality rate of 7.55 %. Let’s run a binomial test to see whether the general rate falls inside the confidence interval.

binom.test(total.smokers.without.obesity.deceased,total.smokers.without.obesity,p=rate.lethality)
## 
##  Exact binomial test
## 
## data:  total.smokers.without.obesity.deceased and total.smokers.without.obesity
## number of successes = 13799, number of trials = 2e+05, p-value = 2e-04
## alternative hypothesis: true probability of success is not equal to 0.0755
## 95 percent confidence interval:
##  0.0721 0.0745
## sample estimates:
## probability of success 
##                 0.0733

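The p-value is 2e-04 (0.0002), and the lethality rate for smokers without obesity is below the general lethality rate: the general rate (7.55 %) lies above the confidence interval (7.21 % to 7.45 %).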
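The stratified comparison used here can be generalised to any pair of yes/no factors. The following sketch computes the lethality for every combination of smoking and obesity on a tiny made-up data frame. The function name lethalityByStrata and the toy records are illustrative only and are not taken from the real data.

```r
# Hypothetical helper: lethality (share of deceased) per combination of
# two factors. Column names follow the data set used in this document.
lethalityByStrata <- function(data, factorA, factorB) {
  deceased <- data$fallecido == "SI"
  # tapply over two grouping variables returns a matrix of group means
  tapply(deceased, list(data[[factorA]], data[[factorB]]), mean)
}

# Toy records, invented for illustration only
toy <- data.frame(
  TABAQUISMO = c("SI", "SI", "SI", "SI", "NO", "NO", "NO", "NO"),
  OBESIDAD   = c("SI", "SI", "NO", "NO", "SI", "SI", "NO", "NO"),
  fallecido  = c("SI", "NO", "NO", "NO", "SI", "SI", "NO", "SI")
)

# Rows are TABAQUISMO levels, columns are OBESIDAD levels
rates <- lethalityByStrata(toy, "TABAQUISMO", "OBESIDAD")
print(rates)
```

Comparing the rows within each column shows the effect of smoking at a fixed obesity status, which is the comparison that the single overall rate hides.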

So it seems that other factors affect lethality, and with the analysis so far we cannot conclude whether or not there is a correlation between lethality and smoking. To clarify this, let’s fit a logistic regression and use an ANOVA-style analysis of deviance and the variance inflation factor (VIF) to examine the relation more closely.

model.lr<-glm(fallecido~TABAQUISMO+ASMA+OBESIDAD+SEXO+DIABETES+INMUSUPR+HIPERTENSION+OTRA_COM+CARDIOVASCULAR,data.covid,family = "binomial")

summary(model.lr)
## 
## Call:
## glm(formula = fallecido ~ TABAQUISMO + ASMA + OBESIDAD + SEXO + 
##     DIABETES + INMUSUPR + HIPERTENSION + OTRA_COM + CARDIOVASCULAR, 
##     family = "binomial", data = data.covid)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.179  -0.332  -0.332  -0.245   2.809  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       1.21818    0.02727    44.7   <2e-16 ***
## TABAQUISMONO      0.09886    0.00827    11.9   <2e-16 ***
## ASMANO            0.33989    0.01650    20.6   <2e-16 ***
## OBESIDADNO       -0.32666    0.00548   -59.6   <2e-16 ***
## SEXOHOMBRE        0.61921    0.00437   141.6   <2e-16 ***
## DIABETESNO       -1.12696    0.00512  -220.2   <2e-16 ***
## INMUSUPRNO       -0.79781    0.01714   -46.5   <2e-16 ***
## HIPERTENSIONNO   -1.22533    0.00493  -248.4   <2e-16 ***
## OTRA_COMNO       -0.99308    0.01119   -88.7   <2e-16 ***
## CARDIOVASCULARNO -0.67433    0.01191   -56.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1908144  on 3564817  degrees of freedom
## Residual deviance: 1670222  on 3564808  degrees of freedom
## AIC: 1670242
## 
## Number of Fisher Scoring iterations: 6

We include all the biological factors in the data provided by the Secretaría de Salud except EPOC (COPD) and pneumonia, because it is not clear whether they are themselves a result of COVID. We can, nevertheless, fit a model that includes these regressors.

model.lr.bis<-glm(fallecido~TABAQUISMO+ASMA+OBESIDAD+EPOC+NEUMONIA+SEXO+DIABETES+INMUSUPR+HIPERTENSION+OTRA_COM+CARDIOVASCULAR,data.covid,family = "binomial")

summary(model.lr.bis)
## 
## Call:
## glm(formula = fallecido ~ TABAQUISMO + ASMA + OBESIDAD + EPOC + 
##     NEUMONIA + SEXO + DIABETES + INMUSUPR + HIPERTENSION + OTRA_COM + 
##     CARDIOVASCULAR, family = "binomial", data = data.covid)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.697  -0.213  -0.213  -0.169   3.079  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       2.65069    0.03356    79.0   <2e-16 ***
## TABAQUISMONO      0.15891    0.00972    16.4   <2e-16 ***
## ASMANO            0.32393    0.01868    17.3   <2e-16 ***
## OBESIDADNO       -0.13400    0.00638   -21.0   <2e-16 ***
## EPOCNO           -0.86714    0.01588   -54.6   <2e-16 ***
## NEUMONIANO       -3.15661    0.00496  -636.9   <2e-16 ***
## SEXOHOMBRE        0.47614    0.00505    94.2   <2e-16 ***
## DIABETESNO       -0.71343    0.00598  -119.3   <2e-16 ***
## INMUSUPRNO       -0.44794    0.02017   -22.2   <2e-16 ***
## HIPERTENSIONNO   -0.90900    0.00570  -159.4   <2e-16 ***
## OTRA_COMNO       -0.73366    0.01340   -54.8   <2e-16 ***
## CARDIOVASCULARNO -0.41923    0.01441   -29.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1908144  on 3564817  degrees of freedom
## Residual deviance: 1226088  on 3564806  degrees of freedom
## AIC: 1226112
## 
## Number of Fisher Scoring iterations: 6
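The coefficients are on the log-odds scale, and because every comorbidity term ends in NO, the reference level is SI. Exponentiating a coefficient therefore gives the odds ratio of death for cases without the condition relative to cases with it. The sketch below does this for the TABAQUISMONO estimates printed in the two summaries, with a Wald 95 % interval. The helper name oddsRatio is illustrative.

```r
# Odds ratio and Wald confidence interval from a logit coefficient
oddsRatio <- function(estimate, std.error, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(OR    = exp(estimate),
    lower = exp(estimate - z * std.error),
    upper = exp(estimate + z * std.error))
}

# TABAQUISMONO as reported by summary(model.lr) and summary(model.lr.bis)
or.lr     <- oddsRatio(0.09886, 0.00827)
or.lr.bis <- oddsRatio(0.15891, 0.00972)

print(round(or.lr, 3))
print(round(or.lr.bis, 3))

# The odds ratio for smokers relative to non-smokers is the reciprocal
print(round(1 / or.lr[["OR"]], 3))
```

Both intervals lie above 1, so in these models non-smokers have slightly higher adjusted odds of death than smokers.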

In both models the effect of smoking is similar: the positive TABAQUISMONO coefficient means that non-smokers have higher odds of death, so the probability of death appears to decrease when the patient smokes. Let’s do the VIF and ANOVA analyses.

vif(model.lr)
##     TABAQUISMO           ASMA       OBESIDAD           SEXO       DIABETES 
##           1.02           1.01           1.05           1.02           1.23 
##       INMUSUPR   HIPERTENSION       OTRA_COM CARDIOVASCULAR 
##           1.02           1.27           1.02           1.03

We can see that smoking is close to an orthogonal factor (VIF of 1.02): it shows almost no correlation with the other factors, at least as far as COVID is concerned. Diabetes (1.23) and hypertension (1.27) have the highest values, which points to some correlation between them.
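For a linear model, the VIF of a predictor is related to the R² obtained by regressing that predictor on all the others through VIF = 1/(1 - R²). As far as we know, car::vif derives its value for a glm from the correlation matrix of the coefficient estimates, so the conversion below is only an approximate way to read the numbers printed above.

```r
# VIF values as printed by vif(model.lr)
vif.values <- c(TABAQUISMO = 1.02, ASMA = 1.01, OBESIDAD = 1.05, SEXO = 1.02,
                DIABETES = 1.23, INMUSUPR = 1.02, HIPERTENSION = 1.27,
                OTRA_COM = 1.02, CARDIOVASCULAR = 1.03)

# Invert VIF = 1 / (1 - R^2) to get the implied R^2 of each predictor
# against the remaining ones (linear-model reading, see the caveat above)
implied.r2 <- 1 - 1 / vif.values

print(round(sort(implied.r2, decreasing = TRUE), 3))
```

Read this way, even the largest values correspond to a modest share of variance shared with the other predictors, which is well below the levels usually considered problematic for collinearity.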

Now let’s look at the ANOVA (analysis of deviance) table.

anova(model.lr, test="Chisq")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: fallecido
## 
## Terms added sequentially (first to last)
## 
## 
##                Df Deviance Resid. Df Resid. Dev Pr(>Chi)    
## NULL                         3564817    1908144             
## TABAQUISMO      1      211   3564816    1907932   <2e-16 ***
## ASMA            1      106   3564815    1907826   <2e-16 ***
## OBESIDAD        1    19121   3564814    1888705   <2e-16 ***
## SEXO            1    17761   3564813    1870944   <2e-16 ***
## DIABETES        1   123542   3564812    1747401   <2e-16 ***
## INMUSUPR        1     3477   3564811    1743925   <2e-16 ***
## HIPERTENSION    1    63317   3564810    1680608   <2e-16 ***
## OTRA_COM        1     7396   3564809    1673212   <2e-16 ***
## CARDIOVASCULAR  1     2990   3564808    1670222   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
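One way to read the table is to express each term's deviance reduction as a share of the total reduction. The sketch below uses the values printed above. Since the terms are added sequentially, the shares depend on the order of the terms in the formula, so the ranking is indicative only.

```r
# Deviance reduction per term, copied from the analysis of deviance table
deviance.by.term <- c(TABAQUISMO = 211, ASMA = 106, OBESIDAD = 19121,
                      SEXO = 17761, DIABETES = 123542, INMUSUPR = 3477,
                      HIPERTENSION = 63317, OTRA_COM = 7396,
                      CARDIOVASCULAR = 2990)

# Share of the total explained deviance attributed to each term (percent)
share <- 100 * deviance.by.term / sum(deviance.by.term)

print(round(sort(share, decreasing = TRUE), 2))
```

Refitting the model with TABAQUISMO as the last term would show how much deviance smoking explains once all the other factors are already in the model.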

Conclusion

The role of smoking in the lethality of COVID-19 is not clear. With the data provided by the Secretaría de Salud we could say that smoking is one of the less significant factors in the development of the COVID disease. Even so, the relation is worth further study.

It seems that the main factors in the development of severe COVID are diabetes, hypertension and obesity. Diabetes and hypertension are correlated, as can be seen in the VIF analysis.

So we can conclude that the Mexican government should pay special attention to the control of diabetes, hypertension and obesity, and that smoking requires a more detailed analysis.

In the case of asthma, it could be that the medication contributes to a less severe course of the COVID disease.