On Covid-19 data from Brazil's National Surveillance System for Severe Acute Respiratory Illness
Over the course of 2020, the world saw an unprecedented and severe pandemic caused by the novel coronavirus (SARS-CoV-2). This challenged every surveillance system to produce and curate data about the pandemic. Brazil has one of the largest and most well-established public health systems, the Sistema Único de Saúde (SUS).
Most of the data on Severe Acute Respiratory Syndrome (SARS) cases due to SARS-CoV-2 have come from the national influenza surveillance system (SIVEP-Gripe). We compiled those data and ran logistic regression models to see how socio-demographic factors such as age, sex, race, region, and years of schooling are associated with the mortality outcome. We ran another series of logistic regression models to see how severity factors such as ICU admission, dyspnea, oxygen saturation, and ventilation support, as well as symptom factors such as fever, cough, sore throat, respiratory discomfort, and recent childbirth (puerperium), are associated with the same outcome.
SIVEP-Gripe makes publicly available anonymized data on every hospitalized case of Severe Acute Respiratory Illness (SARI), in both public and private hospitals. The data include socio-demographic factors as well as clinical factors, such as symptoms. SIVEP-Gripe has been widely used in Brazil to understand and work with Covid-19 SARS cases because it captures hospitalized SARI cases universally across the country.
We selected the confirmed Covid-19 cases from SIVEP-Gripe. From those, we kept the cases with complete information for age, sex, race, state, and years of schooling. The last three variables were transformed and/or aggregated. Race was summarized into 3 categories from the 5 original ones: we grouped Black and Brown people together, as well as Yellow and Indigenous people. The state variable was collapsed into the 5 administrative regions of Brazil: North, Northeast, Center-West, Southeast, and South. This aggregation joins states that are geographically closer and more similar in resources and access to health systems. Years of schooling were mapped to the equivalent levels of the International Standard Classification of Education (ISCED), with 2 additional levels for illiterate people and for those with a graduate-level education.
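The recoding described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the column names (`race`, `state`), the category labels, and the toy rows are hypothetical and are not the actual SIVEP-Gripe field names, and the state-to-region lookup is abbreviated to five states.

```r
# Hypothetical toy data; column names and labels are assumptions,
# not the actual SIVEP-Gripe fields.
cases <- data.frame(
  race  = c("White", "Black", "Brown", "Yellow", "Indigenous"),
  state = c("SP", "BA", "AM", "RS", "GO"),
  stringsAsFactors = FALSE
)

# Collapse the 5 original race categories into 3.
race_map <- c(
  White      = "White",
  Black      = "Black & Brown",
  Brown      = "Black & Brown",
  Yellow     = "Yellow & Indigenous",
  Indigenous = "Yellow & Indigenous"
)
cases$race3 <- unname(race_map[cases$race])

# Map states to administrative regions (only a few states shown here).
region_map <- c(
  SP = "Southeast", BA = "Northeast", AM = "North",
  RS = "South",     GO = "Center-West"
)
cases$region <- factor(unname(region_map[cases$state]))

# Put Southeast first so that it acts as the reference level in a model.
cases$region <- relevel(cases$region, ref = "Southeast")
```

A named lookup vector keeps the mapping explicit and easy to audit; a full version would list all 27 federative units.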
Socio-Demographic Data Summary:
library(gtsummary)  # also provides the %>% pipe
# Summarize the socio-demographic variables, one column per region
socio_demo_data %>%
  tbl_summary(by = region) %>%
  bold_labels()
| Characteristic | Center-West, N = 9,761¹ | North, N = 10,051¹ | Northeast, N = 8,434¹ | South, N = 31,221¹ | Southeast, N = 60,070¹ |
|---|---|---|---|---|---|
| outcome | |||||
| Death | 3,717 (38%) | 4,708 (47%) | 3,903 (47%) | 11,385 (37%) | 23,152 (39%) |
| Discharge | 6,021 (62%) | 5,265 (53%) | 4,476 (53%) | 19,765 (63%) | 36,815 (61%) |
| Unknown | 23 | 78 | 55 | 71 | 103 |
| age | 0 (-11, 11) | 1 (-11, 12) | 2 (-11, 14) | 2 (-10, 12) | 1 (-10, 12) |
| race | |||||
| Black & Brown | 5,930 (61%) | 8,475 (84%) | 6,804 (81%) | 3,257 (10%) | 18,717 (31%) |
| White | 3,602 (37%) | 1,328 (13%) | 1,513 (18%) | 27,754 (89%) | 40,740 (68%) |
| Yellow & Indigenous | 229 (2.3%) | 248 (2.5%) | 117 (1.4%) | 210 (0.7%) | 613 (1.0%) |
| sex | |||||
| Female | 4,654 (48%) | 4,676 (47%) | 4,070 (48%) | 14,629 (47%) | 28,577 (48%) |
| Male | 5,107 (52%) | 5,375 (53%) | 4,364 (52%) | 16,592 (53%) | 31,493 (52%) |
| school | |||||
| Graduate | 1,100 (11%) | 1,220 (12%) | 858 (10%) | 3,781 (12%) | 7,980 (13%) |
| High School | 2,712 (28%) | 2,812 (28%) | 2,158 (26%) | 8,565 (27%) | 17,775 (30%) |
| Illiterate | 702 (7.2%) | 1,341 (13%) | 1,460 (17%) | 1,314 (4.2%) | 3,338 (5.6%) |
| Pre School | 17 (0.2%) | 69 (0.7%) | 52 (0.6%) | 54 (0.2%) | 109 (0.2%) |
| Primary School | 5,230 (54%) | 4,609 (46%) | 3,906 (46%) | 17,507 (56%) | 30,868 (51%) |

¹ n (%); Median (IQR)

Age is expressed as a continuous variable normalized around its mean; the value of \(2\) corresponds to the mean of this normalized variable. The mean age of cases is 60 years, with a median of 63 years.
Severity data summary:
library(gtsummary)  # also provides the %>% pipe
# Summarize the severity and symptom variables over all cases
severity_data %>%
  tbl_summary() %>%
  bold_labels()
| Characteristic | N = 119,537¹ |
|---|---|
| outcome | |
| Death | 46,865 (39%) |
| Discharge | 72,342 (61%) |
| Unknown | 330 |
| uti | |
| 1 | 41,670 (35%) |
| 2 | 77,867 (65%) |
| suport_ven | |
| 1 | 25,539 (21%) |
| 2 | 70,407 (59%) |
| 3 | 23,591 (20%) |
| febre | |
| 1 | 72,616 (61%) |
| 2 | 46,921 (39%) |
| tosse | |
| 1 | 89,536 (75%) |
| 2 | 30,001 (25%) |
| garganta | |
| 1 | 27,387 (23%) |
| 2 | 92,150 (77%) |
| dispneia | |
| 1 | 92,015 (77%) |
| 2 | 27,522 (23%) |
| desc_resp | |
| 1 | 83,771 (70%) |
| 2 | 35,766 (30%) |
| saturacao | |
| 1 | 87,179 (73%) |
| 2 | 32,358 (27%) |
| puerpera | |
| 1 | 733 (0.6%) |
| 2 | 118,804 (99%) |
| vacina | |
| 1 | 42,045 (35%) |
| 2 | 77,492 (65%) |

¹ n (%)

All severity and symptom variables are binary, where \(1\) means yes and \(2\) means no for the status of the variable. The exception is ventilation support, variable “suport_ven”, where \(1\) means yes, invasive; \(2\) means yes, non-invasive; and \(3\) means no.
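The numeric codes can be turned into labeled factors before modeling. The sketch below uses two of the variable names from the table (`uti`, `suport_ven`) with made-up values; choosing "No" as the first (reference) level is an assumption made here for illustration, not something stated in the text.

```r
# Toy data with made-up values; only the variable names come from the table.
severity <- data.frame(
  uti        = c(1, 2, 2, 1),
  suport_ven = c(1, 2, 3, 3)
)

# Binary variables: 1 = yes, 2 = no.
# "No" is listed first so it becomes the reference level (an assumption).
severity$uti <- factor(severity$uti,
                       levels = c(2, 1),
                       labels = c("No", "Yes"))

# Ventilation support: 1 = invasive, 2 = non-invasive, 3 = none.
severity$suport_ven <- factor(severity$suport_ven,
                              levels = c(3, 1, 2),
                              labels = c("No", "Invasive", "Non-invasive"))
```

With labeled factors, the coefficient names in a fitted model read as, for example, `utiYes` rather than an opaque numeric code.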
We chose logistic regression models to capture how, and by how much, these socio-demographic variables are associated with the outcome of a hospitalization for severe acute respiratory illness with a confirmed RT-PCR test for Covid-19. We repeated the same models on the data without restricting to a positive test, that is, on the totality of SARI cases. These models are displayed in the Supplementary Materials.
We now construct a model that encompasses all possible pairwise interaction terms among the socio-demographic factors. The model has the following formula:
\[ \log\left(\frac{P(\text{Death})}{P(\text{Discharge})}\right) = \beta_0 + \beta_1(\text{Age}) + \beta_2(\text{Sex}) + \beta_3(\text{Race}) + \beta_4(\text{Region}) + \beta_5(\text{School}) + \sum_{i<j} \beta_{ij}\, x_i x_j \]
Here \(x_i\) and \(x_j\) range over the five factors above. Every variable except age, which is continuous, is categorical and is coded against a reference level: Female for sex, White for race, Southeast for region, and graduate level for school (the years of schooling described above). The last term of the equation collects all possible pairwise interaction terms. The model will have 150 degrees of freedom.
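In R, a logistic model with all main effects and all pairwise interactions can be written compactly with the `^2` formula operator. The sketch below is a minimal illustration on simulated data, not the model fitted in this analysis: the data frame `toy`, its values, and the restriction to three of the five factors are assumptions made to keep the example small.

```r
set.seed(1)
n <- 500

# Simulated stand-in data; not the SIVEP-Gripe cases.
toy <- data.frame(
  age  = rnorm(n, mean = 0, sd = 15),  # centered age
  sex  = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  race = factor(sample(c("White", "Black & Brown", "Yellow & Indigenous"),
                       n, replace = TRUE))
)

# Reference levels as described in the text: Female (already first) and White.
toy$race <- relevel(toy$race, ref = "White")

# Simulated outcome: 1 = death, 0 = discharge.
toy$death <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.03 * toy$age))

# (a + b + c)^2 expands to all main effects plus all pairwise interactions.
# The full model would use (age + sex + race + region + school)^2.
fit <- glm(death ~ (age + sex + race)^2, data = toy, family = binomial())

names(coef(fit))
```

Each categorical factor contributes one coefficient per non-reference level, and each interaction contributes the product of the two factors' coefficient counts, which is how the number of parameters grows quickly once all five factors are included.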
The following emerges from the model:
knitr::include_graphics("Plots/plot_sociodemografic_coefficients_covid.png")
Another model is fitted to the severity factors and/or symptoms present in the patient. Each factor is binary, with levels 1 and 2, except for ventilation support, which has 3 levels. For those factors we have the following odds ratios:
knitr::include_graphics("Plots/plot_severity_coefficients_covid.png")
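Odds ratios such as those in the plot are obtained from a logistic regression by exponentiating the coefficients, since each coefficient is a log odds ratio against the reference level. The following is a minimal sketch on simulated data with a single hypothetical predictor (`uti`, labeled "No"/"Yes"); the Wald intervals from `confint.default` are one common choice and are an assumption here, not necessarily the intervals used for the figure.

```r
set.seed(2)
n <- 400

# Simulated stand-in data; not the SIVEP-Gripe cases.
toy <- data.frame(
  uti = factor(sample(c("No", "Yes"), n, replace = TRUE),
               levels = c("No", "Yes"))
)
toy$death <- rbinom(n, size = 1,
                    prob = plogis(-1 + 1.2 * (toy$uti == "Yes")))

fit <- glm(death ~ uti, data = toy, family = binomial())

# Exponentiate coefficients and Wald confidence limits to get odds ratios.
or_table <- cbind(OR = exp(coef(fit)), exp(confint.default(fit)))
or_table
```

An odds ratio above 1 indicates higher odds of death relative to the reference level, and below 1 indicates lower odds.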