Covid Mortality vs Blood Type

Hugh Whelan

Updated on: 2020-06-06

Background

A study referenced at https://www.news-medical.net/news/20200603/Blood-group-type-may-affect-susceptibility-to-COVID-19-respiratory-failure.aspx reported:

"A lead SNP was also identified on chromosome 9 at the ABO blood group locus, and further analysis showed that A-positive participants were at a 45% increased [risk] for respiratory failure, while individuals with blood group O were at a 35% decreased risk for respiratory failure.

"The authors say that early clinical reports have suggested the ABO blood group system is involved in determining susceptibility to COVID-19 and has also been implicated in susceptibility to SARS-CoV-1."

As ABO distributions vary by country (see https://blogs.sas.com/content/iml/2014/11/07/distribution-of-blood-types.html), the question arises: Do blood type distribution differences help explain differences in Covid death rates across countries?

  • Cross-country analysis is inherently problematic because countries differ widely in the standards and efficiency with which they record Covid deaths.
    • Additional confounding factors are the timelines of infection and different policies relating to lockdowns and nursing homes.
  • The country mortality variable I use is Covid deaths as a percent of a country’s population aged 65 and older. Nearly all Covid deaths occur in this age group, so normalizing by this population adjusts for countries having different proportions of potentially vulnerable citizens.
  • I include only countries with at least 20 reported Covid deaths, an arbitrary cutoff (N=80).
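The mortality variable and inclusion rule above can be sketched in a few lines. This is a minimal Python illustration, not the author's R code (which is on Github). The `Country` record, its field names and the example numbers are hypothetical. The regression output further down reports values such as an intercept of 6.754e-04, while the region tables show percentages around 0.07%. That suggests `DeathPctPopGE65` is stored as a fraction rather than a percent, so this sketch also returns a fraction.

```python
from dataclasses import dataclass

MIN_DEATHS = 20  # inclusion cutoff used in the text (arbitrary, per the author)


@dataclass
class Country:
    name: str              # hypothetical record layout, for illustration only
    covid_deaths: int      # cumulative reported Covid deaths
    pop_65_plus: float     # population aged 65 and older


def death_frac_pop_ge65(c: Country) -> float:
    """Covid deaths as a fraction of the 65+ population (multiply by 100 for a percent)."""
    return c.covid_deaths / c.pop_65_plus


def eligible(countries: list[Country]) -> list[Country]:
    """Keep only countries with at least MIN_DEATHS reported Covid deaths."""
    return [c for c in countries if c.covid_deaths >= MIN_DEATHS]


# Hypothetical inputs, for illustration only.
sample = [
    Country("Country X", 1_500, 2_000_000),
    Country("Country Y", 12, 500_000),
]
for c in eligible(sample):
    print(c.name, f"{100 * death_frac_pop_ge65(c):.4f}%")  # prints: Country X 0.0750%
```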

Results

  • Contrary to the article cited above, I did not find that the prevalence of A+ and O blood types helped explain country differences in Covid death rates (see the last section below).
  • A principal component analysis of country blood type data illustrates that countries’ blood type distributions cluster by region.
  • That analysis highlighted that Asian countries stand out in having relatively high percentages of B+ blood types in their populations.
  • We know a priori that Asian countries have had relatively low Covid death rates. This may therefore be correlation without causation, but a country’s percentage of B+ blood types is a statistically significant “predictor” of its Covid death rate.
  • Again, this may be coincidental (or due to other factors such as mask usage), but within the US, Non-Hispanic Asians have had significantly fewer Covid deaths than their share of the population would predict.

There are existing studies on ABO blood types and Covid. One study possibly confirming lower risk for B+ patients is here; another study here indicates greater vulnerability for the B blood group.

Data

Blood type distribution data was sourced from http://www.rhesusnegative.net/themission/bloodtypefrequencies/. I hand-coded the “region” designation, and the resulting data set is available at https://docs.google.com/spreadsheets/d/1XsOS4B0PtQyK1xTSF5GaFWuVCo-tKSGWrXvhxF8e4Nk/edit?usp=sharing

Covid death data is from Johns Hopkins University and population data is from the World Bank (see code for links).

Code

The R-code that includes links to the data I used is on Github in case anyone wants to run the analysis themselves or validate the results.

Principal Component Differentiation Of Blood Group Distribution

Rick Wicklin did an interesting Principal Component (“PC”) analysis of country ABO distributions. I re-created the analysis, and the resulting PC differentiation of blood type distributions matches well with geographic regions. The elliptical region labels were added after the fact and are independent of the PC results. Note: Af=Africa, As=Asia, Au=“Australian Group”, Ca=Central America, Eu=Europe, La=Latin America, Me=Middle East, Na=North America.
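A principal component analysis of this kind can be sketched without any libraries. This is an illustration under stated assumptions, not a reproduction of Rick Wicklin's or the author's analysis. It assumes one row per country and one column per blood type percentage, and it performs covariance-based PCA using power iteration with deflation. The text does not say whether the original used covariance or correlation scaling. The signs of the loadings are arbitrary, which is why PC plots can look mirrored between tools.

```python
import math
import random


def center(rows):
    """Subtract each column's mean (PCA works on centered data)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[r[j] - means[j] for j in range(k)] for r in rows]


def covariance(x):
    """Sample covariance matrix of already-centered rows."""
    n, k = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(k)] for i in range(k)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def top_eigen(m, tol=1e-12, max_iter=10_000, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = normalize([rng.random() + 0.1 for _ in m])
    for _ in range(max_iter):
        w = normalize(mat_vec(m, v))
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return sum(a * b for a, b in zip(v, mat_vec(m, v))), v


def principal_components(rows, n_components=2):
    """Return [(variance, loading_vector), ...] for the leading components."""
    m = covariance(center(rows))
    comps = []
    for _ in range(n_components):
        lam, v = top_eigen(m)
        comps.append((lam, v))
        # Deflation: subtract this component so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return comps


def scores(rows, comps):
    """PC scores: each centered row projected onto each loading vector."""
    return [[sum(a * b for a, b in zip(r, v)) for _, v in comps] for r in center(rows)]


toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5], [0.0, -0.5]]  # illustration only
comps = principal_components(toy)
print([round(lam, 4) for lam, _ in comps])  # [0.6667, 0.1667]
```

Power iteration converges slowly when two eigenvalues are close. For real use, a library routine such as R's `prcomp` is preferable.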

We also know that Covid mortality appears to vary significantly by geographic region, with Asian mortality notably low compared with other regions.

Country Statistics For Covid Deaths/Population 65 & Older By Region
Region   N   Mean       Median    StdDev
Eu      35   0.08850%   0.0290%   0.1122%
La       8   0.10515%   0.0597%   0.1044%
Au       2   0.00276%   0.0028%   0.0002%
Me       9   0.06940%   0.0274%   0.0864%
As      11   0.00786%   0.0055%   0.0064%
Af       9   0.01561%   0.0066%   0.0124%
Na       2   0.16656%   0.1666%   0.0633%
Ca       4   0.06855%   0.0624%   0.0578%
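The statistics in this table are ordinary per-region summaries. Below is a minimal Python sketch, assuming the input is a list of (region, value) pairs. `statistics.stdev` uses the n - 1 denominator, as R's `sd()` does.

The sketch also serves as a consistency check on the table. The N-weighted average of the regional means works out to about 0.0675%. That matches the 6.754e-04 intercept of the principal component regression that follows, which is what one would expect if the PC scores are centered and the death rate is stored as a fraction.

```python
import statistics
from collections import defaultdict


def summarize_by_region(records):
    """records: iterable of (region, value) pairs.
    Returns {region: (n, mean, median, sample_sd)}; sd is NaN for a single country."""
    groups = defaultdict(list)
    for region, value in records:
        groups[region].append(value)
    summary = {}
    for region, vals in groups.items():
        sd = statistics.stdev(vals) if len(vals) > 1 else float("nan")
        summary[region] = (len(vals), statistics.mean(vals), statistics.median(vals), sd)
    return summary


def pooled_mean(counts_and_means):
    """Overall mean recovered from per-group (n, mean) pairs."""
    total_n = sum(n for n, _ in counts_and_means)
    return sum(n * m for n, m in counts_and_means) / total_n


# (N, mean %) taken from the table above.
TABLE = {
    "Eu": (35, 0.08850), "La": (8, 0.10515), "Au": (2, 0.00276), "Me": (9, 0.06940),
    "As": (11, 0.00786), "Af": (9, 0.01561), "Na": (2, 0.16656), "Ca": (4, 0.06855),
}
print(f"{pooled_mean(list(TABLE.values())):.5f}%")  # prints 0.06754%
```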

We can do a regression of country Covid death rates against the first two principal components.

## 
## Call:
## lm(formula = DeathPctPopGE65 ~ PC1 + PC2, data = pcomps)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0012222 -0.0005390 -0.0002411  0.0002495  0.0034488 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.754e-04  9.644e-05   7.004 8.19e-10 ***
## PC1         -1.438e-04  5.643e-05  -2.549  0.01278 *  
## PC2          2.062e-04  6.431e-05   3.207  0.00196 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0008626 on 77 degrees of freedom
## Multiple R-squared:  0.1789, Adjusted R-squared:  0.1576 
## F-statistic: 8.391 on 2 and 77 DF,  p-value: 0.0005052

The adjusted R-squared of 16% tells us that the first two principal components explain only a relatively small portion of the variability in country Covid death rates. The Asian region scores highest on PC1 while also scoring low on PC2, and it corresponds to the lowest Covid death rates. This is consistent with the negative PC1 coefficient and the positive PC2 coefficient.
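The regression above is ordinary least squares with an intercept and two predictors. The dependency-free Python sketch below is an illustration, not the author's R code. It solves the normal equations and reports R-squared and adjusted R-squared.

Plugging the reported R-squared of 0.1789, n = 80 and p = 2 into the adjusted formula, 1 - (1 - R²)(n - 1)/(n - p - 1), reproduces the 0.1576 shown above. Solving the normal equations directly is fine for two predictors, but it is less numerically stable than the QR decomposition used by R's `lm()`.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def ols(y, predictors):
    """Fit y ~ 1 + predictors[0] + predictors[1] + ...; return (coefficients, r2, adj_r2)."""
    n, p = len(y), len(predictors)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = p + 1
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return beta, r2, adjusted_r2(r2, n, p)


print(round(adjusted_r2(0.1789, 80, 2), 4))  # 0.1576, as in the lm() summary
```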

The principal component graph shows that the vector that most differentiates the Asian region is its higher percentage of B+ blood types.

Country Statistics On % B+ By Region
Region   N   Mean     Median   StdDev
Eu      35   12.28%   12.00%   4.119%
La       8    7.28%    7.90%   3.043%
Au       2    8.50%    8.50%   0.707%
Me       9   17.92%   17.00%   5.309%
As      11   28.30%   27.36%   6.903%
Af       9   17.20%   17.46%   3.919%
Na       2    8.05%    8.05%   0.636%
Ca       4   10.54%    8.74%   4.303%
## 
## Call:
## lm(formula = DeathPctPopGE65 ~ Bp, data = bloodC)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0009749 -0.0005408 -0.0003456  0.0003401  0.0034811 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0013668  0.0002161   6.324 1.47e-08 ***
## Bp          -0.0046451  0.0012943  -3.589 0.000578 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0008762 on 78 degrees of freedom
## Multiple R-squared:  0.1417, Adjusted R-squared:  0.1307 
## F-statistic: 12.88 on 1 and 78 DF,  p-value: 0.000578

A simple regression of country Covid death rate on the country percentage of B+ blood type alone shows B+ to be a significant explanatory variable (B+ coefficient t-statistic of -3.59; adjusted R-squared of 13%). It captures most of the explanatory power of the first two principal components, whose adjusted R-squared is 16%.
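With a single predictor, the fit has a closed form, and the t-statistic is just the slope divided by its standard error: -0.0046451 / 0.0012943 ≈ -3.59, matching the `Bp` row above. The Python sketch below illustrates that computation; it is not the author's code. It omits the p-value, which needs a t-distribution CDF that the standard library does not provide.

Judging by the size of the coefficient, `Bp` appears to be stored as a fraction (e.g., 0.28 rather than 28%).

```python
import math


def simple_regression(x, y):
    """OLS fit of y = a + b*x. Returns (a, b, se_b, t_b, adj_r2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    se_b = math.sqrt(ss_res / (n - 2) / sxx)  # residual variance uses n - 2 df
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return a, b, se_b, b / se_b, adj_r2


# t-statistic from the reported Bp estimate and standard error.
print(round(-0.0046451 / 0.0012943, 2))  # -3.59
```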

Confirming Data That B+ Blood Types May Be Protective?

I am sure someone with more knowledge and better data sets (presumably clinical patient data) could do a better analysis of this question (see the studies cited above in the Results section). As a simple follow-up, I looked at US CDC death data by race. The data at https://www.cdc.gov/nchs/nvss/vsrr/covid_weekly/index.htm#Race_Hispanic show that Non-Hispanic Asians account for a little less than 50% of the Covid deaths that would be expected from their share of the population.
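The "expected deaths" comparison is a simple ratio: a group's actual deaths divided by the deaths it would have if its share of deaths matched its share of the population. The expected figure is total deaths times the group's population share.

A Python sketch follows. The numbers in the usage line are hypothetical placeholders, not CDC figures. A ratio below 1 means fewer deaths than the population share predicts, so the CDC data described above correspond to a ratio a little under 0.5.

```python
def observed_to_expected(group_deaths, total_deaths, group_pop, total_pop):
    """Ratio of a group's actual deaths to the deaths expected from its population share."""
    expected = total_deaths * (group_pop / total_pop)
    return group_deaths / expected


# Hypothetical numbers for illustration only (not CDC data):
# a group with 6% of the population but 2.7% of deaths.
print(round(observed_to_expected(2_700, 100_000, 6_000_000, 100_000_000), 2))  # 0.45
```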

Simple Analysis Of Claim Regarding A+ vs O Blood Group

The article cited above in the Background section suggested that patients with A+ blood were at higher risk than those in the O blood group. At the country level, I did not find any statistically meaningful relationship between several versions of a country’s A-minus-O spread and its Covid death rate. In fact, all three coefficients were negative, the opposite of the expected direction.

## 
## Call:
## lm(formula = DeathPctPopGE65 ~ AOSprd, data = bloodC)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0007606 -0.0005797 -0.0003919  0.0000030  0.0037983 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0006345  0.0001211   5.238 1.34e-06 ***
## AOSprd      -0.0004060  0.0005929  -0.685    0.495    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.000943 on 78 degrees of freedom
## Multiple R-squared:  0.005976,   Adjusted R-squared:  -0.006768 
## F-statistic: 0.469 on 1 and 78 DF,  p-value: 0.4955
## 
## Call:
## lm(formula = DeathPctPopGE65 ~ ApOpSprd, data = bloodC)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0007622 -0.0005714 -0.0003878 -0.0000030  0.0038006 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0006359  0.0001210   5.254 1.26e-06 ***
## ApOpSprd    -0.0004141  0.0006226  -0.665    0.508    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0009431 on 78 degrees of freedom
## Multiple R-squared:  0.00564,    Adjusted R-squared:  -0.007108 
## F-statistic: 0.4424 on 1 and 78 DF,  p-value: 0.5079
## 
## Call:
## lm(formula = DeathPctPopGE65 ~ ApOSprd, data = bloodC)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0008324 -0.0005567 -0.0003908  0.0000684  0.0037997 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0005712  0.0001383   4.129 9.07e-05 ***
## ApOSprd     -0.0007479  0.0006475  -1.155    0.252    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0009378 on 78 degrees of freedom
## Multiple R-squared:  0.01682,    Adjusted R-squared:  0.004213 
## F-statistic: 1.334 on 1 and 78 DF,  p-value: 0.2516
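The text does not define the three spread variables in the output. Judging only from their names, they are plausibly:

- `AOSprd`: total A share minus total O share
- `ApOpSprd`: A+ minus O+
- `ApOSprd`: A+ minus total O

The Python sketch below computes them under that assumption, using a hypothetical dict of blood type fractions. If A+ raises risk and O lowers it, all three coefficients would be expected to be positive. The estimates above are all negative, and none is significant (p = 0.495, 0.508 and 0.252).

```python
def ao_spreads(freq):
    """freq maps blood types ('A+', 'A-', 'O+', 'O-', ...) to population fractions.
    ASSUMED definitions, inferred only from the variable names in the R output."""
    a_total = freq["A+"] + freq["A-"]
    o_total = freq["O+"] + freq["O-"]
    return {
        "AOSprd": a_total - o_total,          # all A minus all O
        "ApOpSprd": freq["A+"] - freq["O+"],  # A+ minus O+
        "ApOSprd": freq["A+"] - o_total,      # A+ minus all O
    }


# Estimates reported above; a positive sign was expected under the A+/O hypothesis.
REPORTED = {"AOSprd": -0.0004060, "ApOpSprd": -0.0004141, "ApOSprd": -0.0007479}
print(all(coef < 0 for coef in REPORTED.values()))  # True: every sign is opposite to expectation

# Hypothetical country, for illustration only.
example = {"A+": 0.30, "A-": 0.05, "O+": 0.35, "O-": 0.07,
           "B+": 0.15, "B-": 0.02, "AB+": 0.05, "AB-": 0.01}
print({k: round(v, 2) for k, v in ao_spreads(example).items()})
```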