General outline

SafeGrams are a way to describe the safety characteristics of a treatment - drug, vaccine, food item, whatever - from a passive (‘spontaneous’) reporting dataset. In order to do so, it calculates the proportional reporting rate (PRR) - the rate by which a particular side effect of a particular medication is reported compared to the same side effect of other medications.

Theory

Let \(D\) be an \(m \times n\) matrix that represents the frequency with which each of the \(m\) adverse effects occur for each of the \(n\) drugs, so that \(D_{i,j}\) (\(i \in m\), \(j \in n\)) represents the number of times the adverse effect \(j\) has occurred with the treatment \(i\).

For convenience’s sake, let \(D_{*,j}\) denote \(\sum_{i=1}^{m} D_{i,j}\), let \(D_{i,*}\) denote \(\sum_{j=1}^{n} D_{i,j}\), and let \(D_{*,*}\) denote \(\sum_{i=1}^{m} \sum_{j=1}^{n} D_{i,j}\). Furthermore, let \(D_{* \neg i, j}\) denote \(\sum_{k \neq i}^{m} D_{k,j}\) and \(D_{i, * \neg j}\) denote \(\sum_{k \neq j}^{n} D_{i, k}\).

Then, \(PRR\) can be calculated for each combination \(D_{i,j}\) by the following formula:

\[ PRR_{i,j} = \frac{D_{i,j} / D_{i,*}}{D_{* \neg i, j} / D_{* \neg i, *}} = \frac{D_{i,j}}{D_{i,*}} \cdot \frac{D_{*\neg i, *}}{D_{*\neg i, j}} \] Expanding this, we get

\[ PRR_{i,j} = \frac{D_{i,j}}{\displaystyle\sum_{q=1}^n D_{i,q}} \cdot \frac{\displaystyle\sum_{r=1, r\ne i}^{m} \displaystyle\sum_{s=1}^{n} D_{r,s}}{\displaystyle\sum_{t=1, t\ne i}^{m} D_{t,j}} \] While this looks daunting, it is in fact fairly straightforward computationally - given a 2x2 contingency table

ADR of interest (\(j\)) All other ADRs (\(\neg j\)) TOTAL
Treatment of interest (\(i\)) \(a\) \(b\) \(a + b\)
All other treatments (\(\neg i\)) \(c\) \(d\) \(c + d\)

\(PRR\) would be calculated as

\[\frac{a/(a + b)}{c/(c + d)}\] This is quite easy to implement computationally.

Implementation

Loading the data

This example presupposes that the VAERS raw data files, from the CDC’s website, have been obtained and downloaded to a single folder named data\, then unzipped. VAERS tables are normalised into three separate tables: XXXXVAERSSYMPTOMS.csv, which contains symptom data, XXXXVAERSVAX.csv, which contains information about the vaccine and XXXXVAERSDATA.csv, which contains information about the recipient and the vaccination event. All are coded by VAERS_ID.

Loading the SYMPTOMS file

The SYMPTOMS file contains one or more rows per VAERS ID, with each row containing up to five symptoms in fields named SYMPTOM1 to SYMPTOM5 for names and SYMPTOMVERSION1 to SYMPTOMVERSION5 for symptom version identifiers. The importer function for the SYMPTOMS file

  • imports the CSV file into a data frame,
  • melts the data frame down to pair each VAERS_ID with a symptom, discarding the symptom sequential, and
  • filters NAs.
read_symptoms_df <- function(year, folder = paste(getwd(), '/data', sep = "")){
    df <- paste(folder, "/", year, "VAERSSYMPTOMS.csv", sep = "") %>%
          read_csv() %>%
          dplyr::select("VAERS_ID", "SYMPTOM1", "SYMPTOM2", "SYMPTOM3", "SYMPTOM4", "SYMPTOM5") %>%
          melt(id.vars = "VAERS_ID", variable.name = "SYMPTOM_SEQ", value.name = "SYMPTOM_NAME") %>%
          dplyr::select("VAERS_ID", "SYMPTOM_NAME") %>%
          dplyr::filter(is.na(SYMPTOM_NAME) == FALSE)
    
    return(df)
}

Loading the VAX file

The VAX file contains details about the particular vaccine. In the present case, we only read in the VAERS_ID and the vaccine type (VAX_TYPE). In VAERS, a VAX_TYPE is a broader category of vaccines, ranking above the product name - for instance, ANTH is the VAX_TYPE corresponding to anthrax vaccines, while AVA would be the particular anthrax vaccine. Vaccines of unknown types (UNK) are excluded. A single VAERS_ID may have multiple vaccine entries, such as when the incident pertains to multiple coadministered vaccines.

read_vax_df <- function(year, folder = paste(getwd(), '/data', sep = "")){
    df <-   paste(folder, "/", year, "VAERSVAX.csv", sep = "") %>%
            readr::read_csv() %>%
            dplyr::select("VAERS_ID", "VAX_TYPE") %>%
            reshape2::melt(id.vars = "VAERS_ID", variable.name = "VVAR", value.name = "VAX_TYPE") %>%
            dplyr::select("VAERS_ID", "VAX_TYPE") %>%
            dplyr::filter(is.na(VAX_TYPE) == FALSE & VAX_TYPE != "UNK")

    return(df)
}

Loading the DATA file

DATA files contain a record per incident, and as such they contain only one entry for each VAERS_ID.

read_data_df <- function(year, folder = paste(getwd(), '/data', sep = "")){
    res <-  paste(folder, "/", year, "VAERSDATA.csv", sep = "") %>%
            read_csv() %>%
        
            # Age and sex are included as stratifying factors.
            dplyr::select("VAERS_ID", "AGE_YRS", "SEX") %>%
        
            # Filter indeterminate or unknown sex
            dplyr::filter(is.na(SEX) == FALSE, SEX != "U") %>%
        
            # Add year marker
            dplyr::mutate(YEAR = year)

    return(res)
}

Create multiyear tables

The multiyear table aggregator obtains a table using one of the read_*_df functions, and vertically joins all tables. The start and end arguments specify the start and end years, while the type argument specifies which data source to import. The type argument may take the following values:

  • data
  • symptoms
  • vax
create_multiyear_df <- function(start, end, type = "data"){
    if (type %in% c("data", "symptoms", "vax")){
        start:end %>%
        lapply(eval(parse(text = paste("read", type, "df", sep = "_")))) %>%
        rbindlist(use.names = TRUE) %>%
        return()
    } else {
        stop(sprintf("'%s' is not an accepted source frame type. Accepted source frames are: data, symptoms, vax.", type))
    }
}

Merge into a multi-year unitary table

We create a data.frame suitable for openEBGM, in which case identifiers are in the column id, vaccine names are named var1, symptoms var2 and strata variables are of the format strat_*.

create_unitary_df <- function(start, end){
    res <- merge(x = create_multiyear_df(start, end, "vax"),
                 y = create_multiyear_df(start, end, "data"),
                 by = "VAERS_ID", allow.cartesian = TRUE) %>%
           merge(y = create_multiyear_df(start, end, "symptoms"),
                 by = "VAERS_ID", allow.cartesian = TRUE) %>%
           set_colnames(c("id", "var1", "strat_age", "strat_sex", "strat_year", "var2"))
    
    return(res)
}

Processing raw data frame and calculating RRs

Create raw-processed data frame

calculate_raw_PRRs <- function(start, end, excluded_issues = NULL, exclude_normals = TRUE, stratify = FALSE, digits = 3, max_cats = 10) {
    df <- create_unitary_df(start, end)

    if (exclude_normals == TRUE) {
        df <- dplyr::filter(df, !grepl(" (normal|negative)", df$var2, ignore.case = TRUE))
    }
    
    if (!is.null(excluded_issues)) {
        df <- dplyr::filter(df, !var2 %in% excluded_issues)
    }
    
    processRaw(df, stratify = stratify, zeroes = FALSE, digits = digits, max_cats = max_cats) %>%
        return()
}

raw_PRRs <- calculate_raw_PRRs(2006, 2017)

Map vaccines to categories

Sometimes, it’s useful to map vaccines to a particular category, e.g. all influenza vaccines. The following preferred mapping implements a map that categorises vaccines according to pathogens, somewhat arbitrarily, and stores it as a variable vaccine_categories, which can later be reused.

VAX CLASS
6VAX-F Hexavax
ADEN_4_7 Adenovirus
ANTH Anthrax
BCG BCG
CHOL Cholera
DPP DTP and similar
DT DTP and similar
DTAP DTP and similar
DTAPH DTP and similar
DTAPHEPBIP DTP and similar
DTAPIPV DTP and similar
DTAPIPVHIB DTP and similar
DTIPV DTP and similar
DTOX Diphteria
DTP DTP and similar
DTPHEP DTP and similar
DTPHIB DTP and similar
DTPIHI DTP and similar
DTPIPV DTP and similar
DTPPHIB DTP and similar
FLU(H1N1) Influenza
FLU3 Influenza
FLU4 Influenza
FLUA3 Influenza
FLUC3 Influenza
FLUC4 Influenza
FLUN(H1N1) Influenza
FLUN3 Influenza
FLUN4 Influenza
FLUR3 Influenza
FLUR4 Influenza
FLUX Influenza
FLUX(H1N1) Influenza
H5N1 Influenza
HBHEPB H.influenzae b
HBPV H.influenzae b
HEP Hepatitis
HEPA Hepatitis
HEPAB Hepatitis
HEPATYP Hepatitis
HIBV H.influenzae b
HPV2 HPV
HPV4 HPV
HPV9 HPV
HPVX HPV
IPV Polio (injectable)
JEV JEV
JEV1 JEV
JEVX JEV
LYME Lyme
MEA Measles
MEN Meningococcal
MENB Meningococcal
MENHIB Meningococcal
MER Measles
MM Measles
MMR MMR +/- varicella
MMRV MMR +/- varicella
MNQ Meningococcal
MNQHIB Meningococcal
MU Mumps
MUR Mumps
OPV Polio (oral)
PER Pertussis
PLAGUE Plague
PNC Pneumococcal
PNC10 Pneumococcal
PNC13 Pneumococcal
PPV Pneumococcal
RAB Rabies
RUB Rubella
RV Rotavirus
RV1 Rotavirus
RV5 Rotavirus
RVX Rotavirus
SMALL Smallpox
SSEV Summer/spring encephalitis
TBE Tick-borne encephalitis
TD DTP and similar
TDAP DTP and similar
TDAPIPV DTP and similar
TTOX Tetanus
TYP Typhoid
VARCEL Varicella
VARZOS Zoster
YF Yellow fever

Plot SafeGrams by vaccine category

The base plotting function ingests a data frame and creates a faceted result.

plot_safegram <- function(df, vaccines = NULL, exclude_inf = TRUE, exclude_issues = NULL, N_cutoff = 4, prr_cutoff = 0, prr_significance_margin = 3, wrap_by = "CLASS"){
    if (is.null(vaccines)) {
        vax <- unique(df$var1)
    } else {
        vax <- vaccines
    }
    
    if (exclude_inf == TRUE) {
        df <- dplyr::filter(df, PRR != Inf)
    }
    
    if (is.null(exclude_issues) == FALSE) {
        df <- dplyr::filter(df, !var2 %in% exclude_issues)
    }
    
    plot <- df %>%
            dplyr::filter(N > N_cutoff) %>%
            dplyr::filter(PRR > prr_cutoff) %>%
            dplyr::filter(var1 %in% vax) %>%
            ggplot(aes(x = PRR, y = N))
    
    
    plot +
        theme_grey() +
        scale_x_log10() + xlab("Proportional Reporting Ratio (PRR)") +
        scale_y_log10() + ylab("Number of reports")  +
        stat_density2d(aes(fill = log(..level..), alpha = ..level..), geom = "polygon") +
        scale_fill_gradient2(high = "#ffffbf", mid = "#fdae61", low = "#3288bd", guide = FALSE) +
        scale_alpha_continuous(range = c(0.3, 0.6), guide = FALSE) +
        geom_vline(aes(xintercept = prr_significance_margin)) +
        facet_wrap(eval(parse(text = paste("~", wrap_by))))
}

Finally, let’s create a wrap-up call that calculates the PRR table, appends the category mapping from the function above, then plot the results:

    raw_PRRs %>%
    categorise_vaccines(vaccine_categories) %>%
    plot_safegram()

Plot SafeGrams for a particular vaccine or vaccines

The plotting function can be also called to plot a single or limited number of vaccines, by providing the vaccines parameter. If segmentation should be by vaccine type rather than vaccine class (e.g. different HPV subclasses in lieu of HPV collated), the wrap_by parameter should be set to var1, the column name for vaccines. In this case, the four HPV vaccine identifiers (HPV2 (bivalent), HPV4 (quadrivalent), HPV9 (nonavalent) and HPVX (unknown/unreported valency)) are compared:

    raw_PRRs %>%
    plot_safegram(vaccines = c("HPV2", "HPV4", "HPV9", "HPVX"), wrap_by = "var1")

Plotting with exclusion lists

The preprocessing code automatically excludes ‘normal’ test results, e.g. Biopsy kidney normal. The following are crude but efficient methods to identify three classes of symptom entities that should be excluded:

  1. Vaccine failure events: these events constitute issues where the adverse event pertains to a reaction that the vaccine was supposed to prevent, e.g. positive HPV tests (Human papilloma virus test positive) for the HPV vaccine.
  2. Handling, storage and administration issues: in these events, the vaccine was incorrectly administered, and as such the event pertains to the vaccine but rather to the event of administration.
  3. Non-reactions: these are MedDRA codes that do not describe a reaction but rather a treatment, such as Resuscitation, or a test (rather than a test result), such as Blood culture. As a rule, negative or normal results are routinely filtered at raw PRR calculation (function calculate_raw_PRRs(), supra). This leaves only positive test results.

It is practically impossible to comb the over 5,000 individual MedDRA entities mentioned in the filtered list. Therefore, a preferred approach is as follows.

Vaccine failure events

Vaccine failure events can be identified better from the high-PRR events as they are highly specific to the individual vaccine. To filter out low-\(N\) events, only events above a certain minimum (set at a multiplier \(m\) times the number of years) are considered:

identify_top_PRRs <- function(start, end, multiplier = 3) {
    calculate_raw_PRRs(start, end) %>%
        dplyr::filter(PRR != Inf) %>%
        dplyr::filter(N > length(start:end)) %>%
        dplyr::arrange(desc(PRR)) %>%
        head(500)
}

Based on this, the items with the highest \(PRR\)s were reviewed, and from this, the following were isolated as potential vaccine failure markers:

identify_top_PRRs(2006, 2017) %>%
    head(500) %>%
    dplyr::filter(var2 %in% exclusion_failure_markers) %>%
    set_colnames(c("Vaccine", "Symptom", "N", "E", "RR", "PRR")) %>% 
    dplyr::arrange(desc(N)) %>%
    kable("html", escape = FALSE) %>%
    kable_styling("striped")
Vaccine Symptom N E RR PRR
VARZOS Herpes zoster 5124 619.9139675 8.266 30.479
VARCEL Varicella post vaccine 1390 171.7652388 8.092 90.937
VARCEL Drug ineffective 648 98.0052158 6.612 23.685
HPV4 Smear cervix abnormal 517 49.4054237 10.464 709.481
MMR Mumps 356 30.6899979 11.600 550.679
HPV4 Papilloma viral infection 348 33.2826614 10.456 668.586
MMR Rash morbilliform 344 40.9199972 8.407 26.606
HPV4 Human papilloma virus test positive 318 30.9255324 10.283 305.475
RV5 Rotavirus test positive 219 7.8029454 28.066 192.720
VARCEL Varicella virus test positive 190 33.3513684 5.697 14.361
HEP No therapeutic response 174 12.7857577 13.609 30.357
HPV4 Cervical dysplasia 150 14.5199146 10.331 360.230
MMR Measles 126 12.5127264 10.070 62.015
HPV4 Anogenital warts 125 13.0113520 9.607 92.367
VARZOS Ophthalmic herpes zoster 122 12.4894432 9.768 81.083
RV5 Rotavirus infection 119 4.3757694 27.195 157.080
MMR Measles antibody positive 79 8.0318176 9.836 53.463
RV5 Gastroenteritis rotavirus 70 2.4173831 28.957 246.400
HEP Hepatitis B surface antigen positive 67 3.4794029 19.256 95.704
VARZOS Herpes zoster ophthalmic 67 7.2019417 9.303 55.662
MMR Rubella antibody positive 65 6.9327268 9.376 41.401
HPV4 Smear cervix 59 6.1285354 9.627 94.460
FLUN4 Influenza A virus test positive 59 1.9483179 30.283 39.828
RV5 Culture stool positive 58 2.3255837 24.940 102.080
MMR Vaccine breakthrough infection 58 6.1718178 9.398 41.868
MMR Mumps antibody test positive 57 5.7490905 9.915 56.109
FLUN3 Influenza A virus test positive 47 3.6614550 12.836 15.719
HEP Therapeutic response decreased 43 2.7248336 15.781 44.670
PNC Blood culture positive 43 4.5929233 9.362 13.801
FLUN3 Influenza virus test positive 41 1.7697033 23.168 35.286
VARCEL Infection transmission via personal contact 38 6.8296318 5.564 13.447
MMR Measles antibody 37 3.8890906 9.514 44.515
FLUN4 Influenza virus test positive 34 0.9416870 36.105 50.661
MMR Mumps antibody test 34 3.7199997 9.140 36.815
HEP Hepatitis B antibody 33 2.0121848 16.400 50.280
HPV4 Cervix carcinoma 32 3.1114103 10.285 307.396
FLUN3 Influenza serology positive 32 1.7391911 18.399 25.189
LYME Lyme disease 31 0.0489647 633.110 954.346
SMALL Vaccinia 28 0.2604895 107.490 1101.396
HEP Hepatitis B antigen positive 28 1.2576155 22.264 319.965
RV5 Rotavirus test 28 1.2239914 22.876 73.920
SMALL Vaccinia virus infection 27 0.2520866 107.106 1062.060
MMR Rubella 27 2.3672726 11.406 292.355
HPV4 Loop electrosurgical excision procedure 27 2.7342696 9.875 129.683
MMR Rubella antibody test 26 3.6354543 7.152 16.560
PNC Pneumococcal bacteraemia 25 1.4075088 17.762 49.996
MNQ Meningitis meningococcal 25 2.0716312 12.068 35.126
HEP Therapy non-responder 24 1.3833771 17.349 60.946
PNC Meningitis pneumococcal 24 1.6297470 14.726 31.198
HPV4 Dysplasia 23 2.2628438 10.164 220.941
RV1 Rotavirus test positive 23 1.3084743 17.578 19.221
LYME Borrelia burgdorferi serology 21 0.0399168 526.094 730.297
FLUN(H1N1) Influenza serology positive 21 1.0053974 20.887 25.378
FLUN3 Influenza B virus test positive 21 1.2052290 17.424 23.371
FLUN4 Exposure via direct contact 18 0.5439054 33.094 44.884
FLUN4 Influenza B virus test positive 18 0.6413213 28.067 36.054
HEP Hepatitis B antibody positive 18 1.3414565 13.418 29.385
VARZOS Herpes zoster oticus 18 2.9172422 6.170 12.818
HIBV Haemophilus infection 17 1.0673577 15.927 58.221
HIBV Rotavirus test 17 1.8562743 9.158 15.188
PNC Rotavirus test 16 1.4815882 10.799 17.332
PNC Pneumococcal infection 15 1.3704690 10.945 17.726
DTAPIPVHIB Culture stool positive 15 1.4038207 10.685 13.067

Handling, storage & administration and non-reactions

In these cases, the event either pertains to the handling, storage and administration of the vaccine, or does not denote a reaction (e.g. denoting a test without a result). For this purpose, the 500 most frequently occurring events, regardless of vaccine, will be considered, therefore the events will be aggregated.

all_markers = c(exclusion_handling_markers, exclusion_non_reaction_markers)

identify_top_Ns <- function(start, end) {
    calculate_raw_PRRs(start, end) %>%
    head(500) %>%
    dplyr::filter(PRR != Inf) %>% 
    dplyr::filter(var2 %in% all_markers) %>%
    dplyr::select("var2", "N", "PRR") %>% 
    plyr::ddply(.(var2), summarize, sum_N = sum(N), avg_PRR = mean(PRR)) %>% 
    dplyr::arrange(desc(sum_N))
}

From a manual review of the top 500 entities by \(N\), two lists were created, one containing handling, storage & administration issues (exclusion_handling_markers) marked in orange in the table below, and of non-reactions (exclusion_non_reaction_markers), marked in green on the table below.

Symptom/s N PRR
Enema administration 2 12.5830
Blood thyroid stimulating hormone 1 12.3950
Culture stool 1 12.3740
C-reactive protein 1 9.2860
Resuscitation 1 8.7140
Autopsy 1 8.6330
Endotracheal intubation 1 7.7050
Similar reaction on previous exposure to drug 1 7.7010
Culture 1 6.0850
Computerised tomogram 3 5.8300
Lumbar puncture 3 4.8550
Electrocardiogram 2 4.6470
Extra dose administered 3 3.6935
Urine analysis 2 3.2790
Metabolic function test 2 3.1120
Immunoglobulin therapy 1 2.5810
Chest X-ray 1 2.4940
Intensive care 1 2.3800
Full blood count 5 2.2710
Electroencephalogram 1 2.1910
Expired product administered 1 1.6930
Exposure during pregnancy 1 1.5180
Laboratory test 2 1.4360
Blood test 4 1.3515
Immediate post-injection reaction 3 1.0580
No adverse event 5 0.7660
Expired drug administered 1 0.7480
Unevaluable event 1 0.7410
Inappropriate schedule of drug administration 2 0.7240
Drug administered to patient of inappropriate age 1 0.7090

Plot with exclusion lists applied

    raw_PRRs_with_exclusions <- calculate_raw_PRRs(2006, 2017, excluded_issues = c(exclusion_non_reaction_markers, exclusion_handling_markers, exclusion_failure_markers))
    raw_PRRs_with_exclusions %>%
    categorise_vaccines(vaccine_categories) %>%
    plot_safegram()
SafeGram of major vaccine classes, 2006-2017

SafeGram of major vaccine classes, 2006-2017

Now, we can also plot the adjusted SafeGram for the HPV vaccines by vaccine type:

    raw_PRRs_with_exclusions %>%
    plot_safegram(vaccines = c("HPV2", "HPV4", "HPV9", "HPVX"), wrap_by = "var1")
SafeGrams of HPV vaccines by valency, 2006-2017

SafeGrams of HPV vaccines by valency, 2006-2017

And we can compare specific flu vaccines with different safety records:

    raw_PRRs_with_exclusions %>%
    plot_safegram(vaccines = c("FLUA3", "FLU(H1N1)", "FLU4", "FLU3"), wrap_by = "var1", N_cutoff = 0)
SafeGrams of injectable flu vaccines, 2006-2017

SafeGrams of injectable flu vaccines, 2006-2017

As well as compare the nasal versus non-nasal flu vaccines:

with different safety records:

    raw_PRRs_with_exclusions %>%
    plot_safegram(vaccines = c("FLU(H1N1)", "FLUN(H1N1)", "FLU3", "FLUN3", "FLU4", "FLUN4"), wrap_by = "var1", N_cutoff = 0)
SafeGrams of various flu vaccines, 2006-2017

SafeGrams of various flu vaccines, 2006-2017

Afterword

SafeGram is a great way to create intuitive comparative visualisations of safety of drugs, vaccines and other products from passive reporting databases. While passive reporting suffers from its own weaknesses, such as underreporting, lack of definite knowledge of the extent of underreporting and other methodological malaises, the \(PRR\) approach is a good start to get useful data out of what many are all too happy to discard as useless at best. With the right methods and intuitive visuals, it can provide clinically actionable intelligence to decision-makers, clinicians and patients alike as to the safety of particular pharmaceuticalss.