library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
m <- read.csv("narms_public.csv")
# keep only Salmonella isolates (GENUS code "S")
sal <- subset(m, GENUS == "S")
The NARMS data set (loaded above) is a long list of individual bacterial isolates. Each row (isolate) has columns specifying its genus and serotype (e.g. Salmonella, serotype Typhi) as well as its resistance to antibiotics. In the code above, I made a subset “sal” containing only the Salmonella isolates.
Resistance is based on the MIC, or minimum inhibitory concentration: the lowest concentration of an antibiotic that inhibits the bacterium’s growth. If an isolate’s MIC for a particular antibiotic is above a certain threshold (the breakpoint), the isolate is considered “resistant”. There is also an intermediate zone, considered “intermediate” resistance.
For each antibiotic there are three columns: 1. the concentration, a number representing an amount of antibiotic; 2. the sign (<=, <, >, =, etc.), indicating whether the MIC was equal to, less than, or greater than the concentration in the first column; and 3. a column containing S, I or R, indicating whether the isolate was susceptible, intermediate, or resistant. The columns are named consistently: the concentration column uses an abbreviation of the antibiotic name, the sign column adds .Sign, and the SIR column adds .SIR. For example, ceftriaxone (AXO) has the three columns AXO, AXO.Sign, and AXO.SIR.
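Because the names follow a fixed pattern, they can be built from the abbreviation instead of being typed out for each antibiotic. A minimal sketch; abx_cols() is a hypothetical helper, and it assumes the .Sign and .SIR suffixes used by the code in this post (e.g. sal$AXO.Sign):

```r
# Hypothetical helper: build the three column names for one antibiotic,
# assuming the "<ABBR>", "<ABBR>.Sign", "<ABBR>.SIR" naming pattern.
abx_cols <- function(abbr) {
  c(value = abbr,
    sign  = paste0(abbr, ".Sign"),
    sir   = paste0(abbr, ".SIR"))
}

cols <- abx_cols("AXO")
cols[["sign"]]   # the sign column name for ceftriaxone
```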
An ifelse() call can convert the first two columns into an SIR column, and the result can be cross-tabulated against the third column to confirm it worked:
sal$SIR.AXO <- ifelse(((sal$AXO<=1) & (sal$AXO.Sign=="=" | sal$AXO.Sign=="<" | sal$AXO.Sign=="<=")), c("S"), c("RI"))
table(sal$SIR.AXO,sal$AXO.SIR)
##
## I R S
## RI 0 1 724 0
## S 0 0 0 3784
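Here I made only two categories: S (susceptible) if the MIC was known to be at most 1, and RI (resistant or intermediate) if it was greater than 1. The cross-tabulation confirms the rule: all 3784 isolates labelled S are also S in AXO.SIR, and the 725 RI isolates are 724 R plus 1 I. The first, unlabelled column (apparently an empty AXO.SIR value) is zero in both rows.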
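The same rule can be written more compactly with %in% in place of the chain of == comparisons. A minimal sketch on a toy data frame whose values are invented for illustration (not NARMS data), checked against an existing SIR column the same way as above:

```r
# Toy data invented for illustration; the column names mirror AXO, AXO.Sign, AXO.SIR.
toy <- data.frame(
  AXO      = c(0.25, 1, 4, 64, 2),
  AXO.Sign = c("<=", "=", "=", ">", "="),
  AXO.SIR  = c("S", "S", "R", "R", "I"),
  stringsAsFactors = FALSE
)

# S when the MIC is known to be <= 1, otherwise RI (same rule as the ifelse() above).
toy$SIR.AXO <- ifelse(toy$AXO <= 1 & toy$AXO.Sign %in% c("=", "<", "<="), "S", "RI")

# Cross-tabulate the derived column against the existing one.
check <- table(toy$SIR.AXO, toy$AXO.SIR)
check
```

One difference from the == version: a missing sign gives FALSE with %in% (so the isolate becomes RI) rather than NA.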
Other datasets I am working on have the MIC values and signs but no SIR column, and the same is true of the newer published versions of the NARMS data. I would like to write a function that accepts a data frame, the two columns containing the MIC value and its sign, and the breakpoint as a number, and uses these to generate the SIR (or S/RI) column. This much works:
SIRcol <- function(df,abx,sign,bp) {
# S if the MIC is known to be at or below the breakpoint bp, otherwise RI
new.SIR <- ifelse(((abx<=bp) & (sign=="=" | sign=="<" | sign=="<=")), "S", "RI")
table(new.SIR)
}
SIRcol(df = sal, abx = sal$AXO, sign = sal$AXO.Sign, bp = 1)
## new.SIR
## RI S
## 725 3784
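One way to make the function actually use its df and bp arguments is to pass the column names as strings and look them up with [[. A minimal sketch; SIRvec is a hypothetical name and the toy data frame is invented for illustration:

```r
# Hypothetical variant: pass column *names* (strings) so the data frame argument
# is used, and compare against bp instead of a hard-coded 1.
SIRvec <- function(df, abx, sign, bp) {
  value <- df[[abx]]    # MIC values
  sgn   <- df[[sign]]   # signs; %in% also works if this is a factor
  ifelse(value <= bp & sgn %in% c("=", "<", "<="), "S", "RI")
}

toy <- data.frame(AXO = c(0.5, 1, 8), AXO.Sign = c("<=", "=", ">"))
table(SIRvec(toy, abx = "AXO", sign = "AXO.Sign", bp = 1))
```

On the real data, the result could then be attached with sal$SIR.AXO <- SIRvec(sal, "AXO", "AXO.Sign", 1).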
I can also use <<- to send the new vector into the global environment, where it can be attached to the original dataset manually:
SIRcol2 <- function(df,abx,sign,bp) {
# same rule as SIRcol, but <<- also assigns new.SIR in the global environment
new.SIR <<- ifelse(((abx<=bp) & (sign=="=" | sign=="<" | sign=="<=")), "S", "RI")
table(new.SIR)
}
SIRcol2(df = sal, abx = sal$AXO, sign = sal$AXO.Sign, bp = 1)
## new.SIR
## RI S
## 725 3784
However, what I would actually like is for the new column to be added to the original data frame in the global environment, ideally with a naming argument so that the user can call the column “SIR.AXO” or something else more specific than “new.SIR” and doesn’t have to rename it manually.
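A common R pattern for this is to have the function return the whole data frame with the new column added, and to reassign the result in the caller. A minimal sketch, assuming column names are passed as strings; add_SIR and its new_col argument are hypothetical names, and defaulting to “SIR.” plus the abbreviation is just one choice:

```r
# Hypothetical function: add a named S/RI column and return the whole data frame.
# The caller keeps the change by reassigning, e.g. sal <- add_SIR(sal, ...).
add_SIR <- function(df, abx, sign, bp, new_col = paste0("SIR.", abx)) {
  df[[new_col]] <- ifelse(df[[abx]] <= bp & df[[sign]] %in% c("=", "<", "<="),
                          "S", "RI")
  df
}

toy <- data.frame(AXO = c(0.5, 1, 8), AXO.Sign = c("<=", "=", ">"))
toy <- add_SIR(toy, abx = "AXO", sign = "AXO.Sign", bp = 1)
names(toy)
table(toy$SIR.AXO)
```

On the real data this would be sal <- add_SIR(sal, "AXO", "AXO.Sign", 1), or with a custom column name via new_col. Because the data frame is the first argument, the function should also fit into a %>% pipe, which tidyverse loads.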
Below are a few of the methods I’ve tried so far (not run as code, because they trip up the markdown):
SIRcol3 <- function(df,abx,sign,bp,ndf) {
new.SIR <- ifelse(((abx<=bp) & (sign=="=" | sign=="<" | sign=="<=")), "S", "RI")
df$new.SIR <<- new.SIR
table(df$new.SIR)
}
SIRcol3(df = sal, abx = sal$AXO, sign = sal$AXO.Sign, bp = 1)
SIRcol4 <- function(df,abx,sign,bp,ndf) {
new.SIR <- ifelse(((abx<=bp) & (sign=="=" | sign=="<" | sign=="<=")), "S", "RI")
ndf$new.SIR <- new.SIR
table(ndf$new.SIR)
}
SIRcol4(df = sal, abx = sal$AXO, sign = sal$AXO.Sign, bp = 1, ndf = sal)
SIRcol5 <- function(df,abx,sign,bp,ndf) {
new.SIR <- ifelse(((abx<=bp) & (sign=="=" | sign=="<" | sign=="<=")), "S", "RI")
df[new.SIR] <- new.SIR
table(df$new.SIR)
}
SIRcol5(df = sal, abx = sal$AXO, sign = sal$AXO.Sign, bp = 1)
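A likely reason none of these change sal is that R functions work on copies of their arguments (copy-on-modify): SIRcol4 modifies its local ndf, and SIRcol5 indexes df by the values stored in new.SIR rather than by a column name. SIRcol3’s <<- searches the enclosing environments for df instead of using the argument. A minimal demonstration of the copy behaviour, with hypothetical names:

```r
# Assigning to a column inside a function changes the function's local copy,
# not the object the caller passed in.
add_col_locally <- function(df) {
  df$flag <- TRUE   # modifies the local copy only
  ncol(df)          # the local copy now has one more column
}

d <- data.frame(x = 1:3)
add_col_locally(d)  # 2 columns inside the function
ncol(d)             # still 1 column outside it
```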