R Notebook: Provides reproducible analysis for Association of mutant sequence, barcode, and inferred fluorescence phenotype in the following manuscript:
Citation: Lippert LB, Hinton SR, Holston A, Romanowicz KJ, Plesa C. Characterizing Sequence-Function Relationships in Chimeric DcuS/EnvZ Histidine Kinases. In Prep. 2026.
GitHub Repository: https://github.com/PlesaLab/DcuSEnvZ
This pipeline processes barcode counts sequenced from twelve fluorescence-activated cell sorting (FACS) samples of the synTCS-MutLib strain in the no-ligand condition. Sequence data were generated on the Illumina NextSeq platform by paired-end sequencing of amplicons. Raw sequencing data were pre-processed on a high-performance computer using the Makefile script available in the project GitHub repository. Here, pre-processed barcode-count files for all twelve no-ligand samples are merged, adjusted abundances are computed using the protocol from Biswas et al. (2021), and a median activation score (here, “median bin”) is calculated for each observed barcode. This analysis is replicated for the Fumarate and Aspartate samples.
The following R packages must be installed prior to loading into the R session. See the Reproducibility tab for a complete list of packages and their versions used in this workflow.
# Make a vector of required packages
required.packages <- c("devtools", "knitr", "patchwork", "tidyverse", "ggplot2", "dplyr", "tidyr", "magrittr", "stringr", "seqinr")
# Load required packages
lapply(required.packages, library, character.only = TRUE)
This section is based on the R file “Counts_to_Median_Bin_NoLigand.R”. It describes how to load all of the pre-existing barcode data necessary for downstream analysis. The end result is a .CSV file containing the total set of observed barcodes, their associated nucleotide and amino acid sequences, their activation (“median bin”) scores, and the lower and upper indices of their activation bins.
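# Function to load barcode (BC) reads from a collapsed, tab-separated count file
read_collapsed_file <- function(filename, sam_name) {
  df <- read.table(file=filename, sep="\t", header=FALSE)
  # Columns: barcode, read count for this sample, number of collapsed barcodes
  colnames(df) <- c("BC", paste0(sam_name, "reads"), "collapsedBCs")
  return(df)
}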
NL1_bc <- read_collapsed_file("./Final_BC/NL1_S1_collapse_d1.tsv", "NL1")
NL2_bc <- read_collapsed_file("./Final_BC/NL2_S2_collapse_d1.tsv", "NL2")
NL3_bc <- read_collapsed_file("./Final_BC/NL3_S3_collapse_d1.tsv", "NL3")
NL4_bc <- read_collapsed_file("./Final_BC/NL4_S4_collapse_d1.tsv", "NL4")
NL5_bc <- read_collapsed_file("./Final_BC/NL5_S5_collapse_d1.tsv", "NL5")
NL6_bc <- read_collapsed_file("./Final_BC/NL6_S6_collapse_d1.tsv", "NL6")
NL7_bc <- read_collapsed_file("./Final_BC/NL7_S7_collapse_d1.tsv", "NL7")
NL8_bc <- read_collapsed_file("./Final_BC/NL8_S8_collapse_d1.tsv", "NL8")
NL9_bc <- read_collapsed_file("./Final_BC/NL9_S9_collapse_d1.tsv", "NL9")
NL10_bc <- read_collapsed_file("./Final_BC/NL10_S10_collapse_d1.tsv", "NL10")
NL11_bc <- read_collapsed_file("./Final_BC/NL11_S11_collapse_d1.tsv", "NL11")
NL12_bc <- read_collapsed_file("./Final_BC/NL12_S12_collapse_d1.tsv", "NL12")
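The twelve calls above follow a single naming pattern, so the same loading step could also be written as a loop. The following is a minimal sketch, not the authors' code: it writes three small stand-in files with made-up barcodes and counts to a temporary directory (`demo_dir`, `toy`, `samples`, `paths`, and `bc_list` are hypothetical names) and assumes the `NL<i>_S<i>_collapse_d1.tsv` naming seen above.

```r
# Loader with the same behavior as read_collapsed_file() above
read_collapsed_file <- function(filename, sam_name) {
  df <- read.table(file = filename, sep = "\t", header = FALSE)
  colnames(df) <- c("BC", paste0(sam_name, "reads"), "collapsedBCs")
  df
}

# Stand-in input: three tiny files in a temporary directory (hypothetical data)
demo_dir <- file.path(tempdir(), "Final_BC_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  toy <- data.frame(BC = c("AAAA", "CCCC"), reads = c(i, 2 * i), collapsed = c(1, 1))
  write.table(toy, file.path(demo_dir, sprintf("NL%d_S%d_collapse_d1.tsv", i, i)),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Build sample names and file paths, then read each file into a named list
samples <- paste0("NL", 1:3)
paths <- file.path(demo_dir, sprintf("NL%d_S%d_collapse_d1.tsv", 1:3, 1:3))
bc_list <- Map(read_collapsed_file, paths, samples)
names(bc_list) <- samples
```

Keeping the per-sample tables in one named list makes it straightforward to apply later steps (joins, zero-filling) to all bins at once.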
Put all BCs into one dataframe for each condition
NL_allBC <- NL1_bc %>%
select(BC) %>%
rbind(., NL2_bc %>%
select(BC)) %>%
rbind(., NL3_bc %>%
select(BC)) %>%
rbind(., NL4_bc %>%
select(BC)) %>%
rbind(., NL5_bc %>%
select(BC)) %>%
rbind(., NL6_bc %>%
select(BC)) %>%
rbind(., NL7_bc %>%
select(BC)) %>%
rbind(., NL8_bc %>%
select(BC)) %>%
rbind(., NL9_bc %>%
select(BC)) %>%
rbind(., NL10_bc %>%
select(BC)) %>%
rbind(., NL11_bc %>%
select(BC)) %>%
rbind(., NL12_bc %>%
select(BC)) %>%
distinct()
Add counts for barcodes
NL_allBC <- left_join(NL_allBC, NL1_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL2_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL3_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL4_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL5_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL6_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL7_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL8_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL9_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL10_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL11_bc %>% select(-collapsedBCs), by="BC")
NL_allBC <- left_join(NL_allBC, NL12_bc %>% select(-collapsedBCs), by="BC")
Filter barcodes by length, keeping only those that are at least 24 nucleotides long.
NL_allBC <- NL_allBC %>%
filter(str_length(BC) >= 24)
Barcodes that were not observed in a given bin have NA read counts after the joins; set these to 0.
NL_allBC$NL1reads[is.na(NL_allBC$NL1reads)] <- 0
NL_allBC$NL2reads[is.na(NL_allBC$NL2reads)] <- 0
NL_allBC$NL3reads[is.na(NL_allBC$NL3reads)] <- 0
NL_allBC$NL4reads[is.na(NL_allBC$NL4reads)] <- 0
NL_allBC$NL5reads[is.na(NL_allBC$NL5reads)] <- 0
NL_allBC$NL6reads[is.na(NL_allBC$NL6reads)] <- 0
NL_allBC$NL7reads[is.na(NL_allBC$NL7reads)] <- 0
NL_allBC$NL8reads[is.na(NL_allBC$NL8reads)] <- 0
NL_allBC$NL9reads[is.na(NL_allBC$NL9reads)] <- 0
NL_allBC$NL10reads[is.na(NL_allBC$NL10reads)] <- 0
NL_allBC$NL11reads[is.na(NL_allBC$NL11reads)] <- 0
NL_allBC$NL12reads[is.na(NL_allBC$NL12reads)] <- 0
rm(NL1_bc,NL2_bc,NL3_bc,NL4_bc,NL5_bc,NL6_bc,NL7_bc,NL8_bc,NL9_bc,NL10_bc,NL11_bc,NL12_bc)
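The merge, length filter, and zero-fill steps above can be illustrated on a toy example. This is a sketch under stated assumptions rather than the pipeline's own code: it uses base R `merge()` with `all = TRUE` (a full outer join) in place of the union-then-`left_join` approach, which gives the same barcode set provided each barcode appears at most once per file. The tables `bin1` and `bin2` and their contents are made up.

```r
# Toy per-bin tables (hypothetical barcodes and counts)
bin1 <- data.frame(BC = c("AAAAAAAAAAAAAAAAAAAAAAAA", "CCCCCCCCCCCCCCCCCCCCCCCC"),
                   NL1reads = c(5, 2))
bin2 <- data.frame(BC = c("CCCCCCCCCCCCCCCCCCCCCCCC", "GGGG"),
                   NL2reads = c(7, 1))

# Full outer join on BC: the union of barcodes, with NA where a bin lacks a barcode
merged <- Reduce(function(x, y) merge(x, y, by = "BC", all = TRUE),
                 list(bin1, bin2))

# Drop barcodes shorter than 24 nt, then replace missing counts with 0
merged <- merged[nchar(merged$BC) >= 24, ]
read_cols <- grep("reads$", names(merged), value = TRUE)
merged[read_cols][is.na(merged[read_cols])] <- 0
```

In this toy case the 4-nt barcode is removed, and the all-A barcode, which was absent from `bin2`, receives an `NL2reads` value of 0.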
Correct the read counts for the uneven distribution of the sorted population across bins. Without this correction, the bins are weighted equally (each at 8.3333% of the total population, i.e., 1/12), when in actuality the population was not evenly distributed across bins. Population percentages were calculated by dividing the number of events recorded per bin by the total population, in a Jupyter notebook (Python), synTCS-MutLib_FACS_Bin_Population_Fractions.ipynb.
NL_bin1_NormVal = 8.3333333/9.41426834
NL_bin2_NormVal = 8.3333333/11.69436563
NL_bin3_NormVal = 8.3333333/9.03931901
NL_bin4_NormVal = 8.3333333/8.17794893
NL_bin5_NormVal = 8.3333333/10.11349818
NL_bin6_NormVal = 8.3333333/11.01540332
NL_bin7_NormVal = 8.3333333/9.92095663
NL_bin8_NormVal = 8.3333333/8.08674503
NL_bin9_NormVal = 8.3333333/8.84677746
NL_bin10_NormVal = 8.3333333/7.78273206
NL_bin11_NormVal = 8.3333333/4.73246859
NL_bin12_NormVal = 8.3333333/1.17551682
NL_allBC <- NL_allBC %>%
mutate(NL1reads_corrected = NL1reads / NL_bin1_NormVal,
NL2reads_corrected = NL2reads / NL_bin2_NormVal,
NL3reads_corrected = NL3reads / NL_bin3_NormVal,
NL4reads_corrected = NL4reads / NL_bin4_NormVal,
NL5reads_corrected = NL5reads / NL_bin5_NormVal,
NL6reads_corrected = NL6reads / NL_bin6_NormVal,
NL7reads_corrected = NL7reads / NL_bin7_NormVal,
NL8reads_corrected = NL8reads / NL_bin8_NormVal,
NL9reads_corrected = NL9reads / NL_bin9_NormVal,
NL10reads_corrected = NL10reads / NL_bin10_NormVal,
NL11reads_corrected = NL11reads / NL_bin11_NormVal,
NL12reads_corrected = NL12reads / NL_bin12_NormVal
)
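As a worked illustration of the arithmetic above, the sketch below applies the bin 2 and bin 12 percentages to a hypothetical raw count of 100 reads. Dividing by the normalization value is the same as multiplying by the ratio of the observed bin percentage to 8.3333333, so counts from heavily populated bins are scaled up and counts from sparsely populated bins are scaled down. The variable names and the read count are illustrative only.

```r
# Observed percentage of sorted events in two of the bins (values from above)
pct_bin2  <- 11.69436563
pct_bin12 <- 1.17551682
equal_pct <- 8.3333333   # share per bin if all 12 bins were equally populated

# Normalization values, computed as in the chunk above
norm_bin2  <- equal_pct / pct_bin2
norm_bin12 <- equal_pct / pct_bin12

# Hypothetical raw read count of 100 in each bin
reads <- 100
corrected_bin2  <- reads / norm_bin2    # scaled up: bin 2 holds more than 1/12 of events
corrected_bin12 <- reads / norm_bin12   # scaled down: bin 12 holds far fewer events
```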
Biswas et al., 2021: Compute a relative abundance table, R, by dividing the columns of C by their sums. The columns of R sum to 1.
NL1_total <- sum(NL_allBC$NL1reads_corrected)
NL2_total <- sum(NL_allBC$NL2reads_corrected)
NL3_total <- sum(NL_allBC$NL3reads_corrected)
NL4_total <- sum(NL_allBC$NL4reads_corrected)
NL5_total <- sum(NL_allBC$NL5reads_corrected)
NL6_total <- sum(NL_allBC$NL6reads_corrected)
NL7_total <- sum(NL_allBC$NL7reads_corrected)
NL8_total <- sum(NL_allBC$NL8reads_corrected)
NL9_total <- sum(NL_allBC$NL9reads_corrected)
NL10_total <- sum(NL_allBC$NL10reads_corrected)
NL11_total <- sum(NL_allBC$NL11reads_corrected)
NL12_total <- sum(NL_allBC$NL12reads_corrected)
NL_allBC_R <- NL_allBC %>%
mutate(NL1_norm=NL1reads_corrected/NL1_total,
NL2_norm=NL2reads_corrected/NL2_total,
NL3_norm=NL3reads_corrected/NL3_total,
NL4_norm=NL4reads_corrected/NL4_total,
NL5_norm=NL5reads_corrected/NL5_total,
NL6_norm=NL6reads_corrected/NL6_total,
NL7_norm=NL7reads_corrected/NL7_total,
NL8_norm=NL8reads_corrected/NL8_total,
NL9_norm=NL9reads_corrected/NL9_total,
NL10_norm=NL10reads_corrected/NL10_total,
NL11_norm=NL11reads_corrected/NL11_total,
NL12_norm=NL12reads_corrected/NL12_total) %>%
select(BC, NL1_norm, NL2_norm, NL3_norm, NL4_norm, NL5_norm, NL6_norm, NL7_norm, NL8_norm, NL9_norm, NL10_norm, NL11_norm, NL12_norm) %>%
dplyr::rename(barcode=BC)
# Check, sum of all values in each column should equal 1
sum(NL_allBC_R$NL10_norm)
## [1] 1
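The column normalization can be sketched in matrix form on a toy count table (all values below are made up, and `C`, `R`, `scale`, `C_scaled`, and `R_scaled` are hypothetical names). The sketch also shows one property worth keeping in mind: multiplying or dividing a whole column by a constant cancels when that column is divided by its own sum. As far as the code shown here goes, this means the `_norm` columns are the same whether they are computed from raw or population-corrected reads, and the correction carries forward through `BC_SorTotReads`.

```r
# Toy count table C: rows are barcodes, columns are bins (hypothetical counts)
C <- matrix(c(10, 0, 30,
              5, 5, 10),
            nrow = 3,
            dimnames = list(c("bc1", "bc2", "bc3"), c("bin1", "bin2")))

# Relative abundance table R: divide each column by its sum
R <- sweep(C, 2, colSums(C), "/")
colSums(R)   # each column sums to 1

# A constant per-column scale factor cancels in this step
scale <- c(bin1 = 1.4, bin2 = 0.2)   # arbitrary illustrative factors
C_scaled <- sweep(C, 2, scale, "*")
R_scaled <- sweep(C_scaled, 2, colSums(C_scaled), "/")
all.equal(R, R_scaled)   # TRUE
```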
Calculate the total population-corrected reads for each barcode across all bins.
NL_allBC_total_counts <- NL_allBC %>%
mutate(BC_SorTotReads = NL1reads_corrected + NL2reads_corrected + NL3reads_corrected + NL4reads_corrected + NL5reads_corrected + NL6reads_corrected + NL7reads_corrected + NL8reads_corrected + NL9reads_corrected + NL10reads_corrected + NL11reads_corrected + NL12reads_corrected) %>%
select(BC, BC_SorTotReads) %>%
dplyr::rename(barcode=BC)
Read in barcode-nucleotide sequence mapping file.
consensus_gene <- read.csv(file="./input_files/consensus_gene.csv",header=TRUE,sep=",")
consensus_gene %>% select(description) %>% distinct() %>% nrow() # 951065 unique barcodes
## [1] 951065
consensus_gene2 <- consensus_gene %>%
select(description,sequence) %>%
dplyr::rename(barcode=description,NTseq=sequence)
# Inspect the structure; both columns should be character vectors
str(consensus_gene2)
## 'data.frame': 951065 obs. of 2 variables:
## $ barcode: chr "AAAAAACTGCCAAGGTAAAAAACT" "AAAAAAGTGACATGTCCCTTATTA" "AAAAACCCGTATGCGGAACTACAG" "AAAAACGCACAACCCAATAGTGTA" ...
## $ NTseq : chr "AGACATTCATTCCCTACCGCATGTTACGCAAACGTCCGATGAAATTGAGTACCACAGTGATCTTAATGGTCAGTGCGGTACTGTTCTCGGTGCTATTGGTGGTGCATCTGA"| __truncated__ "AGACATTCATTGCCCTACCGCATGTTACGCAAACGTCCGATGAAATTGAGTACCACAGTGATCTTAATGGTCAGTGCGGTACTGTTCTCGGTGCTATTGGTGGTGCATCTG"| __truncated__ "AGACATTCATTGCCCTACCGCATGTTACGCAAACGTCCGATGAAATTGAGTACCACAGTGATCTTAATGGTCAGTGCGGTACTGTTCTCGGTGCTATTGGTGGTGCATCTG"| __truncated__ "AGACATTCATTGCCCTACCGCATGTTACGCAAACGTCCGATGAAATTGAGTACCACAGTGATCTTAATGGTCAGTGCGGTACTGTTCTCGGTGCTATTGGTGGTGCATCTG"| __truncated__ ...
Merge from all bins and create a master list of unique variants and BCs
tgood_NL <- right_join(consensus_gene2, NL_allBC_R, join_by(barcode)) %>%
mutate(NTlen=nchar(as.character(NTseq)))
sum(is.na(tgood_NL$NTseq)) # number of barcodes in NL_allBC_R without an associated nucleotide sequence
## [1] 369627
# Normalization check, sum of all values in each column should still be 1
sum(tgood_NL$NL11_norm)
## [1] 1
Mapping check: how many barcodes have more than 1 variant?
consensus_gene_sum <- consensus_gene2 %>%
group_by(barcode) %>%
summarise(count=n())
consensus_gene_sum %>%
filter(count>1) %>%
nrow(.)
## [1] 0
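For readers who want to see what a non-zero result of this check would look like, here is a small base R sketch on a made-up barcode map in which one barcode is deliberately mapped to two sequences. The data frame `toy_map` and its contents are hypothetical.

```r
# Toy barcode-to-sequence map (hypothetical entries); "bcB" maps to two sequences
toy_map <- data.frame(barcode = c("bcA", "bcB", "bcB", "bcC"),
                      NTseq   = c("ATG", "ATGA", "ATGC", "ATGG"))

# Count rows per barcode and list barcodes that occur more than once
n_per_bc <- table(toy_map$barcode)
ambiguous <- names(n_per_bc)[n_per_bc > 1]

# Keep only barcodes with exactly one mapped sequence
unique_map <- toy_map[!toy_map$barcode %in% ambiguous, ]
```

Here `ambiguous` contains only `"bcB"`, and `unique_map` retains the two unambiguous barcodes.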
Make a list of BCs with only 1 variant and filter the dataset to keep only barcodes which have been mapped.
bcgood_NL <- consensus_gene_sum %>%
filter(count==1) %>%
select(-count)
# Keep only barcodes that appear in the mapped set (bcgood_NL), then attach total read counts
NL_allBC_R_NTfilter <- tgood_NL %>%
semi_join(bcgood_NL,by="barcode") %>%
left_join(NL_allBC_total_counts,by="barcode")
# Each barcode is expected to have a single row here, so first() simply carries
# NTseq and BC_SorTotReads through while keeping one output row per barcode
NL_allBC_R_NTfilter_totals <- NL_allBC_R_NTfilter %>%
  group_by(barcode) %>%
  summarise(NTseq=first(NTseq),
            NL1_t=sum(NL1_norm),
            NL2_t=sum(NL2_norm),
            NL3_t=sum(NL3_norm),
            NL4_t=sum(NL4_norm),
            NL5_t=sum(NL5_norm),
            NL6_t=sum(NL6_norm),
            NL7_t=sum(NL7_norm),
            NL8_t=sum(NL8_norm),
            NL9_t=sum(NL9_norm),
            NL10_t=sum(NL10_norm),
            NL11_t=sum(NL11_norm),
            NL12_t=sum(NL12_norm),
            BC_SorTotReads=first(BC_SorTotReads))
Load the file of synTCS-MutLib variants (amino acid sequences) and filter to keep only barcodes with translated sequences.
consensus_prot <- read.csv(file="./input_files/consensus_prot_with_PreSortBC.csv",header=TRUE,sep=",")
NL_allBC_R_AAfilter <- NL_allBC_R_NTfilter_totals %>%
left_join(consensus_prot %>% dplyr::rename(barcode=BC),by="barcode")
# Some sequences have mutations which place a stop codon at the beginning and some barcodes were not mapped to amino acid sequences; filter these out.
NL_allBC_R_AAfilter <- NL_allBC_R_AAfilter %>%
filter(!is.na(seq))
Replace read counts of NA and 0 with an arbitrary pseudocount of 0.1 for the presort1 and presort2 libraries.
NL_allBC_R_AAfilter$presort1reads[is.na(NL_allBC_R_AAfilter$presort1reads)] <- 0.1
NL_allBC_R_AAfilter$presort1reads[NL_allBC_R_AAfilter$presort1reads == 0] <- 0.1
NL_allBC_R_AAfilter$presort2reads[is.na(NL_allBC_R_AAfilter$presort2reads)] <- 0.1
NL_allBC_R_AAfilter$presort2reads[NL_allBC_R_AAfilter$presort2reads == 0] <- 0.1
Biswas et al., 2021: Divide each column of R element-wise by the input relative abundance vector (relative abundance of variants in the library before flow cytometry) to obtain a fold-change table, F. Here, the presort2_norm column serves as the input relative abundance.
NL_allBC_F <- NL_allBC_R_AAfilter %>%
mutate(NL1_fc=NL1_t/presort2_norm,
NL2_fc=NL2_t/presort2_norm,
NL3_fc=NL3_t/presort2_norm,
NL4_fc=NL4_t/presort2_norm,
NL5_fc=NL5_t/presort2_norm,
NL6_fc=NL6_t/presort2_norm,
NL7_fc=NL7_t/presort2_norm,
NL8_fc=NL8_t/presort2_norm,
NL9_fc=NL9_t/presort2_norm,
NL10_fc=NL10_t/presort2_norm,
NL11_fc=NL11_t/presort2_norm,
NL12_fc=NL12_t/presort2_norm)
Biswas et al., 2021: Divide each row of F by its sum to obtain a table of adjusted abundances, A. Each row of A sums to 1.
NL_allBC_A <- NL_allBC_F %>%
mutate(rowsum=NL1_fc+NL2_fc+NL3_fc+NL4_fc+NL5_fc+NL6_fc+NL7_fc+NL8_fc+NL9_fc+NL10_fc+NL11_fc+NL12_fc) %>%
mutate(NL1=NL1_fc/rowsum,
NL2=NL2_fc/rowsum,
NL3=NL3_fc/rowsum,
NL4=NL4_fc/rowsum,
NL5=NL5_fc/rowsum,
NL6=NL6_fc/rowsum,
NL7=NL7_fc/rowsum,
NL8=NL8_fc/rowsum,
NL9=NL9_fc/rowsum,
NL10=NL10_fc/rowsum,
NL11=NL11_fc/rowsum,
NL12=NL12_fc/rowsum)
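The two steps above (fold-change table F, then adjusted abundance table A) can be sketched compactly in matrix form. This is an illustration on made-up numbers, not the pipeline's code: `R`, `input`, `Fc`, and `A` are hypothetical names, and the fold-change table is called `Fc` because `F` is an alias for `FALSE` in R.

```r
# Toy relative abundance table R (rows: barcodes, columns: bins); hypothetical values
R <- matrix(c(0.2, 0.8,
              0.6, 0.4),
            nrow = 2,
            dimnames = list(c("bc1", "bc2"), c("bin1", "bin2")))

# Hypothetical pre-sort relative abundance of each barcode
input <- c(bc1 = 0.1, bc2 = 0.4)

# Fold-change table: divide every column of R element-wise by the input vector
# (a vector of length nrow(R) recycles down each column)
Fc <- R / input

# Adjusted abundance table A: divide each row of the fold-change table by its sum
A <- Fc / rowSums(Fc)
rowSums(A)   # each row sums to 1
```

For `bc1`, the fold changes are 2 and 6, so its adjusted abundances are 0.25 and 0.75.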
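# Helper functions to compute the lower and upper bin indices from a vector of 12 cumulative sums
# Note: these helpers are not called in the code shown here; the indices are computed inline below
maxcs = function(x, output){
  # Last bin whose cumulative abundance is still below 0.5
  return(max(which(c(x[1],x[2],x[3],x[4],x[5],x[6],x[7],x[8],x[9],x[10],x[11],x[12]) < 0.5)))
}
mincs = function(x, output){
  # First bin whose cumulative abundance reaches 0.5
  return(min(which(c(x[1],x[2],x[3],x[4],x[5],x[6],x[7],x[8],x[9],x[10],x[11],x[12]) >= 0.5)))
}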
# Compute the cumulative sum of adjusted barcode abundances across all bins to estimate the median bin
# lower_index: last bin whose cumulative abundance is below 0.5 (-Inf, with a warning, if there is none)
# upper_index: first bin whose cumulative abundance reaches 0.5
# median: linear interpolation between lower_index and upper_index; set to 1 when lower_index is -Inf
NL_allBC_CS <- NL_allBC_A %>%
  rowwise() %>%
  mutate(
    cumulative_p = list(cumsum(c(NL1,NL2,NL3,NL4,NL5,NL6,NL7,NL8,NL9,NL10,NL11,NL12))),
    lower_index = max(which(cumsum(c(NL1,NL2,NL3,NL4,NL5,NL6,NL7,NL8,NL9,NL10,NL11,NL12)) < 0.5)),
    upper_index = min(which(cumsum(c(NL1,NL2,NL3,NL4,NL5,NL6,NL7,NL8,NL9,NL10,NL11,NL12)) >= 0.5)),
    median = ifelse(
      is.infinite(lower_index),
      1,
      lower_index + (0.5 - unlist(cumulative_p)[lower_index]) /
        (unlist(cumulative_p)[upper_index] - unlist(cumulative_p)[lower_index])
    )
  )
## Warning: There were 2215 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `lower_index = max(...)`.
## ℹ In row 61.
## Caused by warning in `max()`:
## ! no non-missing arguments to max; returning -Inf
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 2214 remaining warnings.
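The `max()` warnings above arise for rows in which no bin has a cumulative abundance below 0.5 (for example, when the first bin already holds at least half of a barcode's adjusted abundance); `lower_index` is then `-Inf` and the median is set to 1. The interpolation itself can be sketched as a standalone function. This is a minimal illustration, not the pipeline's code: `median_bin` is a hypothetical helper, and it assumes its input is non-negative, contains no missing values, and sums to 1, so that the lower index is always one less than the upper index.

```r
# Interpolated median bin for one barcode
# p: adjusted abundances across bins, summing to 1 (one row of table A)
median_bin <- function(p) {
  cs <- cumsum(p)
  upper <- which(cs >= 0.5)[1]   # first bin where cumulative abundance reaches 0.5
  if (upper == 1) return(1)      # no bin lies below the median: assign bin 1
  lower <- upper - 1             # last bin with cumulative abundance below 0.5
  lower + (0.5 - cs[lower]) / (cs[upper] - cs[lower])
}

median_bin(c(0.2, 0.2, 0.6, 0))   # 2 + (0.5 - 0.4) / (1.0 - 0.4), about 2.167
median_bin(c(0.7, 0.3, 0, 0))     # 1
```

Because the cumulative sum of non-negative values never decreases, taking `upper - 1` as the lower index matches the `max(which(... < 0.5))` definition used above while avoiding the call to `max()` on an empty vector.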
NL_allBC_CS_toCSV <- NL_allBC_CS %>%
select(barcode, NTseq, seq, lower_index, upper_index, median, BC_SorTotReads, presort1reads, presort2reads)
write.csv(NL_allBC_CS_toCSV,
"./output_files/DcuS_NoLigand_bin_distribution-byBC.csv", row.names = FALSE)
The session information is provided for full reproducibility.
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.4.1 (2024-06-14)
## os macOS 15.7.3
## system x86_64, darwin20
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/Los_Angeles
## date 2026-05-19
## pandoc 3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
## quarto 1.7.32 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## ade4 1.7-23 2025-02-14 [1] CRAN (R 4.4.1)
## bslib 0.10.0 2026-01-26 [1] CRAN (R 4.4.1)
## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
## cli 3.6.5 2025-04-23 [1] CRAN (R 4.4.1)
## devtools * 2.4.6 2025-10-03 [1] CRAN (R 4.4.1)
## dichromat 2.0-0.1 2022-05-02 [1] CRAN (R 4.4.0)
## digest 0.6.39 2025-11-19 [1] CRAN (R 4.4.1)
## dplyr * 1.2.0 2026-02-03 [1] CRAN (R 4.4.1)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
## evaluate 1.0.5 2025-08-27 [1] CRAN (R 4.4.1)
## farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
## forcats * 1.0.1 2025-09-25 [1] CRAN (R 4.4.1)
## fs 1.6.6 2025-04-12 [1] CRAN (R 4.4.1)
## generics 0.1.4 2025-05-09 [1] CRAN (R 4.4.1)
## ggplot2 * 4.0.2 2026-02-03 [1] CRAN (R 4.4.1)
## glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
## gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.1)
## hms 1.1.4 2025-10-17 [1] CRAN (R 4.4.1)
## htmltools 0.5.9 2025-12-04 [1] CRAN (R 4.4.1)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0)
## jsonlite 2.0.0 2025-03-27 [1] CRAN (R 4.4.1)
## knitr * 1.51 2025-12-20 [1] CRAN (R 4.4.1)
## lifecycle 1.0.5 2026-01-08 [1] CRAN (R 4.4.1)
## lubridate * 1.9.5 2026-02-04 [1] CRAN (R 4.4.1)
## magrittr * 2.0.4 2025-09-12 [1] CRAN (R 4.4.1)
## MASS 7.3-65 2025-02-28 [1] CRAN (R 4.4.1)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
## otel 0.2.0 2025-08-29 [1] CRAN (R 4.4.1)
## patchwork * 1.3.2 2025-08-25 [1] CRAN (R 4.4.1)
## pillar 1.11.1 2025-09-17 [1] CRAN (R 4.4.1)
## pkgbuild 1.4.8 2025-05-26 [1] CRAN (R 4.4.1)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
## pkgload 1.5.0 2026-02-03 [1] CRAN (R 4.4.1)
## purrr * 1.2.1 2026-01-09 [1] CRAN (R 4.4.1)
## R6 2.6.1 2025-02-15 [1] CRAN (R 4.4.1)
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
## Rcpp 1.1.1 2026-01-10 [1] CRAN (R 4.4.1)
## readr * 2.1.6 2025-11-14 [1] CRAN (R 4.4.1)
## remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
## rlang 1.1.7 2026-01-09 [1] CRAN (R 4.4.1)
## rmarkdown 2.30 2025-09-28 [1] CRAN (R 4.4.1)
## rstudioapi 0.18.0 2026-01-16 [1] CRAN (R 4.4.1)
## S7 0.2.1 2025-11-14 [1] CRAN (R 4.4.1)
## sass 0.4.10 2025-04-11 [1] CRAN (R 4.4.1)
## scales 1.4.0 2025-04-24 [1] CRAN (R 4.4.1)
## seqinr * 4.2-36 2023-12-08 [1] CRAN (R 4.4.0)
## sessioninfo 1.2.3 2025-02-05 [1] CRAN (R 4.4.1)
## stringi 1.8.7 2025-03-27 [1] CRAN (R 4.4.1)
## stringr * 1.6.0 2025-11-04 [1] CRAN (R 4.4.1)
## tibble * 3.3.1 2026-01-11 [1] CRAN (R 4.4.1)
## tidyr * 1.3.2 2025-12-19 [1] CRAN (R 4.4.1)
## tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
## timechange 0.4.0 2026-01-29 [1] CRAN (R 4.4.1)
## tzdb 0.5.0 2025-03-15 [1] CRAN (R 4.4.1)
## usethis * 3.2.1 2025-09-06 [1] CRAN (R 4.4.1)
## vctrs 0.7.1 2026-01-23 [1] CRAN (R 4.4.1)
## withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
## xfun 0.56 2026-01-18 [1] CRAN (R 4.4.1)
## yaml 2.3.12 2025-12-10 [1] CRAN (R 4.4.1)
##
## [1] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
## * ── Packages attached to the search path.
##
## ──────────────────────────────────────────────────────────────────────────────
Biswas, S.; Khimulya, G.; Alley, E. C.; Esvelt, K. M.; Church, G. M. Low-N Protein Engineering with Data-Efficient Deep Learning. Nat. Methods 2021, 18 (4), 389–396. https://doi.org/10.1038/s41592-021-01100-y.