In this project, a High Throughput Screening Experiment is studied. 60 plates are analyzed, where 96 different strains are present in each of the plates. Out of 96 strains, the aim is to find improved strains via analyzing different variables affecting strains. Well type, strain location are some of the important factors affecting strains.
In this experimental setup, 20, 60 and 180 hits will be identified based on their priorities.
In a primary screen without replicates, every compound is measured only once. Consequently, we cannot directly estimate the data variability for each compound. Instead, we indirectly estimate data variability by making a strong assumption that every compound has the same variability as a negative reference in a plate in the screen.
#install.packages("readxl")
#install.packages("remotes")
#install.packages("platetools")
library(platetools)
library(readxl)
library(phenoScreen)
library(ggplot2)
library(tibble)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(GAD)
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
##
## count
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
df <- read_excel("Sample_T1_Data.xlsx")
head(df)
## # A tibble: 6 x 7
## counter plate strain well_type robot stacker_position value
## <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl>
## 1 1 1 X1 Standard Well Bender 1 10236.
## 2 2 1 X2 Standard Well Bender 1 10370.
## 3 3 1 X3 Standard Well Bender 1 9651.
## 4 4 1 X4 Standard Well Bender 1 10172.
## 5 5 1 X5 Standard Well Bender 1 9493.
## 6 6 1 X6 Standard Well Bender 1 10216.
Initial Observation of Values via Histogram
Analysis of Variance results show that, each value is being highly impacted by their position within the plate.
aov.result <- aov(value ~ robot*stacker_position, data=df)
aov.result
## Call:
## aov(formula = value ~ robot * stacker_position, data = df)
##
## Terms:
## robot stacker_position robot:stacker_position Residuals
## Sum of Squares 132042 1850724747 354018 1211119566
## Deg. of Freedom 2 1 2 5754
##
## Residual standard error: 458.7843
## Estimated effects may be unbalanced
print(summary(aov.result))
## Df Sum Sq Mean Sq F value Pr(>F)
## robot 2 1.320e+05 6.602e+04 0.314 0.731
## stacker_position 1 1.851e+09 1.851e+09 8792.749 <2e-16 ***
## robot:stacker_position 2 3.540e+05 1.770e+05 0.841 0.431
## Residuals 5754 1.211e+09 2.105e+05
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
While, we know that, within each plate, some strains are naturally higher than others, which are parent strains and in each plate, parent strains are in the same spot. So, in order to remove this effect, within each plate, each strain will be subtracted from parent strains. Then, to remove any effect due to spatial positioning, spatial normalization will be applied.
Changing values based on Robot type & stacker position
gg_1 <- ggplot(df, aes(x=value, y=stacker_position)) +
geom_point(aes(col=robot), size=0.4) +
geom_smooth(method="loess", se=F) +
labs(subtitle="Changing values based on Robot type & stacker position",
y="stacker_position",
x="value",
title="Scatterplot")
plot(gg_1)
## `geom_smooth()` using formula 'y ~ x'

Process to subtract values of each plate from max strain of respective plate
df_appended %>% add_column(value_difference = NA)
## # A tibble: 5,760 x 14
## counter plate strain well_type robot stacker_position value a b
## <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <fct> <fct>
## 1 1 1 X1 Standard… Bend… 1 10236. 1 Stan…
## 2 2 1 X2 Standard… Bend… 1 10370. 1 Stan…
## 3 3 1 X3 Standard… Bend… 1 9651. 1 Stan…
## 4 4 1 X4 Standard… Bend… 1 10172. 1 Stan…
## 5 5 1 X5 Standard… Bend… 1 9493. 1 Stan…
## 6 6 1 X6 Standard… Bend… 1 10216. 1 Stan…
## 7 7 1 X7 Standard… Bend… 1 9885. 1 Stan…
## 8 8 1 X8 Standard… Bend… 1 10450. 1 Stan…
## 9 9 1 X9 Standard… Bend… 1 10154. 1 Stan…
## 10 10 1 X10 Standard… Bend… 1 10616. 1 Stan…
## # … with 5,750 more rows, and 5 more variables: c <fct>, d <fct>,
## # max_parent_strain <dbl>, iteration_id <int>, value_difference <lgl>
df_appended$value_difference = df_appended$value-df_appended$max_parent_strain
Building a linear regression model to understand which variables have impact in child strains’ values to eliminate variance caused by them in the next step.
Except the intercept, neither robot or stacker position are significant. As intercept refers to expected mean value of the response variable when Xs are 0, though these explanatory variables are never 0 and intercept is not rich to interpret on, in this case.
If we had built the model before subtracting parent strain maxes, we would have stacker position significant, but that does not mean anything since parent strains are always positioned in a specific spot and that causes bias.
model <- lm(df_appended$value_difference~df_appended$robot+
df_appended$stacker_position)
summary(model)
##
## Call:
## lm(formula = df_appended$value_difference ~ df_appended$robot +
## df_appended$stacker_position)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2550.29 -160.15 85.88 282.09 1668.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -338.720 16.475 -20.559 <2e-16 ***
## df_appended$robotC3P0 20.992 16.058 1.307 0.191
## df_appended$robotTerminator 5.026 16.058 0.313 0.754
## df_appended$stacker_position 1.408 1.137 1.239 0.215
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 497.5 on 5756 degrees of freedom
## Multiple R-squared: 0.00059, Adjusted R-squared: 6.915e-05
## F-statistic: 1.133 on 3 and 5756 DF, p-value: 0.3343
ANOVA for the updated model, which is cleaned from spatial positioning.
model_aov <- aov(df_appended$value_difference~df_appended$robot+
df_appended$stacker_position)
summary(model)
##
## Call:
## lm(formula = df_appended$value_difference ~ df_appended$robot +
## df_appended$stacker_position)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2550.29 -160.15 85.88 282.09 1668.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -338.720 16.475 -20.559 <2e-16 ***
## df_appended$robotC3P0 20.992 16.058 1.307 0.191
## df_appended$robotTerminator 5.026 16.058 0.313 0.754
## df_appended$stacker_position 1.408 1.137 1.239 0.215
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 497.5 on 5756 degrees of freedom
## Multiple R-squared: 0.00059, Adjusted R-squared: 6.915e-05
## F-statistic: 1.133 on 3 and 5756 DF, p-value: 0.3343
Visualizing hit detection from sample plates.
"Plate #1"
## [1] "Plate #1"
bhit_map(data = df_appended$value_difference[1:96],
well = num_to_well(df_appended$counter[1:96]),
plate = 96, threshold = 1)

"Plate #21"
## [1] "Plate #21"
bhit_map(data = df_appended$value_difference[(1+96*(21-1)):(96+96*(21-1))],
well = num_to_well(df_appended$counter[(1+96*(21-1)):(96+96*(21-1))-96*(21-1)]),
plate = 96, threshold = 1)

"Plate #41"
## [1] "Plate #41"
bhit_map(data = df_appended$value_difference[(1+96*(41-1)):(96+96*(41-1))],
well = num_to_well(df_appended$counter[(1+96*(41-1)):(96+96*(41-1))-96*(41-1)]),
plate = 96, threshold = 1)

Histogram of Value Differences with respect to plate’s max parent strain
hist(df_appended$value_difference, breaks = 50)

Plot of changing value_differences based on Robot type.
gg <- ggplot(df_appended, aes(x=value_difference, y=stacker_position)) +
geom_point(aes(col=robot), size=0.4) +
geom_smooth(method="loess", se=F) +
labs(subtitle="Changing value_differences based on Robot type",
y="stacker_position",
x="value_difference",
title="Scatterplot")
plot(gg)
## `geom_smooth()` using formula 'y ~ x'

20 HITS
hit_list = vector("list", length = 60)
for (i in 1:60){
hit_map = bhit_map(data = df_appended$value_difference[(1+96*(i-1)):(96+96*(i-1))],
well = num_to_well(df_appended$counter[(1+96*(i-1)):(96+96*(i-1))-96*(i-1)]),
plate = 96, threshold = 2.43)
x=data.frame(hit_map$data)[hit_map$data$hit=='hit','well']
hit_list[[i]] <- well_to_num(x)
hit_list[[i]] <- hit_list[[i]]+96*(i-1)
}
hits_20 <- unlist(hit_list)
hits_20
## [1] 992 1016 2067 2219 2251 2290 2524 2703 2854 3100 3301 3544 3636 3794 3850
## [16] 4453 4549 4924 5433 5508
hist(hits_20, breaks = 5)

20 HITS, Visualization
df_hits_20 = df_appended[hits_20,]
plot(ggplot(df_hits_20, aes(x=value_difference, y=stacker_position)) +
geom_point(aes(col=robot), size=6) +
labs(subtitle="Changing value_differences based on Robot type",
y="stacker_position",
x="value_difference",
title="20 Hits Distribution"))

60 HITS
hit_list = vector("list", length = 60)
for (i in 1:60){
hit_map = bhit_map(data = df_appended$value_difference[(1+96*(i-1)):(96+96*(i-1))],
well = num_to_well(df_appended$counter[(1+96*(i-1)):(96+96*(i-1))-96*(i-1)]),
plate = 96, threshold = 1.78)
x=data.frame(hit_map$data)[hit_map$data$hit=='hit','well']
hit_list[[i]] <- well_to_num(x)
hit_list[[i]] <- hit_list[[i]]+96*(i-1)
}
hits_60 <- unlist(hit_list)
hits_60
## [1] 55 66 182 347 390 467 832 914 992 1016 1045 1095 1118 1381 1761
## [16] 1815 1857 2067 2219 2220 2251 2290 2293 2372 2524 2640 2703 2845 2854 2872
## [31] 2901 2966 2985 3100 3301 3416 3520 3521 3544 3606 3636 3794 3850 4447 4453
## [46] 4475 4549 4917 4924 5001 5039 5111 5417 5433 5502 5508 5560 5593 5717
hist(hits_20, breaks = 10)

60 HITS, Visualization
df_hits_60 = df_appended[hits_60,]
plot(ggplot(df_hits_60, aes(x=value_difference, y=stacker_position)) +
geom_point(aes(col=robot), size=6) +
labs(subtitle="Changing value_differences based on Robot type",
y="stacker_position",
x="value_difference",
title="60 Hits Distribution"))

180 HITS
hit_list = vector("list", length = 60)
for (i in 1:60){
hit_map = bhit_map(data = df_appended$value_difference[(1+96*(i-1)):(96+96*(i-1))],
well = num_to_well(df_appended$counter[(1+96*(i-1)):(96+96*(i-1))-96*(i-1)]),
plate = 96, threshold = 1.325)
x=data.frame(hit_map$data)[hit_map$data$hit=='hit','well']
hit_list[[i]] <- well_to_num(x)
hit_list[[i]] <- hit_list[[i]]+96*(i-1)
}
hits_180 <- unlist(hit_list)
hits_180
## [1] 40 55 63 64 66 85 105 115 146 182 293 297 299 338 347
## [16] 390 467 468 483 527 539 566 772 832 914 919 974 985 992 994
## [31] 1006 1016 1045 1048 1095 1118 1135 1164 1212 1292 1381 1554 1562 1717 1761
## [46] 1815 1828 1844 1857 1882 1904 1949 1956 2046 2067 2177 2219 2220 2251 2269
## [61] 2290 2293 2332 2337 2372 2524 2545 2640 2674 2703 2728 2756 2845 2854 2869
## [76] 2872 2889 2901 2924 2962 2963 2964 2966 2985 3000 3012 3044 3045 3081 3094
## [91] 3100 3111 3181 3195 3243 3287 3301 3416 3422 3448 3503 3510 3520 3521 3535
## [106] 3538 3539 3544 3606 3636 3654 3794 3804 3850 3892 3898 4022 4135 4226 4287
## [121] 4300 4306 4321 4325 4350 4438 4447 4453 4454 4475 4481 4493 4549 4575 4635
## [136] 4712 4718 4787 4790 4836 4917 4924 5001 5039 5045 5080 5105 5111 5165 5174
## [151] 5225 5229 5264 5300 5320 5337 5360 5362 5392 5414 5417 5433 5434 5443 5458
## [166] 5459 5463 5495 5498 5502 5508 5535 5541 5560 5571 5576 5593 5645 5714 5717
hist(hits_180, breaks = 10)

180 HITS, Visualization
df_hits_180 = df_appended[hits_180,]
plot(ggplot(df_hits_180, aes(x=value_difference, y=stacker_position)) +
geom_point(aes(col=robot), size=4) +
labs(subtitle="Changing value_differences based on Robot type",
y="stacker_position",
x="value_difference",
title="180 Hits Distribution"))
