Analyze single-cell data

Allen Goodman and Shantanu Singh

2017-03-12

This vignette demonstrates how to analyze single-cell data from a morphological profiling experiment.

The images were analyzed using CellProfiler.

This vignette assumes that

library(dplyr)
library(ggplot2)
library(magrittr)
library(stringr)

Load data

First, load the data, which is stored in four tables named Image, Cytoplasm, Cells, and Nuclei. The code below joins these tables into a single table named object, with one row per cell.

backend <- file.path(Sys.getenv("HOME"), "Downloads", "110000106771.sqlite")

db <- src_sqlite(path = backend)

image <- tbl(src = db, "image")

object <-
  tbl(src = db, "Cells") %>%
  inner_join(tbl(src = db, "Cytoplasm"),
                    by = c("TableNumber", "ImageNumber", "ObjectNumber")) %>%
  inner_join(tbl(src = db, "Nuclei"),
                    by = c("TableNumber", "ImageNumber", "ObjectNumber"))

object %<>% inner_join(image, by = c("TableNumber", "ImageNumber"))
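
The same join logic can be tried on a small in-memory example. The sketch below uses base R `merge()` rather than a database backend, and the toy tables and their values are invented for illustration; only the key column names are taken from the code above.

```r
# Toy per-compartment tables; measurement values are made up.
cells <- data.frame(
  TableNumber = c(1, 1, 1),
  ImageNumber = c(1, 1, 2),
  ObjectNumber = c(1, 2, 1),
  Cells_AreaShape_Area = c(410, 385, 502)
)
nuclei <- data.frame(
  TableNumber = c(1, 1, 1),
  ImageNumber = c(1, 1, 2),
  ObjectNumber = c(1, 2, 1),
  Nuclei_AreaShape_Area = c(120, 98, 143)
)
image_toy <- data.frame(
  TableNumber = c(1, 1),
  ImageNumber = c(1, 2),
  Image_Metadata_Well = c("A01", "A02")
)

# Object-level tables share three keys; the image table shares two.
object_keys <- c("TableNumber", "ImageNumber", "ObjectNumber")
image_keys <- c("TableNumber", "ImageNumber")

# merge() keeps only matching rows by default, i.e. it is an inner join.
object_toy <- merge(cells, nuclei, by = object_keys)
object_toy <- merge(object_toy, image_toy, by = image_keys)

print(object_toy)
```

Each row of `object_toy` describes one cell and carries both its per-compartment measurements and the metadata of the image it came from.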

In this table, the measurement columns start with Nuclei_, Cells_, or Cytoplasm_.

variables <-
  colnames(object) %>%
  stringr::str_subset("^Nuclei_|^Cells_|^Cytoplasm_")
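
As a quick check of what the pattern selects, here is a minimal base R sketch using `grep(value = TRUE)`, which behaves like `str_subset` for this purpose. The column names are hypothetical examples, not the full set from the database.

```r
# Hypothetical column names: a mix of keys, metadata, and measurements.
columns <- c("TableNumber", "ImageNumber", "Image_Metadata_Well",
             "Cells_AreaShape_Area", "Cytoplasm_Texture_Entropy_Hoechst_3_0",
             "Nuclei_Intensity_MeanIntensity_Hoechst")

# Keep only names that start with one of the three compartment prefixes.
measurements <- grep("^Nuclei_|^Cells_|^Cytoplasm_", columns, value = TRUE)

print(measurements)
```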

How many variables?

print(length(variables))
## [1] 854

How many cells?

object %>%
  count() %>%
  knitr::kable(caption = "No. of cells")
Table: No. of cells

|      n|
|------:|
| 526555|

Let’s join the metadata, which records the type of each well:

metadata <-
  readr::read_csv("~/Downloads/metadata.csv") %>%
  select(BARCODE, WELL, WELLTYPE_CODE) %>%
  rename(Image_Metadata_Barcode = BARCODE,
         Image_Metadata_Well = WELL,
         Image_Metadata_Type = WELLTYPE_CODE)

head(metadata)
## # A tibble: 6 × 3
##   Image_Metadata_Barcode Image_Metadata_Well Image_Metadata_Type
##                    <dbl>               <chr>               <chr>
## 1           110000106890                 A01               EMPTY
## 2           110000106890                 A02               EMPTY
## 3           110000106890                 A03               EMPTY
## 4           110000106890                 A04               EMPTY
## 5           110000106890                 A05               EMPTY
## 6           110000106890                 A06               EMPTY
metadata <-
  dplyr::copy_to(db,
                 metadata,
                 indexes = list("Image_Metadata_Type")
                 )
object %<>%
  inner_join(metadata)

Let’s filter the data down to a few wells and plot the density of a single feature:

object %>%
  filter(Image_Metadata_Well %in% c("A01", "A24", "A23")) %>%
  select(Image_Metadata_Type,
         Image_Metadata_Well,
         Nuclei_Intensity_IntegratedIntensity_Hoechst) %>%
  collect() %>% {
    ggplot(., aes(Nuclei_Intensity_IntegratedIntensity_Hoechst,
                  fill=interaction(Image_Metadata_Type, Image_Metadata_Well))) +
      scale_x_log10() +
      geom_density(alpha = 0.5) +
      guides(fill = guide_legend(title = "Well"))
    }
[Figure: density of Nuclei_Intensity_IntegratedIntensity_Hoechst (log10 x-axis) for wells A01, A23, and A24, filled by well type and well]

Feature selection

Next, let’s filter the set of features based on various measures of quality.

Remove features that have near-zero variance.

futile.logger::flog.info("start")
## INFO [2017-03-12 23:08:58] start
object <-
  cytominer::select(
    population = object,
    variables = variables,
    sample = object %>% filter(Image_Metadata_Well == "A01") %>% collect(),
    operation = "variance_threshold"
  )
## INFO [2017-03-12 23:09:10] excluded:
## INFO [2017-03-12 23:09:10]    Cells_AreaShape_EulerNumber
## INFO [2017-03-12 23:09:10]    Cells_Children_Cytoplasm_Count
## INFO [2017-03-12 23:09:10]    Cytoplasm_AreaShape_EulerNumber
## INFO [2017-03-12 23:09:10]    Nuclei_AreaShape_EulerNumber
## INFO [2017-03-12 23:09:10]    Nuclei_Children_Cells_Count
## INFO [2017-03-12 23:09:10]    Nuclei_Children_Cytoplasm_Count
variables <-
  colnames(object) %>%
  str_subset("^Nuclei_|^Cells_|^Cytoplasm_")

futile.logger::flog.info("end")
## INFO [2017-03-12 23:09:11] end
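
The idea behind a variance filter can be sketched in base R. This is a simplified illustration, not necessarily the rule that `variance_threshold` applies: the helper `low_variance_features` and its tolerance `tol` are hypothetical, and the toy data is simulated.

```r
set.seed(1)

# Toy sample: two informative features and one constant feature.
sample_toy <- data.frame(
  Cells_AreaShape_Area = rnorm(100, mean = 400, sd = 50),
  Nuclei_AreaShape_Area = rnorm(100, mean = 120, sd = 15),
  Cells_AreaShape_EulerNumber = rep(1, 100)
)

# Hypothetical helper: names of features whose variance is at or below `tol`.
low_variance_features <- function(sample, variables, tol = 1e-8) {
  variances <- vapply(sample[variables], stats::var, numeric(1))
  names(variances)[variances <= tol]
}

excluded <- low_variance_features(sample_toy, colnames(sample_toy))
kept <- setdiff(colnames(sample_toy), excluded)

print(excluded)
```

As in the code above, the decision is made on a sample of cells and the resulting list of features is then applied to the whole population.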

Filter based on correlation between features. The morphological features extracted contain several highly correlated groups. We want to prune the set of features, retaining only one feature from each of these highly correlated sets. The function correlation_threshold provides an approximate (greedy) solution to this problem. After excluding the features, no pair of features has a correlation greater than the cutoff indicated below.

futile.logger::flog.info("start")
## INFO [2017-03-12 23:09:11] start
object <-
  cytominer::select(
    population = object,
    variables = variables,
    sample = object %>% filter(Image_Metadata_Well == "A01") %>% collect(),
    operation = "correlation_threshold",
    cutoff = 0.95)
## INFO [2017-03-12 23:09:17] excluded:
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_MaxFeretDiameter
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_MeanRadius
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MedianIntensity_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MinIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_StdIntensityEdge_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_StdIntensityEdge_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_StdIntensityEdge_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Intensity_UpperQuartileIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_UpperQuartileIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Location_CenterMassIntensity_X_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Location_CenterMassIntensity_X_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Location_CenterMassIntensity_X_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Location_CenterMassIntensity_Y_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Location_CenterMassIntensity_Y_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Location_Center_X
## INFO [2017-03-12 23:09:17]    Cells_Location_Center_Y
## INFO [2017-03-12 23:09:17]    Cells_Location_MaxIntensity_X_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Location_MaxIntensity_X_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Location_MaxIntensity_X_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Location_MaxIntensity_Y_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Location_MaxIntensity_Y_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Neighbors_FirstClosestObjectNumber_5
## INFO [2017-03-12 23:09:17]    Cells_Neighbors_FirstClosestObjectNumber_Adjacent
## INFO [2017-03-12 23:09:17]    Cells_Number_Object_Number
## INFO [2017-03-12 23:09:17]    Cells_Parent_AllNuclei
## INFO [2017-03-12 23:09:17]    Cells_Parent_Nuclei
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_FracAtD_CellMask_1of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_FracAtD_CellMask_2of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_FracAtD_CellMask_3of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_FracAtD_CellMask_4of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_MeanFrac_CellMask_1of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_MeanFrac_CellMask_2of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_MeanFrac_CellMask_3of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_MeanFrac_CellMask_4of4
## INFO [2017-03-12 23:09:17]    Cells_Texture_AngularSecondMoment_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Contrast_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Contrast_CellMask_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Correlation_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Correlation_CellMask_10_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Correlation_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Correlation_CellMask_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Correlation_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_DifferenceEntropy_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Entropy_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Entropy_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Entropy_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Entropy_CellMask_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InfoMeas1_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InverseDifferenceMoment_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InverseDifferenceMoment_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InverseDifferenceMoment_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InverseDifferenceMoment_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumAverage_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumAverage_CellMask_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumAverage_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumEntropy_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumEntropy_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumEntropy_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumEntropy_CellMask_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumEntropy_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumEntropy_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Variance_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Variance_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_Center_Y
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_MaxFeretDiameter
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_MinFeretDiameter
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_Perimeter
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_Zernike_3_3
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_Zernike_5_5
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_Zernike_7_7
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_Zernike_8_8
## INFO [2017-03-12 23:09:17]    Cytoplasm_AreaShape_Zernike_9_9
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_MeanIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_MeanIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_MedianIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_MinIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_StdIntensityEdge_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_UpperQuartileIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_UpperQuartileIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_CenterMassIntensity_X_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_CenterMassIntensity_X_CellMask
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_CenterMassIntensity_X_Hoechst
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_CenterMassIntensity_Y_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_CenterMassIntensity_Y_CellMask
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_CenterMassIntensity_Y_Hoechst
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_Center_X
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_Center_Y
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_MaxIntensity_X_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_MaxIntensity_X_CellMask
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_MaxIntensity_X_Hoechst
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_MaxIntensity_Y_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_MaxIntensity_Y_CellMask
## INFO [2017-03-12 23:09:17]    Cytoplasm_Location_MaxIntensity_Y_Hoechst
## INFO [2017-03-12 23:09:17]    Cytoplasm_Number_Object_Number
## INFO [2017-03-12 23:09:17]    Cytoplasm_Parent_Cells
## INFO [2017-03-12 23:09:17]    Cytoplasm_Parent_Nuclei
## INFO [2017-03-12 23:09:17]    Cytoplasm_RadialDistribution_MeanFrac_CellMask_2of4
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_AngularSecondMoment_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_AngularSecondMoment_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_DifferenceEntropy_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_Entropy_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_InverseDifferenceMoment_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_InverseDifferenceMoment_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_InverseDifferenceMoment_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumAverage_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumAverage_CellMask_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumAverage_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumEntropy_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumEntropy_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumEntropy_CellMask_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumEntropy_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumEntropy_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_SumEntropy_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Nuclei_AreaShape_Center_X
## INFO [2017-03-12 23:09:17]    Nuclei_AreaShape_Center_Y
## INFO [2017-03-12 23:09:17]    Nuclei_AreaShape_MaxFeretDiameter
## INFO [2017-03-12 23:09:17]    Nuclei_AreaShape_MeanRadius
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MaxIntensityEdge_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MaxIntensityEdge_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MaxIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MeanIntensityEdge_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MeanIntensityEdge_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MedianIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MinIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_MinIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_StdIntensity_Hoechst
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_UpperQuartileIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Intensity_UpperQuartileIntensity_Hoechst
## INFO [2017-03-12 23:09:17]    Nuclei_Location_CenterMassIntensity_X_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Location_CenterMassIntensity_X_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Location_CenterMassIntensity_X_Hoechst
## INFO [2017-03-12 23:09:17]    Nuclei_Location_CenterMassIntensity_Y_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Location_CenterMassIntensity_Y_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Location_CenterMassIntensity_Y_Hoechst
## INFO [2017-03-12 23:09:17]    Nuclei_Location_Center_X
## INFO [2017-03-12 23:09:17]    Nuclei_Location_Center_Y
## INFO [2017-03-12 23:09:17]    Nuclei_Location_MaxIntensity_X_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Location_MaxIntensity_X_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Location_MaxIntensity_X_Hoechst
## INFO [2017-03-12 23:09:17]    Nuclei_Location_MaxIntensity_Y_Alexa568
## INFO [2017-03-12 23:09:17]    Nuclei_Location_MaxIntensity_Y_CellMask
## INFO [2017-03-12 23:09:17]    Nuclei_Location_MaxIntensity_Y_Hoechst
## INFO [2017-03-12 23:09:17]    Nuclei_Neighbors_FirstClosestObjectNumber_1
## INFO [2017-03-12 23:09:17]    Nuclei_Neighbors_SecondClosestObjectNumber_1
## INFO [2017-03-12 23:09:17]    Nuclei_Number_Object_Number
## INFO [2017-03-12 23:09:17]    Nuclei_Parent_AllNuclei
## INFO [2017-03-12 23:09:17]    Nuclei_Texture_Entropy_CellMask_10_0
## INFO [2017-03-12 23:09:17]    Nuclei_Texture_Entropy_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Nuclei_Texture_InfoMeas2_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_MinFeretDiameter
## INFO [2017-03-12 23:09:17]    Cells_Correlation_Correlation_Alexa568_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MassDisplacement_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MeanIntensityEdge_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MADIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Center_Y
## INFO [2017-03-12 23:09:17]    Cells_Location_CenterMassIntensity_Y_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Neighbors_AngleBetweenNeighbors_5
## INFO [2017-03-12 23:09:17]    Cells_Neighbors_FirstClosestDistance_5
## INFO [2017-03-12 23:09:17]    Cells_Neighbors_SecondClosestDistance_5
## INFO [2017-03-12 23:09:17]    Cells_Location_MaxIntensity_Y_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Neighbors_SecondClosestObjectNumber_5
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_FracAtD_Alexa568_4of4
## INFO [2017-03-12 23:09:17]    Cells_RadialDistribution_FracAtD_Hoechst_4of4
## INFO [2017-03-12 23:09:17]    Cells_Texture_AngularSecondMoment_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_AngularSecondMoment_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Contrast_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_DifferenceEntropy_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_DifferenceEntropy_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_AngularSecondMoment_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_DifferenceEntropy_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_DifferenceVariance_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_DifferenceVariance_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_AngularSecondMoment_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Entropy_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InverseDifferenceMoment_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InverseDifferenceMoment_Alexa568_10_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_InverseDifferenceMoment_Alexa568_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Entropy_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumVariance_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_SumVariance_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cells_Texture_Variance_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Area
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Center_X
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Eccentricity
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_MajorAxisLength
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_MinorAxisLength
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Zernike_0_0
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Zernike_2_2
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Zernike_4_4
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Zernike_6_6
## INFO [2017-03-12 23:09:17]    Cells_AreaShape_Zernike_9_7
## INFO [2017-03-12 23:09:17]    Cytoplasm_Correlation_Correlation_Alexa568_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_LowerQuartileIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_LowerQuartileIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MedianIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MADIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_MaxIntensityEdge_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_Intensity_MaxIntensityEdge_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MedianIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MinIntensityEdge_Alexa568
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MinIntensityEdge_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MinIntensityEdge_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MinIntensity_CellMask
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MinIntensity_Hoechst
## INFO [2017-03-12 23:09:17]    Cells_Intensity_MeanIntensity_Alexa568
## INFO [2017-03-12 23:09:17]    Cytoplasm_RadialDistribution_FracAtD_Alexa568_1of4
## INFO [2017-03-12 23:09:17]    Cytoplasm_RadialDistribution_FracAtD_Alexa568_2of4
## INFO [2017-03-12 23:09:17]    Cytoplasm_RadialDistribution_FracAtD_Alexa568_4of4
## INFO [2017-03-12 23:09:17]    Cytoplasm_RadialDistribution_MeanFrac_Alexa568_2of4
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_AngularSecondMoment_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_AngularSecondMoment_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_DifferenceEntropy_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_DifferenceEntropy_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_AngularSecondMoment_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_AngularSecondMoment_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_Contrast_Hoechst_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_Contrast_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_Entropy_Alexa568_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_Entropy_CellMask_3_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_DifferenceEntropy_Hoechst_10_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_DifferenceEntropy_Hoechst_5_0
## INFO [2017-03-12 23:09:17]    Cytoplasm_Texture_InverseDifferenceMoment_Alexa568_3_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_InverseDifferenceMoment_CellMask_3_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_Entropy_Hoechst_10_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_Entropy_Hoechst_3_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_Entropy_Alexa568_5_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_SumVariance_Alexa568_3_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_SumVariance_CellMask_3_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_Variance_Alexa568_3_0
## INFO [2017-03-12 23:09:18]    Cytoplasm_Texture_Variance_CellMask_3_0
## INFO [2017-03-12 23:09:18]    Nuclei_AreaShape_MinFeretDiameter
## INFO [2017-03-12 23:09:18]    Nuclei_Correlation_Correlation_Alexa568_CellMask
## INFO [2017-03-12 23:09:18]    Cytoplasm_Intensity_IntegratedIntensityEdge_Hoechst
## INFO [2017-03-12 23:09:18]    Cells_Intensity_MaxIntensity_Alexa568
## INFO [2017-03-12 23:09:18]    Cytoplasm_Intensity_StdIntensityEdge_CellMask
## INFO [2017-03-12 23:09:18]    Cells_Intensity_StdIntensity_Alexa568
## INFO [2017-03-12 23:09:18]    Nuclei_Intensity_LowerQuartileIntensity_Alexa568
## INFO [2017-03-12 23:09:18]    Cells_Intensity_StdIntensity_CellMask
## INFO [2017-03-12 23:09:18]    Nuclei_Intensity_LowerQuartileIntensity_CellMask
## INFO [2017-03-12 23:09:18]    Nuclei_Intensity_MeanIntensity_Alexa568
## INFO [2017-03-12 23:09:18]    Nuclei_Intensity_MinIntensityEdge_Hoechst
## INFO [2017-03-12 23:09:18]    Nuclei_Intensity_MedianIntensity_Alexa568
## INFO [2017-03-12 23:09:18]    Nuclei_Intensity_MeanIntensity_CellMask
## INFO [2017-03-12 23:09:18]    Nuclei_Intensity_MADIntensity_Hoechst
## INFO [2017-03-12 23:09:18]    Nuclei_Texture_InfoMeas1_Alexa568_3_0
## INFO [2017-03-12 23:09:18]    Nuclei_Texture_InfoMeas1_Hoechst_5_0
## INFO [2017-03-12 23:09:18]    Nuclei_Texture_SumAverage_CellMask_3_0
## INFO [2017-03-12 23:09:18]    Nuclei_Texture_Entropy_Alexa568_10_0
variables <-
  colnames(object) %>%
  str_subset("^Nuclei_|^Cells_|^Cytoplasm_")

futile.logger::flog.info("end")
## INFO [2017-03-12 23:09:18] end
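
To make the greedy pruning concrete, here is one possible strategy written in base R. It is a sketch under stated assumptions and may differ from what `correlation_threshold` does internally: the helper `prune_correlated` is hypothetical, the data is simulated, and zero-variance features are assumed to have been removed already (otherwise the correlation matrix would contain NA).

```r
set.seed(42)
n <- 200
a <- rnorm(n)
b <- rnorm(n)

# Toy sample with two pairs of nearly redundant features.
sample_toy <- data.frame(
  f1 = a,
  f2 = a + rnorm(n, sd = 0.01),     # nearly a copy of f1
  f3 = b,
  f4 = 2 * b + rnorm(n, sd = 0.01)  # nearly a rescaled copy of f3
)

# Hypothetical helper: drop features one at a time until no pair of the
# remaining features has an absolute correlation above `cutoff`.
prune_correlated <- function(sample, variables, cutoff = 0.95) {
  keep <- variables
  repeat {
    if (length(keep) < 2) break
    cor_abs <- abs(stats::cor(sample[keep]))
    diag(cor_abs) <- 0
    if (max(cor_abs) <= cutoff) break
    # Locate the most strongly correlated pair.
    pair <- which(cor_abs == max(cor_abs), arr.ind = TRUE)[1, ]
    # Drop the member that is, on average, more correlated with the rest.
    mean_cor <- colMeans(cor_abs[, pair])
    keep <- setdiff(keep, keep[pair[which.max(mean_cor)]])
  }
  keep
}

kept <- prune_correlated(sample_toy, colnames(sample_toy), cutoff = 0.95)
print(kept)
```

The result retains one feature from each redundant pair. Because the procedure is greedy, which member of a pair survives depends on the tie-breaking rule, so a different rule can give a different but equally valid feature set.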

Normalize with reference to control

We need to normalize the data so that

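The default for doing this is standardization. Here, we take the cells from control wells in the experiment (well A01 in the code below), compute the normalization parameters from them (in this case, just the mean and s.d.), and then apply these parameters to the whole dataset (i.e. the population). The strata argument groups the data by plate barcode, so the parameters are computed and applied separately for each plate.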

futile.logger::flog.info("start")
## INFO [2017-03-12 23:09:18] start
object %<>% collect(n = Inf)

object <-
  cytominer::normalize(
    population = object,
    variables = variables,
    strata =  c("Image_Metadata_Barcode"),
    sample = object %>% filter(Image_Metadata_Well == "A01")
  )

futile.logger::flog.info("end")
## INFO [2017-03-12 23:13:16] end
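
The arithmetic of control-based standardization is easy to follow on a toy example. The sketch below is written in base R and is not the cytominer implementation; the helper `standardize_by_control`, the plate names, and the values are all invented for illustration.

```r
# Toy population: two plates, with well A01 acting as the control on each.
population_toy <- data.frame(
  Image_Metadata_Barcode = rep(c("plate1", "plate2"), each = 6),
  Image_Metadata_Well = rep(c("A01", "A01", "A01", "B02", "B02", "B02"),
                            times = 2),
  Cells_AreaShape_Area = c(10, 12, 14, 20, 22, 24,
                           110, 120, 130, 150, 160, 170)
)

# Hypothetical helper: standardize one feature within each stratum using the
# mean and s.d. of that stratum's control cells only.
standardize_by_control <- function(population, variable, strata, is_control) {
  groups <- split(seq_len(nrow(population)), population[[strata]])
  for (rows in groups) {
    control_rows <- rows[is_control[rows]]
    mu <- mean(population[[variable]][control_rows])
    sigma <- stats::sd(population[[variable]][control_rows])
    population[[variable]][rows] <- (population[[variable]][rows] - mu) / sigma
  }
  population
}

normalized_toy <- standardize_by_control(
  population_toy,
  variable = "Cells_AreaShape_Area",
  strata = "Image_Metadata_Barcode",
  is_control = population_toy$Image_Metadata_Well == "A01"
)

print(normalized_toy)
```

After normalization the control cells on each plate have mean 0 and s.d. 1, and every other cell is expressed in units of control standard deviations, which makes values comparable across plates.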

In some cases, we may have features that have no variance at all (e.g. Euler number). If these features have not already been removed by this stage, the standardization step will result in all values for that feature being NA (because s.d. = 0). Let’s remove them:

futile.logger::flog.info("start")
## INFO [2017-03-12 23:13:16] start
object <-
  cytominer::select(
      population = object,
      variables = variables,
      operation = "drop_na_columns"
  )

variables <-
  colnames(object) %>%
  str_subset("^Nuclei_|^Cells_|^Cytoplasm_")

futile.logger::flog.info("end")
## INFO [2017-03-12 23:13:21] end
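
The following base R sketch shows why a constant feature turns into missing values and how such columns can be dropped. It uses a simplified rule (drop a column if it contains any missing value), which may be stricter than the rule `drop_na_columns` applies; the toy values are invented.

```r
# Toy data: the Euler number is constant, so its s.d. is zero.
cells_toy <- data.frame(
  Cells_AreaShape_Area = c(400, 420, 380, 410),
  Cells_AreaShape_EulerNumber = c(1, 1, 1, 1)
)

# Standardizing a constant feature divides 0 by 0, which yields NaN.
standardized <- as.data.frame(
  lapply(cells_toy, function(x) (x - mean(x)) / stats::sd(x))
)

# Keep only the columns without missing values (anyNA() also detects NaN).
has_na <- vapply(standardized, anyNA, logical(1))
cleaned <- standardized[, !has_na, drop = FALSE]

print(colnames(cleaned))
```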

Transform

We may want to transform the data so that assumptions we may later make about the data distribution are satisfied (e.g. Gaussianity). The default here is generalized_log.

futile.logger::flog.info("start")

object <-
  cytominer::transform(
    population = object,
    variables = variables
  )

futile.logger::flog.info("end")
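
For intuition, one common form of the generalized log is log((x + sqrt(x^2 + offset^2)) / 2). The sketch below implements that formula in base R; the helper name `glog` and the default offset of 1 are assumptions made for this illustration and are not taken from the vignette.

```r
# Hypothetical helper implementing one common form of the generalized log.
glog <- function(x, offset = 1) {
  log((x + sqrt(x^2 + offset^2)) / 2)
}

# Unlike log(), the transform is defined at zero and for negative values,
# which matters after standardization because values are centered on zero.
x <- c(-2, 0, 1, 10, 1000)
print(glog(x))

# For large x the transform is very close to the ordinary logarithm.
print(glog(1000) - log(1000))
```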

Summarize measurements per well

Now let’s summarize the data by grouping by well and computing averages.

futile.logger::flog.info("start")
## INFO [2017-03-12 23:13:21] start
profiles <-
  cytominer::aggregate(
    population = object,
    variables = variables,
    strata = c("Image_Metadata_Barcode", "Image_Metadata_Well"),
    operation = "mean"
  )

profiles %<>%
  collect()

futile.logger::flog.info("end")
## INFO [2017-03-12 23:13:23] end
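
The per-well averaging can be sketched with base R `stats::aggregate()` on a toy table. This is only an illustration of the grouping logic, with invented values; it is not the cytominer implementation.

```r
# Toy single-cell table: three cells in well A01 and two in well A02.
cells_toy <- data.frame(
  Image_Metadata_Barcode = rep("plate1", 5),
  Image_Metadata_Well = c("A01", "A01", "A01", "A02", "A02"),
  Cells_AreaShape_Area = c(1, 2, 3, 10, 20),
  Nuclei_AreaShape_Area = c(0.5, 1.5, 1.0, 4, 6)
)

# Group by plate and well, then average each measurement within a group.
profiles_toy <- stats::aggregate(
  cbind(Cells_AreaShape_Area, Nuclei_AreaShape_Area) ~
    Image_Metadata_Barcode + Image_Metadata_Well,
  data = cells_toy,
  FUN = mean
)

print(profiles_toy)
```

The result has one row per well, so five cells collapse into two profiles.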

How many wells?

profiles %>%
  count() %>%
  knitr::kable(caption = "No. of wells")
Table: No. of wells

|   n|
|---:|
| 384|

Let’s plot the relationship between a pair of variables in the summarized data:

p <-
  ggplot(profiles, aes(Cells_Intensity_IntegratedIntensity_Hoechst, Nuclei_AreaShape_Area)) +
  geom_point()

print(p)
[Figure: scatter plot of Nuclei_AreaShape_Area against Cells_Intensity_IntegratedIntensity_Hoechst, one point per well]