The data (‘Mice Protein Expression Data Set’) were obtained online from the UCI Machine Learning Repository and consist of 1080 observations over 82 variables. These data comprise continuous numerical protein expression data and categorical data, including mouse identification and variable exposure. After data exploration, visualization and data preprocessing, two modes of statistical analysis were employed.

The first was linear regression analysis. Prior to fitting any regression, a scatter plot assessing the bivariate relationship between the relevant proteins was inspected. In the three relationships considered, the scatter plots demonstrated evidence of a positive linear relationship. A linear regression model was fitted to predict the dependent variable, ITSN1_N, using measurements of DYRK1A_N over all classes of mice. The overall regression model was statistically significant, F(1,1063) = 6779.2, p < .001, and explained 86.5% of the variability in ITSN1_N. A linear regression model was fitted to predict the dependent variable, pERK_N, using measurements of DYRK1A_N over all classes of mice. The overall regression model was statistically significant, F(1,1063) = 6976.3, p < .001, and explained 86.8% of the variability in pERK_N. A linear regression model was fitted to predict the dependent variable, pERK_N, using measurements of BRAF_N over all classes of mice. The overall regression model was statistically significant, F(1,1063) = 4529.1, p < .001, and explained 81.0% of the variability in pERK_N. For all three regression models the 95% CI for the intercept did not include 0, which simply indicated the presence of a background, or base, level of protein.

The second was the paired-sample t-test. For the class c-CS-s, 7 paired t-tests were performed (n = 120 for each pair). Each mean difference is reported as the second protein minus the first, matching the t.test() output. For ARC_N and BRAF_N, the mean difference was 0.256 (SD = 0.101), t(119) = 27.724, p < 0.001, 95% CI [0.238, 0.274]. For BRAF_N and DYRK1A_N, the mean difference was 0.047 (SD = 0.058), t(119) = 8.871, p < 0.001, 95% CI [0.037, 0.058]. For DYRK1A_N and ITSN1_N, the mean difference was 0.189 (SD = 0.061), t(119) = 34.216, p < 0.001, 95% CI [0.178, 0.200]. For ITSN1_N and pERK_N, the mean difference was -0.006 (SD = 0.094), t(119) = -0.696, p = 0.488, 95% CI [-0.023, 0.011]. For pERK_N and pNUMB_N, the mean difference was -0.200 (SD = 0.132), t(119) = -16.569, p < 0.001, 95% CI [-0.224, -0.176]. For pNUMB_N and S6_N, the mean difference was 0.048 (SD = 0.136), t(119) = 3.892, p < 0.001, 95% CI [0.024, 0.073]. For S6_N and SOD1_N, the mean difference was -0.125 (SD = 0.131), t(119) = -10.45, p < 0.001, 95% CI [-0.148, -0.101]. Six of the seven pairs showed a statistically significant mean difference; ITSN1_N and pERK_N did not.

QQ plots were not used because all sample and sub-sample sizes were large (n > 30), so normality was assumed. The granova.ds() function was used to provide dependent sample assessment plots and summary statistics.
The first statistical analysis will involve linear regression, with the following hypothesis testing:
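The hypotheses themselves are not shown at this point. Judging from the slope tests and slope confidence intervals reported for each model later on, they presumably take the standard form for simple linear regression, where $\beta$ (written b in the results) is assumed here to denote the slope:

```latex
H_0: \beta = 0 \qquad \text{versus} \qquad H_A: \beta \neq 0
```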
The second statistical analysis will involve the paired-sample t-test, with the following hypothesis:
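The hypotheses are likewise not shown here. The later calls to t.test() use mu = 0 with a two-sided alternative, which suggests the following form, where $\mu_\Delta$ is assumed notation for the population mean of the paired differences:

```latex
H_0: \mu_\Delta = 0 \qquad \text{versus} \qquad H_A: \mu_\Delta \neq 0
```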
The chosen data set, ‘Mice Protein Expression Data Set’, was generated from experiments by Higuera et al [1] and Ahmed et al [2]. These projects aimed to understand the impacts of Down Syndrome on learning through analysis of protein expression in mice. Down Syndrome (DS) has a global prevalence of 1 in 1000 live human births, and is the most common genetically defined cause of intellectual disabilities [1,3]. DS in humans is caused by the presence of an additional chromosome 21, referred to as trisomy [3]. Protein expression is disrupted by human trisomy 21, leading to the physical and intellectual manifestations associated with DS. Due to its prevalence and health implications, a strong imperative exists to further understand and treat the condition.
Davisson et al successfully manipulated a mouse genome to produce several models of DS in rodents [4]. One of these, known as ‘Ts65Dn’, is the best-characterized of the DS rodent models [5]. In generating the ‘Mice Protein Expression Data Set’, Higuera et al employed Ts65Dn and normal mice in experimental and control groups, exposing them to a range of variables. The rodents were then euthanized and their cortex protein levels were analysed in a quantitative fashion. From three binary variables, eight classes of mice were used to generate the data set. The eight classes are summarized in Figure 1.
The work of Higuera et al and Ahmed et al aimed to assess the efficacy of pharmacotherapies for treatment of DS - an avenue of treatment that has never been successfully implemented. Ahmed et al statistically evaluated the ‘Mice Protein Expression Data Set’ through the use of the Wilcoxon test, with the subsequent production of ‘self-organizing maps’ to assess proteins critical to learning. Higuera et al also used pair-wise comparisons to evaluate the significance of difference between pairs of the classes, specifically to assess protein dynamics.
In this analysis of the ‘Mice Protein Expression Data Set’, linear relationships between protein expression levels will be statistically evaluated, followed by paired-sample t-tests on protein expression for mice in the class ‘c-CS-s’.
Fig 1. Classes of mice. (A) There are eight classes of mice based on genotype (control, c, and trisomy, t), stimulation to learn (Context-Shock, CS, and Shock-Context, SC) and treatment (saline, s, and memantine, m). Learning outcome indicates the response to learning for each class. (B) Number of mice in each class. (C) Format of protein expression data: rows are individual mice, and columns, P1 … P77, are the measured levels of the 77 proteins [6].
The libraries rmarkdown, GGally, car, ggplot2, dplyr, gsheet, gridExtra, htmltools, reshape2, granova, psychometric, Hmisc, qwraps & outliers were employed during the analysis.
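Before rerunning the analysis it can help to confirm that these packages are installed. The following is a minimal sketch, not part of the original workflow; the package names are copied from the list above as written, and the variable names `pkgs`, `available` and `missing_pkgs` are illustrative.

```r
# Packages named in the text; "qwraps" is kept exactly as written there.
pkgs <- c("rmarkdown", "GGally", "car", "ggplot2", "dplyr", "gsheet", "gridExtra",
          "htmltools", "reshape2", "granova", "psychometric", "Hmisc", "qwraps",
          "outliers")

# requireNamespace() only reports availability; it does not attach anything.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# List whatever still needs installing before the analysis can be rerun.
missing_pkgs <- names(available)[!available]
print(missing_pkgs)
```

Any name printed by the last line would need to be installed (or its spelling checked) before calling library() on it.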
The Excel file ‘Data_Cortex_Nuclear.xls’ was downloaded from UCI’s machine learning repository [7] into Google Drive (s3644119 at RMIT University) and then imported into the RStudio interactive environment as “ds”.
url <- "https://docs.google.com/spreadsheets/d/158zPd4XCYoaXNzHA2OJPkOmzSWYjwFoEuOiMzV1O-rU/edit?usp=sharing"
ds <- gsheet2tbl(url)
The initial data checking is summarized below (functioning code in Appendix).
Function | Purpose / Outcome |
---|---|
summary(ds) | Confirming the successful importation of the data set. Size was 1080 observations over 82 variables. The 77 protein attributes are continuous numerical data, with NaN values present. |
colnames(ds) | Confirming the column names present in the data set. The first column is MouseID (12 - 15 observations per mouse, per protein). The final four columns contain the categorical data about each mouse. |
MouseID <- ds$MouseID | Each MouseID is appended with the suffix ‘_n’ (where 1 ≤ n ≤ 15). For example, MouseID ‘309’ is recorded as 309_1, 309_2, … 309_15. |
The native data set will be preprocessed in four iterations for use in our analysis. These data were collected such that they are all on the same scale (approximately 0 - 4) and are intended for direct comparison without any transformation (for example, a value of 0.1 for protein A and 1.0 for protein B shows that protein B is expressed at 10 times the level of protein A). The only transformation performed on these data was column subtraction, for paired t-test analysis (see Data Preprocessing - Iteration 3).
During data collection, 12 - 15 expression measurements were taken in vitro for each protein in each individual mouse. As mentioned in the summary above, these were denoted as ‘_n’ (where 1 ≤ n ≤ 15). For the purpose of this investigation the _n notation is not required, and it will be stripped from the data. The call table(head(ds$MouseID, 150)) samples the first 10 mice in the data set, showing successful removal of the _n notation.
MouseID <- gsub("\\_.*", "", ds$MouseID)
ds$MouseID <- MouseID
table(head(ds$MouseID, 150))
309 311 320 321 322 3415 3499 3507 3520 3521
15 15 15 15 15 15 15 15 15 15
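The pattern used above removes everything from the first underscore onwards. As a small illustration on made-up IDs in the same format, an anchored pattern that removes only a trailing "_<digits>" suffix is a slightly stricter alternative; this is a sketch, and the vector `ids` is hypothetical rather than taken from the data set.

```r
# Hypothetical IDs in the same "<mouse>_<n>" format as the data set.
ids <- c("309_1", "309_2", "309_15", "3415_1", "3415_2")

# sub() with an anchored pattern drops the final "_<digits>" suffix only.
mouse <- sub("_[0-9]+$", "", ids)

# Each mouse should now appear once per measurement.
counts <- table(mouse)
print(counts)
```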
Through the use of self-organizing feature maps, Higuera et al were able to determine the most discriminant proteins in comparison of the control mice classes and the trisomic mice classes [1]. For this analysis, these tabulated results were assessed in a semi-quantitative fashion [8] and the proteins ARC_N, BRAF_N, DYRK1A_N, ITSN1_N, pERK_N, pNUMB_N, S6_N & SOD1_N were selected. A vector of these identifiers was created as target_attributes_proteins. These target proteins, along with MouseID and the categorical tags Genotype, Treatment, Behavior and class, were added to another vector, target_attributes_all. A new data set, ds_filtered, was then created, with the native data set observations for the variables of the target_attributes_all vector.
target_attributes_proteins <- c("ARC_N", "BRAF_N", "DYRK1A_N", "ITSN1_N", "pERK_N",
"pNUMB_N", "S6_N", "SOD1_N")
target_attributes_all <- c("MouseID", "ARC_N", "BRAF_N", "DYRK1A_N", "ITSN1_N",
"pERK_N", "pNUMB_N", "S6_N", "SOD1_N", "Genotype", "Treatment", "Behavior",
"class")
ds_filtered <- ds[, target_attributes_all]
The categorical values in ds_filtered were checked with unique(ds_filtered[,10:13]); no typographical errors were present, as below.
categorical_counts <- ds_filtered[, 10:13]
unique(categorical_counts)
The categorical variable counts are summarized below (functions executed in Appendix).
Function | Counts | Total |
---|---|---|
table(ds_filtered$Genotype) | Control = 570, Ts65Dn = 510 | 1080 |
table(ds_filtered$Treatment) | Memantine = 570, Saline = 510 | 1080 |
table(ds_filtered$Behavior) | C/S = 525, S/C = 555 | 1080 |
table(ds_filtered$class) | c-CS-m = 150, c-CS-s = 135, c-SC-m = 150, c-SC-s = 135, t-CS-m = 135, t-CS-s = 105, t-SC-m = 135, t-SC-s = 135 | 1080 |
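Because each class code is built from the genotype, behaviour and treatment labels (see Figure 1), the four categorical columns can also be cross-checked against one another. The sketch below does this on a few made-up rows that use the category labels reported above; the data frame `toy` and the vector `rebuilt` are illustrative names only.

```r
# Toy rows using the category labels reported above.
toy <- data.frame(
  Genotype  = c("Control", "Control", "Ts65Dn", "Ts65Dn"),
  Behavior  = c("C/S", "S/C", "C/S", "S/C"),
  Treatment = c("Saline", "Memantine", "Memantine", "Saline"),
  class     = c("c-CS-s", "c-SC-m", "t-CS-m", "t-SC-s"),
  stringsAsFactors = FALSE
)

# Rebuild the class code from its three components.
rebuilt <- paste(
  ifelse(toy$Genotype == "Control", "c", "t"),
  gsub("/", "", toy$Behavior, fixed = TRUE),
  ifelse(toy$Treatment == "Saline", "s", "m"),
  sep = "-"
)

# TRUE for every row means the class column agrees with the other three.
consistent <- rebuilt == toy$class
print(all(consistent))
```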
The coordinate locations of the NaN values are displayed below (the result output was transposed for readability). The expression ds_filtered[c(988, 989, 990), c(1, 10, 11, 12, 13)] was used to identify the mouse/mice and corresponding categories.
t(which(is.na(ds_filtered), arr.ind = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
row 988 989 990 988 989 990 988 989 990 988 989 990 988 989 990 988 989 990
col 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 9 9 9
ds_filtered[c(988, 989, 990), c(1, 10, 11, 12, 13)]
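To make the (row, col) output above easier to read, the positions can be reduced to the affected rows and column names. This is a sketch on a small made-up data frame; `toy`, `na_pos`, `na_rows` and `na_cols` are illustrative names, not objects from the analysis.

```r
# Toy data frame with two missing cells (values are made up).
toy <- data.frame(id = c("a", "b", "c"),
                  p1 = c(0.2, NA, 0.4),
                  p2 = c(1.1, NA, 0.9))

# arr.ind = TRUE returns one (row, col) pair per missing cell.
na_pos <- which(is.na(toy), arr.ind = TRUE)
print(na_pos)

# Summarise by row and by column name rather than by position.
na_rows <- sort(unique(na_pos[, "row"]))
na_cols <- names(toy)[sort(unique(na_pos[, "col"]))]
```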
Through use of table(ds_filtered$class) we can see that n = 135 measurements were taken for class ‘t-SC-s’. As outlined in Figure 1, ‘t-SC-s’ corresponds to mice with the Ts65Dn genotype (trisomy, t), not stimulated to learn (Shock-Context, SC) and given saline treatment (saline, s). The mean of the non-missing values for class ‘t-SC-s’ will be used to fill the NaN values.
tSCs_filter <- ds_filtered$class == "t-SC-s"
NaN_column_mask <- c(3, 4, 5, 6, 7, 9)
NaN_3426_tSCs <- ds_filtered[tSCs_filter, NaN_column_mask]
summary_3426_tSCs <- summary(NaN_3426_tSCs)
summary_3426_tSCs
BRAF_N DYRK1A_N ITSN1_N pERK_N pNUMB_N SOD1_N
Min. :0.1940 Min. :0.1633 Min. :0.3284 Min. :0.1492 Min. :0.2352 Min. :0.3841
1st Qu.:0.2456 1st Qu.:0.2781 1st Qu.:0.4715 1st Qu.:0.2807 1st Qu.:0.2974 1st Qu.:0.5534
Median :0.2844 Median :0.3479 Median :0.5519 Median :0.3552 Median :0.3591 Median :0.6467
Mean :0.2867 Mean :0.3375 Mean :0.5491 Mean :0.3577 Mean :0.3577 Mean :0.7214
3rd Qu.:0.3264 3rd Qu.:0.3916 3rd Qu.:0.6249 3rd Qu.:0.4412 3rd Qu.:0.4072 3rd Qu.:0.8075
Max. :0.4306 Max. :0.5006 Max. :0.8362 Max. :0.5562 Max. :0.5203 Max. :1.6105
NA's :3 NA's :3 NA's :3 NA's :3 NA's :3 NA's :3
write.csv(ds_filtered, file = "mouse_data_filtered.csv")
ds_filled_NaN <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1FKFINItxyosgFhwiihpT3MSuMTxR-LaYPNG_XNctwlU/edit?usp=sharing")
The filtered mouse data were exported using write.csv and added to Google Sheets. The NaN values were filled as follows: BRAF_N = 0.2867, DYRK1A_N = 0.3375, ITSN1_N = 0.5491, pERK_N = 0.3577, pNUMB_N = 0.3577 and SOD1_N = 0.7214. The data set was then imported as ds_filled_NaN. These data were checked for NaN as follows.
NaN_check <- ds_filled_NaN[tSCs_filter, NaN_column_mask]
summary_NaN_check <- summary(NaN_check)
summary_NaN_check
BRAF_N DYRK1A_N ITSN1_N pERK_N pNUMB_N SOD1_N
Min. :0.1940 Min. :0.1633 Min. :0.3284 Min. :0.1492 Min. :0.2352 Min. :0.3841
1st Qu.:0.2467 1st Qu.:0.2785 1st Qu.:0.4755 1st Qu.:0.2823 1st Qu.:0.2996 1st Qu.:0.5536
Median :0.2867 Median :0.3466 Median :0.5491 Median :0.3560 Median :0.3577 Median :0.6571
Mean :0.2867 Mean :0.3375 Mean :0.5491 Mean :0.3577 Mean :0.3577 Mean :0.7214
3rd Qu.:0.3252 3rd Qu.:0.3900 3rd Qu.:0.6243 3rd Qu.:0.4404 3rd Qu.:0.4070 3rd Qu.:0.8067
Max. :0.4306 Max. :0.5006 Max. :0.8362 Max. :0.5562 Max. :0.5203 Max. :1.6105
Mean Fill Summary | BRAF_N | DYRK1A_N | ITSN1_N | pERK_N | pNUMB_N | SOD1_N |
---|---|---|---|---|---|---|
Native Data Mean | 0.2867 | 0.3375 | 0.5491 | 0.3577 | 0.3577 | 0.7214 |
Preprocessed Data Mean | 0.2867 | 0.3375 | 0.5491 | 0.3577 | 0.3577 | 0.7214 |
Delta Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
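The same class-mean fill could also be done inside R, without the round trip through a spreadsheet. The sketch below shows one way on made-up values; the helper `fill_class_mean` and the data frame `toy` are hypothetical and are not part of the original analysis.

```r
# Toy stand-in for the filtered data: one class has a missing value.
toy <- data.frame(
  class  = c("t-SC-s", "t-SC-s", "t-SC-s", "c-CS-s", "c-CS-s"),
  BRAF_N = c(0.20, 0.30, NA, 0.50, 0.70),
  stringsAsFactors = FALSE
)

# Replace NA in a numeric column with the mean of its own class.
fill_class_mean <- function(values, groups) {
  group_means <- ave(values, groups, FUN = function(v) mean(v, na.rm = TRUE))
  ifelse(is.na(values), group_means, values)
}

filled <- toy
filled$BRAF_N <- fill_class_mean(toy$BRAF_N, toy$class)

# The class mean is unchanged by the fill, as in the summary table above.
mean_before <- mean(toy$BRAF_N[toy$class == "t-SC-s"], na.rm = TRUE)
mean_after  <- mean(filled$BRAF_N[filled$class == "t-SC-s"])
```

Mean filling leaves the class mean unchanged but slightly narrows the spread of the filled column, which is consistent with the small shifts in the quartiles between the two summaries above.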
The preprocessed data will first be visualized with a scatter plot matrix using the ggpairs function. The upper section of the plot will be an xy contour, the lower an xy scatter, and the diagonal density plots. A scatter plot matrix is suitable at this stage as it gives a qualitative overview of these data and provides direction for further analysis.
scatter_matrix <- ggpairs(data = ds_filled_NaN, columns = target_attributes_proteins,
mapping = aes(colour = class), diag = list(continuous = wrap("densityDiag",
alpha = I(0.1)), mapping = ggplot2::aes(fill = class)), upper = list(continuous = wrap("density",
alpha = I(0.5)), combo = "box"), lower = list(continuous = wrap("points",
alpha = I(0.4), size = 0.1)))
scatter_matrix_adjusted <- scatter_matrix + theme(panel.spacing = grid::unit(0,
"lines"), axis.text = element_text(size = rel(0.5)), strip.text = element_text(face = "bold",
size = 7), strip.text.x = element_text(margin = margin(0.1, 0, 0.1, 0, "cm")),
strip.text.y = element_text(margin = margin(0, 0.1, 0, 0.1, "cm")))
scatter_matrix_adjusted + theme(panel.border = element_rect(fill = NA, colour = "grey30",
size = 0.2))
The resultant plot shows six of the proteins to be correlated (qualitatively) in a linear fashion.
This will provide direction for subsequent statistical analysis.
Other noticeable trends include inversely proportional relationships (e.g., SOD1_N vs BRAF_N), weak correlation (e.g., pNUMB_N vs BRAF_N) and no correlation (e.g., pNUMB_N vs ARC_N). Also of note, in the columns ARC_N, BRAF_N, DYRK1A_N, ITSN1_N and pERK_N, a series of points (amber colour) are seen to lie outside the main clusters in every individual scatter plot. Box plots for two target proteins exhibiting these outliers were plotted.
The resultant plots show clusters of outliers for the class ‘c-CS-s’.
# Box plots of expression by class. coord_flip() draws the classes on the vertical axis,
# so the expression label is attached to y (the aesthetic that holds the protein values).
BRAF_N_Hist <- ggplot(data = ds_filled_NaN, aes(x = class, y = BRAF_N, fill = class)) +
  geom_boxplot() + theme(legend.title = element_blank(), axis.title = element_text(size = 8)) +
  labs(title = NULL, x = NULL, y = "BRAF_N Expression") + guides(fill = "none") + coord_flip()
pERK_N_Hist <- ggplot(data = ds_filled_NaN, aes(x = class, y = pERK_N, fill = class)) +
  geom_boxplot() + theme(legend.title = element_blank(), axis.title = element_text(size = 8)) +
  labs(title = NULL, x = NULL, y = "pERK_N Expression") + guides(fill = "none") + coord_flip()
# Stack the two box plots vertically.
grid.arrange(BRAF_N_Hist, pERK_N_Hist, nrow = 2)
The most extreme outlier for each protein was determined using the outlier() function, in conjunction with filtering.
ds_filled_NaN[ds_filled_NaN$ARC_N == outlier(ds_filled_NaN$ARC_N), c(1, 2, 10,
11, 12, 13)]
ds_filled_NaN[ds_filled_NaN$BRAF_N == outlier(ds_filled_NaN$BRAF_N), c(1, 3,
10, 11, 12, 13)]
ds_filled_NaN[ds_filled_NaN$DYRK1A_N == outlier(ds_filled_NaN$DYRK1A_N), c(1,
4, 10, 11, 12, 13)]
ds_filled_NaN[ds_filled_NaN$ITSN1_N == outlier(ds_filled_NaN$ITSN1_N), c(1,
5, 10, 11, 12, 13)]
ds_filled_NaN[ds_filled_NaN$pERK_N == outlier(ds_filled_NaN$pERK_N), c(1, 6,
10, 11, 12, 13)]
ds_filled_NaN[ds_filled_NaN$pNUMB_N == outlier(ds_filled_NaN$pNUMB_N), c(1,
7, 10, 11, 12, 13)]
ds_filled_NaN[ds_filled_NaN$S6_N == outlier(ds_filled_NaN$S6_N), c(1, 8, 10,
11, 12, 13)]
ds_filled_NaN[ds_filled_NaN$SOD1_N == outlier(ds_filled_NaN$SOD1_N), c(1, 9,
10, 11, 12, 13)]
Max. Outlier Summary | ARC_N | BRAF_N | DYRK1A_N | ITSN1_N | pERK_N | pNUMB_N | S6_N | SOD1_N |
---|---|---|---|---|---|---|---|---|
Value | 0.0673 | 2.1334 | 2.5164 | 2.6027 | 3.5667 | 0.6311 | 0.8226 | 1.8729 |
MouseID | 3415 | 3484 | 3484 | 3484 | 3484 | 3497 | 3483 | 3411 |
Genotype | Control | Control | Control | Control | Control | Control | Ts65Dn | Ts65Dn |
Treatment | Memantine | Saline | Saline | Saline | Saline | Saline | Saline | Memantine |
Behaviour | C/S | C/S | C/S | C/S | C/S | C/S | C/S | S/C |
Class | c-CS-m | c-CS-s | c-CS-s | c-CS-s | c-CS-s | c-CS-s | t-CS-s | t-SC-m |
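outlier() from the outliers package returns the observation that lies furthest from the sample mean, which may be a minimum as well as a maximum. A base R sketch of the same idea is given below on made-up values; the helper name `most_extreme` is hypothetical.

```r
# Hypothetical helper: value with the largest absolute deviation from the mean.
most_extreme <- function(x) {
  x <- x[!is.na(x)]
  x[which.max(abs(x - mean(x)))]
}

# Made-up expression values with one obvious high reading.
expr <- c(0.31, 0.28, 0.35, 0.30, 2.13, 0.33)
extreme_value <- most_extreme(expr)

# Filtering on equality then recovers the row(s) that hold that value.
extreme_rows <- which(expr == extreme_value)
```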
# Remove every observation for mouse 3484, the source of the extreme values in four proteins
ds_filled_NaN <- ds_filled_NaN[!(ds_filled_NaN$MouseID == 3484),]
The box plots were an excellent source of qualitative information, as demonstrated for ITSN1_N and SOD1_N below.
Protein | Qualitative Observation |
---|---|
ITSN1_N | The distributions and corresponding mean values occur in a series of pairs. It appears that the first two components of each class (genotype and learning stimuli) determine the expression of ITSN1_N. |
SOD1_N | Expression of SOD1_N is determined predominantly by the learning stimuli applied. In the four classes of CS, the expression levels have tight distributions and similar means. In the four classes of SC, the expression levels have wide distributions, with varied means. |
Of the 6 previously mentioned pairs of qualitative linear relationships, three will be used for statistical analysis.
For each pair of target proteins, all classes will be analyzed together. If linear relationships are statistically significant, this will indicate the dependent relationships are maintained over the changes in genotype, treatment and behaviour.
target_attributes_linear <- c("BRAF_N", "DYRK1A_N", "ITSN1_N", "pERK_N")
ITSN1_N_VS_DYRK1A_N <- lm(ITSN1_N ~ DYRK1A_N, data = ds_filled_NaN)
ITSN1_N_VS_DYRK1A_N %>% summary()
Call:
lm(formula = ITSN1_N ~ DYRK1A_N, data = ds_filled_NaN)
Residuals:
Min 1Q Median 3Q Max
-0.54552 -0.04070 -0.00268 0.03774 0.24657
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.178032 0.005445 32.70 <2e-16 ***
DYRK1A_N 1.037442 0.012600 82.34 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0643 on 1063 degrees of freedom
Multiple R-squared: 0.8645, Adjusted R-squared: 0.8643
F-statistic: 6779 on 1 and 1063 DF, p-value: < 2.2e-16
ITSN1_N_VS_DYRK1A_N %>% anova()
Analysis of Variance Table
Response: ITSN1_N
Df Sum Sq Mean Sq F value Pr(>F)
DYRK1A_N 1 28.0244 28.0244 6779.2 < 2.2e-16 ***
Residuals 1063 4.3943 0.0041
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ITSN1_N_VS_DYRK1A_N %>% summary() %>% coef()
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1780317 0.005445164 32.69538 7.787868e-163
DYRK1A_N 1.0374418 0.012600150 82.33567 0.000000e+00
ITSN1_N_VS_DYRK1A_N %>% confint()
2.5 % 97.5 %
(Intercept) 0.1673472 0.1887162
DYRK1A_N 1.0127178 1.0621658
2 * pt(q = 82.33567, df = 1063, lower.tail = FALSE)
[1] 0
As p < .05, we reject H0. The 95% CI for b was found to be [1.013, 1.062]. This 95% CI does not capture the H0 value of 0, which also supports rejecting H0.
There was a statistically significant positive relationship between the expression of DYRK1A_N and ITSN1_N.
ITSN1_N_VS_DYRK1A_N_XY <- ggplot(data = ds_filled_NaN, aes(x = DYRK1A_N, y = ITSN1_N,
colour = class))
ITSN1_N_VS_DYRK1A_N_XY + geom_point(alpha = I(0.9), size = 0.8) + stat_smooth(method = "lm",
col = "black", size = 0.5) + labs(title = "ITSN1_N versus DYRK1A_N Protein Expression Scatter Plot",
y = "ITSN1_N", x = "DYRK1A_N") + theme(legend.title = element_blank(), axis.title = element_text(size = 8),
title = element_text(size = 9)) + annotate("text", x = 0.6, y = 1.25, label = "R Squared") +
annotate("text", x = 0.6, y = 1.15, label = format(summary(lm(ITSN1_N ~
DYRK1A_N, data = ds_filled_NaN))$r.squared, digits = 3))
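With a single predictor, the overall F test and the t test on the slope are the same test, so the F statistic equals the square of the slope's t value. The sketch below illustrates this on simulated data (the values of `x` and `y` are made up and are not the protein measurements), and also shows where R-squared and the slope confidence interval come from.

```r
set.seed(1)
# Simulated stand-ins for two positively related expression levels.
x <- runif(200, 0.1, 1.0)
y <- 0.2 + 1.0 * x + rnorm(200, sd = 0.05)

fit <- lm(y ~ x)
s   <- summary(fit)

slope_t  <- coef(s)["x", "t value"]        # t statistic for H0: slope = 0
f_stat   <- unname(s$fstatistic["value"])  # overall F statistic
r_sq     <- s$r.squared                    # share of variability explained
slope_ci <- confint(fit)["x", ]            # 95% CI for the slope

# With a single predictor, the overall F test and the slope t test coincide.
print(c(t_squared = slope_t^2, F = f_stat, R2 = r_sq))
print(slope_ci)
```

The same relationship holds in the fit reported above: 82.34 squared is approximately 6779, the reported F statistic.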
pERK_N_VS_DYRK1A_N <- lm(pERK_N ~ DYRK1A_N, data = ds_filled_NaN)
pERK_N_VS_DYRK1A_N %>% summary()
Call:
lm(formula = pERK_N ~ DYRK1A_N, data = ds_filled_NaN)
Residuals:
Min 1Q Median 3Q Max
-0.94073 -0.05411 -0.00875 0.05405 0.31963
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.048086 0.007251 -6.631 5.28e-11 ***
DYRK1A_N 1.401485 0.016779 83.524 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.08562 on 1063 degrees of freedom
Multiple R-squared: 0.8678, Adjusted R-squared: 0.8677
F-statistic: 6976 on 1 and 1063 DF, p-value: < 2.2e-16
pERK_N_VS_DYRK1A_N %>% anova()
Analysis of Variance Table
Response: pERK_N
Df Sum Sq Mean Sq F value Pr(>F)
DYRK1A_N 1 51.143 51.143 6976.3 < 2.2e-16 ***
Residuals 1063 7.793 0.007
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
pERK_N_VS_DYRK1A_N %>% summary() %>% coef()
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.048086 0.007251214 -6.631441 5.280216e-11
DYRK1A_N 1.401485 0.016779364 83.524345 0.000000e+00
pERK_N_VS_DYRK1A_N %>% confint()
2.5 % 97.5 %
(Intercept) -0.06231432 -0.03385768
DYRK1A_N 1.36856094 1.43440981
2 * pt(q = 83.524345, df = 1063, lower.tail = FALSE)
[1] 0
As p < .05, we reject H0. The 95% CI for b was found to be [1.369, 1.434]. This 95% CI does not capture the H0 value of 0, which also supports rejecting H0.
There was a statistically significant positive relationship between the expression of pERK_N and DYRK1A_N.
pERK_N_VS_DYRK1A_N_XY <- ggplot(data = ds_filled_NaN, aes(x = DYRK1A_N, y = pERK_N,
colour = class))
pERK_N_VS_DYRK1A_N_XY + geom_point(alpha = I(0.9), size = 0.8) + stat_smooth(method = "lm",
col = "black", size = 0.5) + labs(title = "pERK_N versus DYRK1A_N Protein Expression Scatter Plot",
y = "pERK_N", x = "DYRK1A_N") + theme(legend.title = element_blank(), axis.title = element_text(size = 8),
title = element_text(size = 9)) + annotate("text", x = 0.6, y = 1.3, label = "R Squared") +
annotate("text", x = 0.6, y = 1.2, label = format(summary(lm(pERK_N ~ DYRK1A_N,
data = ds_filled_NaN))$r.squared, digits = 3))
pERK_N_VS_BRAF_N <- lm(pERK_N ~ BRAF_N, data = ds_filled_NaN)
pERK_N_VS_BRAF_N %>% summary()
Call:
lm(formula = pERK_N ~ BRAF_N, data = ds_filled_NaN)
Residuals:
Min 1Q Median 3Q Max
-0.30232 -0.06317 -0.00958 0.04958 0.42462
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.066629 0.009219 -7.228 9.37e-13 ***
BRAF_N 1.629637 0.024215 67.299 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1027 on 1063 degrees of freedom
Multiple R-squared: 0.8099, Adjusted R-squared: 0.8097
F-statistic: 4529 on 1 and 1063 DF, p-value: < 2.2e-16
pERK_N_VS_BRAF_N %>% anova()
Analysis of Variance Table
Response: pERK_N
Df Sum Sq Mean Sq F value Pr(>F)
BRAF_N 1 47.733 47.733 4529.1 < 2.2e-16 ***
Residuals 1063 11.203 0.011
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
pERK_N_VS_BRAF_N %>% summary() %>% coef()
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06662867 0.009218592 -7.22764 9.369867e-13
BRAF_N 1.62963697 0.024214953 67.29879 0.000000e+00
pERK_N_VS_BRAF_N %>% confint()
2.5 % 97.5 %
(Intercept) -0.08471737 -0.04853996
BRAF_N 1.58212243 1.67715150
2 * pt(q = 67.29879, df = 1063, lower.tail = FALSE)
[1] 0
As p < .05, we reject H0. The 95% CI for b was found to be [1.582, 1.677]. This 95% CI does not capture the H0 value of 0, which also supports rejecting H0.
There was a statistically significant positive relationship between the expression of pERK_N and BRAF_N.
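The three fits above repeat the same steps, so their key numbers could be collected in one table. The following is a sketch under stated assumptions: the data frame `toy` is simulated (its column names mirror the proteins but the values are made up), and `models`, `rows` and `results` are illustrative names.

```r
set.seed(2)
# Simulated data frame; column names mirror the proteins but values are made up.
toy <- data.frame(DYRK1A_N = runif(150, 0.1, 1.0))
toy$BRAF_N  <- 0.10 + 0.8 * toy$DYRK1A_N + rnorm(150, sd = 0.05)
toy$ITSN1_N <- 0.20 + 1.0 * toy$DYRK1A_N + rnorm(150, sd = 0.05)
toy$pERK_N  <- -0.05 + 1.4 * toy$DYRK1A_N + rnorm(150, sd = 0.05)

models <- c("ITSN1_N ~ DYRK1A_N", "pERK_N ~ DYRK1A_N", "pERK_N ~ BRAF_N")

# Fit each model and keep the slope, its 95% CI and R-squared in one row.
rows <- lapply(models, function(f) {
  fit <- lm(as.formula(f), data = toy)
  ci  <- confint(fit)[2, ]
  data.frame(model = f, slope = unname(coef(fit)[2]),
             lower = unname(ci[1]), upper = unname(ci[2]),
             r_squared = summary(fit)$r.squared)
})
results <- do.call(rbind, rows)
print(results)
```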
The following columns were created using the mutate() function. All classes were then dropped except ‘c-CS-s’ (this corresponds to the control group with normal genotype, saline treatment and CS learning).
ds_filled_NaN <- ds_filled_NaN %>% mutate(d_ARC_BRAF = BRAF_N - ARC_N)
ds_filled_NaN <- ds_filled_NaN %>% mutate(d_BRAF_DYRK1A = DYRK1A_N - BRAF_N)
ds_filled_NaN <- ds_filled_NaN %>% mutate(d_DYRK1A_ITSN1 = ITSN1_N - DYRK1A_N)
ds_filled_NaN <- ds_filled_NaN %>% mutate(d_ITSN1_pERK = pERK_N - ITSN1_N)
ds_filled_NaN <- ds_filled_NaN %>% mutate(d_pERK_pNUMB = pNUMB_N - pERK_N)
ds_filled_NaN <- ds_filled_NaN %>% mutate(d_pNUMB_S6 = S6_N - pNUMB_N)
ds_filled_NaN <- ds_filled_NaN %>% mutate(d_S6_SOD1 = SOD1_N - S6_N)
ds_cCSs <- ds_filled_NaN[(ds_filled_NaN$class == "c-CS-s"), ]
Analysis was conducted for 7 pairs of protein expression levels in all mice from the class ‘c-CS-s’. These were considered dependent (paired) samples as the mice from each individual class were all exposed to the same set of variables.
qt(p = 0.025, df = 119)
[1] -1.9801
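The critical value above feeds a simple decision rule: reject H0 when the observed t is further from zero than the critical value. A minimal sketch follows; the helper `reject_h0` is a hypothetical name, and the two t values passed to it are taken from the t.test() results reported in this section.

```r
n     <- 120        # observations in class c-CS-s after preprocessing
alpha <- 0.05       # two-sided significance level

# Upper critical value; the lower one is its negative by symmetry.
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Hypothetical helper: reject H0 when |t| exceeds the critical value.
reject_h0 <- function(t_obs) abs(t_obs) > t_crit

print(round(t_crit, 4))
print(reject_h0(c(27.724, -0.696)))
```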
t.test(ds_cCSs$d_ARC_BRAF, mu = 0, alternative = "two.sided")
One Sample t-test
data: ds_cCSs$d_ARC_BRAF
t = 27.724, df = 119, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.2377760 0.2743528
sample estimates:
mean of x
0.2560644
granova.ds(data.frame(ds_cCSs$ARC_N, ds_cCSs$BRAF_N), xlab = "ARC_N", ylab = "BRAF_N")
Summary Stats
n 120.000
mean(x) 0.114
mean(y) 0.370
mean(D=x-y) -0.256
SD(D) 0.101
ES(D) -2.531
r(x,y) -0.096
r(x+y,d) -0.983
LL 95%CI -0.274
UL 95%CI -0.238
t(D-bar) -27.724
df.t 119.000
pval.t 0.000
The t* values are ± 1.98. As t = 27.72 is more extreme than +1.98, H0 should be rejected (additionally, the 95% CI of the mean difference is found to be [0.24, 0.27], which does not capture the H0 value of 0). granova.ds() reports the difference as x - y, so its mean, t and CI carry the opposite sign to the t.test() output. There was a statistically significant mean difference between the expression of ARC_N and BRAF_N for the c-CS-s class (n = 120).
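The t.test() call above works on the difference BRAF_N - ARC_N, whereas the granova.ds() summary reports D = x - y; the two results differ only in sign. The sketch below uses simulated paired data (not the protein measurements) to show that a one-sample t-test on the differences matches a paired t-test, and that reversing the direction of the difference flips the sign of t and of the confidence limits.

```r
set.seed(3)
# Made-up paired measurements; y is about 0.25 higher than x on average.
x <- rnorm(40, mean = 0.10, sd = 0.05)
y <- x + rnorm(40, mean = 0.25, sd = 0.10)

one_sample <- t.test(y - x, mu = 0)          # difference taken as y - x
paired_yx  <- t.test(y, x, paired = TRUE)    # same test, paired form
paired_xy  <- t.test(x, y, paired = TRUE)    # difference taken as x - y

# Same magnitude in all three; only the sign depends on the direction.
print(unname(c(one_sample$statistic, paired_yx$statistic, paired_xy$statistic)))
print(rbind(y_minus_x = one_sample$conf.int, x_minus_y = paired_xy$conf.int))
```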
t.test(ds_cCSs$d_BRAF_DYRK1A, mu = 0, alternative = "two.sided")
One Sample t-test
data: ds_cCSs$d_BRAF_DYRK1A
t = 8.8709, df = 119, p-value = 8.741e-15
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.03654614 0.05754950
sample estimates:
mean of x
0.04704782
granova.ds(data.frame(ds_cCSs$BRAF_N, ds_cCSs$DYRK1A_N), xlab = "BRAF_N", ylab = "DYRK1A_N")
Summary Stats
n 120.000
mean(x) 0.370
mean(y) 0.417
mean(D=x-y) -0.047
SD(D) 0.058
ES(D) -0.810
r(x,y) 0.828
r(x+y,d) 0.028
LL 95%CI -0.058
UL 95%CI -0.037
t(D-bar) -8.871
df.t 119.000
pval.t 0.000
The t* values are ± 1.98. As t = 8.87 is more extreme than +1.98, H0 should be rejected (additionally, the 95% CI of the mean difference is found to be [0.04, 0.06], which does not capture the H0 value of 0). There was a statistically significant mean difference between the expression of BRAF_N and DYRK1A_N for the c-CS-s class (n = 120).
t.test(ds_cCSs$d_DYRK1A_ITSN1, mu = 0, alternative = "two.sided")
One Sample t-test
data: ds_cCSs$d_DYRK1A_ITSN1
t = 34.216, df = 119, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.1784135 0.2003316
sample estimates:
mean of x
0.1893725
granova.ds(data.frame(ds_cCSs$DYRK1A_N, ds_cCSs$ITSN1_N), xlab = "DYRK1A_N",
ylab = "ITSN1_N")
Summary Stats
n 120.000
mean(x) 0.417
mean(y) 0.606
mean(D=x-y) -0.189
SD(D) 0.061
ES(D) -3.123
r(x,y) 0.887
r(x+y,d) -0.493
LL 95%CI -0.200
UL 95%CI -0.178
t(D-bar) -34.216
df.t 119.000
pval.t 0.000
The t* values are ± 1.98. As t = 34.22 is more extreme than +1.98, H0 should be rejected (additionally, the 95% CI of the mean difference is found to be [0.18, 0.20], which does not capture the H0 value of 0). There was a statistically significant mean difference between the expression of DYRK1A_N and ITSN1_N for the c-CS-s class (n = 120).
t.test(ds_cCSs$d_ITSN1_pERK, mu = 0, alternative = "two.sided")
One Sample t-test
data: ds_cCSs$d_ITSN1_pERK
t = -0.69643, df = 119, p-value = 0.4875
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.02289979 0.01098278
sample estimates:
mean of x
-0.005958507
granova.ds(data.frame(ds_cCSs$ITSN1_N, ds_cCSs$pERK_N), xlab = "ITSN1_N", ylab = "pERK_N")
Summary Stats
n 120.000
mean(x) 0.606
mean(y) 0.600
mean(D=x-y) 0.006
SD(D) 0.094
ES(D) 0.064
r(x,y) 0.835
r(x+y,d) -0.458
LL 95%CI -0.011
UL 95%CI 0.023
t(D-bar) 0.696
df.t 119.000
pval.t 0.488
The t* values are ± 1.98. As t = -0.70 is less extreme than -1.98, H0 cannot be rejected on the critical value. The 95% CI of the mean difference is found to be [-0.02, 0.01], which captures 0, so H0 cannot be rejected on the 95% CI. As p > 0.05, we fail to reject H0. There was not a statistically significant mean difference between the expression of ITSN1_N and pERK_N for the c-CS-s class (n = 120).
t.test(ds_cCSs$d_pERK_pNUMB, mu = 0, alternative = "two.sided")
One Sample t-test
data: ds_cCSs$d_pERK_pNUMB
t = -16.569, df = 119, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.2236380 -0.1758912
sample estimates:
mean of x
-0.1997646
granova.ds(data.frame(ds_cCSs$pERK_N, ds_cCSs$pNUMB_N), xlab = "pERK_N", ylab = "pNUMB_N")
Summary Stats
n 120.000
mean(x) 0.600
mean(y) 0.400
mean(D=x-y) 0.200
SD(D) 0.132
ES(D) 1.513
r(x,y) 0.667
r(x+y,d) 0.793
LL 95%CI 0.176
UL 95%CI 0.224
t(D-bar) 16.569
df.t 119.000
pval.t 0.000
The t* values are ± 1.98. As t = -16.57 is more extreme than -1.98, H0 should be rejected (additionally, the 95% CI of the mean difference is found to be [-0.22, -0.18], which does not capture the H0 value of 0). There was a statistically significant mean difference between the expression of pERK_N and pNUMB_N for the c-CS-s class (n = 120).
t.test(ds_cCSs$d_pNUMB_S6, mu = 0, alternative = "two.sided")
One Sample t-test
data: ds_cCSs$d_pNUMB_S6
t = 3.892, df = 119, p-value = 0.0001644
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.02379530 0.07308288
sample estimates:
mean of x
0.04843909
granova.ds(data.frame(ds_cCSs$pNUMB_N, ds_cCSs$S6_N), xlab = "pNUMB_N", ylab = "S6_N")
Summary Stats
n 120.000
mean(x) 0.400
mean(y) 0.449
mean(D=x-y) -0.048
SD(D) 0.136
ES(D) -0.355
r(x,y) 0.123
r(x+y,d) -0.513
LL 95%CI -0.073
UL 95%CI -0.024
t(D-bar) -3.892
df.t 119.000
pval.t 0.000
The t* values are ± 1.98. As t = 3.89 is more extreme than +1.98, H0 should be rejected (additionally, the 95% CI of the mean difference is found to be [0.02, 0.07], which does not capture the H0 value of 0). There was a statistically significant mean difference between the expression of pNUMB_N and S6_N for the c-CS-s class (n = 120).
t.test(ds_cCSs$d_S6_SOD1, mu = 0, alternative = "two.sided")
One Sample t-test
data: ds_cCSs$d_S6_SOD1
t = -10.45, df = 119, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.1482443 -0.1010126
sample estimates:
mean of x
-0.1246285
granova.ds(data.frame(ds_cCSs$S6_N, ds_cCSs$SOD1_N), xlab = "S6_N", ylab = "SOD1_N")
Summary Stats
n 120.000
mean(x) 0.449
mean(y) 0.324
mean(D=x-y) 0.125
SD(D) 0.131
ES(D) 0.954
r(x,y) 0.101
r(x+y,d) 0.709
LL 95%CI 0.101
UL 95%CI 0.148
t(D-bar) 10.450
df.t 119.000
pval.t 0.000
The t* values are ± 1.98. As t = -10.45 is more extreme than -1.98, H0 should be rejected (additionally, the 95% CI of the mean difference is found to be [-0.15, -0.10], which does not capture the H0 value of 0). There was a statistically significant mean difference between the expression of S6_N and SOD1_N for the c-CS-s class (n = 120).
The following was determined over the duration of the investigation,
This is signficant as it shows the relationship between these expression of proteins is nto affected by the variables of the experiment.
The strengths of the investigation included high sample size (overall and in sub-categories), use of various visualizations for data exploration and interprettation, different statistical analysis methods. Limitations included the complex nature of the data set. For example only 7 paired t-tests could reasonably be performed. For the class c-CS-s, this doesn’t represent every possible combination. Comparisons between classes inside individual target proteins would have also been a rich source of data.
The ‘Mice Protein Expression Data Set’ (82 attributes by 1080 observations) was imported into the RStudio interactive environment, explored, cleaned and analysed. Analysis was performed using paired t-tests and linear regression techniques. A series of visualizations, including scatter plot matrices, box plots and dependent sample assessment plots, was used. It was determined that several of the proteins had statistically significant linear relationships. Only one pair of proteins tested had no statistically significant mean difference, indicating a high degree of expression variability over the class c-CS-s.
summary(ds)
MouseID DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N
Length:1080 Min. :0.1453 Min. :0.2454 Min. :0.1152 Min. :1.331 Min. :1.738
Class :character 1st Qu.:0.2881 1st Qu.:0.4734 1st Qu.:0.2874 1st Qu.:2.057 1st Qu.:3.156
Mode :character Median :0.3664 Median :0.5658 Median :0.3166 Median :2.297 Median :3.761
Mean :0.4258 Mean :0.6171 Mean :0.3191 Mean :2.297 Mean :3.844
3rd Qu.:0.4877 3rd Qu.:0.6980 3rd Qu.:0.3482 3rd Qu.:2.528 3rd Qu.:4.440
Max. :2.5164 Max. :2.6027 Max. :0.4972 Max. :3.758 Max. :8.483
NA's :3 NA's :3 NA's :3 NA's :3 NA's :3
pAKT_N pBRAF_N pCAMKII_N pCREB_N pELK_N pERK_N
Min. :0.06324 Min. :0.06404 Min. :1.344 Min. :0.1128 Min. :0.429 Min. :0.1492
1st Qu.:0.20575 1st Qu.:0.16459 1st Qu.:2.480 1st Qu.:0.1908 1st Qu.:1.204 1st Qu.:0.3374
Median :0.23118 Median :0.18230 Median :3.327 Median :0.2106 Median :1.356 Median :0.4436
Mean :0.23317 Mean :0.18185 Mean :3.537 Mean :0.2126 Mean :1.429 Mean :0.5459
3rd Qu.:0.25726 3rd Qu.:0.19742 3rd Qu.:4.482 3rd Qu.:0.2346 3rd Qu.:1.561 3rd Qu.:0.6633
Max. :0.53905 Max. :0.31707 Max. :7.464 Max. :0.3062 Max. :6.113 Max. :3.5667
NA's :3 NA's :3 NA's :3 NA's :3 NA's :3 NA's :3
pJNK_N PKCA_N pMEK_N pNR1_N pNR2A_N pNR2B_N
Min. :0.05211 Min. :0.1914 Min. :0.05682 Min. :0.5002 Min. :0.2813 Min. :0.3016
1st Qu.:0.28124 1st Qu.:0.2818 1st Qu.:0.24429 1st Qu.:0.7435 1st Qu.:0.5903 1st Qu.:1.3813
Median :0.32133 Median :0.3130 Median :0.27739 Median :0.8211 Median :0.7196 Median :1.5637
Mean :0.31351 Mean :0.3179 Mean :0.27503 Mean :0.8258 Mean :0.7269 Mean :1.5620
3rd Qu.:0.34871 3rd Qu.:0.3523 3rd Qu.:0.30345 3rd Qu.:0.8985 3rd Qu.:0.8486 3rd Qu.:1.7485
Max. :0.49343 Max. :0.4740 Max. :0.45800 Max. :1.4082 Max. :1.4128 Max. :2.7240
NA's :3 NA's :3 NA's :3 NA's :3 NA's :3 NA's :3
pPKCAB_N pRSK_N AKT_N BRAF_N CAMKII_N CREB_N
Min. :0.5678 Min. :0.09594 Min. :0.06442 Min. :0.1439 Min. :0.2130 Min. :0.1136
1st Qu.:1.1683 1st Qu.:0.40414 1st Qu.:0.59682 1st Qu.:0.2643 1st Qu.:0.3309 1st Qu.:0.1618
Median :1.3657 Median :0.44060 Median :0.68247 Median :0.3267 Median :0.3603 Median :0.1796
Mean :1.5253 Mean :0.44285 Mean :0.68224 Mean :0.3785 Mean :0.3634 Mean :0.1805
3rd Qu.:1.8859 3rd Qu.:0.48210 3rd Qu.:0.75969 3rd Qu.:0.4136 3rd Qu.:0.3939 3rd Qu.:0.1957
Max. :3.0614 Max. :0.65096 Max. :1.18217 Max. :2.1334 Max. :0.5862 Max. :0.3196
NA's :3 NA's :3 NA's :3 NA's :3 NA's :3 NA's :3
ELK_N ERK_N GSK3B_N JNK_N MEK_N TRKA_N
Min. :0.4977 Min. :1.132 Min. :0.1511 Min. :0.0463 Min. :0.1472 Min. :0.1987
1st Qu.:0.9444 1st Qu.:1.992 1st Qu.:1.0231 1st Qu.:0.2204 1st Qu.:0.2471 1st Qu.:0.6171
Median :1.0962 Median :2.401 Median :1.1598 Median :0.2449 Median :0.2734 Median :0.7050
Mean :1.1734 Mean :2.474 Mean :1.1726 Mean :0.2416 Mean :0.2728 Mean :0.6932
3rd Qu.:1.3236 3rd Qu.:2.873 3rd Qu.:1.3097 3rd Qu.:0.2633 3rd Qu.:0.3008 3rd Qu.:0.7742
Max. :2.8029 Max. :5.198 Max. :2.4758 Max. :0.3872 Max. :0.4154 Max. :1.0016
NA's :18 NA's :3 NA's :3 NA's :3 NA's :7 NA's :3
RSK_N APP_N Bcatenin_N SOD1_N MTOR_N P38_N
Min. :0.1074 Min. :0.2356 Min. :1.135 Min. :0.2171 Min. :0.2011 Min. :0.2279
1st Qu.:0.1496 1st Qu.:0.3663 1st Qu.:1.827 1st Qu.:0.3196 1st Qu.:0.4104 1st Qu.:0.3520
Median :0.1667 Median :0.4020 Median :2.115 Median :0.4441 Median :0.4525 Median :0.4078
Mean :0.1684 Mean :0.4048 Mean :2.147 Mean :0.5426 Mean :0.4525 Mean :0.4153
3rd Qu.:0.1845 3rd Qu.:0.4419 3rd Qu.:2.424 3rd Qu.:0.6958 3rd Qu.:0.4880 3rd Qu.:0.4663
Max. :0.3051 Max. :0.6327 Max. :3.681 Max. :1.8729 Max. :0.6767 Max. :0.9333
NA's :3 NA's :3 NA's :18 NA's :3 NA's :3 NA's :3
pMTOR_N DSCR1_N AMPKA_N NR2B_N pNUMB_N RAPTOR_N
Min. :0.1666 Min. :0.1553 Min. :0.2264 Min. :0.1848 Min. :0.1856 Min. :0.1948
1st Qu.:0.6835 1st Qu.:0.5309 1st Qu.:0.3266 1st Qu.:0.5149 1st Qu.:0.3128 1st Qu.:0.2761
Median :0.7608 Median :0.5767 Median :0.3585 Median :0.5635 Median :0.3474 Median :0.3049
Mean :0.7590 Mean :0.5852 Mean :0.3684 Mean :0.5653 Mean :0.3571 Mean :0.3158
3rd Qu.:0.8415 3rd Qu.:0.6344 3rd Qu.:0.4008 3rd Qu.:0.6145 3rd Qu.:0.3927 3rd Qu.:0.3473
Max. :1.1249 Max. :0.9164 Max. :0.7008 Max. :0.9720 Max. :0.6311 Max. :0.5267
NA's :3 NA's :3 NA's :3 NA's :3 NA's :3 NA's :3
TIAM1_N pP70S6_N NUMB_N P70S6_N pGSK3B_N pPKCG_N
Min. :0.2378 Min. :0.1311 Min. :0.1180 Min. :0.3441 Min. :0.09998 Min. :0.5988
1st Qu.:0.3720 1st Qu.:0.2811 1st Qu.:0.1593 1st Qu.:0.8267 1st Qu.:0.14925 1st Qu.:1.2968
Median :0.4072 Median :0.3777 Median :0.1782 Median :0.9313 Median :0.16021 Median :1.6646
Mean :0.4186 Mean :0.3945 Mean :0.1811 Mean :0.9431 Mean :0.16121 Mean :1.7066
3rd Qu.:0.4560 3rd Qu.:0.4811 3rd Qu.:0.1972 3rd Qu.:1.0451 3rd Qu.:0.17174 3rd Qu.:2.1130
Max. :0.7221 Max. :1.1292 Max. :0.3166 Max. :1.6800 Max. :0.25321 Max. :3.3820
NA's :3 NA's :3
CDK5_N S6_N ADARB1_N AcetylH3K9_N RRP1_N BAX_N
Min. :0.1812 Min. :0.1302 Min. :0.5291 Min. :0.05253 Min. :-0.06201 Min. :0.07233
1st Qu.:0.2726 1st Qu.:0.3167 1st Qu.:0.9305 1st Qu.:0.10357 1st Qu.: 0.14902 1st Qu.:0.16817
Median :0.2938 Median :0.4010 Median :1.1283 Median :0.15042 Median : 0.16210 Median :0.18074
Mean :0.2924 Mean :0.4292 Mean :1.1974 Mean :0.21648 Mean : 0.16663 Mean :0.17931
3rd Qu.:0.3125 3rd Qu.:0.5349 3rd Qu.:1.3802 3rd Qu.:0.26965 3rd Qu.: 0.17741 3rd Qu.:0.19158
Max. :0.8174 Max. :0.8226 Max. :2.5399 Max. :1.45939 Max. : 0.61238 Max. :0.24114
ARC_N ERBB4_N nNOS_N Tau_N GFAP_N GluR3_N
Min. :0.06725 Min. :0.1002 Min. :0.09973 Min. :0.09623 Min. :0.08611 Min. :0.1114
1st Qu.:0.11084 1st Qu.:0.1470 1st Qu.:0.16645 1st Qu.:0.16799 1st Qu.:0.11277 1st Qu.:0.1957
Median :0.12163 Median :0.1564 Median :0.18267 Median :0.18863 Median :0.12046 Median :0.2169
Mean :0.12152 Mean :0.1565 Mean :0.18130 Mean :0.21049 Mean :0.12089 Mean :0.2219
3rd Qu.:0.13196 3rd Qu.:0.1654 3rd Qu.:0.19857 3rd Qu.:0.23394 3rd Qu.:0.12772 3rd Qu.:0.2460
Max. :0.15875 Max. :0.2087 Max. :0.26074 Max. :0.60277 Max. :0.21362 Max. :0.3310
GluR4_N IL1B_N P3525_N pCASP9_N PSD95_N SNCA_N
Min. :0.07258 Min. :0.2840 Min. :0.2074 Min. :0.8532 Min. :1.206 Min. :0.1012
1st Qu.:0.10889 1st Qu.:0.4756 1st Qu.:0.2701 1st Qu.:1.3756 1st Qu.:2.079 1st Qu.:0.1428
Median :0.12355 Median :0.5267 Median :0.2906 Median :1.5227 Median :2.242 Median :0.1575
Mean :0.12656 Mean :0.5273 Mean :0.2913 Mean :1.5483 Mean :2.235 Mean :0.1598
3rd Qu.:0.14195 3rd Qu.:0.5770 3rd Qu.:0.3116 3rd Qu.:1.7131 3rd Qu.:2.420 3rd Qu.:0.1733
Max. :0.53700 Max. :0.8897 Max. :0.4437 Max. :2.5862 Max. :2.878 Max. :0.2576
Ubiquitin_N pGSK3B_Tyr216_N SHH_N BAD_N BCL2_N pS6_N
Min. :0.7507 Min. :0.5774 Min. :0.1559 Min. :0.0883 Min. :0.08066 Min. :0.06725
1st Qu.:1.1163 1st Qu.:0.7937 1st Qu.:0.2064 1st Qu.:0.1364 1st Qu.:0.11555 1st Qu.:0.11084
Median :1.2366 Median :0.8499 Median :0.2240 Median :0.1523 Median :0.12947 Median :0.12163
Mean :1.2393 Mean :0.8488 Mean :0.2267 Mean :0.1579 Mean :0.13476 Mean :0.12152
3rd Qu.:1.3631 3rd Qu.:0.9162 3rd Qu.:0.2417 3rd Qu.:0.1740 3rd Qu.:0.14823 3rd Qu.:0.13196
Max. :1.8972 Max. :1.2046 Max. :0.3583 Max. :0.2820 Max. :0.26151 Max. :0.15875
NA's :213 NA's :285
pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N
Min. :0.08542 Min. :0.2586 Min. :0.07969 Min. :0.1055 Min. :0.1018 Min. :0.5865
1st Qu.:0.11351 1st Qu.:0.3981 1st Qu.:0.12585 1st Qu.:0.1551 1st Qu.:0.1651 1st Qu.:1.0814
Median :0.12652 Median :0.4485 Median :0.15824 Median :0.1749 Median :0.1940 Median :1.3174
Mean :0.13105 Mean :0.4461 Mean :0.16961 Mean :0.1831 Mean :0.2054 Mean :1.3378
3rd Qu.:0.14365 3rd Qu.:0.4908 3rd Qu.:0.19788 3rd Qu.:0.2045 3rd Qu.:0.2352 3rd Qu.:1.5858
Max. :0.25653 Max. :0.7596 Max. :0.47976 Max. :0.3607 Max. :0.4139 Max. :2.1298
NA's :75 NA's :180 NA's :210 NA's :270
Genotype Treatment Behavior class
Length:1080 Length:1080 Length:1080 Length:1080
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
colnames(ds)
[1] "MouseID" "DYRK1A_N" "ITSN1_N" "BDNF_N" "NR1_N"
[6] "NR2A_N" "pAKT_N" "pBRAF_N" "pCAMKII_N" "pCREB_N"
[11] "pELK_N" "pERK_N" "pJNK_N" "PKCA_N" "pMEK_N"
[16] "pNR1_N" "pNR2A_N" "pNR2B_N" "pPKCAB_N" "pRSK_N"
[21] "AKT_N" "BRAF_N" "CAMKII_N" "CREB_N" "ELK_N"
[26] "ERK_N" "GSK3B_N" "JNK_N" "MEK_N" "TRKA_N"
[31] "RSK_N" "APP_N" "Bcatenin_N" "SOD1_N" "MTOR_N"
[36] "P38_N" "pMTOR_N" "DSCR1_N" "AMPKA_N" "NR2B_N"
[41] "pNUMB_N" "RAPTOR_N" "TIAM1_N" "pP70S6_N" "NUMB_N"
[46] "P70S6_N" "pGSK3B_N" "pPKCG_N" "CDK5_N" "S6_N"
[51] "ADARB1_N" "AcetylH3K9_N" "RRP1_N" "BAX_N" "ARC_N"
[56] "ERBB4_N" "nNOS_N" "Tau_N" "GFAP_N" "GluR3_N"
[61] "GluR4_N" "IL1B_N" "P3525_N" "pCASP9_N" "PSD95_N"
[66] "SNCA_N" "Ubiquitin_N" "pGSK3B_Tyr216_N" "SHH_N" "BAD_N"
[71] "BCL2_N" "pS6_N" "pCFOS_N" "SYP_N" "H3AcK18_N"
[76] "EGR1_N" "H3MeK4_N" "CaNA_N" "Genotype" "Treatment"
[81] "Behavior" "class"
MouseID
[1] "309_1" "309_2" "309_3" "309_4" "309_5" "309_6" "309_7" "309_8"
[9] "309_9" "309_10" "309_11" "309_12" "309_13" "309_14" "309_15" "311_1"
[17] "311_2" "311_3" "311_4" "311_5" "311_6" "311_7" "311_8" "311_9"
[25] "311_10" "311_11" "311_12" "311_13" "311_14" "311_15" "320_1" "320_2"
[33] "320_3" "320_4" "320_5" "320_6" "320_7" "320_8" "320_9" "320_10"
[41] "320_11" "320_12" "320_13" "320_14" "320_15" "321_1" "321_2" "321_3"
[49] "321_4" "321_5" "321_6" "321_7" "321_8" "321_9" "321_10" "321_11"
[57] "321_12" "321_13" "321_14" "321_15" "322_1" "322_2" "322_3" "322_4"
[65] "322_5" "322_6" "322_7" "322_8" "322_9" "322_10" "322_11" "322_12"
[73] "322_13" "322_14" "322_15" "3415_1" "3415_2" "3415_3" "3415_4" "3415_5"
[81] "3415_6" "3415_7" "3415_8" "3415_9" "3415_10" "3415_11" "3415_12" "3415_13"
[89] "3415_14" "3415_15" "3499_1" "3499_2" "3499_3" "3499_4" "3499_5" "3499_6"
[97] "3499_7" "3499_8" "3499_9" "3499_10" "3499_11" "3499_12" "3499_13" "3499_14"
[105] "3499_15" "3507_1" "3507_2" "3507_3" "3507_4" "3507_5" "3507_6" "3507_7"
[113] "3507_8" "3507_9" "3507_10" "3507_11" "3507_12" "3507_13" "3507_14" "3507_15"
[121] "3520_1" "3520_2" "3520_3" "3520_4" "3520_5" "3520_6" "3520_7" "3520_8"
[129] "3520_9" "3520_10" "3520_11" "3520_12" "3520_13" "3520_14" "3520_15" "3521_1"
[137] "3521_2" "3521_3" "3521_4" "3521_5" "3521_6" "3521_7" "3521_8" "3521_9"
[145] "3521_10" "3521_11" "3521_12" "3521_13" "3521_14" "3521_15" "294_1" "294_2"
[153] "294_3" "294_4" "294_5" "294_6" "294_7" "294_8" "294_9" "294_10"
[161] "294_11" "294_12" "294_13" "294_14" "294_15" "3412_1" "3412_2" "3412_3"
[169] "3412_4" "3412_5" "3412_6" "3412_7" "3412_8" "3412_9" "3412_10" "3412_11"
[177] "3412_12" "3412_13" "3412_14" "3412_15" "3413_1" "3413_2" "3413_3" "3413_4"
[185] "3413_5" "3413_6" "3413_7" "3413_8" "3413_9" "3413_10" "3413_11" "3413_12"
[193] "3413_13" "3413_14" "3413_15" "3419_1" "3419_2" "3419_3" "3419_4" "3419_5"
[201] "3419_6" "3419_7" "3419_8" "3419_9" "3419_10" "3419_11" "3419_12" "3419_13"
[209] "3419_14" "3419_15" "3420_1" "3420_2" "3420_3" "3420_4" "3420_5" "3420_6"
[217] "3420_7" "3420_8" "3420_9" "3420_10" "3420_11" "3420_12" "3420_13" "3420_14"
[225] "3420_15" "3500_1" "3500_2" "3500_3" "3500_4" "3500_5" "3500_6" "3500_7"
[233] "3500_8" "3500_9" "3500_10" "3500_11" "3500_12" "3500_13" "3500_14" "3500_15"
[241] "3503_1" "3503_2" "3503_3" "3503_4" "3503_5" "3503_6" "3503_7" "3503_8"
[249] "3503_9" "3503_10" "3503_11" "3503_12" "3503_13" "3503_14" "3503_15" "362_1"
[257] "362_2" "362_3" "362_4" "362_5" "362_6" "362_7" "362_8" "362_9"
[265] "362_10" "362_11" "362_12" "362_13" "362_14" "362_15" "364_1" "364_2"
[273] "364_3" "364_4" "364_5" "364_6" "364_7" "364_8" "364_9" "364_10"
[281] "364_11" "364_12" "364_13" "364_14" "364_15" "365_1" "365_2" "365_3"
[289] "365_4" "365_5" "365_6" "365_7" "365_8" "365_9" "365_10" "365_11"
[297] "365_12" "365_13" "365_14" "365_15" "3477_1" "3477_2" "3477_3" "3477_4"
[305] "3477_5" "3477_6" "3477_7" "3477_8" "3477_9" "3477_10" "3477_11" "3477_12"
[313] "3477_13" "3477_14" "3477_15" "3478_1" "3478_2" "3478_3" "3478_4" "3478_5"
[321] "3478_6" "3478_7" "3478_8" "3478_9" "3478_10" "3478_11" "3478_12" "3478_13"
[329] "3478_14" "3478_15" "3479_1" "3479_2" "3479_3" "3479_4" "3479_5" "3479_6"
[337] "3479_7" "3479_8" "3479_9" "3479_10" "3479_11" "3479_12" "3479_13" "3479_14"
[345] "3479_15" "3480_1" "3480_2" "3480_3" "3480_4" "3480_5" "3480_6" "3480_7"
[353] "3480_8" "3480_9" "3480_10" "3480_11" "3480_12" "3480_13" "3480_14" "3480_15"
[361] "3484_1" "3484_2" "3484_3" "3484_4" "3484_5" "3484_6" "3484_7" "3484_8"
[369] "3484_9" "3484_10" "3484_11" "3484_12" "3484_13" "3484_14" "3484_15" "3497_1"
[377] "3497_2" "3497_3" "3497_4" "3497_5" "3497_6" "3497_7" "3497_8" "3497_9"
[385] "3497_10" "3497_11" "3497_12" "3497_13" "3497_14" "3497_15" "50810A_1" "50810A_2"
[393] "50810A_3" "50810A_4" "50810A_5" "50810A_6" "50810A_7" "50810A_8" "50810A_9" "50810A_10"
[401] "50810A_11" "50810A_12" "50810A_13" "50810A_14" "50810A_15" "50810D_1" "50810D_2" "50810D_3"
[409] "50810D_4" "50810D_5" "50810D_6" "50810D_7" "50810D_8" "50810D_9" "50810D_10" "50810D_11"
[417] "50810D_12" "50810D_13" "50810D_14" "50810D_15" "50810F_1" "50810F_2" "50810F_3" "50810F_4"
[425] "50810F_5" "50810F_6" "50810F_7" "50810F_8" "50810F_9" "50810F_10" "50810F_11" "50810F_12"
[433] "50810F_13" "50810F_14" "50810F_15" "3422_1" "3422_2" "3422_3" "3422_4" "3422_5"
[441] "3422_6" "3422_7" "3422_8" "3422_9" "3422_10" "3422_11" "3422_12" "3422_13"
[449] "3422_14" "3422_15" "3423_1" "3423_2" "3423_3" "3423_4" "3423_5" "3423_6"
[457] "3423_7" "3423_8" "3423_9" "3423_10" "3423_11" "3423_12" "3423_13" "3423_14"
[465] "3423_15" "3424_1" "3424_2" "3424_3" "3424_4" "3424_5" "3424_6" "3424_7"
[473] "3424_8" "3424_9" "3424_10" "3424_11" "3424_12" "3424_13" "3424_14" "3424_15"
[481] "3481_1" "3481_2" "3481_3" "3481_4" "3481_5" "3481_6" "3481_7" "3481_8"
[489] "3481_9" "3481_10" "3481_11" "3481_12" "3481_13" "3481_14" "3481_15" "3488_1"
[497] "3488_2" "3488_3" "3488_4" "3488_5" "3488_6" "3488_7" "3488_8" "3488_9"
[505] "3488_10" "3488_11" "3488_12" "3488_13" "3488_14" "3488_15" "3489_1" "3489_2"
[513] "3489_3" "3489_4" "3489_5" "3489_6" "3489_7" "3489_8" "3489_9" "3489_10"
[521] "3489_11" "3489_12" "3489_13" "3489_14" "3489_15" "3490_1" "3490_2" "3490_3"
[529] "3490_4" "3490_5" "3490_6" "3490_7" "3490_8" "3490_9" "3490_10" "3490_11"
[537] "3490_12" "3490_13" "3490_14" "3490_15" "3516_1" "3516_2" "3516_3" "3516_4"
[545] "3516_5" "3516_6" "3516_7" "3516_8" "3516_9" "3516_10" "3516_11" "3516_12"
[553] "3516_13" "3516_14" "3516_15" "J2292_1" "J2292_2" "J2292_3" "J2292_4" "J2292_5"
[561] "J2292_6" "J2292_7" "J2292_8" "J2292_9" "J2292_10" "J2292_11" "J2292_12" "J2292_13"
[569] "J2292_14" "J2292_15" "3414_1" "3414_2" "3414_3" "3414_4" "3414_5" "3414_6"
[577] "3414_7" "3414_8" "3414_9" "3414_10" "3414_11" "3414_12" "3414_13" "3414_14"
[585] "3414_15" "3416_1" "3416_2" "3416_3" "3416_4" "3416_5" "3416_6" "3416_7"
[593] "3416_8" "3416_9" "3416_10" "3416_11" "3416_12" "3416_13" "3416_14" "3416_15"
[601] "3417_1" "3417_2" "3417_3" "3417_4" "3417_5" "3417_6" "3417_7" "3417_8"
[609] "3417_9" "3417_10" "3417_11" "3417_12" "3417_13" "3417_14" "3417_15" "3429_1"
[617] "3429_2" "3429_3" "3429_4" "3429_5" "3429_6" "3429_7" "3429_8" "3429_9"
[625] "3429_10" "3429_11" "3429_12" "3429_13" "3429_14" "3429_15" "3504_1" "3504_2"
[633] "3504_3" "3504_4" "3504_5" "3504_6" "3504_7" "3504_8" "3504_9" "3504_10"
[641] "3504_11" "3504_12" "3504_13" "3504_14" "3504_15" "3505_1" "3505_2" "3505_3"
[649] "3505_4" "3505_5" "3505_6" "3505_7" "3505_8" "3505_9" "3505_10" "3505_11"
[657] "3505_12" "3505_13" "3505_14" "3505_15" "3522_1" "3522_2" "3522_3" "3522_4"
[665] "3522_5" "3522_6" "3522_7" "3522_8" "3522_9" "3522_10" "3522_11" "3522_12"
[673] "3522_13" "3522_14" "3522_15" "361_1" "361_2" "361_3" "361_4" "361_5"
[681] "361_6" "361_7" "361_8" "361_9" "361_10" "361_11" "361_12" "361_13"
[689] "361_14" "361_15" "363_1" "363_2" "363_3" "363_4" "363_5" "363_6"
[697] "363_7" "363_8" "363_9" "363_10" "363_11" "363_12" "363_13" "363_14"
[705] "363_15" "293_1" "293_2" "293_3" "293_4" "293_5" "293_6" "293_7"
[713] "293_8" "293_9" "293_10" "293_11" "293_12" "293_13" "293_14" "293_15"
[721] "3411_1" "3411_2" "3411_3" "3411_4" "3411_5" "3411_6" "3411_7" "3411_8"
[729] "3411_9" "3411_10" "3411_11" "3411_12" "3411_13" "3411_14" "3411_15" "3418_1"
[737] "3418_2" "3418_3" "3418_4" "3418_5" "3418_6" "3418_7" "3418_8" "3418_9"
[745] "3418_10" "3418_11" "3418_12" "3418_13" "3418_14" "3418_15" "3501_1" "3501_2"
[753] "3501_3" "3501_4" "3501_5" "3501_6" "3501_7" "3501_8" "3501_9" "3501_10"
[761] "3501_11" "3501_12" "3501_13" "3501_14" "3501_15" "3502_1" "3502_2" "3502_3"
[769] "3502_4" "3502_5" "3502_6" "3502_7" "3502_8" "3502_9" "3502_10" "3502_11"
[777] "3502_12" "3502_13" "3502_14" "3502_15" "3530_1" "3530_2" "3530_3" "3530_4"
[785] "3530_5" "3530_6" "3530_7" "3530_8" "3530_9" "3530_10" "3530_11" "3530_12"
[793] "3530_13" "3530_14" "3530_15" "3534_1" "3534_2" "3534_3" "3534_4" "3534_5"
[801] "3534_6" "3534_7" "3534_8" "3534_9" "3534_10" "3534_11" "3534_12" "3534_13"
[809] "3534_14" "3534_15" "3605_1" "3605_2" "3605_3" "3605_4" "3605_5" "3605_6"
[817] "3605_7" "3605_8" "3605_9" "3605_10" "3605_11" "3605_12" "3605_13" "3605_14"
[825] "3605_15" "3606_1" "3606_2" "3606_3" "3606_4" "3606_5" "3606_6" "3606_7"
[833] "3606_8" "3606_9" "3606_10" "3606_11" "3606_12" "3606_13" "3606_14" "3606_15"
[841] "18899_1" "18899_2" "18899_3" "18899_4" "18899_5" "18899_6" "18899_7" "18899_8"
[849] "18899_9" "18899_10" "18899_11" "18899_12" "18899_13" "18899_14" "18899_15" "3476_1"
[857] "3476_2" "3476_3" "3476_4" "3476_5" "3476_6" "3476_7" "3476_8" "3476_9"
[865] "3476_10" "3476_11" "3476_12" "3476_13" "3476_14" "3476_15" "3483_1" "3483_2"
[873] "3483_3" "3483_4" "3483_5" "3483_6" "3483_7" "3483_8" "3483_9" "3483_10"
[881] "3483_11" "3483_12" "3483_13" "3483_14" "3483_15" "3498_1" "3498_2" "3498_3"
[889] "3498_4" "3498_5" "3498_6" "3498_7" "3498_8" "3498_9" "3498_10" "3498_11"
[897] "3498_12" "3498_13" "3498_14" "3498_15" "50810B_1" "50810B_2" "50810B_3" "50810B_4"
[905] "50810B_5" "50810B_6" "50810B_7" "50810B_8" "50810B_9" "50810B_10" "50810B_11" "50810B_12"
[913] "50810B_13" "50810B_14" "50810B_15" "50810C_1" "50810C_2" "50810C_3" "50810C_4" "50810C_5"
[921] "50810C_6" "50810C_7" "50810C_8" "50810C_9" "50810C_10" "50810C_11" "50810C_12" "50810C_13"
[929] "50810C_14" "50810C_15" "50810E_1" "50810E_2" "50810E_3" "50810E_4" "50810E_5" "50810E_6"
[937] "50810E_7" "50810E_8" "50810E_9" "50810E_10" "50810E_11" "50810E_12" "50810E_13" "50810E_14"
[945] "50810E_15" "3421_1" "3421_2" "3421_3" "3421_4" "3421_5" "3421_6" "3421_7"
[953] "3421_8" "3421_9" "3421_10" "3421_11" "3421_12" "3421_13" "3421_14" "3421_15"
[961] "3425_1" "3425_2" "3425_3" "3425_4" "3425_5" "3425_6" "3425_7" "3425_8"
[969] "3425_9" "3425_10" "3425_11" "3425_12" "3425_13" "3425_14" "3425_15" "3426_1"
[977] "3426_2" "3426_3" "3426_4" "3426_5" "3426_6" "3426_7" "3426_8" "3426_9"
[985] "3426_10" "3426_11" "3426_12" "3426_13" "3426_14" "3426_15" "3491_1" "3491_2"
[993] "3491_3" "3491_4" "3491_5" "3491_6" "3491_7" "3491_8" "3491_9" "3491_10"
[ reached getOption("max.print") -- omitted 80 entries ]
table(ds_filtered$Genotype)
Control Ts65Dn
570 510
table(ds_filtered$Treatment)
Memantine Saline
570 510
table(ds_filtered$Behavior)
C/S S/C
525 555
table(ds_filtered$class)
c-CS-m c-CS-s c-SC-m c-SC-s t-CS-m t-CS-s t-SC-m t-SC-s
150 135 150 135 135 105 135 135
Higuera, C., Gardiner, K. J., & Cios, K. J. (2015). Self-Organizing Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down Syndrome. PLoS ONE, 10(6).
Ahmed, M. M., Dhanasekaran, A. R., Block, A., Tong, S., Costa, A. C. S., Stasko, M., & Gardiner, K. J. (2015). Protein Dynamics Associated with Failed and Rescued Learning in the Ts65Dn Mouse Model of Down Syndrome. PLoS ONE, 10(3).
Costa, A., Scott-McKean, J., & Stasko, M. (2008). Acute injections of the NMDA receptor antagonist memantine rescue performance deficits of the Ts65Dn mouse model of Down syndrome on a fear conditioning test. Neuropsychopharmacology, 33(7), 1624-1632.
Davisson, M., Schmidt, C., Reeves, R., Irving, N., Akeson, E., Harris, B., & Bronson, R. (1993). Segmental trisomy as a mouse model for Down syndrome. Prog Clin Biol Res, 384, 117-133.
Mitra, A., Blank, M., & Madison, D. (2012). Developmentally altered inhibition in Ts65Dn, a mouse model of Down syndrome. Brain Research, 1440, 1-8.
http://journals.plos.org/plosone/article/figure/image?size=medium&id=10.1371/journal.pone.0129126.g001
https://docs.google.com/spreadsheets/d/1scXPvhOh3kmANCAhf1X_CLwJQckOJA-WMP_f9lHVri4/edit?usp=sharing, CONTROL MICE PROTEINS, http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0129126.s003, TRISOMIC MICE PROTEINS, http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0129126.s004