Memory Test Drug Trial Analysis

Background

An experiment on the effects of anti-anxiety medicine on memory recall when being primed with happy or sad memories. The participants were done on novel Islanders whom mimic real-life humans in response to external factors.

Drugs of interest (known-as) [Dosage 1, 2, 3]:

A - Alprazolam (Xanax, Long-term) [1mg/3mg/5mg]

T - Triazolam (Halcion, Short-term) [0.25mg/0.5mg/0.75mg]

S- Sugar Tablet (Placebo) [1 tab/2tabs/3tabs]

Dosages follow a 1:1 ratio to ensure validity
Happy or Sad memories were primed 10 minutes prior to testing
Participants tested every day for 1 week to mimic addiction

Participants - all genders above 25+ years old to ensure a fully developed pre-frontal cortex, a region responsible for higher level cognition and memory recall.

Libraries

pacman::p_load(tidyverse, GGally, ggExtra, reshape2, AICcmodavg, RColorBrewer)

memtest <- read.csv('Islander_data.csv', header = TRUE)

Data characteristics

summary(memtest)

##   first_name         last_name              age        Happy_Sad_group   
##  Length:198         Length:198         Min.   :24.00   Length:198        
##  Class :character   Class :character   1st Qu.:30.00   Class :character  
##  Mode  :character   Mode  :character   Median :37.00   Mode  :character  
##                                        Mean   :39.53                     
##                                        3rd Qu.:48.00                     
##                                        Max.   :83.00                     
##      Dosage         Drug           Mem_Score_Before Mem_Score_After 
##  Min.   :1.00   Length:198         Min.   : 27.20   Min.   : 27.10  
##  1st Qu.:1.00   Class :character   1st Qu.: 46.52   1st Qu.: 47.17  
##  Median :2.00   Mode  :character   Median : 54.80   Median : 56.75  
##  Mean   :1.99                      Mean   : 57.97   Mean   : 60.92  
##  3rd Qu.:3.00                      3rd Qu.: 68.40   3rd Qu.: 73.25  
##  Max.   :3.00                      Max.   :110.00   Max.   :120.00  
##       Diff        
##  Min.   :-40.400  
##  1st Qu.: -3.175  
##  Median :  1.700  
##  Mean   :  2.955  
##  3rd Qu.:  5.925  
##  Max.   : 49.000

str(memtest)

## 'data.frame':    198 obs. of  9 variables:
##  $ first_name      : chr  "Bastian" "Evan" "Florencia" "Holly" ...
##  $ last_name       : chr  "Carrasco" "Carrasco" "Carrasco" "Carrasco" ...
##  $ age             : int  25 52 29 50 52 37 35 38 29 36 ...
##  $ Happy_Sad_group : chr  "H" "S" "H" "S" ...
##  $ Dosage          : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Drug            : chr  "A" "A" "A" "A" ...
##  $ Mem_Score_Before: num  63.5 41.6 59.7 51.7 47 66.4 44.1 76.3 56.2 54.8 ...
##  $ Mem_Score_After : num  61.2 40.7 55.1 51.2 47.1 58.1 56 74.8 45 75.9 ...
##  $ Diff            : num  -2.3 -0.9 -4.6 -0.5 0.1 -8.3 11.9 -1.5 -11.2 21.1 ...

sum(is.na(memtest))

## [1] 0

There is no missing value from our data set but we need to convert variables into categorical format for further analysis.

memtest$Happy_Sad_group <- as.factor(memtest$Happy_Sad_group)
memtest$Drug <- as.factor(memtest$Drug)

Exploratory Data Analysis

Descriptive statistics

table(memtest$Dosage, memtest$Drug)

##    
##      A  S  T
##   1 23 22 22
##   2 22 22 22
##   3 22 22 21

The table shows the number of participants for each drug and dosage. It can be seen that the participants in the clinical trial were grouped into equal groups.

table(memtest$Dosage, memtest$Drug, memtest$Happy_Sad_group)

## , ,  = H
## 
##    
##      A  S  T
##   1 11 11 11
##   2 11 11 11
##   3 11 11 11
## 
## , ,  = S
## 
##    
##      A  S  T
##   1 12 11 11
##   2 11 11 11
##   3 11 11 10

The participants in each group were allocated evenly among the the happy and sad memory groups.

Data Visualization

memtest %>% ggplot(aes(x = age, color = 4)) + 
            geom_histogram(bins = 45, fill = '#2196f3', position = "dodge", show.legend = FALSE) +
            labs(x = 'Age', y = 'Number of participants') + 
            ggtitle('Distribution of participants by age') +
            theme(plot.title = element_text(hjust = 0.5))

ggpairs(memtest[,c('age','Dosage','Mem_Score_Before','Mem_Score_After','Diff')],
        title = 'Correlation between the variables',
        aes(color = memtest$Happy_Sad_group, alpha = 0.25),
        columnLabels = c('age','Dosage','Mem_Score_Before','Mem_Score_After','Diff'),
        lower = list(continuous = "smooth")) +
        theme(plot.title = element_text(hjust = 0.5))

The age distribution shows a non normal distribution among the participants. The correlation plot shows positive correlation between the memory score before and after the test also we can observe a slight positive correlation between the memory score after and the difference between the scores.

hist <- ggplot(memtest, aes(x = Mem_Score_Before, y = Mem_Score_After, color = Happy_Sad_group)) +
        geom_point() +
        scale_y_continuous(breaks = seq(25, 125, by = 10)) +
        scale_x_continuous(breaks = seq(25, 125, by = 10)) +
        labs(x = 'Memory score before', y = 'Memory score after') +
        ggtitle('Memory score by block comarison')
        ggMarginal(hist, type = 'histogram', xparams = list(fill = 4), yparams = list(fill = 3))

happy_grp <- select(memtest, first_name, last_name, Mem_Score_Before, Mem_Score_After, Happy_Sad_group) %>% filter(Happy_Sad_group == 'H') %>% melt(id.vars = c('first_name', 'last_name', 'Happy_Sad_group'))
sad_grp <- select(memtest, first_name, last_name, Mem_Score_Before, Mem_Score_After, Happy_Sad_group) %>% filter(Happy_Sad_group == 'S') %>% melt(id.vars = c('first_name', 'last_name', 'Happy_Sad_group'))
        ggplot(happy_grp, aes(x = variable, y = value, color = variable)) +
        geom_point(cex = 1.5, pch = 1.0, position = position_jitter(w = 0.1, h = 0)) +
        geom_boxplot(alpha = 0.5) +
        geom_point(data = sad_grp ,cex = 1.5, pch = 1.0, position = position_jitter(w = 0.1, h = 0)) +
        geom_boxplot(data = sad_grp, alpha = 0.5) +
        facet_wrap(~Happy_Sad_group) +
        labs(x = 'Memory score before vs after', y = 'Score') +
        ggtitle('Memory score by block before and after the test comarison') +
        theme(plot.title = element_text(hjust = 0.5))

The correlation between the scores before and after the test among the happy and sad blocks. On the second plot can be seen that the means of the groups are negligible between the two blocks indicating that the happy and sad memories that were primed 10 minutes prior to testing has no significant contribution.

memtest %>% ggplot(aes(x = Mem_Score_Before, y = Mem_Score_After, color = Dosage)) +
            geom_point() + 
            facet_wrap(~Drug) + 
            labs(x = 'Memory score before', y = 'Memory score after') +
            ggtitle('Memory score by drugs and dosage comparison') +
            theme(plot.title = element_text(hjust = 0.5))

The correlation between the two different drugs and the control group among the two blocks.

memtest %>% ggplot(aes(x = Dosage, y = Mem_Score_Before, group = Dosage, color = Dosage)) +
            geom_point(cex = 1.5, pch = 1.0, position = position_jitter(w = 0.1, h = 0)) +
            geom_boxplot(alpha = 0.5) +
            facet_wrap(~Drug) +
            labs(x = 'Dosage', y = 'Memory score before') +
            ggtitle('Memory score by drug groups and dosage before the test') +
            theme(plot.title = element_text(hjust = 0.5))

memtest %>% ggplot(aes(x = Dosage, y = Mem_Score_After, group = Dosage, color = Dosage)) +
            geom_point(cex = 1.5, pch = 1, position = position_jitter(w = 0.1, h = 0)) +
            geom_boxplot(alpha = 0.5) +
            facet_wrap(~Drug) +
            labs(x = 'Dosage', y = 'Memory score after') +
            ggtitle('Memory score by drug groups and dosage after the test') +
            theme(plot.title = element_text(hjust = 0.5))

The Alprazolam has the biggest effect on memory score with the increase of dosage there is a significant increase on scores. The sugar and Triazolam didn’t show any significant increase in scores only a slight improvement with the Triazolam.

A_ordered <- memtest[order(memtest$Mem_Score_Before, decreasing = TRUE),] %>% filter(Drug == 'A') %>% select(first_name, last_name, Mem_Score_Before, Mem_Score_After) %>%
  head(n = 10)
S_ordered <- memtest[order(memtest$Mem_Score_Before, decreasing = TRUE),] %>% filter(Drug == 'S') %>% select(first_name, last_name, Mem_Score_Before, Mem_Score_After) %>% 
  head(n = 10)
T_ordered <- memtest[order(memtest$Mem_Score_Before, decreasing = TRUE),] %>% filter(Drug == 'T') %>% select(first_name, last_name, Mem_Score_Before, Mem_Score_After) %>% 
  head(n = 10)

melt(A_ordered, id.vars = c('first_name', 'last_name')) %>%
  ggplot(aes(x = value, y = first_name)) + 
  geom_line() + 
  geom_point(aes(color = variable), size = 3) + 
  theme(legend.position = 'bottom', plot.title = element_text(hjust = 0.5)) +
  ggtitle('Top 10 performer before vs after - Drug A') +
  labs(x = 'Performance score', y = 'Top 10 performer')

melt(S_ordered, id.vars = c('first_name', 'last_name')) %>%
  ggplot(aes(x = value, y = first_name)) + 
  geom_line() + 
  geom_point(aes(color = variable), size = 3) + 
  theme(legend.position = 'bottom', plot.title = element_text(hjust = 0.5)) +
  ggtitle('Top 10 performer before vs after - Drug S') +
  labs(x = 'Performance score', y = 'Top 10 performer')

melt(T_ordered, id.vars = c('first_name', 'last_name')) %>%
  ggplot(aes(x = value, y = first_name)) + 
  geom_line() + 
  geom_point(aes(color = variable), size = 3) + 
  theme(legend.position = 'bottom', plot.title = element_text(hjust = 0.5)) +
  ggtitle('Top 10 performer before vs after - Drug T') +
  labs(x = 'Performance score', y = 'Top 10 performer')

The orange dots show the top 10 performers score before they received their dosage compared with the score after the dosage shown as blue for each Drug.

Statistical analysis

Performing an ANNOVA test to estimate how the quantitative dependent variable changes according to the independent variables. Our null hypothesis (\(H_0\)) of the ANOVA is no difference in means and the alternative hypothesis (\(H_a\)) is that the means are differ from each other. We convert the dosage into a factorial variable before performing the test.

memtest$Dosage <- as.factor(memtest$Dosage)

anova_two_way <- aov(formula = Diff~Drug+Dosage, data = memtest)
summary(anova_two_way)

##              Df Sum Sq Mean Sq F value  Pr(>F)    
## Drug          2   4305  2152.4  24.082 4.6e-10 ***
## Dosage        2   1231   615.4   6.885 0.00129 ** 
## Residuals   193  17250    89.4                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova_interaction <- aov(formula = Diff~Drug*Dosage, data = memtest)
summary(anova_interaction)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Drug          2   4305  2152.4  33.596 3.28e-13 ***
## Dosage        2   1231   615.4   9.606 0.000106 ***
## Drug:Dosage   4   5141  1285.3  20.062 8.74e-14 ***
## Residuals   189  12109    64.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova_inter_confounding <- aov(formula = Diff~Drug*Dosage+Happy_Sad_group, data = memtest)
summary(anova_inter_confounding)

##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## Drug              2   4305  2152.4  33.443 3.75e-13 ***
## Dosage            2   1231   615.4   9.562 0.000111 ***
## Happy_Sad_group   1      8     8.0   0.124 0.725429    
## Drug:Dosage       4   5142  1285.5  19.973 1.01e-13 ***
## Residuals       188  12100    64.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova_three_inter <- aov(formula = Diff~Drug*Dosage*Happy_Sad_group, data = memtest)
summary(anova_three_inter)

##                              Df Sum Sq Mean Sq F value   Pr(>F)    
## Drug                          2   4305  2152.4  33.957 3.07e-13 ***
## Dosage                        2   1231   615.4   9.709 9.90e-05 ***
## Happy_Sad_group               1      8     8.0   0.126   0.7234    
## Drug:Dosage                   4   5142  1285.5  20.280 8.33e-14 ***
## Drug:Happy_Sad_group          2    106    53.2   0.839   0.4338    
## Dosage:Happy_Sad_group        2      9     4.7   0.075   0.9280    
## Drug:Dosage:Happy_Sad_group   4    574   143.6   2.266   0.0639 .  
## Residuals                   180  11409    63.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the above summary we can decide whether any of the group means differ from the overall mean. We conducted four different ANOVA test, first a two-way ANOVA, secondly we included the interaction effect, third time the Happy and Sad grouping was included and lastly the interaction between the three independent variables were tested.

From the summary we can conclude that the Happy_Sad_group variable and it’s interaction with the other variables is not significant. It has a low sum of squares and a high p-value which means that it’s not effecting the variation in the dependent variable. The only significant variables are the Drug, Dosage and the interaction between them.

Finding the best fit model that explains the variation in the model, we use the Akaike Information Criterion that calculates the information value on each model. We choose the model with the lowest value as it explains more variation in the data.

model.set <- list(anova_two_way, anova_interaction, anova_inter_confounding, anova_three_inter)
model.names <- c('anova_two_way', 'anova_interaction', 'anova_inter_confounding', 'anova_three_inter')

aictab(model.set, modnames = model.names)

## 
## Model selection based on AICc:
## 
##                          K    AICc Delta_AICc AICcWt Cum.Wt      LL
## anova_interaction       10 1397.53       0.00   0.74   0.74 -688.18
## anova_inter_confounding 11 1399.63       2.10   0.26   0.99 -688.10
## anova_three_inter       19 1406.85       9.32   0.01   1.00 -682.29
## anova_two_way            6 1458.86      61.33   0.00   1.00 -723.21

The ANOVA model with the interaction between the independent variables Drug and Dosage has the lowest AIC score and AIC weight meaning that it explains 74% of the total variation in the dependent variable.

Checking for homoscedascity

We plot the model diagnostic

par(mfrow = c(2,2))
plot(anova_interaction)

par(mfrow = c(1,1))

The residuals plot shows the unexplained variance in the data. The Residuals vs Fitted looks good, it’s centered around zero. The Q-Q plot shows the regression between theoretical and the actual residuals of the model. It shows some issues at both tails but seems sufficient. The Scale-Location is centered around at 1 meaning no large outliers.

From the ANOVA test we concluded that there are differences between the group means but we don’t know which groups are statistically different from one other. We perform the Tukey’s HSD post-hoc test to do a pairwise comparison between each group and determine which groups are statistically significant.

TukeyHSD(anova_interaction, ordered = TRUE, conf.level = 0.95)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##     factor levels have been ordered
## 
## Fit: aov(formula = Diff ~ Drug * Dosage, data = memtest)
## 
## $Drug
##           diff       lwr       upr     p adj
## S-T  0.4164802 -2.887694  3.720654 0.9523138
## A-T 10.0578416  6.765925 13.349758 0.0000000
## A-S  9.6413614  6.362128 12.920595 0.0000000
## 
## $Dosage
##         diff        lwr      upr     p adj
## 2-1 1.709833 -1.5694003 4.989067 0.4359693
## 3-1 5.943694  2.6517782 9.235611 0.0000932
## 3-2 4.233861  0.9296874 7.538035 0.0078887
## 
## $`Drug:Dosage`
##               diff         lwr       upr     p adj
## T:3-S:3  0.4216450 -7.24131446  8.084605 1.0000000
## T:1-S:3  0.9045455 -6.66878568  8.477877 0.9999887
## S:2-S:3  1.3909091 -6.18242205  8.964240 0.9996978
## A:1-S:3  2.4498024 -5.04075764  9.940362 0.9829030
## T:2-S:3  3.2954545 -4.27787659 10.868786 0.9092533
## S:1-S:3  4.5318182 -3.04151296 12.105149 0.6298208
## A:2-S:3  8.0272727  0.45394159 15.600604 0.0286613
## A:3-S:3 24.7863636 17.21303250 32.359695 0.0000000
## T:1-T:3  0.4829004 -7.18005905  8.145860 0.9999999
## S:2-T:3  0.9692641 -6.69369541  8.632224 0.9999823
## A:1-T:3  2.0281573 -5.55300962  9.609324 0.9954813
## T:2-T:3  2.8738095 -4.78914995 10.536769 0.9604468
## S:1-T:3  4.1101732 -3.55278632 11.773133 0.7560868
## A:2-T:3  7.6056277 -0.05733177 15.268587 0.0534433
## A:3-T:3 24.3647186 16.70175914 32.027678 0.0000000
## S:2-T:1  0.4863636 -7.08696750  8.059695 0.9999999
## A:1-T:1  1.5452569 -5.94530309  9.035817 0.9992873
## T:2-T:1  2.3909091 -5.18242205  9.964240 0.9863640
## S:1-T:1  3.6272727 -3.94605841 11.200604 0.8530021
## A:2-T:1  7.1227273 -0.45060387 14.696058 0.0833773
## A:3-T:1 23.8818182 16.30848704 31.455149 0.0000000
## A:1-S:2  1.0588933 -6.43166673  8.549453 0.9999583
## T:2-S:2  1.9045455 -5.66878568  9.477877 0.9970566
## S:1-S:2  3.1409091 -4.43242205 10.714240 0.9298886
## A:2-S:2  6.6363636 -0.93696750 14.209695 0.1379000
## A:3-S:2 23.3954545 15.82212341 30.968786 0.0000000
## T:2-A:1  0.8456522 -6.64490783  8.336212 0.9999927
## S:1-A:1  2.0820158 -5.40854420  9.572576 0.9941342
## A:2-A:1  5.5774704 -1.91308965 13.068030 0.3257899
## A:3-A:1 22.3365613 14.84600126 29.827121 0.0000000
## S:1-T:2  1.2363636 -6.33696750  8.809695 0.9998752
## A:2-T:2  4.7318182 -2.84151296 12.305149 0.5726311
## A:3-T:2 21.4909091 13.91757795 29.064240 0.0000000
## A:2-S:1  3.4954545 -4.07787659 11.068786 0.8772990
## A:3-S:1 20.2545455 12.68121432 27.827877 0.0000000
## A:3-A:2 16.7590909  9.18575977 24.332422 0.0000000

The Tukey’s HSD test shows that the interaction of Alprazolam and Triazolam, Alprazolam and Sugar are significant however the Triazolam and Sugar comparison found to be not significant with a p-value of 0.95. The significant groups are:

A:3-S:3
A:3-T:3
A:3-T:1
A:3-S:2
A:3-A:1
A:3-T:2
A:3-S:1
A:3-A:2

Residual diagnostic

plot(TukeyHSD(anova_interaction, conf.level = 0.95),las = 2.5)

The significant group-wise differences are where the confidence interval doesn’t include zero, meaning that the p-value is <0.05.

ggplot(memtest, aes(x = Dosage, y = Diff, color = Drug)) +
  geom_point(cex = 1.5, pch = 1.0, position = position_jitter(w = 0.1, h = 0)) +
  geom_boxplot(alpha = 0.2) +
  stat_summary(fun.data = 'mean_se', geom = 'errorbar', width = 0.2, color = 'red') +
  stat_summary(fun.data = 'mean_se', geom = 'pointrange', color = 'red') +
  facet_wrap(~Drug) + 
  labs(x = 'Dosage from 1-3', y = 'Difference in memory score') +
  ggtitle('Mean difference of Drugs by dosage with standard error') +
  theme(plot.title = element_text(hjust = 0.5))

The mean difference between the Drugs and Dosages shown above with standard errors for each group.

m <- matrix(data = 0, nrow = 3, ncol = 3, dimnames = list(c('1','2','3'),c('A','T','S')))
for (i in 1:3){
  for (j in c('A', 'T', 'S')){
    x <- memtest %>% group_by(Dosage) %>% filter(Dosage == i, Drug == j) %>% summarise(mean = mean(Diff))
    m[i,j] <- x[[2]]
  }
}
m <- as.data.frame(m)
ggplot(m, aes(x = seq(1,3, by = 1))) + 
  geom_line(aes(y = A), color = 1, linewidth = 0.5) + 
  geom_line(aes(y = T), color = 2, linewidth = 0.5) +
  geom_line(aes(y = S), color = 3, linewidth = 0.5) +
  ggtitle('Interaction plot of A,T,S by dosage') +
  labs(x = "Dosage", y = "Mean Difference") +
  theme(plot.title = element_text(hjust = 0.5))

##            A         T          S
## 1  0.3043478 -1.240909  2.3863636
## 2  5.8818182  1.150000 -0.7545455
## 3 22.6409091 -1.723810 -2.1454545

The interaction plot between the two drugs and the placebo shown above. The Alprazolam has the largest range of variance but it didn’t interact with the Triazolam while the control group (Sugar) has interactions with both group.