Exercise 1: Conceptual questions

What is the primary impact on running a regression model and ignoring the fact that the data has repeated measures in it?

Answer A:

Biased estimates: The model’s estimates of the relationship between variables may be inaccurate, either underestimating or overestimating the true effect.
Incorrect standard errors: The standard errors, which indicate the precision of the estimates, will be incorrect. This leads to:
- Invalid p-values: These reflect the significance of the results, but will be unreliable with incorrect standard errors.
- Unreliable confidence intervals: These represent the range within which the true effect is likely to lie, but will be inaccurate if standard errors are wrong.

If the pattern of a correlogram shows that the correlation stay relatively constant across the difference in time of your residuals, what correlation structure would be appropriate to model that behavior?

Answer B:

If the correlogram shows: constant correlation across time lags for your residuals, the appropriate correlation structure to model this behavior would be simple correlation structure or uniform correlation.
This structure assumes that the correlation (ρ) between any two residuals is constant regardless of the time difference (lag) between them.

Load in the nlme package and type ?corStruct. Provide a list of the correlation structure names. Keep this in mind that while we’ve explored AR1 (corExp) and CS (corCompSymm), we can always try others to get a better fit.

# Load the nlme package
library(nlme)

# Get help for corStruct function
?corStruct

## starting httpd help server ... done

#  manually list the commonly used correlation structures.
correlation_structures <- c("corAR1", "corARMA", "corCAR1", "corCompSymm", "corExp",
                            "corGaus", "corLin", "corRatio", "corSpher", "corSymm")

# Print  list 
print(correlation_structures)

##  [1] "corAR1"      "corARMA"     "corCAR1"     "corCompSymm" "corExp"     
##  [6] "corGaus"     "corLin"      "corRatio"    "corSpher"    "corSymm"

Exercise 2: Brain Sites of Short and Long Term Memory

In the late 1980s and early 1990s, it was hypothesized that the hippocompal region of brain is a potential site for short term memory and not long term memory. To test this, 18 monkeys were given training to discriminate 100 pairs of objects. After their training, 11 monkey were randomly selected and went through a procedure that had access to their hippocampal formations blocked. The remaining 7 were left untreated.

The interesting bit here is that not all 100 pairs of objects were trained at the same time. 20 were trained to all the monkeys 2 weeks prior to the blockage, 20 were trained 4 weeks prior, and 20 at 8,12, and 16 weeks prior. After the blockage to the hippocampus all monkeys were then given a test to see how many pairs of objects they could discriminate. If there hypothesis was true, there should be a difference between the two groups of monkeys when looking at their performance of objecst trained 2 and 4 weeks prior (short term memory) while there should not be much of a difference for the objects for later times prior (long term memory).

A plot of the data set is below. The reponse is the NewPercentCorrect variables which is the percentage of correct objects for each of the 20 items at each training week. Use the data set to answer the following questions

Based on the graph, it doesn’t appear that the trend is linear to model Week as a numeric variable. Additionally the question of interest is to examine what is going on between the treatment versus control at each training week (prior to hippocampal blockage) so treating it as a factor will be helpful. Based on the graph do you think an interaction term is needed between Treatment and Week? Explain your view point.
Use the modeling codes from the Prelive Assignment to obtain a correlogram of the residuals fitted by a basic MLR model. Be sure to include Week as a factor variable and not numeric. Comment on whether it is obvious on whether a specific correlation structure would work well or not.
Fit two repeated measures models, one compound symmetry and one AR1. Compare the AIC values. You can optionally try the unstructured one if it will run.
Based on the best AIC fit, use similar codes from the pre live assignment to compare the treatment groups at each Week using a Bonferroni correction. Does the data conclude that the researchers hypothesis appears reasonable? Point to which tests support your answer.

Answers Exercise 2:

A.

It is reasonable to include an interaction term between treatment and week to capture the varying effect of treatment across different times. 1. Visually distinct trends: The graph shows that the trends for the treatment and control groups differ significantly across weeks. The control group exhibits a peak at week 4 followed by a decline, while the treated group’s performance fluctuates. 2. Inconsistent treatment effect: This visual difference suggests that the effect of the treatment is not consistent over time. In other words, the treatment might have a different impact on the percentage of correct responses depending on the specific week of training. 3. Omitted interaction effect: If an interaction term is not included in the model, this varying effect of treatment across weeks would be missed. The model would only be able to capture an average treatment effect across all weeks, which might not accurately represent the true relationship between treatment and performance. 4. Therefore: Including an interaction term between treatment and week allows the model to capture the differential effect of the treatment on the percentage of correct responses depending on the week of training, providing a more accurate representation of the data.

I analyzed the plot and observed that: * Non-linear trend: Both groups’ performance slopes change, indicating a non-linear relationship between weeks and scores. * Treated group: Performance drops significantly around Week 2 and fluctuates afterwards. * Control group: Performance stays relatively stable.

These contrasting patterns suggest the treatment’s effect varies across weeks. The difference in slopes, especially the treated group’s drop, highlights a week-dependent relationship with performance. Therefore, including an interaction term between Treatment and Week in the statistical model is crucial to capture this complex relationship.

B.

# Load libraries
library(tidyverse)
library(reshape2) # For correlogram

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

library(Hmisc)  # For correlogram

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:dplyr':
## 
##     src, summarize

## The following objects are masked from 'package:base':
## 
##     format.pval, units

# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)

# Correlogram function
cor.gram <- function(mymodel, resid.col = "NewPerCorrect") {
  # Get residuals
  r.raw <- residuals(mymodel)
  
  # Create data frame with subject and week
  r.frame <- data.frame(
    Subject = data$Monkey,
    Week = data$Week
  )
  
  # Add residuals as a column
  r.frame[, resid.col] <- r.raw
  
  # Cast data to wide format (week as columns)
  r.wide <- acast(r.frame, Subject ~ Week, value.var = resid.col)
  
  # Calculate correlation matrix
  r.cor <- rcorr(r.wide)$r
  
  # Create data frame for correlogram
  r.tall <- data.frame(
    rows = rownames(r.cor)[row(r.cor)],
    vars = colnames(r.cor)[col(r.cor)],
    values = c(r.cor)
  )
  
  # Calculate distance between residuals
  r.tall$d <- abs(as.numeric(r.tall$rows) - as.numeric(r.tall$vars))
  
  # Aggregate values by distance
  r.tall <- aggregate(values ~ d, data = r.tall, function(x) { tanh(mean(atanh(x))) })
  
  return(r.tall)
}

# MLR model with Week as factor
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)

# Calculate and plot correlogram
mycorgram <- cor.gram(mymodel)

ggplot(mycorgram, aes(x = d, y = values)) +
  geom_point() +
  geom_smooth(method = "loess", size = 1.5) +
  ggtitle("Correlogram of Monkey Memory MLR Residuals") +
  xlab("Distance between residuals (Week)") +
  ylim(0, 1)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `geom_smooth()` using formula = 'y ~ x'

  # Load libraries
library(nlme)

# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)

# MLR model (already defined)
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)

# Compound Symmetry (CS) model
CSmodel <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corCompSymm(form = ~ 1 | Monkey))

# Autoregressive (AR1) model
AR1model <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corAR1(form = ~ 1 | Monkey))

# Unstructured model (may not run due to insufficient data)
UNmodel <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corAR1(form = ~ 1 | Monkey))

# Compare AIC values
print(AIC(mymodel))

## [1] 725.2303

print(AIC(CSmodel))

## [1] 706.9572

print(AIC(AR1model))

## [1] 709.2408

print(AIC(UNmodel))

## [1] 709.2408

print(mymodel)

## 
## Call:
## lm(formula = NewPerCorrect ~ Treatment * Week, data = data)
## 
## Coefficients:
##           (Intercept)       TreatmentTreated                   Week  
##                81.788                -18.505                 -1.043  
## TreatmentTreated:Week  
##                 1.387

print(CSmodel)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##   Log-restricted-likelihood: -347.4786
## 
## Coefficients:
##           (Intercept)      TreatmentTreated                  Week 
##             81.788123            -18.505353             -1.043118 
## TreatmentTreated:Week 
##              1.387027 
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##       Rho 
## 0.3504484 
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.32539

print(AR1model)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##   Log-restricted-likelihood: -348.6204
## 
## Coefficients:
##           (Intercept)      TreatmentTreated                  Week 
##            80.7556220           -16.8676715            -0.8650237 
## TreatmentTreated:Week 
##             1.1175535 
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##      Phi 
## 0.415961 
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546

print(UNmodel)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##   Log-restricted-likelihood: -348.6204
## 
## Coefficients:
##           (Intercept)      TreatmentTreated                  Week 
##            80.7556220           -16.8676715            -0.8650237 
## TreatmentTreated:Week 
##             1.1175535 
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##      Phi 
## 0.415961 
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546

Interpretation:

The correlogram shows how the correlation between residuals changes as the distance (difference in weeks) between them changes. Ideally, the correlogram should be close to a flat line at zero, indicating no correlation between residuals. However, the provided image suggests that there might be some correlation between the residuals, especially for smaller distances (closer weeks). This indicates a potential violation of the independence assumption of MLR, which can affect the accuracy of the model’s results.

While the specific correlation structure is not immediately clear from the correlogram, considering structures like compound symmetry (CS) or autoregressive (AR1) might be appropriate to account for potential dependencies in the residuals when fitting the model.

C:

# Load libraries
library(tidyverse)
library(reshape2) # For correlogram
library(Hmisc)  # For correlogram

# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)

# Correlogram function
cor.gram <- function(mymodel, resid.col = "NewPerCorrect") {
    # Get residuals
    r.raw <- residuals(mymodel)
    
    # Create data frame with subject and week
    r.frame <- data.frame(
        Subject = data$Monkey,
        Week = data$Week
    )
    
    # Add residuals as a column
    r.frame[, resid.col] <- r.raw
    
    # Cast data to wide format (week as columns)
    r.wide <- acast(r.frame, Subject ~ Week, value.var = resid.col)
    
    # Calculate correlation matrix
    r.cor <- rcorr(r.wide)$r
    
    # Create data frame for correlogram
    r.tall <- data.frame(
        rows = rownames(r.cor)[row(r.cor)],
        vars = colnames(r.cor)[col(r.cor)],
        values = c(r.cor)
    )
    
    # Calculate distance between residuals
    r.tall$d <- abs(as.numeric(r.tall$rows) - as.numeric(r.tall$vars))
    
    # Aggregate values by distance
    r.tall <- aggregate(values ~ d, data = r.tall, function(x) { tanh(mean(atanh(x))) })
    
    return(r.tall)
}

# MLR model with Week as factor
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)

# Calculate and plot correlogram
mycorgram <- cor.gram(mymodel)

ggplot(mycorgram, aes(x = d, y = values)) +
    geom_point() +
    geom_smooth(method = "loess", size = 1.5) +
    ggtitle("Correlogram of Monkey Memory MLR Residuals") +
    xlab("Distance between residuals (Week)") +
    ylim(0, 1)

## `geom_smooth()` using formula = 'y ~ x'

# Load libraries
library(nlme)

# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)

# MLR model (already defined)
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)

# Compound Symmetry (CS) model
CSmodel <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corCompSymm(form = ~ 1 | Monkey))

# Autoregressive (AR1) model
AR1model <- gls(NewPerCorrect ~ Treatment * Week, 
                data = data, 
                correlation = corAR1(form = ~ 1 | Monkey))

# Unstructured model (may not run due to insufficient data)
UNmodel <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corAR1(form = ~ 1 | Monkey))

# Compare AIC values
print(AIC(mymodel))

## [1] 725.2303

print(AIC(CSmodel))

## [1] 706.9572

print(AIC(AR1model))

## [1] 709.2408

print(AIC(UNmodel))

## [1] 709.2408

print(mymodel)

## 
## Call:
## lm(formula = NewPerCorrect ~ Treatment * Week, data = data)
## 
## Coefficients:
##           (Intercept)       TreatmentTreated                   Week  
##                81.788                -18.505                 -1.043  
## TreatmentTreated:Week  
##                 1.387

print(CSmodel)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##   Log-restricted-likelihood: -347.4786
## 
## Coefficients:
##           (Intercept)      TreatmentTreated                  Week 
##             81.788123            -18.505353             -1.043118 
## TreatmentTreated:Week 
##              1.387027 
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##       Rho 
## 0.3504484 
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.32539

print(AR1model)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##   Log-restricted-likelihood: -348.6204
## 
## Coefficients:
##           (Intercept)      TreatmentTreated                  Week 
##            80.7556220           -16.8676715            -0.8650237 
## TreatmentTreated:Week 
##             1.1175535 
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##      Phi 
## 0.415961 
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546

print(UNmodel)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##   Log-restricted-likelihood: -348.6204
## 
## Coefficients:
##           (Intercept)      TreatmentTreated                  Week 
##            80.7556220           -16.8676715            -0.8650237 
## TreatmentTreated:Week 
##             1.1175535 
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##      Phi 
## 0.415961 
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546

Plotting:

summary(mymodel)

## 
## Call:
## lm(formula = NewPerCorrect ~ Treatment * Week, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.584  -8.377  -1.181   8.137  30.215 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            81.7881     4.2729  19.141  < 2e-16 ***
## TreatmentTreated      -18.5054     5.4660  -3.386  0.00107 ** 
## Week                   -1.0431     0.4343  -2.402  0.01847 *  
## TreatmentTreated:Week   1.3870     0.5556   2.497  0.01444 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.16 on 86 degrees of freedom
## Multiple R-squared:  0.1274, Adjusted R-squared:  0.09695 
## F-statistic: 4.185 on 3 and 86 DF,  p-value: 0.008152

summary(CSmodel)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##        AIC      BIC    logLik
##   706.9572 721.6833 -347.4786
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##       Rho 
## 0.3504484 
## 
## Coefficients:
##                           Value Std.Error   t-value p-value
## (Intercept)            81.78812  4.587637 17.827940  0.0000
## TreatmentTreated      -18.50535  5.868525 -3.153322  0.0022
## Week                   -1.04312  0.354381 -2.943491  0.0042
## TreatmentTreated:Week   1.38703  0.453326  3.059667  0.0030
## 
##  Correlation: 
##                       (Intr) TrtmnT Week  
## TreatmentTreated      -0.782              
## Week                  -0.649  0.507       
## TreatmentTreated:Week  0.507 -0.649 -0.782
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.14509622 -0.62862098 -0.08864107  0.61064724  2.26745238 
## 
## Residual standard error: 13.32539 
## Degrees of freedom: 90 total; 86 residual

summary(AR1model)

## Generalized least squares fit by REML
##   Model: NewPerCorrect ~ Treatment * Week 
##   Data: data 
##        AIC      BIC    logLik
##   709.2408 723.9668 -348.6204
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | Monkey 
##  Parameter estimate(s):
##      Phi 
## 0.415961 
## 
## Coefficients:
##                           Value Std.Error   t-value p-value
## (Intercept)            80.75562  5.188659 15.563872  0.0000
## TreatmentTreated      -16.86767  6.637355 -2.541324  0.0128
## Week                   -0.86502  0.485895 -1.780268  0.0786
## TreatmentTreated:Week   1.11755  0.621559  1.797984  0.0757
## 
##  Correlation: 
##                       (Intr) TrtmnT Week  
## TreatmentTreated      -0.782              
## Week                  -0.799  0.625       
## TreatmentTreated:Week  0.625 -0.799 -0.782
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.20974059 -0.64675208 -0.09514133  0.60074148  2.31265412 
## 
## Residual standard error: 13.43546 
## Degrees of freedom: 90 total; 86 residual

# Predicted values for each model
mymodel_pred <- predict(mymodel, newdata = data)
CSmodel_pred <- predict(CSmodel, newdata = data)
AR1model_pred <- predict(AR1model, newdata = data)

# Plot actual vs. predicted values for each model
plot(data$NewPerCorrect ~ data$Week, main = "Actual vs. Predicted (MLR) :Plot 1")
abline(lm(NewPerCorrect ~ Week, data = data), lwd = 2, col = "blue")

plot(data$NewPerCorrect ~ data$Week, main = "Actual vs. Predicted (CS) :Plot 2")
abline(lm(NewPerCorrect ~ Week, data = data), lwd = 2, col = "blue")

plot(data$NewPerCorrect ~ data$Week, main = "Actual vs. Predicted (AR1): Plot 3")
abline(lm(NewPerCorrect ~ Week, data = data), lwd = 2, col = "blue")

# Plot residuals vs. fitted values for each model
plot(mymodel_pred, data$NewPerCorrect - mymodel_pred, main = "Residuals vs. Fitted (MLR): Plot 4")
abline(h = 0, lwd = 2, col = "red")

plot(CSmodel_pred, data$NewPerCorrect - CSmodel_pred, main = "Residuals vs. Fitted (CS): PLot 5")
abline(h = 0, lwd = 2, col = "red")

plot(AR1model_pred, data$NewPerCorrect - AR1model_pred, main = "Residuals vs. Fitted (AR1): Plot 6")
abline(h = 0, lwd = 2, col = "red")

# Extract predictions from each model
mymodel_pred <- predict(mymodel)
CSmodel_pred <- predict(CSmodel)
AR1model_pred <- predict(AR1model)
# Unstructured model prediction might not be available

# Optional: Plot the raw data
plot(data$Week, data$NewPerCorrect, xlab = "Week : Plot 7", ylab = "NewPerCorrect")

# Add lines for predicted values from each model
lines(data$Week, mymodel_pred, col = "blue", lty = 2, lwd = 1)
lines(data$Week, CSmodel_pred, col = "green", lty = 2, lwd = 1)
lines(data$Week, AR1model_pred, col = "red", lty = 2, lwd = 1)
legend("topright", legend = c("Raw Data", "MLR", "CS", "AR1"), col=c("black", "blue", "green", "red"), fill = NA)

# Load necessary libraries
library(nlme)
library(ggplot2)

# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)

# MLR model
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)

# Compound Symmetry (CS) model
CSmodel <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corCompSymm(form = ~ 1 | Monkey))

# Autoregressive (AR1) model
AR1model <- gls(NewPerCorrect ~ Treatment * Week, 
                data = data, 
                correlation = corAR1(form = ~ 1 | Monkey))

# Unstructured model (modeled as AR1 here)
UNmodel <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corAR1(form = ~ 1 | Monkey))

# AIC values
aic_values <- data.frame(
  Model = c("MLR", "CS", "AR1", "UN"),
  AIC = c(AIC(mymodel), AIC(CSmodel), AIC(AR1model), AIC(UNmodel))
)

# Plot
ggplot(aic_values, aes(x = Model, y = AIC, fill = Model)) +
  geom_bar(stat = "identity") +
  ggtitle("Comparison of AIC Values for Different Models: Plot 8") +
  xlab("Model") +
  ylab("AIC Value") +
  theme_minimal()

Analysis

Correlogram: The plot presented is a correlogram of the residuals from a Multiple Linear Regression (MLR) analysis of Monkey Memory: - Horizontal Axis: Represents the distance between the residuals in terms of weeks. - Vertical Axis: Indicates the correlation values of the residuals. - Points: Each point reflects the correlation coefficient between residuals at a given distance apart. - Line with Shading: The blue line possibly represents a loess smoothing of the correlation coefficients, and the shaded area might indicate a confidence interval around the smooth. The correlogram shows high initial correlation between residuals that declines with increasing time gaps, typical for time-series data. However, the fluctuating correlation, not fitting a simple autoregressive or compound symmetry pattern, suggests the need for a nuanced model beyond standard MLR to appropriately capture the data’s correlation structure.

Observations: - The MLR model, labeled as mymodel, has the highest AIC value of 725.2303. This suggests that it is not the most suitable fit for the data compared to the other models. - The Compound Symmetry (CS) model shows a lower AIC (706.9572) than both the MLR and AR1 models. This indicates that the CS model might be a better fit for the data. - The Autoregressive (AR1) model, with an AIC of 709.2408, is slightly less preferable than the CS model based on the AIC value. - The unstructured model, UNmodel, displays the same AIC as the AR1 model. However, there is a concern that it may not have executed correctly due to potentially insufficient data. From this analysis, I conclude that the Compound Symmetry (CS) model appears to be the most suitable fit for the dataset among the models compared, considering the AIC values. Nevertheless, a couple of points are worth noting: - The small difference in AIC values between the CS and AR1 models implies that both could be reasonable choices, depending on the context. - The final selection of the model should also consider other factors such as the interpretability of the model structure and the specific objectives of the research. It’s important to remember that the choice of the most appropriate correlation structure in statistical modeling often involves balancing quantitative metrics like AIC with qualitative factors such as the context of the study and the ease of interpreting the model.

Comparison of AIC Values foir Different Models:” Based on the AIC comparison, UN model emerges as the most efficient, followed by CS and MLR, with AR1 being the least preferred. Remember, AIC is one piece of the puzzle, and further evaluation is needed for a definitive conclusion.

Further Plotting

Plot 1: This plot illustrates the actual values versus the predictions made by the Multiple Linear Regression (MLR) model: - Scatter Plot Points: They represent the actual NewPerCorrect values observed over different weeks. - Prediction Line: The MLR model has predicted a flat line, indicating no change in NewPerCorrect values over time according to the model. - Week Axis: It shows the timeline of the data collection, with weeks on the horizontal axis. - NewPerCorrect Axis: This axis, on the vertical, shows the percentage correct. From this plot, I can observe that the MLR model’s predictions do not fluctuate with the actual data; instead, they suggest a constant value across all weeks. This could indicate that the MLR model isn’t capturing any potential trends or changes in the data over time. It suggests that the factors included in the MLR model do not explain the variability in the weekly data, which could imply the need for a more complex model or the inclusion of other explanatory variables.

Plot 2: In this plot, we’re examining the actual versus predicted values from the Compound Symmetry (CS) model: - Scatter Plot Points: These represent the actual NewPerCorrect values across different weeks. The spread indicates variability in the dataset. - Prediction Line: The CS model provides a horizontal blue line, which shows that the model predicts a constant NewPerCorrect value, regardless of the time point. - Week Axis: The horizontal axis shows the progression of weeks, suggesting the longitudinal nature of the dataset. - NewPerCorrect Axis: The vertical axis shows the NewPerCorrect values. My analysis of this plot is that the CS model does not capture any trend or seasonality in the data since it predicts a flat line. The actual data points vary around this line, indicating that the model’s constant prediction is not fitting the weekly variations in the data. This could mean that either the true relationship is not captured by the model, or that the variability in the data is due to random fluctuations rather than a systematic pattern.

Plot 3: This plot displays actual vs. predicted values from the Autoregressive (AR1) model: - Scatter Plot Points: The actual NewPerCorrect values are spread out over the weeks, with considerable variance from the highest to the lowest points. - Prediction Line: The AR1 model’s predictions are represented by a straight, horizontal blue line, which does not seem to reflect any changes or trends in the data over time. - Week Axis: The horizontal axis labeled data$Week indicates the time series aspect of the data, spread across different weeks. - NewPerCorrect Axis: The vertical axis shows the NewPerCorrect values, both observed and predicted. In brief, the AR1 model’s predictions suggest a constant level of NewPerCorrect across all weeks, which does not align with the fluctuation observed in the actual data. This might indicate that the AR1 model does not capture the variability in the data adequately.

Plot 4: This plot displays the residuals from a Multiple Linear Regression (MLR) model against its fitted values: - Horizontal Axis (Fitted Values): Displays the predictions from the MLR model, reflecting the expected NewPerCorrect values. - Vertical Axis (Residuals): Shows the residuals, which are the differences between the actual data points and the model’s predictions. - Scatter Plot Points: Each represents a residual for a given predicted value. The spread of these points gives us an indication of how well the model fits the data at different levels of the predicted values. - Zero Line: The red line represents a residual of zero, where the model’s prediction would be exactly correct. Analyzing this plot, the residuals appear to be randomly scattered around the zero line without showing any apparent patterns, suggesting that the MLR model’s assumptions are not violated. However, there is some spread in the residuals, indicating potential outliers or moments where the model does not capture the data’s variability very well. Overall, the MLR model seems to provide an even distribution of residuals across the range of predictions.

Plot 5: In the provided plot, which is a residual versus fitted plot for the Compound Symmetry (CS) model: - Horizontal Axis (Fitted Values): The fitted values from the CS model are shown, representing the model’s predictions of the NewPerCorrect values. - Vertical Axis (Residuals): The residuals displayed indicate the differences between the actual and predicted values. - Scatter Plot Points: Each point represents a residual for its corresponding predicted value. The distribution of points gives us an idea of how well the model fits the data. - Zero Line: The red line across zero suggests where the model predictions would be perfect, with no residual. My observation is that the residuals are relatively evenly scattered above and below the zero line, with no clear pattern or structure. This suggests that the CS model does not exhibit systematic bias in the residuals. However, similar to the AR1 model, there are some points with relatively large residuals, indicating potential outliers or instances where the model’s predictions are less accurate.

Plot 6: This plot is a residual versus fitted plot for the AR1 (Autoregressive of order 1) model: - Horizontal Axis (Fitted Values): It represents the predicted values from the AR1 model, expected NewPerCorrect values. - Vertical Axis (Residuals): Indicates the residuals, which are the differences between the actual and predicted NewPerCorrect values. - Scatter Plot Points: Each point shows a residual for a corresponding prediction. The distribution of these points suggests the variation in the model’s prediction accuracy. - Zero Line: The red line at zero marks where the predicted values match the actual values exactly. From this plot, I see that the residuals for the AR1 model are randomly dispersed around the zero line, suggesting no clear pattern of errors. This randomness is a good sign, indicating that the model’s assumptions are likely being met. However, there are some points with larger residuals, indicating predictions that deviate more from the actual values. These could be outliers or instances where the model does not fit as well. Overall, the AR1 model seems to have a consistent spread of residuals across its predictions.

Plot 1: This plot illustrates the actual values versus the predictions made by the Multiple Linear Regression (MLR) model: - Scatter Plot Points: They represent the actual NewPerCorrect values observed over different weeks. - Prediction Line: The MLR model has predicted a flat line, indicating no change in NewPerCorrect values over time according to the model. - Week Axis: It shows the timeline of the data collection, with weeks on the horizontal axis. - NewPerCorrect Axis: This axis, on the vertical, shows the percentage correct. From this plot, I can observe that the MLR model’s predictions do not fluctuate with the actual data; instead, they suggest a constant value across all weeks. This could indicate that the MLR model isn’t capturing any potential trends or changes in the data over time. It suggests that the factors included in the MLR model do not explain the variability in the weekly data, which could imply the need for a more complex model or the inclusion of other explanatory variables.

Plot 2: In this plot, we’re examining the actual versus predicted values from the Compound Symmetry (CS) model: - Scatter Plot Points: These represent the actual NewPerCorrect values across different weeks. The spread indicates variability in the dataset. - Prediction Line: The CS model provides a horizontal blue line, which shows that the model predicts a constant NewPerCorrect value, regardless of the time point. - Week Axis: The horizontal axis shows the progression of weeks, suggesting the longitudinal nature of the dataset. - NewPerCorrect Axis: The vertical axis shows the NewPerCorrect values. My analysis of this plot is that the CS model does not capture any trend or seasonality in the data since it predicts a flat line. The actual data points vary around this line, indicating that the model’s constant prediction is not fitting the weekly variations in the data. This could mean that either the true relationship is not captured by the model, or that the variability in the data is due to random fluctuations rather than a systematic pattern.

Plot 3: This plot displays actual vs. predicted values from the Autoregressive (AR1) model: - Scatter Plot Points: The actual NewPerCorrect values are spread out over the weeks, with considerable variance from the highest to the lowest points. - Prediction Line: The AR1 model’s predictions are represented by a straight, horizontal blue line, which does not seem to reflect any changes or trends in the data over time. - Week Axis: The horizontal axis labeled data$Week indicates the time series aspect of the data, spread across different weeks. - NewPerCorrect Axis: The vertical axis shows the NewPerCorrect values, both observed and predicted. In brief, the AR1 model’s predictions suggest a constant level of NewPerCorrect across all weeks, which does not align with the fluctuation observed in the actual data. This might indicate that the AR1 model does not capture the variability in the data adequately.

Plot 4: This plot displays the residuals from a Multiple Linear Regression (MLR) model against its fitted values: - Horizontal Axis (Fitted Values): Displays the predictions from the MLR model, reflecting the expected NewPerCorrect values. - Vertical Axis (Residuals): Shows the residuals, which are the differences between the actual data points and the model’s predictions. - Scatter Plot Points: Each represents a residual for a given predicted value. The spread of these points gives us an indication of how well the model fits the data at different levels of the predicted values. - Zero Line: The red line represents a residual of zero, where the model’s prediction would be exactly correct. Analyzing this plot, the residuals appear to be randomly scattered around the zero line without showing any apparent patterns, suggesting that the MLR model’s assumptions are not violated. However, there is some spread in the residuals, indicating potential outliers or moments where the model does not capture the data’s variability very well. Overall, the MLR model seems to provide an even distribution of residuals across the range of predictions.

Plot 5: In the provided plot, which is a residual versus fitted plot for the Compound Symmetry (CS) model: - Horizontal Axis (Fitted Values): The fitted values from the CS model are shown, representing the model’s predictions of the NewPerCorrect values. - Vertical Axis (Residuals): The residuals displayed indicate the differences between the actual and predicted values. - Scatter Plot Points: Each point represents a residual for its corresponding predicted value. The distribution of points gives us an idea of how well the model fits the data. - Zero Line: The red line across zero suggests where the model predictions would be perfect, with no residual. My observation is that the residuals are relatively evenly scattered above and below the zero line, with no clear pattern or structure. This suggests that the CS model does not exhibit systematic bias in the residuals. However, similar to the AR1 model, there are some points with relatively large residuals, indicating potential outliers or instances where the model’s predictions are less accurate.

Plot 6: This plot is a residual versus fitted plot for the AR1 (Autoregressive of order 1) model: - Horizontal Axis (Fitted Values): It represents the predicted values from the AR1 model, expected NewPerCorrect values. - Vertical Axis (Residuals): Indicates the residuals, which are the differences between the actual and predicted NewPerCorrect values. - Scatter Plot Points: Each point shows a residual for a corresponding prediction. The distribution of these points suggests the variation in the model’s prediction accuracy. - Zero Line: The red line at zero marks where the predicted values match the actual values exactly. From this plot, I see that the residuals for the AR1 model are randomly dispersed around the zero line, suggesting no clear pattern of errors. This randomness is a good sign, indicating that the model’s assumptions are likely being met. However, there are some points with larger residuals, indicating predictions that deviate more from the actual values. These could be outliers or instances where the model does not fit as well. Overall, the AR1 model seems to have a consistent spread of residuals across its predictions.

Plot 7: The line plot overlays the fitted regression lines from the MLR, CS, and AR1 models on the actual data points across different weeks. Here’s my concise analysis: - The raw data shows variability in the ‘NewPerCorrect’ values across the weeks, which all models aim to capture. - The MLR model’s line suggests a constant trend without accounting for the within-subject correlation. - The CS and AR1 models, indicated by the dashed lines, seem to follow a similar trajectory, implying that both models lead to comparable predictions across the weeks. - All models appear to underfit the data at the higher ‘NewPerCorrect’ values, particularly around weeks 2 to 4 and weeks 10 to 12, where there are noticeable deviations from the fitted lines. From this visualization, I can see that while the MLR, CS, and AR1 models provide an overall approximation of the trend in data, there may still be room for improving the models to capture the variability more accurately, especially for the outliers.

Plot 8: The bar chart compares the AIC values of four different models: AR1, CS, MLR, and UN. The AIC values serve as a metric for model comparison, with lower values indicating a model that balances goodness of fit with complexity. Here’s my brief analysis: - The AR1 model has the highest AIC value, suggesting it may not be the best fit among the options given its complexity and the data at hand. - The CS model shows a significantly lower AIC value, indicating a better fit compared to AR1. - The MLR model, while simpler, has an AIC value that is competitive with the CS model, which suggests it might be a parsimonious choice. - The UN model, which is the most flexible, doesn’t necessarily provide the best fit as indicated by its AIC value, which is close to that of the AR1 model. This visualization helps me conclude that, based on AIC alone, the CS model seems to provide a good balance between fit and complexity for this dataset, while the UN model, despite its flexibility, does not offer a substantial improvement to justify its complexity.

D.

# Load the multcomp library
library(multcomp)

## Loading required package: mvtnorm

## Loading required package: survival

## Loading required package: TH.data

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## 
## Attaching package: 'TH.data'

## The following object is masked from 'package:MASS':
## 
##     geyser

# Fit the model with the best AIC
CSmodel <- gls(NewPerCorrect ~ Treatment * Week, 
               data = data, 
               correlation = corCompSymm(form = ~ 1 | Monkey))

# Create a contrast matrix for all pairwise comparisons
contrasts <- glht(CSmodel, linfct = mcp(Treatment = "Tukey"))

## Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
## default contrast might be inappropriate

# Adjust for multiple testing using Bonferroni correction
summary(contrasts, test = adjusted("bonferroni"))

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: gls(model = NewPerCorrect ~ Treatment * Week, data = data, correlation = corCompSymm(form = ~1 | 
##     Monkey))
## 
## Linear Hypotheses:
##                        Estimate Std. Error z value Pr(>|z|)   
## Treated - Control == 0  -18.505      5.869  -3.153  0.00161 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- bonferroni method)

Analysis:

The output from the Tukey contrasts with Bonferroni adjustment indicates a significant difference between the treated and control groups, as denoted by the ’**’ sign next to the p-value (p = 0.00161), which is less than the standard alpha level of 0.05. This significant result after adjusting for multiple comparisons strengthens the evidence that the treatment has an effect. Given this result, it appears reasonable to conclude that the researcher’s hypothesis, which presumably was about the efficacy of the treatment, is supported by the data. The negative estimate (-18.505) suggests that the treatment group has a lower mean of ‘NewPerCorrect’ compared to the control group. This statistically significant difference points to the treatment having a distinct impact on the outcome measured weekly.

Unit 7 HW

MSDS 6372: Jessica McPhaul Linke: https://rpubs.com/Texaschikkita/Unit_7_HW

February 26, 2024

Exercise 1: Conceptual questions

Exercise 2: Brain Sites of Short and Long Term Memory

A.

B.

C:

Plotting:

Analysis

Further Plotting

D.