Answer A:
Answer B:
If the correlogram shows: constant correlation across time lags for your residuals, the appropriate correlation structure to model this behavior would be simple correlation structure or uniform correlation.
This structure assumes that the correlation (ρ) between any two residuals is constant regardless of the time difference (lag) between them.
?corStruct
. Provide a list of the correlation structure
names. Keep this in mind that while we’ve explored AR1 (corExp) and CS
(corCompSymm), we can always try others to get a better fit.# Load the nlme package
library(nlme)
# Get help for corStruct function
?corStruct
## starting httpd help server ... done
# manually list the commonly used correlation structures.
correlation_structures <- c("corAR1", "corARMA", "corCAR1", "corCompSymm", "corExp",
"corGaus", "corLin", "corRatio", "corSpher", "corSymm")
# Print list
print(correlation_structures)
## [1] "corAR1" "corARMA" "corCAR1" "corCompSymm" "corExp"
## [6] "corGaus" "corLin" "corRatio" "corSpher" "corSymm"
In the late 1980s and early 1990s, it was hypothesized that the hippocompal region of brain is a potential site for short term memory and not long term memory. To test this, 18 monkeys were given training to discriminate 100 pairs of objects. After their training, 11 monkey were randomly selected and went through a procedure that had access to their hippocampal formations blocked. The remaining 7 were left untreated.
The interesting bit here is that not all 100 pairs of objects were trained at the same time. 20 were trained to all the monkeys 2 weeks prior to the blockage, 20 were trained 4 weeks prior, and 20 at 8,12, and 16 weeks prior. After the blockage to the hippocampus all monkeys were then given a test to see how many pairs of objects they could discriminate. If there hypothesis was true, there should be a difference between the two groups of monkeys when looking at their performance of objecst trained 2 and 4 weeks prior (short term memory) while there should not be much of a difference for the objects for later times prior (long term memory).
A plot of the data set is below. The reponse is the
NewPercentCorrect
variables which is the percentage of
correct objects for each of the 20 items at each training week. Use the
data set to answer the following questions
Based on the graph, it doesn’t appear that the trend is linear to model Week as a numeric variable. Additionally the question of interest is to examine what is going on between the treatment versus control at each training week (prior to hippocampal blockage) so treating it as a factor will be helpful. Based on the graph do you think an interaction term is needed between Treatment and Week? Explain your view point.
Use the modeling codes from the Prelive Assignment to obtain a correlogram of the residuals fitted by a basic MLR model. Be sure to include Week as a factor variable and not numeric. Comment on whether it is obvious on whether a specific correlation structure would work well or not.
Fit two repeated measures models, one compound symmetry and one AR1. Compare the AIC values. You can optionally try the unstructured one if it will run.
Based on the best AIC fit, use similar codes from the pre live assignment to compare the treatment groups at each Week using a Bonferroni correction. Does the data conclude that the researchers hypothesis appears reasonable? Point to which tests support your answer.
Answers Exercise 2:
It is reasonable to include an interaction term between treatment and week to capture the varying effect of treatment across different times. 1. Visually distinct trends: The graph shows that the trends for the treatment and control groups differ significantly across weeks. The control group exhibits a peak at week 4 followed by a decline, while the treated group’s performance fluctuates. 2. Inconsistent treatment effect: This visual difference suggests that the effect of the treatment is not consistent over time. In other words, the treatment might have a different impact on the percentage of correct responses depending on the specific week of training. 3. Omitted interaction effect: If an interaction term is not included in the model, this varying effect of treatment across weeks would be missed. The model would only be able to capture an average treatment effect across all weeks, which might not accurately represent the true relationship between treatment and performance. 4. Therefore: Including an interaction term between treatment and week allows the model to capture the differential effect of the treatment on the percentage of correct responses depending on the week of training, providing a more accurate representation of the data.
I analyzed the plot and observed that: * Non-linear trend: Both groups’ performance slopes change, indicating a non-linear relationship between weeks and scores. * Treated group: Performance drops significantly around Week 2 and fluctuates afterwards. * Control group: Performance stays relatively stable.
These contrasting patterns suggest the treatment’s effect varies across weeks. The difference in slopes, especially the treated group’s drop, highlights a week-dependent relationship with performance. Therefore, including an interaction term between Treatment and Week in the statistical model is crucial to capture this complex relationship.
# Load libraries
library(tidyverse)
library(reshape2) # For correlogram
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(Hmisc) # For correlogram
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)
# Correlogram function
cor.gram <- function(mymodel, resid.col = "NewPerCorrect") {
# Get residuals
r.raw <- residuals(mymodel)
# Create data frame with subject and week
r.frame <- data.frame(
Subject = data$Monkey,
Week = data$Week
)
# Add residuals as a column
r.frame[, resid.col] <- r.raw
# Cast data to wide format (week as columns)
r.wide <- acast(r.frame, Subject ~ Week, value.var = resid.col)
# Calculate correlation matrix
r.cor <- rcorr(r.wide)$r
# Create data frame for correlogram
r.tall <- data.frame(
rows = rownames(r.cor)[row(r.cor)],
vars = colnames(r.cor)[col(r.cor)],
values = c(r.cor)
)
# Calculate distance between residuals
r.tall$d <- abs(as.numeric(r.tall$rows) - as.numeric(r.tall$vars))
# Aggregate values by distance
r.tall <- aggregate(values ~ d, data = r.tall, function(x) { tanh(mean(atanh(x))) })
return(r.tall)
}
# MLR model with Week as factor
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)
# Calculate and plot correlogram
mycorgram <- cor.gram(mymodel)
ggplot(mycorgram, aes(x = d, y = values)) +
geom_point() +
geom_smooth(method = "loess", size = 1.5) +
ggtitle("Correlogram of Monkey Memory MLR Residuals") +
xlab("Distance between residuals (Week)") +
ylim(0, 1)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
# Load libraries
library(nlme)
# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)
# MLR model (already defined)
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)
# Compound Symmetry (CS) model
CSmodel <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corCompSymm(form = ~ 1 | Monkey))
# Autoregressive (AR1) model
AR1model <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corAR1(form = ~ 1 | Monkey))
# Unstructured model (may not run due to insufficient data)
UNmodel <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corAR1(form = ~ 1 | Monkey))
# Compare AIC values
print(AIC(mymodel))
## [1] 725.2303
print(AIC(CSmodel))
## [1] 706.9572
print(AIC(AR1model))
## [1] 709.2408
print(AIC(UNmodel))
## [1] 709.2408
print(mymodel)
##
## Call:
## lm(formula = NewPerCorrect ~ Treatment * Week, data = data)
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 81.788 -18.505 -1.043
## TreatmentTreated:Week
## 1.387
print(CSmodel)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## Log-restricted-likelihood: -347.4786
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 81.788123 -18.505353 -1.043118
## TreatmentTreated:Week
## 1.387027
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Rho
## 0.3504484
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.32539
print(AR1model)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## Log-restricted-likelihood: -348.6204
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 80.7556220 -16.8676715 -0.8650237
## TreatmentTreated:Week
## 1.1175535
##
## Correlation Structure: AR(1)
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Phi
## 0.415961
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546
print(UNmodel)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## Log-restricted-likelihood: -348.6204
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 80.7556220 -16.8676715 -0.8650237
## TreatmentTreated:Week
## 1.1175535
##
## Correlation Structure: AR(1)
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Phi
## 0.415961
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546
Interpretation:
The correlogram shows how the correlation between residuals changes as the distance (difference in weeks) between them changes. Ideally, the correlogram should be close to a flat line at zero, indicating no correlation between residuals. However, the provided image suggests that there might be some correlation between the residuals, especially for smaller distances (closer weeks). This indicates a potential violation of the independence assumption of MLR, which can affect the accuracy of the model’s results.
While the specific correlation structure is not immediately clear from the correlogram, considering structures like compound symmetry (CS) or autoregressive (AR1) might be appropriate to account for potential dependencies in the residuals when fitting the model.
# Load libraries
library(tidyverse)
library(reshape2) # For correlogram
library(Hmisc) # For correlogram
# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)
# Correlogram function
cor.gram <- function(mymodel, resid.col = "NewPerCorrect") {
# Get residuals
r.raw <- residuals(mymodel)
# Create data frame with subject and week
r.frame <- data.frame(
Subject = data$Monkey,
Week = data$Week
)
# Add residuals as a column
r.frame[, resid.col] <- r.raw
# Cast data to wide format (week as columns)
r.wide <- acast(r.frame, Subject ~ Week, value.var = resid.col)
# Calculate correlation matrix
r.cor <- rcorr(r.wide)$r
# Create data frame for correlogram
r.tall <- data.frame(
rows = rownames(r.cor)[row(r.cor)],
vars = colnames(r.cor)[col(r.cor)],
values = c(r.cor)
)
# Calculate distance between residuals
r.tall$d <- abs(as.numeric(r.tall$rows) - as.numeric(r.tall$vars))
# Aggregate values by distance
r.tall <- aggregate(values ~ d, data = r.tall, function(x) { tanh(mean(atanh(x))) })
return(r.tall)
}
# MLR model with Week as factor
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)
# Calculate and plot correlogram
mycorgram <- cor.gram(mymodel)
ggplot(mycorgram, aes(x = d, y = values)) +
geom_point() +
geom_smooth(method = "loess", size = 1.5) +
ggtitle("Correlogram of Monkey Memory MLR Residuals") +
xlab("Distance between residuals (Week)") +
ylim(0, 1)
## `geom_smooth()` using formula = 'y ~ x'
# Load libraries
library(nlme)
# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)
# MLR model (already defined)
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)
# Compound Symmetry (CS) model
CSmodel <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corCompSymm(form = ~ 1 | Monkey))
# Autoregressive (AR1) model
AR1model <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corAR1(form = ~ 1 | Monkey))
# Unstructured model (may not run due to insufficient data)
UNmodel <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corAR1(form = ~ 1 | Monkey))
# Compare AIC values
print(AIC(mymodel))
## [1] 725.2303
print(AIC(CSmodel))
## [1] 706.9572
print(AIC(AR1model))
## [1] 709.2408
print(AIC(UNmodel))
## [1] 709.2408
print(mymodel)
##
## Call:
## lm(formula = NewPerCorrect ~ Treatment * Week, data = data)
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 81.788 -18.505 -1.043
## TreatmentTreated:Week
## 1.387
print(CSmodel)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## Log-restricted-likelihood: -347.4786
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 81.788123 -18.505353 -1.043118
## TreatmentTreated:Week
## 1.387027
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Rho
## 0.3504484
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.32539
print(AR1model)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## Log-restricted-likelihood: -348.6204
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 80.7556220 -16.8676715 -0.8650237
## TreatmentTreated:Week
## 1.1175535
##
## Correlation Structure: AR(1)
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Phi
## 0.415961
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546
print(UNmodel)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## Log-restricted-likelihood: -348.6204
##
## Coefficients:
## (Intercept) TreatmentTreated Week
## 80.7556220 -16.8676715 -0.8650237
## TreatmentTreated:Week
## 1.1175535
##
## Correlation Structure: AR(1)
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Phi
## 0.415961
## Degrees of freedom: 90 total; 86 residual
## Residual standard error: 13.43546
summary(mymodel)
##
## Call:
## lm(formula = NewPerCorrect ~ Treatment * Week, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.584 -8.377 -1.181 8.137 30.215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.7881 4.2729 19.141 < 2e-16 ***
## TreatmentTreated -18.5054 5.4660 -3.386 0.00107 **
## Week -1.0431 0.4343 -2.402 0.01847 *
## TreatmentTreated:Week 1.3870 0.5556 2.497 0.01444 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.16 on 86 degrees of freedom
## Multiple R-squared: 0.1274, Adjusted R-squared: 0.09695
## F-statistic: 4.185 on 3 and 86 DF, p-value: 0.008152
summary(CSmodel)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## AIC BIC logLik
## 706.9572 721.6833 -347.4786
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Rho
## 0.3504484
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 81.78812 4.587637 17.827940 0.0000
## TreatmentTreated -18.50535 5.868525 -3.153322 0.0022
## Week -1.04312 0.354381 -2.943491 0.0042
## TreatmentTreated:Week 1.38703 0.453326 3.059667 0.0030
##
## Correlation:
## (Intr) TrtmnT Week
## TreatmentTreated -0.782
## Week -0.649 0.507
## TreatmentTreated:Week 0.507 -0.649 -0.782
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.14509622 -0.62862098 -0.08864107 0.61064724 2.26745238
##
## Residual standard error: 13.32539
## Degrees of freedom: 90 total; 86 residual
summary(AR1model)
## Generalized least squares fit by REML
## Model: NewPerCorrect ~ Treatment * Week
## Data: data
## AIC BIC logLik
## 709.2408 723.9668 -348.6204
##
## Correlation Structure: AR(1)
## Formula: ~1 | Monkey
## Parameter estimate(s):
## Phi
## 0.415961
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 80.75562 5.188659 15.563872 0.0000
## TreatmentTreated -16.86767 6.637355 -2.541324 0.0128
## Week -0.86502 0.485895 -1.780268 0.0786
## TreatmentTreated:Week 1.11755 0.621559 1.797984 0.0757
##
## Correlation:
## (Intr) TrtmnT Week
## TreatmentTreated -0.782
## Week -0.799 0.625
## TreatmentTreated:Week 0.625 -0.799 -0.782
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.20974059 -0.64675208 -0.09514133 0.60074148 2.31265412
##
## Residual standard error: 13.43546
## Degrees of freedom: 90 total; 86 residual
# Predicted values for each model
mymodel_pred <- predict(mymodel, newdata = data)
CSmodel_pred <- predict(CSmodel, newdata = data)
AR1model_pred <- predict(AR1model, newdata = data)
# Plot actual vs. predicted values for each model
plot(data$NewPerCorrect ~ data$Week, main = "Actual vs. Predicted (MLR) :Plot 1")
abline(lm(NewPerCorrect ~ Week, data = data), lwd = 2, col = "blue")
plot(data$NewPerCorrect ~ data$Week, main = "Actual vs. Predicted (CS) :Plot 2")
abline(lm(NewPerCorrect ~ Week, data = data), lwd = 2, col = "blue")
plot(data$NewPerCorrect ~ data$Week, main = "Actual vs. Predicted (AR1): Plot 3")
abline(lm(NewPerCorrect ~ Week, data = data), lwd = 2, col = "blue")
# Plot residuals vs. fitted values for each model
plot(mymodel_pred, data$NewPerCorrect - mymodel_pred, main = "Residuals vs. Fitted (MLR): Plot 4")
abline(h = 0, lwd = 2, col = "red")
plot(CSmodel_pred, data$NewPerCorrect - CSmodel_pred, main = "Residuals vs. Fitted (CS): PLot 5")
abline(h = 0, lwd = 2, col = "red")
plot(AR1model_pred, data$NewPerCorrect - AR1model_pred, main = "Residuals vs. Fitted (AR1): Plot 6")
abline(h = 0, lwd = 2, col = "red")
# Extract predictions from each model
mymodel_pred <- predict(mymodel)
CSmodel_pred <- predict(CSmodel)
AR1model_pred <- predict(AR1model)
# Unstructured model prediction might not be available
# Optional: Plot the raw data
plot(data$Week, data$NewPerCorrect, xlab = "Week : Plot 7", ylab = "NewPerCorrect")
# Add lines for predicted values from each model
lines(data$Week, mymodel_pred, col = "blue", lty = 2, lwd = 1)
lines(data$Week, CSmodel_pred, col = "green", lty = 2, lwd = 1)
lines(data$Week, AR1model_pred, col = "red", lty = 2, lwd = 1)
legend("topright", legend = c("Raw Data", "MLR", "CS", "AR1"), col=c("black", "blue", "green", "red"), fill = NA)
# Load necessary libraries
library(nlme)
library(ggplot2)
# Read the data
data <- read.csv("monkey.csv", stringsAsFactors = TRUE)
# MLR model
mymodel <- lm(NewPerCorrect ~ Treatment * Week, data = data)
# Compound Symmetry (CS) model
CSmodel <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corCompSymm(form = ~ 1 | Monkey))
# Autoregressive (AR1) model
AR1model <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corAR1(form = ~ 1 | Monkey))
# Unstructured model (modeled as AR1 here)
UNmodel <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corAR1(form = ~ 1 | Monkey))
# AIC values
aic_values <- data.frame(
Model = c("MLR", "CS", "AR1", "UN"),
AIC = c(AIC(mymodel), AIC(CSmodel), AIC(AR1model), AIC(UNmodel))
)
# Plot
ggplot(aic_values, aes(x = Model, y = AIC, fill = Model)) +
geom_bar(stat = "identity") +
ggtitle("Comparison of AIC Values for Different Models: Plot 8") +
xlab("Model") +
ylab("AIC Value") +
theme_minimal()
Correlogram: The plot presented is a correlogram of the residuals from a Multiple Linear Regression (MLR) analysis of Monkey Memory: - Horizontal Axis: Represents the distance between the residuals in terms of weeks. - Vertical Axis: Indicates the correlation values of the residuals. - Points: Each point reflects the correlation coefficient between residuals at a given distance apart. - Line with Shading: The blue line possibly represents a loess smoothing of the correlation coefficients, and the shaded area might indicate a confidence interval around the smooth. The correlogram shows high initial correlation between residuals that declines with increasing time gaps, typical for time-series data. However, the fluctuating correlation, not fitting a simple autoregressive or compound symmetry pattern, suggests the need for a nuanced model beyond standard MLR to appropriately capture the data’s correlation structure.
Observations: - The MLR model, labeled as
mymodel
, has the highest AIC value of 725.2303. This
suggests that it is not the most suitable fit for the data compared to
the other models. - The Compound Symmetry (CS) model shows a lower AIC
(706.9572) than both the MLR and AR1 models. This indicates that the CS
model might be a better fit for the data. - The Autoregressive (AR1)
model, with an AIC of 709.2408, is slightly less preferable than the CS
model based on the AIC value. - The unstructured model,
UNmodel
, displays the same AIC as the AR1 model. However,
there is a concern that it may not have executed correctly due to
potentially insufficient data. From this analysis, I conclude that the
Compound Symmetry (CS) model appears to be the most suitable fit for the
dataset among the models compared, considering the AIC values.
Nevertheless, a couple of points are worth noting: - The small
difference in AIC values between the CS and AR1 models implies that both
could be reasonable choices, depending on the context. - The final
selection of the model should also consider other factors such as the
interpretability of the model structure and the specific objectives of
the research. It’s important to remember that the choice of the most
appropriate correlation structure in statistical modeling often involves
balancing quantitative metrics like AIC with qualitative factors such as
the context of the study and the ease of interpreting the model.
Comparison of AIC Values foir Different Models:” Based on the AIC comparison, UN model emerges as the most efficient, followed by CS and MLR, with AR1 being the least preferred. Remember, AIC is one piece of the puzzle, and further evaluation is needed for a definitive conclusion.
Plot 1: This plot illustrates the actual values
versus the predictions made by the Multiple Linear Regression (MLR)
model: - Scatter Plot Points: They represent the actual
NewPerCorrect
values observed over different weeks. -
Prediction Line: The MLR model has predicted a flat
line, indicating no change in NewPerCorrect
values over
time according to the model. - Week Axis: It shows the
timeline of the data collection, with weeks on the horizontal axis. -
NewPerCorrect Axis: This axis, on the vertical, shows
the percentage correct. From this plot, I can observe that the MLR
model’s predictions do not fluctuate with the actual data; instead, they
suggest a constant value across all weeks. This could indicate that the
MLR model isn’t capturing any potential trends or changes in the data
over time. It suggests that the factors included in the MLR model do not
explain the variability in the weekly data, which could imply the need
for a more complex model or the inclusion of other explanatory
variables.
Plot 2: In this plot, we’re examining the actual
versus predicted values from the Compound Symmetry (CS) model: -
Scatter Plot Points: These represent the actual
NewPerCorrect
values across different weeks. The spread
indicates variability in the dataset. - Prediction
Line: The CS model provides a horizontal blue line, which shows
that the model predicts a constant NewPerCorrect
value,
regardless of the time point. - Week Axis: The
horizontal axis shows the progression of weeks, suggesting the
longitudinal nature of the dataset. - NewPerCorrect
Axis: The vertical axis shows the NewPerCorrect
values. My analysis of this plot is that the CS model does not capture
any trend or seasonality in the data since it predicts a flat line. The
actual data points vary around this line, indicating that the model’s
constant prediction is not fitting the weekly variations in the data.
This could mean that either the true relationship is not captured by the
model, or that the variability in the data is due to random fluctuations
rather than a systematic pattern.
Plot 3: This plot displays actual vs. predicted
values from the Autoregressive (AR1) model: - Scatter Plot
Points: The actual NewPerCorrect
values are spread
out over the weeks, with considerable variance from the highest to the
lowest points. - Prediction Line: The AR1 model’s
predictions are represented by a straight, horizontal blue line, which
does not seem to reflect any changes or trends in the data over time. -
Week Axis: The horizontal axis labeled
data$Week
indicates the time series aspect of the data,
spread across different weeks. - NewPerCorrect Axis:
The vertical axis shows the NewPerCorrect
values, both
observed and predicted. In brief, the AR1 model’s predictions suggest a
constant level of NewPerCorrect
across all weeks, which
does not align with the fluctuation observed in the actual data. This
might indicate that the AR1 model does not capture the variability in
the data adequately.
Plot 4: This plot displays the residuals from a
Multiple Linear Regression (MLR) model against its fitted values: -
Horizontal Axis (Fitted Values): Displays the
predictions from the MLR model, reflecting the expected
NewPerCorrect
values. - Vertical Axis
(Residuals): Shows the residuals, which are the differences
between the actual data points and the model’s predictions. -
Scatter Plot Points: Each represents a residual for a
given predicted value. The spread of these points gives us an indication
of how well the model fits the data at different levels of the predicted
values. - Zero Line: The red line represents a residual
of zero, where the model’s prediction would be exactly correct.
Analyzing this plot, the residuals appear to be randomly scattered
around the zero line without showing any apparent patterns, suggesting
that the MLR model’s assumptions are not violated. However, there is
some spread in the residuals, indicating potential outliers or moments
where the model does not capture the data’s variability very well.
Overall, the MLR model seems to provide an even distribution of
residuals across the range of predictions.
Plot 5: In the provided plot, which is a residual
versus fitted plot for the Compound Symmetry (CS) model: -
Horizontal Axis (Fitted Values): The fitted values from
the CS model are shown, representing the model’s predictions of the
NewPerCorrect
values. - Vertical Axis
(Residuals): The residuals displayed indicate the differences
between the actual and predicted values. - Scatter Plot
Points: Each point represents a residual for its corresponding
predicted value. The distribution of points gives us an idea of how well
the model fits the data. - Zero Line: The red line
across zero suggests where the model predictions would be perfect, with
no residual. My observation is that the residuals are relatively evenly
scattered above and below the zero line, with no clear pattern or
structure. This suggests that the CS model does not exhibit systematic
bias in the residuals. However, similar to the AR1 model, there are some
points with relatively large residuals, indicating potential outliers or
instances where the model’s predictions are less accurate.
Plot 6: This plot is a residual versus fitted plot
for the AR1 (Autoregressive of order 1) model: - Horizontal Axis
(Fitted Values): It represents the predicted values from the
AR1 model, expected NewPerCorrect
values. -
Vertical Axis (Residuals): Indicates the residuals,
which are the differences between the actual and predicted
NewPerCorrect
values. - Scatter Plot
Points: Each point shows a residual for a corresponding
prediction. The distribution of these points suggests the variation in
the model’s prediction accuracy. - Zero Line: The red
line at zero marks where the predicted values match the actual values
exactly. From this plot, I see that the residuals for the AR1 model are
randomly dispersed around the zero line, suggesting no clear pattern of
errors. This randomness is a good sign, indicating that the model’s
assumptions are likely being met. However, there are some points with
larger residuals, indicating predictions that deviate more from the
actual values. These could be outliers or instances where the model does
not fit as well. Overall, the AR1 model seems to have a consistent
spread of residuals across its predictions.
Plot 1: This plot illustrates the actual values
versus the predictions made by the Multiple Linear Regression (MLR)
model: - Scatter Plot Points: They represent the actual
NewPerCorrect
values observed over different weeks. -
Prediction Line: The MLR model has predicted a flat
line, indicating no change in NewPerCorrect
values over
time according to the model. - Week Axis: It shows the
timeline of the data collection, with weeks on the horizontal axis. -
NewPerCorrect Axis: This axis, on the vertical, shows
the percentage correct. From this plot, I can observe that the MLR
model’s predictions do not fluctuate with the actual data; instead, they
suggest a constant value across all weeks. This could indicate that the
MLR model isn’t capturing any potential trends or changes in the data
over time. It suggests that the factors included in the MLR model do not
explain the variability in the weekly data, which could imply the need
for a more complex model or the inclusion of other explanatory
variables.
Plot 2: In this plot, we’re examining the actual
versus predicted values from the Compound Symmetry (CS) model: -
Scatter Plot Points: These represent the actual
NewPerCorrect
values across different weeks. The spread
indicates variability in the dataset. - Prediction
Line: The CS model provides a horizontal blue line, which shows
that the model predicts a constant NewPerCorrect
value,
regardless of the time point. - Week Axis: The
horizontal axis shows the progression of weeks, suggesting the
longitudinal nature of the dataset. - NewPerCorrect
Axis: The vertical axis shows the NewPerCorrect
values. My analysis of this plot is that the CS model does not capture
any trend or seasonality in the data since it predicts a flat line. The
actual data points vary around this line, indicating that the model’s
constant prediction is not fitting the weekly variations in the data.
This could mean that either the true relationship is not captured by the
model, or that the variability in the data is due to random fluctuations
rather than a systematic pattern.
Plot 3: This plot displays actual vs. predicted
values from the Autoregressive (AR1) model: - Scatter Plot
Points: The actual NewPerCorrect
values are spread
out over the weeks, with considerable variance from the highest to the
lowest points. - Prediction Line: The AR1 model’s
predictions are represented by a straight, horizontal blue line, which
does not seem to reflect any changes or trends in the data over time. -
Week Axis: The horizontal axis labeled
data$Week
indicates the time series aspect of the data,
spread across different weeks. - NewPerCorrect Axis:
The vertical axis shows the NewPerCorrect
values, both
observed and predicted. In brief, the AR1 model’s predictions suggest a
constant level of NewPerCorrect
across all weeks, which
does not align with the fluctuation observed in the actual data. This
might indicate that the AR1 model does not capture the variability in
the data adequately.
Plot 4: This plot displays the residuals from a
Multiple Linear Regression (MLR) model against its fitted values: -
Horizontal Axis (Fitted Values): Displays the
predictions from the MLR model, reflecting the expected
NewPerCorrect
values. - Vertical Axis
(Residuals): Shows the residuals, which are the differences
between the actual data points and the model’s predictions. -
Scatter Plot Points: Each represents a residual for a
given predicted value. The spread of these points gives us an indication
of how well the model fits the data at different levels of the predicted
values. - Zero Line: The red line represents a residual
of zero, where the model’s prediction would be exactly correct.
Analyzing this plot, the residuals appear to be randomly scattered
around the zero line without showing any apparent patterns, suggesting
that the MLR model’s assumptions are not violated. However, there is
some spread in the residuals, indicating potential outliers or moments
where the model does not capture the data’s variability very well.
Overall, the MLR model seems to provide an even distribution of
residuals across the range of predictions.
Plot 5: In the provided plot, which is a residual
versus fitted plot for the Compound Symmetry (CS) model: -
Horizontal Axis (Fitted Values): The fitted values from
the CS model are shown, representing the model’s predictions of the
NewPerCorrect
values. - Vertical Axis
(Residuals): The residuals displayed indicate the differences
between the actual and predicted values. - Scatter Plot
Points: Each point represents a residual for its corresponding
predicted value. The distribution of points gives us an idea of how well
the model fits the data. - Zero Line: The red line
across zero suggests where the model predictions would be perfect, with
no residual. My observation is that the residuals are relatively evenly
scattered above and below the zero line, with no clear pattern or
structure. This suggests that the CS model does not exhibit systematic
bias in the residuals. However, similar to the AR1 model, there are some
points with relatively large residuals, indicating potential outliers or
instances where the model’s predictions are less accurate.
Plot 6: This plot is a residual versus fitted plot
for the AR1 (Autoregressive of order 1) model: - Horizontal Axis
(Fitted Values): It represents the predicted values from the
AR1 model, expected NewPerCorrect
values. -
Vertical Axis (Residuals): Indicates the residuals,
which are the differences between the actual and predicted
NewPerCorrect
values. - Scatter Plot
Points: Each point shows a residual for a corresponding
prediction. The distribution of these points suggests the variation in
the model’s prediction accuracy. - Zero Line: The red
line at zero marks where the predicted values match the actual values
exactly. From this plot, I see that the residuals for the AR1 model are
randomly dispersed around the zero line, suggesting no clear pattern of
errors. This randomness is a good sign, indicating that the model’s
assumptions are likely being met. However, there are some points with
larger residuals, indicating predictions that deviate more from the
actual values. These could be outliers or instances where the model does
not fit as well. Overall, the AR1 model seems to have a consistent
spread of residuals across its predictions.
Plot 7: The line plot overlays the fitted regression lines from the MLR, CS, and AR1 models on the actual data points across different weeks. Here’s my concise analysis: - The raw data shows variability in the ‘NewPerCorrect’ values across the weeks, which all models aim to capture. - The MLR model’s line suggests a constant trend without accounting for the within-subject correlation. - The CS and AR1 models, indicated by the dashed lines, seem to follow a similar trajectory, implying that both models lead to comparable predictions across the weeks. - All models appear to underfit the data at the higher ‘NewPerCorrect’ values, particularly around weeks 2 to 4 and weeks 10 to 12, where there are noticeable deviations from the fitted lines. From this visualization, I can see that while the MLR, CS, and AR1 models provide an overall approximation of the trend in data, there may still be room for improving the models to capture the variability more accurately, especially for the outliers.
Plot 8: The bar chart compares the AIC values of four different models: AR1, CS, MLR, and UN. The AIC values serve as a metric for model comparison, with lower values indicating a model that balances goodness of fit with complexity. Here’s my brief analysis: - The AR1 model has the highest AIC value, suggesting it may not be the best fit among the options given its complexity and the data at hand. - The CS model shows a significantly lower AIC value, indicating a better fit compared to AR1. - The MLR model, while simpler, has an AIC value that is competitive with the CS model, which suggests it might be a parsimonious choice. - The UN model, which is the most flexible, doesn’t necessarily provide the best fit as indicated by its AIC value, which is close to that of the AR1 model. This visualization helps me conclude that, based on AIC alone, the CS model seems to provide a good balance between fit and complexity for this dataset, while the UN model, despite its flexibility, does not offer a substantial improvement to justify its complexity.
# Load the multcomp library
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
# Fit the model with the best AIC
CSmodel <- gls(NewPerCorrect ~ Treatment * Week,
data = data,
correlation = corCompSymm(form = ~ 1 | Monkey))
# Create a contrast matrix for all pairwise comparisons
contrasts <- glht(CSmodel, linfct = mcp(Treatment = "Tukey"))
## Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
## default contrast might be inappropriate
# Adjust for multiple testing using Bonferroni correction
summary(contrasts, test = adjusted("bonferroni"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: gls(model = NewPerCorrect ~ Treatment * Week, data = data, correlation = corCompSymm(form = ~1 |
## Monkey))
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## Treated - Control == 0 -18.505 5.869 -3.153 0.00161 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- bonferroni method)
Analysis:
The output from the Tukey contrasts with Bonferroni adjustment indicates a significant difference between the treated and control groups, as denoted by the ’**’ sign next to the p-value (p = 0.00161), which is less than the standard alpha level of 0.05. This significant result after adjusting for multiple comparisons strengthens the evidence that the treatment has an effect. Given this result, it appears reasonable to conclude that the researcher’s hypothesis, which presumably was about the efficacy of the treatment, is supported by the data. The negative estimate (-18.505) suggests that the treatment group has a lower mean of ‘NewPerCorrect’ compared to the control group. This statistically significant difference points to the treatment having a distinct impact on the outcome measured weekly.