This report summarizes the findings of a series of regression analyses conducted using the MARSI data. Please note that the total sample size was N = 13. Regression analyses are typically conducted with much larger samples than the one in this study, and there are various recommendations about sample size and the power of a study. The minimum recommended by Green (1991) is ten cases per predictor. We have at least two predictors for the pretest model, i.e., the students' Grade and Gender, not including Attendance and Wordsread. The posttest model has one more predictor, i.e., the pretest scores. Because of the lack of enough data points, we could not conduct a multivariate regression. Thus, this report includes the results of a series of Ordinary Least Squares (OLS) models, adding one predictor at a time.

#Loading Required Packages
library(nlme) # Just in case we want to run a multilevel model
library(ggplot2)
library(cowplot)
library(tidyverse)

rm(list = ls())

1. Exploratory Data Analysis

Before running the regression analyses, the researchers conducted a series of exploratory data analyses. The results of the descriptive analyses are presented below.

### Loading Data
Grd_Long <- read.csv("Grd_Long.csv")
  summary(Grd_Long[,(4:8)])
   Attendance      Wordsread     Score_Types        Pretest_Scores 
 Min.   :14.00   Min.   :16631   Length:78          Min.   :1.000  
 1st Qu.:16.00   1st Qu.:21316   Class :character   1st Qu.:3.000  
 Median :17.00   Median :31066   Mode  :character   Median :4.000  
 Mean   :16.54   Mean   :34374                      Mean   :3.795  
 3rd Qu.:18.00   3rd Qu.:38867                      3rd Qu.:4.750  
 Max.   :18.00   Max.   :71353                      Max.   :9.000  
 Posttest_Scores
 Min.   :1.000  
 1st Qu.:3.000  
 Median :4.000  
 Mean   :3.782  
 3rd Qu.:4.000  
 Max.   :7.000  
Grd_Long2 <- read.csv("Grd_Long2.csv")
Grd_Long2$Std_ID <- as_factor(Grd_Long2$Std_ID)
Grd_Long2$Gender <- as_factor(Grd_Long2$Gender)
Grd_Long2$Grade <- as_factor(Grd_Long2$Grade)
Grd_Long2$Test_Type <- factor(Grd_Long2$Test_Type, levels=c("Pretest", "Posttest"))
#Grd_Long2$Test_Type <- as_factor(Grd_Long2$Test_Type)

  #summary(Grd_Long2)

Grd_Long3 <- read.csv("Grd_Long3.csv")

The table above shows the descriptive analysis of the variables included in the model. Some variables, such as students' grades and gender, are excluded from the summary above. Among the participants, the maximum attendance was 18 of 20 days and the minimum was 14. The average attendance was 16.54 days, and the median was 17. The maximum and minimum words read were 71,353 and 16,631, respectively, while the mean was 34,374. A new variable named Score_Types was created to record whether a score was a pre- or posttest on Vocabulary, Sentence Comprehension, Passage Comprehension, Comprehension Composite, Total, or Listening Comprehension. The variable named 'Pretest_Scores' recorded the pretest scores, while the 'Posttest_Scores' column recorded the posttest scores for the aforementioned test types. The minimum pretest score was 1, the average was 3.795, and the maximum was 9. The average posttest score was slightly lower (mean = 3.782), while the minimum and maximum were 1 and 7, respectively.
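As a cross-check (not part of the original output), the categorical variables excluded from the summary above can be tabulated directly; the sketch below assumes the Gender and Grade columns used in the plots that follow.

# Sketch: cross-tabulating the students' Gender and Grade (assumed column names)
table(Grd_Long$Gender, Grd_Long$Grade)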

1.a. Pretest and Posttests Based on Test Types

First of all, we wanted to see the relationships between the pretest and posttest scores based on the test types, e.g., Vocabulary, Sentence Comprehension, Passage Comprehension, Comprehension Composite, Total, and Listening Comprehension. The scatterplot below, with fitted regression lines, shows these relationships.

# Checking pretest-posttest relationships among all students
ggplot(Grd_Long, aes(x=Pretest_Scores, y=Posttest_Scores, col=Score_Types))+geom_point()+ geom_smooth(method="lm", se=FALSE)+
  xlab("Students' GRADE Stanine Pretest Scores")+
  ylab("Students' GRADE Stanine Posttest Scores")

# Checking pretest-posttest relationships among all students separately
ggplot(Grd_Long, aes(x=Pretest_Scores, y=Posttest_Scores))+geom_smooth(method="lm")+ geom_jitter()+ facet_wrap(~Std_ID)

# Renaming the columns and creating a different object
Grd_Long2_M <- Grd_Long2 %>%
  rename(voc_stan_Scr = Vocabulary_Stanine_Score,
          sen_comp_scr = Sentence_Comprehension_Score,
          pas_comp_scr = Passage_Comprehension_Score,
          comp_com_scr = Comprehension_Composite_Score,
          list_comp_scr = Listening_Comprehension_Score
         )
# Calculating relationship between outcome variables
library(lessR)
Plot(c(voc_stan_Scr,sen_comp_scr, pas_comp_scr,comp_com_scr,list_comp_scr, Total_Score),fit="lm",fit_se=0, data=Grd_Long2_M)

The plot above shows fairly linear relationships between the pretest and posttest scores on all measures, i.e., Comprehension Composite, Listening Comprehension, Passage Comprehension, Sentence Comprehension, Total Test, and Vocabulary Stanine Scores. The steepest slopes are seen in Passage Comprehension, Comprehension Composite, Sentence Comprehension, and Total Test Scores. The shallowest slope appears to be between the pretest and posttest scores of Listening Comprehension.

1.b. Pretest and Posttests Based on Gender

ggplot(Grd_Long, aes(x=Pretest_Scores, y=Posttest_Scores, col=Gender))+geom_point()+geom_smooth(method="lm", se=FALSE)+
  xlab("Students' GRADE Stanine Pretest Scores")+
  ylab("Students' GRADE Stanine Posttest Scores")

For both male and female students, pretest scores appear to have a positive linear relationship with posttest GRADE Stanine Scores. A larger gap between male and female students was observed on the pretest scores, while that gap narrowed dramatically on the posttest. On average, female students had higher pretest scores and somewhat comparable posttest scores.

1.c. Pretest and Posttests Based on Grade

ggplot(Grd_Long, aes(x=Pretest_Scores, y=Posttest_Scores, col=Grade))+geom_point()+geom_smooth(method="lm", se=FALSE)+
  xlab("Students' GRADE Stanine Pretest Scores")+
  ylab("Students' GRADE Stanine Posttest Scores")

The relationship between students' pretest and posttest scores seems to be affected by their grade level. The plot above shows mixed results. Fourth-grade students had higher pretest scores, but their fitted line slopes downward (higher pretest scores were associated with lower posttest scores), while students in both the third and fifth grades show positive linear relationships. Fifth graders had both higher pretest and higher posttest scores compared to third graders.

1.d.i. Vocabulary Stanine Scores

For a general understanding of the data, we requested some comparative plots of pretest and posttest scores based on a third variable, i.e., the gender and grade level of the students. We requested box plots for the Vocabulary Stanine Scores and density plots for the Sentence Comprehension, Passage Comprehension, Comprehension Composite, Total, and Listening Comprehension Scores.

voc1 <- ggplot(Grd_Long2, aes(x=Test_Type, y=Vocabulary_Stanine_Score, col=Gender))+ geom_boxplot(outlier.size=1)+
  geom_point(aes(fill=Gender), shape=19, alpha=0.3, position=position_jitterdodge(jitter.width = 0.3))+
  xlab("Voc_Stanine Pre-Post Scores by Gender")

voc2 <- ggplot(Grd_Long2, aes(x=Test_Type, y=Vocabulary_Stanine_Score, col=Grade))+ geom_boxplot(outlier.size=2)+
  geom_point(aes(fill=Grade), shape=19, alpha=0.6, position=position_jitterdodge(jitter.width = 0.4))+
  xlab("Voc_Stanine Pre- & Post Scores by Grade")

plot_grid(voc1,voc2)

library(geomtextpath)
ggplot(Grd_Long, 
       aes(x=Pretest_Scores, y=Posttest_Scores,color=Grade, label=Grade))+
  geom_point(alpha=0.2, size=4)+
  geom_labelsmooth(method="lm", boxlinewidth=0)+
  scale_color_manual(values=c("darkorange", "purple","cyan4"))+
  theme(legend.position="none")

1.d.ii. Comparative Pre- & Posttest Scores: Red-Posttests & Blue-Pretests

library(tidyverse)
library(gridExtra)
Grd_Long2_pretest <- filter(Grd_Long2, Test_Type == "Pretest")
Grd_Long2_posttest <- filter(Grd_Long2, Test_Type == "Posttest")

scc <- ggplot()+
  geom_density(data=Grd_Long2_pretest, aes(Sentence_Comprehension_Score), color="darkblue" )+
  geom_density(data=Grd_Long2_posttest, aes(Sentence_Comprehension_Score),  color="darkred" )+
  xlab("Sentence Comprehension Scores")

# Passage Comprehension Scores
pcs <- ggplot()+
  geom_density(data=Grd_Long2_pretest, aes(Passage_Comprehension_Score), color="darkblue" )+
  geom_density(data=Grd_Long2_posttest, aes(Passage_Comprehension_Score),  color="darkred" )+
  xlab("Passage Comprehension Scores")

# Comprehension Composite Scores
ccs <- ggplot()+
  geom_density(data=Grd_Long2_pretest, aes(Comprehension_Composite_Score), color="darkblue" )+
  geom_density(data=Grd_Long2_posttest, aes(Comprehension_Composite_Score),  color="darkred" )+
  xlab("Comprehension Composite Scores")

# Listening Comprehension Scores
lcs <- ggplot()+
  geom_density(data=Grd_Long2_pretest, aes(Listening_Comprehension_Score), color="darkblue" )+
  geom_density(data=Grd_Long2_posttest, aes(Listening_Comprehension_Score), color="darkred" )+
  xlab("Listening Comprehension Scores")

# Putting them Together
plot_grid(scc, pcs, ccs, lcs,
         ncol = 2, nrow = 2)

1.d.iii. Total Scores

ggplot()+
  geom_density(data=Grd_Long2_pretest, aes(Total_Score), color="darkblue" )+
  geom_density(data=Grd_Long2_posttest, aes(Total_Score), color="darkred" )+
  xlab("Students' Total Scores Pretest(darkblue) and Posttests(darkred)")

We see varying degrees of relationship between the pretest and posttest scores depending on the test type. In most cases, the average posttest scores seem to be higher than the average pretest scores. Likewise, the pretest scores are more spread out (showing higher variability) than the posttest scores.
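To put rough numbers on these impressions, the means and standard deviations of the pretest and posttest scores can be computed by test type; a minimal sketch using dplyr, assuming the Grd_Long columns shown in the summary above.

# Sketch: means and spread of pretest vs. posttest scores by test type
Grd_Long %>%
  group_by(Score_Types) %>%
  summarise(pre_mean  = mean(Pretest_Scores),
            post_mean = mean(Posttest_Scores),
            pre_sd    = sd(Pretest_Scores),
            post_sd   = sd(Posttest_Scores))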

Are these findings any good? Let’s check.

Poor Model Fitting and Testing an Alternate Model

Looking at the plots above and the regression models conducted for the MARSI analyses, we came to realize that most of our models fit poorly, as indicated by their R-squared values. As noted throughout, one reason was the small sample size; another could be bias. Linear regression is considered a high-bias method because it simplifies the target function in order to boost the generalizability of the results and to decrease the dependency of the model on any single data point. High bias often leads to underfitting. The low R-squared values and the gap between the R-squared and Adjusted R-squared values hinted at such underfitting. This impression is strengthened by a close look at the Fitted Values and Pearson Residual graphs toward the end of this report: the data points are scattered widely and the fitted line is curved, hinting at low variance and high bias.

In this context, we wanted to see how well our analyses performed. One way to do so is to run a Least Absolute Shrinkage and Selection Operator (LASSO) regression and compare the findings. LASSO regression is often used in machine learning and predictive analyses. It helps us fit a regression model when we have a small data set, numerous predictors, or multicollinearity in the data by adding a penalty parameter (lambda) that shrinks coefficients toward zero while minimizing the penalized residual sum of squares (RSS). We use the glmnet package in R.
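To make the penalty concrete, the quantity LASSO minimizes can be written out by hand. The sketch below is illustrative only; glmnet additionally divides the RSS term by 2n and standardizes the predictors by default.

# Illustrative sketch of the LASSO objective for a given lambda:
# residual sum of squares plus lambda times the sum of absolute slope coefficients.
lasso_objective <- function(beta0, beta, X, y, lambda) {
  rss <- sum((y - (beta0 + X %*% beta))^2)
  rss + lambda * sum(abs(beta))
}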

In this analysis, we predict posttest scores based on pretest scores, students’ gender, grade, attendance, and even the words read. Thus, our outcome variable (y) was “posttest scores”, while “grade”, “gender”, “attendance”, “wordsread”, and “pretest scores” were the predictor variables (x).

library(glmnet)
# Defining the Dependent Variable
y <- Grd_Long$Posttest_Scores
## Defining the Predictor Variables
x <- data.matrix(Grd_Long[,c("Pretest_Scores","Gender","Grade","Attendance","Wordsread")])

The lasso regression can be conducted using the glmnet() function, where we specify an alpha of 1. We can identify the value of lambda by performing k-fold cross-validation, which selects the lambda that yields the lowest mean squared error (MSE).

# Perform k-fold cross validation 
cross_valid_model <- cv.glmnet(x,y, alpha = 1)
# Identify the optimal lambda value that minimizes MSE
optimal_lambda <- cross_valid_model$lambda.min
# Printing the Optimal Lambda
optimal_lambda
[1] 0.001649232
# Final Model
final_model <- glmnet(x,y, alpha=1, lambda=optimal_lambda)
coef(final_model)
6 x 1 sparse Matrix of class "dgCMatrix"
                           s0
(Intercept)     7.80358948635
Pretest_Scores  0.15857797960
Gender         -1.01808700217
Grade          -0.52019489121
Attendance     -0.10783950352
Wordsread      -0.00001478169

The beta values of the intercept, pretest scores, gender, grade, attendance, and words read are 7.80, 0.158, -1.02, -0.52, -0.108, and -0.0000148, respectively. The more parsimonious model (selected at lambda.1se) recommended that we drop the variables Attendance and Wordsread from our model, as seen in the column 's2' below.

coef(cross_valid_model, c(cross_valid_model$lambda.min,
                          cross_valid_model$lambda.1se))
6 x 2 sparse Matrix of class "dgCMatrix"
                           s1          s2
(Intercept)     7.80321440317  4.46450779
Pretest_Scores  0.15859581476  0.05901489
Gender         -1.01802465990 -0.35391149
Grade          -0.52017720433 -0.23067361
Attendance     -0.10783019902  .         
Wordsread      -0.00001478059  .         
print(paste(cross_valid_model$lambda.min,
            log(cross_valid_model$lambda.min)))
[1] "0.00164923216827857 -6.4074454519382"

The output above shows the value of lambda (and its natural log) at which the minimal Mean Squared Error (MSE) is achieved.

print(paste(cross_valid_model$lambda.1se,
            log(cross_valid_model$lambda.1se)))
[1] "0.228399915311335 -1.47665717199135"

The output above shows the largest value of lambda (the penalty), and its natural log, at which the MSE is still within one standard error of the minimal MSE.

The range between the highest and lowest penalty parameter (lambda) is much wider if we want to include all the variables in the model. There is a good chance that we get better results if we omit Attendance and Wordsread, and even the Pretest_Scores variable, from the model. The model may well improve if we predict posttest scores based only on students' gender and grade. This finding can be traced easily in the plot below.
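One way to check this claim is to compare the cross-validated MSE at the two candidate values of lambda; a small sketch using components of the cv.glmnet object (the exact numbers vary slightly across cross-validation runs).

# Sketch: cross-validated MSE at lambda.min versus lambda.1se
cross_valid_model$cvm[cross_valid_model$lambda == cross_valid_model$lambda.min]
cross_valid_model$cvm[cross_valid_model$lambda == cross_valid_model$lambda.1se]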

# Requesting the MSE Plot 
plot(cross_valid_model)

As depicted by the dotted lines, we can have any number of predictors between 3 and 5; however, having fewer than 3 is not recommended. The statistics above suggest that having fewer predictors is better than having all of them. Thus, we are not going to include students' attendance and total words read in our model.

Regression Analyses

A. Pretest Models

pretest_mod <- lm(Pretest_Scores ~ Gender + Grade, data = Grd_Long)
summary(pretest_mod)

Call:
lm(formula = Pretest_Scores ~ Gender + Grade, data = Grd_Long)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8039 -0.8039 -0.2843  0.6814  4.8431 

Coefficients:
                  Estimate Std. Error t value            Pr(>|t|)    
(Intercept)         3.8039     0.2541  14.968 <0.0000000000000002 ***
GenderMale         -0.5196     0.3427  -1.516              0.1337    
GradeFourth Grade   0.8725     0.3812   2.289              0.0249 *  
GradeThird Grade   -0.3824     0.3675  -1.040              0.3015    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.34 on 74 degrees of freedom
Multiple R-squared:  0.1294,    Adjusted R-squared:  0.09407 
F-statistic: 3.665 on 3 and 74 DF,  p-value: 0.01603

Findings

  • The model was statistically significant at the 5% level.
  • There was no difference in average pretest scores between male and female students.
  • Compared to the fifth grade students, the fourth graders had statistically significantly higher stanine pretest scores after controlling for their gender.
  • Unlike the fourth graders, there was no difference in the stanine pretest scores between the third and fifth grade students.
  • The model accounted for roughly 13% of the variability in the Stanine Pretest Scores. The roughly 27% relative gap between the Multiple R-squared and the Adjusted R-squared suggests that the explained variability would be about that much smaller if the model were applied to the population (see the sketch below).
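The relationship between the two R-squared values can be verified by hand; a small sketch using the figures reported in the model output above (n = 78 observations, p = 3 dummy-coded predictors).

# Sketch: Adjusted R-squared computed from Multiple R-squared, n, and p
r2 <- 0.1294; n <- 78; p <- 3
1 - (1 - r2) * (n - 1) / (n - p - 1)   # ~0.094, matching the Adjusted R-squared above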

B. Posttest Models

A series of posttest models (linear least squares) were fitted to check the relationships between the dependent and independent variables.

B.1. Pretest as a Predictor of the Posttest Scores

Before running a model with additional predictors, we wanted to check the relationship between the pretest and posttest stanine scores. The findings are reported below:

mod1 <- lm(Posttest_Scores ~ Pretest_Scores, Grd_Long)
summary(mod1)

Call:
lm(formula = Posttest_Scores ~ Pretest_Scores, data = Grd_Long)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0781 -0.5868  0.1676  0.6588  3.1676 

Coefficients:
               Estimate Std. Error t value        Pr(>|t|)    
(Intercept)     2.84990    0.37558   7.588 0.0000000000674 ***
Pretest_Scores  0.24563    0.09286   2.645         0.00991 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.148 on 76 degrees of freedom
Multiple R-squared:  0.08431,   Adjusted R-squared:  0.07226 
F-statistic: 6.997 on 1 and 76 DF,  p-value: 0.009915
  • The above output shows that the model was statistically significant. The Stanine pretest score was a statistically significant predictor of the Stanine posttest scores. In other words, the Stanine pretest scores had a positive linear relationship with the Stanine posttest scores. A one-unit increase in pretest scores would result in roughly 0.25 units higher posttest scores (illustrated in the sketch below).
  • The R-squared value of 0.084 suggests that the Stanine pretest score accounted for approximately 8% of the variability in students' posttest scores.
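To illustrate the size of this effect, predicted posttest scores can be generated at two pretest values one unit apart; a minimal sketch (the chosen values 3 and 4 are arbitrary).

# Sketch: predicted posttest scores at pretest = 3 and pretest = 4 (difference ~0.25)
predict(mod1, newdata = data.frame(Pretest_Scores = c(3, 4)))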

Let’s add other predictors in the model.

B.2. Students’ Gender, Grade, and Pretest Scores as a Predictor of the Posttest Scores

mod2 <- lm(Posttest_Scores ~ Pretest_Scores + Gender + Grade, Grd_Long)
summary(mod2)

Call:
lm(formula = Posttest_Scores ~ Pretest_Scores + Gender + Grade, 
    data = Grd_Long)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7720 -0.4264 -0.1406  0.5967  2.5736 

Coefficients:
                  Estimate Std. Error t value            Pr(>|t|)    
(Intercept)         3.7949     0.3865   9.818 0.00000000000000555 ***
Pretest_Scores      0.1579     0.0881   1.792            0.077294 .  
GenderMale         -0.8951     0.2637  -3.394            0.001116 ** 
GradeFourth Grade  -0.1279     0.2990  -0.428            0.669946    
GradeThird Grade   -0.9658     0.2805  -3.443            0.000957 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.016 on 73 degrees of freedom
Multiple R-squared:  0.3107,    Adjusted R-squared:  0.2729 
F-statistic: 8.226 on 4 and 73 DF,  p-value: 0.0000156

Findings

  • The model was statistically significant at 1% level.
  • The value of R-squared rose to 0.3107, showing that this model accounts for roughly 31% of the variability in students' posttest scores. A gap between the Multiple R-squared and Adjusted R-squared values still exists; however, compared to the pretest model, the gap is much smaller.
  • The Stanine Pretest Score is no longer a statistically significant predictor of the Stanine Posttest Scores after controlling for students' Gender and Grade.
  • Students' Gender is a statistically significant predictor of their Stanine posttest scores after controlling for their pretest scores and grade. Compared to females, male students had approximately 0.89 units lower posttest scores, and this difference was statistically significantly different from zero.
  • After controlling for students' gender and pretest scores, third graders had statistically significantly lower Stanine posttest scores than fifth graders, by approximately 1 unit. However, there was no statistically significant difference in posttest scores between the fourth and fifth graders after controlling for pretest scores and gender.

Checking the ANOVA Table of the Above Model.

anova(mod1, mod2)
Analysis of Variance Table

Model 1: Posttest_Scores ~ Pretest_Scores
Model 2: Posttest_Scores ~ Pretest_Scores + Gender + Grade
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)    
1     76 100.080                                 
2     73  75.337  3    24.744 7.9921 0.000112 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The above output shows the results of the model comparison between mod1 and mod2. The results showed that mod2 significantly improved the fit of the model to the data compared to mod1, F(3, 73) = 7.99, p < .001.
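The F statistic in the table can be reproduced by hand from the two residual sums of squares; a quick sketch using the values printed above.

# Sketch: recomputing the model-comparison F statistic from the ANOVA output
rss1 <- 100.080; rss2 <- 75.337
f_stat <- ((rss1 - rss2) / 3) / (rss2 / 73)
f_stat                                              # ~7.99
pf(f_stat, df1 = 3, df2 = 73, lower.tail = FALSE)   # ~0.0001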

Checking Model Assumptions

The data were tested for the assumptions once the final model was identified. The assumptions were tested for the posttest model, not the pretest models.

a. Outliers and Influential Cases

We tested our data for outliers once we identified the best-fitting model. We added the fitted values, residuals, and standardized residuals to the original data set and then skimmed the values to look for any outliers present in the dataset. In addition, the data were tested for the presence of influential cases, which could potentially skew the results. Cook's Distance measures for the final posttest model were calculated and combined with the original dataset.

# Adding the Fitted Values Column
Grd_Long$Fitted_Values <- fitted.values(mod2)
# Adding Residuals Column
Grd_Long$Residuals <- resid(mod2)
# Adding Standardized Residual Column
Grd_Long$Standardized_Residuals <- rstandard(mod2)
# Adding Cook's Distance Column
Grd_Long$Cooks_Distance <- cooks.distance(mod2)
# Checking the New Data File
  #head(Grd_Long)
# Checking if the standardized residuals have the values beyond +- 2.
Grd_Long$Standardized_Residuals > 2 | Grd_Long$Standardized_Residuals < -2
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[73] FALSE FALSE FALSE FALSE FALSE FALSE

As noted in Field, Miles, & Field (2012), an ordinary dataset is expected to have about 95% of standardized residuals within +2 and -2. Some of the residuals have values beyond this expectation. We had a total of 78 data points because the data were stacked by test type, so each student contributed six score types. Thus, our dataset can tolerate roughly 4 data points (about 5% of 78) having standardized residuals beyond the +2/-2 limits.
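That expected count can be computed directly; a one-line sketch.

# Sketch: expected number of cases outside +/-2 standardized residuals (about 5% of n)
ceiling(0.05 * nrow(Grd_Long))   # ~4 with 78 rows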

# Adding a New Column (Large Residual) in the Original Dataset
Grd_Long$Large_Residual <- Grd_Long$Standardized_Residuals > 2 | Grd_Long$Standardized_Residuals < -2
# Writing the New Data in the Local Disc
write.csv(Grd_Long, file = "Grd_Long_Residuals.csv", row.names = FALSE)
  #head(Grd_Long)
# Checking Total Number of Residuals Having Large Residuals
sum(Grd_Long$Large_Residual)
[1] 5

The above output shows that only 5 cases had large residuals, defined as standardized residuals greater than 2 or less than -2, which makes up roughly 6% of the available data points. This can be examined further by checking these cases' Cook's Distances.

Grd_Long[Grd_Long$Large_Residual, c("Cooks_Distance")]
[1] 0.04423522 0.05007785 0.08289269 0.09180204 0.18073633

Although this proportion slightly exceeds the expected 5% of cases, none of them has a Cook's Distance greater than 1. Thus, we concluded that none of these cases exerts undue influence on the final posttest model.
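This conclusion can also be confirmed programmatically; a brief sketch using the Cooks_Distance column added earlier.

# Sketch: counting cases with Cook's Distance above the conventional cutoff of 1
sum(Grd_Long$Cooks_Distance > 1)   # expected to be 0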

b. Checking the Assumption of Independence

The assumption of independent errors can be tested using the Durbin-Watson test. As a rule of thumb, the D-W statistic should lie between 1 and 3 to meet the assumption of independent errors; the closer the D-W value is to 2, the better. Let's check:

library(car)
library(QuantPsyc)
  #durbinWatsonTest(mod2)
# Or
dwt(mod2)
 lag Autocorrelation D-W Statistic p-value
   1       0.1129869      1.747772   0.132
 Alternative hypothesis: rho != 0

Not too close to 2, but the D-W statistic for our model is close enough. The p-value of 0.132 supports the assumption of independent errors.

c. Checking the Assumption of no Multicollinearity

In general, the Variance Inflation Factor (VIF) measures the amount of multicollinearity among the variables used in a regression model. We can calculate the tolerance statistic by dividing 1 by the VIF value. VIF values should lie between 1 and 10; if all the VIF values in a model are much higher than 1, it is cause for concern. For tolerance, a value below 0.1 suggests a serious problem, while a value below 0.2 suggests a potential problem (Field, Miles, & Field, 2012). Let's check:

# Checking the values of VIF for the variables included in the model
vif(mod2)
                   GVIF Df GVIF^(1/(2*Df))
Pretest_Scores 1.148594  1        1.071725
Gender         1.119712  1        1.058165
Grade          1.239171  2        1.055074
# Mean VIF
mean(vif(mod2))
[1] 1.188049
# Calculating Tolerance
1/vif(mod2)
                    GVIF  Df GVIF^(1/(2*Df))
Pretest_Scores 0.8706297 1.0       0.9330754
Gender         0.8930865 1.0       0.9450326
Grade          0.8069909 0.5       0.9478010
# Mean Tolerance
mean(1/vif(mod2))
[1] 0.8774018

For the final posttest model, the VIF values are well below 10 and the tolerance statistics are well above 0.2. All of the VIF values are very close to 1, thus, we can conclude that the assumption of no multicollinearity is met.

d. Plotting the Residuals

residualPlots(mod2)

               Test stat Pr(>|Test stat|)   
Pretest_Scores   -2.9743         0.003994 **
Tukey test       -1.6530         0.098340 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In both plots, we can see some patterning, although the data points are scattered widely. This may be because both the pretest and posttest variables are essentially ordinal (stanine) scores. These outcomes hint that the relationship between pretest and posttest scores is not always linear, including in the final posttest model. Furthermore, the test statistic for Pretest_Scores is statistically significant (p < .005), which strengthens the case for a quadratic relationship.
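One way to probe this further would be to allow a quadratic term for the pretest scores and compare the fit against mod2; the sketch below is exploratory and not a model reported in this analysis.

# Exploratory sketch: allowing a quadratic effect of the pretest scores
mod2_quad <- lm(Posttest_Scores ~ poly(Pretest_Scores, 2) + Gender + Grade, data = Grd_Long)
# anova(mod2, mod2_quad) would test whether the quadratic term improves the fit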

e. Test of Normality of Residuals

It is important that the regression residuals follow a normal distribution, which can easily be checked using a quantile-quantile (Q-Q) plot. The solid blue line shows the ideal normal distribution, the dots represent the data points, and the shaded region shows the 95% confidence interval.

qqPlot(mod2)

[1] 64 67

Most of the data points fall within the 95% confidence interval; thus, we can conclude that the residuals of the posttest model are approximately normally distributed.

e.i. Normality of Residuals: Histogram of Studentized Residuals

The normality of the residuals can also be tested by plotting the studentized residuals of the regression model. A fair bell-shaped histogram suggests a normality of the residuals.

hist(rstudent(mod2))

The histogram looks fairly bell-shaped, suggesting that the residuals are approximately normally distributed. Together with the Q-Q plot, in which the data points fall fairly closely within the 95% confidence interval, this supports the normality assumption. Finally, the 95% confidence intervals for the coefficients of the final posttest model are reported below.

confint(mod2)
                        2.5 %     97.5 %
(Intercept)        3.02454521  4.5652727
Pretest_Scores    -0.01771711  0.3334523
GenderMale        -1.42068416 -0.3695057
GradeFourth Grade -0.72378163  0.4678951
GradeThird Grade  -1.52490280 -0.4066625
  #confint(mod1, method=c("boot"), boot.type=c("perc"))
  #confint(mod1, method=c("boot"), boot.type=c("basic"))
  #confint(mod1, method=c("boot"), boot.type=c("norm"))
  #confint(mod1, method=c("Wald"))
  #confint(mod1, method=c("profile"))
#### Hand Calculating 95% Confidence Interval for all Posttest Scores 
# A. Calculating Mean of Posttest Scores
  #post_mean <- mean(Grd_Long$Posttest_Scores)
  #post_mean
# B. Calculating Length
  #post_length <- length(Grd_Long$Posttest_Scores)
# C. Calculating Standard Deviation
  #post_sd <- sd(Grd_Long$Posttest_Scores)
# D. Calculating Standard Error
  #post_se <- post_sd/sqrt(post_length)
# E. Denoting the Significance Level
  #alpha = 0.05
# F. Calculating Degrees of Freedom
  #degrees_of_freedom <- post_length - 1
  #degrees_of_freedom
# G. Calculating t-scores
  #t_score = qt(p=alpha/2, df=degrees_of_freedom, lower.tail = F)
  #t_score
# H. Calculating Margin of Error
  #margin_error <- t_score * post_se
  #margin_error
# I. Calculating Lower Bound Value of the 95% Confidence Interval
  #lower_bound <- post_mean - margin_error
# J. Calculating the Upper Values of the 95% CI
  #upper_bound <- post_mean + margin_error
# K. Combining the Lower and Upper Bound Values Putting them in a Dataframe
  #post_CI <- print(c(lower_bound, upper_bound))

#### Using Grd_Long2 Dataset
#Based on the above results, we came to realize that there is not a statistically significant linear relationship between students' pretest and posttest scores calculated using the GRADE measures. However, we would like to make sure that is the case. For this, we are taking a slightly different approach to see what the results show. 

#Like in the previous analysis, pretest and posttest scores tell the story of two different time intervals. One before the reading intervention and one after it. We are going to check the shift on the scores based on students' grade. Thus, our variables of interest are the Test Types and Grades.

#library(dplyr)
#Grd_Long2 %>%
  #arrange(Test_Type,  Std_ID)
  #head(Grd_Long2)

#### Fixed Part 
#The fixed part for this analysis would be Test_Type and Test_Type*Grade. One rule of thumb in regression modeling is to include the baseline effects when modeling an interaction. 

# Creating a Matrix for Vocabulary Stanine Scores using the Above Measures
  #voc_matrix <- model.matrix(~ Test_Type*Grade, data = Grd_Long2)
# Checking the names of the voc_matrix
  #colnames(voc_matrix)