Background: A researcher is interested in whether an active learning environment is more or less effective in increasing achievement in science than a lecture format. To this end, the researcher assesses science achievement using a fifty-point test. The study involves 30 male and female students in science classes. The researcher randomly assigns the students to one of two groups (AL: active learning; LE: lecture), implements the learning environment, and at the end of the semester administers the science achievement test. In addition to science achievement test scores (achieve), the investigator obtains the students’ IQ scores. For all variables a code of -99 is used to represent missing scores.

  1. Do students in the active learning environment perform differently on the science achievement test than do students in the lecture format, or are any differences due to chance alone? How do you know?

To test whether Teaching Format affects Achievement, a simple linear regression model (model1) was fit with Teaching Format as the only predictor (n = 24 after removing students with missing scores). With a single two-level predictor this is equivalent to a one-way ANOVA, and both give the same p-value of 0.1081. Because this p-value is above our alpha of 0.05, the difference between the mean Achievement of the Active Learning (AL) and Lecture (LE) groups is not statistically significant. A boxplot was also made to show that, although the sample means differ (33.50 for AL versus 28.83 for LE, a difference of 4.67 points), the difference is not large relative to the spread within each group. To summarize, we fail to reject the null hypothesis of no difference between the group means; the observed difference is consistent with chance alone.
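The equivalence between the regression and the ANOVA mentioned above can be checked directly. The following is a minimal sketch on simulated scores: the data frame `sim`, its values, and the seed are made up for illustration and are not the assignment data. It fits the same kind of model as model1 and compares it with a one-way ANOVA and a pooled-variance t-test.

```r
# Simulated data for illustration only; these are NOT the assignment scores.
set.seed(1)
sim <- data.frame(
  teach_form = factor(rep(c("AL", "LE"), each = 12)),
  achieve    = c(rnorm(12, mean = 33, sd = 7), rnorm(12, mean = 29, sd = 7))
)

# Regression with a two-level factor as the only predictor
fit  <- lm(achieve ~ teach_form, data = sim)
p_lm <- summary(fit)$coefficients["teach_formLE", "Pr(>|t|)"]

# One-way ANOVA table for the same model
p_aov <- anova(fit)[["Pr(>F)"]][1]

# Pooled-variance two-sample t-test
p_t <- t.test(achieve ~ teach_form, data = sim, var.equal = TRUE)$p.value

# The slope for LE is the LE mean minus the AL mean
group_means <- tapply(sim$achieve, sim$teach_form, mean)
slope       <- unname(coef(fit)["teach_formLE"])
mean_diff   <- unname(group_means["LE"] - group_means["AL"])

print(c(lm = p_lm, anova = p_aov, t_test = p_t))
```

All three p-values should agree, which is why the choice between "regression" and "ANOVA" does not change the conclusion for a single two-level predictor.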

  2. The researcher presents the results at a conference. One of the audience members asks whether the students’ IQ scores could possibly explain the results. That is, even though the investigator used random assignment to groups, it could still be that IQ accounts for the results. What is the proportion of variability in Achievement scores accounted for by IQ scores?

IQ accounts for approximately 62% of the variability in Achievement scores (Multiple R-squared: 0.6156, Adjusted R-squared: 0.5981).
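In a regression with one predictor, Multiple R-squared is the squared Pearson correlation between the predictor and the outcome, so the reported 0.6156 corresponds to a correlation of about 0.78 between IQ and Achievement. A minimal sketch of that identity follows; the vectors `iq` and `achieve` are simulated for illustration and are not the assignment data.

```r
# Simulated data for illustration only; these are NOT the assignment scores.
set.seed(2)
iq      <- rnorm(24, mean = 112, sd = 12)
achieve <- -20 + 0.46 * iq + rnorm(24, sd = 4.5)

fit <- lm(achieve ~ iq)

# Three ways to get the proportion of variance explained
r2_model <- summary(fit)$r.squared                       # from the model summary
r2_cor   <- cor(achieve, iq)^2                           # squared correlation
r2_ss    <- 1 - sum(resid(fit)^2) /
                sum((achieve - mean(achieve))^2)         # 1 - SS_residual / SS_total

print(c(model = r2_model, cor_squared = r2_cor, sums_of_squares = r2_ss))
```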

  3. To address the audience member’s comment the investigator reruns the analyses. However, this time the model includes both the students’ IQ as well as the learning environment (Teaching Format) to which the person was assigned. Do students in the active learning environment perform differently on the science achievement test than do students in the lecture format, or are any differences due to chance alone? How do you know?

To investigate whether IQ could explain the Achievement results, I first checked how IQ relates to each of the other variables. A regression of IQ on Teaching Format (model2) gave a p-value of 0.7067, indicating no significant relationship between IQ and Teaching Format, as expected under random assignment. A regression of Achievement on IQ (model3) gave a p-value of 5.67e-06, indicating a significant relationship. With this in mind, I performed a semi-partial correlation analysis between Achievement and Teaching Format, with IQ removed from Teaching Format (sr = -0.274, p = 0.206). Taken alone, this p-value suggests a non-significant relationship, much like the zero-order result (p = 0.108). I then looked at the Pr(>|t|) value from the summary output of a model incorporating all three variables (model4). The p-value of 0.035 for Teaching Format indicates a significant difference in Achievement between the Teaching Formats once IQ is held constant, and the ANCOVA contrast gives the same result (LE - AL = -3.81, p = 0.035). The two results differ because they use different error terms: the semi-partial test leaves the Achievement variance explained by IQ in the error term, whereas model4 removes it, which shrinks the standard error of the Teaching Format effect from 2.786 (model1) to 1.690 (model4). Because the question asks for a model that includes both IQ and Teaching Format, model4 is the appropriate test. After adjusting for IQ, students in the active learning environment scored about 3.8 points higher than students in the lecture format, a difference that is unlikely to be due to chance alone (p = 0.035).
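The gap between the semi-partial p-value (0.206) and the regression p-value (0.035) can be traced with a little arithmetic. The squared semi-partial correlation is the R-squared gained by adding Teaching Format to the IQ-only model: 0.6904 - 0.6156 = 0.0748, whose square root is about 0.274, matching the reported estimate. The reported statistic of -1.304 is consistent with dividing by 1 - sr^2, while the regression t of -2.254 divides by 1 - R-squared of the full model. The sketch below reproduces both statistics in base R on simulated data; all variable names and values are made up for illustration and are not the assignment data.

```r
# Simulated data for illustration only; these are NOT the assignment scores.
set.seed(3)
n       <- 24
group   <- rep(c(0, 1), each = n / 2)            # dummy code: 0 = AL, 1 = LE
iq      <- rnorm(n, mean = 112, sd = 12)
achieve <- -17 + 0.45 * iq - 4 * group + rnorm(n, sd = 4)

fit_iq   <- lm(achieve ~ iq)
fit_full <- lm(achieve ~ group + iq)
r2_iq    <- summary(fit_iq)$r.squared
r2_full  <- summary(fit_full)$r.squared

# Semi-partial correlation: achieve with the part of group not explained by iq
group_resid <- resid(lm(group ~ iq))
sr          <- cor(achieve, group_resid)

# Its square equals the R-squared gained by adding group to the IQ-only model
delta_r2 <- r2_full - r2_iq

# Test 1: regression t for group; error term is what the FULL model leaves unexplained
df_resid  <- fit_full$df.residual
t_reg     <- summary(fit_full)$coefficients["group", "t value"]
t_from_sr <- sr * sqrt(df_resid / (1 - r2_full))

# Test 2: treats sr like a simple correlation; error term still contains
# the achieve variance that iq explains, so the statistic is smaller
t_sr_only <- sr * sqrt(df_resid / (1 - sr^2))

print(c(sr = sr, t_regression = t_reg, t_semipartial = t_sr_only))
```

Because 1 - sr^2 is never smaller than 1 - R-squared of the full model, the second statistic can never exceed the first in absolute value, and the gap grows with the strength of the covariate.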

  4. For the model in #3 do we have heteroscedasticity and/or nonlinearity? How do you know?

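Assumptions of homoscedasticity and linearity were met. In the gvlma output for model4, neither the Heteroscedasticity test (p = 0.8752) nor the Link Function test (p = 0.4242), which checks for nonlinearity, was significant at the 0.05 level. A residuals-versus-fitted plot (plot(model4, 1)) was also produced as a visual check.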
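As a cross-check on these two assumptions, similar questions can be asked with base R alone. The sketch below is not the procedure gvlma uses: it substitutes a hand-computed studentized Breusch-Pagan (Koenker) test for heteroscedasticity and a test of an added quadratic IQ term for nonlinearity. The data are simulated and every name here is an assumption made for illustration.

```r
# Simulated data for illustration only; these are NOT the assignment scores.
set.seed(4)
n       <- 24
group   <- rep(c(0, 1), each = n / 2)            # dummy code: 0 = AL, 1 = LE
iq      <- rnorm(n, mean = 112, sd = 12)
achieve <- -17 + 0.45 * iq - 4 * group + rnorm(n, sd = 4)

fit <- lm(achieve ~ group + iq)

# Heteroscedasticity: regress squared residuals on the predictors.
# Under constant variance, n * R-squared is approximately chi-square
# with df equal to the number of predictors (2 here).
aux     <- lm(resid(fit)^2 ~ group + iq)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Nonlinearity: does a quadratic IQ term improve the fit?
fit_quad <- update(fit, . ~ . + I(iq^2))
p_curve  <- anova(fit, fit_quad)[["Pr(>F)"]][2]

print(c(bp_statistic = bp_stat, bp_p_value = bp_p, quadratic_p_value = p_curve))

# Visual check: residuals should scatter evenly around zero with no curve
# plot(fit, which = 1)
```

Large p-values on both checks would point the same way as the gvlma decisions; a small p-value on either would be a reason to look at the residual plot more closely.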

Question 1

#packages used: dplyr, car, stats, ggplot2, ppcor, multcomp, gvlma, knitr, magrittr
#Data was edited in Excel to ensure that Teaching Format was consistently formatted (le -> LE etc.)
library(dplyr) 
library(ggplot2)
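#stringsAsFactors=TRUE keeps teach_form as a factor (no longer the default in R >= 4.0); glht() below requires a factor
data<-read.csv("F:/Grad School/FALL 2019/EDPS 942_Correlation Methods/Assignment 3/CA3.csv",stringsAsFactors=TRUE)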

#removing rows with a missing-value code (-99) in either the IQ or achieve column
data <- data %>% filter(IQ != -99, achieve != -99)
shapiro.test(data$achieve)#p-value = 0.8861
## 
##  Shapiro-Wilk normality test
## 
## data:  data$achieve
## W = 0.97948, p-value = 0.8861
model1<-lm(data=data,achieve~teach_form)
summary(model1)#p-value: 0.1081
## 
## Call:
## lm(formula = achieve ~ teach_form, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5000  -4.7500  -0.1667   4.2500  12.1667 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    33.500      1.970  17.004 3.84e-14 ***
## teach_formLE   -4.667      2.786  -1.675    0.108    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.825 on 22 degrees of freedom
## Multiple R-squared:  0.1131, Adjusted R-squared:  0.07278 
## F-statistic: 2.805 on 1 and 22 DF,  p-value: 0.1081
#boxplots
ggplot()+geom_boxplot(data=data,aes(x=teach_form,y=achieve))+theme_classic()

Question 2

shapiro.test(data$IQ)#p-value = 0.7392
## 
##  Shapiro-Wilk normality test
## 
## data:  data$IQ
## W = 0.97293, p-value = 0.7392
car::qqPlot(data$IQ)#IQ meets the assumption of normality for parametric testing

## [1]  6 17
model3<-lm(data=data,achieve~IQ)
summary(model3)#p-value: 5.67e-06
## 
## Call:
## lm(formula = achieve ~ IQ, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.808 -3.324 -0.131  3.150 10.333 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.42861    8.74144  -2.337   0.0289 *  
## IQ            0.46016    0.07753   5.935 5.67e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.493 on 22 degrees of freedom
## Multiple R-squared:  0.6156, Adjusted R-squared:  0.5981 
## F-statistic: 35.23 on 1 and 22 DF,  p-value: 5.67e-06

Question 3

library(multcomp)
model2<-lm(data=data,IQ~teach_form)
summary(model2)#p-value: 0.7067
## 
## Call:
## lm(formula = IQ ~ teach_form, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.083  -9.104  -0.125   9.333  21.917 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   113.083      3.555  31.809   <2e-16 ***
## teach_formLE   -1.917      5.028  -0.381    0.707    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.32 on 22 degrees of freedom
## Multiple R-squared:  0.006563,   Adjusted R-squared:  -0.03859 
## F-statistic: 0.1453 on 1 and 22 DF,  p-value: 0.7067
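#semi-partial correlation of Achievement and Teaching Format, with IQ removed from Teaching Format only
#teach_form is converted to numeric codes (AL = 1, LE = 2) because spcor.test needs numeric input
ppcor::spcor.test(data$achieve,as.numeric(factor(data$teach_form)),data$IQ)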
##    estimate   p.value statistic  n gp  Method
## 1 -0.273642 0.2064316 -1.303747 24  1 pearson
model4<-lm(data=data,achieve~teach_form+IQ)
summary(model4)
## 
## Call:
## lm(formula = achieve ~ teach_form + IQ, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.970 -2.771 -1.387  2.771  8.244 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.06115    8.16650  -2.089    0.049 *  
## teach_formLE  -3.80970    1.69033  -2.254    0.035 *  
## IQ             0.44711    0.07144   6.258 3.31e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.127 on 21 degrees of freedom
## Multiple R-squared:  0.6904, Adjusted R-squared:  0.661 
## F-statistic: 23.42 on 2 and 21 DF,  p-value: 4.496e-06
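#ANCOVA: group comparison adjusted for IQ (with only two groups this matches the teach_form test in model4)
posth <- glht(model4, linfct=mcp(teach_form="Tukey"))  ##Tukey contrast for Teaching Format, adjusted for IQ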
summary(posth)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = achieve ~ teach_form + IQ, data = data)
## 
## Linear Hypotheses:
##              Estimate Std. Error t value Pr(>|t|)  
## LE - AL == 0    -3.81       1.69  -2.254    0.035 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Question 4

library(gvlma)
gvmodel<-gvlma(model4)
summary(gvmodel)
## 
## Call:
## lm(formula = achieve ~ teach_form + IQ, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.970 -2.771 -1.387  2.771  8.244 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.06115    8.16650  -2.089    0.049 *  
## teach_formLE  -3.80970    1.69033  -2.254    0.035 *  
## IQ             0.44711    0.07144   6.258 3.31e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.127 on 21 degrees of freedom
## Multiple R-squared:  0.6904, Adjusted R-squared:  0.661 
## F-statistic: 23.42 on 2 and 21 DF,  p-value: 4.496e-06
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = model4) 
## 
##                      Value p-value                Decision
## Global Stat        2.44840  0.6539 Assumptions acceptable.
## Skewness           0.90651  0.3410 Assumptions acceptable.
## Kurtosis           0.87866  0.3486 Assumptions acceptable.
## Link Function      0.63856  0.4242 Assumptions acceptable.
## Heteroscedasticity 0.02467  0.8752 Assumptions acceptable.
plot(model4,1)