EPDS 942 Assignment 3

Background: A researcher is interested in whether an active learning environment is more or less effective than a lecture format in increasing science achievement. To this end, the researcher assesses science achievement using a fifty-point test. The study involves 30 male and female students in science classes. The researcher randomly assigns the students to one of two groups (AL: active learning; LE: lecture), implements the learning environment, and at the end of the semester administers the science achievement test. In addition to science achievement test scores (achieve), the investigator obtains the students' IQ scores. For all variables, a code of -99 is used to represent missing scores.

Do students in the active learning environment perform differently on the science achievement test than do students in the lecture format, or are any differences due to chance alone? How do you know?

To test whether Teaching Format affects Achievement, a simple linear regression model with Teaching Format as the only predictor (model1) was used. Because the only predictor is categorical, this model is equivalent to a one-way ANOVA (and, with two groups, to a pooled-variance independent-samples t-test); the regression and the ANOVA gave the same p-value (p = 0.1081). Because this p-value is above our alpha of 0.07, there is no significant difference between the mean Achievement scores of the Active Learning (AL) and Lecture (LE) Teaching Formats. A boxplot of Achievement by Teaching Format shows that the sample means differ somewhat, but the regression test indicates that this difference is not statistically significant. To summarize, we fail to reject the null hypothesis that the population means are equal and conclude that the observed difference in sample means is consistent with chance alone.
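The equivalence between a regression with a single categorical predictor, a one-way ANOVA, and (for two groups) a pooled-variance t-test can be checked directly. The sketch below uses simulated data, not the study data; the group sizes, means, spread, and seed are arbitrary assumptions chosen only for illustration. It compares the three p-values.

```r
# Hypothetical simulated data: 24 students split evenly into two teaching formats.
# These values are made up and are NOT the study data.
set.seed(942)
sim <- data.frame(
  teach.form = factor(rep(c("AL", "LE"), each = 12)),
  achieve    = c(rnorm(12, mean = 34, sd = 7), rnorm(12, mean = 30, sd = 7))
)

# Regression with a single categorical (dummy-coded) predictor: overall F-test p-value
fit_lm <- lm(achieve ~ teach.form, data = sim)
f <- summary(fit_lm)$fstatistic
p_lm <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# One-way ANOVA on the same data (first row of the table is teach.form)
p_aov <- summary(aov(achieve ~ teach.form, data = sim))[[1]][["Pr(>F)"]][1]

# Pooled-variance two-sample t-test, the two-group special case (F = t^2)
p_t <- t.test(achieve ~ teach.form, data = sim, var.equal = TRUE)$p.value

print(c(lm = p_lm, aov = p_aov, t_test = p_t))  # the three p-values coincide
```

With the real data, the same comparison could be run on `data` in place of the hypothetical `sim` data frame, provided `teach.form` has exactly the two levels AL and LE.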

The researcher presents the results at a conference. One of the audience members asks whether the students' IQ scores could explain the results. That is, even though the investigator used random assignment to groups, IQ could still account for the results. What proportion of the variability in Achievement scores is accounted for by IQ scores?

IQ accounts for approximately 62% of the variability in Achievement scores, or about 60% after adjusting for the number of predictors (Multiple R-squared: 0.6156; Adjusted R-squared: 0.5981).
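R-squared is the share of the total variability in Achievement (the total sum of squares) that the regression explains: R² = 1 - SS_residual / SS_total, and the adjusted version penalizes for the number of predictors. A minimal sketch, using simulated IQ and achievement values (the intercept, slope, and noise level are loosely modeled on model3 but are otherwise made up), computes both by hand and checks them against `summary()`.

```r
# Hypothetical simulated data (NOT the study data)
set.seed(3)
n  <- 24
iq <- rnorm(n, mean = 110, sd = 12)
achieve <- -20 + 0.46 * iq + rnorm(n, sd = 4.5)
fit <- lm(achieve ~ iq)

ss_res <- sum(residuals(fit)^2)               # unexplained variability
ss_tot <- sum((achieve - mean(achieve))^2)    # total variability
r2     <- 1 - ss_res / ss_tot                 # proportion explained
p      <- 1                                   # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# For a single predictor, R-squared also equals the squared Pearson correlation
print(c(r2 = r2, adj_r2 = adj_r2, cor_sq = cor(iq, achieve)^2))
```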

To address the audience member's comment, the investigator reruns the analyses. This time, however, the model includes both the students' IQ and the learning environment (Teaching Format) to which each student was assigned. Do students in the active learning environment perform differently on the science achievement test than do students in the lecture format, or are any differences due to chance alone? How do you know?

To investigate whether students' IQ could explain their Achievement scores, a semi-partial correlation was used. By controlling for the effects of IQ on Achievement, I can address the researcher's new question. First, I examined the relationship between IQ and Teaching Format (model2). A p-value of 0.7067 indicated that there was no significant relationship between IQ and Teaching Format. Next, I examined the relationship between Achievement and IQ, which had a p-value of 5.67e-06, indicating a significant relationship. With this in mind, I performed a semi-partial correlation analysis between Achievement and Teaching Format, controlling for the effects of IQ on Achievement. The semi-partial correlation was not significant (p = 0.206, compared with p = 0.108 for the zero-order relationship), indicating that even when IQ is accounted for, the relationship between Achievement and Teaching Format is not statistically significant.
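The semi-partial (part) correlation can also be computed by hand, which makes clear what is being controlled. The sketch below follows the textbook definition: IQ is removed from the teaching-format code only, and the residual is correlated with unadjusted achievement. Its square then equals the gain in R-squared when Teaching Format is added to a model that already contains IQ. The data are simulated and every parameter is an assumption. Which variable IQ is removed from depends on the method and, for `ppcor::spcor.test`, on argument order, so that function's documentation is worth checking before interpreting its output.

```r
# Hypothetical simulated data (NOT the study data)
set.seed(15)
n   <- 24
sim <- data.frame(
  IQ         = rnorm(n, mean = 110, sd = 12),
  teach.form = factor(rep(c("AL", "LE"), each = n / 2))
)
sim$achieve <- -16 + 0.43 * sim$IQ - 2 * (sim$teach.form == "LE") + rnorm(n, sd = 4)

# Teaching format as a 0/1 code (1 = LE)
sim$le <- as.numeric(sim$teach.form == "LE")

# Part correlation: residualize the teaching-format code on IQ,
# then correlate that residual with the unadjusted achievement scores
le_resid <- residuals(lm(le ~ IQ, data = sim))
sr <- cor(sim$achieve, le_resid)

# Same quantity as an increment in R-squared (nested models)
r2_iq    <- summary(lm(achieve ~ IQ, data = sim))$r.squared
r2_full  <- summary(lm(achieve ~ IQ + teach.form, data = sim))$r.squared
delta_r2 <- r2_full - r2_iq

print(c(sr = sr, sr_squared = sr^2, delta_r2 = delta_r2))
```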

For the model in #3, do we have heteroscedasticity and/or nonlinearity? How do you know?

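No. The assumptions of homoscedasticity and linearity were met. In the gvlma output for model4, both the Heteroscedasticity test (p = 0.9240) and the Link Function test, which checks for nonlinearity (p = 0.7488), were non-significant at the 0.05 level and were labeled "Assumptions acceptable." The residuals-vs-fitted plot (plot(model4, 1)) provides a visual check of the same assumptions.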
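As a cross-check on the gvlma results, heteroscedasticity and nonlinearity can each be tested with a standard base-R approach. These are not the statistics gvlma computes internally; they are common alternatives: a studentized (Koenker) Breusch-Pagan test, which regresses the squared residuals on the predictors and compares n·R² to a chi-square distribution, and a nested F-test of whether a centered quadratic IQ term improves the fit. The data below are simulated and every parameter is an assumption; packaged versions of the first test exist (e.g., `lmtest::bptest`).

```r
# Hypothetical simulated data (NOT the study data)
set.seed(19)
n   <- 24
sim <- data.frame(
  IQ         = rnorm(n, mean = 110, sd = 12),
  teach.form = factor(rep(c("AL", "LE"), each = n / 2))
)
sim$achieve <- -16 + 0.43 * sim$IQ + rnorm(n, sd = 4)

fit <- lm(achieve ~ teach.form + IQ, data = sim)

# Heteroscedasticity: studentized Breusch-Pagan test, n * R^2 ~ chi-square(k)
aux  <- lm(residuals(fit)^2 ~ teach.form + IQ, data = sim)
bp   <- n * summary(aux)$r.squared
k    <- length(coef(aux)) - 1            # number of predictors in the auxiliary model
bp_p <- pchisq(bp, df = k, lower.tail = FALSE)

# Nonlinearity: does a centered quadratic IQ term improve the fit?
fit_quad <- lm(achieve ~ teach.form + IQ + I((IQ - mean(IQ))^2), data = sim)
quad_p   <- anova(fit, fit_quad)[["Pr(>F)"]][2]

# Large p-values give no evidence of heteroscedasticity or curvature
print(c(bp_stat = bp, bp_p = bp_p, quad_p = quad_p))
```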

Question 1

#packages used: dplyr, car, stats, ggplot2, ppcor, multcomp, gvlma, knitr, magrittr
#Data was edited in Excel to ensure that Teaching Format was consistently formatted (le -> LE, etc.)
library(dplyr) 
library(ggplot2)
data<-read.csv("C:/Users/qdean2/Desktop/cad3.csv")

#filtering out all -99 in both IQ and Achieve columns
data <- data %>% filter(IQ != -99, achieve != -99)
shapiro.test(data$achieve)#p-value = 0.8861

## 
##  Shapiro-Wilk normality test
## 
## data:  data$achieve
## W = 0.97948, p-value = 0.8861

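The -99 filtering and the label clean-up described in the comments above can also be done in R rather than in Excel. The sketch below is a hypothetical example: the inline CSV text stands in for cad3.csv and its values are made up. It reads -99 as missing through `read.csv`'s `na.strings` argument, drops incomplete rows, and upper-cases the teaching-format labels so that al/Al become AL and le becomes LE.

```r
# Hypothetical stand-in for cad3.csv (values are made up)
csv_text <- "teach.form,IQ,achieve
AL,112,35
al,-99,30
Al,105,-99
LE,98,28
le,120,33
LE,101,31"

# Read -99 as NA, then keep only rows with no missing values
dat <- read.csv(text = csv_text, na.strings = "-99")
dat <- dat[complete.cases(dat), ]

# Harmonize label case so teach.form has exactly two levels
dat$teach.form <- factor(toupper(dat$teach.form), levels = c("AL", "LE"))
print(table(dat$teach.form))
```

Applied to the real file, this should leave `teach.form` with exactly two levels, which is what the two-group analyses in this assignment assume.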
model1<-lm(data=data,achieve~teach.form)
summary(model1)#p-value: 0.1081

## 
## Call:
## lm(formula = achieve ~ teach.form, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3333  -4.6146  -0.8333   5.3438  10.6667 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   34.0000     4.8358   7.031 1.08e-06 ***
## teach.formAl  -9.0000     8.3758  -1.075    0.296    
## teach.formAL   0.3333     5.3462   0.062    0.951    
## teach.formle  -8.2500     5.9226  -1.393    0.180    
## teach.formLE  -3.6250     5.4066  -0.670    0.511    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.839 on 19 degrees of freedom
## Multiple R-squared:  0.2308, Adjusted R-squared:  0.06892 
## F-statistic: 1.426 on 4 and 19 DF,  p-value: 0.2638

#boxplots
ggplot()+geom_boxplot(data=data,aes(x=teach.form,y=achieve))+theme_classic()

Question 2

shapiro.test(data$IQ)#p-value = 0.7392

## 
##  Shapiro-Wilk normality test
## 
## data:  data$IQ
## W = 0.97293, p-value = 0.7392

car::qqPlot(data$IQ) # IQ data meet the assumption of normality for parametric testing

## [1]  6 17

model3<-lm(data=data,achieve~IQ)
summary(model3)#p-value: 5.67e-06

## 
## Call:
## lm(formula = achieve ~ IQ, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.808 -3.324 -0.131  3.150 10.333 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.42861    8.74144  -2.337   0.0289 *  
## IQ            0.46016    0.07753   5.935 5.67e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.493 on 22 degrees of freedom
## Multiple R-squared:  0.6156, Adjusted R-squared:  0.5981 
## F-statistic: 35.23 on 1 and 22 DF,  p-value: 5.67e-06

Question 3

library(multcomp)
model2<-lm(data=data,IQ~teach.form)
summary(model2)#p-value: 0.7067

## 
## Call:
## lm(formula = IQ ~ teach.form, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.2222  -7.9375   0.8889   7.0625  21.7778 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   117.000      9.124  12.824 8.37e-11 ***
## teach.formAl  -13.000     15.802  -0.823    0.421    
## teach.formAL   -3.778     10.086  -0.375    0.712    
## teach.formle   -9.000     11.174  -0.805    0.431    
## teach.formLE   -4.250     10.200  -0.417    0.682    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.9 on 19 degrees of freedom
## Multiple R-squared:  0.05823,    Adjusted R-squared:  -0.14 
## F-statistic: 0.2937 on 4 and 19 DF,  p-value: 0.8784

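# semi-partial correlation of Achievement and Teaching Format, controlling for IQ
# (teach.form converted to numeric codes, since spcor.test needs numeric input)
ppcor::spcor.test(data$achieve, as.numeric(factor(data$teach.form)), data$IQ)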

##     estimate   p.value  statistic  n gp  Method
## 1 -0.1543627 0.4819023 -0.7159599 24  1 pearson

model4<-lm(data=data,achieve~teach.form+IQ)
summary(model4)

## 
## Call:
## lm(formula = achieve ~ teach.form + IQ, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.919 -2.409 -1.172  3.122  7.169 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -15.97596    9.14067  -1.748   0.0975 .  
## teach.formAl  -3.44712    5.18502  -0.665   0.5146    
## teach.formAL   1.94699    3.26409   0.596   0.5583    
## teach.formle  -4.40570    3.66375  -1.203   0.2447    
## teach.formLE  -1.80963    3.30384  -0.548   0.5906    
## IQ             0.42714    0.07397   5.775 1.79e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.16 on 18 degrees of freedom
## Multiple R-squared:  0.7304, Adjusted R-squared:  0.6555 
## F-statistic: 9.751 on 5 and 18 DF,  p-value: 0.0001225

# ANCOVA: Tukey post-hoc comparisons of teaching formats, adjusted for IQ
posth=glht(model4, linfct=mcp(teach.form="Tukey"))  ##gives the post-hoc Tukey analysis
summary(posth)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = achieve ~ teach.form + IQ, data = data)
## 
## Linear Hypotheses:
##              Estimate Std. Error t value Pr(>|t|)
## Al - al == 0  -3.4471     5.1850  -0.665    0.958
## AL - al == 0   1.9470     3.2641   0.596    0.971
## le - al == 0  -4.4057     3.6637  -1.203    0.735
## LE - al == 0  -1.8096     3.3038  -0.548    0.979
## AL - Al == 0   5.3941     4.4379   1.215    0.728
## le - Al == 0  -0.9586     4.6605  -0.206    1.000
## LE - Al == 0   1.6375     4.4597   0.367    0.995
## le - AL == 0  -6.3527     2.5296  -2.511    0.122
## LE - AL == 0  -3.7566     2.0217  -1.858    0.355
## LE - le == 0   2.5961     2.5716   1.009    0.837
## (Adjusted p values reported -- single-step method)
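Once `teach.form` has only the two levels AL and LE, the Tukey output reduces to a single contrast, and the ANCOVA question can also be answered with a nested-model F-test (does Teaching Format improve a model that already contains IQ?) together with IQ-adjusted group means. The sketch below uses simulated data with made-up parameters; the adjusted means are simply model predictions at the sample mean IQ.

```r
# Hypothetical simulated data (NOT the study data)
set.seed(186)
n   <- 24
sim <- data.frame(
  IQ         = rnorm(n, mean = 110, sd = 12),
  teach.form = factor(rep(c("AL", "LE"), each = n / 2))
)
sim$achieve <- -16 + 0.43 * sim$IQ - 2 * (sim$teach.form == "LE") + rnorm(n, sd = 4)

# Nested-model comparison: Teaching Format over and above IQ
reduced <- lm(achieve ~ IQ, data = sim)
full    <- lm(achieve ~ teach.form + IQ, data = sim)
cmp     <- anova(reduced, full)
print(cmp)

# IQ-adjusted means: predicted achievement for each format at the mean IQ
grid <- data.frame(
  teach.form = factor(c("AL", "LE"), levels = levels(sim$teach.form)),
  IQ         = mean(sim$IQ)
)
adj_means <- predict(full, newdata = grid)
names(adj_means) <- levels(sim$teach.form)
print(adj_means)
```

With two groups, the nested F-test p-value matches the p-value of the `teach.formLE` coefficient, and the difference between the adjusted means equals that coefficient.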

Question 4

library(gvlma)
gvmodel<-gvlma(model4)
summary(gvmodel)

## 
## Call:
## lm(formula = achieve ~ teach.form + IQ, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.919 -2.409 -1.172  3.122  7.169 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -15.97596    9.14067  -1.748   0.0975 .  
## teach.formAl  -3.44712    5.18502  -0.665   0.5146    
## teach.formAL   1.94699    3.26409   0.596   0.5583    
## teach.formle  -4.40570    3.66375  -1.203   0.2447    
## teach.formLE  -1.80963    3.30384  -0.548   0.5906    
## IQ             0.42714    0.07397   5.775 1.79e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.16 on 18 degrees of freedom
## Multiple R-squared:  0.7304, Adjusted R-squared:  0.6555 
## F-statistic: 9.751 on 5 and 18 DF,  p-value: 0.0001225
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = model4) 
## 
##                       Value p-value                Decision
## Global Stat        0.947503  0.9176 Assumptions acceptable.
## Skewness           0.138027  0.7103 Assumptions acceptable.
## Kurtosis           0.697832  0.4035 Assumptions acceptable.
## Link Function      0.102537  0.7488 Assumptions acceptable.
## Heteroscedasticity 0.009106  0.9240 Assumptions acceptable.

plot(model4,1)


Quintin Dean

December 10, 2019