1. Setting

System under test

The data set tested comes from the Ecdat package in R and the data set is named Mathlevel. The data set is a cross-section from 1983-1986 of 609 individuals in the United States.

## Loading required package: Ecfun
## 
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
## 
##     sign
## 
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
## 
##     Orange
## Warning: package 'FrF2' was built under R version 3.3.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.3.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.3.2
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths

Factors and Levels

The data set contains the following factors:

mathlevel:higest level of math attained (170,171a,172,171b,172b,221a,221b)

language: foreign langauge proficiency (yes,no)

sex: levels are male,female

major: level is one of the following majors one of other,eco,oss(other social sciences),ns (natural sciences), hum (humanities)

mathcourse: level is number of courses in adbanced math (0 to 3)

physsiccourse: level is number of courses in physics (0,2)

chemistcourse: level is number of courses in chemistry (0,2)

Continous variables

There is one continous variable in the data set: sat: SAT Math Score

Response variable

The response is the SAT Math Score sat.

The Data: How is it organized and what does it look like?

Below are the first 6 observations of the Mathlevel data set

##   mathlevel sat language    sex major mathcourse physiccourse
## 1       170 670       no   male    ns          1            2
## 2       170 660       no   male other          1            1
## 3       170 610       no female   eco          1            0
## 4       170 620      yes   male   eco          1            0
## 5       170 430       no   male   eco          0            1
## 6       170 580       no female   oss          2            1
##   chemistcourse
## 1             1
## 2             1
## 3             1
## 4             1
## 5             1
## 6             1

Below is the structure of the data set

## 'data.frame':    609 obs. of  8 variables:
##  $ mathlevel    : Ord.factor w/ 7 levels "170"<"171a"<"172a"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ sat          : int  670 660 610 620 430 580 550 510 560 670 ...
##  $ language     : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 2 1 2 1 ...
##  $ sex          : Factor w/ 2 levels "male","female": 1 1 2 1 1 2 2 2 1 1 ...
##  $ major        : Factor w/ 5 levels "other","eco",..: 4 1 2 2 2 3 1 2 5 3 ...
##  $ mathcourse   : num  1 1 1 1 0 2 1 1 1 1 ...
##  $ physiccourse : num  2 1 0 0 1 1 0 1 0 0 ...
##  $ chemistcourse: num  1 1 1 1 1 1 1 1 0 1 ...

This data set consists of 609 observations of 8 variables.

2. (Experimental) Design

The purpose of this experiment is to observe the impact of a multitude of factors on the highest math level of a student. A fractional factorial design is used.

How will the expreriment be organized and conducted to test the hypothesis?

A \(2^{6-3}\) fractional factorial design will be used. The two 2-level factors for this experiment are language and sex. The two 3-level factors for this experiment are physsiccourse and chemistcourse. Both of these three-level factors will be decomposed to 2-level factors.

Randomize: What is the Randomization Scheme?

There is no randomization in the data set.

Replicate: Are there any replicates or repeated measures?

There are no replicates in the original dataset.

Block: Did you use blocking in the design?

Blocking is not used in this design. However, it is possible to block if necessary.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Before decomposing the two 3-level factors, exploratory data analysis will be conducted.

boxplot(SMS~S,xlab="Sex of Student",ylab="SAT Math Score")

boxplot(SMS~FL,xlab="Is the student proficient in a foreign language",ylab="SAT Math Score")

boxplot(SMS~PC,xlab= "Number of physic courses the student has taken",ylab="SAT Math Score")

boxplot(SMS~CC,xlab= "Number of chemistry courses the student has taken",ylab="SAT Math Score")

The two 3-level factors are each split into two 2-level factors. The new factors are Atmost1phy,Atleast1phy,Atmost1chem, and Atleast1chem which refers to have at least 1 or at most 1 physic or chemistry course.

Atmost1phy <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
Atleast1phy <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
Atmost1chem <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
Atleast1chem <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
project3 <- data.frame(Mathlevel$sat,Mathlevel$language,Mathlevel$sex,Mathlevel$physiccourse,Mathlevel$chemistcourse)
colnames<-c("sat","language","sex","physiccourse","chemistcourse")
colnames(project3)<-colnames
Mathlevel2 <- data.frame(project3,Atmost1phy,Atleast1phy,Atmost1chem,Atleast1chem)

for (i in 1:609){
  if(Mathlevel2$physiccourse[i]==1){
    Mathlevel2$Atmost1phy[i] <- 1
    Mathlevel2$Atleast1phy[i] <- 1
  }
  else if (Mathlevel2$physiccourse[i]==0){
    Mathlevel2$Atmost1phy[i] <- 0
    Mathlevel2$Atleast1phy[i] <- 1
  }
  else{
    Mathlevel2$Atmost1phy[i] <- 1
    Mathlevel2$Atleast1phy[i] <- 0
  }
   if(Mathlevel2$chemistcourse[i]==1){
    Mathlevel2$Atmost1chem[i] <- 1
    Mathlevel2$Atleast1chem[i] <- 1
  }
  else if (Mathlevel2$chemistcourse[i]==0){
    Mathlevel2$Atmost1chem[i] <- 0
    Mathlevel2$Atleast1chem[i] <- 1
  }
  else{
    Mathlevel2$Atmost1chem[i] <- 1
    Mathlevel2$Atleast1chem[i] <- 0
  }
}

head(Mathlevel2$sat~(Mathlevel2))
## Mathlevel2$sat ~ (Mathlevel2)

Testing

The full factorial expirimental design of 2^6 = 64 runs is below:

expand.grid(language=c(0,1),sex=c(0,1),Atmost1phy=c(0,1),Atleast1phy=c(0,1),Atmost1chem=c(0,1),Atleast1chem=c(0,1))
##    language sex Atmost1phy Atleast1phy Atmost1chem Atleast1chem
## 1         0   0          0           0           0            0
## 2         1   0          0           0           0            0
## 3         0   1          0           0           0            0
## 4         1   1          0           0           0            0
## 5         0   0          1           0           0            0
## 6         1   0          1           0           0            0
## 7         0   1          1           0           0            0
## 8         1   1          1           0           0            0
## 9         0   0          0           1           0            0
## 10        1   0          0           1           0            0
## 11        0   1          0           1           0            0
## 12        1   1          0           1           0            0
## 13        0   0          1           1           0            0
## 14        1   0          1           1           0            0
## 15        0   1          1           1           0            0
## 16        1   1          1           1           0            0
## 17        0   0          0           0           1            0
## 18        1   0          0           0           1            0
## 19        0   1          0           0           1            0
## 20        1   1          0           0           1            0
## 21        0   0          1           0           1            0
## 22        1   0          1           0           1            0
## 23        0   1          1           0           1            0
## 24        1   1          1           0           1            0
## 25        0   0          0           1           1            0
## 26        1   0          0           1           1            0
## 27        0   1          0           1           1            0
## 28        1   1          0           1           1            0
## 29        0   0          1           1           1            0
## 30        1   0          1           1           1            0
## 31        0   1          1           1           1            0
## 32        1   1          1           1           1            0
## 33        0   0          0           0           0            1
## 34        1   0          0           0           0            1
## 35        0   1          0           0           0            1
## 36        1   1          0           0           0            1
## 37        0   0          1           0           0            1
## 38        1   0          1           0           0            1
## 39        0   1          1           0           0            1
## 40        1   1          1           0           0            1
## 41        0   0          0           1           0            1
## 42        1   0          0           1           0            1
## 43        0   1          0           1           0            1
## 44        1   1          0           1           0            1
## 45        0   0          1           1           0            1
## 46        1   0          1           1           0            1
## 47        0   1          1           1           0            1
## 48        1   1          1           1           0            1
## 49        0   0          0           0           1            1
## 50        1   0          0           0           1            1
## 51        0   1          0           0           1            1
## 52        1   1          0           0           1            1
## 53        0   0          1           0           1            1
## 54        1   0          1           0           1            1
## 55        0   1          1           0           1            1
## 56        1   1          1           0           1            1
## 57        0   0          0           1           1            1
## 58        1   0          0           1           1            1
## 59        0   1          0           1           1            1
## 60        1   1          0           1           1            1
## 61        0   0          1           1           1            1
## 62        1   0          1           1           1            1
## 63        0   1          1           1           1            1
## 64        1   1          1           1           1            1

It’s apparent that this would take a while to complete this analysis. Therefore, it will be approximated with a fraction factorial design. The fractional factorial design chosen is a 2^6-3 design by utilizing the R package FrF2. The highest possible resolution for this design is Resolution III. The FRF2 package is also used to determine the aliasing structure of the fractional factorial design and the generators of the fractional factorial design.

design <-FrF2(8,factor.names = c("language","sex","Atmost1phy","Atleast1phy","Atmost1chem","Atleast1chem"),default.levels = c("0","1"))
design
##   language sex Atmost1phy Atleast1phy Atmost1chem Atleast1chem
## 1        1   1          0           1           0            0
## 2        0   1          1           0           0            1
## 3        0   0          1           1           0            0
## 4        0   0          0           1           1            1
## 5        0   1          0           0           1            0
## 6        1   0          0           0           0            1
## 7        1   0          1           0           1            0
## 8        1   1          1           1           1            1
## class=design, type= FrF2
summary(design)
## Call:
## FrF2(8, factor.names = c("language", "sex", "Atmost1phy", "Atleast1phy", 
##     "Atmost1chem", "Atleast1chem"), default.levels = c("0", "1"))
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##   language sex Atmost1phy Atleast1phy Atmost1chem Atleast1chem
## 1        0   0          0           0           0            0
## 2        1   1          1           1           1            1
## 
## Design generating information:
## $legend
## [1] A=language     B=sex          C=Atmost1phy   D=Atleast1phy 
## [5] E=Atmost1chem  F=Atleast1chem
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   language sex Atmost1phy Atleast1phy Atmost1chem Atleast1chem
## 1        1   1          0           1           0            0
## 2        0   1          1           0           0            1
## 3        0   0          1           1           0            0
## 4        0   0          0           1           1            1
## 5        0   1          0           0           1            0
## 6        1   0          0           0           0            1
## 7        1   0          1           0           1            0
## 8        1   1          1           1           1            1
## class=design, type= FrF2
aliasprint(design)
## $legend
## [1] A=language     B=sex          C=Atmost1phy   D=Atleast1phy 
## [5] E=Atmost1chem  F=Atleast1chem
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD

From the above result, it’s apparent that the generators of the fractional factorial design are D=AB,E=AC,F=BC. From the aliasing structure, it’s apparent that the main effects are aliased with some of the two factor interations, which are highly aliased with each other: AF=BE=CD. Because the design is a Resolution III design, only the main effects are a reliable source of analysis.

6020

From the aliased interactions (D=AB, E=AC, and F= BC), the generator I is calculated as I=ABD=ACE=BCF

Estimation

The main and interaction effects can be estimated by using a linear model and ANOVA analysis. This can be seen below, with all of the main effects and all of the two-factor interaction effects:

fit <- lm(Mathlevel2$sat~(Mathlevel2$language+Mathlevel2$sex+Mathlevel2$Atmost1phy+Mathlevel2$Atleast1phy+Mathlevel2$Atmost1chem+Mathlevel2$Atleast1chem)^2,data = Mathlevel2)
anova(fit)
## Analysis of Variance Table
## 
## Response: Mathlevel2$sat
##                                                 Df  Sum Sq Mean Sq F value
## Mathlevel2$language                              1   11963   11963  3.5144
## Mathlevel2$sex                                   1   66461   66461 19.5238
## Mathlevel2$Atmost1phy                            1   24640   24640  7.2384
## Mathlevel2$Atleast1phy                           1      75      75  0.0221
## Mathlevel2$Atmost1chem                           1     785     785  0.2306
## Mathlevel2$Atleast1chem                          1   12893   12893  3.7873
## Mathlevel2$language:Mathlevel2$sex               1    5161    5161  1.5162
## Mathlevel2$language:Mathlevel2$Atmost1phy        1    3010    3010  0.8841
## Mathlevel2$language:Mathlevel2$Atleast1phy       1    3927    3927  1.1535
## Mathlevel2$language:Mathlevel2$Atmost1chem       1    5475    5475  1.6082
## Mathlevel2$language:Mathlevel2$Atleast1chem      1       2       2  0.0004
## Mathlevel2$sex:Mathlevel2$Atmost1phy             1    6618    6618  1.9440
## Mathlevel2$sex:Mathlevel2$Atleast1phy            1    8810    8810  2.5879
## Mathlevel2$sex:Mathlevel2$Atmost1chem            1      92      92  0.0269
## Mathlevel2$sex:Mathlevel2$Atleast1chem           1    1083    1083  0.3181
## Mathlevel2$Atmost1phy:Mathlevel2$Atmost1chem     1    1115    1115  0.3276
## Mathlevel2$Atmost1phy:Mathlevel2$Atleast1chem    1     530     530  0.1558
## Mathlevel2$Atleast1phy:Mathlevel2$Atmost1chem    1       3       3  0.0007
## Mathlevel2$Atleast1phy:Mathlevel2$Atleast1chem   1     343     343  0.1008
## Residuals                                      589 2005025    3404        
##                                                   Pr(>F)    
## Mathlevel2$language                             0.061332 .  
## Mathlevel2$sex                                 1.183e-05 ***
## Mathlevel2$Atmost1phy                           0.007339 ** 
## Mathlevel2$Atleast1phy                          0.881890    
## Mathlevel2$Atmost1chem                          0.631250    
## Mathlevel2$Atleast1chem                         0.052117 .  
## Mathlevel2$language:Mathlevel2$sex              0.218691    
## Mathlevel2$language:Mathlevel2$Atmost1phy       0.347461    
## Mathlevel2$language:Mathlevel2$Atleast1phy      0.283258    
## Mathlevel2$language:Mathlevel2$Atmost1chem      0.205243    
## Mathlevel2$language:Mathlevel2$Atleast1chem     0.983238    
## Mathlevel2$sex:Mathlevel2$Atmost1phy            0.163760    
## Mathlevel2$sex:Mathlevel2$Atleast1phy           0.108216    
## Mathlevel2$sex:Mathlevel2$Atmost1chem           0.869807    
## Mathlevel2$sex:Mathlevel2$Atleast1chem          0.572944    
## Mathlevel2$Atmost1phy:Mathlevel2$Atmost1chem    0.567274    
## Mathlevel2$Atmost1phy:Mathlevel2$Atleast1chem   0.693169    
## Mathlevel2$Atleast1phy:Mathlevel2$Atmost1chem   0.978278    
## Mathlevel2$Atleast1phy:Mathlevel2$Atleast1chem  0.750933    
## Residuals                                                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Diagnostic and Model Adequacy Checking

To look at just the main effects:

fit2 <- lm(Mathlevel2$sat~Mathlevel2$language+Mathlevel2$sex+Mathlevel2$Atmost1phy+Mathlevel2$Atleast1phy+Mathlevel2$Atmost1chem+Mathlevel2$Atleast1chem,data = Mathlevel2)
anova(fit2)
## Analysis of Variance Table
## 
## Response: Mathlevel2$sat
##                          Df  Sum Sq Mean Sq F value    Pr(>F)    
## Mathlevel2$language       1   11963   11963  3.5283  0.060812 .  
## Mathlevel2$sex            1   66461   66461 19.6012 1.133e-05 ***
## Mathlevel2$Atmost1phy     1   24640   24640  7.2671  0.007219 ** 
## Mathlevel2$Atleast1phy    1      75      75  0.0222  0.881657    
## Mathlevel2$Atmost1chem    1     785     785  0.2315  0.630571    
## Mathlevel2$Atleast1chem   1   12893   12893  3.8023  0.051645 .  
## Residuals               602 2041192    3391                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The conclusion remains the same because the main effects are aliased with the two-factor interactions, Below are the plots of the residuals.

qqnorm(residuals(fit))
qqline(residuals(fit))

plot(fitted(fit),residuals(fit))

The Q-Q plot shows the data probably follows a normal distribution. ##4. References to the literature [1]https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf

5. Appendices

https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf

The complete code for the project can be seen below:

library(Ecdat)
library(FrF2)
data("Mathlevel")
head(Mathlevel)
str(Mathlevel)
S <- as.factor(Mathlevel$sex)
FL <- as.factor(Mathlevel$language)
PC <- as.factor(Mathlevel$physiccourse)
CC <- as.factor(Mathlevel$chemistcourse)
SMS <- as.numeric(Mathlevel$sat)
#Boxplots
boxplot(SMS~S,xlab="Sex of Student",ylab="SAT Math Score")
boxplot(SMS~FL,xlab="Is the student proficient in a foreign language",ylab="SAT Math Score")
boxplot(SMS~PC,xlab= "Number of physic courses the student has taken",ylab="SAT Math Score")
boxplot(SMS~CC,xlab= "Number of chemistry courses the student has taken",ylab="SAT Math Score")

Atmost1phy <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
Atleast1phy <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
Atmost1chem <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
Atleast1chem <- matrix(0,nrow=nrow(Mathlevel),ncol=1)
#Placing relavent information into a new data frame for easier analysis
project3 <- data.frame(Mathlevel$sat,Mathlevel$language,Mathlevel$sex,Mathlevel$physiccourse,Mathlevel$chemistcourse)
colnames<-c("sat","language","sex","physiccourse","chemistcourse")
colnames(project3)<-colnames
Mathlevel2 <- data.frame(project3,Atmost1phy,Atleast1phy,Atmost1chem,Atleast1chem)
#Decomposing for 3-factor levels to 2-factor levels
for (i in 1:609){
  if(Mathlevel2$physiccourse[i]==1){
    Mathlevel2$Atmost1phy[i] <- 1
    Mathlevel2$Atleast1phy[i] <- 1
  }
  else if (Mathlevel2$physiccourse[i]==0){
    Mathlevel2$Atmost1phy[i] <- 0
    Mathlevel2$Atleast1phy[i] <- 1
  }
  else{
    Mathlevel2$Atmost1phy[i] <- 1
    Mathlevel2$Atleast1phy[i] <- 0
  }
   if(Mathlevel2$chemistcourse[i]==1){
    Mathlevel2$Atmost1chem[i] <- 1
    Mathlevel2$Atleast1chem[i] <- 1
  }
  else if (Mathlevel2$chemistcourse[i]==0){
    Mathlevel2$Atmost1chem[i] <- 0
    Mathlevel2$Atleast1chem[i] <- 1
  }
  else{
    Mathlevel2$Atmost1chem[i] <- 1
    Mathlevel2$Atleast1chem[i] <- 0
  }
}

head(Mathlevel2)
expand.grid(language=c(0,1),sex=c(0,1),Atmost1phy=c(0,1),Atleast1phy=c(0,1),Atmost1chem=c(0,1),Atleast1chem=c(0,1))
design <-FrF2(8,factor.names = c("language","sex","Atmost1phy","Atleast1phy","Atmost1chem","Atleast1chem"),default.levels = c("0","1"))
design
summary(design)
aliasprint(design)
#Anova and linear model
#main and interaction
fit <- lm(Mathlevel2$sat~(Mathlevel2$language+Mathlevel2$sex+Mathlevel2$Atmost1phy+Mathlevel2$Atleast1phy+Mathlevel2$Atmost1chem+Mathlevel2$Atleast1chem)^2,data = Mathlevel2)
anova(fit)
#just main
fit2 <- lm(Mathlevel2$sat~Mathlevel2$language+Mathlevel2$sex+Mathlevel2$Atmost1phy+Mathlevel2$Atleast1phy+Mathlevel2$Atmost1chem+Mathlevel2$Atleast1chem,data = Mathlevel2)
anova(fit2)
qqnorm(residuals(fit))
qqline(residuals(fit))
plot(fitted(fit),residuals(fit))