Bok, Joonhyuk

Rensselaer Polytechnic Institute

Dec 09 2016 version 3

1.1 System under test

Among a dataset from the Ecdat R Package, we select “Mathlevel” which would be useful for expecting SAT Math Score.

In Mathlevel data, ‘language’, ‘sex’, ‘physiccourse’ and ‘chemistcourse’ are selected as factors, which could explain the result of SAT Math Score and have 2 levels, 2 levels, 3 levels and 3 levels respectively. And ‘sat’ is chosen as a response variable.

The orginal dataset was collected for examining effect of level of math on learning Economics and consist of 8 variables and 609 observations. Original dataframe containing:

mathlevel highest level of math attained , an ordered factor with levels 170, 171a, 172, 171b, 172b, 221a, 221b
sat sat math score
language status of foreign language proficiency
sex male, female
major one of other, eco, oss (other social sciences), ns (natural sciences), hum (humanities)
mathcourse number of courses in advanced math (0 to 3)
physiccourse number of courses in physics (0 to 2)
chemistcourse number of courses in chemistry (0 to 2)

We will use some factors among them in order to conduct the analysis for the Project 3.

We can inspect the head and tail of the dataframe in order to see what the data look like.

load("C:/Users/bokjh3/Desktop/Ecdat_0.2-9/Ecdat/data/Mathlevel.rda")
head(Mathlevel, n=10)
##    mathlevel sat language    sex major mathcourse physiccourse
## 1        170 670       no   male    ns          1            2
## 2        170 660       no   male other          1            1
## 3        170 610       no female   eco          1            0
## 4        170 620      yes   male   eco          1            0
## 5        170 430       no   male   eco          0            1
## 6        170 580       no female   oss          2            1
## 7        170 550      yes female other          1            0
## 8        170 510       no female   eco          1            1
## 9        170 560      yes   male   hum          1            0
## 10       170 670       no   male   oss          1            0
##    chemistcourse
## 1              1
## 2              1
## 3              1
## 4              1
## 5              1
## 6              1
## 7              1
## 8              1
## 9              0
## 10             1
tail(Mathlevel, n=10)
##     mathlevel sat language    sex major mathcourse physiccourse
## 600      221b 660       no female    ns          2            1
## 601      221b 670       no female    ns          2            1
## 602      221b 670       no   male other          2            1
## 603      221b 660       no   male    ns          1            0
## 604      221b 590       no female    ns          2            1
## 605      221b 580       no female   oss          2            1
## 606      221b 770       no   male   oss          2            1
## 607      221b 660       no   male other          2            1
## 608      221b 710       no female   eco          2            0
## 609      221b 590       no female   oss          2            0
##     chemistcourse
## 600             1
## 601             1
## 602             1
## 603             1
## 604             1
## 605             1
## 606             1
## 607             1
## 608             1
## 609             1

1.2 Factor and Levels

4 factors selected are language, sex, physiccourse and chemistcourse. We convert variables of physiccourse and chemistcourse from numerics to factors for the analysis.

language status of foreign language proficiency
sex male, female
physiccourse number of courses in physics (0 to 2)
chemistcourse number of courses in chemistry (0 to 2)

Mathlevel$physiccourse <- as.factor(Mathlevel$physiccourse)
Mathlevel$chemistcourse <- as.factor(Mathlevel$chemistcourse)
str(Mathlevel)
## 'data.frame':    609 obs. of  8 variables:
##  $ mathlevel    : Ord.factor w/ 7 levels "170"<"171a"<"172a"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ sat          : int  670 660 610 620 430 580 550 510 560 670 ...
##  $ language     : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 2 1 2 1 ...
##  $ sex          : Factor w/ 2 levels "male","female": 1 1 2 1 1 2 2 2 1 1 ...
##  $ major        : Factor w/ 5 levels "other","eco",..: 4 1 2 2 2 3 1 2 5 3 ...
##  $ mathcourse   : num  1 1 1 1 0 2 1 1 1 1 ...
##  $ physiccourse : Factor w/ 3 levels "0","1","2": 3 2 1 1 2 2 1 2 1 1 ...
##  $ chemistcourse: Factor w/ 3 levels "0","1","2": 2 2 2 2 2 2 2 2 1 2 ...

1.3 Continous variables

Sat is a continous variable. In order to conduct ANOVA, a continuous dependent variable must be choosed as the response variable among the variables in the dataset.

1.4 Response variables

The response variable is sat. The sat is defined as SAT Math Score. Sat is a continuous variable. We investigate the effects on factors such as language, sex, physiccourse, and chemistcourse on the response variable, sat.

1.5 The Data: How is it organized and what does it look like?

The “Mathlevel” data is a cross-section data from 1983 to 1986 in United States. The number of observations is 609. The dataset is organized by the following variables: mathlevel, sat, language, sex, major, mathcourse, physiccourse and chemistcourse.

We select language, sex, physiccourse and chemistcourse as factors, which have 2, 2, 3 and 3 levles respectively, and sat as a response variable, which is a continous variable.

2. Experimental Design

2.1 How will the experiment be organized and conducted to test the hypothesis?

Fractional factorial design is conducted in order to reduce the computing power and runs necessary to reach an appropriate conclusion. A fractional factorial design of the 2m-3 type will be utilized in order to make the calculations simple.

The factors with 3 levels will be decomposed into factors with 2-level for calculating the required data needed to obtain appropriate data. For these, the sum of the variables with 2 levels will yield the value with 3 levels.

The results from the factional factorial design will be evaluated for aliasing in order to decide the true amount of data which comes from more limited fractional design. Main and interaction effects are calculated using ANOVA analysis.

2.2 What is the rationale for this design?

A fractional factorial model is utilized in order to reduce the computing power required to get an accurate solution. Alias testing is used to decide the accuracy and resolution of the design. This enables the researcher to know the limitations of this level of design, and determine if another method is more suitable for the research.

2.3 Randomize: What is the Randomization Scheme?

Randomization is a technique used to balance the effect of extraneous or uncontrollable conditions that can impact the results of an experiment. In this experiment, we do not consider randomization becasue we do not have control over data collection, but we assume that this dataset satisfies random design assumptions.

2.4 Replicate: Are there replicates and/or repeated measures?

Replicates are multiple experimental runs with the same factor settings (levels). Replicates are subject to the same sources of variability, independently of each other. We can replicate combinations of factor levels, groups of factor level combinations, or entire designs. There is no replication in this research.

2.5 Block: Did you use blocking in the design?

In experimental design, blocking is a technique used to deal with nuisance factors that may affect the results of the experiment. The experiment is organized into blocks, where the nuisance factor is maintained at a constant level in each block. Blocking is not utilized in this research.

3. Statistical Analysis

3.1 Graphics and descriptive summary

At first, we look at the histogram of sat to see if it satisfies the assumption of normality. As you can see, the distribution meets the assumption of normality.

hist(Mathlevel$sat, main = "SAT Math Score")

And then, the levels of each factor are shown in a boxplot to analyze the main effects of factors over the response variable, sat.

boxplot(Mathlevel$sat~Mathlevel$language, xlab="language", ylab="SAT Math Score")
title("Impact of language on SAT Math Score")

boxplot(Mathlevel$sat~Mathlevel$sex, xlab="sex", ylab="SAT Math Score")
title("Impact of sex on SAT Math Score")

boxplot(Mathlevel$sat~Mathlevel$physiccourse, xlab="number of courses in physics", ylab="SAT Math Score")
title("Impact of physiccourse on SAT Math Score")

boxplot(Mathlevel$sat~Mathlevel$chemistcourse, xlab="number of courses in chemistry", ylab="SAT Math Score")
title("Impact of chemistcourse on SAT Math Score")

3.2 Treatment Structure

In this research, 2k fractional factorial design is utilized. The dataset consists of 4 factors of which 2 factors have 2 levels and the other 2 factors have 3 levels. These 2 factors with 3 levels will be decomposed into 2-level factors respectively by the following process. Six 2-level factors remain and will be utilized in the research.

x = Mathlevel[1,]
physiccourse = c(0,0)

L = levels(Mathlevel[,'physiccourse'])
for(i in seq(2,dim(Mathlevel)[1])){
  if(Mathlevel[i,'physiccourse']==L[1]){
    x = rbind(x,Mathlevel[i,])
    physiccourse = rbind(physiccourse,c(0,0))
  } 
  else if(Mathlevel[i,'physiccourse']==L[2]){
    x = rbind(x,Mathlevel[i,])
    x = rbind(x,Mathlevel[i,])
    physiccourse = rbind(physiccourse,c(1,0))
    physiccourse = rbind(physiccourse,c(0,1))
  } 
  else{
    x = rbind(x,Mathlevel[i,])
    physiccourse = rbind(physiccourse,c(1,1))
  }
}

x = cbind(x,physiccourse)
x = x[,c(1,2,3,4,5,6,8,9,10)]
colnames(x) = c(colnames(Mathlevel[1,c(1,2,3,4,5,6,8)]),'physiccourseA','physiccourseB')

Lc = levels(Mathlevel[,'chemistcourse'])

chemistcourse = c(1,0)
chemistcourse = rbind(chemistcourse,c(0,1))
dat = x[1,]
dat = rbind(dat,dat)



for(i in seq(2,dim(x)[1])){
  if(x[i,'chemistcourse']==Lc[1]){
    dat = rbind(dat,x[i,])
    chemistcourse = rbind(chemistcourse,c(0,0))
  } 
  else if(x[i,'chemistcourse']==Lc[2]){
    dat = rbind(dat,x[i,])
    dat = rbind(dat,x[i,])
    chemistcourse = rbind(chemistcourse,c(1,0))
    chemistcourse = rbind(chemistcourse,c(0,1))
  } 
  else{
    dat = rbind(dat,x[i,])
    chemistcourse = rbind(chemistcourse,c(1,1))
  }
}

dat = cbind(dat,chemistcourse)

colnames(dat) = c(colnames(x),'chemistcourseA','chemistcourseB')
dat = dat[,c(1,2,3,4,5,6,7,8,9,10,11)]

And, we look at the full factorial experimental design of 64 runs

expand.grid(language = c(0,1), sex = c(0,1), physiccourseA = c(0,1), physiccourseB = c(0,1), chemistcourseA = c(0,1), chemistcourseB = c(0,1))
##    language sex physiccourseA physiccourseB chemistcourseA chemistcourseB
## 1         0   0             0             0              0              0
## 2         1   0             0             0              0              0
## 3         0   1             0             0              0              0
## 4         1   1             0             0              0              0
## 5         0   0             1             0              0              0
## 6         1   0             1             0              0              0
## 7         0   1             1             0              0              0
## 8         1   1             1             0              0              0
## 9         0   0             0             1              0              0
## 10        1   0             0             1              0              0
## 11        0   1             0             1              0              0
## 12        1   1             0             1              0              0
## 13        0   0             1             1              0              0
## 14        1   0             1             1              0              0
## 15        0   1             1             1              0              0
## 16        1   1             1             1              0              0
## 17        0   0             0             0              1              0
## 18        1   0             0             0              1              0
## 19        0   1             0             0              1              0
## 20        1   1             0             0              1              0
## 21        0   0             1             0              1              0
## 22        1   0             1             0              1              0
## 23        0   1             1             0              1              0
## 24        1   1             1             0              1              0
## 25        0   0             0             1              1              0
## 26        1   0             0             1              1              0
## 27        0   1             0             1              1              0
## 28        1   1             0             1              1              0
## 29        0   0             1             1              1              0
## 30        1   0             1             1              1              0
## 31        0   1             1             1              1              0
## 32        1   1             1             1              1              0
## 33        0   0             0             0              0              1
## 34        1   0             0             0              0              1
## 35        0   1             0             0              0              1
## 36        1   1             0             0              0              1
## 37        0   0             1             0              0              1
## 38        1   0             1             0              0              1
## 39        0   1             1             0              0              1
## 40        1   1             1             0              0              1
## 41        0   0             0             1              0              1
## 42        1   0             0             1              0              1
## 43        0   1             0             1              0              1
## 44        1   1             0             1              0              1
## 45        0   0             1             1              0              1
## 46        1   0             1             1              0              1
## 47        0   1             1             1              0              1
## 48        1   1             1             1              0              1
## 49        0   0             0             0              1              1
## 50        1   0             0             0              1              1
## 51        0   1             0             0              1              1
## 52        1   1             0             0              1              1
## 53        0   0             1             0              1              1
## 54        1   0             1             0              1              1
## 55        0   1             1             0              1              1
## 56        1   1             1             0              1              1
## 57        0   0             0             1              1              1
## 58        1   0             0             1              1              1
## 59        0   1             0             1              1              1
## 60        1   1             0             1              1              1
## 61        0   0             1             1              1              1
## 62        1   0             1             1              1              1
## 63        0   1             1             1              1              1
## 64        1   1             1             1              1              1

Evidently, this would require a lot of time to examine, therefore we make similar to it with a fractional factorial design. The fractional factorial design is \(\sf{2_{Ⅲ}}\)6-3 which yields 8 experimental runs. The R package FrF2 provides functions for creating such a design. The highest resolution possible for this research (6 factors, 8 runs) is III. We can also utilize the FrF2 package to decide the aliasing structure of the fractional factorial design, and the generators of the fractional factorial design.

library(FrF2)
## Warning: package 'FrF2' was built under R version 3.3.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.3.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.3.2
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths
mathscore = FrF2(8,6,factor.names = c('language','sex','physiccourseA','physiccourseB', 'chemistcourseA', 'chemistcourseB'))
mathscore
##   language sex physiccourseA physiccourseB chemistcourseA chemistcourseB
## 1       -1  -1             1             1             -1             -1
## 2        1  -1            -1            -1             -1              1
## 3       -1   1             1            -1             -1              1
## 4       -1  -1            -1             1              1              1
## 5        1   1             1             1              1              1
## 6       -1   1            -1            -1              1             -1
## 7        1   1            -1             1             -1             -1
## 8        1  -1             1            -1              1             -1
## class=design, type= FrF2

For ISYE 6020:

From this, we learn that the generators of the fractional factorial design are D = AB, E = AC, F = BC. The generator for the fractional factorial experiment can also be calculated separately. According to the interactions D = AB, E = AC, and F = BC, the generators lead to a generator of I = ABD = ACE = BCF.

summary(mathscore)
## Call:
## FrF2(8, 6, factor.names = c("language", "sex", "physiccourseA", 
##     "physiccourseB", "chemistcourseA", "chemistcourseB"))
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##   language sex physiccourseA physiccourseB chemistcourseA chemistcourseB
## 1       -1  -1            -1            -1             -1             -1
## 2        1   1             1             1              1              1
## 
## Design generating information:
## $legend
## [1] A=language       B=sex            C=physiccourseA  D=physiccourseB 
## [5] E=chemistcourseA F=chemistcourseB
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   language sex physiccourseA physiccourseB chemistcourseA chemistcourseB
## 1       -1  -1             1             1             -1             -1
## 2        1  -1            -1            -1             -1              1
## 3       -1   1             1            -1             -1              1
## 4       -1  -1            -1             1              1              1
## 5        1   1             1             1              1              1
## 6       -1   1            -1            -1              1             -1
## 7        1   1            -1             1             -1             -1
## 8        1  -1             1            -1              1             -1
## class=design, type= FrF2

The table below provides which factors are confounded. (the aliasing structure). Because this design is a resolution III, the aliasing structure shows that the main effects are aliased with some of the 2 factor interactions, which are highly aliased with each other: AF = BE = CD. Therefore, the main effects are reliable to analyze.

design.info(mathscore)$aliased
## $legend
## [1] "A=language"       "B=sex"            "C=physiccourseA" 
## [4] "D=physiccourseB"  "E=chemistcourseA" "F=chemistcourseB"
## 
## $main
## [1] "A=BD=CE" "B=AD=CF" "C=AE=BF" "D=AB=EF" "E=AC=DF" "F=BC=DE"
## 
## $fi2
## [1] "AF=BE=CD"

3.3 Testing

In this section, we conduct ANOVA test. We estimate main effects and all the 2-factor interaction effects.

model1 = lm(sat~(language+sex+physiccourseA+physiccourseB+chemistcourseA+chemistcourseB)^2,data=dat)
anova(model1)
## Analysis of Variance Table
## 
## Response: sat
##                                 Df  Sum Sq Mean Sq F value    Pr(>F)    
## language                         1   40659   40659 11.7450 0.0006236 ***
## sex                              1  198469  198469 57.3308 5.824e-14 ***
## physiccourseA                    1    7089    7089  2.0477 0.1526123    
## physiccourseB                    1   33684   33684  9.7302 0.0018410 ** 
## chemistcourseA                   1    2080    2080  0.6008 0.4383696    
## chemistcourseB                   1   18501   18501  5.3444 0.0209001 *  
## language:sex                     1   36675   36675 10.5940 0.0011554 ** 
## language:physiccourseA           1     977     977  0.2822 0.5953439    
## language:physiccourseB           1    4154    4154  1.1999 0.2734860    
## language:chemistcourseA          1    1138    1138  0.3288 0.5664473    
## language:chemistcourseB          1   10688   10688  3.0875 0.0790632 .  
## sex:physiccourseA                1    1952    1952  0.5638 0.4528316    
## sex:physiccourseB                1    8522    8522  2.4617 0.1168264    
## sex:chemistcourseA               1     315     315  0.0911 0.7627861    
## sex:chemistcourseB               1    2976    2976  0.8598 0.3539273    
## physiccourseA:physiccourseB      1    2168    2168  0.6262 0.4288667    
## physiccourseA:chemistcourseA     1       3       3  0.0009 0.9763914    
## physiccourseA:chemistcourseB     1      27      27  0.0077 0.9298829    
## physiccourseB:chemistcourseA     1      15      15  0.0042 0.9482711    
## physiccourseB:chemistcourseB     1     132     132  0.0380 0.8454004    
## chemistcourseA:chemistcourseB    1    7017    7017  2.0271 0.1546890    
## Residuals                     1818 6293597    3462                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

An ANOVA table for the main effects is shown below. The table shows that the physiccourseA and chemistcourseA do not have statistically significant effects on the SAT Math Score. However, we already know that the main effects for physiccourseA and chemistcourseA are confounded with the 2-factor interactions. The conclusion remains the same because the main effects are aliased with the 2-factor interactions.

model2 = lm(sat~language+sex+physiccourseA+physiccourseB+chemistcourseA+chemistcourseB,data=dat)
anova(model2)
## Analysis of Variance Table
## 
## Response: sat
##                  Df  Sum Sq Mean Sq F value    Pr(>F)    
## language          1   40659   40659 11.6992  0.000639 ***
## sex               1  198469  198469 57.1074 6.478e-14 ***
## physiccourseA     1    7089    7089  2.0397  0.153413    
## physiccourseB     1   33684   33684  9.6923  0.001879 ** 
## chemistcourseA    1    2080    2080  0.5985  0.439263    
## chemistcourseB    1   18501   18501  5.3236  0.021150 *  
## Residuals      1833 6370355    3475                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.4 Estimation

The parameters for the second model are shown in the table below.

coef(model2)
##    (Intercept)    languageyes      sexfemale  physiccourseA  physiccourseB 
##      611.15334       20.16151      -20.52596       11.43258       11.43258 
## chemistcourseA chemistcourseB 
##       10.52719       10.52719

3.5 Diagnostics / Model Adequacy Checking

Quantile-Quantile (Q-Q) plots are graphs used to verify the distributional assumption for a set of data. The relatively linear relationship for all data sets justifies the use of ANOVA to test for the significant difference. From the QQ plot, the residuals nearly form a linear line, and we can find out that the assumptions of normality are met.

qqnorm(residuals(model2))
qqline(residuals(model2))

Residuals vs. Fits Plot is a common graph used in residual analysis. It is a scatter plot of residuals as a function of fitted values, or the estimated responses. From the Residuals vs. Fits plot, the distribution of points seems random although we can see a few outliers and some linearity.

plot(fitted(model2),residuals(model2))

4. References to the literature

Montgomery, Douglas C. 2012. Design and Analysis of Experiments, 8th Edition.

5. Appendices

Raw data

The Mathlevel dataset can be found by installing and loading the Ecdat R package.

Complete and documented R code

#load("C:/Users/bokjh3/Desktop/Ecdat_0.2-9/Ecdat/data/Mathlevel.rda")

#head(Mathlevel, n=10)

#tail(Mathlevel, n=10)

#Mathlevel$physiccourse <- as.factor(Mathlevel$physiccourse)
#Mathlevel$chemistcourse <- as.factor(Mathlevel$chemistcourse)
#str(Mathlevel)

#hist(Mathlevel$sat, main = "SAT Math Score")

#boxplot(Mathlevel$sat~Mathlevel$language, xlab="language", ylab="SAT Math Score")
#title("Impact of language on SAT Math Score")

#boxplot(Mathlevel$sat~Mathlevel$sex, xlab="sex", ylab="SAT Math Score")
#title("Impact of sex on SAT Math Score")

#boxplot(Mathlevel$sat~Mathlevel$physiccourse, xlab="number of courses in physics", ylab="SAT Math Score")
#title("Impact of physiccourse on SAT Math Score")

#boxplot(Mathlevel$sat~Mathlevel$chemistcourse, xlab="number of courses in chemistry", ylab="SAT Math Score")
#title("Impact of chemistcourse on SAT Math Score")

#x = Mathlevel[1,]
#physiccourse = c(0,0)

#L = levels(Mathlevel[,'physiccourse'])
#for(i in seq(2,dim(Mathlevel)[1])){
#  if(Mathlevel[i,'physiccourse']==L[1]){
#    x = rbind(x,Mathlevel[i,])
#    physiccourse = rbind(physiccourse,c(0,0))
#  } 
#  else if(Mathlevel[i,'physiccourse']==L[2]){
#    x = rbind(x,Mathlevel[i,])
#    x = rbind(x,Mathlevel[i,])
#    physiccourse = rbind(physiccourse,c(1,0))
#    physiccourse = rbind(physiccourse,c(0,1))
#  } 
#  else{
#    x = rbind(x,Mathlevel[i,])
#    physiccourse = rbind(physiccourse,c(1,1))
#  }
#}

#x = cbind(x,physiccourse)
#x = x[,c(1,2,3,4,5,6,8,9,10)]
#colnames(x) = c(colnames(Mathlevel[1,c(1,2,3,4,5,6,8)]),'physiccourseA','physiccourseB')

#Lc = levels(Mathlevel[,'chemistcourse'])

#chemistcourse = c(1,0)
#chemistcourse = rbind(chemistcourse,c(0,1))
#dat = x[1,]
#dat = rbind(dat,dat)

#for(i in seq(2,dim(x)[1])){
#  if(x[i,'chemistcourse']==Lc[1]){
#    dat = rbind(dat,x[i,])
#    chemistcourse = rbind(chemistcourse,c(0,0))
#  } 
#  else if(x[i,'chemistcourse']==Lc[2]){
#    dat = rbind(dat,x[i,])
#    dat = rbind(dat,x[i,])
#    chemistcourse = rbind(chemistcourse,c(1,0))
#    chemistcourse = rbind(chemistcourse,c(0,1))
#  } 
#  else{
#    dat = rbind(dat,x[i,])
#    chemistcourse = rbind(chemistcourse,c(1,1))
#  }
#}

#dat = cbind(dat,chemistcourse)

#colnames(dat) = c(colnames(x),'chemistcourseA','chemistcourseB')
#dat = dat[,c(1,2,3,4,5,6,7,8,9,10,11)]

#expand.grid(language = c(0,1), sex = c(0,1), physiccourseA = c(0,1), physiccourseB = c(0,1), chemistcourseA = c(0,1), chemistcourseB = c(0,1))

#library(FrF2)

#mathscore = FrF2(8,6,factor.names = c('language','sex','physiccourseA','physiccourseB', 'chemistcourseA', 'chemistcourseB'))
#mathscore

#summary(mathscore)

#design.info(mathscore)$aliased

#model1 = lm(sat~(language+sex+physiccourseA+physiccourseB+chemistcourseA+chemistcourseB)^2,data=dat)
#anova(model1)

#model2 = lm(sat~language+sex+physiccourseA+physiccourseB+chemistcourseA+chemistcourseB,data=dat)
#anova(model2)

#coef(model2)

#qqnorm(residuals(model2))
#qqline(residuals(model2))

#plot(fitted(model2),residuals(model2))