Goals
  • Conduct Oneway ANOVA
  • Conduct pairwise t-tests
  • Make conclusions with statistical tests

Submission

This assignment is to be submitted as .pdf knitted file to Gradescope. If you have any questions about concepts, please contact Dr. Fehlberg during office hours or email at . If you have any questions related R and the homework assignments, please contact the Class TA.

Here is information about R Markdown (http://rmarkdown.rstudio.com) and an R Markdown Cheatsheet (https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) with some handy information. When you execute code within the notebook, the results appear beneath the code. You can also write html and Latex syntax within this document.

The One-way Analysis of Variance (ANOVA)

The One-way ANOVA is a statistical technique that allows us to compare mean differences of one outcome (dependent) variable across two or more groups (levels) of one independent variable (factor). If there are only two levels (e.g. Male/Female) of the independent (predictor) variable the results are analogous to Student’s t test. It is also true that ANOVA is a special case of the GLM or regression models so as the number of levels increase it might make more sense to try one of those approaches. ANOVA also allows for comparisons of mean differences across multiple factors (Factorial or N-way ANOVA) which we will address later.

There are several ways to run ANOVA in R, so let’s start with the simplest from the base R aov. While it’s possible to wrap the command in a print statement I recommend you always save the results out to an R object e.g., output.aov. It’s almost inevitable that further analysis will ensue and the R object has a lot of useful information. If you’re new to R a couple of quick notes: the dependent variable goes to the left of the tilde (~) and our independent or predictor variable(s) are placed on the right. aov is not limited to Oneway ANOVA so adding additional factors is possible.

ANOVA is a specialized case of a general linear model (GLM) and therefore the list object returned is actually of both the aov and lm classes. The names command will give you some sense of all the information contained in the list object. The summary command gives us the key ANOVA data we need and produces a classic ANOVA table. If you’re unfamiliar with them and want to know more especially where the numbers come from I recommend reviewing the Montgomery Text Ch3.

Problems:

Problem 1 (15 pts)

Imagine that you are interested in understanding whether knowing the brand of car tire can help you predict whether you will get more or less mileage before you need to replace them. We’ll draw what is hopefully a random sample of 60 tires from four different manufacturers and use the mean mileage by brand to help inform our thinking. While we expect variation across our sample we’re interested in whether the differences between the tire brands (groups) is significantly different than what we would expect in random variation within the groups.

Our research or testable hypothesis is then described \[\mu_{Firestone} \neq \mu_{Bridgestone} \neq \mu_{Milestar} \neq \mu_{Falken}\]

as at least one of the tire brand populations is different than the other three. Our null is basically “brand doesn’t matter in predicting tire mileage - all brands are the same”.

The following data set with 60 observations is available in tires.csv (file located on Canvas).

tires <- read.csv(file="tires.csv",header=TRUE)
str(tires)
## 'data.frame':    60 obs. of  2 variables:
##  $ Brands : chr  "Firestone" "Firestone" "Firestone" "Firestone" ...
##  $ Mileage: num  33 36.4 32.8 37.6 36.3 ...
shapiro.test(tires$Mileage[tires$Brands == "Milestar"])
## 
##  Shapiro-Wilk normality test
## 
## data:  tires$Mileage[tires$Brands == "Milestar"]
## W = 0.95228, p-value = 0.561
shapiro.test(tires$Mileage[tires$Brands == "Falken"])
## 
##  Shapiro-Wilk normality test
## 
## data:  tires$Mileage[tires$Brands == "Falken"]
## W = 0.87754, p-value = 0.04361
shapiro.test(tires$Mileage[tires$Brands == "Firestone"])
## 
##  Shapiro-Wilk normality test
## 
## data:  tires$Mileage[tires$Brands == "Firestone"]
## W = 0.95353, p-value = 0.5817
shapiro.test(tires$Mileage[tires$Brands == "Bridgestone"])
## 
##  Shapiro-Wilk normality test
## 
## data:  tires$Mileage[tires$Brands == "Bridgestone"]
## W = 0.95509, p-value = 0.6078
tires$Brands <- ordered(tires$Brands, levels=c("Firestone","Bridgestone","Milestar","Falken"))
str(tires)
## 'data.frame':    60 obs. of  2 variables:
##  $ Brands : Ord.factor w/ 4 levels "Firestone"<"Bridgestone"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Mileage: num  33 36.4 32.8 37.6 36.3 ...
xtabs( ~Brands, tires )
## Brands
##   Firestone Bridgestone    Milestar      Falken 
##          15          15          15          15
aggregate( Mileage ~ Brands, tires, mean )
outcome <- tires$Mileage
group <- tires$Brands
gp.means <- tapply(outcome,group,mean)
gp.means <- gp.means[group]
dev.from.gp.means <- outcome - gp.means
squared.devs <- dev.from.gp.means ^2

Y <- data.frame( group, outcome, gp.means,
                  dev.from.gp.means, squared.devs )
print(Y, digits = 2)
##          group outcome gp.means dev.from.gp.means squared.devs
## 1    Firestone      33       35            -1.801      3.2e+00
## 2    Firestone      36       35             1.636      2.7e+00
## 3    Firestone      33       35            -2.022      4.1e+00
## 4    Firestone      38       35             2.838      8.1e+00
## 5    Firestone      36       35             1.505      2.3e+00
## 6    Firestone      36       35             1.116      1.2e+00
## 7    Firestone      35       35            -0.099      9.8e-03
## 8    Firestone      32       35            -2.420      5.9e+00
## 9    Firestone      34       35            -1.168      1.4e+00
## 10   Firestone      36       35             1.620      2.6e+00
## 11   Firestone      36       35             1.631      2.7e+00
## 12   Firestone      35       35             0.037      1.4e-03
## 13   Firestone      38       35             3.529      1.2e+01
## 14   Firestone      31       35            -4.176      1.7e+01
## 15   Firestone      33       35            -2.224      4.9e+00
## 16 Bridgestone      34       32             1.743      3.0e+00
## 17 Bridgestone      32       32             0.215      4.6e-02
## 18 Bridgestone      35       32             3.226      1.0e+01
## 19 Bridgestone      28       32            -3.901      1.5e+01
## 20 Bridgestone      31       32            -0.483      2.3e-01
## 21 Bridgestone      31       32            -0.718      5.2e-01
## 22 Bridgestone      35       32             3.058      9.4e+00
## 23 Bridgestone      34       32             2.196      4.8e+00
## 24 Bridgestone      33       32             0.772      6.0e-01
## 25 Bridgestone      31       32            -0.899      8.1e-01
## 26 Bridgestone      28       32            -3.636      1.3e+01
## 27 Bridgestone      29       32            -2.596      6.7e+00
## 28 Bridgestone      33       32             1.295      1.7e+00
## 29 Bridgestone      32       32             0.585      3.4e-01
## 30 Bridgestone      31       32            -0.855      7.3e-01
## 31    Milestar      34       35            -0.316      1.0e-01
## 32    Milestar      33       35            -1.955      3.8e+00
## 33    Milestar      33       35            -1.346      1.8e+00
## 34    Milestar      37       35             2.100      4.4e+00
## 35    Milestar      37       35             2.212      4.9e+00
## 36    Milestar      35       35             0.320      1.0e-01
## 37    Milestar      35       35             0.193      3.7e-02
## 38    Milestar      33       35            -1.286      1.7e+00
## 39    Milestar      30       35            -4.334      1.9e+01
## 40    Milestar      36       35             1.373      1.9e+00
## 41    Milestar      35       35             0.022      4.9e-04
## 42    Milestar      36       35             1.356      1.8e+00
## 43    Milestar      41       35             6.289      4.0e+01
## 44    Milestar      32       35            -2.593      6.7e+00
## 45    Milestar      33       35            -2.035      4.1e+00
## 46      Falken      40       38             1.596      2.5e+00
## 47      Falken      39       38             0.937      8.8e-01
## 48      Falken      38       38             0.124      1.5e-02
## 49      Falken      38       38            -0.305      9.3e-02
## 50      Falken      37       38            -1.414      2.0e+00
## 51      Falken      37       38            -1.033      1.1e+00
## 52      Falken      37       38            -1.270      1.6e+00
## 53      Falken      37       38            -1.000      1.0e+00
## 54      Falken      40       38             2.199      4.8e+00
## 55      Falken      37       38            -0.618      3.8e-01
## 56      Falken      41       38             2.663      7.1e+00
## 57      Falken      37       38            -0.905      8.2e-01
## 58      Falken      38       38             0.005      2.5e-05
## 59      Falken      38       38            -0.244      6.0e-02
## 60      Falken      37       38            -0.735      5.4e-01
SSw <- sum( squared.devs )
print( SSw )
## [1] 249.3583
gp.means <- tapply(outcome,group,mean)
grand.mean <- mean(outcome)
dev.from.grand.mean <- gp.means - grand.mean
squared.devs <- dev.from.grand.mean ^2
gp.sizes <- tapply(outcome,group,length)
wt.squared.devs <- gp.sizes * squared.devs

Y <- data.frame( gp.means, grand.mean, dev.from.grand.mean, 
                  squared.devs, gp.sizes, wt.squared.devs )
print(Y, digits = 2)
##             gp.means grand.mean dev.from.grand.mean squared.devs gp.sizes
## Firestone         35         35              -0.036       0.0013       15
## Bridgestone       32         35              -3.055       9.3329       15
## Milestar          35         35              -0.074       0.0055       15
## Falken            38         35               3.165      10.0165       15
##             wt.squared.devs
## Firestone             0.019
## Bridgestone         139.994
## Milestar              0.082
## Falken              150.247
SSb <- sum( wt.squared.devs )
print( SSb )
## [1] 290.3425
# four groups and N = 60 total observations
pf( 18.6, df1 = 3, df2 = 56, lower.tail = FALSE)
## [1] 1.70257e-08
aov( formula = Mileage ~ Brands, data = tires ) 
## Call:
##    aov(formula = Mileage ~ Brands, data = tires)
## 
## Terms:
##                   Brands Residuals
## Sum of Squares  290.3425  249.3583
## Deg. of Freedom        3        56
## 
## Residual standard error: 2.110172
## Estimated effects are balanced
aov( tires$Mileage~ tires$Brands ) 
## Call:
##    aov(formula = tires$Mileage ~ tires$Brands)
## 
## Terms:
##                 tires$Brands Residuals
## Sum of Squares      290.3425  249.3583
## Deg. of Freedom            3        56
## 
## Residual standard error: 2.110172
## Estimated effects are balanced
aov( Mileage~ Brands, tires ) 
## Call:
##    aov(formula = Mileage ~ Brands, data = tires)
## 
## Terms:
##                   Brands Residuals
## Sum of Squares  290.3425  249.3583
## Deg. of Freedom        3        56
## 
## Residual standard error: 2.110172
## Estimated effects are balanced
my.anova <- aov( Mileage~ Brands, tires ) 
class( my.anova )
## [1] "aov" "lm"
print( my.anova )
## Call:
##    aov(formula = Mileage ~ Brands, data = tires)
## 
## Terms:
##                   Brands Residuals
## Sum of Squares  290.3425  249.3583
## Deg. of Freedom        3        56
## 
## Residual standard error: 2.110172
## Estimated effects are balanced
summary( my.anova )
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Brands       3  290.3   96.78   21.73 1.84e-09 ***
## Residuals   56  249.4    4.45                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tires$Brands <- ordered(tires$Brands, levels=c("Firestone","Bridgestone","Milestar","Falken"))
tire_aov <- aov(Mileage ~ Brands, data = tires)

summary(tire_aov)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Brands       3  290.3   96.78   21.73 1.84e-09 ***
## Residuals   56  249.4    4.45                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
str(tires)
## 'data.frame':    60 obs. of  2 variables:
##  $ Brands : Ord.factor w/ 4 levels "Firestone"<"Bridgestone"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Mileage: num  33 36.4 32.8 37.6 36.3 ...
tire_aov <- aov(Mileage ~ Brands, data = tires)
summary(tire_aov)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Brands       3  290.3   96.78   21.73 1.84e-09 ***
## Residuals   56  249.4    4.45                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_result2 <- aov(Mileage ~ Brands, data = tires)
summary(anova_result2) 
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Brands       3  290.3   96.78   21.73 1.84e-09 ***
## Residuals   56  249.4    4.45                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#summary(anova_result)

anova_result <- aov(tires$Mileage ~ tires$Brands) 
summary(anova_result) 
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## tires$Brands  3  290.3   96.78   21.73 1.84e-09 ***
## Residuals    56  249.4    4.45                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(anova_result)

# Run three seperate t test
#t.test( Firestone, Bridgestone, var.equal = TRUE )   # Student t-test

#Firestone <- with(tires, Mileage[Brand == "Firestone"])  # Mileage change due to Firestone
# Bridgestone <- with(tires, Mileage[Brand == "Bridgestone"])    # mood change due to placebo
pairwise.t.test( x = tires$Mileage,   # outcome variable
                  g = tires$Brands,        # grouping variable
                  p.adjust.method = "none"    # which correction to use?
 )
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  tires$Mileage and tires$Brands 
## 
##             Firestone Bridgestone Milestar
## Bridgestone 0.00025   -           -       
## Milestar    0.96092   0.00029     -       
## Falken      0.00011   5.9e-11     9.6e-05 
## 
## P value adjustment method: none
tukey_result <- TukeyHSD(anova_result)

plot(tukey_result)

| Columns | Contains | Type | |:-|:-|:-| |Brands|What brand of tire |factor| |Mileage|Tire life in thousands |num|

  1. Conduct an ANOVA and interpret the results.

    State and check the assumptions associated with ANOVA.

    samples are independent, equal variance, normal since p=0.518 for Firestone, p= 0.04361 Falken, and p= 0.6078 Bridgerstone

    Visualize your data.

    See above

    Conduct ANOVA and provide an interpretation.

    F((3,56) = 21.73, p< 0.001) One-way ANOVA showed a large difference between the mileage among brands.in other words there is a significant main effect of brands in mileage

    Perform a proper pairwise comparison test and control for multiple comparisons.

    ** **

    Interpret your results and clearly state your conclusions.

    the pairwise comparison indicates that Milestar-Firestone has a p value of 0.96092 which indicates a non significant difference between the mileage of tire brands. Milestar-Bridgestone had a p value of 0.00029 which indicates a significant difference between the mileage of tire brands. Falken-Firestone had a p value of 0.00011 which indicates a significant difference between the mileage of the tire brands. Falken-Bridgestone had a p value of 5.9e-11 which also indicates a significant difference between the mileage of the tire brands. as Falken tires brand has more miles. Firestone-Bridgestone had a p value of 0.00025 which is significantly different.

 

Problem 2 (15 pts)

  1. (15pt) An experiment was run to determine whether four specific firing temperatures affect the density of a certain type of brick. The experiment led to the following data.

    library(knitr)
    Density <- c(21.8, 21.9, 21.7, 21.6, 21.7, 21.7, 21.4, 21.5, 21.4, 21.9, 21.8, 21.8, 21.6, 21.5, 21.9, 21.7, 21.8, 21.4)
    Temperature <- c(100, 100, 100, 100, 100, 125, 125, 125, 125, 150, 150, 150, 150, 150, 175, 175, 175, 175)
    (brick <- data.frame(Temperature, Density))
    brick$Temperature <- ordered(brick$Temperature, levels=c("100", "125", "150", "175"))
    1. Does the firing temperature affect the denisty of the bricks? Use \(\alpha = 0.05\). temperatures affect the density of a certain type of brick. The experiment led to the following data.

      brick_aov <- aov(Density ~ Temperature, data = brick)
      summary(brick_aov)
      ##             Df Sum Sq Mean Sq F value Pr(>F)
      ## Temperature  3 0.1561 0.05204   2.024  0.157
      ## Residuals   14 0.3600 0.02571

      There is no difference between the firing temperature and the density of the bricks (F(3,14) = 2.02, p =.157).

    2. Analyze the residuals from this experiment. Are the analysis of variance assumptions satisfied?

      Yes it is satisfied given the results of shapiro test p>0.07151 and box plot. there is no statistical difference between the brick datasets and a normal distribution

      par(mfrow=c(2,2))
      plot(brick_aov)

      shapiro.test(brick$Density)
      ## 
      ##  Shapiro-Wilk normality test
      ## 
      ## data:  brick$Density
      ## W = 0.90544, p-value = 0.07151
      #ggplot(data=brick,mapping=aes(x=Temperature, y=Density,fill=Temperature))+geom_boxplot()
      
      boxplot(brick$Density ~ brick$Temperature) 
    3. Construct a graphical display of the treatment. Does this graph adequately summarize the results of the analysis of variance in part(a)?

      No, it might seem that 125 influence brick density but its difficult to tell a difference given the amount of samples

     

    Problem 3 (15 pts)

  2. (7pt) Four different designs for a digital computer circuit are being studied to compare the amount of noise present. The following data have been obtained:

    CircuitDesign <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
    Noise <- c(19, 20, 19, 30, 8, 80, 61, 73, 56, 80, 47, 26, 25, 35, 50, 95, 46, 83, 78, 97)
    (digital_computer_circuits <- data.frame(CircuitDesign, Noise))
    digital_computer_circuits$CircuitDesign <-ordered(digital_computer_circuits$CircuitDesign, levels = c("1", "2", "3", "4"))
    #ggplot(data = digital_computer_circuits, mapping = aes(x = CircuitDesign, y = Noise, fill = CircuitDesign))+ geom_boxplot()
    1. Is the amount of noise present the same for all four designs? Use \(\alpha = 0.05\).

      (circuit_design_aov <- aov(Noise ~CircuitDesign, data = digital_computer_circuits))
      ## Call:
      ##    aov(formula = Noise ~ CircuitDesign, data = digital_computer_circuits)
      ## 
      ## Terms:
      ##                 CircuitDesign Residuals
      ## Sum of Squares        12042.0    2948.8
      ## Deg. of Freedom             3        16
      ## 
      ## Residual standard error: 13.57571
      ## Estimated effects are balanced
      summary(circuit_design_aov)
      ##               Df Sum Sq Mean Sq F value  Pr(>F)    
      ## CircuitDesign  3  12042    4014   21.78 6.8e-06 ***
      ## Residuals     16   2949     184                    
      ## ---
      ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      #ggplot(data = digital_computer_circuits, mapping = aes(x = CircuitDesign, y = Noise, fill = CircuitDesign))+ geom_boxplot()
      
      shapiro.test(digital_computer_circuits$Noise)
      ## 
      ##  Shapiro-Wilk normality test
      ## 
      ## data:  digital_computer_circuits$Noise
      ## W = 0.9343, p-value = 0.1867
      digital_computer_circuits$CircuitDesign <- ordered(digital_computer_circuits$CircuitDesign, levels=c("1", "2", "3", "4"))
      anova_result <- aov(digital_computer_circuits$Noise ~ digital_computer_circuits$CircuitDesign) 
      summary(anova_result) 
      ##                                         Df Sum Sq Mean Sq F value  Pr(>F)    
      ## digital_computer_circuits$CircuitDesign  3  12042    4014   21.78 6.8e-06 ***
      ## Residuals                               16   2949     184                    
      ## ---
      ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      fit <- lm(digital_computer_circuits$Noise ~ digital_computer_circuits$CircuitDesign)
      plot(fit)

      par(mfrow=c(2,2))
      plot(circuit_design_aov)
      (F(3,16) = 21.8, p < .001) A main effect exists. Amount of noise is not the same for the circuit designs. The design predicts noise levels significantly.
    2. Analyze the residuals from this experiment. Are the analysis of variance assumptions satisfied?

      Yes, normality and residual assumptions were met but design 4 could be an outlier

    3. Which circuit design would you select for use? Low noise is best.

      Design 1