Anova

To study the effect of automobile size on noise pollution, the following data were randomly chosen from the air pollution data. The automobiles are categorized as small, medium, or large, and the noise level readings (in decibels) are given in the following table.

| Small | Medium | Large |
|---|---|---|
| 820 | 840 | 785 |
| 820 | 825 | 775 |
| 825 | 815 | 770 |
| 835 | 855 | 760 |
| 825 | 840 | 770 |

At the alpha = 0.05 level of significance, test for equality of the population mean noise levels for the different sizes of automobiles. Comment on the assumptions.

# Noise readings in table order: 5 small, 5 medium, 5 large
decibel <- c(820,820,825,835,825,840,825,815,855,840,785,775,770,760,770)
ts = "small"
tm = "medium"
tl = "large"
type <-c(rep(ts,5),rep(tm,5),rep(tl,5))
df <- data.frame(type,decibel)
an1 = aov(decibel~type)
an1

## Call:
##    aov(formula = decibel ~ type)
## 
## Terms:
##                     type Residuals
## Sum of Squares  11463.33   1430.00
## Deg. of Freedom        2        12
## 
## Residual standard error: 10.91635
## Estimated effects may be unbalanced

summary(an1)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## type         2  11463    5732    48.1 1.86e-06 ***
## Residuals   12   1430     119                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

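print("We reject the null hypothesis that the population mean noise levels are equal for small, medium and large automobiles.")

## [1] "We reject the null hypothesis that the population mean noise levels are equal for small, medium and large automobiles."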
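The problem also asks for a comment on the assumptions. One-way ANOVA assumes independent random samples (a property of the design, not checkable from the numbers), normally distributed errors, and a common variance across groups. A minimal sketch of checking the last two with base R `stats` functions, `shapiro.test` on the model residuals and `bartlett.test` for equal variances; the names `fit`, `group_sd`, `sw` and `bt` are illustrative. With only five readings per group these tests have little power, so treat them as rough diagnostics.

```r
# Noise readings (decibels) from the table: five cars of each size
decibel <- c(820, 820, 825, 835, 825,   # small
             840, 825, 815, 855, 840,   # medium
             785, 775, 770, 760, 770)   # large
type <- factor(rep(c("small", "medium", "large"), each = 5),
               levels = c("small", "medium", "large"))
fit <- aov(decibel ~ type)

# Equal-variance assumption: compare the group standard deviations
group_sd <- tapply(decibel, type, sd)
print(round(group_sd, 2))

# Normality of the residuals (Shapiro-Wilk) and homogeneity of variances (Bartlett)
sw <- shapiro.test(residuals(fit))
bt <- bartlett.test(decibel ~ type)
print(c(shapiro_p = sw$p.value, bartlett_p = bt$p.value))
```

A residuals-versus-fitted plot, `plot(fit, which = 1)`, is a useful visual complement to the formal tests.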

The management of a grocery store observes various employees for work productivity. The following table gives the number of customers served per hour by each of its four checkout lanes.

| Lane 1 | Lane 2 | Lane 3 | Lane 4 |
|---|---|---|---|
| 16 | 11 | 8 | 21 |
| 18 | 14 | 12 | 16 |
| 22 | 10 | 17 | 17 |
| 21 | 10 | 10 | 23 |
| 15 | 14 | 13 | 17 |
|  | 10 | 15 |  |

(a) Construct an analysis-of-variance table and interpret the results. Indicate any assumptions that were necessary.

(b) Test whether there is a difference between the mean number of customers served by the four employees at the 0.05 level.

library("onewaytests")

## Warning: package 'onewaytests' was built under R version 4.0.3

work <- c(16,18,22,21,15,11,14,10,10,14,10,8,12,17,10,13,15,21,16,17,23,17)
l1 = "Lane 1"
l2 = "Lane 2"
l3 = "Lane 3"
l4 = "Lane 4"
lane <- c(rep(l1,5),rep(l2,6),rep(l3,6),rep(l4,5))
df = data.frame(lane,work)
an1 = aov(work~lane,df)
an1

## Call:
##    aov(formula = work ~ lane, data = df)
## 
## Terms:
##                 lane Residuals
## Sum of Squares   241       147
## Deg. of Freedom    3        18
## 
## Residual standard error: 2.857738
## Estimated effects may be unbalanced

summary(an1)

##             Df Sum Sq Mean Sq F value  Pr(>F)    
## lane         3    241   80.33   9.837 0.00046 ***
## Residuals   18    147    8.17                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

an1$coefficients

## (Intercept)  laneLane 2  laneLane 3  laneLane 4 
##        18.4        -6.9        -5.9         0.4
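These coefficients use R's default treatment contrasts: the intercept is the mean of the first level (Lane 1) and each remaining coefficient is that lane's mean minus the Lane 1 mean. A short sketch that recomputes the lane means directly to confirm this reading; `fit`, `lane_means` and `from_means` are illustrative names.

```r
# Customers served per hour, one group per lane (table columns)
work <- c(16, 18, 22, 21, 15,          # Lane 1
          11, 14, 10, 10, 14, 10,      # Lane 2
          8, 12, 17, 10, 13, 15,       # Lane 3
          21, 16, 17, 23, 17)          # Lane 4
lane <- factor(rep(c("Lane 1", "Lane 2", "Lane 3", "Lane 4"), times = c(5, 6, 6, 5)))
fit <- aov(work ~ lane)

lane_means <- tapply(work, lane, mean)
print(lane_means)

# Treatment contrasts: intercept = Lane 1 mean,
# every other coefficient = (that lane's mean) - (Lane 1 mean)
from_means <- c(lane_means[1], lane_means[-1] - lane_means[1])
print(cbind(from_means = from_means, from_model = coef(fit)))
```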

print("We reject the null hypothesis and conclude that there is a difference in the mean number of customers served in the four different lanes.")

## [1] "We reject the null hypothesis and conclude that there is a difference in the mean number of customers served in the four different lanes."
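Rejecting the null hypothesis only says that at least one lane mean differs. A minimal follow-up sketch uses Tukey's honest significant difference procedure (`TukeyHSD` from base `stats`) to compare every pair of lanes while holding the family-wise confidence level at 95%; R's implementation includes an adjustment intended for mildly unbalanced designs such as this one (5 or 6 hours per lane). The object name `tk` is illustrative.

```r
work <- c(16, 18, 22, 21, 15,          # Lane 1
          11, 14, 10, 10, 14, 10,      # Lane 2
          8, 12, 17, 10, 13, 15,       # Lane 3
          21, 16, 17, 23, 17)          # Lane 4
lane <- factor(rep(c("Lane 1", "Lane 2", "Lane 3", "Lane 4"), times = c(5, 6, 6, 5)))
fit <- aov(work ~ lane)

# All six pairwise differences with simultaneous 95% intervals and adjusted p-values
tk <- TukeyHSD(fit, "lane", conf.level = 0.95)
print(tk)
```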

A company claims that its medicine, brand A, provides faster relief from pain than another company's medicine, brand B. A random sample from each brand gave the following times (in minutes) to relief. Do the data present sufficient evidence to indicate that there is a difference in the mean time to relief for the two populations?

Brand A: 47 51 45 53 41 55 50 46 45 51 53 50 48

Brand B: 44 48 42 45 44 42 49 46 45 48 39 49

(a) Use the ANOVA approach to test the appropriate hypotheses. Use alpha = 0.01.
(b) What assumptions are necessary for the conclusion in part (a)?

Brand_A <- c(47, 51, 45, 53, 41, 55, 50, 46, 45, 51, 53, 50, 48)
Brand_B <- c(44, 48, 42, 45, 44, 42, 49, 46, 45, 48, 39, 49)
# The samples have unequal sizes (13 and 12), so stack a named list;
# cbind() would recycle Brand_B to 13 values and add a spurious observation.
dfstack <- stack(list(Brand_A = Brand_A, Brand_B = Brand_B))
av1 = aov.test(values~ind, data = dfstack,alpha = 0.01)

## 
##   One-Way Analysis of Variance (alpha = 0.01) 
## ------------------------------------------------------------- 
##   data : values and ind 
## 
##   statistic  : 6.897528 
##   num df     : 1 
##   denom df   : 23 
##   p.value    : 0.01509238 
## 
##   Result     : Difference is not statistically significant. 
## -------------------------------------------------------------

summary(av1)

##           Length Class      Mode     
## statistic 1      -none-     numeric  
## parameter 2      -none-     numeric  
## p.value   1      -none-     numeric  
## alpha     1      -none-     numeric  
## method    1      -none-     character
## data      2      data.frame list     
## formula   3      formula    call

print("We fail to reject the null hypothesis that there is no difference in the mean time to relief for the two populations.")

## [1] "We fail to reject the null hypothesis that there is no difference in the mean time to relief for the two populations."

print("We assume the two samples are independent random samples from normally distributed populations with approximately equal variances.")

## [1] "We assume the two samples are independent random samples from normally distributed populations with approximately equal variances."
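With exactly two groups, the one-way ANOVA F test is equivalent to the two-sided pooled-variance t test, with F = t^2 and identical p-values. A short check with base R's `t.test(..., var.equal = TRUE)`, using the 13 Brand A and 12 Brand B times exactly as listed in the problem; `tt`, `fit`, `F_stat` and `p_anova` are illustrative names.

```r
Brand_A <- c(47, 51, 45, 53, 41, 55, 50, 46, 45, 51, 53, 50, 48)
Brand_B <- c(44, 48, 42, 45, 44, 42, 49, 46, 45, 48, 39, 49)

# Pooled two-sample t test (equal-variance assumption)
tt <- t.test(Brand_A, Brand_B, var.equal = TRUE)

# One-way ANOVA on the same 25 observations
fit <- aov(values ~ ind, data = stack(list(Brand_A = Brand_A, Brand_B = Brand_B)))
F_stat  <- summary(fit)[[1]][["F value"]][1]
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

print(c(t_squared = unname(tt$statistic)^2, F = F_stat))
print(c(p_t = tt$p.value, p_anova = p_anova))
```

Both routes give F = t^2 of about 6.90 on 1 and 23 degrees of freedom and p of about 0.015, which is above alpha = 0.01.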

In order to test the wear on four hyper-alloys, a test piece of each alloy was extracted from each of the three positions of a test machine. The reduction of weight in milligrams due to wear was determined on each piece, and the data are given in the following table.

| Alloy type | Position 1 | Position 2 | Position 3 |
|---|---|---|---|
| 1 | 241 | 270 | 274 |
| 2 | 195 | 241 | 218 |
| 3 | 235 | 273 | 230 |
| 4 | 234 | 236 | 227 |

At alpha = 0.05, test the following hypotheses, regarding the positions as blocks:

There is no difference in average wear for each material.
There is no difference in average wear for each position.
Interpret your final result and state any assumptions that were necessary to solve the problem.

pos1 <- c(241,195,235,234)
pos2 <- c(270,241,273,236)
pos3 <- c(274,218,230,227)
alt1 <- c(241,270,274)
alt2 <- c(195,241,218)
alt3 <- c(235,273,230)
alt4 <- c(234,236,227)
dfcombpos <- data.frame(cbind(pos1,pos2,pos3))
dfstackpos <- stack(dfcombpos)
dfcombalt <- data.frame(cbind(alt1,alt2,alt3,alt4))
dfstackalt <- stack(dfcombalt)
summary(aov.test(values~ind,data = dfstackalt))

## 
##   One-Way Analysis of Variance (alpha = 0.05) 
## ------------------------------------------------------------- 
##   data : values and ind 
## 
##   statistic  : 2.932027 
##   num df     : 3 
##   denom df   : 8 
##   p.value    : 0.09945922 
## 
##   Result     : Difference is not statistically significant. 
## -------------------------------------------------------------

##           Length Class      Mode     
## statistic 1      -none-     numeric  
## parameter 2      -none-     numeric  
## p.value   1      -none-     numeric  
## alpha     1      -none-     numeric  
## method    1      -none-     character
## data      2      data.frame list     
## formula   3      formula    call

summary(aov.test(values~ind,data = dfstackpos))

## 
##   One-Way Analysis of Variance (alpha = 0.05) 
## ------------------------------------------------------------- 
##   data : values and ind 
## 
##   statistic  : 1.755474 
##   num df     : 2 
##   denom df   : 9 
##   p.value    : 0.2271356 
## 
##   Result     : Difference is not statistically significant. 
## -------------------------------------------------------------

##           Length Class      Mode     
## statistic 1      -none-     numeric  
## parameter 2      -none-     numeric  
## p.value   1      -none-     numeric  
## alpha     1      -none-     numeric  
## method    1      -none-     character
## data      2      data.frame list     
## formula   3      formula    call

print("We fail to reject the null hypotheses in both cases.")

## [1] "We fail to reject the null hypotheses in both cases."
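Because the problem asks for the positions to be treated as blocks, a randomized block (two-way additive) model may be more appropriate than two separate one-way ANOVAs: it removes the position-to-position variation from the error term before testing the alloys. A minimal sketch, assuming one observation per alloy-position cell and no alloy-by-position interaction; `fit_rbd`, `tab` and `crit` are illustrative names.

```r
# Wear (mg) from the alloy table, entered row by row: alloy 1 at positions 1-3, then alloy 2, ...
wear <- c(241, 270, 274,
          195, 241, 218,
          235, 273, 230,
          234, 236, 227)
alloy    <- factor(rep(1:4, each = 3))      # treatment
position <- factor(rep(1:3, times = 4))     # block
fit_rbd  <- aov(wear ~ alloy + position)    # additive model: no interaction term
tab <- summary(fit_rbd)[[1]]
print(tab)

# Compare each F statistic with its 5% critical value
crit <- qf(0.95, df1 = tab[["Df"]][1:2], df2 = tab[["Df"]][3])
print(data.frame(F = tab[["F value"]][1:2], critical = crit,
                 row.names = c("alloy", "position")))
```

Under this additive block model the alloy F statistic (about 5.35 on 3 and 6 df) exceeds its 5% critical value (about 4.76), while the position F statistic (about 4.30 on 2 and 6 df) stays below its critical value (about 5.14), so the conclusion about the alloys can differ from the one obtained with separate one-way tests.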

To see the effect of hours of sleep on tests of different skill categories (vocabulary, reasoning, and arithmetic), tests consisting of 20 questions in each category were given to 16 students, four in each group defined by the hours of sleep they had on the previous night. Each right answer earns one point. The following table gives the cumulative score of the four students in each group for each category.

| Hours of sleep | Vocabulary | Reasoning | Arithmetic |
|---|---|---|---|
| 0 | 44 | 33 | 35 |
| 4 | 54 | 38 | 18 |
| 6 | 48 | 42 | 43 |
| 8 | 55 | 52 | 50 |

Test at the 0.05 level whether the true mean performance for different hours of sleep is the same.
Also, test at the 0.05 level whether the true mean performance for each category of the test is the same.

vocab <- c(44,54,48,55)
reas <- c(33,38,42,52)
arth <- c(35,18,43,50)
h0 <- c(44,33,35)
h4 <- c(54,38,18)
h6 <- c(48,42,43)
h8 <- c(55,52,50)
dfcombcat <- data.frame(cbind(vocab,reas,arth))
dfstackcat <- stack(dfcombcat)
dfcombhour <- data.frame(cbind(h0,h4,h6,h8))
dfstackhour <- stack(dfcombhour)
summary(aov.test(values~ind,data = dfstackhour))

## 
##   One-Way Analysis of Variance (alpha = 0.05) 
## ------------------------------------------------------------- 
##   data : values and ind 
## 
##   statistic  : 1.707706 
##   num df     : 3 
##   denom df   : 8 
##   p.value    : 0.2421949 
## 
##   Result     : Difference is not statistically significant. 
## -------------------------------------------------------------

##           Length Class      Mode     
## statistic 1      -none-     numeric  
## parameter 2      -none-     numeric  
## p.value   1      -none-     numeric  
## alpha     1      -none-     numeric  
## method    1      -none-     character
## data      2      data.frame list     
## formula   3      formula    call

summary(aov.test(values~ind,data = dfstackcat))

## 
##   One-Way Analysis of Variance (alpha = 0.05) 
## ------------------------------------------------------------- 
##   data : values and ind 
## 
##   statistic  : 2.079041 
##   num df     : 2 
##   denom df   : 9 
##   p.value    : 0.1810192 
## 
##   Result     : Difference is not statistically significant. 
## -------------------------------------------------------------

##           Length Class      Mode     
## statistic 1      -none-     numeric  
## parameter 2      -none-     numeric  
## p.value   1      -none-     numeric  
## alpha     1      -none-     numeric  
## method    1      -none-     character
## data      2      data.frame list     
## formula   3      formula    call
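Each hours-by-category cell holds a single score, so both factors can also be tested together in a two-way additive model, which uses the leftover (interaction) variation as the error term. A minimal sketch, assuming no hours-by-category interaction; `fit_2way` and `tab` are illustrative names.

```r
# Cumulative scores, entered row by row from the table (hours 0, 4, 6, 8)
score <- c(44, 33, 35,
           54, 38, 18,
           48, 42, 43,
           55, 52, 50)
hours    <- factor(rep(c(0, 4, 6, 8), each = 3))
category <- factor(rep(c("Vocabulary", "Reasoning", "Arithmetic"), times = 4))

# Additive two-way model: hours and category effects, no interaction
fit_2way <- aov(score ~ hours + category)
tab <- summary(fit_2way)[[1]]
print(tab)
```

Here both F statistics (about 2.66 for hours on 3 and 6 df, about 3.23 for category on 2 and 6 df) stay below their 5% critical values (about 4.76 and 5.14), so this analysis agrees with the one-way conclusions.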

The following table gives lower limits of income (in thousands of dollars, rounded to the nearest $1000 and calculated as of March of the following year) of the top 5% of U.S. households by race from 1994 to 1998.

| Race | 1994 | 1995 | 1996 | 1997 | 1998 |
|---|---|---|---|---|---|
| All Races | 110 | 113 | 120 | 127 | 132 |
| Caucasian | 113 | 117 | 123 | 130 | 136 |
| African | 81 | 80 | 85 | 87 | 94 |
| Hispanic | 82 | 80 | 86 | 93 | 98 |

Test at the 0.05 level whether the true lower limits of income for the top 5% of U.S. households for each race are the same for all 5 years.
Also, test at the 0.05 level that the true income lower limits of the top 5% of U.S. households for each year between 1994 and 1998 are the same.

rall <- c(110,113,120,127,132)
rc <- c(113,117,123,130,136)
raf <- c(81,80,85,87,94)
rh <- c(82,80,86,93,98)
y4 <- c(110,113,81,82)
y5 <- c(113,117,80,80)
y6 <- c(120,123,85,86)
y7 <- c(127,130,87,93)
y8 <- c(132,136,94,98)
dfcombrace <- data.frame(cbind(rall,rc,raf,rh))
dfstackrace <- stack(dfcombrace)
dfcombyear <- data.frame(cbind(y4,y5,y6,y7,y8))
dfstackyear <- stack(dfcombyear)
summary(aov.test(values~ind,dfstackrace))

## 
##   One-Way Analysis of Variance (alpha = 0.05) 
## ------------------------------------------------------------- 
##   data : values and ind 
## 
##   statistic  : 32.34952 
##   num df     : 3 
##   denom df   : 16 
##   p.value    : 5.025472e-07 
## 
##   Result     : Difference is statistically significant. 
## -------------------------------------------------------------

##           Length Class      Mode     
## statistic 1      -none-     numeric  
## parameter 2      -none-     numeric  
## p.value   1      -none-     numeric  
## alpha     1      -none-     numeric  
## method    1      -none-     character
## data      2      data.frame list     
## formula   3      formula    call

summary(aov.test(values~ind,dfstackyear))

## 
##   One-Way Analysis of Variance (alpha = 0.05) 
## ------------------------------------------------------------- 
##   data : values and ind 
## 
##   statistic  : 0.5778645 
##   num df     : 4 
##   denom df   : 15 
##   p.value    : 0.6831546 
## 
##   Result     : Difference is not statistically significant. 
## -------------------------------------------------------------

##           Length Class      Mode     
## statistic 1      -none-     numeric  
## parameter 2      -none-     numeric  
## p.value   1      -none-     numeric  
## alpha     1      -none-     numeric  
## method    1      -none-     character
## data      2      data.frame list     
## formula   3      formula    call
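Since every race-year cell holds one value, race and year can also be tested together in a two-way additive model, with the race-by-year variation (12 degrees of freedom) serving as the error term. A minimal sketch, assuming no race-by-year interaction; `fit_2way` and `tab` are illustrative names.

```r
# Income lower limits ($1000s), entered row by row from the table (years 1994-1998)
income <- c(110, 113, 120, 127, 132,
            113, 117, 123, 130, 136,
             81,  80,  85,  87,  94,
             82,  80,  86,  93,  98)
race <- factor(rep(c("All Races", "Caucasian", "African", "Hispanic"), each = 5))
year <- factor(rep(1994:1998, times = 4))

# Additive two-way model: race and year effects, no interaction
fit_2way <- aov(income ~ race + year)
tab <- summary(fit_2way)[[1]]
print(tab)
```

Under this model both factors are significant (F of about 429 for race and about 50 for year), whereas the separate one-way test for years was not: in the one-way analysis the large differences between races sit in the error term and mask the steady year-to-year increase.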

Aritra Halder

2/18/2021