Stats Midterm

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.4     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

## Question 1

A defense contractor is interested in studying an inspection process to detect failure or fatigue of transformer parts. Three levels of inspections are used by three randomly chosen inspectors. Five lots are used for each combination in the study. The factor levels are given in the data. The response is in failures per 1000 pieces. The data is available as the text file inspectionData.txt on Blackboard.

Inspection.data <- read.table(header = TRUE, text ="
Inspector   insplevel   failures
A   full    7.50
A   full    7.42
A   full    5.85
A   full    5.89
A   full    5.35
A   reduced 7.08
A   reduced 6.17
A   reduced 5.65
A   reduced 5.30
A   reduced 5.02
A   commercial  6.15
A   commercial  5.52
A   commercial  5.48
A   commercial  5.48
A   commercial  5.98
B   full    7.58
B   full    6.52
B   full    6.54
B   full    5.64
B   full    5.12
B   reduced 7.68
B   reduced 5.86
B   reduced 5.28
B   reduced 5.38
B   reduced 4.87
B   commercial  6.17
B   commercial  6.20
B   commercial  5.44
B   commercial  5.75
B   commercial  5.68
C   full    7.70
C   full    6.82
C   full    6.42
C   full    5.39
C   full    5.35
C   reduced 7.19
C   reduced 6.19
C   reduced 5.85
C   reduced 5.35
C   reduced 5.01
C   commercial  6.21
C   commercial  5.66
C   commercial  5.36
C   commercial  5.90
C   commercial  6.12")

ggplot(Inspection.data, aes(x=interaction(Inspector,insplevel, sep=":"), 
                       y=failures,
                       fill=interaction(Inspector,insplevel, sep=":"))) +
  geom_boxplot(show.legend = FALSE) +
  theme_minimal()

Write an appropriate model, with assumptions. Include a Hasse diagram.

Our model equation is: \(y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}\)

where,

\(y_{ijk}\) is the response for lot k under treatment combination \(a_i\) x \(b_j\)

\(\mu\) is the overall mean

\(\alpha_i\) is the effect of treatment \(a_i\)

\(\beta_j\) is the effect of treatment \(b_j\)

\((\alpha\beta)_{ij}\) is the interaction between the treatments

\(\epsilon_{ijk}\) is the random error, assumed independent and normally distributed with mean 0 and constant variance \(\sigma^2\).

Use analysis of variance to test the appropriate hypothesis for inspector, inspection level, and interaction

aovInspec <- summary(aov(failures ~ Inspector * insplevel, data = Inspection.data))
aovInspec

##                     Df Sum Sq Mean Sq F value Pr(>F)
## Inspector            2  0.025  0.0126   0.020  0.981
## insplevel            2  2.587  1.2937   2.015  0.148
## Inspector:insplevel  4  0.094  0.0236   0.037  0.997
## Residuals           36 23.114  0.6421

We fail to reject any of the three null hypotheses (no inspector effect, no inspection-level effect, and no interaction) at the \(\alpha\) = .05 level. All three p-values are above .05. insplevel has the smallest p-value (0.148), but that is still well short of statistical significance.
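As a quick check on the decision above, the F ratios and p-values can be recomputed from the mean squares printed in the ANOVA table. This is only a sketch: the numbers are copied by hand from the output, and the variable names are made up for illustration.

```r
# Mean squares and degrees of freedom copied from the ANOVA table above.
mean_sq <- c(Inspector = 0.0126, insplevel = 1.2937, "Inspector:insplevel" = 0.0236)
df_effect <- c(2, 2, 4)
mse <- 0.6421       # residual mean square
df_error <- 36      # residual degrees of freedom

# Each F ratio is the effect mean square over the residual mean square.
f_values <- mean_sq / mse

# Upper-tail probability of the F distribution gives the p-value.
p_values <- pf(f_values, df_effect, df_error, lower.tail = FALSE)

print(round(cbind(F = f_values, p = p_values), 3))

# TRUE would mean "reject the null at the .05 level".
reject <- p_values < 0.05
print(reject)
```

Because the inputs are rounded table values, the results should agree with the printed table only to about three decimal places.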

## Question 2

In a study conducted by the Department of Health and Physical Education at the Virginia Polytechnic Institute and State University, 3 diets were assigned for a period of 3 days to each of 6 subjects in a randomized block design. The subjects, playing the role of blocks, were assigned the following three diets, in random order:
Diet 1: mixed fat and carbohydrates
Diet 2: high fat
Diet 3: high carbohydrates
At the end of the 3-day period each subject was put on a treadmill and the time to exhaustion, in seconds, was measured. The data is available as the text file treadmillExperiment.txt on Blackboard.

Perform an analysis of variance, separating out the diet, subject, and error sum of squares. Use a p-value to determine whether there was a difference among the diets.

diet.data <- read.table(header = TRUE, text ="
subject diet    time
1   1   84
1   2   91
1   3   122
2   1   35
2   2   48
2   3   53
3   1   91
3   2   71
3   3   110
4   1   57
4   2   45
4   3   71
5   1   56
5   2   61
5   3   91
6   1   45
6   2   61
6   3   122
")

dietaov <- aov(time ~ factor(subject) + factor(diet), data = diet.data)
summary(dietaov)

##                 Df Sum Sq Mean Sq F value  Pr(>F)   
## factor(subject)  5   6033  1206.7   6.661 0.00559 **
## factor(diet)     2   4297  2148.5  11.859 0.00229 **
## Residuals       10   1812   181.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

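The sums of squares separate into 6033 for subjects, 4297 for diets, and 1812 for error. The diet effect is significant (F = 11.859 on 2 and 10 df, p = 0.00229), so at the .05 level we reject the hypothesis that the three diets give the same mean time to exhaustion. The subject (block) effect is also significant (p = 0.00559).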
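The decomposition can also be done by hand, which makes it clear where each sum of squares comes from. The sketch below re-enters the 18 treadmill times from the data above (ordered by subject, then diet) and rebuilds the subject, diet, and error sums of squares from the group means. The variable names are illustrative, not part of the original analysis.

```r
# Treadmill times from the data above, ordered by subject then diet.
time <- c(84, 91, 122,  35, 48, 53,  91, 71, 110,
          57, 45, 71,   56, 61, 91,  45, 61, 122)
subject <- factor(rep(1:6, each = 3))
diet    <- factor(rep(1:3, times = 6))

grand_mean <- mean(time)

# Each subject mean is based on 3 observations, each diet mean on 6.
ss_subject <- 3 * sum((tapply(time, subject, mean) - grand_mean)^2)
ss_diet    <- 6 * sum((tapply(time, diet, mean) - grand_mean)^2)
ss_total   <- sum((time - grand_mean)^2)
ss_error   <- ss_total - ss_subject - ss_diet

# F test for diets: 2 numerator df, (6 - 1) * (3 - 1) = 10 error df.
f_diet <- (ss_diet / 2) / (ss_error / 10)
p_diet <- pf(f_diet, 2, 10, lower.tail = FALSE)

print(c(SS_subject = ss_subject, SS_diet = ss_diet, SS_error = ss_error))
print(c(F = f_diet, p = p_diet))
```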

interaction.plot(x.factor=diet.data$diet,
                 trace.factor=diet.data$subject,
                 response=diet.data$time,
                 fun=mean,
                 xlab="Diet type", ylab="Time until Exhaustion",
                 trace.label = "Subject #",
                 col=c("red", "blue", "orange", "green", "purple", "pink"), lty=1, lwd=2)

The interaction plot shows that the 3rd diet (high carbohydrates) gave the longest time to exhaustion for every subject. This is plausible because the body can break down carbohydrates and use them for energy faster than it can proteins or fats.

## Question 3

Balanced Incomplete Blocks (20) Design an experiment in which two factors are to be investigated, one at two levels and one at three levels. The experiment must be run in blocks, with no more than four test runs per block. Up to 20 blocks can be used in the experiment. Give a simple description of how to select treatment combinations for each block. Balanced incomplete block designs are discussed in Oehlert, Chapter 14.

Block 1 A1 B1 B2
Block 2 B3 A2 B2
Block 3 A1 A2 B1
Block 4 B3 B1 B2
Block 5 B3 A1 A2
Block 6 B3 B1 A2
Block 7 A1 B2 B3
Block 8 B2 A2 B1
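One construction that meets the stated constraints treats the six A x B combinations (two levels of A crossed with three levels of B) as the treatments and uses every subset of four of them as a block. That gives 15 blocks of size four, within the limit of 20. The sketch below builds those blocks and checks the balance conditions. The labels such as A1B1 and the variable names are assumptions made for illustration.

```r
# Six treatment combinations: 2 levels of A crossed with 3 levels of B.
treatments <- as.vector(outer(paste0("A", 1:2), paste0("B", 1:3), paste0))

# Every subset of 4 treatments is one block (one column per block).
blocks <- combn(treatments, 4)

# Balance check 1: each treatment appears in the same number of blocks.
replications <- table(blocks)

# Balance check 2: each pair of treatments shares the same number of blocks.
pairs <- combn(treatments, 2)
pair_counts <- apply(pairs, 2, function(p) {
  sum(apply(blocks, 2, function(b) all(p %in% b)))
})

print(t(blocks))
print(replications)
print(unique(pair_counts))
```

In practice the order of the four runs within each block, and the order of the blocks, would be randomized.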

## Question 4

Consider a \(2^{6−2}\) fractional factorial using I = ABDF = −BCDE.

Find the aliases of the main effects.

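Multiplying the two generators gives the third word of the defining relation, (ABDF)(−BCDE) = −ACEF, so I = ABDF = −BCDE = −ACEF. Multiplying each main effect by these three words gives its aliases:

A = BDF = −ABCDE = −CEF

B = ADF = −CDE = −ABCEF

C = ABCDF = −BDE = −AEF

D = ABF = −BCE = −ACDEF

E = ABDEF = −BCD = −ACF

F = ABD = −BCDEF = −ACE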
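Alias chains follow from multiplying effect words, with any squared letter dropping out. The sketch below does this for the stated defining relation I = ABDF = −BCDE and tracks the signs separately. The helper `multiply_words` is a hypothetical name written for this example, not a function from any package.

```r
# Multiply two effect "words": letters that appear in both cancel (X^2 = I).
multiply_words <- function(w1, w2) {
  l1 <- strsplit(w1, "")[[1]]
  l2 <- strsplit(w2, "")[[1]]
  paste(sort(c(setdiff(l1, l2), setdiff(l2, l1))), collapse = "")
}

# Generators from the question: I = ABDF = -BCDE (signs stored separately).
words <- c("ABDF", "BCDE")
signs <- c(1, -1)

# The third word of the defining relation is the product of the generators.
words <- c(words, multiply_words(words[1], words[2]))
signs <- c(signs, signs[1] * signs[2])

# Aliases of each main effect: multiply it by every word in the relation.
alias_table <- sapply(c("A", "B", "C", "D", "E", "F"), function(effect) {
  aliased <- sapply(words, function(w) multiply_words(effect, w))
  paste0(ifelse(signs < 0, "-", ""), aliased, collapse = " = ")
})

print(alias_table)
```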

Find the factor-level combinations used.

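The design uses \(2^{6-2}\) = 16 of the 64 possible factor-level combinations: the full \(2^4\) factorial in A, B, C, and D, with the levels of the other two factors set by F = ABD and E = −BCD, so that ABDF = +1 and BCDE = −1 in every run.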
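The 16 runs of a quarter fraction with I = ABDF = −BCDE can be generated by writing out a full factorial in four basic factors and deriving the other two from the generators. The sketch below assumes A, B, C, and D are taken as the basic factors and codes levels as −1 and +1. The object name `design` is illustrative.

```r
# Full 2^4 factorial in the basic factors A, B, C, D (coded -1 / +1).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))

# Generated factors implied by I = ABDF and I = -BCDE.
design$E <- -design$B * design$C * design$D
design$F <-  design$A * design$B * design$D

# Check the defining relation on every run.
abdf <- with(design, A * B * D * F)   # should be +1 everywhere
bcde <- with(design, B * C * D * E)   # should be -1 everywhere

print(design)
print(c(runs = nrow(design), ABDF = unique(abdf), BCDE = unique(bcde)))
```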

Show how you would block these combinations into two blocks of size eight.

To form two blocks of eight, confound one effect with blocks, choosing an interaction that is not aliased with any main effect or two-factor interaction. ABC works: its aliases are ABC = CDF = −ADE = −BEF, all three-factor interactions. The eight runs with ABC = +1 go in one block and the eight runs with ABC = −1 go in the other.
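A common way to split a fractional factorial into two blocks is to pick one interaction to confound with blocks and assign runs by its sign. The sketch below uses ABC as the confounded effect, which is an assumption chosen because its aliases under I = ABDF = −BCDE are all three-factor interactions. The runs are rebuilt here so the block stands alone, and the names are illustrative.

```r
# Rebuild the 16 runs: full factorial in A-D, with E = -BCD and F = ABD.
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
design$E <- -design$B * design$C * design$D
design$F <-  design$A * design$B * design$D

# Confound ABC with blocks: the sign of A*B*C decides the block.
abc <- with(design, A * B * C)
design$block <- ifelse(abc == 1, 1, 2)

# Each block should contain eight runs.
block_sizes <- table(design$block)
print(block_sizes)
print(design[order(design$block), ])
```

The run order within each block would then be randomized before the experiment is carried out.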

## Question 5

An industrial engineer is studying the hand insertion of electronic components on printed circuit boards to improve the speed of the assembly operation. He has designed three assembly fixtures and two workplace layouts that seem promising. Operators are required to perform the assembly, and it is decided to randomly select four operators for each fixture-layout combination. However, because the workplaces are in different locations the four operators chosen for layout 1 are different individuals from the four operators chosen for layout 2. Because there are only three fixtures and two layouts, but the operators are chosen at random, this is a mixed model. The treatment combinations in this design are run in random order and two replicates are obtained.

Write the linear model for this design, with the effects denoted

\(\mu\) is the grand mean

\(T_i\) is the effect of the ith fixture

\(\beta_j\) is the effect of the jth layout

\(\gamma_{k(j)}\) is the effect of the kth operator within the jth level of layout

\((T\beta)_{ij}\) is the fixture-layout interaction

\((T\gamma)_{ik(j)}\) is the fixture-operator interaction within layout

\(\epsilon_{(ijk)l}\) is the error term for the lth replicate.

\(y_{ijkl} = \mu+T_i+\beta_j+\gamma_{k(j)}+(T\beta)_{ij}+(T\gamma)_{ik(j)}+\epsilon_{(ijk)l}\), with i = 1, 2, 3; j = 1, 2; k = 1, ..., 4; l = 1, 2.

Why is there no \((\beta\gamma)\) interaction effect?

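There is no layout-operator interaction because Operator is nested in Layout. Each operator works at only one layout, so operators and layouts are not crossed and their interaction cannot be estimated.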

Construct a Hasse diagram for the design, including the number of effects and degrees of freedom.
Assembly times from the experiment are given here:

Use ANOVA to find the significant effects, based on your model above. Verify your degrees of freedom, comparing your Hasse diagram and the results of the ANOVA.

library(readr)
test1dataq5 <- read_csv("test1dataq5.csv")

## Rows: 48 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): Operator, Layout, Fixture, Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(test1dataq5)

nested <- aov(Time ~ factor(Fixture) + factor(Layout) + factor(Fixture) * factor(Layout) +
                factor(Layout) / factor(Operator),
              data = as.data.frame(test1dataq5))
summary(nested)

##                                 Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(Fixture)                  2  82.79   41.40  12.232 8.84e-05 ***
## factor(Layout)                   1   4.08    4.08   1.207  0.27931    
## factor(Fixture):factor(Layout)   2  19.04    9.52   2.813  0.07325 .  
## factor(Layout):factor(Operator)  6  71.92   11.99   3.542  0.00738 ** 
## Residuals                       36 121.83    3.38                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
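The degrees of freedom implied by the linear model can be tallied from the design sizes and compared with the table above. This sketch assumes 3 fixtures, 2 layouts, 4 operators within each layout, and 2 replicates, as described in the question. The variable names are illustrative.

```r
n_fixture   <- 3  # fixtures
n_layout    <- 2  # layouts
n_operator  <- 4  # operators within each layout
n_replicate <- 2  # replicates per fixture-layout-operator cell

df_model <- c(
  "Fixture"                  = n_fixture - 1,
  "Layout"                   = n_layout - 1,
  "Operator(Layout)"         = n_layout * (n_operator - 1),
  "Fixture:Layout"           = (n_fixture - 1) * (n_layout - 1),
  "Fixture:Operator(Layout)" = (n_fixture - 1) * n_layout * (n_operator - 1),
  "Error"                    = n_fixture * n_layout * n_operator * (n_replicate - 1)
)
print(df_model)

# Total df should be one less than the 48 observations.
total_df <- sum(df_model)

# The aov call above has no Fixture:Operator(Layout) term, so that term's df
# is pooled with the error df in the Residuals line.
pooled_residual_df <- df_model[["Fixture:Operator(Layout)"]] + df_model[["Error"]]
print(c(total = total_df, pooled_residual = pooled_residual_df))
```

The first four entries match the Df column of the table, and the pooled value matches its Residuals line.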

What is the estimated variance contribution for the operators factor?

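The mean square for operators within layout is 11.99 (this is not the MSE, which is 3.38). Each operator contributes 3 fixtures x 2 replicates = 6 observations, so using the table above, in which the fixture-operator interaction is pooled into the residual, the estimated variance component is \(\hat\sigma^2_\gamma = (11.99 - 3.38)/6 \approx 1.43\).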
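A method-of-moments estimate of the operator variance component subtracts the denominator mean square from the operator mean square and divides by the number of observations per operator. The sketch below uses the mean squares from the ANOVA table above and assumes the residual mean square is the appropriate denominator, which holds only for the model as fitted (without a separate fixture-operator term).

```r
# Values copied from the ANOVA table above.
ms_operator <- 11.99   # factor(Layout):factor(Operator) mean square
ms_residual <- 3.38    # residual mean square

# Each operator is observed with 3 fixtures x 2 replicates.
obs_per_operator <- 3 * 2

# Method-of-moments estimate of the operator variance component.
sigma2_operator <- (ms_operator - ms_residual) / obs_per_operator
print(sigma2_operator)
```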

Cray Lester

2023-02-28
