Final Examination - Stat and DOE Using R

1. The amount of time that it takes to complete a certain job is known to be Normally distributed with a mean of 10 minutes and a standard deviation of 1 minute.

Plot the probability density function corresponding to job completion times

curve(dnorm(x,10,1),0,20)

What is the probability a randomly selected job will be completed in less than 11 minutes?

pnorm(11,10,1)

## [1] 0.8413447

There is about an 84.1% chance that a randomly selected job is completed in less than 11 minutes.
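The same probability can be cross-checked by standardizing the cutoff. This is a minimal sketch in base R; the variable names are illustrative and not part of the original answer.

```r
# P(X < 11) for X ~ Normal(mean = 10, sd = 1), computed two equivalent ways
p_direct <- pnorm(11, mean = 10, sd = 1)

z <- (11 - 10) / 1        # standardized cutoff: (x - mean) / sd
p_standard <- pnorm(z)    # standard Normal CDF evaluated at z = 1

# Complement: probability that a job takes longer than 11 minutes
p_longer <- 1 - p_direct

c(direct = p_direct, standardized = p_standard, longer = p_longer)
```

Both routes give the same value because `pnorm()` with `mean` and `sd` arguments standardizes internally.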

2. A critical measurement on the diameter of a part that is used in a subassembly is assumed to have a mean of 10mm. The management would like to test this hypothesis against the alternative that it is not equal to 10mm at an alpha= 0.05 level of significance. Towards this end, they have collected a random sample of n=100 parts and measured their diameter.

dataq2<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/diameter.csv",header=TRUE,na.strings="")

Generate a histogram and boxplot of the collected measurements

Histogram

hist(dataq2$Diameter,main="Parts Measurements",xlab="Diameter",ylab="Count",col="orange")

Boxplot

boxplot(dataq2$Diameter, main="Parts Measurements")

State the null and alternative hypothesis, perform the test, and state conclusions.

\(H_0\): \(\mu = 10\)
\(H_a\): \(\mu \neq 10\)

t.test(dataq2$Diameter, mu=10, alternative ="two.sided", conf.level = 0.05)

## 
##  One Sample t-test
## 
## data:  dataq2$Diameter
## t = 7.6839, df = 99, p-value = 1.134e-11
## alternative hypothesis: true mean is not equal to 10
## 5 percent confidence interval:
##  10.16899 10.17177
## sample estimates:
## mean of x 
##  10.17038

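At the 5% significance level, we reject the null hypothesis that the true mean diameter is 10 mm (t = 7.6839, df = 99, p-value = 1.134e-11 < 0.05); the sample mean is 10.17 mm. Note that conf.level in t.test() is the confidence level, not alpha, so the call above reports a 5 percent interval; a 95 percent interval requires conf.level = 0.95. The t statistic and p-value are not affected by this setting.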
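The decision can also be cross-checked without the raw data. The sketch below uses only the rounded figures printed in the t.test() output (sample mean, t statistic, degrees of freedom), so the resulting standard error and 95% interval are approximations. The variable names are illustrative.

```r
# Summary figures copied from the one-sample t-test output
xbar  <- 10.17038   # sample mean
t_obs <- 7.6839     # reported t statistic
df    <- 99         # n - 1 with n = 100
mu0   <- 10         # hypothesized mean

# Standard error implied by t = (xbar - mu0) / se
se <- (xbar - mu0) / t_obs

# Two-sided critical value for alpha = 0.05
t_crit <- qt(0.975, df)

# Approximate 95% confidence interval for the mean diameter
ci95 <- xbar + c(-1, 1) * t_crit * se

# Two-sided p-value recomputed from the rounded t statistic
p_val <- 2 * pt(abs(t_obs), df, lower.tail = FALSE)

round(ci95, 3)
abs(t_obs) > t_crit   # TRUE means H0 is rejected at alpha = 0.05
```

The interval runs from roughly 10.13 to 10.21 mm and does not contain 10, which agrees with rejecting the null hypothesis.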

3. Researchers at a textile production facility would like to test the hypothesis that the mean breaking strength of abraded fabric is different from that of unabraded fabric at an alpha=0.10 level of significance. Towards this end, they conducted an experiment in which they measured the breaking force of 8 samples of each type of fabric. Assume the populations are approximately Normally distributed and use a two-sample t-test with a pooled variance.

dataq3<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/Fabric.csv",header=TRUE,na.strings="")

Generate a side-by-side boxplot of the collected measurements on the breaking strength of abraded and unabraded fabric.

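# The first column holds the abraded sample; its header is read as ï..Abraided because the CSV starts with a byte-order mark
boxplot(dataq3[[1]],dataq3$Unabraided, main="Collected measurements", names=c("Abraided","Unabraided"),ylab="")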

State the null and alternative hypothesis, perform the test, and state conclusions.
\(H_0\): \(\mu_1 - \mu_2 = 0\)
\(H_a\): \(\mu_1 - \mu_2 \neq 0\)

t.test(dataq3$ï..Abraided,dataq3$Unabraided,var.equal = TRUE, conf.level= 0.10)

## 
##  Two Sample t-test
## 
## data:  dataq3$ï..Abraided and dataq3$Unabraided
## t = -1.3729, df = 14, p-value = 0.1914
## alternative hypothesis: true difference in means is not equal to 0
## 10 percent confidence interval:
##  -7.871082 -6.528918
## sample estimates:
## mean of x mean of y 
##    36.375    43.575

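At the 10% significance level, we fail to reject the null hypothesis that the mean breaking strengths of the two fabrics are equal (t = -1.3729, df = 14, p-value = 0.1914 > 0.10). The data do not provide sufficient evidence that the mean breaking strength of abraded fabric differs from that of unabraded fabric. Because conf.level in t.test() is the confidence level rather than alpha, the call above reports a 10 percent interval; a 90 percent interval requires conf.level = 0.90. The test statistic and p-value are not affected.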
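As a rough check on this conclusion, the pooled test can be reconstructed from the printed group means, t statistic, and degrees of freedom. The sketch assumes only those rounded values, so the standard error and the 90% interval are approximate. The variable names are illustrative.

```r
# Summary figures copied from the two-sample t-test output
mean_abraded   <- 36.375
mean_unabraded <- 43.575
t_obs <- -1.3729
df    <- 14          # 8 + 8 - 2 for the pooled-variance test

diff_means <- mean_abraded - mean_unabraded   # observed difference in means

# Standard error of the difference implied by t = diff / se
se <- diff_means / t_obs

# Two-sided critical value for alpha = 0.10
t_crit <- qt(0.95, df)

# Approximate 90% confidence interval for mu_1 - mu_2
ci90 <- diff_means + c(-1, 1) * t_crit * se

# Two-sided p-value recomputed from the rounded t statistic
p_val <- 2 * pt(abs(t_obs), df, lower.tail = FALSE)

round(ci90, 2)
abs(t_obs) > t_crit   # FALSE means H0 is not rejected at alpha = 0.10
```

The approximate 90% interval contains 0, which is consistent with failing to reject the null hypothesis.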

4. Consider a designed experiment in which the crop yield was measured at 2 levels of crop density/spacing (1=dense, 2=sparse) and 3 levels of fertilizer (1=typeA, 2=typeB, 3=typeC). A total of 96 observations were collected. A colleague of yours did some preliminary analysis of the data in R using the following code

dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/cropdata2.csv")
str(dat)

## 'data.frame':    96 obs. of  3 variables:
##  $ density   : int  1 2 1 2 1 2 1 2 1 2 ...
##  $ fertilizer: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ yield     : num  177 178 176 178 177 ...

library(GAD)  # provides as.fixed() and gad()
dat$density<-as.fixed(dat$density)
dat$fertilizer<-as.fixed(dat$fertilizer)
interaction.plot(dat$fertilizer,dat$density,dat$yield)

mod<-lm(yield~density+fertilizer+density*fertilizer,dat)
gad(mod)

## Analysis of Variance Table
## 
## Response: yield
##                    Df  Sum Sq Mean Sq F value    Pr(>F)    
## density             1  5.1217  5.1217 15.1945 0.0001864 ***
## fertilizer          2  6.0680  3.0340  9.0011 0.0002732 ***
## density:fertilizer  2  0.4278  0.2139  0.6346 0.5325001    
## Residual           90 30.3367  0.3371                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

mod<-lm(yield~density+fertilizer,dat)
gad(mod)

## Analysis of Variance Table
## 
## Response: yield
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## density     1  5.1217  5.1217 15.3162 0.0001741 ***
## fertilizer  2  6.0680  3.0340  9.0731 0.0002533 ***
## Residual   92 30.7645  0.3344                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Is the interaction significant (alpha=0.05)?

No. The density:fertilizer interaction is not significant at alpha = 0.05 (F = 0.6346, p-value = 0.5325 > 0.05), so it is dropped from the second model.

Are the main effects significant (alpha=0.05)?

Yes. In the model without the interaction, both density (p-value = 0.0001741) and fertilizer (p-value = 0.0002533) have p-values below 0.05.

Regardless of how dense the crops are planted, which fertilizer would give you the greatest yield?

Fertilizer 3 (typeC). In the interaction plot, its mean yield is higher than that of fertilizers 1 and 2 at both densities.

Suppose that you had to use fertilizer typeA, would you have a greater yield planting the crop dense or sparse?

Sparse. With fertilizer typeA, the interaction plot shows a higher mean yield for sparse planting (density 2) than for dense planting (density 1).
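The F values and p-values in the ANOVA tables can be reproduced from the printed mean squares, which also shows how dropping the interaction pools its sum of squares into the residual. This is a sketch using the rounded table entries, so the results match the printed values only to about three significant digits.

```r
# Mean squares and degrees of freedom from the full-model ANOVA table
ms_density  <- 5.1217; df_density  <- 1
ms_fert     <- 3.0340; df_fert     <- 2
ms_inter    <- 0.2139; df_inter    <- 2
ms_resid    <- 0.3371; df_resid    <- 90

# F statistic = effect mean square / residual mean square
f_density <- ms_density / ms_resid
f_fert    <- ms_fert / ms_resid
f_inter   <- ms_inter / ms_resid

# Upper-tail p-values from the F distribution
p_density <- pf(f_density, df_density, df_resid, lower.tail = FALSE)
p_fert    <- pf(f_fert, df_fert, df_resid, lower.tail = FALSE)
p_inter   <- pf(f_inter, df_inter, df_resid, lower.tail = FALSE)

# Dropping the interaction moves its SS and df into the residual
ss_resid_reduced <- 30.3367 + 0.4278
df_resid_reduced <- df_resid + df_inter
ms_resid_reduced <- ss_resid_reduced / df_resid_reduced

round(c(density = p_density, fertilizer = p_fert, interaction = p_inter), 4)
c(SS = ss_resid_reduced, df = df_resid_reduced, MS = round(ms_resid_reduced, 4))
```

The pooled residual (sum of squares 30.7645 on 92 degrees of freedom) matches the Residual row of the second table.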

5. Consider designing an experiment in which we wish to test whether there is a difference in the mean between 4 levels of a single factor (i.e. between 4 populations). Specifically, this is to be a Completely Randomized Design that will be analyzed using ANOVA. We would like to collect a sufficient number of samples such that the test with an alpha=0.05 level of significance would be able to detect with a power of 85% a mean difference that is 50% of the standard deviation.

Determine the number of samples to be collected

library(pwr)  # provides pwr.anova.test()
pwr.anova.test(k=4,n=NULL,f=0.5,sig.level=0.05, power=0.85)

## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 13.32146
##               f = 0.5
##       sig.level = 0.05
##           power = 0.85
## 
## NOTE: n is number in each group

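Rounding n = 13.32 up to the next whole number, 14 observations are needed for each level (n = 14), or 4 × 14 = 56 observations in total. This takes the 50%-of-standard-deviation difference as the effect size f = 0.5.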
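The rounding step can be checked by computing the power at n = 13 and n = 14 directly. The sketch below uses the standard noncentral F calculation for a balanced one-way ANOVA with effect size f, written in base R. The helper name `anova_power` is hypothetical, and the assumption that the noncentrality parameter is k * n * f^2 follows Cohen's definition of f.

```r
# Power of a balanced one-way ANOVA with k groups, n per group, effect size f
# (hypothetical helper; noncentrality parameter assumed to be k * n * f^2)
anova_power <- function(n, k = 4, f = 0.5, alpha = 0.05) {
  df1 <- k - 1              # between-group degrees of freedom
  df2 <- k * (n - 1)        # within-group degrees of freedom
  f_crit <- qf(1 - alpha, df1, df2)
  pf(f_crit, df1, df2, ncp = k * n * f^2, lower.tail = FALSE)
}

n_per_group <- ceiling(13.32146)   # round the required n up
total_runs  <- 4 * n_per_group     # observations over all 4 levels

power_13 <- anova_power(13)
power_14 <- anova_power(14)

c(n_per_group = n_per_group, total_runs = total_runs)
c(power_13 = power_13, power_14 = power_14)
```

With 13 observations per level the power falls just short of 85%, while 14 per level meets the target, which is why the fractional n is rounded up rather than to the nearest integer.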

Propose a randomized data collection table for this experiment.

library(agricolae)  # provides design.crd()
trt1 <- c("lvl1","lvl2","lvl3","lvl4")
design<-design.crd(trt=trt1,r=14, seed=6814224)
design$book

##    plots  r trt1
## 1    101  1 lvl2
## 2    102  1 lvl1
## 3    103  2 lvl1
## 4    104  2 lvl2
## 5    105  3 lvl2
## 6    106  4 lvl2
## 7    107  3 lvl1
## 8    108  1 lvl3
## 9    109  2 lvl3
## 10   110  3 lvl3
## 11   111  1 lvl4
## 12   112  4 lvl3
## 13   113  5 lvl3
## 14   114  5 lvl2
## 15   115  6 lvl3
## 16   116  6 lvl2
## 17   117  7 lvl2
## 18   118  4 lvl1
## 19   119  5 lvl1
## 20   120  2 lvl4
## 21   121  3 lvl4
## 22   122  6 lvl1
## 23   123  7 lvl1
## 24   124  4 lvl4
## 25   125  8 lvl1
## 26   126  7 lvl3
## 27   127  8 lvl3
## 28   128  9 lvl1
## 29   129 10 lvl1
## 30   130  5 lvl4
## 31   131  8 lvl2
## 32   132  9 lvl3
## 33   133  9 lvl2
## 34   134  6 lvl4
## 35   135 10 lvl3
## 36   136 10 lvl2
## 37   137  7 lvl4
## 38   138  8 lvl4
## 39   139  9 lvl4
## 40   140 11 lvl3
## 41   141 10 lvl4
## 42   142 11 lvl1
## 43   143 11 lvl4
## 44   144 12 lvl1
## 45   145 11 lvl2
## 46   146 13 lvl1
## 47   147 12 lvl3
## 48   148 12 lvl2
## 49   149 13 lvl2
## 50   150 12 lvl4
## 51   151 13 lvl4
## 52   152 14 lvl1
## 53   153 13 lvl3
## 54   154 14 lvl2
## 55   155 14 lvl4
## 56   156 14 lvl3
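If the agricolae package is not available, an equivalent completely randomized run order can be sketched in base R by shuffling 14 copies of each level. The seed is reused here only for reproducibility; the resulting order will not match the design.crd() table above, and the names `levels4`, `run_order`, and `crd` are illustrative.

```r
set.seed(6814224)   # any fixed seed makes the run order reproducible

levels4 <- c("lvl1", "lvl2", "lvl3", "lvl4")

# 14 replicates of each level, shuffled into a random run order
run_order <- sample(rep(levels4, each = 14))

# Data collection table: one row per run, with an empty column for the response
crd <- data.frame(run = seq_along(run_order),
                  trt = run_order,
                  response = NA_real_)

head(crd)
table(crd$trt)   # each level should appear exactly 14 times
```

Collecting the observations in the order of the `run` column spreads any time-related nuisance effects evenly across the four levels.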

Final Examination - Stat and DOE Using R - CR

LUIS SANCHEZ

28/05/2021
