The amount of time that it takes to complete a certain job is known to be Normally distributed with a mean of 10 minutes and a standard deviation of 1 minute.
Plot the probability density function corresponding to job completion times.
What is the probability a randomly selected job will be completed in less than 11 minutes?
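# Cumulative distribution function, plotted over the same range for reference
curve(pnorm(x, 10, 1), 5, 15)
# Probability density function of the completion times
curve(dnorm(x, 10, 1), 5, 15)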
pnorm(11, 10, 1)
## [1] 0.8413447
The probability that a randomly selected job will be completed in less than 11 minutes is approximately 0.8413.
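As a cross-check, the same probability can be obtained by standardizing: for X ~ N(10, 1), P(X < 11) = P(Z < 1). The following is a minimal sketch; the variable names are illustrative and not part of the original analysis.

```r
# Parameters of the completion-time distribution (from the problem statement)
mu    <- 10   # mean, minutes
sigma <- 1    # standard deviation, minutes

# Standardize the cutoff: z = (x - mu) / sigma
z <- (11 - mu) / sigma

# Both calls give the same left-tail probability
p_direct   <- pnorm(11, mean = mu, sd = sigma)
p_standard <- pnorm(z)

print(c(z = z, p_direct = p_direct, p_standard = p_standard))
```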
A critical measurement on the diameter of a part that is used in a subassembly is assumed to have a mean of 10 mm. The management would like to test this hypothesis against the alternative that it is not equal to 10 mm at an alpha = 0.05 level of significance. Towards this end, they have collected a random sample of n = 100 parts and measured their diameters; the data are contained in the file https://raw.githubusercontent.com/tmatis12/datafiles/main/diameter.csv
Generate a histogram and boxplot of the collected measurements.
State the null and alternative hypothesis, perform the test, and state conclusions.
dat <-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/diameter.csv")
diameter <- dat[1:100,]
boxplot(diameter,main = "Boxplot", xlab="Samples", ylab="Diameter")
hist(diameter, 10, main= "Histogram", xlab = "Diameter-mm")
\[ H_0: \mu_{\text{Diameter}} = 10\ \text{mm} \] \[ H_a: \mu_{\text{Diameter}} \ne 10\ \text{mm} \]
str(dat$Diameter)
## num [1:100] 10.3 10 10.2 10.2 10.4 ...
t.test(dat$Diameter, mu=10, alternative = "two.sided")
##
## One Sample t-test
##
## data: dat$Diameter
## t = 7.6839, df = 99, p-value = 1.134e-11
## alternative hypothesis: true mean is not equal to 10
## 95 percent confidence interval:
## 10.12638 10.21438
## sample estimates:
## mean of x
## 10.17038
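Since the p-value (1.134e-11) is less than alpha = 0.05, we reject H0 at the 0.05 level of significance. The data indicate that the mean diameter is not 10 mm; the sample mean is 10.17 mm, and the 95% confidence interval (10.126, 10.214) does not contain 10.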
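The decision can be retraced from the numbers printed in the t-test output alone. The sketch below backs out the standard error from the reported t statistic and then recomputes the p-value and the confidence interval. It uses only the reported summary values, not the raw data, and the variable names are illustrative.

```r
# Values reported in the t.test output above
xbar   <- 10.17038   # sample mean
t_stat <- 7.6839     # reported t statistic
n      <- 100        # sample size
mu0    <- 10         # hypothesized mean under H0
alpha  <- 0.05       # significance level from the problem statement

# Back out the standard error from t = (xbar - mu0) / se
se <- (xbar - mu0) / t_stat

# Two-sided p-value and (1 - alpha) confidence interval on n - 1 df
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
ci      <- xbar + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se

# Reject H0 when the p-value falls below alpha
reject_h0 <- p_value < alpha
print(list(se = se, p_value = p_value, ci = ci, reject_h0 = reject_h0))
```

Because the inputs are rounded printouts, the recomputed values should agree with the output above only up to rounding.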
Researchers at a textile production facility would like to test the hypothesis that the mean breaking strength of abraded fabric is different from that of unabraded fabric at an alpha = 0.10 level of significance.
Towards this end, they conducted an experiment in which they measured the breaking force of 8 samples of each type of fabric. The collected data may be found in the file https://raw.githubusercontent.com/tmatis12/datafiles/main/Fabric.csv. Assume the populations are approximately Normally distributed and use a two-sample t-test with a pooled variance.
Generate a side-by-side boxplot of the collected measurements on the breaking strength of abraded and unabraded fabric.
State the null and alternative hypothesis, perform the test, and state conclusions.
dat2 <-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/Fabric.csv")
fabric <-c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","2","2")
strength<-c(28.5, 20, 46, 34.5, 36.5, 52.5, 26.5, 46.5, 36.4, 55.0, 51.5, 38.7, 43.2, 48.4, 25.6, 49.8)
dat3<-data.frame(fabric, strength)
AA<- dat3[1:8,]
BB <- dat3[9:16,]
boxplot(AA$strength, BB$strength, names = c("Abraded", "Unabraded"), ylab = "Breaking strength")
\[ H_0: \mu_{\text{abraded}} = \mu_{\text{unabraded}} \] \[ H_a: \mu_{\text{abraded}} \ne \mu_{\text{unabraded}} \]
t.test(AA$strength,BB$strength,var.equal=TRUE)
##
## Two Sample t-test
##
## data: AA$strength and BB$strength
## t = -1.3729, df = 14, p-value = 0.1914
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.448196 4.048196
## sample estimates:
## mean of x mean of y
## 36.375 43.575
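Since the p-value (0.1914) is greater than alpha = 0.10, we fail to reject H0 at the 0.10 level of significance. There is not enough evidence to conclude that the mean breaking strengths of the two fabrics differ.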
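The pooled-variance statistic can also be computed by hand from the values entered above, which makes the role of the pooled variance explicit. This is a sketch under the same equal-variance assumption; the vector names are illustrative, and `conf.level = 0.90` is added here only so that the interval matches the stated alpha = 0.10 (the original call used the default 95% interval).

```r
# Breaking-strength data as entered above (first 8 values, then last 8 values)
abraded   <- c(28.5, 20, 46, 34.5, 36.5, 52.5, 26.5, 46.5)
unabraded <- c(36.4, 55.0, 51.5, 38.7, 43.2, 48.4, 25.6, 49.8)
alpha <- 0.10

n1 <- length(abraded)
n2 <- length(unabraded)

# Pooled variance: df-weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(abraded) + (n2 - 1) * var(unabraded)) / (n1 + n2 - 2)

# Pooled two-sample t statistic and two-sided p-value on n1 + n2 - 2 df
t_stat  <- (mean(abraded) - mean(unabraded)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_value <- 2 * pt(-abs(t_stat), df = n1 + n2 - 2)

# Same test via t.test, with the interval matched to alpha = 0.10
res <- t.test(abraded, unabraded, var.equal = TRUE, conf.level = 1 - alpha)

print(c(t = t_stat, p = p_value))
print(res$conf.int)
```

If the resulting interval for the difference in means contains 0, that agrees with failing to reject H0 at the same alpha.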
Consider a designed experiment in which the crop yield was measured at 2 levels of crop density/spacing (1=dense, 2=sparse) and 3 levels of fertilizer (1=typeA, 2=typeB, 3=typeC). A total of 96 observations were collected. A colleague of yours did some preliminary analysis of the data in R using the following code (you may copy and paste this code).
library(GAD)
## Loading required package: matrixStats
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
dat4<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/cropdata2.csv")
str(dat4)
## 'data.frame': 96 obs. of 3 variables:
## $ density : int 1 2 1 2 1 2 1 2 1 2 ...
## $ fertilizer: int 1 1 1 1 1 1 1 1 1 1 ...
## $ yield : num 177 178 176 178 177 ...
dat4$density<-as.fixed(dat4$density)
dat4$fertilizer<-as.fixed(dat4$fertilizer)
interaction.plot(dat4$fertilizer,dat4$density,dat4$yield)
mod<-lm(yield~density+fertilizer+density*fertilizer,dat4)
gad(mod)
## Analysis of Variance Table
##
## Response: yield
## Df Sum Sq Mean Sq F value Pr(>F)
## density 1 5.1217 5.1217 15.1945 0.0001864 ***
## fertilizer 2 6.0680 3.0340 9.0011 0.0002732 ***
## density:fertilizer 2 0.4278 0.2139 0.6346 0.5325001
## Residual 90 30.3367 0.3371
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod<-lm(yield~density+fertilizer,dat4)
gad(mod)
## Analysis of Variance Table
##
## Response: yield
## Df Sum Sq Mean Sq F value Pr(>F)
## density 1 5.1217 5.1217 15.3162 0.0001741 ***
## fertilizer 2 6.0680 3.0340 9.0731 0.0002533 ***
## Residual 92 30.7645 0.3344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(dat4)
## density fertilizer yield
## 1:48 1:32 Min. :175.4
## 2:48 2:32 1st Qu.:176.5
## 3:32 Median :177.1
## Mean :177.0
## 3rd Qu.:177.4
## Max. :179.1
Is the interaction significant (alpha=0.05)?
At alpha = 0.05 the interaction is not significant, since its p-value (0.5325) is greater than 0.05.
Are the main effects significant (alpha=0.05)?
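Yes, both main effects are significant at alpha = 0.05: density (p = 0.0001864) and fertilizer (p = 0.0002732) in the full model, and both remain significant after the interaction term is dropped.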
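Each p-value in the ANOVA table is the upper-tail probability of an F distribution evaluated at F = MS(effect) / MS(residual). The sketch below recomputes them from the rounded mean squares printed in the full-model table, so small rounding differences from the printed p-values are expected. The helper `p_of` is a hypothetical name introduced for illustration.

```r
# Mean squares and degrees of freedom read from the full-model ANOVA table
ms_density     <- 5.1217; df_density     <- 1
ms_fertilizer  <- 3.0340; df_fertilizer  <- 2
ms_interaction <- 0.2139; df_interaction <- 2
ms_residual    <- 0.3371; df_residual    <- 90
alpha <- 0.05

# F = MS_effect / MS_residual; the p-value is the upper tail of the F distribution
p_of <- function(ms, df) {
  pf(ms / ms_residual, df, df_residual, lower.tail = FALSE)
}

p_values <- c(
  density     = p_of(ms_density, df_density),
  fertilizer  = p_of(ms_fertilizer, df_fertilizer),
  interaction = p_of(ms_interaction, df_interaction)
)

print(round(p_values, 4))
print(p_values < alpha)  # TRUE marks an effect significant at alpha
```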
Regardless of how dense the crops are planted, which fertilizer would give you the greatest yield? (justify your answer)
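Fertilizer 3 (typeC) would give the greatest yield. Because the density:fertilizer interaction is not significant, this ranking holds at both planting densities.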
Suppose that you had to use fertilizer typeA, would you have a greater yield planting the crop dense or sparse? (justify your answer)
Sparse (2): with fertilizer typeA, the mean yield under sparse planting is higher than the mean yield under dense planting.
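The two answers above rest on comparing group means, which can be tabulated directly instead of being read off the interaction plot. The sketch below shows one way to do that with `tapply`. The data frame `toy` contains made-up numbers for illustration only; they are not the values in cropdata2.csv. To apply it to the real data, `toy` would be replaced with `dat4`.

```r
# Made-up illustration data, NOT the values in cropdata2.csv
toy <- data.frame(
  density    = factor(rep(c(1, 2), times = 6)),
  fertilizer = factor(rep(1:3, each = 4)),
  yield      = c(176.5, 177.0, 176.7, 177.2,
                 176.9, 177.3, 177.0, 177.5,
                 177.2, 177.8, 177.4, 177.9)
)

# Marginal mean yield per fertilizer (averaged over density)
fert_means <- tapply(toy$yield, toy$fertilizer, mean)

# Cell means: rows = density level, columns = fertilizer level
cell_means <- tapply(toy$yield,
                     list(density = toy$density, fertilizer = toy$fertilizer),
                     mean)

print(fert_means)
print(cell_means)

# Fertilizer with the largest marginal mean, and best density within fertilizer 1
best_fertilizer <- names(which.max(fert_means))
best_density_A  <- names(which.max(cell_means[, "1"]))
print(c(best_fertilizer = best_fertilizer, best_density_A = best_density_A))
```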
Consider designing an experiment in which we wish to test whether there is a difference in the mean between 4 levels of a single factor (i.e. between 4 populations).
Specifically, this is to be a Completely Randomized Design that will be analyzed using ANOVA.
We would like to collect a sufficient number of samples such that the test with an alpha=0.05 level of significance would be able to detect with a power of 85% a mean difference that is 50% of the standard deviation. Determine the number of samples to be collected and propose a randomized data collection table for this experiment.
library(pwr)
library(agricolae)
pwr.anova.test(k=4,n=NULL,f=0.5,sig.level=0.05, power=0.85)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 13.32146
## f = 0.5
## sig.level = 0.05
## power = 0.85
##
## NOTE: n is number in each group
According to the power calculation, n = 13.32 observations per group are required; rounding up gives 14 samples for each of the 4 groups, or 56 samples in total.
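The rounding step can be checked without the pwr package by evaluating the power of the one-way ANOVA F test at whole-number sample sizes. This is a sketch assuming the usual noncentral F formulation with noncentrality parameter k * n * f^2 and the same f = 0.5 used in the call above; `anova_power` is a hypothetical helper name.

```r
# Design inputs used in the pwr.anova.test call above
k     <- 4      # number of groups
f     <- 0.5    # effect size (Cohen's f)
alpha <- 0.05   # significance level

# Power of the one-way ANOVA F test for n observations per group,
# using the noncentral F distribution with noncentrality k * n * f^2
anova_power <- function(n) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  f_crit <- qf(1 - alpha, df1, df2)
  pf(f_crit, df1, df2, ncp = k * n * f^2, lower.tail = FALSE)
}

# Round the fractional n from the power calculation up to a whole sample
n_per_group <- ceiling(13.32146)
total_runs  <- k * n_per_group

print(c(n_per_group = n_per_group, total_runs = total_runs))
print(c(power_13 = anova_power(13), power_14 = anova_power(14)))
```

Comparing the power at n = 13 and n = 14 shows why the fractional result has to be rounded up and not to the nearest integer.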
trt1<-c("lvl1","lvl2","lvl3","lvl4")
design<-design.rcbd(trt1,14,seed=2743536)
#Proposed Randomized Data Collection
design$book
## plots block trt1
## 1 101 1 lvl2
## 2 102 1 lvl3
## 3 103 1 lvl1
## 4 104 1 lvl4
## 5 201 2 lvl2
## 6 202 2 lvl1
## 7 203 2 lvl3
## 8 204 2 lvl4
## 9 301 3 lvl3
## 10 302 3 lvl1
## 11 303 3 lvl2
## 12 304 3 lvl4
## 13 401 4 lvl4
## 14 402 4 lvl1
## 15 403 4 lvl3
## 16 404 4 lvl2
## 17 501 5 lvl3
## 18 502 5 lvl2
## 19 503 5 lvl4
## 20 504 5 lvl1
## 21 601 6 lvl4
## 22 602 6 lvl2
## 23 603 6 lvl3
## 24 604 6 lvl1
## 25 701 7 lvl2
## 26 702 7 lvl4
## 27 703 7 lvl3
## 28 704 7 lvl1
## 29 801 8 lvl2
## 30 802 8 lvl4
## 31 803 8 lvl1
## 32 804 8 lvl3
## 33 901 9 lvl4
## 34 902 9 lvl1
## 35 903 9 lvl3
## 36 904 9 lvl2
## 37 1001 10 lvl1
## 38 1002 10 lvl2
## 39 1003 10 lvl4
## 40 1004 10 lvl3
## 41 1101 11 lvl1
## 42 1102 11 lvl3
## 43 1103 11 lvl4
## 44 1104 11 lvl2
## 45 1201 12 lvl4
## 46 1202 12 lvl2
## 47 1203 12 lvl1
## 48 1204 12 lvl3
## 49 1301 13 lvl1
## 50 1302 13 lvl3
## 51 1303 13 lvl2
## 52 1304 13 lvl4
## 53 1401 14 lvl1
## 54 1402 14 lvl3
## 55 1403 14 lvl4
## 56 1404 14 lvl2
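The table above groups the runs into 14 blocks of 4, which is the layout of a randomized complete block design. Since the problem statement asks for a Completely Randomized Design, an unblocked run order may be closer to what is wanted. The sketch below shuffles all 56 runs at once using base R only, so it has no package dependency; agricolae also provides `design.crd` for this purpose. The seed value and the column names are assumptions made for illustration, and the resulting order will differ from the table above.

```r
trt1 <- c("lvl1", "lvl2", "lvl3", "lvl4")
n_per_group <- 14

# The seed is arbitrary; it only makes the shuffle reproducible
set.seed(2743536)

# Shuffle all 56 treatment labels at once (no blocks), then number the runs
crd <- data.frame(
  run       = seq_len(length(trt1) * n_per_group),
  treatment = sample(rep(trt1, each = n_per_group))
)

print(head(crd))
print(table(crd$treatment))  # each level should appear 14 times
```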