library(s20x)
failTimes.df = read.table("failure.txt", header = TRUE)
failTimes.df$Company = factor(failTimes.df$Company)
onewayPlot(Days ~ Company, data = failTimes.df)
boxplot(Days ~ Company, data = failTimes.df)
Center: Company A has a higher mean/median number of days than Company B. Spread: Company A has a greater spread than Company B. Skew: A appears to be right-skewed, and B is slightly right-skewed.
Let’s try fitting a linear model:
failTimes.fit = lm(Days ~ Company, data = failTimes.df)
modcheck(failTimes.fit)
failTimes.fit2 = lm(log(Days) ~ Company, data = failTimes.df)
modcheck(failTimes.fit2)
t.test(log(Days) ~ Company, var.equal = FALSE, data = failTimes.df)
##
## Welch Two Sample t-test
##
## data: log(Days) by Company
## t = 2.221, df = 24.973, p-value = 0.03565
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## 0.05339899 1.41680243
## sample estimates:
## mean in group A mean in group B
## 1.2429993 0.5078986
ci = t.test(log(Days) ~ Company, var.equal = FALSE, data = failTimes.df)$conf.int
exp(ci)
## [1] 1.054850 4.123913
## attr(,"conf.level")
## [1] 0.95
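As a cross-check on the back-transformation, the sketch below redoes it using only the numbers printed in the t-test output above, so it runs without failure.txt. The variable names (mean_log_A, ci_log, and so on) are introduced here for illustration and are not part of the original analysis.

```r
# Values copied from the Welch t-test output above (log scale).
mean_log_A = 1.2429993
mean_log_B = 0.5078986
ci_log = c(0.05339899, 1.41680243)

# Difference in mean log(Days), Company A minus Company B.
diff_log = mean_log_A - mean_log_B

# exp() of a difference in logs is a ratio on the original scale.
ratio_est = exp(diff_log)
ratio_ci = exp(ci_log)

# The log-scale interval should be centred on the difference in means.
centred = isTRUE(all.equal(mean(ci_log), diff_log, tolerance = 1e-6))

round(ratio_est, 2)
round(ratio_ci, 2)
```

The point estimate of the ratio comes out at roughly 2.1, and the rounded interval matches the exp(ci) output above.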
Methods and Assumption Checks:
We have a numerical measurement (Days) made on two distinct groups
(Company A and Company B), so a two-sample t-test is appropriate.
We assume the observations are independent, both within and between
companies (for example, that the products were not all run in the same
environment). For the linear model on the original scale, the equality
of variance assumption is not satisfied and the residuals are
right-skewed, so we try a multiplicative model by log-transforming Days.
After the log transformation, the equality of variance and normality
assumptions look better, but the group variances still do not appear
equal, so we use the Welch two-sample t-test on the log-transformed data.
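To make the Welch test concrete, here is a minimal sketch that computes the Welch statistic and its Welch-Satterthwaite degrees of freedom by hand and compares them with t.test(). The data are simulated and hypothetical (they are not the failure.txt values), and the names a, b, t_stat and df_welch are introduced here only for illustration.

```r
# Hypothetical right-skewed data for two groups; NOT the failure.txt values.
set.seed(1)
a = rlnorm(15, meanlog = 1.2, sdlog = 1.0)
b = rlnorm(15, meanlog = 0.5, sdlog = 0.7)

# Work on the log scale, as in the analysis.
la = log(a)
lb = log(b)

# Estimated variance of each group mean (no pooling across groups).
va = var(la) / length(la)
vb = var(lb) / length(lb)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat = (mean(la) - mean(lb)) / sqrt(va + vb)
df_welch = (va + vb)^2 / (va^2 / (length(la) - 1) + vb^2 / (length(lb) - 1))

# Compare with the built-in Welch test.
ref = t.test(la, lb, var.equal = FALSE)
all.equal(unname(ref$statistic), t_stat)
all.equal(unname(ref$parameter), df_welch)
```

Because each group contributes its own variance estimate, the degrees of freedom are generally not a whole number, which is why the output reports a fractional df.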
The model fitted is \(\log(Days_{ij}) = \mu + \alpha_i + \varepsilon_{ij}\), where \(\mu\) is the mean log Days, \(\alpha_i\) is the effect of being in Company A or B, and \(\varepsilon_{ij} \sim \text{iid } N(0, \sigma^2)\).
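To see why back-transforming the log-scale interval gives a statement about a ratio of medians, here is a short derivation under the model's normality assumption. The symbol \(m_i\), the median of Days in company \(i\), is notation introduced here.

\[
\log(Days_{ij}) \sim N(\mu + \alpha_i,\ \sigma^2)
\quad\Rightarrow\quad
m_i = \exp(\mu + \alpha_i)
\]

\[
\frac{m_A}{m_B} = \exp\big((\mu + \alpha_A) - (\mu + \alpha_B)\big) = \exp(\alpha_A - \alpha_B)
\]

The first step holds because the mean of a normal distribution equals its median and exp is an increasing function, so the median of log(Days) maps to the median of Days. Exponentiating the difference in mean log Days therefore estimates how many times larger the median for Company A is than the median for Company B.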
Executive Summary:
We collected data to see whether failure times for a product differ
between Company A and Company B.
To analyse the data we needed a log transformation, so our results
refer to the median number of days and are expressed in multiplicative
terms.
We find evidence that the time until failure is longer for Company A than for Company B (p-value = 0.036).
We estimate that the median failure time for Company A is between 1.1 and 4.1 times the median failure time for Company B.