211 DA Assignment Task 2

library(s20x)  # provides onewayPlot() and modcheck()
failTimes.df = read.table("failure.txt", header = TRUE)
failTimes.df$Company = factor(failTimes.df$Company)  # treat Company as a categorical grouping variable

onewayPlot(Days ~ Company, data = failTimes.df)

boxplot(Days ~ Company, data = failTimes.df)

Center: Company A has a higher mean and median number of days to failure than Company B. Spread: Company A's failure times are more spread out than Company B's. Skew: A appears to be right-skewed, and B is slightly right-skewed.

Let's try fitting a linear model to the raw failure times and checking its assumptions:

failTimes.fit = lm(Days ~ Company, data = failTimes.df)  # additive model on the raw scale
modcheck(failTimes.fit)

failTimes.fit2 = lm(log(Days) ~ Company, data = failTimes.df)  # multiplicative model: fit on the log scale
modcheck(failTimes.fit2)
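Checking the assumptions with modcheck() on an lm fit is relevant to the t-test because a linear model with a single two-level factor is the same model as the pooled (equal-variance) two-sample t-test. The sketch below illustrates this on simulated data (hypothetical values, not the failure-time data): the t statistic for the Company-style coefficient matches the pooled t-test statistic up to sign. The Welch test used later drops the equal-variance part of this model.

```r
set.seed(1)
# Hypothetical simulated data (NOT the assignment data): two groups of 12
grp <- factor(rep(c("A", "B"), each = 12))
y   <- c(rnorm(12, mean = 1.2), rnorm(12, mean = 0.5))

# One-factor linear model; the grpB coefficient estimates mean(B) - mean(A)
fit  <- lm(y ~ grp)
t_lm <- summary(fit)$coefficients["grpB", "t value"]

# Pooled (equal-variance) two-sample t-test; its statistic uses mean(A) - mean(B)
t_pooled <- unname(t.test(y ~ grp, var.equal = TRUE)$statistic)

# Same magnitude, opposite sign, and both use n - 2 = 22 residual df
c(lm = t_lm, t.test = t_pooled)
```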

t.test(log(Days) ~ Company, var.equal = FALSE, data = failTimes.df)

## 
##  Welch Two Sample t-test
## 
## data:  log(Days) by Company
## t = 2.221, df = 24.973, p-value = 0.03565
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  0.05339899 1.41680243
## sample estimates:
## mean in group A mean in group B 
##       1.2429993       0.5078986

ci = t.test(log(Days) ~ Company, var.equal = FALSE, data = failTimes.df)$conf.int
exp(ci)

## [1] 1.054850 4.123913
## attr(,"conf.level")
## [1] 0.95
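The t-test output gives a confidence interval but no back-transformed point estimate. A minimal sketch of getting one from the printed group means follows, assuming log(Days) is roughly symmetric within each company so that exp() of a mean log value estimates a median. The numbers are copied from the output above; the variable names are illustrative only.

```r
# Group means of log(Days), copied from the Welch t-test output above
mean_log_A <- 1.2429993
mean_log_B <- 0.5078986

# exp(difference in mean log(Days)) estimates median(A) / median(B)
ratio_median <- exp(mean_log_A - mean_log_B)
round(ratio_median, 2)  # 2.09

# The Welch interval is symmetric about this difference on the log scale,
# so back-transforming its endpoints reproduces exp(ci) above
ci_log <- c(0.05339899, 1.41680243)
exp(ci_log)
```

The point estimate of about 2.09 sits inside the back-transformed interval, but not at its midpoint, because exp() stretches the upper end more than the lower end.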

Methods and Assumption Checks:

We have a numerical measurement made on two distinct groups, so a two-sample t-test is appropriate.

We are assuming the observations are independent (for example, that the products were not all operating in a shared environment that could affect their failure times together). Under a linear model on the raw scale, the equality of variance assumption is not satisfied and the data are right-skewed, so we try a multiplicative model by log-transforming Days. After the log transformation, the equality of variance and normality assumptions look somewhat better, but the two group variances still do not appear equal, so we use the Welch two-sample t-test on the log-transformed data.
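The Welch test differs from the pooled test only in how it estimates the standard error and degrees of freedom: each group keeps its own variance, and the df come from the Welch-Satterthwaite approximation (which is why the output above reports a non-integer df = 24.973). The sketch below computes both by hand and compares them with t.test(). It uses simulated right-skewed data with arbitrary, hypothetical parameters, not the assignment data.

```r
set.seed(211)
# Hypothetical simulated failure times (NOT the assignment data):
# group A is larger and more spread out than group B
days    <- c(rlnorm(15, meanlog = 1.2, sdlog = 1.2),
             rlnorm(15, meanlog = 0.5, sdlog = 0.6))
company <- factor(rep(c("A", "B"), each = 15))

# Work on the log scale, as in the report
y  <- log(days)
yA <- y[company == "A"]
yB <- y[company == "B"]

# Estimated variance of each group mean, without pooling
vA <- var(yA) / length(yA)
vB <- var(yB) / length(yB)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(yA) - mean(yB)) / sqrt(vA + vB)
df_welch <- (vA + vB)^2 / (vA^2 / (length(yA) - 1) + vB^2 / (length(yB) - 1))

# Compare with R's built-in Welch test on the same data
tt <- t.test(y ~ company, var.equal = FALSE)
c(manual = t_welch,  builtin = unname(tt$statistic))
c(manual = df_welch, builtin = unname(tt$parameter))
```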

The model fitted is \(\log(\text{Days}_{ij}) = \mu + \alpha_i + \varepsilon_{ij}\), where \(\mu\) is the overall mean of log(Days), \(\alpha_i\) is the effect of being in Company \(i\) (A or B), and the \(\varepsilon_{ij}\) are independent \(N(0, \sigma_i^2)\) errors, with a separate variance \(\sigma_i^2\) for each company, as allowed by the Welch test.
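The step from this log-scale model to a statement about medians can be written out as below. It relies only on log(Days) being normal, hence symmetric, within each company, and on exp() being increasing, so it holds whether the error variance is common or company-specific. The interval endpoints are the Welch interval from the output above.

```latex
\begin{aligned}
\operatorname{median}\!\big(\log \text{Days}_{ij}\big) &= \mu + \alpha_i
  && \text{(normal, so median = mean)} \\
\operatorname{median}\!\big(\text{Days}_{ij}\big) &= e^{\mu + \alpha_i}
  && \text{(exp is increasing)} \\
\frac{\operatorname{median}(\text{Days}_{Aj})}{\operatorname{median}(\text{Days}_{Bj})}
  &= e^{\alpha_A - \alpha_B} \\
(L,\ U) \text{ for } \alpha_A - \alpha_B
  &\;\Longrightarrow\; \big(e^{L},\ e^{U}\big) = \big(e^{0.0534},\ e^{1.4168}\big) \approx (1.05,\ 4.12)
\end{aligned}
```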

Executive Summary:

We have collected data to see whether failure times for a product differ between Company A and Company B.

To analyse the data, we needed to apply a log transformation, so our results refer to the median number of days until failure and are expressed in multiplicative terms.

We have evidence that products from Company A take longer to fail than products from Company B.

We estimate that the median failure time for Company A is between 1.05 and 4.12 times the median failure time for Company B.