Reading the data

Convert a numeric or character column into a factor variable using as.factor().

## [1] "factor"
## [1] "factor"
## [1] "factor"
## [1] "factor"
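The four "factor" results above are consistent with calling class() on four converted columns. The report does not show which columns were converted, so the following is only a minimal sketch: the built-in mtcars data set is taken from the later output, while the column names in `factor_cols` and the copy named `cars` are assumptions made for illustration.

```r
# Load the built-in data set used in the rest of the report
data(mtcars)

# Work on a copy so the original numeric columns stay available
cars <- mtcars

# Hypothetical choice of columns; the original report does not name them
factor_cols <- c("cyl", "vs", "am", "gear")

# Convert each selected column with as.factor()
cars[factor_cols] <- lapply(cars[factor_cols], as.factor)

# Check the class of each converted column
sapply(cars[factor_cols], class)
```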

Checking t-test Assumptions

Assumption of Normality

We will check whether the dependent variable wt is normally distributed or not.

Q-Q Plot

We can see that not all the data points lie on the 45-degree reference line, which suggests that the distribution of weight may not be normal.
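A Q-Q plot like the one discussed here can be drawn with base R. This is a sketch, not necessarily the code used for the original figure; the title and line colour are arbitrary choices. Note that qqline() draws its reference line through the first and third quartiles of the data.

```r
# Sample quantiles of wt against theoretical normal quantiles
qq <- qqnorm(mtcars$wt, main = "Normal Q-Q Plot of wt")

# Reference line through the first and third quartiles
qqline(mtcars$wt, col = "red")
```

Points that follow the line closely support normality; systematic curvature or heavy tails show up as departures at the ends.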

Shapiro-Wilk Test for Normality (sample size must be between 3 and 5000)

H0: mtcars$wt is normally distributed

## 
##  Shapiro-Wilk normality test
## 
## data:  mtcars$wt
## W = 0.94326, p-value = 0.09265

Because the p-value (0.09265) is greater than 0.05, we FAIL TO REJECT the null hypothesis at the 5% level of significance, so the data are consistent with the assumption of normality. At the 10% level of significance, however, the p-value is less than 0.10, so the assumption of normality would be rejected.
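The test output above can be reproduced as follows. The comparison against both significance levels is added here to make the two decisions explicit; the names `sw` and `alpha_levels` are illustrative.

```r
# Shapiro-Wilk test of H0: wt is normally distributed
sw <- shapiro.test(mtcars$wt)
sw

# Reject H0 when the p-value falls below the chosen level
alpha_levels <- c(0.05, 0.10)
sw$p.value < alpha_levels
```

With the p-value reported above, the comparison is FALSE at 0.05 and TRUE at 0.10.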

Two-sample Kolmogorov-Smirnov test for Normality

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  x and mtcars$wt
## D = 0.9375, p-value = 1.22e-12
## alternative hypothesis: two-sided
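The output names one sample `x`, but the code that created it is not shown. The sketch below assumes `x` was a standard normal sample drawn with rnorm(); the seed and the second, mean- and sd-matched reference sample are also assumptions added for comparison. A standard normal reference differs from wt in location and scale, so a very small p-value in that case does not by itself say much about the shape of the distribution.

```r
set.seed(1)  # hypothetical seed; the original seed (if any) is unknown
wt <- mtcars$wt

# Assumed reconstruction: reference sample from N(0, 1)
x <- rnorm(length(wt))
# wt contains tied values, which can trigger a warning; suppress it here
ks_std <- suppressWarnings(ks.test(x, wt))

# Reference sample with the same mean and standard deviation as wt
x_matched <- rnorm(length(wt), mean = mean(wt), sd = sd(wt))
ks_matched <- suppressWarnings(ks.test(x_matched, wt))

# Compare the two p-values
c(standard = ks_std$p.value, matched = ks_matched$p.value)
```

Because the reference samples are random, the exact statistics depend on the seed and will not match the output above.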

Assumption of Equal Variance

We will use an F test to check whether the variances of wt in the two groups (am = 0 and am = 1) are the same.

## 
##  F test to compare two variances
## 
## data:  wt by am
## F = 1.5876, num df = 18, denom df = 12, p-value = 0.4177
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5107978 4.3959133
## sample estimates:
## ratio of variances 
##           1.587613

From the output above we can see that the p-value is not less than the significance level of 0.05. This means that there is no evidence to suggest that the variances are statistically significantly different. Therefore, we can assume the homogeneity of variances in the two groups (am = 1 & am = 0).
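The F test shown above can be run with var.test() and the formula interface. The object name `vt` is illustrative.

```r
# F test of H0: the ratio of the two group variances equals 1
vt <- var.test(wt ~ am, data = mtcars)
vt

# TRUE means we fail to reject equal variances at the 5% level
vt$p.value > 0.05
```

The ratio is the variance in group am = 0 divided by the variance in group am = 1, which matches the degrees of freedom (18 and 12) in the output.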

What If the Assumption of Equal Variance Is Violated?

In that case we use the Welch two-sample t-test, which does not assume equal variances. In R, we can run it by passing var.equal = FALSE to t.test(); this is also the default setting.

Example

## 
##  Two Sample t-test
## 
## data:  wt by am
## t = 5.2576, df = 30, p-value = 1.125e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.8304317 1.8853577
## sample estimates:
## mean in group 0 mean in group 1 
##        3.768895        2.411000
## 
##  Welch Two Sample t-test
## 
## data:  wt by am
## t = 5.4939, df = 29.234, p-value = 6.272e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.8525632 1.8632262
## sample estimates:
## mean in group 0 mean in group 1 
##        3.768895        2.411000
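The two outputs above correspond to the pooled-variance and Welch versions of the test. A sketch of the calls, with illustrative object names:

```r
# Classical two-sample t-test: assumes equal variances (pooled estimate)
tt_pooled <- t.test(wt ~ am, data = mtcars, var.equal = TRUE)
tt_pooled

# Welch two-sample t-test: no equal-variance assumption
tt_welch <- t.test(wt ~ am, data = mtcars, var.equal = FALSE)
tt_welch

# Degrees of freedom differ: whole number for pooled, fractional for Welch
c(pooled = unname(tt_pooled$parameter), welch = unname(tt_welch$parameter))
```

Both versions report the same group means; only the standard error and the degrees of freedom change.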

Non-Parametric Tests

1. One-Sample Wilcoxon Signed Rank Test

The one-sample Wilcoxon signed rank test is a non-parametric alternative to one-sample t-test when the data cannot be assumed to be normally distributed. It’s used to determine whether the median of the sample is equal to a known standard value (i.e. theoretical value).

We test whether the median weight of the cars differs from 3 (in units of 1000 lbs), a value determined in a previous study.

H0: The median weight of the cars is not different from theoretical median 3 (1000 lbs).

H1: The median weight of the cars is different from theoretical median 3 (1000 lbs).

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  wt
## V = 319, p-value = 0.3081
## alternative hypothesis: true location is not equal to 3

The p-value of the test is 0.3081, which is greater than the significance level alpha = 0.05, so we fail to reject the null hypothesis. There is no evidence that the median weight of the cars differs from the theoretical median of 3 (1000 lbs).
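The one-sample test above can be run with wilcox.test() and the mu argument. In this sketch, exact = FALSE requests the normal approximation explicitly, which is what R falls back to when the data contain ties; the object name `w1` is illustrative.

```r
# One-sample Wilcoxon signed rank test of H0: median of wt equals 3
w1 <- wilcox.test(mtcars$wt, mu = 3, exact = FALSE)
w1

# TRUE means we fail to reject H0 at the 5% level
w1$p.value > 0.05
```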

2. Two-Sample Wilcoxon Rank Sum Test

The unpaired two-samples Wilcoxon test (also known as Wilcoxon rank sum test or Mann-Whitney test) is a non-parametric alternative to the unpaired two-samples t-test, which can be used to compare two independent groups of samples. It’s used when your data are not normally distributed.

We test whether the median weight of the cars having am = 0 is significantly different from the median weight of the cars having am = 1.

H0: The median weight of the cars having am = 0 is not significantly different from median weight of the cars having am = 1

H1: The median weight of the cars having am = 0 is significantly different from median weight of the cars having am = 1

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  wt by am
## W = 230.5, p-value = 4.347e-05
## alternative hypothesis: true location shift is not equal to 0

The p-value of the test is 4.347e-05, which is less than the significance level alpha = 0.05. We reject the null hypothesis and conclude that the median weight of the cars having am = 0 is significantly different from the median weight of the cars having am = 1.
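The two-sample test above can be run with the formula interface of wilcox.test(). As in the one-sample case, exact = FALSE is set explicitly here because wt contains ties; the object name `w2` is illustrative.

```r
# Wilcoxon rank sum (Mann-Whitney) test comparing wt between am = 0 and am = 1
w2 <- wilcox.test(wt ~ am, data = mtcars, exact = FALSE)
w2

# Group medians, for reading the direction of the difference
tapply(mtcars$wt, mtcars$am, median)
```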