Exercise 4

For a two-sample, two-tailed t-test consider the following example data on men’s and women’s weights.

men_weights <- c(71.1, 73.4, 85.9, 75.5, 75.9, 87, 78.2, 66.1, 70.2)
women_weights <- c(52.8, 61.1, 56.8, 57, 55.6, 52.2, 63.9, 57.5, 45.2)

Create a data frame (call it ‘dat’) containing all the weights in the first column (‘weight’) and the gender identity in the second column (‘gender’).
Check for normality in the two groups separately using the hist as well as the qqnorm in conjunction with the qqline functions and the shapiro.test. If the normality criterion is not grossly violated, run a two-tailed t-test using the formula approach (refer to the lab notes or the last example in the help file for the t.test function if you don’t remember what the formula approach looks like).
Run a one-tailed t-test using the formula approach and hypothesise that women’s weights are lower than men’s (hint: to find out which group is considered x and which y when using the formula interface, use levels(dat$gender)…whatever gender is listed first is considered x in the t.test function).
Use the familiar functions in the dplyr package which we have used during the lab to create an aggregated data frame containing the means for each group as well as their standard errors (or standard deviations) and two further columns containing the mean + se and mean - se for easy plotting of the error bars. Next, use the barplot function in combination with the arrows function to create a barplot including statistical annotation (asterisks or lower case letters).

As an example for a one-sample t-test (consult the t.test help file and read up on the one-sample t-test) consider the following experiment.
A biologist was interested in determining whether sunflower seedlings treated with an extract from Vinca minor roots resulted in a lower average height of sunflower seedlings than the standard height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and subsequently obtained the following heights in cm:

11.5, 11.8, 15.7, 16.1, 14.1, 10.5, 15.2, 19.0, 12.8, 12.4, 19.2, 13.5, 16.5, 13.5, 14.4, 16.7, 10.9, 13.0, 15.1, 17.1, 13.3, 12.4, 8.5, 14.3, 12.9, 11.1, 15.0, 13.3, 15.8, 13.5, 9.3, 12.2, 10.3

Highlight and copy the data above (Ctrl + C on PC or Cmd + C on Mac) and read it into R using the scan() function.

On PC use the following code to read in data from the clipboard:

# On PC use:
sunflower <- scan(file = "clipboard", sep = ",") # on PC

On Mac use the following code to read in data from the clipboard:

# On Mac use:
sunflower <- scan(pipe(description = "pbpaste"), sep = ",")

Now write down the null and alternative hypotheses for the mean ($\mu$). Note: Because the treatment is expected to have a negative effect, you should phrase the alternative hypothesis accordingly.

Formulate a t-test for this scenario.

Can the null hypothesis be rejected?

What is the critical t-value for a significance level $\alpha$ of 0.05, given n – 1 = 32 degrees of freedom?

Consider this small dataset on depression scale measurements in nine patients taken at the first (x) and second (y) visit after initiation of a therapy (administration of a tranquilizer). The t-test procedure assumes independent samples. However, in this scenario measurements are taken from the same patient before and after therapy, so the samples are not independent. In this case, we need to account for the dependence

x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)

Run a paired t-test on the data (see help file of the t.test function).

Does the therapy show a statistically significant effect at the i) 0.05 significance level and the ii) 0.01 significance level?
What is the power of this paired t-test? (Hint: carefully read the help file for the power.t.test function) The power.t.test function uses the standard error of the difference in sample means, which is computed as the pooled estimate of the standard deviation (i.e. pooled across the two groups) divided by the number of observations per group. The pooled standard error can be calculated as follows [recall that the standard deviation $\sigma$ is given as $\sigma = \sqrt{var(x)}$]:

\[ se_{pooled} = \sqrt{\frac{var(x)}{n_x} + \frac{var(y)}{n_y}} \]

## Calculate the pooled standard error
se.pooled <- sqrt(var(x)/length(x) + var(y)/length(y))

Note: In the power.t.test function, the pooled standard error is referred to as the pooled standard deviation (because the standard error is nothing but the standard deviation divided by the square root of n) but don’t let that confuse you.

Next, run a single sample t-test using the difference between x and y (i.e. x – y) as input data (perform the test with the default setting for mu = 0).
What is the null hypothesis we are testing?
Compare the results to those of the paired t-test. What can you conclude about the inner workings of the paired t-test. The results are exactly the same suggesting that the paired t-test procedure simply computes the before-after differences and then performs a one sample t-test under the hood with the null hypothesis that the differences are zero.

Consider the built-in dataset ‘airquality’ and suppose we wanted to perform pairwise comparisons of ozone concentrations between all possible combinations of months. The dataset comprises five months, which yields ten pairwise comparisons. In other words, we would have to run ten t-tests by hand. R offers the pairwise.t.test function for this purpose.

data(airquality)
head(airquality)

month.abb # built-in vector of abbreviated month names

##  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"

## Create a factor month
month <- factor(airquality$Month, labels = month.abb[5:9])
pairwise.t.test(x = airquality$Ozone, g = month)

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  airquality$Ozone and month 
## 
##     May     Jun     Jul     Aug    
## Jun 1.00000 -       -       -      
## Jul 0.00026 0.05113 -       -      
## Aug 0.00019 0.04987 1.00000 -      
## Sep 1.00000 1.00000 0.00488 0.00388
## 
## P value adjustment method: holm

## How many pairwise comparisons are there with our 5 months?
## Check out the choose() function, which gives you the number of all unique pairwise combinations (k = 2)
choose(n = length(month.abb[5:9]), k = 2)

## [1] 10

## And the combn() function returns all pairwise comparisons
combn(x = month.abb[5:9], m = 2)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10]
## [1,] "May" "May" "May" "May" "Jun" "Jun" "Jun" "Jul" "Jul" "Aug"
## [2,] "Jun" "Jul" "Aug" "Sep" "Jul" "Aug" "Sep" "Aug" "Sep" "Sep"

The output reports adjusted P-values. Why do we need to adjust (correct) the P-values of these ten t-tests?

What is the P-value of the ozone comparison between July and September?

Is there a similar pairwise routine available for the non-parametric Wilcoxon test?

Read the help file for the p.adjust function. At some stage of your research career you will need to adjust for multiple testing. Here is a dummy example:

pvals <- c(0.03, 0.08, 0.1, 0.29, 0.57, 0.001, 0.00087)
pvals_adj <- p.adjust(pvals, method = "BH")
round(pvals_adj, digits = 3)

## [1] 0.070 0.140 0.140 0.338 0.570 0.004 0.004