Semester 2, 2025
Richard Hayes
Email: rjhayes@unimelb.edu.au
We might be interested to know whether a certain treatment has some significant effect on the central location (measured by the mean or the median) of a population, while in some other situations we might wish to compare the central locations of two distinct populations.
In the first case, this is referred to as a paired-sample design.
In the second case, it is called an independent measures design.
In order to find out whether some newly designed golf clubs improve golfers’ performance, we ask a group of golfers to play a round on a familiar golf course with their own clubs and then another round with the new clubs.
In order to find out whether a particular real estate agency tends to overvalue the properties of potential vendors in order to secure more business, we compare a sample of evaluations by this agency to the evaluations of the same properties by some independent property valuer.
In both of these examples, there is just one set of experimental units (golfers; properties), one variable of interest (golfers’ scores on the given course; appraised values of properties), and a single random sample of pairs of observations (pairs of scores with the old and new clubs, respectively; pairs of appraised values provided by the real estate agency and the independent property valuer, respectively). Most importantly, the sample elements (golfers; properties) are supposed to be selected randomly but the observations in any particular pair of observations are related to each other.
Suppose that we are interested in the customer satisfaction levels of two competing paid television channels ‘A’ and ‘B’, and ask a sample of viewers who usually watch channel ‘A’ and another sample of viewers who usually watch ‘B’ to answer a few questions about their level of satisfaction.
Suppose we are interested in the relationship between job tenure and qualification at a company, and compare the length of time employees with a bachelor’s degree or higher have been working at the company with that of employees who do not have such a degree.
In these examples, there are two different sets of experimental units (viewers of the two television channels; employees with a bachelor’s degree or higher and employees without such a degree), one variable of interest (customer satisfaction; job tenure), but two random samples (samples of the viewers of the two channels; samples of the two types of employees). Crucially, these random samples are supposed to be independent of each other.
Paired sample design
A pupilometer is a device used to observe changes in pupil dilation as the eye is exposed to different visual stimuli. Since there is a direct correlation between the amount an individual’s pupil dilates and his or her interest in the stimuli, marketing organizations sometimes use pupilometers to help them evaluate potential consumer interest in new products, alternative package designs, and other factors (Optical Engineering, Mar. 1995).
The Design and Market Research Laboratories of the Container Corporation of America used a pupilometer to evaluate consumer reaction to different silverware patterns for a client. Suppose 15 consumers were chosen at random, and each was shown the same two silverware patterns. Their pupilometer readings (in millimetres) are saved in the t4e1 Excel file.
The paired-sample t-test is based on the following assumptions:
The data is a random sample of independent pairs of observations (i.e. the before and after samples are not independent of each other but the selected pairs are).
The variable of interest is quantitative and continuous.
The measurement scale is interval or ratio.
\(\sigma_D\) is unknown but the population of the differences is normally distributed (at least approximately).
a. What are the appropriate null and alternative hypotheses to test whether the mean amount of pupil dilation differs for the two patterns?
Suppose the researcher shows two silverware patterns one after the other to a consumer and after each experiment measures his/her pupil dilation. Denote these pupilometer readings as \(X_1\) and \(X_2\), respectively. These measurements form a pair of matching observations, and the experiment itself is based on a paired-sample design.
If \(D_i\) denotes the difference between the two measurements for consumer \(i\), i.e. \(D_i=X_{1i}-X_{2i}\), and \(\mu_D\) the mean of population \(D\), then the question implies the following null and alternative hypotheses: \[H_0: \mu_D=0 \qquad H_A:\mu_D \neq 0\]
Launch RStudio, create a new RStudio project and script, import the data from the Excel file to RStudio and load it into your current project. The pupilometer measurements are named Pattern1 and Pattern2. Calculate the differences between the corresponding measurements:
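A minimal sketch of this step, assuming the imported data frame is called t4e1 and has the columns Pattern1 and Pattern2. The three rows below are made-up placeholder readings, not the tutorial data; in practice t4e1 would come from the Excel file.

```r
# Made-up placeholder readings (NOT the tutorial data), in millimetres
t4e1 <- data.frame(Pattern1 = c(1.00, 0.97, 1.45),
                   Pattern2 = c(0.80, 0.66, 1.22))

# Difference for each consumer: D_i = X_1i - X_2i
t4e1$D <- t4e1$Pattern1 - t4e1$Pattern2
print(t4e1)
```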
Since \(D\) is assumed to be normally distributed, use a t-test on the differences:
One Sample t-test
data: t4e1$D
t = 5.7637, df = 14, p-value = 4.905e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.1502729 0.3283937
sample estimates:
mean of x
0.2393333
Alternatively,
Paired t-test
data: t4e1$Pattern1 and t4e1$Pattern2
t = 5.7637, df = 14, p-value = 4.905e-05
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
0.1502729 0.3283937
sample estimates:
mean difference
0.2393333
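The commands behind the two printouts are not shown. Judging from the "data:" lines, they were presumably a one-sample t.test on t4e1$D and a t.test on the two columns with paired = TRUE; that is an assumption. The sketch below runs both forms on made-up readings (not the tutorial data) to illustrate why the two printouts report the same statistic, degrees of freedom and p-value.

```r
# Made-up readings for 6 hypothetical consumers (NOT the tutorial data)
pattern1 <- c(1.00, 0.97, 1.45, 1.21, 0.77, 1.32)
pattern2 <- c(0.80, 0.66, 1.22, 0.95, 0.60, 1.04)
d <- pattern1 - pattern2

# One-sample t-test on the differences, H0: mu_D = 0
one_sample <- t.test(d, mu = 0)

# Paired t-test on the two columns; internally this also works on x - y
paired <- t.test(pattern1, pattern2, paired = TRUE)

print(one_sample)
print(paired)
```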
The observed test statistic is \(t_{obs} = 5.7637\) and the p-value is practically zero, so \(H_0\) can be rejected at any reasonable significance level. Therefore we conclude that the mean amount of pupil dilation differs for the two patterns.
With 95% confidence, the difference in the mean pupil dilation between pattern 1 and pattern 2 is somewhere between 0.1503 and 0.3284 millimetres.
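As a check on the printout, the p-value and the interval can be reconstructed from the reported mean difference, test statistic and degrees of freedom. The sketch below uses only numbers quoted above; the standard error is backed out as \(\bar{d}/t_{obs}\), so the results agree with the printout only up to rounding.

```r
d_bar <- 0.2393333   # reported mean difference
t_obs <- 5.7637      # reported test statistic
df    <- 14          # n - 1 with n = 15 consumers

# Implied standard error of the mean difference
se_d <- d_bar / t_obs

# Two-sided p-value from the t distribution with 14 degrees of freedom
p_value <- 2 * pt(-abs(t_obs), df)

# 95% confidence interval: d_bar +/- t_{0.025, 14} * se_d
ci_95 <- d_bar + c(-1, 1) * qt(0.975, df) * se_d

print(p_value)
print(round(ci_95, 4))
```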
In last week’s tutorial you used two non-parametric alternatives to the t-test for a population mean: the one-sample sign test and the one-sample Wilcoxon signed ranks test for the population median. The same tests can be performed on \(D\), or on Pattern1 and Pattern2.
Let’s start with the sign test. When it is performed on two samples, it has the following requirements:
The data is a random sample of independent pairs of observations (i.e. the before and after samples are not independent of each other but the selected pairs are).
The variable of interest is qualitative or quantitative.
The measurement scale is at least ordinal.
Since the consumers were selected randomly and each was shown the same two silverware patterns, and pupilometer reading is a quantitative variable measured on a ratio scale, all requirements are satisfied.
The null and alternative hypotheses are
\[ H_0: \eta=0 \qquad H_A: \eta \neq 0\]
Now run
or equivalently
Dependent-samples Sign-Test
data: t4e1$Pattern1 and t4e1$Pattern2
S = 14, number of differences = 15, p-value = 0.0009766
alternative hypothesis: true median difference is not equal to 0
96.5 percent confidence interval:
0.12 0.31
sample estimates:
median of the differences
0.21
This time the test is labelled “Dependent-samples Sign-Test”, but otherwise the two printouts are equivalent. The test statistic is \(S = 14\) and the p-value is less than 0.001, so \(H_0\) can be rejected at any reasonable significance level.
Therefore we conclude that the median amount of pupil dilation differs for the two patterns.
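The sign test reduces to an exact binomial test on the number of positive differences, so the reported p-value can be reproduced in base R from \(S = 14\) and \(n = 15\) alone. The last lines are a hedged aside: assuming the interval runs from the 4th smallest to the 4th largest difference (an assumption, since the printout does not state the method), its exact coverage would be about 96.5%, which is consistent with the label on the interval.

```r
s <- 14   # reported number of positive differences
n <- 15   # reported number of differences

# Under H0 the median difference is 0, so S ~ Binomial(n, 0.5)
p_value <- binom.test(s, n, p = 0.5, alternative = "two.sided")$p.value
print(p_value)

# Exact coverage of the interval between the 4th smallest and 4th largest difference
coverage <- 1 - 2 * pbinom(3, n, 0.5)
print(round(coverage, 3))
```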
Let’s move on to the two-sample Wilcoxon signed ranks test. It assumes that
The data is a random sample of independent pairs of observations (i.e. the before and after samples are not independent of each other but the selected pairs are).
The variable of interest is quantitative and continuous.
The measurement scale is interval or ratio.
The distribution of the differences is symmetric.
As we already saw, the first three requirements are satisfied in this example, so we need to consider only the fourth requirement.
We can do so like in Exercise 2 of Tutorial 3.
Checking symmetry
median mean SE.mean CI.mean.0.95 var std.dev
0.210 0.239 0.042 0.089 0.026 0.161
coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W
0.672 0.543 0.468 -0.187 -0.084 0.946
normtest.p
0.461
You should conclude that these statistics do not cast any doubt on the assumption of symmetry.
It is important to acknowledge though that this conclusion is not very sound this time due to the small sample size (\(n =15\)).
Yet, we do not need to worry about this uncertainty at this stage because if the sign test and the Wilcoxon signed ranks test lead to the same conclusion, then it does not really matter whether the population of \(D\) is symmetric or not.
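The Wilcoxon signed ranks printout itself is not reproduced here. A minimal sketch of how the paired test can be run in base R with wilcox.test, on made-up readings rather than the tutorial data:

```r
# Made-up readings for 6 hypothetical consumers (NOT the tutorial data)
pattern1 <- c(1.00, 0.97, 1.45, 1.21, 0.77, 1.32)
pattern2 <- c(0.80, 0.66, 1.22, 0.95, 0.60, 1.04)

# Paired (signed ranks) test of H0: the median difference is 0
res <- wilcox.test(pattern1, pattern2, paired = TRUE, alternative = "two.sided")
print(res)
```

In this toy sample all six differences are positive and distinct, so the statistic V is the sum of all ranks and the exact p-value is the smallest one attainable with six pairs.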
Independent sample design
Marketing strategists would like to predict consumers’ response to new products and their accompanying promotional schemes. Consequently, studies that examine the differences between buyers and non-buyers of a product are of interest. One classic study conducted by Shuchman and Riesz (Journal of Marketing Research, Feb. 1975) was aimed at characterizing the purchasers and non-purchasers of Crest toothpaste.
The researchers demonstrated that both the mean household size (number of persons) and mean household income were significantly larger for purchasers than for non-purchasers.
A similar study utilized independent random samples of size 20 on the age of the householder primarily responsible for buying toothpaste. Householders were categorized as non-purchaser or purchaser of a particular brand of toothpaste coded as N and P, respectively.
The data are saved in the t4e2 file.
The two-independent-sample t test and the corresponding confidence interval estimator for the difference between two population means are based on the following assumptions:
The data consists of two independent random samples of independent observations (i.e. both the samples and the observations within each sample are independent).
The variable of interest is quantitative and continuous.
The measurement scale is interval or ratio.
\(\sigma_1\) and \(\sigma_2\) are unknown but the sampled populations are normally distributed (at least approximately).
The main question, once you have established that assumptions (i)-(iii) hold, is whether the two \(\sigma\)’s are equal.
In this question, we can take the first assumption for granted as it was explicitly mentioned that the study was based on “independent random samples”.
The variable of interest is the age of the householder primarily responsible for buying toothpaste. It is a quantitative and continuous variable.
Although the actual observations are rounded to the nearest year, and hence are discrete values, there is a large number of possible values so, for the purpose of hypothesis testing, we can still treat this variable as being continuous.
As for normality, although the sample sizes are a bit small, let’s apply stat.desc on the two samples separately.
We can select certain observations with the subset(x, cond) function, where x is the object to be subsetted and cond is a logical expression that indicates which elements of x to keep, e.g.:
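# Load the t4e2 dataset (read_excel comes from the readxl package)
library(readxl)
t4e2 <- read_excel("t4e2.xlsx")
# --- Group-wise summary statistics: mean and standard deviation of Age ---
by(t4e2$Age, t4e2$Householder, mean)
by(t4e2$Age, t4e2$Householder, sd)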
t4e2$Householder: N
[1] 47.2
------------------------------------------------------------
t4e2$Householder: P
[1] 39.8
t4e2$Householder: N
[1] 13.62119
------------------------------------------------------------
t4e2$Householder: P
[1] 10.03992
then use
library(pastecs)
stat.desc(subset(t4e2$Age, t4e2$Householder == "N"),
basic = FALSE, desc = TRUE, norm = TRUE)
median mean SE.mean CI.mean.0.95 var std.dev
52.0000000 47.2000000 3.0457909 6.3749136 185.5368421 13.6211909
coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W
0.2885846 -0.4663852 -0.4553624 -1.2493938 -0.6294914 0.9203074
normtest.p
0.1004447
and
median mean SE.mean CI.mean.0.95 var std.dev
38.0000000 39.8000000 2.2449944 4.6988273 100.8000000 10.0399203
coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W
0.2522593 0.1375106 0.1342606 -1.2922388 -0.6510782 0.9499264
normtest.p
0.3659746
For both groups, the mean and the median are close to each other, the skewness and kurtosis statistics are smaller in absolute value than twice their standard errors, and the p-values of the Shapiro-Wilk tests are larger than 0.1.
All in all, these statistics do not cast any doubt on the normality assumption.
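The skew.2SE and kurt.2SE columns are the skewness and kurtosis divided by twice their standard errors, so an absolute value below 1 means the statistic lies within two standard errors of zero. The sketch below recomputes them from the reported skewness and kurtosis using the usual small-sample standard-error formulas. That stat.desc uses exactly these formulas is an assumption here, supported by the agreement with the printouts.

```r
n <- 20  # observations per group

# Small-sample standard errors of skewness and excess kurtosis (assumed formulas)
se_skew <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt <- sqrt(24 * n * (n - 1)^2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))

# Reported skewness and kurtosis for non-purchasers (N) and purchasers (P)
skew <- c(N = -0.4663852, P = 0.1375106)
kurt <- c(N = -1.2493938, P = -1.2922388)

# Ratios to twice the standard error; |ratio| < 1 means "within 2 SE of zero"
skew_2se <- skew / (2 * se_skew)
kurt_2se <- kurt / (2 * se_kurt)
print(round(skew_2se, 4))
print(round(kurt_2se, 4))
```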
Recall that we also assumed that the population variances are unequal, based on a simple comparison of the sample variances below.
The two sample variances are \(s_1^2 = (13.621)^2 = 185.532\) and \(s_2^2 = (10.040)^2 = 100.802\). Their ratio, \(s_1^2 / s_2^2 = 1.84\), seems to be too big to assume that the corresponding population variances are equal, so let’s follow the third scenario where the \(\sigma\)’s are not equal.
Assume, moreover, that the sampled populations are not extremely non-normal.
Next week you will be asked to perform a formal hypothesis test to see whether there is indeed a significant difference between the two population variances.
Why is this so important?
Well, it comes down to which standard error you use in the test statistic.
If \(\sigma_1^2= \sigma_2^2\) use
\[
\begin{align*}
s_p^2 & = \dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2} \Rightarrow \\
s_{\bar{x}_1-\bar{x}_2} & = s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}
\end{align*}
\]
If \(\sigma_1^2 \neq \sigma_2^2\) use
\[s_{\bar{x}_1-\bar{x}_2} =\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}
\]
with a degrees of freedom adjustment
\[ df = \dfrac{\left(s_{\bar{x}_1-\bar{x}_2}^2\right)^2}{\left(s_{\bar{x}_1}^2\right)^2/(n_1-1)+ \left(s_{\bar{x}_2}^2\right)^2/(n_2-1)}, \qquad s_{\bar{x}_j}^2 = \dfrac{s_j^2}{n_j} \]
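To make the two cases concrete, the sketch below plugs in the summary statistics reported earlier for the two groups (\(n_1 = n_2 = 20\)). It uses the pooled variance with \(n_1+n_2-2\) in the denominator and the Welch-Satterthwaite degrees of freedom. Because the sample sizes are equal, the two standard errors coincide here and only the degrees of freedom differ.

```r
n1 <- 20; n2 <- 20
m1 <- 47.2;        m2 <- 39.8    # reported sample means (N, P)
v1 <- 185.5368421; v2 <- 100.8   # reported sample variances (N, P)

# Equal variances: pooled variance, standard error and degrees of freedom
sp2       <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2

# Unequal variances: Welch standard error and adjusted degrees of freedom
se_welch <- sqrt(v1 / n1 + v2 / n2)
df_welch <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Test statistic for H0: mu_1 - mu_2 = 0 under the unequal-variance scenario
t_welch <- (m1 - m2) / se_welch

print(c(se_pooled = se_pooled, se_welch = se_welch))
print(c(t = t_welch, df_welch = df_welch, df_pooled = df_pooled))
```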
If we go to part (c), using R, we can test
\[H_0: \mu_1=\mu_2 \Leftrightarrow \mu_1-\mu_2=0 \qquad H_A: \mu_1 \neq \mu_2 \Leftrightarrow \mu_1-\mu_2 \neq 0\]
using
t.test(t4e2$Age ~ t4e2$Householder, conf.level = 0.90)
Welch Two Sample t-test
data: t4e2$Age by t4e2$Householder
t = 1.9557, df = 34.94, p-value = 0.05853
alternative hypothesis: true difference in means between group N and group P is not equal to 0
90 percent confidence interval:
1.006765 13.793235
sample estimates:
mean in group N mean in group P
47.2 39.8
If we thought the variances were equal then we would use
Two Sample t-test
data: t4e2$Age by t4e2$Householder
t = 1.9557, df = 38, p-value = 0.05788
alternative hypothesis: true difference in means between group N and group P is not equal to 0
90 percent confidence interval:
1.020752 13.779248
sample estimates:
mean in group N mean in group P
47.2 39.8
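The command behind the equal-variance printout is not shown. With t.test it would typically be the same call with var.equal = TRUE added, which is an assumption here. A sketch on made-up ages (not the tutorial data) comparing the two versions:

```r
# Made-up data for 12 hypothetical householders (NOT the tutorial data)
toy <- data.frame(
  Age         = c(52, 61, 38, 45, 57, 49, 33, 41, 36, 29, 44, 39),
  Householder = rep(c("N", "P"), each = 6)
)

# Default: Welch test, variances not assumed equal
welch <- t.test(Age ~ Householder, data = toy, conf.level = 0.90)

# Pooled-variance test, variances assumed equal
pooled <- t.test(Age ~ Householder, data = toy, conf.level = 0.90,
                 var.equal = TRUE)

print(welch)
print(pooled)
```

With equal group sizes the two t statistics coincide, as in the printouts above; the tests differ only in their degrees of freedom, and hence slightly in the p-value and the interval.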
What if we intend to compare the central locations of two populations which are very nonnormal, or do not have means because they are measured on an ordinal scale, or do have means but we prefer to use the medians to measure their central locations? In these cases, we should use some nonparametric test for the difference between the population medians. The simplest option is the Wilcoxon rank-sum test.
The Wilcoxon rank-sum test is similar to the Wilcoxon signed ranks test, but instead of classifying the observations based on their relative positions to the hypothesized median, it classifies the observations according to some characteristic of the experimental units (in the current example, according to whether the householder is a purchaser or a non-purchaser of the brand).
The Wilcoxon rank-sum test is based on the following assumptions:
The data consists of two independent random samples of independent observations (i.e. both the samples and the observations within each sample are independent).
The variable of interest is quantitative and continuous.
The measurement scale is at least ordinal.
The two sampled populations differ at most with respect to their central locations measured by the medians (i.e. they are identical in shape and spread).
The hypotheses are:
\[H_0:\eta_1-\eta_2=0 \qquad H_A: \eta_1-\eta_2 \neq 0\]
Then run
Exact Wilcoxon rank sum test
data: t4e2$Age by t4e2$Householder
W = 272, p-value = 0.05129
alternative hypothesis: true mu is not equal to 0
The reported test statistic is \(W = 272\) and the p-value is \(0.05129\), so \(H_0\) can be rejected at the 10% significance level but not at the 5% level. We therefore conclude that, at the 10% significance level, there is a difference in the median age of purchasers and non-purchasers.
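A sketch of the corresponding call and of how the statistic is built, using made-up ages rather than the tutorial data. In R's wilcox.test, W is the sum of the ranks of the first group (here N) in the pooled sample minus \(n_1(n_1+1)/2\).

```r
# Made-up data for 12 hypothetical householders (NOT the tutorial data)
toy <- data.frame(
  Age         = c(52, 61, 38, 45, 57, 49, 33, 41, 36, 29, 44, 39),
  Householder = rep(c("N", "P"), each = 6)
)

# Two-sided rank-sum test of H0: eta_1 - eta_2 = 0
res <- wilcox.test(Age ~ Householder, data = toy)
print(res)

# W by hand: rank the pooled sample, sum the ranks of group N,
# then subtract n1 * (n1 + 1) / 2
r  <- rank(toy$Age)
n1 <- sum(toy$Householder == "N")
w_by_hand <- sum(r[toy$Householder == "N"]) - n1 * (n1 + 1) / 2
print(w_by_hand)
```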