ECON20003
Quantitative Methods 2

Semester 2, 2025

Tutorial 4
Comparing the Central Locations of Two Populations

Contact Details




Richard Hayes




Email: rjhayes@unimelb.edu.au




Comparing central locations - 2 populations






We might be interested to know whether a certain treatment has some significant effect on the central location (measured by the mean or the median) of a population, while in some other situations we might wish to compare the central locations of two distinct populations.

  1. in the first case, this is referred to as a paired-sample design

  2. the second case is called an independent measures design

Paired-Sample Design examples



  • in order to find out whether some newly designed golf clubs improve golfers’ performance, we ask a group of golfers to play a round on a familiar golf course with their own clubs and then another round with the new clubs.

  • in order to find out whether a particular real estate agency tends to overvalue the properties of potential vendors in order to secure more business, we compare a sample of evaluations by this agency to the evaluations of the same properties by some independent property valuer.

In both of these examples, there is just one set of experimental units (golfers; properties), one variable of interest (golfers’ scores on the given course; appraised values of properties), and a single random sample of pairs of observations (pairs of scores with the old and new clubs, respectively; pairs of appraised values provided by the real estate agency and the independent property valuer, respectively). Most importantly, the sample elements (golfers; properties) are supposed to be selected randomly but the observations in any particular pair of observations are related to each other.

Independent Measures Design examples



  • suppose that we are interested in the customer satisfaction levels of two competing paid television channels ‘A’ and ‘B’, and ask a sample of viewers who usually watch channel ‘A’ and another sample of viewers who usually watch ‘B’ to answer a few questions about their level of satisfaction

  • suppose we are interested in the relationship between job tenure and qualification at a company, and compare the length of time employees with a bachelor’s degree or higher have been working at the company with that of employees who do not have such a degree.

In these examples, there are two different sets of experimental units (viewers of the two television channels; employees with a bachelor’s degree or higher and employees without such degree), one variable of interest (customer satisfaction; job tenure), but two random samples (samples of the viewers of the two channels; samples of the two types of employees). Crucially, these random samples are supposed to be independent of each other.






Exercise 1



Paired sample design

A pupilometer is a device used to observe changes in pupil dilation as the eye is exposed to different visual stimuli. Since there is a direct correlation between the amount an individual’s pupil dilates and his or her interest in the stimuli, marketing organizations sometimes use pupilometers to help them evaluate potential consumer interest in new products, alternative package designs, and other factors (Optical Engineering, Mar. 1995).

The Design and Market Research Laboratories of the Container Corporation of America used a pupilometer to evaluate consumer reaction to different silverware patterns for a client. Suppose 15 consumers were chosen at random, and each was shown the same two silverware patterns. Their pupilometer readings (in millimetres) are saved in the t4e1 Excel file.

Paired 2 sample t test assumptions



  1. The data is a random sample of independent pairs of observations (i.e. the before and after samples are not independent of each other but the selected pairs are).

  2. The variable of interest is quantitative and continuous.

  3. The measurement scale is interval or ratio.

  4. \(\sigma_D\) is unknown but the population of the differences is normally distributed (at least approximately).


a. What are the appropriate null and alternative hypotheses to test whether the mean amount of pupil dilation differs for the two patterns?

Suppose the researcher shows the two silverware patterns one after the other to a consumer and measures his/her pupil dilation after each. Denote these pupilometer readings as \(X_1\) and \(X_2\), respectively. These measurements form a pair of matching observations, and the experiment itself is based on a paired-sample design.

If \(D_i\) denotes the difference between the two measurements for consumer \(i\), i.e. \(D_i=X_{1i}-X_{2i}\), and \(\mu_D\) the mean of population \(D\), then the question implies the following null and alternative hypotheses: \[H_0: \mu_D=0 \qquad H_A:\mu_D \neq 0\]


b. Conduct the test in part (a) using \(\alpha=0.05\), assuming that the population of \(D\) is normally distributed. Interpret the results.

Launch RStudio, create a new RStudio project and script, import the data from the Excel file to RStudio and load it into your current project. The pupilometer measurements are named Pattern1 and Pattern2. Calculate the differences between the corresponding measurements:

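library(readxl)
t4e1 <- read_excel("t4e1.xlsx")  # assumes the file is saved as t4e1.xlsx in the project folder
t4e1$D <- t4e1$Pattern1 - t4e1$Pattern2  # differences between the paired readings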

Since \(D\) is assumed to be normally distributed, use a t-test:

t.test(t4e1$D)

    One Sample t-test

data:  t4e1$D
t = 5.7637, df = 14, p-value = 4.905e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.1502729 0.3283937
sample estimates:
mean of x 
0.2393333 

Alternatively,

t.test(t4e1$Pattern1, t4e1$Pattern2, paired = TRUE)

    Paired t-test

data:  t4e1$Pattern1 and t4e1$Pattern2
t = 5.7637, df = 14, p-value = 4.905e-05
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.1502729 0.3283937
sample estimates:
mean difference 
      0.2393333 


The observed test statistic is \(t_{obs} = 5.7637\) and the p-value is practically zero (about 0.00005), so \(H_0\) can be rejected at any reasonable significance level, including \(\alpha = 0.05\). Therefore we conclude that the mean amount of pupil dilation differs for the two patterns.
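To see where the reported statistic comes from, here is a minimal sketch that computes the paired t statistic by hand and compares it with `t.test`. The readings in `x1` and `x2` are hypothetical values made up for illustration; they are not the t4e1 data.

```r
# Hypothetical readings (NOT the t4e1 data), used only to illustrate the algebra
x1 <- c(1.00, 0.97, 1.45, 1.21, 0.77, 1.32, 1.81, 0.91)
x2 <- c(0.80, 0.66, 1.22, 1.00, 0.81, 1.11, 1.30, 0.74)

d <- x1 - x2                               # paired differences
n <- length(d)
se <- sd(d) / sqrt(n)                      # estimated standard error of the mean difference
t_obs <- mean(d) / se                      # test statistic under H0: mu_D = 0
p_val <- 2 * pt(-abs(t_obs), df = n - 1)   # two-sided p-value

# The built-in paired test should report the same numbers
res <- t.test(x1, x2, paired = TRUE)
c(manual = t_obs, builtin = unname(res$statistic))
c(manual = p_val, builtin = res$p.value)
```

The paired test is simply a one-sample t-test on the differences, which is why both R calls in this exercise give identical printouts.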

c. Interpret the 95% confidence interval for the difference between the two pupil dilation population measurements, i.e. for population \(D\).

With 95% confidence, the mean difference in pupil dilation between pattern 1 and pattern 2 is somewhere between 0.1503 and 0.3284 millimetres. Since the interval lies entirely above zero, the mean reading is larger for pattern 1.
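As a rough check, the interval can be rebuilt from the numbers in the printout alone. The sketch below backs out the standard error from the reported mean and t statistic (both rounded in the printout, so the result agrees only up to rounding); the variable names are illustrative.

```r
d_bar <- 0.2393333   # sample mean of D, from the printout
t_obs <- 5.7637      # reported test statistic
n     <- 15          # number of pairs

se     <- d_bar / t_obs               # standard error implied by t = d_bar / se
t_crit <- qt(0.975, df = n - 1)       # two-sided 95% critical value with 14 df
ci     <- d_bar + c(-1, 1) * t_crit * se
round(ci, 4)
```

The result should agree with the interval reported by `t.test` to about four decimal places.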

d. Is the paired-sample design used for this study preferable to the independent measures design? For independent samples we could select 30 consumers, divide them into two groups of 15, and show each group a different pattern. Explain your preference.

    As is often the case, the paired-sample design is preferable to the independent measures design. Pupil dilation can vary a great deal from person to person, and in an independent measures design this between-person variation could mask the variation due to the different patterns. Because each consumer sees both patterns, taking differences removes it.
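A short variance comparison makes the argument concrete. It is a sketch under assumptions that are not part of the exercise: the two readings have variances \(\sigma_1^2\) and \(\sigma_2^2\), the two readings of the same consumer have correlation \(\rho\), and both designs use \(n\) observations per pattern. With \(n\) pairs,

\[ \operatorname{Var}(\bar{X}_1-\bar{X}_2) = \dfrac{\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2}{n} \]

whereas with two independent samples of size \(n\),

\[ \operatorname{Var}(\bar{X}_1-\bar{X}_2) = \dfrac{\sigma_1^2+\sigma_2^2}{n} \]

Whenever \(\rho > 0\), which is what we expect when the same person is measured twice, the paired design estimates the mean difference more precisely.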


e. In part (b) it was assumed that the population of differences is normally distributed. Since the sample size is only 15, this assumption is fairly crucial.
    However, given this small sample size, the usual diagnostics for normality can be unreliable and misleading.
    For this reason perform the appropriate non-parametric test(s) for the median of the differences between the two pupil dilation measurements.
    Do you arrive at a similar conclusion to the one in part (b)?

In last week’s tutorial you used two non-parametric alternatives to the t-test for a population mean: the one-sample sign test and the one-sample Wilcoxon signed ranks test for the population median. The same tests can be performed on \(D\), or on Pattern1 and Pattern2.

Let’s start with the sign test. When it is performed on two samples, it has the following requirements:

  1. The data is a random sample of independent pairs of observations (i.e. the before and after samples are not independent of each other but the selected pairs are).

  2. The variable of interest is qualitative or quantitative.

  3. The measurement scale is at least ordinal.


Since the consumers were selected randomly and each was shown the same two silverware patterns, and pupilometer reading is a quantitative variable measured on a ratio scale, all requirements are satisfied.

The null and alternative hypotheses are
\[ H_0: \eta=0 \qquad H_A: \eta \neq 0\]

Now run

library(DescTools)
SignTest(t4e1$D)

    One-sample Sign-Test

data:  t4e1$D
S = 14, number of differences = 15, p-value = 0.0009766
alternative hypothesis: true median is not equal to 0
96.5 percent confidence interval:
 0.12 0.31
sample estimates:
median of the differences 
                     0.21 


or equivalently

SignTest(t4e1$Pattern1, t4e1$Pattern2) 

    Dependent-samples Sign-Test

data:  t4e1$Pattern1 and t4e1$Pattern2
S = 14, number of differences = 15, p-value = 0.0009766
alternative hypothesis: true median difference is not equal to 0
96.5 percent confidence interval:
 0.12 0.31
sample estimates:
median of the differences 
                     0.21 

This time the test is labelled “Dependent-samples Sign-test”, but otherwise the two printouts are equivalent. The test statistic is \(S = 14\) and the p-value is less than 0.001, so \(H_0\) can be rejected at any reasonable significance level.

Therefore we conclude that the median amount of pupil dilation differs for the two patterns.
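The sign test p-value can be reproduced with base R. Under \(H_0\) the number of positive differences follows a Binomial(15, 0.5) distribution. The sketch below uses only the two numbers from the printout (\(S = 14\) positive differences out of 15 non-zero ones); the variable names are illustrative.

```r
n <- 15          # number of non-zero differences, from the printout
s <- 14          # number of positive differences (test statistic S)

# Two-sided p-value: twice the upper-tail probability P(X >= 14) for X ~ Binomial(15, 0.5)
p_manual <- 2 * pbinom(s - 1, size = n, prob = 0.5, lower.tail = FALSE)

# The same value from the built-in exact binomial test
p_binom <- binom.test(s, n, p = 0.5)$p.value

c(manual = p_manual, binom.test = p_binom)
```

Both values equal \(32/2^{15} \approx 0.0009766\), matching the `SignTest` output.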


Let’s move on to the two-sample Wilcoxon signed ranks test. It assumes that

  1. The data is a random sample of independent pairs of observations (i.e. the before and after samples are not independent of each other but the selected pairs are).

  2. The variable of interest is quantitative and continuous.

  3. The measurement scale is interval or ratio.

  4. The distribution of the differences is symmetric.

As we already saw, the first three requirements are satisfied in this example, so we need to consider only the fourth requirement.

We can check it as in Exercise 2 of Tutorial 3.


Checking symmetry

hist(t4e1$D, freq = FALSE, col = "yellow")
lines(seq(-0.1, 0.6, by = 0.01),
      dnorm(seq(-0.1, 0.6, by = 0.01), mean(t4e1$D), sd(t4e1$D)),
      col = "red") 


qqnorm(t4e1$D, main = "Normal Q-Q Plot",
       xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
       col = "forestgreen")
qqline(t4e1$D, col = "blue")


library(pastecs)
round(stat.desc(t4e1$D, basic = FALSE, desc = TRUE, norm = TRUE), 3)
      median         mean      SE.mean CI.mean.0.95          var      std.dev 
       0.210        0.239        0.042        0.089        0.026        0.161 
    coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W 
       0.672        0.543        0.468       -0.187       -0.084        0.946 
  normtest.p 
       0.461 

The histogram, the Q-Q plot and the descriptive statistics do not cast any doubt on the assumption of symmetry.
It is important to acknowledge, though, that this conclusion is not very reliable here because of the small sample size (\(n = 15\)).

Yet, we do not need to worry about this uncertainty at this stage because if the sign test and the Wilcoxon signed ranks test lead to the same conclusion, then it does not really matter whether the population of \(D\) is symmetric or not.


library(exactRankTests)
wilcox.exact(t4e1$D)

    Exact Wilcoxon signed rank test

data:  t4e1$D
V = 119, p-value = 0.0001221
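alternative hypothesis: true mu is not equal to 0

The test statistic is \(V = 119\) and the p-value is about 0.0001, so \(H_0\) can be rejected at any reasonable significance level. The sign test and the Wilcoxon signed ranks test therefore lead to the same conclusion as the t-test in part (b): the central location of the pupil dilation readings differs for the two patterns.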
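The exact p-value can also be checked against the null distribution of the signed rank statistic in base R. This is a sketch that assumes no zero differences and that ties among the absolute differences, if there are any, do not affect this extreme tail. It uses only \(n = 15\) and \(V = 119\) from the printout.

```r
n     <- 15                     # number of non-zero differences
v     <- 119                    # reported sum of the positive ranks
v_max <- n * (n + 1) / 2        # largest possible rank sum, here 120

# The null distribution of V is symmetric, so P(V >= 119) = P(V <= v_max - 119)
p_manual <- 2 * psignrank(v_max - v, n)
p_manual
```

The value is \(4/2^{15} \approx 0.0001221\), in line with the `wilcox.exact` output.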

Exercise 2



Independent Sample Design

Marketing strategists would like to predict consumers’ response to new products and their accompanying promotional schemes. Consequently, studies that examine the differences between buyers and non-buyers of a product are of interest. One classic study conducted by Shuchman and Riesz (Journal of Marketing Research, Feb. 1975) was aimed at characterizing the purchasers and non-purchasers of Crest toothpaste.

The researchers demonstrated that both the mean household size (number of persons) and mean household income were significantly larger for purchasers than for non-purchasers.

A similar study utilized independent random samples of size 20 on the age of the householder primarily responsible for buying toothpaste. Householders were categorized as non-purchaser or purchaser of a particular brand of toothpaste coded as N and P, respectively.

The data are saved in the t4e2 file.



The two-independent-sample t test and the corresponding confidence interval estimator for the difference between two population means are based on the following assumptions:

  1. The data consists of two independent random samples of independent observations (i.e. both the samples and the observations within each sample are independent).

  2. The variable of interest is quantitative and continuous.

  3. The measurement scale is interval or ratio.

  4. \(\sigma_1\) and \(\sigma_2\) are unknown but the sampled populations are normally distributed (at least approximately).



The main question, once you have established that assumptions (1) to (3) hold, is whether the two \(\sigma\)’s are equal.

In this question, we can take the first assumption for granted as it was explicitly mentioned that the study was based on “independent random samples”.

The variable of interest is the age of the householder primarily responsible for buying toothpaste. It is a quantitative and continuous variable.
Although the actual observations are rounded to the nearest year, and hence are discrete values, there is a large number of possible values so, for the purpose of hypothesis testing, we can still treat this variable as being continuous.

As for normality, although the sample sizes are a bit small, let’s apply stat.desc on the two samples separately.
We can select certain observations with the subset(x, cond) function, where x is the object to be sub-setted and cond is a logical expression that indicates which elements of x to keep e.g.:

# Load t4e2 dataset
t4e2 <- read_excel("t4e2.xlsx")
# --- Group-wise summary statistics ---
by(t4e2$Age, t4e2$Householder, mean)
t4e2$Householder: N
[1] 47.2
------------------------------------------------------------ 
t4e2$Householder: P
[1] 39.8
by(t4e2$Age, t4e2$Householder, sd)
t4e2$Householder: N
[1] 13.62119
------------------------------------------------------------ 
t4e2$Householder: P
[1] 10.03992

then use

library(pastecs)
stat.desc(subset(t4e2$Age, t4e2$Householder == "N"),
          basic = FALSE, desc = TRUE, norm = TRUE)
      median         mean      SE.mean CI.mean.0.95          var      std.dev 
  52.0000000   47.2000000    3.0457909    6.3749136  185.5368421   13.6211909 
    coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W 
   0.2885846   -0.4663852   -0.4553624   -1.2493938   -0.6294914    0.9203074 
  normtest.p 
   0.1004447 

and

stat.desc(subset(t4e2$Age, t4e2$Householder == "P"),
          basic = FALSE, desc = TRUE, norm = TRUE)
      median         mean      SE.mean CI.mean.0.95          var      std.dev 
  38.0000000   39.8000000    2.2449944    4.6988273  100.8000000   10.0399203 
    coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W 
   0.2522593    0.1375106    0.1342606   -1.2922388   -0.6510782    0.9499264 
  normtest.p 
   0.3659746 

For both groups, the mean and the median are close to each other, the skewness and kurtosis statistics are smaller in absolute value than twice their standard errors, and the p-values of the Shapiro-Wilk tests are larger than 0.1.

All in all, these statistics do not cast any doubt on the normality assumption.

Recall that we also assume that the population variances are unequal, based on a simple comparison of the sample variances below.

The two sample variances are \(s_1^2 = (13.621)^2 = 185.532\) and \(s_2^2 = (10.040)^2 = 100.802\). Their ratio, \(s_1^2 / s_2^2 = 1.84\), seems to be too big to assume that the corresponding population variances are equal, so let’s follow the third scenario where the \(\sigma\)’s are not equal.

Assume, moreover, that the sampled populations are not extremely non-normal.

Next week you will be asked to perform a formal hypothesis test to see whether there is indeed a significant difference between the two population variances.

Why is this so important?

Well, it comes down to which standard error you use in the test statistic.

If \(\sigma_1^2= \sigma_2^2\) use

\[ \begin{align*} s_p^2 & = \dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2} \Rightarrow \\ s_{\bar{x}_1-\bar{x}_2} & = s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}} \end{align*} \]
with \(df = n_1+n_2-2\). If \(\sigma_1^2 \neq \sigma_2^2\) use

\[s_{\bar{x}_1-\bar{x}_2} =\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}} \]
with a degrees of freedom adjustment

\[ df = \dfrac{(s_{\bar{x}_1-\bar{x}_2})^4}{(s_{\bar{x}_1})^4/(n_1-1)+ (s_{\bar{x}_2})^4/(n_2-1)} \]
where \(s_{\bar{x}_j}^2 = s_j^2/n_j\) for \(j = 1, 2\).
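To make the two standard errors concrete, the sketch below evaluates the pooled and the unequal-variance (Welch) versions from the summary statistics reported earlier for groups N and P. Only base R arithmetic is used, and the variable names are illustrative.

```r
n1 <- 20; n2 <- 20
s1_sq  <- 185.5368421     # sample variance, group N (from stat.desc)
s2_sq  <- 100.8           # sample variance, group P (from stat.desc)
x1_bar <- 47.2; x2_bar <- 39.8

# Equal-variance case: pooled variance and the resulting standard error
sp_sq     <- ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se_pooled <- sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)

# Unequal-variance (Welch) case: standard error and adjusted degrees of freedom
v1 <- s1_sq / n1          # squared standard error of the first sample mean
v2 <- s2_sq / n2          # squared standard error of the second sample mean
se_welch <- sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

t_obs <- (x1_bar - x2_bar) / se_welch
round(c(se_pooled = se_pooled, se_welch = se_welch, df_welch = df_welch, t = t_obs), 4)
```

Because the two samples have the same size, the two standard errors coincide here and only the degrees of freedom differ (about 34.94 instead of 38). With unequal sample sizes the test statistics would differ as well.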


Moving on to part (c), we can use R to test
\(H_0: \mu_1=\mu_2 \Leftrightarrow \mu_1-\mu_2=0 \qquad H_A: \mu_1 \neq \mu_2 \Leftrightarrow \mu_1-\mu_2 \neq 0\)
using
t.test(t4e2$Age ~ t4e2$Householder, conf.level = 0.90)

    Welch Two Sample t-test

data:  t4e2$Age by t4e2$Householder
t = 1.9557, df = 34.94, p-value = 0.05853
alternative hypothesis: true difference in means between group N and group P is not equal to 0
90 percent confidence interval:
  1.006765 13.793235
sample estimates:
mean in group N mean in group P 
           47.2            39.8 


If we thought the variances were equal then we would use

t.test(t4e2$Age ~ t4e2$Householder, var.equal = TRUE, conf.level = 0.90)

    Two Sample t-test

data:  t4e2$Age by t4e2$Householder
t = 1.9557, df = 38, p-value = 0.05788
alternative hypothesis: true difference in means between group N and group P is not equal to 0
90 percent confidence interval:
  1.020752 13.779248
sample estimates:
mean in group N mean in group P 
           47.2            39.8 
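The two 90% confidence intervals can be rebuilt from the reported summary statistics, which shows that they differ only through the degrees of freedom used for the critical value. This is a sketch with illustrative variable names. The Welch degrees of freedom are taken from the printout, so agreement is up to rounding.

```r
diff_means <- 47.2 - 39.8                        # difference of the sample means (N minus P)
se <- sqrt(185.5368421 / 20 + 100.8 / 20)        # standard error; identical for both tests since n1 = n2

# 90% intervals: same centre and standard error, different critical values
ci_welch  <- diff_means + c(-1, 1) * qt(0.95, df = 34.94) * se
ci_pooled <- diff_means + c(-1, 1) * qt(0.95, df = 38) * se

round(rbind(welch = ci_welch, pooled = ci_pooled), 4)
```

Both intervals exclude zero, which is consistent with p-values just below 0.10: at the 10% significance level, either version of the test rejects \(H_0\).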


What if we intend to compare the central locations of two populations which are very nonnormal, or do not have means because they are measured on an ordinal scale, or do have means but we prefer to use the medians to measure their central locations? In these cases, we should use some nonparametric test for the difference between the population medians. The simplest option is the Wilcoxon rank-sum test.

The Wilcoxon rank-sum test is similar to the Wilcoxon signed ranks test, but instead of classifying the observations based on their relative positions to the hypothesized median, it classifies the observations according to some characteristic of the experimental units (in the current example, according to whether or not the householder is a purchaser of the toothpaste brand).


The Wilcoxon rank-sum test is based on the following assumptions:

  1. The data consists of two independent random samples of independent observations (i.e. both the samples and the observations within each sample are independent).

  2. The variable of interest is quantitative and continuous.

  3. … and the measurement scale is at least ordinal.

  4. The two sampled populations differ at most with respect to their central locations measured by the medians (i.e. they are identical in shape and spread).

  1. Assume this time that the populations are not normally distributed and perform the Wilcoxon rank-sum test to see whether there is a difference in the median age of purchasers and non-purchasers (use \(\alpha = 0.10\)). Perform the test with R.

The hypotheses are:

\[H_0:\eta_1-\eta_2=0 \qquad H_A:\eta_1-\eta_2 \neq 0\]


Then run

library(exactRankTests)
wilcox.exact(t4e2$Age ~ t4e2$Householder)

    Exact Wilcoxon rank sum test

data:  t4e2$Age by t4e2$Householder
W = 272, p-value = 0.05129
alternative hypothesis: true mu is not equal to 0

The reported test statistic is \(W = 272\) and the p-value is \(0.05129\). Since the p-value is smaller than \(\alpha = 0.10\), \(H_0\) is rejected, and we conclude that at the 10% significance level there is a difference in the median age of purchasers and non-purchasers.
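For intuition about the statistic, the sketch below computes \(W\) by hand on hypothetical ages and compares it with base R's `wilcox.test`. The ages are made up and contain no ties; they are not the t4e2 data. Base R reports the rank sum of the first group minus its smallest possible value, \(n_1(n_1+1)/2\). That `wilcox.exact` follows the same convention is an assumption here, although the reported \(W = 272\) is consistent with it.

```r
# Hypothetical ages (NOT the t4e2 data), chosen without ties
non_purch <- c(61, 34, 55, 48, 67, 52, 29, 58)
purch     <- c(33, 41, 27, 45, 38, 50, 31, 36)

n1       <- length(non_purch)
ranks    <- rank(c(non_purch, purch))          # ranks in the pooled sample
rank_sum <- sum(ranks[seq_len(n1)])            # rank sum of the first group
w_manual <- rank_sum - n1 * (n1 + 1) / 2       # statistic in the form reported by wilcox.test

res <- wilcox.test(non_purch, purch)
c(manual = w_manual, builtin = unname(res$statistic))
```

Under \(H_0\) the expected value of this statistic is \(n_1 n_2 / 2\), so a value well above or below that midpoint is evidence of a shift in location.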