Department of Environmental Science, AUT

Hypotheses, Models, Population, Sample: Prerequisites

Hypotheses, Models, Population, Sample

Content you should have understood before watching this video:

None. You can probably follow this video without having watched any of the others in this series.

Generating and Testing Theories

  • Theory

A hypothesised general principle or set of principles that explains known findings about a topic and from which new hypotheses can be generated. E.g.: ‘Biodiversity decreases towards the poles.’

  • Hypothesis

A prediction from a theory. E.g.: ‘In any one animal phylum, there are fewer species found between 30 and 60 degrees north/south than between 0 and 30 degrees north/south.’

Note that the terms ‘null hypothesis’ (H\(_0\)) and ‘alternative hypothesis’ (H\(_A\)) mean something else (see later).

Testable and non-testable hypotheses

  • Testable (scientifically usable) hypotheses
    • There are no rats on Rangitoto Island
    • The Beatles sold more records than any other band
    • More people like this product with a sour taste than with a bitter taste
    • Patients taking this medication live longer than those who don’t
  • Non-testable (non-scientific) hypotheses
    • There are 8 rats on Rangitoto Island
    • There are rats on Rangitoto Island
    • The Beatles were the best band ever
    • Most people like this product with a sour taste
    • This medication has helped so many people, so surely it must be good


  • Falsification: a good scientific hypothesis is quantifiable and falsifiable!

Statistical models

  • A statistical model is a way of simplifying an observed process or mechanism

\(outcome_i = (model) + error_i\)

\(person_1 = 170 cm + 3 cm\)

…

\(person_{10} = 170 cm -1 cm\)

What can we use (statistical) models for?

Models can be used to test hypotheses, but they can also summarise information, or predict values where data are missing (predictive model).

For example:

  • Are x and y correlated? (hypothesis testing)
  • What is the mean of x, the mean of y? (summary information)
  • What is y when x = 5? (predictive model)

The simplest statistical model

The mean!

  • The mean summarises data
  • The mean can be used to predict future outcomes of a variable (e.g. if an average of 200 people per year died on the road in NZ over the past 3 years, we can use this value to predict the road toll in the following year)
  • The mean is a hypothetical value (i.e. it doesn’t have to be a value that actually exists in the data set, e.g. the mean of 192, 188, and 220 is 200).
  • The mean can be tested against a reference value (e.g. is the mean road toll in NZ in 2012-2015 different from that in 1962-1965?)

As such, the mean is a simple statistical model.
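
The first three points can be sketched in a few lines of Python. This is only an illustration using the three road-toll figures from the example above; the variable names are made up for the sketch, and the "prediction" is simply the mean carried forward, not a forecast from a fitted model.

```python
from statistics import mean

# Annual road-toll figures from the example above (three years).
road_toll = [192, 188, 220]

# Summary: one number that stands in for all three observations.
model = mean(road_toll)

# Hypothetical value: the mean does not have to occur in the data set.
is_observed = model in road_toll

# Prediction: under this very simple model, the best guess for
# the following year is the mean itself.
predicted_next_year = model

print(model, is_observed, predicted_next_year)
```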

The mean: simple example

  • Collect some data

\(1, 3, 4, 3, 2\)

  • Add them up:

\(\sum\limits_{i=1}^n x_i = 1 + 3 + 4 + 3 + 2 = 13\)

  • Divide by the number of scores, \(n\):

\(\bar{X} = \frac{\sum\limits_{i=1}^n x_i}{n} = \frac{13}{5} = 2.6\)

The mean is a statistical model!

\(outcome_i = (model) + error_i\)

\(outcome_{person1} = (model) + error_{person1}\)

\(1 = 2.6 + (-1.6)\)
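
The same calculation can be written as a short Python sketch. It uses the five scores from the example; the variable names are illustrative only, and the errors are rounded to one decimal place purely to keep the display tidy.

```python
# The five scores collected in the example.
scores = [1, 3, 4, 3, 2]

n = len(scores)          # number of scores
total = sum(scores)      # add them up
mean_score = total / n   # divide by the number of scores

# outcome_i = (model) + error_i, so error_i = outcome_i - model.
errors = [round(x - mean_score, 1) for x in scores]

for x, e in zip(scores, errors):
    print(f"{x} = {mean_score} + ({e})")
```

Each score is recovered as the mean plus its own error, and the errors around the mean add up to zero.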

Quick summary

  • What is a (statistical) model?
    • It is a summary/simplification of what is going on in reality
    • It can be used for a number of purposes: to test hypotheses, to predict, or to summarise
  • How is the mean a statistical model?
    • It summarises information contained in many data points
    • It can be used to make a prediction
    • It can be used to test a hypothesis

Population versus sample

Consider these three experiments:

  1. You are asked to determine the germination rate of 1 kg of grass seeds. What is your sample? What is your population?

  2. You are asked to determine the germination rate of ‘supergrass’ grass seeds sold at the warehouse. What could be your sample? What is the population?

  3. You are asked to test the efficacy of a lung cancer treatment. What could be your sample? What is the population?

Population versus sample

  • In (1), you are only interested in this 1 kg of grass seeds. It is your sample, but it is also the population you are making your inference on.

  • In (2), your population is all the grass seeds sold all over New Zealand during this season. Your sample could be one sachet of seeds per warehouse branch.

  • In (3), your population is possibly all present and future lung cancer patients globally. This population is fictitious. Your sample may be 20 lung cancer patients in Auckland.

When selecting your sample, think of the population you would like to make an inference on!
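
To make the distinction concrete, here is a minimal simulation sketch in Python. Everything numeric in it is an assumption made for illustration: the population size (10,000 seeds), the germination rate (80%) and the sample size (200) are invented and do not come from any real data.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 seeds, True if a seed would germinate.
# The 80% rate is an invented value for illustration only.
population = [random.random() < 0.8 for _ in range(10_000)]

# Sample: a random subset of 200 seeds that we actually test.
sample = random.sample(population, 200)

# In practice only the sample rate is known; it serves as an
# estimate of the (usually unknown) population rate.
population_rate = sum(population) / len(population)
sample_rate = sum(sample) / len(sample)

print(f"population: {population_rate:.3f}, sample: {sample_rate:.3f}")
```

The sample rate will usually be close to, but not identical to, the population rate, which is why the sample has to be drawn from the population you actually want to make an inference on.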

The most important points in a nutshell

  • A theory is different from a hypothesis. A theory is a broader concept; a hypothesis must be immediately testable and falsifiable.
  • A null hypothesis is different from a scientific hypothesis (discussed later)
  • A statistical model can have three purposes: it can test hypotheses, summarise information, and/or predict values
  • A sample is a subset of the entire population (not necessarily humans!)
  • When taking a sample, we have to keep the population we want to study in mind!