Department of Environmental Science, AUT

Distributions: Prerequisites

Hypotheses, Models, Population, Sample

Content you should have understood before watching this video:

  • Number 1, ‘Variables’
  • Number 2, ‘Variation’
  • Number 3, ‘Measuring Variation’
  • Number 4, ‘Standard Deviation and Standard Error’

What is a distribution in a statistical sense?

Distributions

Distributions in R

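# Histogram of 1000 draws from the standard normal distribution
hist(rnorm(1000), freq = FALSE, xlab = 'Score or Quantile',
     ylab = 'Density or Probability', main = "",
     axes = FALSE, cex.lab = 1.5)
axis(1, cex.axis = 1.5)  # draw only the x-axis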

The uniform distribution

  • Parameters to specify are the minimum and the maximum
  • Equal probabilities for all values between the min and the max
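
A minimal sketch of simulating uniformly distributed numbers with runif(). The bounds (0 and 10), the sample size, and the seed are arbitrary choices for illustration, not values from the lecture.

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

# 1000 draws; every value between min and max is equally likely
x_unif <- runif(1000, min = 0, max = 10)

range(x_unif)                            # all values lie between 0 and 10
hist(x_unif, main = "", xlab = "Value")  # bars of roughly equal height
```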

The Poisson distribution

  • Parameter to specify is lambda (\(\lambda\)), which represents both the mean and the variance
  • Count data: only non-negative integers (0, 1, 2, ...), so for discrete numeric variables
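
Poisson-distributed counts can be simulated with rpois(). The sketch below assumes lambda = 3 and a sample size of 1000, both picked only for illustration.

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

# 1000 simulated counts with lambda = 3 (illustrative value)
counts <- rpois(1000, lambda = 3)

head(counts)                  # first few simulated counts
all(counts == round(counts))  # TRUE: only whole numbers occur
min(counts) >= 0              # TRUE: counts cannot be negative
table(counts)                 # how often each count was drawn
```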

The Poisson distribution - how can lambda be the mean AND the variance?

If lambda (\(\lambda\)) increases, both the mean AND the spread get larger! Let us illustrate this with an example:
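
One way to check this is a small simulation. The sketch below draws Poisson samples for three lambda values (1, 5 and 20, chosen arbitrarily) and compares each sample mean and sample variance with lambda. Because the variance equals lambda, the standard deviation grows as \(\sqrt{\lambda}\).

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

lambdas <- c(1, 5, 20)  # illustrative values, not from the lecture

# One sample of 10000 counts per lambda
sims <- lapply(lambdas, function(l) rpois(10000, lambda = l))

# Mean and variance of each sample should both be close to lambda
result <- data.frame(
  lambda   = lambdas,
  mean     = sapply(sims, mean),
  variance = sapply(sims, var)
)
result
```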

The normal distribution

  • The normal distribution is symmetrical, and both its tails extend infinitely

  • The two parameters are the mean and the standard deviation

  • The standard normal distribution has mean 0 and standard deviation 1

  • In R, you can create normally distributed random numbers using the function rnorm()

  • The normal distribution is of central importance! (Central Limit Theorem, assumptions of standard parametric tests)
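
A short sketch of rnorm() with the default parameters (the standard normal) and with user-chosen ones. The mean of 50 and standard deviation of 10 are assumptions made up for this example.

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

z <- rnorm(1000)                      # standard normal: mean 0, sd 1
x <- rnorm(1000, mean = 50, sd = 10)  # illustrative mean and sd

c(mean(z), sd(z))  # close to 0 and 1
c(mean(x), sd(x))  # close to 50 and 10
```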

The Central Limit Theorem
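
The Central Limit Theorem says that the means of many independent samples are approximately normally distributed once the sample size is large enough, even if the underlying data are not normal. A rough simulation sketch, assuming uniform data on 0 to 1 and a sample size of 30 (both arbitrary choices for illustration):

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

# Draw 5000 samples of size 30 from a uniform distribution
# and keep the mean of each sample
sample_means <- replicate(5000, mean(runif(30, min = 0, max = 1)))

hist(sample_means, main = "", xlab = "Sample mean")  # roughly bell-shaped

mean(sample_means)  # close to 0.5, the mean of the uniform distribution
sd(sample_means)    # close to sqrt(1/12) / sqrt(30), the standard error
```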

The most important points in a nutshell

  • To see what distribution a variable might follow, you have to look at its histogram
  • A distribution can be visualised with a histogram (looks like a city skyline) OR a density plot (looks like a curve)
  • You should be familiar with a few common distributions: the uniform, the Poisson, and definitely the normal distribution
  • You should know their parameters and be able to simulate numbers in R that follow those distributions
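
The two ways of visualising a distribution can be compared on the same simulated data. This is only a sketch, using 1000 standard normal draws as an assumed example:

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

x <- rnorm(1000)

# Histogram on the density scale (the "city skyline")
hist(x, freq = FALSE, main = "", xlab = "Value")

# Density plot (the curve), drawn on top of the histogram
d <- density(x)
lines(d)
```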