August 23, 2015

Timetable second half (tentative)

Week 6
Date    Week  Lab  Topic
19.7.   1     -    Introduction, R, Rmarkdown
26.7.   2     1    Hypotheses, variables, variation
2.8.    3     2    Data, measuring variation, the normal distribution, quartiles, quantiles, probabilities
9.8.    4     3    Sebastian absent
16.8.   5     4    Type I and type II error
23.8.   6     5    Week of midsemester test

Recap week 5

A normal distribution versus a Paranormal distribution :)

What we will do today

  • Quick recap of the box1-box2 example
  • Midsemester example questions, MC and short answer
  • Lab report 3
  • A little simulation showing why we divide by \(n-1\), not \(n\), when calculating the standard deviation
  • General questions/issues

Recap week 5

pnorm(q = 160, mean = 164, sd = 6)
[1] 0.2524925

Is a value that we'd get 25% of the time by chance rare?
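
The call above returns the area to the left of 160 under a normal curve with mean 164 and standard deviation 6. As a minimal sketch, the same probability can be obtained from the z-score, and the upper tail follows by subtraction. The variable names are illustrative and not part of the lecture material.

```r
# Lower-tail probability P(X < 160) for X ~ N(mean = 164, sd = 6), as in the recap
p_direct = pnorm(q = 160, mean = 164, sd = 6)

# Same probability after standardising: z = (160 - 164) / 6
z = (160 - 164) / 6
p_z = pnorm(z)

# Complementary upper-tail probability P(X > 160)
p_upper = 1 - p_direct

print(c(p_direct, p_z, p_upper))
```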

The story with the two boxes

Example questions

Without using R or a calculator, what is the approximate probability of drawing a random number larger than 8 from a normal distribution with mean 4 and standard deviation 1?

  1. About 50%
  2. About 10%
  3. Less than 5%
  4. Very small, less than 1%
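
After attempting the question by hand, the tail probability can be checked in R. This is a minimal sketch; `p_tail` is an illustrative name, and `lower.tail = FALSE` asks `pnorm` for the area to the right of `q`.

```r
# P(X > 8) for X ~ N(mean = 4, sd = 1):
# 8 lies 4 standard deviations above the mean
p_tail = pnorm(q = 8, mean = 4, sd = 1, lower.tail = FALSE)
print(p_tail)
```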

Example questions

If you wanted to show the effect of a binomial (two-category) variable on a continuous variable, which of the following would be best?

  1. A histogram
  2. A box plot
  3. Such a data set cannot be visualised
  4. None of the above

Example questions

What is the reason for dividing by \(n-1\) rather than \(n\) when calculating the variance of a sample?

  1. To avoid division by zero
  2. To make sure the sum of squared residuals (deviances) is not zero
  3. To avoid underestimating the variance in small sample sizes
  4. To make sure the variance has the same units as the original variable

Example questions

To numerically describe Auckland house prices, which set of metrics would you use?

  1. The mean and the range
  2. Only the minimum and the maximum
  3. The median and perhaps the first and third quartile
  4. The mode and the maximum
  5. None of the above

Example questions

Which of the following does not add non-systematic variation?

  1. If the experimenter becomes tired and starts making random mistakes
  2. If a cheaper, less accurate machine is bought to analyse the samples
  3. If patients are mistakenly given too low a dose of a medication
  4. If suddenly the experimental subjects are taken from a more heterogeneous population

Example questions

If you were commissioned to estimate the mean size of snapper in the Hauraki Gulf (your population), a good sample to take would be

  1. 200 snapper from Waiheke Island
  2. 200 snapper taken from random spots around the Hauraki Gulf
  3. 200 snapper from random places all around New Zealand
  4. 500 snapper randomly taken from around the Hauraki Gulf

Example questions

If the sample size increases

  1. So does the standard error
  2. The standard error remains the same
  3. The standard error decreases
  4. The variance increases
  5. The standard deviation increases
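
The relationship between sample size and standard error can be explored with the usual formula for the standard error of the mean, SE = sd / sqrt(n). The sketch below uses an assumed standard deviation and assumed sample sizes chosen purely for illustration.

```r
# Standard error of the mean, SE = sd / sqrt(n), for a fixed sd and growing n
sd_pop = 5                    # assumed standard deviation (illustrative)
n = c(10, 40, 160, 640)       # assumed sample sizes (illustrative)
se = sd_pop / sqrt(n)

# One row per sample size, so the trend in SE can be read off directly
print(data.frame(n = n, se = se))
```

Each fourfold increase in n changes the standard error by a factor of 1 / sqrt(4) = 1/2.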

Example questions

Which line of code answers the question 'What is the probability that a male is between 185 and 200 cm tall in a population with mean body height 175 cm and standard deviation 7 cm?'

  1. pnorm(q = 200, mean = 175, sd = 7) - pnorm(q = 185, mean = 175, sd = 7)
  2. pnorm(q = 185, mean = 175, sd = 7) - pnorm(q = 200, mean = 175, sd = 7)
  3. pnorm(q = 200, mean = 185, sd = 7) - pnorm(q = 185, mean = 175, sd = 7)
  4. pnorm(q = 200, mean = 175, sd = 7) + pnorm(q = 185, mean = 175, sd = 7)
  5. None of the above
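
The probability of falling inside an interval is the area to the left of the upper bound minus the area to the left of the lower bound. A minimal sketch for the height example, with `p_between` as an illustrative name:

```r
# P(185 < X < 200) for X ~ N(mean = 175, sd = 7):
# area to the left of 200 minus area to the left of 185
p_upper_bound = pnorm(q = 200, mean = 175, sd = 7)
p_lower_bound = pnorm(q = 185, mean = 175, sd = 7)
p_between = p_upper_bound - p_lower_bound
print(p_between)
```

Subtracting in the opposite order would give a negative number, which cannot be a probability.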

Example questions (short answer)

Imagine you are researching the size of great white sharks around New Zealand waters. Assume this variable is normally distributed with mean 3 m and standard deviation 0.5 m.

  1. Sketch such a distribution, label all axes and indicate the mean and the standard deviation.
  2. What is the minimum size of the biggest 5% of sharks in your population? Shade this probability in your sketch.
  3. You realise that great white shark size does not, in fact, follow a normal distribution. The actual distribution is heavily skewed to the right. Sketch such a distribution and show how your estimate from (2) might have been terribly wrong.
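
For part (2), the size that separates the biggest 5% from the rest is the 95% quantile of the distribution, which `qnorm` returns. This is a minimal sketch under the normality assumption stated in the question; the names `cutoff` and `p_above` are illustrative.

```r
# Size above which the biggest 5% of sharks fall,
# assuming size ~ N(mean = 3 m, sd = 0.5 m) as stated in the question
cutoff = qnorm(p = 0.95, mean = 3, sd = 0.5)
print(cutoff)

# Check: the probability of exceeding the cutoff should be 0.05
p_above = pnorm(q = cutoff, mean = 3, sd = 0.5, lower.tail = FALSE)
print(p_above)
```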

Example questions (short answer)

In a normally distributed population of mean 40 and standard deviation 5, what value would you consider 'unusually' large? Why? (Justify your answer)
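
There is no single correct threshold here; what matters is the justification. As a sketch, two conventions that are often used (and that are assumptions here, not a prescribed answer) are the mean plus two standard deviations and the value exceeded by only 2.5% of the population:

```r
# Two common (but not the only) conventions for 'unusually large'
# in a N(mean = 40, sd = 5) population
two_sd = 40 + 2 * 5                             # mean plus two standard deviations
top_2_5 = qnorm(p = 0.975, mean = 40, sd = 5)   # exceeded by only 2.5% of the population
print(c(two_sd, top_2_5))
```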

Lab report 3

See demonstration in RStudio…

Why we divide by \(n-1\), not \(n\), when calculating the standard deviation

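# Simulated population: 100,000 draws from a standard normal (true sd = 1)
pop = rnorm(100000)

n_sim = 10000          # number of repeated samples
sd1 = numeric(n_sim)   # sd estimates dividing by n - 1
sd2 = numeric(n_sim)   # sd estimates dividing by n

for (i in 1:n_sim) {
  s1 = sample(pop, 5)            # small sample of size 5
  ss = sum((s1 - mean(s1))^2)    # sum of squared deviations from the sample mean
  sd1[i] = sqrt(ss / (length(s1) - 1))
  sd2[i] = sqrt(ss / length(s1))
}

# Red line: true sd (1); green line: mean of the simulated estimates
par(mfrow = c(1, 2))
hist(sd1, xlim = c(0, 2))
abline(v = 1, col = 'red')
abline(v = mean(sd1), col = 'green')
hist(sd2, xlim = c(0, 2))
abline(v = 1, col = 'red')
abline(v = mean(sd2), col = 'green')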

=> This is not part of the test/exam!
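
The same idea can be checked on the variance scale, where the effect of the \(n-1\) correction is cleanest: dividing by \(n-1\) gives a variance estimate that is right on average, while dividing by \(n\) underestimates it by a factor of \((n-1)/n\). The sketch below mirrors the simulation above; the seed and the variable names are arbitrary choices for illustration. Note that taking the square root reintroduces a small underestimate, so the mean of the \(n-1\) standard deviations still sits slightly below the true value.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
pop = rnorm(100000)         # population with true variance close to 1

n_sim = 10000
var1 = numeric(n_sim)       # variance estimates dividing by n - 1
var2 = numeric(n_sim)       # variance estimates dividing by n

for (i in 1:n_sim) {
  s = sample(pop, 5)                 # small sample of size 5
  ss = sum((s - mean(s))^2)          # sum of squared deviations
  var1[i] = ss / (length(s) - 1)
  var2[i] = ss / length(s)
}

# Average estimate under each divisor, to compare with the true variance
print(c(mean(var1), mean(var2)))
```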
