What is statistics?

M. Drew LaMar
January 23, 2017

“Maturity of mind is the capacity to endure uncertainty.”

- John Finley

Course Announcements

  • Homework #2 (due Monday, January 30, 5 pm):
    • Whitlock & Schluter: Chapter 1
      • Practice Problems (do NOT turn these in): #1, 3, 4, 9, 12
      • Assignment Problems (do turn these in): #14, 16-20 (all), 24
    • Whitlock & Schluter: Chapter 2
      • Practice Problems (do NOT turn these in): #3, 8, 11-16 (all)
      • Assignment Problems (do turn these in): #20, 23, 29, 32-36

Reading Quiz (Class Discussion)

Q1. What feature of an estimate—precision or accuracy—is most strongly affected when individuals differing in the variable of interest do not have an equal chance of being selected?

Answer: Accuracy

Reading Quiz (Class Discussion)

Q2. Is the following study observational or experimental?

Psychologists tested whether the frequency of illegal drug use differs between people suffering from schizophrenia and those not having the disease. They measured drug use in a group of schizophrenia patients and compared it with that in a similar-sized group of randomly chosen people.

Sub-questions:

  • What are the explanatory and response variables?
  • Are they categorical or numerical?

Answer: Observational

The explanatory variable is schizophrenia status, with values “schizophrenic” and “not schizophrenic”, so it is categorical. The response variable is the frequency of illegal drug use.

The study is observational because the treatment groups (the values of the explanatory variable) were not assigned randomly by the researchers!

Experimental vs Observational

Definition: A study is experimental if the researcher assigns treatments randomly to individuals, whereas a study is observational if the assignment of treatments is not made by the researcher.
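Random assignment is what makes a study experimental. A minimal sketch in Python, assuming ten hypothetical subjects and two hypothetical groups labeled "treatment" and "control"; `assign_treatments` is an illustrative helper, not a library function.

```python
import random

def assign_treatments(individuals, treatments, seed=None):
    """Randomly assign each individual to one of the treatments.

    Labels are repeated to cover every individual (so group sizes differ
    by at most one) and then shuffled, so chance, not the researcher,
    decides who gets which treatment.
    """
    rng = random.Random(seed)
    labels = [treatments[i % len(treatments)] for i in range(len(individuals))]
    rng.shuffle(labels)
    return dict(zip(individuals, labels))

# Hypothetical subjects, for illustration only.
subjects = [f"subject_{i}" for i in range(1, 11)]
groups = assign_treatments(subjects, ["treatment", "control"], seed=1)
for subject, group in groups.items():
    print(subject, group)
```

In an observational study, by contrast, the groups would already exist (as with schizophrenia status above) and no such shuffling step would take place.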

Populations vs Samples

Definition: A parameter is a quantity describing a population, whereas an estimate or statistic is a related quantity calculated from a sample.

Parameter examples: Averages, proportions, measures of variation, and measures of relationship
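To make the parameter/estimate distinction concrete, here is a small sketch that simulates a hypothetical population (made-up body masses, not real data), computes two parameters from all of it, and then computes the matching estimates from one random sample of 25.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 body masses in grams (simulated, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Parameters describe the whole population.
mu = statistics.fmean(population)        # population mean
sigma = statistics.pstdev(population)    # population standard deviation

# Estimates are the corresponding quantities calculated from a random sample.
sample = rng.sample(population, 25)
y_bar = statistics.fmean(sample)         # sample mean
s = statistics.stdev(sample)             # sample standard deviation (n - 1 divisor)

print(f"mean: parameter = {mu:.2f}, estimate = {y_bar:.2f}")
print(f"sd:   parameter = {sigma:.2f}, estimate = {s:.2f}")
```

In real studies only the second half is available: the population values, and hence the parameters, are unknown.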

What is statistics?

Statistics is a technology that describes and measures aspects of nature from samples.

Statistics lets us quantify the uncertainty of these measures.

Statistics makes it possible to determine the likely magnitude of a measurement's departure from the “truth”.

Statistics is about estimation, the process of inferring an unknown quantity of a target population using sample data.

What is statistics?

The two sides of the statistical coin:

  • Parameter estimation
  • Hypothesis testing

Definition: A statistical hypothesis is a specific claim regarding a population parameter.
Definition: Hypothesis testing uses data to evaluate evidence for or against statistical hypotheses.

What is statistics? Parameter estimation

The two sides of the statistical coin:

  • Parameter estimation
  • Hypothesis testing

Example: A trapping study measures the rate of fruit fall in forest clear-cuts.

What is statistics? Hypothesis testing

The two sides of the statistical coin:

  • Parameter estimation
  • Hypothesis testing

Example: A clinical trial is carried out to determine whether taking large doses of vitamin C benefits the health of advanced cancer patients.
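One possible way to test a hypothesis like this is a permutation test; the slide does not name a method, so treat this as one illustrative technique among many. The sketch assumes hypothetical survival times (simulated with no true treatment effect), and `permutation_test` is a helper written for illustration. The null hypothesis is that the group labels do not matter, so reshuffling them shows how large a difference chance alone produces.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_perm=10_000, seed=None):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the group labels are exchangeable (no treatment effect).
    Returns the observed difference in means and an approximate P-value.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign labels at random, as the null hypothesis allows
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed labeling itself.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

rng = random.Random(7)
# Hypothetical survival times in days, simulated with NO true treatment effect.
vitamin_c = [rng.gauss(200, 40) for _ in range(30)]
placebo = [rng.gauss(200, 40) for _ in range(30)]
diff, p = permutation_test(vitamin_c, placebo, n_perm=2_000, seed=1)
print(f"difference in means = {diff:.1f} days, P = {p:.3f}")
```

A small P-value would count as evidence against the hypothesis of no effect; a large one means the data are consistent with it.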

What is probability?

Probability comes first!

…well, most of the time.

  • Many statistical techniques require assumptions about where your data come from (i.e., properties of the population)
  • In other words, an assumed probability model describes the population
  • Statistical techniques that are based on probability models are called parametric techniques, while those that are not are called non-parametric techniques.

Quote: “Huh?”
- Student
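To make the parametric/non-parametric distinction less of a “Huh?”, here is a sketch of two 95% intervals for a mean, using simulated hypothetical data. The parametric one assumes a probability model (a normal sampling distribution for the mean; this uses the normal approximation, not the t distribution). The non-parametric one is a percentile bootstrap, which only resamples the data. Both helpers are illustrative, not taken from the course.

```python
import random
import statistics
from statistics import NormalDist

def normal_ci(data, level=0.95):
    """Parametric interval: assumes the sample mean is normally distributed."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = statistics.stdev(data) / len(data) ** 0.5
    m = statistics.fmean(data)
    return m - z * se, m + z * se

def bootstrap_ci(data, level=0.95, n_boot=5_000, seed=None):
    """Non-parametric percentile bootstrap: resample the data, no population model."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]

rng = random.Random(3)
# Hypothetical skewed data (e.g., waiting times), simulated for illustration.
data = [rng.expovariate(1 / 5) for _ in range(40)]
print("normal-model 95% CI:", normal_ci(data))
print("bootstrap 95% CI:   ", bootstrap_ci(data, seed=1))
```

When the assumed model fits the population, parametric methods tend to be more efficient; when it clearly does not, a method that avoids the model can be safer.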

Data as Information (recap)

For any given question, your data contain both desired and undesired information.

Goals:

  • Get accurate information by reducing bias
  • Get precise information by reducing sampling error due to random variation (increase signal-to-noise ratio)

Definition: Bias is a systematic discrepancy between the estimates we would obtain, if we could sample a population again and again, and the true population characteristic.
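The definition can be simulated directly by sampling a population again and again. In this sketch (hypothetical simulated population; size-proportional selection stands in for, say, a trap that catches large animals more easily), random samples give estimates that average out to the truth, while unequal chances of selection shift every estimate the same way. This is why unequal selection chances hurt accuracy.

```python
import itertools
import random
import statistics

rng = random.Random(0)
# Hypothetical population of 5,000 body sizes (simulated, not real data).
population = [rng.uniform(1, 10) for _ in range(5_000)]
true_mean = statistics.fmean(population)

# Cumulative selection weights proportional to size: bigger individuals are easier to catch.
cum_weights = list(itertools.accumulate(population))

def random_estimate(k=30):
    # Random sample: every individual has an equal chance of selection.
    return statistics.fmean(rng.sample(population, k))

def biased_estimate(k=30):
    # Unequal chances: selection probability proportional to size
    # (drawn with replacement, for simplicity).
    return statistics.fmean(rng.choices(population, cum_weights=cum_weights, k=k))

reps = 2_000  # "sample the population again and again"
mean_random = statistics.fmean(random_estimate() for _ in range(reps))
mean_biased = statistics.fmean(biased_estimate() for _ in range(reps))

print(f"true mean:                   {true_mean:.2f}")
print(f"average of random estimates: {mean_random:.2f}")
print(f"average of biased estimates: {mean_biased:.2f}")
```

Taking more or larger biased samples would not help: the biased estimates cluster tightly, just around the wrong value.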

Definition: Sampling error is the difference between an estimate and the population parameter being estimated that is caused by chance.
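A sketch of sampling error and precision, again on a simulated hypothetical population: for each sample size, the study is repeated 1,000 times and we record how far each estimate lands from the true mean. The spread of the estimates is the typical size of the sampling error.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population (simulated), so the true parameter is known here.
population = [rng.gauss(100, 15) for _ in range(20_000)]
mu = statistics.fmean(population)

spreads = {}
for n in (10, 40, 160):
    # Repeat the study 1,000 times with a fresh random sample each time.
    estimates = [statistics.fmean(rng.sample(population, n)) for _ in range(1_000)]
    errors = [est - mu for est in estimates]   # sampling error of each estimate
    spreads[n] = statistics.stdev(estimates)   # typical size of the sampling error
    print(f"n = {n:3d}: average error = {statistics.fmean(errors):+.2f}, "
          f"spread of estimates = {spreads[n]:.2f}")
```

The average error stays near zero (no bias), while the spread should come out near 15/√n, roughly halving each time n quadruples: larger random samples give more precise estimates.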

Precision vs Accuracy

Random sampling

The main assumption of all the statistical techniques in this course is that your data come from a random sample.

Definition: In a random sample, each member of a population has an equal and independent chance of being selected.
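A quick check of the “equal chance” part of the definition, assuming a hypothetical population of 20 labeled individuals: draw many samples of 5 with Python's `random.sample` and count how often each individual is included.

```python
import random
from collections import Counter

rng = random.Random(5)
population = list(range(20))   # hypothetical population of 20 labeled individuals
trials = 50_000
counts = Counter()
for _ in range(trials):
    # random.sample draws 5 distinct individuals; each one is equally likely to be included.
    counts.update(rng.sample(population, 5))

# Each individual should appear in close to 5/20 = 25% of the samples.
for member in population:
    print(member, round(counts[member] / trials, 3))
```

If some individuals were systematically easier to select, their inclusion rates would sit above 25% no matter how many trials were run.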


Random sampling

  1. minimizes bias (because every member has an equal chance of being selected), and
  2. makes it possible to measure the amount of sampling error, i.e., to quantify precision (because selections are independent).

Random sampling (Class discussion)

In a recent study, researchers took electrophysiological measurements from the brains of two rhesus macaques (monkeys). Forty neurons were tested in each monkey, yielding a total of 80 measurements.

  1. Do the 80 neurons constitute a random sample? Why or why not?

    No, because of a lack of independence: the 40 neurons measured in the same monkey are not independent of one another.

  2. If the 80 measurements were analyzed as though they constituted a random sample, what consequences would this have for the estimate of the measurement in the monkey population?

    Incorrect precision of the estimate: the sampling error would most likely be underestimated, so the estimate would look more precise than it really is.
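A simulation sketch of this answer. It assumes that each monkey has its own average value and that neurons vary around their monkey's average; all numeric settings are hypothetical. Each simulated study measures 40 neurons in each of 2 monkeys, and we compare the standard error computed as if the 80 values were independent with how much the study mean actually varies across repeated studies.

```python
import random
import statistics

rng = random.Random(11)
# Hypothetical settings, chosen only for illustration.
POP_MEAN = 50.0    # population mean of the measurement
SD_MONKEY = 5.0    # variation between monkeys
SD_NEURON = 5.0    # variation between neurons within one monkey

def one_study(n_monkeys=2, n_neurons=40):
    """Simulate one study: n_neurons measurements from each of n_monkeys monkeys."""
    data = []
    for _ in range(n_monkeys):
        # All neurons from this monkey share its mean, so they are not independent.
        monkey_mean = rng.gauss(POP_MEAN, SD_MONKEY)
        data.extend(rng.gauss(monkey_mean, SD_NEURON) for _ in range(n_neurons))
    return data

studies = [one_study() for _ in range(2_000)]

# Standard error computed as if the 80 values were a random sample of neurons.
naive_se = statistics.fmean([statistics.stdev(d) / len(d) ** 0.5 for d in studies])
# How much the study mean actually varies when the whole study is repeated.
actual_se = statistics.stdev([statistics.fmean(d) for d in studies])

print(f"naive SE (80 'independent' values): {naive_se:.2f}")
print(f"actual spread of study means:       {actual_se:.2f}")
```

Under these made-up settings the naive standard error comes out several times too small: with only two monkeys, the study mean depends heavily on which two monkeys happened to be used, and treating the 80 neurons as independent hides that.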