Alban Guillaumet, Troy University
“Maturity of mind is the capacity to endure uncertainty.”
- John Finley
Definition of statistics
Sampling populations
Type of data, variables and studies
Statistics is the study of methods:
for measuring aspects of populations from samples; and
for quantifying the uncertainty of the measurements.
Statistical methods are essential in almost every area of biology.
1 - Estimation, the process of inferring an unknown quantity of a population using sample data
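As a minimal sketch of estimation in R, assume a made-up population of 10,000 body masses. The values and the names `population`, `my_sample`, and `std_error` are illustrative assumptions, not course data. The sample mean serves as the estimate, and the standard error is one common way to express its uncertainty.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 10,000 body masses (g); values are invented
population <- rnorm(10000, mean = 50, sd = 10)
true_mean <- mean(population)  # the parameter, normally unknown

# Draw a random sample of 25 individuals and estimate the parameter
my_sample <- sample(population, size = 25)
estimate <- mean(my_sample)

# Standard error of the mean: quantifies the uncertainty of the estimate
std_error <- sd(my_sample) / sqrt(length(my_sample))

cat("true mean:", round(true_mean, 2), "\n")
cat("estimate: ", round(estimate, 2), "+/-", round(std_error, 2), "\n")
```

In a real study only `my_sample` would be available; `true_mean` is shown here only because the population is simulated.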
2 - Hypothesis testing
Definition: A statistical hypothesis is a specific claim regarding a population parameter.
Definition: Hypothesis testing uses data to evaluate evidence for or against statistical hypotheses.
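A small sketch of what a hypothesis test can look like in R, under invented numbers: suppose 14 of 18 sampled animals show a trait, and the statistical hypothesis is that the population proportion is 0.5. The counts are hypothetical; `binom.test()` is the exact binomial test from base R's `stats` package.

```r
# Hypothetical data: 14 "successes" out of 18 sampled individuals
n_success <- 14
n_total <- 18

# Statistical hypothesis: the population proportion equals 0.5
result <- binom.test(n_success, n_total, p = 0.5)

# The p-value measures how surprising the data would be if the hypothesis were true
result$p.value
```

A small p-value counts as evidence against the hypothesis; it does not prove the hypothesis false.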
The image below shows the density of bullet holes on different sections of British Royal Air Force planes that returned to base from aerial sorties during WWII.
Question: In your opinion, which plane sections should be reinforced?
Answer: The engines and the cockpit!
Remember: the data come from a sample of planes, and that sample is biased. The whole population also includes all the planes that did NOT make it back to base.
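The bias in the returning planes can be illustrated with a rough simulation. Everything here is an assumption made for the sketch: one hit per plane, four equally likely sections, and invented probabilities of returning (`p_return`). It is not a reconstruction of the wartime data.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n_planes <- 10000
sections <- c("engine", "cockpit", "fuselage", "wings")

# Assumption: each plane takes one hit, equally likely in any section
hit <- sample(sections, n_planes, replace = TRUE)

# Assumption: invented probabilities of returning to base after a hit
p_return <- c(engine = 0.3, cockpit = 0.4, fuselage = 0.9, wings = 0.9)
returned <- runif(n_planes) < p_return[hit]

# Share of hits per section in the whole population vs. the biased sample
prop_all <- prop.table(table(hit))
prop_returned <- prop.table(table(hit[returned]))
round(rbind(all_planes = prop_all, returned_only = prop_returned), 2)
```

Under these assumptions, engine and cockpit hits are underrepresented among the planes that returned, even though they were just as frequent in the full population.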
A population is the entire collection of individuals of interest. For example:
A sample is a smaller set of individuals selected from the population. For example:
Note that the basic unit of sampling is NOT always an individual organism: it can be, e.g., a gene, a single family, or a species.
The researcher uses a sample to draw conclusions that, hopefully, will apply to the whole population.
Good samples are therefore a foundation of good science.
Sampling error is the difference between an estimate and the population parameter being estimated caused by chance.
Ideally, our estimate is also accurate (or unbiased), meaning that the estimates we might obtain from repeated samples are centered, on average, on the true population value.
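Sampling error and unbiasedness can be seen by repeating the sampling many times. The sketch below assumes an invented population and a sample size of 20; the names `estimates` and `sampling_errors` are illustrative.

```r
set.seed(7)  # arbitrary seed, for reproducibility only

# Hypothetical population; in practice the true mean is unknown
population <- rnorm(10000, mean = 50, sd = 10)
true_mean <- mean(population)

# Take 1,000 random samples of 20 individuals and compute each sample mean
estimates <- replicate(1000, mean(sample(population, size = 20)))

# Sampling error: each estimate differs from the parameter by chance
sampling_errors <- estimates - true_mean
range(sampling_errors)

# Unbiased: the average of the estimates is close to the true value
mean(estimates) - true_mean
```

Individual estimates miss the true mean in both directions, but their average sits close to it, which is what unbiased means here.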
The main assumption of all statistical techniques is that your data come from a random sample.
Definition: In a random sample, each member of a population has an equal and independent chance of being selected.
1 - Every unit in the population must have an equal chance of being included in the sample.
Question: Give a few examples of sampling that does not satisfy this condition.
Examples: picking the tallest flowers, catching the birds that are easiest to capture (e.g., territorial males), surveying only the people who answer the phone.
Individuals that are hard to sample might differ in their characteristics from the rest of the population, so underrepresenting them would lead to bias.
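The contrast between a random sample and a biased one can be sketched in R with `sample()`, which gives every unit the same chance of selection. The flower heights below are invented for illustration.

```r
set.seed(11)  # arbitrary seed, for reproducibility only

# Hypothetical population of 500 flower heights (cm)
flower_height <- rnorm(500, mean = 30, sd = 5)

# Random sample: every flower has the same chance of being picked
random_idx <- sample(seq_along(flower_height), size = 20)
random_mean <- mean(flower_height[random_idx])

# Biased sample: only the 20 tallest flowers are picked
tallest_mean <- mean(sort(flower_height, decreasing = TRUE)[1:20])

c(population = mean(flower_height),
  random_sample = random_mean,
  tallest_only = tallest_mean)
```

The mean of the tallest flowers overshoots the population mean no matter how many are measured, whereas the random sample misses only by chance.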
2 - The selection of units must be independent, i.e. the selection of any one member of the population cannot be influenced by the selection of any other member.
Question: Give a few examples of sampling that does not satisfy this condition
Examples: surveying several members of the same household, testing the response to a treatment in members of the same family, selecting closely related species to investigate how body size changes with aridity.
With non-independent sampling, our sample size is effectively smaller than we think. This will cause us to miscalculate (typically overestimate) the precision of our estimates.
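A rough simulation can show how non-independence distorts precision. It assumes invented data in which measurements come in a few groups (for example households) that share a group effect; the function `one_study()` and all of its parameters are hypothetical.

```r
set.seed(3)  # arbitrary seed, for reproducibility only

# One simulated study: a few groups, many non-independent measurements each
one_study <- function(n_groups = 2, n_per_group = 40) {
  group_effect <- rnorm(n_groups, mean = 0, sd = 1)  # shared within a group
  y <- rep(group_effect, each = n_per_group) +
    rnorm(n_groups * n_per_group, mean = 0, sd = 1)
  # Naive standard error treats all measurements as independent
  c(estimate = mean(y), naive_se = sd(y) / sqrt(length(y)))
}

sims <- replicate(2000, one_study())

# How much the estimate really varies from study to study
actual_spread <- sd(sims["estimate", ])
# What the naive calculation claims on average
typical_naive_se <- mean(sims["naive_se", ])

c(actual_spread = actual_spread, typical_naive_se = typical_naive_se)
```

Under these assumptions the naive standard error is much smaller than the actual spread of the estimates, so the study looks more precise than it is.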
In a recent study, researchers took electrophysiological measurements from the brains of two rhesus macaques (monkeys). Forty neurons were tested in each monkey, yielding a total of 80 measurements.
Do the 80 neurons constitute a random sample? Why or why not?
A sample of convenience is a collection of individuals easily available to the researcher.
What is the main problem associated with samples of convenience?
Response: bias.
Random samples are difficult to obtain, and researchers must sometimes deal with suboptimal samples. It is important to acknowledge the limitations of the study, and attempt to correct for them in future work.
bullet holes in planes!
collapse of North Atlantic cod fishery due to excessive allowable catches by fishing boats.
volunteer bias in human studies (e.g., low-income groups are overrepresented if participants are paid, older people are more likely to be at home and answer the phone, etc.)
Now that we have a sample, we can start our study and measure variables.
Definition: Variables are characteristics that differ among objects of interest, such as individuals.
Definition: Data are the measurements of one or more variables made on a sample of objects of interest.
Numerical variable (quantitative: magnitude on a numerical scale)
numeric data type in R
Categorical variable (qualitative)
factor data type in R
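A short sketch of how the two kinds of variables look in R. The measurements and the names `body_mass_g` and `sex` are invented for illustration.

```r
# Numerical variable: stored as a numeric vector (values are made up)
body_mass_g <- c(21.5, 19.8, 23.1, 20.4)
class(body_mass_g)  # "numeric"

# Categorical variable: stored as a factor with a fixed set of levels
sex <- factor(c("female", "male", "male", "female"))
class(sex)   # "factor"
levels(sex)  # "female" "male"

# Arithmetic makes sense for numeric data, counting for factors
mean(body_mass_g)
table(sex)
```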
Ex. of numerical variables:
Ex. of categorical variables:
A frequent goal in statistics is to assess how well one variable, or a set of variables (called explanatory variables), affects or predicts another variable of interest, called the response variable.
which is which?
Definition: A study is experimental if the researcher assigns treatments randomly to individuals, whereas a study is observational if the assignment of treatments is not made by the researcher.
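Random assignment of treatments, the step that makes a study experimental, can be sketched in R by shuffling a vector of treatment labels. The 20 individuals and the labels "control" and "treated" are hypothetical.

```r
set.seed(5)  # arbitrary seed, for reproducibility only

# Hypothetical study with 20 individuals
individual_id <- 1:20

# Shuffle a balanced vector of labels: chance, not the individuals or the
# researcher's judgment, decides who gets which treatment
treatment <- sample(rep(c("control", "treated"), each = 10))

assignment <- data.frame(individual_id, treatment)
head(assignment)
table(assignment$treatment)
```

In an observational study there is no such step: the groups already exist and the researcher only records them.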
Q. Is the following study observational or experimental?
Psychologists tested whether the amount of illegal drug use differs between people suffering from schizophrenia and those not having the disease. They measured drug use in a group of schizophrenia patients and compared it with that in a similarly sized group of randomly chosen people.
Sub-Q: What are the explanatory and response variables, and what type of variable is each?
The explanatory variable takes the values “schizophrenic” and “not schizophrenic”, so it is a categorical variable.
The response variable, “drug use”, is a numerical variable.
The study is observational because the treatment groups (i.e., the values of the explanatory variable) were not assigned randomly by the researchers!
Quote: We don’t include men in our calculation because they are not at risk for cervical cancer and by the same measure, we shouldn’t include women who don’t have a cervix
- Anne F. Rositch
What happens if you include females who do not have a cervix in your sample?
The sample is biased! This causes a loss of accuracy: the death rate is underestimated.
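The direction of this bias follows from simple arithmetic: the death rate is deaths divided by the population at risk, so inflating the denominator lowers the rate. The counts below are invented purely to illustrate the calculation and are not taken from the study.

```r
# Hypothetical counts, chosen only to make the arithmetic easy to follow
deaths <- 100
all_women <- 1000000
women_without_cervix <- 200000  # not at risk of cervical cancer

# Naive rate: denominator includes women who are not at risk
naive_rate <- deaths / all_women * 100000

# Corrected rate: denominator restricted to women at risk
at_risk <- all_women - women_without_cervix
corrected_rate <- deaths / at_risk * 100000

c(naive_per_100k = naive_rate, corrected_per_100k = corrected_rate)
```

With these numbers the naive rate is 10 per 100,000 and the corrected rate is 12.5 per 100,000, so the uncorrected figure understates the risk.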
Quote: The researchers found that black women have a mortality rate of 10.1 per 100,000. For white women, the rate is 4.7 per 100,000. Past estimates had those rates at 5.7 and 3.2, respectively. The new death rate for black women in the US is on par with that of developing countries.
Discuss: Does this warrant the “much deadlier” headline?
Discuss: Why do you think the death rate for black women in the US is higher?