Content you should have understood before watching this video:
- Number 1, ‘Variables’
Variables
- What is a variable?
- A variable has a name, and values, e.g.:
| Variable name | Possible values | Units |
|---|---|---|
| Smoker | Yes / No or 0 / 1 | NA |
| Time | 4, 65, 9.4 | seconds |
| Hair colour | brown, blond, black | NA |
| Concentration | 4.6, 1.9, 4.0 | mg L\(^{-1}\) |
Kinds of variables
- Categorical (entities are divided into distinct categories):
  - Binary variable: There are only two categories, e.g. dead or alive, present or absent
  - Nominal variable: There are more than two categories, e.g. whether someone is an omnivore, vegetarian, vegan, or fruitarian
  - Ordinal variable: The same as a nominal variable, but the categories have a logical order, e.g. whether people got a fail, a pass, a merit, or a distinction in their exam
- Continuous (entities get a distinct score): e.g. human body height
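As a rough illustration of the kinds listed above, the following Python sketch guesses the kind of a variable from its values. The helper `guess_kind` and its rules are assumptions made for this example, not a standard procedure. In particular, an ordinal variable cannot be recognised from the values alone, so the logical order has to be supplied by hand, and a numeric variable that happens to show only two distinct values would be misread as binary.

```python
def guess_kind(values, order=None):
    """Guess the kind of a variable from its values (hypothetical helper).

    `order` is an optional list giving the logical order of the categories;
    if it is supplied, the variable is treated as ordinal.
    """
    if order is not None:
        return "ordinal"
    if len(set(values)) == 2:
        return "binary"
    # Numbers (but not True/False) with more than two distinct values.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "nominal"


# Values taken from the examples on the slides.
print(guess_kind(["Yes", "No", "Yes"]))            # smoker
print(guess_kind([4, 65, 9.4]))                    # time in seconds
print(guess_kind(["brown", "blond", "black"]))     # hair colour
print(guess_kind(["pass", "fail", "merit"],
                 order=["fail", "pass", "merit", "distinction"]))  # exam result
```

In practice the kind of a variable follows from how it was measured, so a guess like this is only a cross-check, not a replacement for knowing the data.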
Kinds of variables
- ‘Predictor variable’, or ‘independent variable’
  - The proposed cause
  - A manipulated variable (in experiments)
  - Coca-Cola concentration in the example on the next slide
  - Usually plotted on the x-axis
- ‘Response variable’, or ‘dependent variable’
  - The proposed effect (‘outcome’)
  - Measured, not manipulated (in experiments)
  - Bacteria count in the example on the next slide
  - Usually plotted on the y-axis
Example: Coca-Cola kills bacteria
Figure: Coca-Cola concentration (the predictor) is on the x-axis; bacteria count (the response) is on the y-axis.
Kinds of variables
| Type of variable | Categorical (Binary) | Categorical (Nominal) | Categorical (Ordinal) | Continuous |
|---|---|---|---|---|
| Predictor | smoker, sex, handedness | state of mind, gender | age class, rank | long jump results, body weight |
| Response | survival, handedness | employment type, hair colour | income bracket, clutch size | cholesterol level, body weight |
A variable has a name and values, for example:
- Variable name: handedness, values: left, right
- Variable name: body weight, values: 63.4, 88.2, …
- A categorical predictor variable is often called a ‘factor’, and its values are called ‘factor levels’
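To make the terms ‘factor’ and ‘factor levels’ concrete, here is a minimal Python sketch. The list of observations is made up purely for illustration; only the variable name and its two possible values (left, right) come from the example above.

```python
# Made-up observations of the factor 'handedness' (illustration only).
handedness = ["left", "right", "right", "left", "right"]

# The factor levels are the distinct values the factor takes.
levels = sorted(set(handedness))

# Count how many observations fall into each level.
counts = {level: handedness.count(level) for level in levels}

print(levels)
print(counts)
```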
Variables - easy?
How many variables? What are their names? What are their values?
Variables - easy?
We have two variables in this data set:
| Variable name | Values | Units |
|---|---|---|
| Site | 1, 2, 3 | NA |
| Soil moisture | 25.3, 16.4, 20.5, 17.2, … | Vol % |
Variables - easy?
Once properly organised, the data set should look like this:
| Soil moisture | Site |
|---|---|
| 25.3 | 1 |
| 16.4 | 1 |
| 20.5 | 1 |
| 17.2 | 2 |
| … | … |
Variable names always go at the top, with the values listed underneath. Units are not needed in the table; they can be noted elsewhere!
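The reorganisation can be sketched in Python as follows. The raw layout is an assumption (one list of soil moisture readings per site), and the helper name `to_long` is hypothetical. Only the values shown in the table above are used; the elided ones are left out.

```python
# Assumed raw layout: one list of soil moisture readings (Vol %) per site.
# Only the values shown on the slide are used; elided values are left out.
raw = {
    1: [25.3, 16.4, 20.5],
    2: [17.2],
}


def to_long(raw_by_site):
    """Return a header row followed by one (soil moisture, site) row per reading."""
    rows = [("Soil moisture", "Site")]
    for site, readings in raw_by_site.items():
        for value in readings:
            rows.append((value, site))
    return rows


table = to_long(raw)
for row in table:
    print(row)
```

Each row of the result holds one measurement together with the site it belongs to, so every variable occupies exactly one column.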
Variables: THE most important points in a nutshell
ANY data analysis procedure starts with these steps. Not following this protocol causes 80% of the problems encountered!
1. Identify what the variables are (and what kind, e.g. continuous etc.)
2. Put variable names in the first row of your data table
3. Fill in the values for the variables
4. Identify predictor and response variable(s)
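As a minimal sketch of this protocol, the Python snippet below applies it to the soil moisture example using only the standard library. The roles assigned here (site as predictor, soil moisture as response) and the kinds given to the two variables are assumptions for illustration; the slides do not state them.

```python
import csv
import io

# Variable names in the first row, values filled in underneath.
text = """Soil moisture,Site
25.3,1
16.4,1
20.5,1
17.2,2
"""

rows = list(csv.DictReader(io.StringIO(text)))

# What kind each variable is (assigned by hand; an assumption here).
kinds = {"Soil moisture": "continuous", "Site": "nominal"}

# Assumed roles for this example; the source does not state them.
predictor, response = "Site", "Soil moisture"

# Group the response values by the levels of the predictor.
by_level = {}
for row in rows:
    by_level.setdefault(row[predictor], []).append(float(row[response]))

print(by_level)
```

Note that `csv.DictReader` returns every value as text, so the site labels stay strings while the soil moisture readings are converted to numbers explicitly.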