1 Types of validity

Lecture 1 (p.32) + Textbook (2.6, pp.22-25 [35-38])

1.1 Internal validity

Lecture slides:

The extent to which you are able to draw the correct conclusions about the causal relationships between variables.

Understanding internal and external validity (https://www.verywellmind.com/internal-and-external-validity-4584479)

Internal validity is the extent to which a study establishes a trustworthy cause-and-effect relationship between a treatment and an outcome. It also reflects that a given study makes it possible to eliminate alternative explanations for a finding. For example, if you implement a smoking cessation program with a group of individuals, how sure can you be that any improvement seen in the treatment group is due to the treatment that you administered?

Internal validity is not a “yes or no” type of concept. Instead, we consider how confident we can be with the findings of a study, based on whether it avoids traps that may make the findings questionable.

As a brief summary, you can only assume cause-and-effect when you meet the following three criteria in your study:
- The cause preceded the effect in terms of time.
- The cause and effect vary together.
- There are no other likely explanations for this relationship that you have observed.

Factors That Improve Internal Validity:
- Randomization.
- Random selection of participants.
- Blinding.
- Experimental manipulation.
- Study protocol.

Factors That Threaten Internal Validity:
- Confounding.
- Historical events.
- Maturation.
- Testing.
- Instrumentation.
- Statistical regression.
- Attrition.
- Diffusion.
- Experimenter bias.


1.2 External validity

Lecture slides:

The generalizability of your findings. To what extent do you expect to see the same pattern of results in “real life” as you saw in your study?

Understanding internal and external validity (https://www.verywellmind.com/internal-and-external-validity-4584479)

External validity refers to how well the outcome of a study can be expected to apply to other settings. In other words, this type of validity refers to how generalizable the findings are. For instance, do the findings apply to other people, settings, situations, and time periods?

Factors that Improve External Validity:
- Inclusion and exclusion criteria.
- Psychological realism.
- Replication.
- Field experiments.
- Reprocessing or calibration.

Factors That Threaten External Validity:
- Situational factors.
- Pre- and post-test effects.
- Sample features.
- Selection bias.


1.3 Construct validity

Lecture slides:

Whether you’re actually measuring what you want to be measuring.

The four types of validity https://www.scribbr.com/methodology/types-of-validity/

A construct refers to a concept or characteristic that can’t be directly observed, but can be measured by observing other indicators that are associated with it.

Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression; they can also be broader concepts applied to organizations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.

Construct validity is about ensuring that the method of measurement matches the construct you want to measure. If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression? Or is it actually measuring the respondent’s mood, self-esteem, or some other construct?

To achieve construct validity, you have to ensure that your indicators and measurements are carefully developed based on relevant existing knowledge. The questionnaire must include only relevant questions that measure known indicators of depression.


1.4 Face validity

Lecture slides:

Whether or not a measure “looks like” it’s doing what it’s supposed to.

Face Validity https://methods.sagepub.com/Reference//encyc-of-research-design/n147.xml

It requires investigators to step outside of their current research context and assess their observations from a commonsense perspective. A typical application of face validity occurs when researchers obtain assessments from current or future individuals who will be directly affected by programs premised on their research findings. An example of testing for face validity is the assessment of a proposed new patient tracking system by obtaining observations from local community health care providers who will be responsible for implementing the program and getting feedback on how they think the new program may work.


1.5 Ecological validity

Lecture slides:

The entire setup of the study should closely approximate the real-world scenario that is being investigated.

What is Ecological Validity? https://www.statisticshowto.com/ecological-validity/

Ecological Validity is a specific type of external validity. External validity refers to your ability to generalize your experimental results across populations, places, and time; ecological validity is limited to how the experimental results apply to today’s society.


1.6 Threat to validity

1.6.1 Confound

Textbook:

A confound is an additional, often unmeasured variable that turns out to be related to both the predictors and the outcomes. The existence of confounds threatens the internal validity of the study because you can’t tell whether the predictor causes the outcome, or if the confounding variable causes it, etc.


1.6.2 Artifact

Textbook:

A result is said to be “artifactual” if it only holds in the special situation that you happened to test in your study. The possibility that your result is an artifact describes a threat to your external validity, because it raises the possibility that you can’t generalise your results to the actual population that you care about.


2 Measurement scales

Lecture 1 (p.28) Textbook (2.2, pp.14-18 [27-31])

2.1 Categorical measurement

Lecture slides:

Entities are divided into distinct categories.


2.1.1 Binary scale

Lecture slides:

There are only two categories.
Example: gender (male or female)


2.1.2 Nominal scale

Lecture slides:

There are more than two categories.
Example: Flower types (Tulip, Rose, Lavender)


2.1.3 Ordinal scale

Lecture slides:

The same as a nominal variable but the categories have a logical order.
Example: Exam result (Fail, Pass)


2.2 Continuous measurement

Lecture slides:

Entities get a distinct score.


2.2.1 Interval scale

Lecture slides:

Equal intervals on the variable represent equal differences in the property being measured.
Example: the difference between 6 and 8 is equivalent to the difference between 13 and 15.

Textbook:

The differences between the numbers are interpretable.


2.2.2 Ratio scale

Lecture slides:

The same as an interval variable, but the ratios of scores on the scale must also make sense, and the scale must have a true zero value.
Example: a score of 16 on an anxiety scale means that the person is, in reality, twice as anxious as someone scoring 8.

Temperature in degrees Celsius is not a ratio scale.
A temperature of \(0^\circ\) does not mean “no temperature at all”: it actually means “the temperature at which water freezes”. As a consequence, it becomes pointless to try to multiply and divide temperatures. It is wrong to say that \(20^\circ\) is twice as hot as \(10^\circ\), just as it is weird and meaningless to try to claim that \(20^\circ\) is negative two times as hot as \(-10^\circ\).

Textbook:

Notice that multiplication and division also make sense here too.
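The temperature argument above can be checked numerically. The following is a minimal sketch, assuming the temperatures are in degrees Celsius and using the standard Celsius-to-kelvin offset of 273.15; the variable names are illustrative only.

```r
# Two temperatures in degrees Celsius (an interval scale)
celsius <- c(10, 20)

# The same temperatures in kelvin, a scale with a true zero (a ratio scale)
kelvin <- celsius + 273.15

# Differences are meaningful on both scales and agree with each other
diff(celsius)
diff(kelvin)

# Ratios depend on where zero sits, so only the kelvin ratio is meaningful
ratio_celsius <- celsius[2] / celsius[1]   # 2, but not "twice as hot"
ratio_kelvin  <- kelvin[2] / kelvin[1]     # only slightly above 1
```

The Celsius ratio of 2 is an artefact of the arbitrary zero point, which is why multiplication and division are reserved for ratio scales.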


3 Reliability

Lecture 1 (p.29) Textbook (2.3, p.19 [32])

Lecture slides:

The ability of the measure to produce the same results under the same conditions.


3.1 Test-retest reliability

Lecture slides:

The ability of a measure to produce consistent results when the same entities are tested at two different points in time.

Textbook:

This relates to consistency over time: if we repeat the measurement at a later date, do we get the same answer?


3.2 Inter-rater reliability

Lecture slides:

Consistency across people. Do they produce the same answer?

Textbook:

This relates to consistency across people: if someone else repeats the measurement (e.g., someone else rates my intelligence) will they produce the same answer?


3.3 Parallel forms reliability

Lecture slides:

Do different measures that are supposed to measure the same thing actually measure it the same way? (e.g., two different eye trackers)

Textbook:

This relates to consistency across theoretically-equivalent measurements: if I use a different set of bathroom scales to measure my weight, does it give the same answer?


3.4 Internal consistency reliability

Lecture slides:

Do things that are supposed to measure the same thing actually measure it? (Multiple questions measuring IQ).

Textbook:

If a measurement is constructed from lots of different parts that perform similar functions (e.g., a personality questionnaire result is added up across several questions) do the individual parts tend to give similar answers?


4 R - Introduction

4.1 Creating valid variable names in R

Lecture slides:

  • Variables are used to store information
  • They also provide a way of labelling information
  • Use the “assignment operator” <- to create one

R codes

weight <- 52.3        # numeric variable
height <- 152         # numeric variable
country <- "Vietnam"  # character variable
winter <- FALSE       # logical variable 
cool <- TRUE          # logical variable 

4.2 Using a working directory

  • The primary file format used by R is .Rdata
  • It is a saved workspace
  • It contains whatever data sets, variables, functions etc that the workspace included when the file was created

4.3 Loading data in R

4.3.1 Load an .Rdata file

  • Hard way: use the load() function manually
  • Easy way #1: double click on the .Rdata file, and (as long as Rstudio is the default application for Rdata files) it will load automatically
  • Easy way #2: open using the Rstudio menus (File > Open file)
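The "hard way" can be sketched as follows. This is an illustrative example only: it writes to a temporary file (via tempfile()) so that nothing in your real working directory is touched, and the variable names are reused from section 4.1.

```r
# Create two variables in the workspace
weight  <- 52.3
country <- "Vietnam"

# Save them to an .Rdata file (a temporary file, so nothing is overwritten)
rdata_path <- tempfile(fileext = ".Rdata")
save(weight, country, file = rdata_path)

# Remove the variables, then restore them with load()
rm(weight, country)
loaded_names <- load(rdata_path)   # load() returns the names of the restored objects
loaded_names
```

In practice you would pass the path of your own file, for example a file name inside the current working directory.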

4.3.2 Load a csv file

  • function read.csv()
  • Environment > Import Dataset > From Text
  • File > Import Dataset > From Text
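A minimal sketch of the read.csv() route is shown below. The data and the temporary file are made up for illustration; with a real data set you would only need the read.csv() line with your own file path.

```r
# Write a small example data set to a temporary csv file (illustrative data)
csv_path <- tempfile(fileext = ".csv")
example_data <- data.frame(name = c("A", "B", "C"),
                           weight = c(45, 59, 36))
write.csv(example_data, file = csv_path, row.names = FALSE)

# Read the csv file back into a data frame
loaded <- read.csv(file = csv_path)

# Inspect the structure of the result
str(loaded)
```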

4.4 Logical and arithmetic operators in R

4.4.1 Arithmetic operators

  • Applied to numeric variables only
  • Return a number, Inf, -Inf, NaN, or NA

R codes

# + addition 
10 + 5
[1] 15
2.34 + 234.23
[1] 236.57
1 + 0
[1] 1
1 + NA
[1] NA
# - subtraction 
10 - 5
[1] 5
2.34 - 234.23
[1] -231.89
1 - 0
[1] 1
1 - NA
[1] NA
# * multiplication 
10 * 5
[1] 50
2.34 * 234.23
[1] 548.0982
1 * 0
[1] 0
1 * NA
[1] NA
# / division 
10 / 5
[1] 2
2.34 / 234.23
[1] 0.009990181
1 / 0
[1] Inf
1 / NA
[1] NA
# ^ taking powers
10 ^ 5
[1] 100000
2.34 ^ 234.23
[1] 302991385600521639242888846260608466428208844040262028844280606688664022804480620048006
1 ^ 0
[1] 1
1 ^ NA
[1] 1

4.4.2 Logical statements

  • Return the result as TRUE or FALSE

R codes

# == equality 
1 == 2
[1] FALSE
2 + 3 == 5
[1] TRUE
# != inequality 
1 != 2
[1] TRUE
2 + 3 != 5
[1] FALSE
# > greater than 
1 > 2
[1] FALSE
2 + 3 > 5
[1] FALSE
# >= greater than or equal to 
1 >= 2
[1] FALSE
2 + 3 >= 5
[1] TRUE
# < less than 
1 < 2
[1] TRUE
2 + 3 < 5
[1] FALSE
# <= less than or equal to
1 <= 2
[1] TRUE
2 + 3 <= 5
[1] TRUE
# & AND 
1 == 2 & 2 + 3 == 5
[1] FALSE
1^0 == 1 & 2 + 3 == 5
[1] TRUE
1^0 == 0 & 1*0 == 1 
[1] FALSE
# | OR 
1 == 2 | 2 + 3 == 5
[1] TRUE
1^0 == 1 | 2 + 3 == 5
[1] TRUE
1^0 == 0 | 1*0 == 1
[1] FALSE
# ! NOT
!(1 == 2 & 2 + 3 == 5)
[1] TRUE
!(1^0 == 1 & 2 + 3 == 5)
[1] FALSE
!(1^0 == 0 & 1*0 == 1)
[1] TRUE
!(1 == 2 | 2 + 3 == 5)
[1] FALSE
!(1^0 == 1 | 2 + 3 == 5)
[1] FALSE
!(1^0 == 0 | 1*0 == 1)
[1] TRUE

4.5 Vector operations

Lecture slides

  • Vectors are variables that store multiple pieces of information

  • Create vectors using c(). c() combines a set of values, and stores them as a vector

R codes

# numeric vector
weights <- c(45, 59, 36, 78)

# character vector
names <- c("Donald Trump", "Selena Gomez", "Lionel Messi", "Steve Jobs")

# logical vector
checking <- c(TRUE, FALSE, FALSE, FALSE)

  • Extract specific elements using [ ]

R codes

weights[1]
[1] 45
names[c(2,4)]
[1] "Selena Gomez" "Steve Jobs"  
checking[2]
[1] FALSE
checking[2:4]
[1] FALSE FALSE FALSE
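Beyond indexing by position, arithmetic and logical operators from section 4.4 apply to every element of a vector at once. The sketch below reuses the weights vector from above; the threshold of 50 is an arbitrary illustrative value.

```r
weights <- c(45, 59, 36, 78)

# Arithmetic is applied element by element
doubled <- weights * 2

# A logical statement returns one TRUE/FALSE per element
heavy <- weights > 50

# A logical vector can be used inside [ ] to keep only the TRUE positions
weights[heavy]

# Summing a logical vector counts the TRUE values
sum(heavy)
```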

4.6 Data frame and matrix operations and structures

Lecture slides

  • Data frames are the way R stores a typical data set
  • It is a collection of variables “bundled” together
  • Organized into a “case by variable” matrix.
  • Each row is a “case”.
  • Each column is a named “variable”. Operations on variables also work on data frame columns.

R codes

df <- data.frame(name = names,
                 weight = weights,
                 result = checking)

# Data frame df has 3 columns (3 vectors, or 3 variables) and 4 rows.

df
          name weight result
1 Donald Trump     45   TRUE
2 Selena Gomez     59  FALSE
3 Lionel Messi     36  FALSE
4   Steve Jobs     78  FALSE
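A few common data frame operations are sketched below, rebuilding the same data frame so the example runs on its own. The column name above_50 and the cut-off of 50 are illustrative assumptions, not part of the lecture material.

```r
df <- data.frame(name = c("Donald Trump", "Selena Gomez", "Lionel Messi", "Steve Jobs"),
                 weight = c(45, 59, 36, 78),
                 result = c(TRUE, FALSE, FALSE, FALSE))

# Number of rows (cases) and columns (variables)
dim(df)

# A column is extracted with $ and behaves like an ordinary vector
mean_weight <- mean(df$weight)

# Rows can be selected with a logical condition on a column
heavy_rows <- df[df$weight > 50, ]

# A new column can be created from an existing one
df$above_50 <- df$weight > 50
```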

4.7 Differences between Inf, NA, NaN and NULL in R

  • NULL is a special “value” in R that means “this variable does not exist” or “it has no value”.
  • NA, which means “the variable exists (and in principle has a value), but that value is missing/unknown”
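The notes above describe NULL and NA only. The sketch below contrasts all four special values using standard base R behaviour: Inf results from dividing a non-zero number by zero, and NaN ("not a number") results from undefined operations such as 0 / 0. The variable names are illustrative.

```r
x_inf  <- 1 / 0    # Inf: infinitely large
x_nan  <- 0 / 0    # NaN: the result is undefined
x_na   <- NA       # NA: a value exists in principle but is missing
x_null <- NULL     # NULL: there is no value at all

is.infinite(x_inf)   # TRUE
is.nan(x_nan)        # TRUE
is.na(x_na)          # TRUE
is.na(x_nan)         # TRUE: NaN is also treated as missing
is.null(x_null)      # TRUE

# NA occupies a position in a vector, NULL does not
length(c(1, NA, 3))     # 3
length(c(1, NULL, 3))   # 2

# NA propagates through calculations unless it is removed explicitly
mean(c(1, NA, 3))                 # NA
mean(c(1, NA, 3), na.rm = TRUE)   # 2
```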

4.8 Factors in R

  • A “factor” is a nominal scale variable
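As a short illustration, the flower types from section 2.1.2 can be stored as a factor. The second part is an extra sketch showing that factor() can also represent an ordinal variable when ordered = TRUE is set; the example values are made up.

```r
# A nominal variable stored as a factor
flowers <- c("Tulip", "Rose", "Lavender", "Rose", "Tulip", "Rose")
flower_factor <- factor(flowers)

# The distinct categories (levels) and how often each one occurs
levels(flower_factor)
flower_counts <- table(flower_factor)

# An ordinal variable: the levels are given in their logical order
exam <- factor(c("Pass", "Fail", "Pass"),
               levels = c("Fail", "Pass"),
               ordered = TRUE)
exam[1] > exam[2]   # TRUE, because "Pass" ranks above "Fail"
```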

5 R - Data manipulation

5.1 How to extract particular elements of a vector


5.2 Manipulating and extracting from character strings in R


5.3 Coercing variable types in R


6 Measures of central tendency

Lecture 4 (pp.7-11) Textbook (5.1, pp.114-122 [127-135])

6.1 Mean

Textbook

The mean of a set of observations is just a normal, old-fashioned average: add all of the values up, and then divide by the total number of values.

Lecture slides

The mean is also the value from which the (squared) scores deviate least (it has the least error).

Formula

\[ mean(X) = \bar{X} = \frac{1}{N}\sum_{i=1}^{N}X_i \]

R function

Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}

mean(df$var3)
[1] 209.6

6.2 Median

Textbook

The median of a set of observations is just the middle value.

R function

Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}

median(df$var3)
[1] 4.5

6.3 Trimmed mean

Textbook

To calculate a trimmed mean, what you do is “discard” the most extreme examples on both ends (i.e., the largest and the smallest), and then take the mean of everything else. The goal is to preserve the best characteristics of the mean and the median: just like a median, you aren’t highly influenced by extreme outliers, but like the mean, you “use” more than one of the observations.

Generally, we describe a trimmed mean in terms of the percentage of observation on either side that are discarded. So, for instance, a 10% trimmed mean discards the largest 10% of the observations and the smallest 10% of the observations, and then takes the mean of the remaining 80% of the observations.

R function

Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}

mean(df$var3, trim = 0.1)
[1] 12
mean(df$var3, trim = 0.05)
[1] 209.6
mean(df$var3, trim = 0.2)
[1] 4.333333
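To see what the trim argument does, the 10% trimmed mean can be reproduced by hand. This is a sketch of the idea for the same ten values, not a replacement for mean(); the variable names are illustrative.

```r
x <- c(-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000)
trim <- 0.1

# Number of observations to discard at each end (rounded down)
n_drop <- floor(length(x) * trim)

# Sort the data and keep only the middle part
x_sorted <- sort(x)
kept <- x_sorted[(n_drop + 1):(length(x) - n_drop)]

# Mean of the remaining observations
manual_trimmed  <- mean(kept)
builtin_trimmed <- mean(x, trim = trim)
```

Because the number of discarded values is rounded down, trim = 0.05 removes nothing from ten observations, which is why it returns the ordinary mean of 209.6 above.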

6.4 Mode

Textbook

The mode of a sample is the value that occurs most frequently.

R function

Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}

lsr::modeOf(df$var3)   # modeOf() is provided by the lsr package
[1] 5
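If the lsr package is not installed, the mode can also be found with base R by tabulating the values. This is one possible sketch; it returns every value that ties for the highest frequency.

```r
x <- c(-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000)

# Count how often each distinct value occurs
counts <- table(x)

# Keep the value(s) with the highest count and convert the labels back to numbers
mode_value <- as.numeric(names(counts)[as.vector(counts) == max(counts)])
mode_value
```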

6.5 Illustration

Textbook

  • If your data are nominal scale, you probably shouldn’t be using either the mean or the median.
  • If your data are ordinal scale, you’re more likely to want to use the median than the mean. The median only makes use of the order information in your data (i.e., which numbers are bigger), but doesn’t depend on the precise numbers involved.
  • For interval and ratio scale data, either one is generally acceptable. Which one you pick depends a bit on what you’re trying to achieve. The mean has the advantage that it uses all the information in the data (which is useful when you don’t have a lot of data), but it’s very sensitive to extreme values.


7 Measures of dispersion/spread

Lecture 4 (pp.16-19) Textbook (5.2, pp.123-131 [136-144])

7.1 Range

Textbook:

The range of a variable is very simple: it’s the biggest value minus the smallest value.

The range is the simplest way to quantify the notion of “variability”. However, if the data set has outliers, the range is influenced by these cases. For example, the dataset {-100; 2; 3; 4; 5; 6; 7; 8; 9; 10} has a range of 110, but if the outlier -100 were removed, we would have a range of only 8.

R function

Given var2 of df: {-101, -100, -11, -10, -1, 1, 10, 11, 100, 101}

range(df$var2)
[1] -101  101
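Note that range() returns the smallest and largest values rather than their difference. A minimal sketch of obtaining the single number described in the definition:

```r
x <- c(-101, -100, -11, -10, -1, 1, 10, 11, 100, 101)

# range() gives the two end points
limits <- range(x)

# The range as "biggest value minus smallest value"
range_width <- max(x) - min(x)

# Equivalent shortcut: the difference between the two end points
range_width_alt <- diff(range(x))
```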

7.2 Quantile/Percentile

Textbook:

They’re more commonly called percentiles.
Example: the 10th percentile of a data set is the smallest number x such that 10% of the data is less than x.

R function

Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
And var2 of df: {-101, -100, -11, -10, -1, 1, 10, 11, 100, 101}

quantile(x = df$var1, probs = 0.5)
50% 
5.5 
quantile(x = df$var1, probs = 0.1)
10% 
1.9 
quantile(x = df$var1, probs = c(0.25, 0.75))
 25%  75% 
3.25 7.75 
quantile(x = df$var2, probs = c(0.25, 0.75))
   25%    75% 
-10.75  10.75 

7.3 Quartile and Interquartile range

Textbook:

The interquartile range (IQR) is like the range, but instead of calculating the difference between the biggest and smallest value, it calculates the difference between the 25th percentile (1st quartile) and the 75th percentile (3rd quartile).

How to interpret the IQR: the interquartile range is the range spanned by the “middle half” of the data. That is, one quarter of the data falls below the 25th percentile, one quarter of the data is above the 75th percentile, leaving the “middle half” of the data lying in between the two. And the IQR is the range covered by that middle half.

The median of a data set is its 50th percentile (2nd quartile).

R function

Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
And var2 of df: {-101, -100, -11, -10, -1, 1, 10, 11, 100, 101}

IQR(x = df$var1)
[1] 4.5
IQR(x = df$var2)
[1] 21.5
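The link between quantile() and IQR() can be verified directly. The sketch below recomputes the IQR of var1 as the 75th percentile minus the 25th percentile.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# First and third quartiles
q <- quantile(x, probs = c(0.25, 0.75))

# IQR as the difference between the third and first quartile
iqr_manual <- unname(q[2] - q[1])

# Built-in function for comparison
iqr_builtin <- IQR(x)
```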

7.4 Illustration


7.5 Deviation

Lecture slides:

A deviation is the difference between the mean and an actual data point.
Deviations can be calculated by taking each score and subtracting the mean from it:

Formula

\[ deviation = X_i - \bar{X} \]

R function

Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

df$var1 - mean(df$var1)
 [1] -4.5 -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5  4.5

7.6 Sum of squared errors

Lecture slides:

Deviations cancel out because some are positive and others negative. Therefore, we square each deviation. If we add these squared deviations we get the sum of squared errors (SS).

The sum of squares is a good measure of overall variability, but is dependent on the number of scores.

Formula

\[ SS = \sum_{i=1}^N (X_i - \bar{X})^2 \]

R function

Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

sum((df$var1 - mean(df$var1))^2)
[1] 82.5

7.7 Variance

Textbook:

We calculate the average variability by dividing the sum of squares by the number of scores minus one (\(N-1\)). This value is called the variance (\(s^2\)).

The variance has one problem: it is measured in units squared, which isn’t a very meaningful metric.

Formula

\[ Var(X) = s^2 = \dfrac{1}{N-1}SS = \dfrac{1}{N-1}\sum_{i=1}^N (X_i - \bar{X})^2 \]

R function

Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

var(df$var1)
[1] 9.166667

7.8 Standard Deviation

Lecture slides:

We take the square root value of variance to obtain the standard deviation.

Textbook:

As a consequence, most of us just rely on a simple rule of thumb: in general, you should expect 68% of the data to fall within 1 standard deviation of the mean, 95% of the data to fall within 2 standard deviations of the mean, and 99.7% of the data to fall within 3 standard deviations of the mean. This rule tends to work pretty well most of the time, but it’s not exact: it’s actually calculated based on an assumption that the histogram is symmetric and “bell shaped”.
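The percentages in this rule of thumb come from the normal distribution. As a quick check, pnorm() gives the proportion of a standard normal distribution that lies below a given number of standard deviations:

```r
# Number of standard deviations on either side of the mean
k <- 1:3

# Proportion of a normal distribution within k standard deviations of the mean
coverage <- pnorm(k) - pnorm(-k)
round(coverage, 3)   # 0.683 0.954 0.997
```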

Formula

\[ \hat{\sigma} = \sqrt{s^2} = \sqrt{\frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{X})^2} \]

R function

Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

sd(df$var1)
[1] 3.02765
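Sections 7.5 to 7.8 build on each other: deviations lead to the sum of squares, which leads to the variance and then the standard deviation. The sketch below follows that chain step by step for var1 and compares the results with var() and sd().

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
n <- length(x)

# Step 1: deviations from the mean (these sum to zero)
deviations <- x - mean(x)

# Step 2: sum of squared errors
ss <- sum(deviations^2)

# Step 3: variance = SS divided by N - 1
variance <- ss / (n - 1)

# Step 4: standard deviation = square root of the variance
std_dev <- sqrt(variance)

# Compare with the built-in functions
c(manual = variance, builtin = var(x))
c(manual = std_dev, builtin = sd(x))
```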

7.9 Interpretation

  • The ‘fit’ of the mean to the data
  • The variability in the data
  • How well the mean represents the observed data
  • Error

7.10 Illustration

Same mean - different SD

Different mean - different SD


8 Measures of shape

Lecture 4 (pp.41-45) Textbook (5.3, pp.131-133 [144-146])

8.1 Skewness

Skewness is a measure of asymmetry.

8.1.1 Negative skewness

Textbook

The data are negatively skewed if they tend to have a lot of extremely small values (i.e., the lower tail is “longer” than the upper tail) and not so many extremely large values (left panel).


8.1.2 Positive skewness

Textbook

The data are positively skewed if there are more extremely large values than extremely small ones (right panel).


R function

Given var4: {-1, -2, -3, -4, 2, 2, 2, 2, 2, 2, 2, 2, 34, 34, 35, -23, -12, -1, 0, 0, 0, 0, 32, 35, 35, 35, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, -234, -23, -23, -25, -67, -78, -89}

psych::skew(var4)   # skew() is provided by the psych package
[1] -3.582078
hist(var4)


8.2 Kurtosis

Textbook

Kurtosis is a measure of the “pointiness” of a data set.

8.2.1 Platykurtic

Textbook

The data are not pointy enough, so the kurtosis is negative.


8.2.2 Mesokurtic

Textbook

The data are just pointy enough


8.2.3 Leptokurtic

Textbook

The data are too pointy, so the kurtosis is positive.


R function

Given var4: {-1, -2, -3, -4, 2, 2, 2, 2, 2, 2, 2, 2, 34, 34, 35, -23, -12, -1, 0, 0, 0, 0, 32, 35, 35, 35, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, -234, -23, -23, -25, -67, -78, -89}

psych::kurtosi(var4)   # kurtosi() is provided by the psych package
[1] 15.98769
hist(var4)
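To make the sign conventions concrete, skewness and kurtosis can be sketched in base R using one common moment-based definition (third and fourth powers of the deviations, scaled by the standard deviation). The function names and example data are hypothetical, and statistical packages offer several slightly different formulas, so values from other functions may not match exactly.

```r
# Skewness: average cubed deviation, scaled by the cubed standard deviation
moment_skewness <- function(x) {
  n <- length(x)
  sum((x - mean(x))^3) / (n * sd(x)^3)
}

# Excess kurtosis: average fourth-power deviation, scaled, minus 3
# (subtracting 3 makes a normal distribution score about zero)
moment_excess_kurtosis <- function(x) {
  n <- length(x)
  sum((x - mean(x))^4) / (n * sd(x)^4) - 3
}

symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 1, 2, 2, 3, 20)   # a few extremely large values
left_skewed  <- -right_skewed          # mirror image: extremely small values

moment_skewness(symmetric)      # zero for perfectly symmetric data
moment_skewness(right_skewed)   # positive
moment_skewness(left_skewed)    # negative
moment_excess_kurtosis(symmetric)   # negative: flat, so platykurtic
```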


8.3 Illustration

9 R - graphing

9.1 Scatterplots

df <- data.frame(bmi = c(15, 21, 25, 29, 35),
                 mood = c(-1, 0, 3, 5, 12)
                 )

plot(x = df$bmi, 
     y = df$mood,
     main = "Scatter plot - Relationship between BMI and Mood",  # graph title
     xlab = "BMI",                                               # x-axis title
     ylab = "Mood",                                              # y-axis title
     pch = 20,                                                   # marker shape  
     col = "blue")                                               # marker color 

9.2 Line graph

df <- data.frame(bmi = c(15, 21, 25, 29, 35),
                 mood = c(-1, 0, 3, 5, 12)
                 )

plot(x = df$bmi, 
     y = df$mood,
     type = "b",
     main = "Line graph - Relationship between BMI and Mood",    # graph title
     xlab = "BMI",                                               # x-axis title
     ylab = "Mood",                                              # y-axis title
     pch = 20,                                                   # marker shape  
     col = "blue")                                               # marker color 

9.3 Box plot

df <- data.frame(bmi = c(18, 21, 27, 22, 33, 35),
                 gender = c("male", "female", "female", "male", "female", "male")
                 )

boxplot(
  formula = bmi ~ gender,
  data = df,
  main = "Box plot - BMI by gender",                          # graph title
  xlab = "Gender",                                            # x-axis title
  ylab = "BMI",                                               # y-axis title
  col = "#27aae1")                                            # box color 

9.4 Histogram

df <- data.frame(bmi = c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22),
                 gender = c("male", "female", "female", "male", "female", "male", "male", "female", "male", "female", "male", "female", "male", "male")
                 )
hist(df$bmi,
     main = "Histogram - BMI",
     xlab = "BMI",
     col = "#009688")

9.5 Bar plot

df <- data.frame(bmi = c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22),
                 bmi_lv = c(2,2,2,3,2,3,1,3,2,2,3,1,3,2)
                 )
                 
barplot(height = table(df$bmi_lv),
        names.arg = c("Low", "Medium", "High"),
        xlab = "BMI",
        ylab = "Frequency",
        main = "Barplot - Frequency of BMI",
        col = c("#BDBDBD", "#009933", "#900c3f")
        )

9.6 Bar plots of group means with 95% CI

\[ CI_{95} = \bar{X} \pm \left( 1.96 \times \dfrac{\sigma}{\sqrt{N}} \right) \]

df <- data.frame(bmi = c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22, 14, 15),
                 gender = c("male", "female", "female", "female", "female", "male", "male", "female", "male", "female", "male", "female", "male", "male", "female", "male"),
                 bmi_lv = c("medium","medium","medium","high","medium","high","low","high","medium","medium","high","low","high","medium", "low", "low")
                 )
                 
lsr::bars(
  data = df,
  formula = bmi ~ gender,
  xLabels = c("Female", "Male"),
  yLabel = "BMI",
  main = "Barplot of group means with CI - BMI by gender"
)

lsr::bars(
  data = df,
  formula = bmi ~ bmi_lv + gender,
  xLabels = c("High", "Low", "Medium"),
  yLabel = "BMI",
  main = "Barplot of group means with CI - BMI by gender and level"
)


9.7 Good graphs

  • Show the data.
  • Induce the reader to think about the data being presented (rather than some other aspect of the graph).
  • Avoid distorting the data.
  • Present many numbers with minimum ink.
  • Make large data sets (assuming you have one) coherent.
  • Encourage the reader to compare different pieces of data.
  • Reveal data.

9.8 Report in APA format

Need to report:

  • mean and standard deviation (and/or ranges)
  • sample (group) size
  • distribution characteristics if there is some concern

Spacing and italics matter!

Example:

The age of participants ranged from 18 to 70 years (\(M = 25.5\), \(SD = 7.94\)). Age was non-normally distributed, with skewness of \(1.87\) (\(SE = 0.05\)) and kurtosis of \(3.93\) (\(SE = 0.10\)).


10 Confidence intervals

10.1 Meaning

Lecture slides

  • Need to quantify the amount of uncertainty that is associated with our estimates of population parameters.
  • Typically we want to say there is a 95% chance the true population mean lies within a certain window of values (e.g., between 105 and 114).
  • There is a 95% chance that a normally distributed quantity lies within 2 standard deviations of the mean.

\[ CI_{95} = \bar{X} \pm \left( 1.96 \times \dfrac{\sigma}{\sqrt{N}} \right) \]
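The formula can be applied by hand as sketched below, using the BMI values from section 9.6. One assumption to flag: the population \(\sigma\) is rarely known, so this sketch substitutes the sample standard deviation from sd(). With small samples, a t-based multiplier such as qt(0.975, df = n - 1) is often used instead of 1.96.

```r
bmi <- c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22, 14, 15)

n     <- length(bmi)
x_bar <- mean(bmi)

# Standard error of the mean (sample SD used in place of sigma: an assumption)
se <- sd(bmi) / sqrt(n)

# 95% confidence interval: mean plus or minus 1.96 standard errors
ci_lower <- x_bar - 1.96 * se
ci_upper <- x_bar + 1.96 * se
c(lower = ci_lower, mean = x_bar, upper = ci_upper)
```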

10.2 Interpretation

Lecture slides

  • If we replicated the experiment over and over again and computed a 95% confidence interval for each replication, then 95% of those intervals would contain the true mean.

  • If the null hypothesis value lies outside the 95% CI the p value is less than .05, so p is less than the significance level and we reject the null hypothesis.

  • If the null hypothesis value lies inside the 95% CI, the p value is greater than .05, so we do not reject the null.
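The "replicate the experiment over and over" interpretation can be illustrated with a small simulation. Everything here is hypothetical: the population mean of 100, standard deviation of 15, sample size of 25 and 2000 replications are arbitrary choices, and \(\sigma\) is treated as known so that the formula from section 10.1 applies exactly.

```r
set.seed(1)   # make the simulation reproducible

true_mean <- 100
true_sd   <- 15
n         <- 25
n_reps    <- 2000

# For each simulated experiment, check whether the 95% CI contains the true mean
covers <- replicate(n_reps, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  half_width <- 1.96 * true_sd / sqrt(n)
  (mean(x) - half_width <= true_mean) && (true_mean <= mean(x) + half_width)
})

# Proportion of intervals that captured the true mean (expected to be near 0.95)
coverage <- mean(covers)
coverage
```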


11 Sample and population

  • \(\bar{X}\) is the sample mean. It can be calculated from raw data.
  • \(\mu\) is the true population mean. It is almost never known.
  • \(\hat{\mu}\) is the estimated population mean: \(\hat{\mu} = \bar{X}\).

11.1 Central limit theorem

  • Given a sufficiently large sample, the following claims are typically true:
    • The mean of the sampling distribution is the same as the mean of the population
    • The standard deviation of the sampling distribution (the standard error) gets smaller as the sample size increases
    • The shape of the sampling distribution becomes normal as the sample size increases.
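These three claims can be explored with a simulation. The sketch below is illustrative only: it draws repeated samples from a strongly skewed population (an exponential distribution with mean 1, chosen arbitrarily) and compares the sampling distribution of the mean for two sample sizes. The helper function sample_means() is hypothetical.

```r
set.seed(1)   # make the simulation reproducible

# Draw n_reps samples of size n and return the mean of each sample
sample_means <- function(n, n_reps = 5000) {
  replicate(n_reps, mean(rexp(n, rate = 1)))
}

means_n5  <- sample_means(5)
means_n50 <- sample_means(50)

# Claim 1: the mean of the sampling distribution is close to the population mean (1)
c(n5 = mean(means_n5), n50 = mean(means_n50))

# Claim 2: the standard error shrinks as the sample size increases
c(n5 = sd(means_n5), n50 = sd(means_n50))

# Claim 3 can be inspected visually, e.g. with hist(means_n5) and hist(means_n50)
```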

12 Hypothesis testing

Textbook

Statistical hypotheses must be mathematically precise, and they must correspond to specific claims about the characteristics of the data generating mechanism (i.e., the “population”)

Lecture slides

The general goal of a hypothesis test is to rule out chance (sampling error) as a plausible explanation for the results from a research study.

Hypothesis testing is a technique often used to help determine whether a specific treatment/variable/experimental manipulation has an effect on the individuals in a population.

In its most abstract form, hypothesis testing is really a very simple idea: the researcher has some theory about the world, and wants to determine whether or not the data actually support that theory.

12.1 Null vs alternative hypotheses

Textbook

What I do is invent a new statistical hypothesis (the “null” hypothesis, \(H_0\)) that corresponds to the exact opposite of what I want to believe, and then focus exclusively on that, almost to the neglect of the thing I’m actually interested in (which is now called the “alternative” hypothesis, \(H_1\)).

The important thing to recognise is that the goal of a hypothesis test is not to show that the alternative hypothesis is (probably) true; the goal is to show that the null hypothesis is (probably) false.

Lecture slides

The null hypothesis (H0) is a claim of “no difference in the population”, or that an effect is zero.

The null hypothesis, H0, typically states that the independent variable/treatment has no effect (no change, no difference). According to the null hypothesis, the population mean after treatment is the same as it was before treatment.

The \(\alpha\) level establishes a criterion, or “cut-off”, for making a decision about the null hypothesis. Convention is .05.

The critical region consists of outcomes that are very unlikely to occur if the null hypothesis is true.

The alternative hypothesis (Ha) claims “H0 is false”

12.2 Test statistic

Lecture slides

The test statistic (e.g., a z-score) forms a ratio comparing the obtained difference between the sample mean and the hypothesized population mean versus the amount of difference we would expect without any treatment effect (the standard error).

A large value for the test statistic shows that the obtained mean difference is more than would be expected if there is no effect.
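As a worked sketch of this ratio, a z statistic can be computed as the difference between the sample mean and the hypothesized mean, divided by the standard error. All numbers below are hypothetical and chosen to give a round result; the population standard deviation is assumed to be known.

```r
x_bar <- 106   # sample mean (hypothetical)
mu0   <- 100   # population mean under the null hypothesis (hypothetical)
sigma <- 15    # population standard deviation, assumed known
n     <- 25    # sample size

# Standard error: the difference expected from sampling error alone
se <- sigma / sqrt(n)

# Test statistic: obtained difference relative to the standard error
z_stat <- (x_bar - mu0) / se
z_stat   # 2
```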

12.3 p-values

Lecture slides

  • The P-value answers the question: What is the probability of the observed test statistic or one more extreme IF H0 is true?

  • This corresponds to the area under the curve in the tail of the Standard Normal distribution beyond the zstat.

  • Convert the z statistic to a P-value:

    • For Ha: μ > μ0 ⇒ P = Pr(Z > zstat) = right tail beyond zstat
    • For Ha: μ < μ0 ⇒ P = Pr(Z < zstat) = left tail beyond zstat
    • For Ha: μ ≠ μ0 ⇒ P = 2 × one-tailed P-value

  • Conventions:

    • p > .05: Fail to reject the null hypothesis
    • p < .05: Reject the null hypothesis
  • One-sided Ha ⇒ AUC in tail beyond zstat

  • Two-sided Ha ⇒ consider potential deviations in both directions ⇒ double the one-sided P-value
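The tail areas described above can be obtained with pnorm(), which returns areas under the standard normal curve. The value of zstat below is a hypothetical example.

```r
z_stat <- 2   # hypothetical test statistic

# Ha: mu > mu0  ->  area in the right tail beyond z_stat
p_right <- pnorm(z_stat, lower.tail = FALSE)

# Ha: mu < mu0  ->  area in the left tail below z_stat
p_left <- pnorm(z_stat)

# Ha: mu != mu0  ->  double the one-tailed area beyond |z_stat|
p_two_sided <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

round(c(right = p_right, left = p_left, two_sided = p_two_sided), 4)
```

With zstat = 2 the two-sided P-value is just under .05, so the null hypothesis would be rejected at the conventional level.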

12.4 Interpretation

Lecture slides

  • If the test statistic is large enough to be in the critical region, we conclude that the difference is significant or that the treatment/manipulation has a significant effect. \(\rightarrow\) In this case we reject the null hypothesis.
  • If the mean difference is relatively small, then the test statistic will have a low value (close to 0). \(\rightarrow\) In this case, we conclude that the evidence from the sample is not sufficient, and the decision is to fail to reject the null hypothesis.


12.5 Types of statistical tests

12.6 Parametric & non-parametric test

https://help.xlstat.com/s/article/what-is-the-difference-between-a-parametric-and-a-nonparametric-test?language=en_US

What is the difference between a parametric and a nonparametric test?

Parametric tests assume underlying statistical distributions in the data. Therefore, several conditions of validity must be met so that the result of a parametric test is reliable. For example, Student’s t-test for two independent samples is reliable only if each sample follows a normal distribution and if sample variances are homogeneous.

Nonparametric tests do not rely on any distribution. They can thus be applied even if parametric conditions of validity are not met.

Parametric tests often have nonparametric equivalents.

What is the advantage of using a nonparametric test?

Nonparametric tests are more robust than parametric tests. In other words, they are valid in a broader range of situations (fewer conditions of validity).

What is the advantage of using a parametric test?

The advantage of using a parametric test instead of a nonparametric equivalent is that the former will have more statistical power than the latter. In other words, a parametric test is more likely to lead to a rejection of H0 when H0 is false. Most of the time, the p-value associated with a parametric test will be lower than the p-value associated with a nonparametric equivalent that is run on the same data.
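As an illustration, the sketch below runs Student's t-test for two independent samples and a commonly used nonparametric counterpart, the Wilcoxon rank-sum (Mann-Whitney) test, on the same data. The two groups are simulated, so the means, standard deviations and sample sizes are arbitrary assumptions.

```r
set.seed(1)   # make the simulated data reproducible

# Two hypothetical independent groups
group_a <- rnorm(20, mean = 50, sd = 10)
group_b <- rnorm(20, mean = 58, sd = 10)

# Parametric: Student's t-test (assumes normality and equal variances)
t_result <- t.test(group_a, group_b, var.equal = TRUE)

# Nonparametric: Wilcoxon rank-sum test (no normality assumption)
w_result <- wilcox.test(group_a, group_b)

# Compare the two p-values
c(t_test = t_result$p.value, wilcoxon = w_result$p.value)
```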

12.7 Paired & unpaired samples

https://socratic.org/questions/what-is-a-paired-and-unpaired-t-test-what-are-the-differences

T-tests are useful for comparing the means of two samples. There are two types: paired and unpaired.

Paired means that both samples consist of the same test subjects. A paired t-test is equivalent to a one-sample t-test on the within-subject differences.

Unpaired means that both samples consist of distinct test subjects. An unpaired t-test is equivalent to a two-sample t-test.

For example, if you wanted to conduct an experiment to see whether drinking an energy drink increases heart rate, you could do it in two ways.

The “paired” way would be to measure the heart rate of 10 people before they drink the energy drink and then measure the heart rate of the same 10 people after drinking the energy drink. These two samples consist of the same test subjects, so you would perform a paired t-test on the means of both samples.

The “unpaired” way would be to measure the heart rate of 10 people who have not had an energy drink and then measure the heart rate of a different group of people who have drunk an energy drink. These two samples consist of different test subjects, so you would perform an unpaired t-test on the means of both samples.
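
A minimal sketch of the two approaches, using only the Python standard library. The heart-rate values are invented (five people rather than ten, for brevity), the function names are hypothetical, and the unpaired statistic uses Welch's standard error, which is an assumption of this sketch. Only the t statistics are computed, not p-values.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """t statistic for testing whether the mean of `values` equals mu0."""
    return (mean(values) - mu0) / (stdev(values) / sqrt(len(values)))


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs)


def unpaired_t(x, y):
    """Two-sample t statistic with Welch's standard error (assumed for this sketch)."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se


# Hypothetical heart rates (beats per minute).
before = [70, 72, 68, 75, 71]
after = [74, 77, 71, 80, 75]

print("paired t:  ", round(paired_t(before, after), 2))
print("unpaired t:", round(unpaired_t(after, before), 2))
```

With these numbers the paired statistic is much larger than the unpaired one, because pairing removes the variation between people and leaves only the variation in the before/after differences. Treating the same values as two independent groups is shown only for comparison; in practice the design of the study determines which test applies.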

12.8 Post-hoc comparison

https://statisticsbyjim.com/anova/post-hoc-tests-anova/

Post hoc tests are an integral part of ANOVA. When you use ANOVA to test the equality of at least three group means, statistically significant results indicate that not all of the group means are equal. However, ANOVA results do not identify which particular differences between pairs of means are significant. Use post hoc tests to explore differences between multiple group means while controlling the experiment-wise error rate.

https://www.graphpad.com/support/faqid/1091/

The ANOVA calculations test the null hypothesis that all groups of data really are sampled from distributions that have the same mean (so any observed differences are just due to coincidences of random sampling). Testing this hypothesis is rarely the reason you did the experiment. Instead, you want to look within the data, comparing this group with that group… So you want to make multiple comparisons. There are several ways you can do this:

  • All possible comparisons, including averages of groups. So you might compare the average of groups A and B with the average of groups C, D and E. Or compare group A to the average of groups B-F. Scheffé’s test (not currently offered by any GraphPad program) does this.
  • All possible pairwise comparisons. Compare the mean of every group with the mean of every other group. Prism and InStat can do these comparisons with Tukey or Newman-Keuls comparisons.
  • All against a control. If group A is the control, you may only want to compare A with B, A with C, A with D… but not compare B with C or C with D. Prism and InStat do this with Dunnett’s test.
  • Only a few comparisons based on your scientific goals. So you might want to compare A with B and B with C and that’s it. Prism and InStat use Bonferroni’s test for this.
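
To see why the choice of comparison strategy matters, the sketch below counts the comparisons for four hypothetical groups and shows how the chance of at least one false positive grows with the number of tests. The group labels, the function names, and the assumption that the tests are independent are illustrative only. The Bonferroni adjustment shown is the simple version that divides alpha by the number of comparisons.

```python
from itertools import combinations

groups = ["A", "B", "C", "D"]  # hypothetical groups; A is the control
alpha = 0.05

# All possible pairwise comparisons: k * (k - 1) / 2 pairs.
all_pairs = list(combinations(groups, 2))

# All against a control: k - 1 comparisons.
vs_control = [("A", g) for g in groups if g != "A"]


def familywise_error(alpha, n_comparisons):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


for name, comparisons in [("all pairs", all_pairs), ("vs control", vs_control)]:
    m = len(comparisons)
    print(name, "comparisons:", m,
          "uncorrected error rate:", round(familywise_error(alpha, m), 3),
          "Bonferroni alpha:", round(bonferroni_alpha(alpha, m), 4))
```

Fewer planned comparisons mean a less severe correction, which is why restricting the comparisons to those that matter scientifically preserves statistical power.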

12.9 Effect size

https://www.simplypsychology.org/effect-size.html

Effect size is a quantitative measure of the magnitude of the experimental effect. The larger the effect size, the stronger the relationship between two variables.

You can look at the effect size when comparing any two groups to see how substantially different they are. Typically, research studies will comprise an experimental group and a control group. The experimental group may receive an intervention or treatment which is expected to affect a specific outcome.

For example, we might want to know the effect of a therapy on treating depression. The effect size value will show us if the therapy has had a small, medium or large effect on depression.

https://www.physport.org/recommendations/Entry.cfm?ID=93385

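Effect size is not the same as statistical significance: significance tells you how likely it is that a result at least this large would arise by chance alone if there were no real effect, and effect size tells you how large (and therefore how practically important) the result is.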
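
One widely used effect size for comparing two group means is Cohen's d, the difference between the means divided by the pooled standard deviation. The sketch below is a minimal illustration using the Python standard library. The depression scores are invented, the function names are hypothetical, and the 0.2 / 0.5 / 0.8 cut-offs are Cohen's conventional rules of thumb rather than fixed thresholds.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = sqrt(
        ((n1 - 1) * stdev(group1) ** 2 + (n2 - 1) * stdev(group2) ** 2)
        / (n1 + n2 - 2)
    )
    return (mean(group1) - mean(group2)) / pooled_sd


def describe(d):
    """Rough verbal label based on Cohen's conventional benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical depression scores (lower is better).
control = [20, 22, 19, 24, 25]
therapy = [15, 18, 14, 20, 18]

d = cohens_d(therapy, control)
print("d =", round(d, 2), "->", describe(d))
```

A negative d here simply means the therapy group scored lower than the control group; the label is based on the absolute value.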

12.10 Interaction effect

https://methods.sagepub.com/reference/encyclopedia-of-survey-research-methods/n226.xml

An interaction effect is the simultaneous effect of two or more independent variables on at least one dependent variable in which their joint effect is significantly greater (or significantly less) than the sum of the parts. The presence of interaction effects in any kind of survey research is important because it tells researchers how two or more independent variables work together to impact the dependent variable. Including an interaction term in an analytic model provides the researcher with a better representation and understanding of the relationship between the dependent and independent variables. Further, it helps explain more of the variability in the dependent variable.
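
For a 2 x 2 design, the idea that the joint effect differs from "the sum of the parts" can be sketched as a difference of differences between cell means. The factor names, cell means, and function name below are hypothetical, and the sketch only computes the descriptive contrast; judging whether it is statistically significant would require a test such as a two-way ANOVA.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 table of cell means.

    Keys are (level of factor A, level of factor B), each coded 0 or 1.
    A value of 0 means the two factors combine additively.
    """
    effect_of_b_without_a = cell_means[(0, 1)] - cell_means[(0, 0)]
    effect_of_b_with_a = cell_means[(1, 1)] - cell_means[(1, 0)]
    return effect_of_b_with_a - effect_of_b_without_a


# Hypothetical mean outcome scores; A = therapy, B = medication.
cell_means = {
    (0, 0): 10,  # neither
    (0, 1): 12,  # medication only: +2
    (1, 0): 13,  # therapy only: +3
    (1, 1): 20,  # both: +10, more than the additive prediction of +5
}

print("interaction contrast:", interaction_contrast(cell_means))
```

Here the combined condition exceeds the additive prediction (10 + 2 + 3 = 15) by 5, which is exactly the value of the contrast.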