Lecture 1 (p.32) + Textbook (2.6, pp.22-25 [35-38])
Lecture slides:
The extent to which you are able to draw the correct conclusions about the causal relationships between variables.
Understanding internal and external validity (https://www.verywellmind.com/internal-and-external-validity-4584479)
Internal validity is the extent to which a study establishes a trustworthy cause-and-effect relationship between a treatment and an outcome. It also reflects the extent to which a study rules out alternative explanations for a finding. For example, if you implement a smoking cessation program with a group of individuals, how sure can you be that any improvement seen in the treatment group is due to the treatment that you administered?
Internal validity is not a “yes or no” type of concept. Instead, we consider how confident we can be with the findings of a study, based on whether it avoids traps that may make the findings questionable.
As a brief summary, you can only assume cause-and-effect when you meet the following three criteria in your study:
- The cause preceded the effect in terms of time.
- The cause and effect vary together.
- There are no other likely explanations for this relationship that you have observed.
Factors That Improve Internal Validity:
- Randomization.
- Random selection of participants.
- Blinding.
- Experimental manipulation.
- Study protocol.
Factors That Threaten Internal Validity:
- Confounding.
- Historical events.
- Maturation.
- Testing.
- Instrumentation.
- Statistical regression.
- Attrition.
- Diffusion.
- Experimenter bias.
Lecture slides:
The generalizability of your findings: to what extent do you expect to see the same pattern of results in “real life” as you saw in your study?
Understanding internal and external validity (https://www.verywellmind.com/internal-and-external-validity-4584479)
External validity refers to how well the outcome of a study can be expected to apply to other settings. In other words, this type of validity refers to how generalizable the findings are. For instance, do the findings apply to other people, settings, situations, and time periods?
Factors that Improve External Validity:
- Inclusion and exclusion criteria.
- Psychological realism.
- Replication.
- Field experiments.
- Reprocessing or calibration.
Factors That Threaten External Validity:
- Situational factors.
- Pre- and post-test effects.
- Sample features.
- Selection bias.
Lecture slides:
Whether you’re actually measuring what you want to be measuring.
The four types of validity https://www.scribbr.com/methodology/types-of-validity/
A construct refers to a concept or characteristic that can’t be directly observed, but can be measured by observing other indicators that are associated with it.
Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression; they can also be broader concepts applied to organizations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.
Construct validity is about ensuring that the method of measurement matches the construct you want to measure. If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression? Or is it actually measuring the respondent’s mood, self-esteem, or some other construct?
To achieve construct validity, you have to ensure that your indicators and measurements are carefully developed based on relevant existing knowledge. The questionnaire must include only relevant questions that measure known indicators of depression.
Lecture slides:
Whether or not a measure “looks like” it’s doing what it’s supposed to.
Face Validity https://methods.sagepub.com/Reference//encyc-of-research-design/n147.xml
Face validity requires investigators to step outside of their current research context and assess their observations from a commonsense perspective. A typical application of face validity occurs when researchers obtain assessments from current or future individuals who will be directly affected by programs premised on their research findings. An example of testing for face validity is the assessment of a proposed new patient tracking system by obtaining observations from local community health care providers who will be responsible for implementing the program, and getting their feedback on how they think the new program may work.
Lecture slides:
The entire setup of the study should closely approximate the real-world scenario that is being investigated.
What is Ecological Validity? https://www.statisticshowto.com/ecological-validity/
Ecological Validity is a specific type of external validity. External validity refers to your ability to generalize your experimental results across populations, places, and time; ecological validity is limited to how the experimental results apply to today’s society.
Textbook:
A confound is an additional, often unmeasured variable that turns out to be related to both the predictors and the outcomes. The existence of confounds threatens the internal validity of the study because you can’t tell whether the predictor causes the outcome, or if the confounding variable causes it, etc.
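A short simulation can make the idea of a confound concrete. This is a minimal sketch with made-up data: the variable names `z`, `x`, `y` and the effect sizes are hypothetical. The confound `z` drives both `x` and `y`, so the two correlate even though `x` has no effect on `y`; adding `z` to the regression pulls the coefficient of `x` towards zero.

```r
set.seed(1)                  # reproducible random numbers
n <- 1000

z <- rnorm(n)                # hypothetical confound (often unmeasured)
x <- 0.8 * z + rnorm(n)      # "predictor": depends only on z
y <- 0.8 * z + rnorm(n)      # "outcome": depends only on z, not on x

cor(x, y)                    # clearly positive, although x does not cause y

coef(lm(y ~ x))["x"]         # ignoring z: x appears to "affect" y
coef(lm(y ~ x + z))["x"]     # controlling for z: the coefficient of x is near 0
```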
Textbook:
A result is said to be “artifactual” if it only holds in the special situation that you happened to test in your study. The possibility that your result is an artifact describes a threat to your external validity, because it raises the possibility that you can’t generalise your results to the actual population that you care about.
Lecture 1 (p.28) Textbook (2.2, pp.14-18 [27-31])
Lecture slides:
Entities are divided into distinct categories.
Lecture slides:
There are only two categories.
Example: gender (male or female)
Lecture slides:
There are more than two categories.
Example: Flower types (Tulip, Rose, Lavender)
Lecture slides:
The same as a nominal variable but the categories have a logical order.
Example: Exam result (Fail, Pass)
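In R, categorical variables are usually stored as factors. A minimal sketch reusing the examples above (the object names are illustrative): `factor()` stores nominal and binary variables, and `ordered = TRUE` marks an ordinal variable so that comparisons follow the level order.

```r
# Nominal: categories with no natural order
flower <- factor(c("Tulip", "Rose", "Lavender", "Rose"))
levels(flower)        # "Lavender" "Rose" "Tulip" (alphabetical by default)

# Binary: a nominal variable with exactly two categories
gender <- factor(c("male", "female", "female", "male"))
nlevels(gender)       # 2

# Ordinal: categories with a logical order, given explicitly via `levels`
exam <- factor(c("Pass", "Fail", "Pass"),
               levels = c("Fail", "Pass"), ordered = TRUE)
exam[1] > exam[2]     # TRUE: "Pass" ranks above "Fail"
```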
Lecture slides:
Entities get a distinct score.
Lecture slides:
Equal intervals on the variable represent equal differences in the property being measured.
Example: the difference between 6 and 8 is equivalent to the difference between 13 and 15.
Textbook:
The differences between the numbers are interpretable.
Lecture slides:
The same as an interval variable, but the ratios of scores on the scale must also make sense, and the scale has a true zero.
Example: a score of 16 on an anxiety scale means that the person is, in reality, twice as anxious as someone scoring 8.
Temperature (in degrees Celsius) is not a ratio scale.
The \(0^\circ\) does not mean “no temperature at all”: it actually means “the temperature at which water freezes”. As a consequence, it is pointless to try to multiply and divide temperatures. It is wrong to say that \(20^\circ\) is twice as hot as \(10^\circ\), just as it is weird and meaningless to try to claim that \(20^\circ\) is negative two times as hot as \(-10^\circ\).
Textbook:
Notice that multiplication and division make sense here too.
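A quick numeric check shows why ratios are meaningless on an interval scale. This sketch uses the standard conversions °F = °C × 9/5 + 32 and K = °C + 273.15; of the three, only Kelvin has a true zero. The same two temperatures give a different "ratio" on each interval scale, while differences convert consistently.

```r
celsius <- c(20, 10)                  # two temperatures in °C (interval scale)
celsius[1] / celsius[2]               # 2, but this depends on where 0 happens to sit

fahrenheit <- celsius * 9 / 5 + 32    # the same temperatures in °F: 68 and 50
fahrenheit[1] / fahrenheit[2]         # 1.36, a different "ratio" for the same temperatures

kelvin <- celsius + 273.15            # Kelvin has a true zero (ratio scale)
kelvin[1] / kelvin[2]                 # about 1.035

# Differences, by contrast, are interpretable on an interval scale:
# a 10 °C difference is always an 18 °F difference
diff(fahrenheit) / diff(celsius)      # 1.8
```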
Lecture 1 (p.29) Textbook (2.3, p.19 [32])
Lecture slides:
The ability of the measure to produce the same results under the same conditions.
Lecture slides:
The ability of a measure to produce consistent results when the same entities are tested at two different points in time.
Textbook:
This relates to consistency over time: if we repeat the measurement at a later date, do we get the same answer?
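Test-retest reliability is often summarised as the correlation between the two measurement occasions. A minimal sketch with hypothetical scores for six people; using Pearson's `cor()` here is an assumption (other coefficients, such as the intraclass correlation, are also used).

```r
# Hypothetical scores: the same six people measured at two time points
time1 <- c(12, 15, 9, 20, 17, 11)
time2 <- c(13, 14, 10, 19, 18, 10)

# A correlation near 1 means people keep roughly the same relative position
cor(time1, time2)
```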
Lecture slides:
Consistency across people. Do they produce the same answer?
Textbook:
This relates to consistency across people: if someone else repeats the measurement (e.g., someone else rates my intelligence) will they produce the same answer?
Lecture slides:
Do different measures that are supposed to measure the same thing actually give the same results? (e.g., two different eye trackers).
Textbook:
This relates to consistency across theoretically-equivalent measurements: if I use a different set of bathroom scales to measure my weight, does it give the same answer?
Lecture slides:
Do the parts of a measure that are supposed to measure the same thing actually measure it consistently? (e.g., multiple questions measuring IQ).
Textbook:
If a measurement is constructed from lots of different parts that perform similar functions (e.g., a personality questionnaire result is added up across several questions), do the individual parts tend to give similar answers?
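Internal consistency is commonly summarised with Cronbach's alpha, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum s^2_{item}}{s^2_{total}}\right)\) for \(k\) items. The sketch below computes it by hand on hypothetical questionnaire data; `cronbach_alpha()` is a hypothetical helper (the psych package provides a fuller `alpha()` function).

```r
# Hypothetical responses: 5 respondents (rows) x 3 questionnaire items (columns)
items <- data.frame(q1 = c(4, 2, 5, 3, 1),
                    q2 = c(5, 2, 4, 3, 2),
                    q3 = c(4, 1, 5, 4, 1))

# Hypothetical helper implementing the formula above
cronbach_alpha <- function(items) {
  k <- ncol(items)                    # number of items
  item_vars <- sapply(items, var)     # variance of each item
  total_var <- var(rowSums(items))    # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

cronbach_alpha(items)                 # about 0.94: the items agree closely
```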
Lecture slides:
R codes
weight <- 52.3 # numeric variable
height <- 152 # numeric variable
country <- "Vietnam" # character variable
winter <- FALSE # logical variable
cool <- TRUE # logical variable
R codes
# + addition
10 + 5
[1] 15
2.34 + 234.23
[1] 236.57
1 + 0
[1] 1
1 + NA
[1] NA
# - subtraction
10 - 5
[1] 5
2.34 - 234.23
[1] -231.89
1 - 0
[1] 1
1 - NA
[1] NA
# * multiplication
10 * 5
[1] 50
2.34 * 234.23
[1] 548.0982
1 * 0
[1] 0
1 * NA
[1] NA
# / division
10 / 5
[1] 2
2.34 / 234.23
[1] 0.009990181
1 / 0
[1] Inf
1 / NA
[1] NA
# ^ taking powers
10 ^ 5
[1] 1e+05
2.34 ^ 234.23
[1] 3.029914e+86
1 ^ 0
[1] 1
1 ^ NA
[1] 1
R codes
# == equality
1 == 2
[1] FALSE
2 + 3 == 5
[1] TRUE
# != inequality
1 != 2
[1] TRUE
2 + 3 != 5
[1] FALSE
# > greater than
1 > 2
[1] FALSE
2 + 3 > 5
[1] FALSE
# >= greater than or equal to
1 >= 2
[1] FALSE
2 + 3 >= 5
[1] TRUE
# < less than
1 < 2
[1] TRUE
2 + 3 < 5
[1] FALSE
# <= less than or equal to
1 <= 2
[1] TRUE
2 + 3 <= 5
[1] TRUE
# & AND
1 == 2 & 2 + 3 == 5
[1] FALSE
1^0 == 1 & 2 + 3 == 5
[1] TRUE
1^0 == 0 & 1*0 == 1
[1] FALSE
# | OR
1 == 2 | 2 + 3 == 5
[1] TRUE
1^0 == 1 | 2 + 3 == 5
[1] TRUE
1^0 == 0 | 1*0 == 1
[1] FALSE
# ! NOT
!(1 == 2 & 2 + 3 == 5)
[1] TRUE
!(1^0 == 1 & 2 + 3 == 5)
[1] FALSE
!(1^0 == 0 & 1*0 == 1)
[1] TRUE
!(1 == 2 | 2 + 3 == 5)
[1] FALSE
!(1^0 == 1 | 2 + 3 == 5)
[1] FALSE
!(1^0 == 0 | 1*0 == 1)
[1] TRUE
Lecture slides
Vectors are variables that store multiple pieces of information.
Create vectors using c(), which combines a set of values and stores them as a vector.
R codes
# numeric vector
weights <- c(45, 59, 36, 78)
# character vector
names <- c("Donald Trump", "Selena Gomez", "Lionel Messi", "Steve Jobs")
# logical vector
checking <- c(TRUE, FALSE, FALSE, FALSE)
R codes
weights[1]
[1] 45
names[c(2,4)]
[1] "Selena Gomez" "Steve Jobs"
checking[2]
[1] FALSE
checking[2:4]
[1] FALSE FALSE FALSE
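Besides positions, a vector can be indexed with a logical vector of the same length, which keeps the elements where the value is `TRUE`; a negative position drops an element. A short sketch reusing `weights` and `checking` from above:

```r
weights <- c(45, 59, 36, 78)
checking <- c(TRUE, FALSE, FALSE, FALSE)

weights > 50               # FALSE TRUE FALSE TRUE: element-wise comparison
weights[weights > 50]      # 59 78: keep elements where the condition is TRUE
weights[checking]          # 45: a logical vector can index directly
weights[-1]                # 59 36 78: a negative position drops that element
```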
Lecture slides
R codes
df <- data.frame(name = names,
weight = weights,
result = checking)
# data frame df has 3 columns (3 vectors, i.e., 3 variables) and 4 rows.
df
name weight result
1 Donald Trump 45 TRUE
2 Selena Gomez 59 FALSE
3 Lionel Messi 36 FALSE
4 Steve Jobs 78 FALSE
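Columns of a data frame are accessed with `$`, and rows and columns with `[row, column]`. A minimal sketch rebuilding `df` from the vectors above; `stringsAsFactors = FALSE` is set explicitly so that `name` stays character on R versions older than 4.0.

```r
names <- c("Donald Trump", "Selena Gomez", "Lionel Messi", "Steve Jobs")
weights <- c(45, 59, 36, 78)
checking <- c(TRUE, FALSE, FALSE, FALSE)
df <- data.frame(name = names, weight = weights, result = checking,
                 stringsAsFactors = FALSE)   # keep text as character (default since R 4.0)

df$weight                      # one column as a vector: 45 59 36 78
df[2, ]                        # second row, all columns
df[df$weight > 50, "name"]     # rows meeting a condition, one column
df[df$result, ]                # logical column used to select rows
dim(df)                        # 4 rows, 3 columns
```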
Lecture 4 (pp.7-11) Textbook (5.1, pp.114-122 [127-135])
Textbook
The mean of a set of observations is just a normal, old-fashioned average: add all of the values up, and then divide by the total number of values.
Lecture slides
The mean is also the value from which the (squared) scores deviate least (it has the least error).
Formula
\[ mean(X) = \bar{X} = \frac{1}{N}\sum_{i=1}^{N}X_i \]
R function
Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}
mean(df$var3)
[1] 209.6
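The formula can be checked directly against `mean()`. The sketch also illustrates the claim that the mean is the value from which the squared deviations are smallest: `ss_about()` is a hypothetical helper returning the sum of squared deviations around any candidate centre, and `optimize()` searches for the value that minimises it.

```r
x <- c(-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000)   # var3 from the example above

x_bar <- sum(x) / length(x)       # (1/N) times the sum of the X_i
x_bar                             # 209.6, identical to mean(x)

# Hypothetical helper: sum of squared deviations around a candidate value
ss_about <- function(centre) sum((x - centre)^2)

ss_about(x_bar) < ss_about(median(x))            # TRUE
optimize(ss_about, interval = range(x))$minimum  # approximately 209.6
```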
Textbook
The median of a set of observations is just the middle value.
R function
Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}
median(df$var3)
[1] 4.5
Textbook
To calculate a trimmed mean, what you do is “discard” the most extreme examples on both ends (i.e., the largest and the smallest), and then take the mean of everything else. The goal is to preserve the best characteristics of the mean and the median: just like a median, you aren’t highly influenced by extreme outliers, but like the mean, you “use” more than one of the observations.
Generally, we describe a trimmed mean in terms of the percentage of observations on either side that are discarded. So, for instance, a 10% trimmed mean discards the largest 10% of the observations and the smallest 10% of the observations, and then takes the mean of the remaining 80% of the observations.
R function
Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}
mean(df$var3, trim = 0.1)
[1] 12
mean(df$var3, trim = 0.05)
[1] 209.6
mean(df$var3, trim = 0.2)
[1] 4.333333
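The trimming rule can be written out by hand. A minimal sketch, assuming (as R's `mean()` does) that `floor(N * trim)` observations are dropped from each end of the sorted data; `trimmed_mean()` is a hypothetical helper, valid for `trim` below 0.5. This explains why `trim = 0.05` gives the untrimmed mean here: with 10 values, `floor(0.5)` is 0, so nothing is discarded.

```r
x <- c(-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000)

# Hypothetical helper mirroring the rule described above
trimmed_mean <- function(x, trim) {
  x <- sort(x)
  k <- floor(length(x) * trim)        # observations dropped from EACH end
  mean(x[(k + 1):(length(x) - k)])    # average of what is left
}

trimmed_mean(x, 0.1)    # 12
trimmed_mean(x, 0.05)   # 209.6
trimmed_mean(x, 0.2)    # 4.333333
```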
Textbook
The mode of a sample is the value that occurs most frequently.
R function
Given var3 of df: {-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000}
library(lsr) # modeOf() is provided by the lsr package
modeOf(df$var3)
[1] 5
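Base R has no function for the statistical mode (`mode()` returns an object's storage mode instead), but a frequency table gives the same answer as `modeOf()`. A minimal sketch:

```r
x <- c(-1000, -10, 2, 3, 4, 5, 5, 7, 80, 3000)

counts <- table(x)                             # how often each distinct value occurs
top <- names(counts)[counts == max(counts)]    # value(s) with the highest count, as text
as.numeric(top)                                # 5; several values are returned if tied
```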
Lecture 4 (pp.16-19) Textbook (5.2, pp.123-131 [136-144])
Textbook:
The range of a variable is very simple: it’s the biggest value minus the smallest value.
The range is the simplest way to quantify the notion of “variability”. However, if the data set has outliers, the range is influenced by these cases. For example, the dataset {-100; 2; 3; 4; 5; 6; 7; 8; 9; 10} has a range of 110, but if the outlier -100 were removed, we would have a range of only 8.
R function
Given var2 of df: {-101, -100, -11, -10, -1, 1, 10, 11, 100, 101}
range(df$var2)
[1] -101 101
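Note that `range()` returns the smallest and largest values rather than their difference; `diff()` turns that pair into the range as a single number. The sketch also reproduces the outlier example from the textbook paragraph.

```r
var2 <- c(-101, -100, -11, -10, -1, 1, 10, 11, 100, 101)
range(var2)             # -101 101: the minimum and maximum
diff(range(var2))       # 202: maximum minus minimum

# Outlier example: one extreme value dominates the range
with_outlier <- c(-100, 2, 3, 4, 5, 6, 7, 8, 9, 10)
diff(range(with_outlier))        # 110
diff(range(with_outlier[-1]))    # 8, once -100 (the first element) is dropped
```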
Textbook:
Quantiles are more commonly called percentiles.
Example: the 10th percentile of a data set is the smallest number x such that 10% of the data is less than x.
R function
Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
And var2 of df: {-101, -100, -11, -10, -1, 1, 10, 11, 100, 101}
quantile(x = df$var1, probs = 0.5)
50%
5.5
quantile(x = df$var1, probs = 0.1)
10%
1.9
quantile(x = df$var1, probs = c(0.25, 0.75))
25% 75%
3.25 7.75
quantile(x = df$var2, probs = c(0.25, 0.75))
25% 75%
-10.75 10.75
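Values such as 1.9 for the 10th percentile of `var1` come from linear interpolation between neighbouring sorted values, which is `quantile()`'s default method (`type = 7`). A minimal sketch of that rule; `quantile_type7()` is a hypothetical helper for a single probability `p`.

```r
# Hypothetical helper reproducing quantile(x, p) with the default type = 7
quantile_type7 <- function(x, p) {
  x <- sort(x)
  h <- (length(x) - 1) * p + 1        # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])  # interpolate between the two neighbours
}

var1 <- 1:10
var2 <- c(-101, -100, -11, -10, -1, 1, 10, 11, 100, 101)
quantile_type7(var1, 0.1)    # 1.9
quantile_type7(var2, 0.25)   # -10.75
```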
Textbook:
The interquartile range (IQR) is like the range, but instead of calculating the difference between the biggest and smallest value, it calculates the difference between the 25th quantile (1st quartile) and the 75th quantile (3rd quartile).
How to interpret the IQR: the interquartile range is the range spanned by the “middle half” of the data. That is, one quarter of the data falls below the 25th percentile, one quarter of the data is above the 75th percentile, leaving the “middle half” of the data lying in between the two. And the IQR is the range covered by that middle half.
The median of a data set is its 50th quantile (2nd quartile).
R function
Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
And var2 of df: {-101, -100, -11, -10, -1, 1, 10, 11, 100, 101}
IQR(x = df$var1)
[1] 4.5
IQR(x = df$var2)
[1] 21.5
Lecture slides:
A deviation is the difference between an actual data point and the mean.
Deviations can be calculated by taking each score and subtracting the mean from it:
Formula
\[ deviation = X_i - \bar{X} \]
R function
Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
df$var1 - mean(df$var1)
[1] -4.5 -3.5 -2.5 -1.5 -0.5 0.5 1.5 2.5 3.5 4.5
Lecture slides:
Deviations cancel out because some are positive and others negative. Therefore, we square each deviation. If we add these squared deviations we get the sum of squared errors (SS).
The sum of squares is a good measure of overall variability, but is dependent on the number of scores.
Formula
\[ SS = \sum_{i=1}^N (X_i - \bar{X})^2 \]
R function
Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
sum((df$var1 - mean(df$var1))^2)
[1] 82.5
Textbook:
We calculate the average variability by dividing the SS by the number of scores minus one (\(N-1\)). This value is called the variance (\(s^2\)).
The variance has one problem: it is measured in units squared, which isn’t a very meaningful metric.
Formula
\[ Var(X) = s^2 = \dfrac{1}{N-1}SS = \dfrac{1}{N-1}\sum_{i=1}^N (X_i - \bar{X})^2 \]
R function
Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
var(df$var1)
[1] 9.166667
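The value from `var()` is the SS divided by \(N-1\), not by \(N\). A quick check on `var1` makes the difference visible (dividing by \(N\) gives a slightly smaller value):

```r
var1 <- 1:10
ss <- sum((var1 - mean(var1))^2)   # sum of squares: 82.5
ss / (length(var1) - 1)            # 9.166667, matches var(var1)
ss / length(var1)                  # 8.25, the "divide by N" version
```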
Lecture slides:
We take the square root value of variance to obtain the standard deviation.
Textbook:
Most of us just rely on a simple rule of thumb: in general, you should expect 68% of the data to fall within 1 standard deviation of the mean, 95% of the data to fall within 2 standard deviations of the mean, and 99.7% of the data to fall within 3 standard deviations of the mean. This rule tends to work pretty well most of the time, but it’s not exact: it’s actually calculated based on an assumption that the histogram is symmetric and “bell shaped”.
Formula
\[ \hat{\sigma} = \sqrt{s^2} = \sqrt{\frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{X})^2} \]
R function
Given var1 of df: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
sd(df$var1)
[1] 3.02765
Same mean - different SD
Different mean - different SD
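The 68/95/99.7 figures in the rule of thumb come from the normal distribution and can be recovered with `pnorm()`. The sketch also checks the rule empirically on simulated normal data (the seed and sample size are arbitrary); for strongly skewed data the proportions can differ noticeably.

```r
k <- 1:3
round(pnorm(k) - pnorm(-k), 3)     # 0.683 0.954 0.997: area within k SDs of the mean

set.seed(42)
z <- rnorm(100000)                 # simulated bell-shaped data
# proportion of observations within 1, 2 and 3 sample SDs of the sample mean
sapply(k, function(m) mean(abs(z - mean(z)) < m * sd(z)))
```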
Lecture 4 (pp.41-45) Textbook (5.3, pp.131-133 [144-146])
Skewness is a measure of asymmetry.
Textbook
If the data tend to have a lot of extreme small values (i.e., the lower tail is “longer” than the upper tail) and not so many extremely large values (left panel), we say that the data are negatively skewed.
Textbook
If there are more extremely large values than extremely small ones (right panel), we say that the data are positively skewed.
R function
Given var4: {-1, -2, -3, -4, 2, 2, 2, 2, 2, 2, 2, 2, 34, 34, 35, -23, -12, -1, 0, 0, 0, 0, 32, 35, 35, 35, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, -234, -23, -23, -25, -67, -78, -89}
library(psych) # skew() and kurtosi() are provided by the psych package
skew(var4)
[1] -3.582078
hist(var4)
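Skewness is based on the third power of the deviations, so a long tail on one side dominates its sign. A minimal sketch of one common moment-based definition, \(m_3 / m_2^{3/2}\); `skewness_moment()` is a hypothetical helper, and psych's `skew()` may use a slightly different small-sample formula by default, so its values can differ a little.

```r
# Hypothetical helper: moment-based sample skewness
skewness_moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5     # third central moment divided by SD^3 (SD computed with N)
}

skewness_moment(c(1, 2, 3, 4, 5))      # 0: perfectly symmetric
skewness_moment(c(1, 1, 1, 2, 10))     # positive: long upper tail
skewness_moment(c(-10, 2, 3, 3, 3))    # negative: long lower tail
```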
Textbook
Kurtosis is a measure of the “pointiness” of a data set.
Textbook
The data are not pointy enough, so the kurtosis is negative.
Textbook
The data are just pointy enough, so the kurtosis is zero.
Textbook
The data are too pointy, so the kurtosis is positive.
R function
Given var4: {-1, -2, -3, -4, 2, 2, 2, 2, 2, 2, 2, 2, 34, 34, 35, -23, -12, -1, 0, 0, 0, 0, 32, 35, 35, 35, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, -234, -23, -23, -25, -67, -78, -89}
kurtosi(var4)
[1] 15.98769
hist(var4)
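Kurtosis uses the fourth power of the deviations. Subtracting 3 (the value for a normal distribution) gives the "excess" kurtosis, so normal data score about 0, flat data score negative and pointy, heavy-tailed data score positive. A minimal sketch; `excess_kurtosis()` is a hypothetical helper, psych's `kurtosi()` may use a slightly different small-sample formula, and the simulated samples are arbitrary.

```r
# Hypothetical helper: moment-based excess kurtosis
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3        # 4th standardised moment minus 3
}

excess_kurtosis(c(1, 2, 3, 4, 5))    # -1.3: flat, "not pointy enough"

set.seed(1)
excess_kurtosis(rnorm(100000))       # close to 0: "just pointy enough"
excess_kurtosis(rt(100000, df = 10)) # positive: heavier tails than the normal
```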
df <- data.frame(bmi = c(15, 21, 25, 29, 35),
mood = c(-1, 0, 3, 5, 12)
)
plot(x = df$bmi,
y = df$mood,
main = "Scatter plot - Relationship between BMI and Mood", # graph title
xlab = "BMI", # x-axis title
ylab = "Mood", # y-axis title
pch = 20, # marker shape
col = "blue") # marker color
df <- data.frame(bmi = c(15, 21, 25, 29, 35),
mood = c(-1, 0, 3, 5, 12)
)
plot(x = df$bmi,
y = df$mood,
type = "b", # draw points connected by lines
main = "Scatter plot - Relationship between BMI and Mood", # graph title
xlab = "BMI", # x-axis title
ylab = "Mood", # y-axis title
pch = 20, # marker shape
col = "blue") # marker color
df <- data.frame(bmi = c(18, 21, 27, 22, 33, 35),
gender = c("male", "female", "female", "male", "female", "male")
)
boxplot(
formula = bmi ~ gender,
data = df,
main = "Box plot - BMI by gender", # graph title
xlab = "Gender", # x-axis title
ylab = "BMI", # y-axis title
col = "#27aae1") # box color
df <- data.frame(bmi = c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22),
gender = c("male", "female", "female", "male", "female", "male", "male", "female", "male", "female", "male", "female", "male", "male")
)
hist(df$bmi,
main = "Histogram - BMI",
xlab = "BMI",
col = "#009688")
df <- data.frame(bmi = c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22),
bmi_lv = c(2,2,2,3,2,3,1,3,2,2,3,1,3,2)
)
barplot(height = table(df$bmi_lv),
names.arg = c("Low", "Medium", "High"),
xlab = "BMI",
ylab = "Frequency",
main = "Barplot - Frequency of BMI",
col = c("#BDBDBD", "#009933", "#900c3f")
)
\[ CI_{95} = \bar{X} \pm \left( 1.96 \times \dfrac{\sigma}{\sqrt{N}} \right) \]
df <- data.frame(bmi = c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22, 14, 15),
gender = c("male", "female", "female", "female", "female", "male", "male", "female", "male", "female", "male", "female", "male", "male", "female", "male"),
bmi_lv = c("medium","medium","medium","high","medium","high","low","high","medium","medium","high","low","high","medium", "low", "low")
)
lsr::bars(data = df,
formula = bmi ~ gender,
xLabels = c("Female", "Male"),
yLabel = "BMI",
main = "Barplot of group means with CI - BMI by gender"
)
lsr::bars(
data = df,
formula = bmi ~ bmi_lv + gender,
xLabels = c("High", "Low", "Medium"),
yLabel = "BMI",
main = "Barplot of group means with CI - BMI by gender and level"
)
Need to report:
- mean and standard deviation (and/or ranges)
- sample (group) size
- distribution characteristics if there is some concern
- Spacing and italics matter!
Example:
The age of participants ranged from 18 to 70 years (\(M = 25.5\), \(SD = 7.94\)). Age was non-normally distributed, with skewness of \(1.87\) (\(SE = 0.05\)) and kurtosis of \(3.93\) (\(SE = 0.10\)).
Lecture slides
\[ CI_{95} = \bar{X} \pm \left( 1.96 \times \dfrac{\sigma}{\sqrt{N}} \right) \]
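In practice \(\sigma\) is rarely known, so the sample SD is plugged in, and for small samples the 1.96 is usually replaced by a quantile of the t distribution. A minimal sketch using the 16 BMI values from the bar-plot example; the t-based interval is the one `t.test()` reports.

```r
bmi <- c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22, 14, 15)
n  <- length(bmi)
se <- sd(bmi) / sqrt(n)                              # standard error of the mean

mean(bmi) + c(-1, 1) * 1.96 * se                     # CI from the formula (normal approximation)
mean(bmi) + c(-1, 1) * qt(0.975, df = n - 1) * se    # t-based CI, slightly wider
t.test(bmi)$conf.int                                 # same as the t-based CI
```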
Lecture slides
If we replicated the experiment over and over again and computed a 95% confidence interval for each replication, then 95% of those intervals would contain the true mean.
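The repeated-sampling interpretation can be checked by simulation. This sketch assumes a normal population with a known mean and SD (the values 50 and 10, and the sample size 25, are arbitrary), builds the 95% CI from each simulated sample with the formula above, and counts how often the interval contains the true mean.

```r
set.seed(123)
true_mean <- 50; sigma <- 10; n <- 25    # arbitrary simulation settings

covered <- replicate(10000, {
  x  <- rnorm(n, mean = true_mean, sd = sigma)       # one replication of the "experiment"
  ci <- mean(x) + c(-1, 1) * 1.96 * sigma / sqrt(n)  # 95% CI with known sigma
  ci[1] <= true_mean && true_mean <= ci[2]           # did this interval capture the true mean?
})

mean(covered)   # proportion of intervals containing the true mean, close to 0.95
```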
If the null hypothesis value lies outside the 95% CI, the p value is less than .05, so p is less than the significance level and we reject the null hypothesis.
If the null hypothesis value lies inside the 95% CI, the p value is greater than .05, so we do not reject the null.
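The link between the CI and the p-value can be seen with `t.test()`, which reports both. A minimal sketch on the BMI values used earlier, testing a hypothetical null value of 20; the CI-based decision and the p-value decision agree (this holds for a two-sided test at the matching level).

```r
bmi <- c(18, 21, 27, 35, 22, 33, 13, 34, 22, 22, 33, 13, 34, 22, 14, 15)
mu0 <- 20                                   # hypothetical null-hypothesis value

res <- t.test(bmi, mu = mu0)                # two-sided one-sample t-test with 95% CI
res$conf.int
res$p.value

outside_ci <- mu0 < res$conf.int[1] || mu0 > res$conf.int[2]
outside_ci == (res$p.value < 0.05)          # TRUE: both rules give the same decision
```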
Textbook
Statistical hypotheses must be mathematically precise, and they must correspond to specific claims about the characteristics of the data generating mechanism (i.e., the “population”).
Lecture slides
The general goal of a hypothesis test is to rule out chance (sampling error) as a plausible explanation for the results from a research study.
Hypothesis testing is a technique often used to help determine whether a specific treatment, variable, or experimental manipulation has an effect on the individuals in a population.
In its most abstract form, hypothesis testing is really a very simple idea: the researcher has some theory about the world, and wants to determine whether or not the data actually support that theory.
Textbook
The idea is to set up a statistical hypothesis (the “null” hypothesis, \(H_0\)) that corresponds to the exact opposite of what I want to believe, and then focus exclusively on that, almost to the neglect of the thing I’m actually interested in (which is now called the “alternative” hypothesis, \(H_1\)).
The important thing to recognise is that the goal of a hypothesis test is not to show that the alternative hypothesis is (probably) true; the goal is to show that the null hypothesis is (probably) false.
Lecture slides
The null hypothesis (H0) is a claim of “no difference in the population”, or that an effect is zero.
The null hypothesis, H0, typically states that the independent variable/treatment has no effect (no change, no difference). According to the null hypothesis, the population mean after treatment is the same as it was before treatment.
The \(\alpha\) level establishes a criterion, or “cut-off”, for making a decision about the null hypothesis. By convention, it is set at .05.
The critical region consists of outcomes that are very unlikely to occur if the null hypothesis is true.
The alternative hypothesis (Ha) claims “H0 is false”
Lecture slides
The test statistic (e.g., a z-score) forms a ratio comparing the obtained difference between the sample mean and the hypothesized population mean versus the amount of difference we would expect without any treatment effect (the standard error).
A large value for the test statistic shows that the obtained mean difference is more than would be expected if there is no effect.
Lecture slides
The P-value answers the question: What is the probability of the observed test statistic or one more extreme IF H0 is true?
This corresponds to the area under the curve in the tail of the Standard Normal distribution beyond the zstat.
Convert z statistics to P-value:
- For Ha: μ > μ0 ⇒ P = Pr(Z > zstat) = right tail beyond zstat
- For Ha: μ < μ0 ⇒ P = Pr(Z < zstat) = left tail beyond zstat
- For Ha: μ ≠ μ0 ⇒ P = 2 × one-tailed P-value
Conventions:
One-sided Ha ⇒ AUC in tail beyond zstat
Two-sided Ha ⇒ consider potential deviations in both directions ⇒ double the one-sided P-value
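Putting the test statistic and the tail-area conventions together for a z test: a minimal sketch with hypothetical numbers (sample mean 105, null value 100, known \(\sigma = 15\), \(N = 36\)). `pnorm()` gives the area to the left of a value, and `lower.tail = FALSE` gives the area to the right.

```r
x_bar <- 105; mu0 <- 100; sigma <- 15; n <- 36   # hypothetical values

z <- (x_bar - mu0) / (sigma / sqrt(n))   # (observed difference) / (standard error)
z                                        # 2

p_right <- pnorm(z, lower.tail = FALSE)  # Ha: mu > mu0 -> right tail beyond z
p_left  <- pnorm(z)                      # Ha: mu < mu0 -> left tail below z
p_two   <- 2 * pnorm(-abs(z))            # Ha: mu != mu0 -> double the one-tailed area
round(c(p_right, p_left, p_two), 4)      # 0.0228 0.9772 0.0455
```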
Lecture slides
What is the difference between a parametric and a nonparametric test?
Parametric tests assume underlying statistical distributions in the data. Therefore, several conditions of validity must be met so that the result of a parametric test is reliable. For example, Student’s t-test for two independent samples is reliable only if each sample follows a normal distribution and if sample variances are homogeneous.
Nonparametric tests do not rely on any distribution. They can thus be applied even if parametric conditions of validity are not met.
Parametric tests often have nonparametric equivalents. You will find different parametric tests with their equivalents when they exist in this grid.
What is the advantage of using a nonparametric test? Nonparametric tests are more robust than parametric tests. In other words, they are valid in a broader range of situations (fewer conditions of validity).
What is the advantage of using a parametric test? The advantage of using a parametric test instead of a nonparametric equivalent is that the former will have more statistical power than the latter. In other words, a parametric test is more able to lead to a rejection of H0. Most of the time, the p-value associated with a parametric test will be lower than the p-value associated with a nonparametric equivalent that is run on the same data.
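To see both kinds of test side by side, R's `t.test()` (parametric) and `wilcox.test()` (its nonparametric counterpart, the Wilcoxon rank-sum or Mann-Whitney test) can be run on the same data. A minimal sketch with hypothetical data for two independent groups; the exact p-values depend on the data, and the parametric p-value is not guaranteed to be lower in every case.

```r
group_a <- c(5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.4, 5.7)   # hypothetical scores, group A
group_b <- c(4.2, 4.9, 4.4, 5.0, 4.6, 4.1, 4.7, 4.5)   # hypothetical scores, group B

t.test(group_a, group_b)$p.value        # parametric: Welch two-sample t-test
wilcox.test(group_a, group_b)$p.value   # nonparametric: Wilcoxon rank-sum test
```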
https://socratic.org/questions/what-is-a-paired-and-unpaired-t-test-what-are-the-differences
T-tests are useful for comparing the means of two samples. There are two types: paired and unpaired.
Paired means that both samples consist of the same test subjects. A paired t-test is equivalent to a one-sample t-test on the within-subject differences.
Unpaired means that both samples consist of distinct test subjects. An unpaired t-test is equivalent to a two-sample t-test.
For example, if you wanted to conduct an experiment to see how drinking an energy drink increases heart rate, you could do it two ways.
The “paired” way would be to measure the heart rate of 10 people before they drink the energy drink and then measure the heart rate of the same 10 people after drinking the energy drink. These two samples consist of the same test subjects, so you would perform a paired t-test on the means of both samples.
The “unpaired” way would be to measure the heart rate of 10 people before drinking an energy drink and then measure the heart rate of some other group of people who have drunk energy drinks. These two samples consist of different test subjects, so you would perform an unpaired t-test on the means of both samples.
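The equivalence between a paired t-test and a one-sample t-test on the differences can be checked directly. A minimal sketch with hypothetical heart-rate data for 10 people measured before and after the drink; `var.equal` is left at its default, so the unpaired version is Welch's test.

```r
before <- c(72, 68, 75, 80, 66, 71, 74, 69, 77, 70)   # hypothetical heart rates (bpm)
after  <- c(78, 70, 79, 85, 70, 75, 76, 74, 83, 72)

paired   <- t.test(after, before, paired = TRUE)   # same people, measured twice
diffs    <- t.test(after - before, mu = 0)         # one-sample test on the differences
unpaired <- t.test(after, before)                  # treats the samples as independent

c(paired = unname(paired$statistic), one_sample = unname(diffs$statistic))  # identical t
unpaired$statistic   # smaller here: between-person variation is not removed
```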
https://statisticsbyjim.com/anova/post-hoc-tests-anova/
Post hoc tests are an integral part of ANOVA. When you use ANOVA to test the equality of at least three group means, statistically significant results indicate that not all of the group means are equal. However, ANOVA results do not identify which particular differences between pairs of means are significant. Use post hoc tests to explore differences between multiple group means while controlling the experiment-wise error rate.
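One widely used post hoc procedure is Tukey's HSD, available in base R as `TukeyHSD()` for a fitted `aov()` model. A minimal sketch on R's built-in `PlantGrowth` data (plant weight in a control group and two treatment groups); other post hoc tests exist, and which one is appropriate depends on the comparisons you need.

```r
fit <- aov(weight ~ group, data = PlantGrowth)   # one-way ANOVA, 3 groups
summary(fit)             # omnibus F test: are all three group means equal?

tukey <- TukeyHSD(fit)   # all pairwise differences, family-wise 95% confidence
tukey$group              # diff, lower/upper bounds and adjusted p for each pair
```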
https://www.graphpad.com/support/faqid/1091/
The ANOVA calculations test the null hypothesis that all groups of data really are sampled from distributions that have the same mean (so any observed differences are just due to coincidences of random sampling). Testing this hypothesis is rarely the reason you did the experiment. Instead, you want to look within the data, comparing this group with that group… So you want to make multiple comparisons. There are several ways you can do this:
https://www.simplypsychology.org/effect-size.html
Effect size is a quantitative measure of the magnitude of the experimental effect. The larger the effect size, the stronger the relationship between two variables.
You can look at the effect size when comparing any two groups to see how substantially different they are. Typically, research studies will comprise an experimental group and a control group. The experimental group may receive an intervention or treatment which is expected to affect a specific outcome.
For example, we might want to know the effect of a therapy on treating depression. The effect size value will show us whether the therapy has had a small, medium or large effect on depression.
https://www.physport.org/recommendations/Entry.cfm?ID=93385
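Effect size is not the same as statistical significance: significance tells you how likely a result at least as extreme as the one observed would be if chance alone (the null hypothesis) were at work, and effect size tells you how large, and therefore potentially how important, the result is.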
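A common effect size for two group means is Cohen's d: the mean difference divided by the pooled standard deviation. A minimal sketch; `cohens_d()` is a hypothetical helper (the lsr package offers `cohensD()`), and the usual rough benchmarks of 0.2, 0.5 and 0.8 for small, medium and large effects are conventions rather than rules.

```r
# Hypothetical helper: Cohen's d with a pooled SD
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd   # mean difference in SD units
}

cohens_d(c(2, 4, 6), c(1, 3, 5))    # 0.5: the means differ by half a pooled SD
```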
https://methods.sagepub.com/reference/encyclopedia-of-survey-research-methods/n226.xml
An interaction effect is the simultaneous effect of two or more independent variables on at least one dependent variable in which their joint effect is significantly greater (or significantly less) than the sum of the parts. The presence of interaction effects in any kind of survey research is important because it tells researchers how two or more independent variables work together to impact the dependent variable. Including an interaction term in an analytic model provides the researcher with a better representation and understanding of the relationship between the dependent and independent variables. Further, it helps explain more of the variability in the dependent variable.
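In R, an interaction is requested with `*` in a model formula (`a * b` expands to `a + b + a:b`). A minimal sketch on the built-in `ToothGrowth` data (tooth length by supplement type and dose); `interaction.plot()` draws the group means, and non-parallel lines hint at an interaction.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)                   # treat dose as a categorical factor

fit <- aov(len ~ supp * dose, data = tg)     # main effects plus the supp:dose interaction
summary(fit)

with(tg, interaction.plot(dose, supp, len))  # mean length per dose, one line per supplement
```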