---
title: "Weekly Lab Homework Assignment: Measurement Errors in Psychological Research"
author: "Nat Maul"
date: "17 February, 2025"
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
    theme: "flatly"
    highlight: "pygments"
---

## Objective:
Reinforce your understanding of reliability, validity, and measurement errors through hands-on exercises using real datasets in R. Apply statistical methods to compute reliability and validity metrics and interpret the results in the context of psychological research.
Prepare your RStudio environment by installing and loading the necessary packages:

```{r, warning=FALSE, message=FALSE}
# Install and load required packages
if (!require(psych))    { install.packages("psych", dependencies = TRUE) }
if (!require(irr))      { install.packages("irr", dependencies = TRUE) }
if (!require(ggplot2))  { install.packages("ggplot2", dependencies = TRUE) }
if (!require(datasets)) { install.packages("datasets", dependencies = TRUE) }

library(psych)    # For reliability analyses
library(irr)      # For inter-rater reliability
library(ggplot2)  # For data visualization
library(datasets) # Built-in R datasets
```
## Homework Exercises:
### 📌 Exercise 1: Test-Retest Reliability (Using the CO2 Dataset)
**Definition:**
Test-retest reliability measures the consistency of a test over time by administering the same test to the same individuals on two different occasions. If the results are highly correlated, the test is considered stable and reliable over time.
💡 Example: If a depression questionnaire is given to participants twice, one month apart, and their scores are highly correlated, the test has good test-retest reliability.
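Formally, the test-retest coefficient is just the Pearson correlation between the scores from the two administrations:

$$
r_{12} = \frac{\mathrm{Cov}(X_1, X_2)}{s_{X_1}\, s_{X_2}}
$$

where $X_1$ and $X_2$ are the scores at time 1 and time 2, and $s_{X_1}$ and $s_{X_2}$ are their standard deviations.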
**Task**: Evaluate test-retest reliability by checking how similar measurements of CO2 uptake are across two different times in the dataset.
1. Load the built-in CO2 dataset in R.
2. Compute the Pearson correlation coefficient between uptake (CO2 uptake in plants) at two time points.
3. Interpret test-retest reliability in the context of repeated psychological testing.
```{r, warning=FALSE, message=FALSE}
# Run this chunk to load the CO2 dataset and simulate a second measurement
data("CO2")
set.seed(2024)
CO2$uptake2 <- CO2$uptake + rnorm(nrow(CO2), mean = 0, sd = 2)

# Scatter plot to visualize the relationship between uptake at the two times
ggplot(CO2, aes(x = uptake, y = uptake2)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Test-Retest Reliability Example",
       x = "Uptake Time 1", y = "Uptake Time 2") +
  theme_minimal()
```
```{r}
# Compute and visualize the test-retest correlation
library(ggplot2)

CO2$uptake2 <- CO2$uptake + rnorm(nrow(CO2), mean = 0, sd = 2)
cor_result <- cor(CO2$uptake, CO2$uptake2, method = "pearson")
print(cor_result)

ggplot(CO2, aes(x = uptake, y = uptake2)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Test-Retest Reliability of CO2 Uptake",
       x = "Uptake Time 1", y = "Uptake Time 2") +
  theme_minimal()
```

📍 **Answer:**

```{r}
# Load necessary libraries
library(ggplot2)

# Load the CO2 dataset
data("CO2")

# Simulate a second measurement of CO2 uptake with slight variations
set.seed(2024)
CO2$uptake2 <- CO2$uptake + rnorm(nrow(CO2), mean = 0, sd = 2)

# Compute the Pearson correlation coefficient
cor_result <- cor(CO2$uptake, CO2$uptake2, method = "pearson")

# Print the result
print(cor_result)

# Scatter plot with regression line
ggplot(CO2, aes(x = uptake, y = uptake2)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Test-Retest Reliability of CO2 Uptake",
       x = "Uptake Time 1", y = "Uptake Time 2") +
  theme_minimal()
```

Pearson correlation (r) measures how strongly the two time points are related.
If r > 0.80, it suggests strong test-retest reliability, meaning CO2 uptake is stable over time.
### 📌 Exercise 2: Internal Consistency Reliability (Using the bfi Dataset from psych)
**Definition:**
Internal consistency reliability assesses how well the items on a test measure the same construct. A test has high internal consistency if all the items that are supposed to measure the same concept produce similar scores.
💡 Example: If a 10-item self-esteem scale truly measures self-esteem, responses across items should be highly correlated, meaning someone with high self-esteem should score consistently high on all items. This is assessed using Cronbach's alpha (α), where values above 0.70 indicate acceptable reliability.
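For a scale with $k$ items, Cronbach's alpha is computed from the item variances and the variance of the total score:

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
$$

where $\sigma^{2}_{Y_i}$ is the variance of item $i$ and $\sigma^{2}_{X}$ is the variance of the total (summed) score. The `alpha()` function used below carries out this calculation for you.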
**Task**: Assess internal consistency using Cronbach's alpha on personality survey responses.
1. Load the bfi dataset (Big Five Inventory) from the psych package.
2. Run the code that selects five items measuring extraversion (`E1` to `E5`).
3. Calculate Cronbach’s alpha using the `alpha()` function.
```{r}
# Run this chunk to load the bfi dataset and select the extraversion items
data("bfi", package = "psych")
extraversion_items <- bfi[, c("E1", "E2", "E3", "E4", "E5")]
```
```{r}
# Insert your code to calculate Cronbach's alpha
library(psych)

data("bfi", package = "psych")
extraversion_items <- bfi[, c("E1", "E2", "E3", "E4", "E5")]
alpha_result <- psych::alpha(extraversion_items)
print(alpha_result)
```

📍 **Answer:**

```{r}
# Load necessary library
library(psych)

# Load the bfi dataset
data("bfi", package = "psych")

# Select five extraversion items
extraversion_items <- bfi[, c("E1", "E2", "E3", "E4", "E5")]

# Compute Cronbach's alpha
# (psych::alpha is written out explicitly because ggplot2, loaded earlier,
#  also exports a function called alpha() that would otherwise be used)
alpha_result <- psych::alpha(extraversion_items)

# Print the result
print(alpha_result)
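```

One practical note: a couple of the bfi extraversion items are worded in the reverse direction, so `alpha()` may warn that some items correlate negatively with the total score. If you see that warning, you can ask the function to reverse-key those items automatically. A minimal sketch, assuming the same `extraversion_items` data frame created above:

```{r}
# Optional: reverse-key negatively worded items before computing alpha
# (check.keys = TRUE asks psych::alpha() to flip items that correlate
#  negatively with the total score)
alpha_checked <- psych::alpha(extraversion_items, check.keys = TRUE)
print(alpha_checked$total)
```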
### 📌 Exercise 3: Inter-Rater Reliability (Using Simulated Rater Data with irr)
**Definition:**
Inter-rater reliability evaluates the level of agreement between two or more raters when assessing the same participants or behaviors. A test has high inter-rater reliability if different raters produce similar scores for the same observations.
💡 Example: If two therapists independently assess patients' anxiety levels using a standardized rating scale, their ratings should be highly consistent. Cohen's Kappa (κ) is used to measure how much agreement is beyond chance.
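Cohen's Kappa compares the observed proportion of agreement with the agreement expected by chance:

$$
\kappa = \frac{p_o - p_e}{1 - p_e}
$$

where $p_o$ is the observed proportion of agreement between raters and $p_e$ is the proportion of agreement expected if the raters assigned scores at random. A value of $\kappa = 1$ indicates perfect agreement, and $\kappa \approx 0$ indicates agreement no better than chance.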
**Task**:
Measure agreement between two raters evaluating 50 participants' behavior scores.
1. Run the code that simulates two sets of ratings from different raters.
2. Compute Cohen's Kappa to assess agreement.
```{r}
# Run this chunk to create the data
# Simulated raters' data
set.seed(2024)
rater1 <- rnorm(50, mean = 100, sd = 10)
rater2 <- rater1 + 6

# Create a dataframe
rater_data <- data.frame(Rater_1 = rater1, Rater_2 = rater2)
```
```{r}
# Insert your code to calculate Cohen's Kappa
library(irr)

set.seed(2024)
rater1 <- round(rnorm(50, mean = 100, sd = 10))
rater2 <- round(rater1 + 6)
rater_data <- data.frame(Rater_1 = rater1, Rater_2 = rater2)

kappa_result <- kappa2(rater_data, "unweighted")
print(kappa_result)
```

📍 **Answer:**

```{r}
# Load necessary library
library(irr)

# Simulated raters' data
set.seed(2024)
rater1 <- round(rnorm(50, mean = 100, sd = 10))
rater2 <- round(rater1 + 6)

# Create a dataframe
rater_data <- data.frame(Rater_1 = rater1, Rater_2 = rater2)

# Compute Cohen's Kappa
kappa_result <- kappa2(rater_data, "unweighted")

# Print the result
print(kappa_result)
```
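Because the second rater here is systematically 6 points higher, the two raters essentially never give the exact same score, so unweighted Kappa will be very low even though the raters rank participants identically. For continuous ratings like these, an intraclass correlation coefficient (ICC) is often reported instead. A minimal optional sketch using `irr::icc()` (not required for the assignment):

```{r}
# Optional: intraclass correlation for continuous ratings
# A two-way "agreement" ICC penalizes the constant +6 offset between raters,
# while a "consistency" ICC would ignore it.
icc_result <- icc(rater_data, model = "twoway", type = "agreement", unit = "single")
print(icc_result)
```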
### 📌 Exercise 4: Convergent Validity (Using mtcars Dataset)
**Definition:**
Convergent validity measures how strongly a test correlates with another test measuring the same or a closely related construct. High correlations indicate that the new test is actually measuring the intended concept.
💡 Example: If a new anxiety scale correlates highly (r > 0.70) with an already well-established anxiety scale, then the new scale has strong convergent validity.
**Task**: Evaluate convergent validity by examining the correlation between two related variables: horsepower (`hp`) and engine displacement (`disp`).
1. Load the built-in mtcars dataset.
2. Compute the correlation between `hp` (horsepower) and `disp` (engine displacement).
3. Interpret whether this shows convergent validity.
```{r}
# Run this chunk to load the dataset
data("mtcars")
```
```{r}
# Insert your code to calculate the correlation
data("mtcars")
cor_result <- cor(mtcars$hp, mtcars$disp, method = "pearson")
print(cor_result)
```

📍 **Answer:**

```{r}
# Load dataset
data("mtcars")

# Compute Pearson correlation between hp and disp
cor_result <- cor(mtcars$hp, mtcars$disp, method = "pearson")

# Print the correlation result
print(cor_result)
```
If r > 0.70, this indicates strong convergent validity, meaning horsepower and engine displacement are closely related.
If r is moderate (0.40–0.70), the variables are related but not strongly.
If r < 0.40, the relationship is weak, suggesting poor convergent validity.
Since horsepower (hp) and engine displacement (disp) are both indicators of engine performance, we expect a strong positive correlation. A high r value would confirm that these two variables measure a similar construct, supporting convergent validity.
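If you also want a significance test and a confidence interval around r, base R's `cor.test()` provides both. A quick optional sketch (not required for the assignment):

```{r}
# Optional: test the correlation and obtain a 95% confidence interval for r
cor_test_result <- cor.test(mtcars$hp, mtcars$disp, method = "pearson")
print(cor_test_result)
```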
### 📌 Exercise 5: Divergent Validity (Using iris Dataset)
**Definition:**
Divergent (or discriminant) validity ensures that a test does not correlate with unrelated constructs. A test should measure only what it intends to measure and should not be highly related to irrelevant variables.
💡 Example: A social anxiety scale should not be highly correlated (r < 0.30) with a math ability test, because the two constructs are unrelated. A low correlation suggests strong divergent validity.
**Task**: Explore divergent validity by examining the relationship between petal width and sepal length, two biologically distinct features in flowers.
1. Load the built-in iris dataset.
2. Compute the correlation between `Petal.Width` and `Sepal.Length`.
3. Interpret whether this shows divergent validity.
```{r}
# Run this chunk to load the dataset
data("iris")
```
```{r}
# Assess divergent validity by calculating the correlation
# Insert your code to calculate the correlation
data("iris")
cor_result <- cor(iris$Petal.Width, iris$Sepal.Length, method = "pearson")
print(cor_result)
```

📍 **Answer:**

```{r}
# Load dataset
data("iris")

# Compute Pearson correlation between Petal.Width and Sepal.Length
cor_result <- cor(iris$Petal.Width, iris$Sepal.Length, method = "pearson")

# Print the result
print(cor_result)
```
Divergent validity is demonstrated when two variables that measure distinct constructs have a low correlation (r < 0.30).
If r is high (r > 0.70), the variables may not be as distinct as expected, suggesting weaker divergent validity.
If r is moderate (0.30–0.70), the relationship is neither strongly convergent nor strongly divergent.
Petal.Width and Sepal.Length are different flower structures, so one might expect only a modest correlation between them. Compute r and judge for yourself: a low value would support divergent validity, while a high value would suggest these features are not as distinct as the framing assumes.
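As an optional wrap-up (not part of the graded steps), you can place the correlations from Exercises 4 and 5 side by side to see how the "related" pair compares with the pair intended to be distinct. A minimal sketch, assuming both datasets are loaded as above:

```{r}
# Optional: compare the convergent and divergent correlations side by side
convergent <- cor(mtcars$hp, mtcars$disp)              # related constructs (Exercise 4)
divergent  <- cor(iris$Petal.Width, iris$Sepal.Length) # intended to be distinct (Exercise 5)
round(c(convergent = convergent, divergent = divergent), 2)
```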
Knit your document to HTML and check that all content displays correctly before submission. Publish it to RPubs and submit the link in Canvas Assignments.