• Objective:
  • Setup:
  • Homework Exercises:
    • 📌 Exercise 1: Test-Retest Reliability (Using the CO2 Dataset)
    • 📌 Exercise 2: Internal Consistency Reliability (Using the bfi Dataset from psych)
    • 📌 Exercise 3: Inter-Rater Reliability (Using raters Data from irr)
    • 📌 Exercise 4: Convergent Validity (Using mtcars Dataset)
    • 📌 Exercise 5: Divergent Validity (Using iris Dataset)

Replace “Your Name” with your actual name.

Objective:

Reinforce your understanding of reliability, validity, and measurement errors through hands-on exercises using real datasets in R. Apply statistical methods to compute reliability and validity metrics and interpret the results in the context of psychological research.

Setup:

Prepare your RStudio environment by installing and loading the necessary packages:

# Install and load required packages
if(!require(psych)) { install.packages("psych", dependencies=TRUE) }
if(!require(irr)) { install.packages("irr", dependencies=TRUE) }
if(!require(ggplot2)) { install.packages("ggplot2", dependencies=TRUE) }
# Note: 'datasets' ships with base R, so it only needs to be loaded, not installed

library(psych)      # For reliability analyses
library(irr)        # For inter-rater reliability
library(ggplot2)    # For data visualization
library(datasets)   # Built-in R datasets

Homework Exercises:

📌 Exercise 1: Test-Retest Reliability (Using the CO2 Dataset)

Definition: Test-retest reliability measures the consistency of a test over time by administering the same test to the same individuals on two different occasions. If the results are highly correlated, the test is considered stable and reliable over time.

💡 Example: If a depression questionnaire is given to participants twice, one month apart, and their scores are highly correlated, the test has good test-retest reliability.

Task:

Evaluate test-retest reliability by checking how similar CO2 uptake measurements are across two occasions. (The dataset contains only one measurement, so the chunk below simulates a second occasion, uptake2, by adding random noise to uptake.)

  1. Load the built-in CO2 dataset in R.

  2. Compute the Pearson correlation coefficient between uptake (CO2 uptake at Time 1) and uptake2 (the simulated Time 2 measurement).

  3. Interpret test-retest reliability in the context of repeated psychological testing.

# Run this chunk to load the CO2 dataset and simulate a second measurement
data("CO2")
CO2$uptake2 <- CO2$uptake + rnorm(nrow(CO2), mean=0, sd=2)  # Time 2 = Time 1 + noise
# Note: without set.seed(), these simulated values vary slightly between knits

# Scatter plot to visualize relationship between uptake at two times
ggplot(CO2, aes(x=uptake, y=uptake2)) +
  geom_point() +
  geom_smooth(method="lm") +
  labs(title="Test-Retest Reliability Example", x="Uptake Time 1", y="Uptake Time 2") +
  theme_minimal()

# Insert your code to calculate Pearson correlation
cor.test(CO2$uptake, CO2$uptake2)
## 
##  Pearson's product-moment correlation
## 
## data:  CO2$uptake and CO2$uptake2
## t = 45.707, df = 82, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9706804 0.9876244
## sample estimates:
##       cor 
## 0.9809342
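
If you prefer to pull the numbers out of the test object rather than copying them from the printout, a minimal sketch using the result already computed above:

# Store the test object and extract the estimate and confidence interval
retest <- cor.test(CO2$uptake, CO2$uptake2)
round(retest$estimate, 2)   # correlation coefficient (r)
round(retest$conf.int, 2)   # 95% confidence interval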

📍 Write your interpretation here: The correlation between CO2 uptake at Time 1 and Time 2 is r = 0.98. This is an extremely strong correlation, indicating excellent test-retest reliability.

📌 Exercise 2: Internal Consistency Reliability (Using the bfi Dataset from psych)

Definition: Internal consistency reliability assesses how well the items on a test measure the same construct. A test has high internal consistency if all the items that are supposed to measure the same concept produce similar scores.

💡 Example: If a 10-item self-esteem scale truly measures self-esteem, responses across items should be highly correlated, meaning someone with high self-esteem should score consistently high on all items. This is assessed using Cronbach’s alpha (α), where values above 0.70 indicate acceptable reliability.
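
To see where alpha comes from, the standardized coefficient can be computed directly from the number of items k and the average inter-item correlation r̄ via α = k·r̄ / (1 + (k − 1)·r̄). A minimal sketch with hypothetical numbers (not taken from any dataset):

# Standardized alpha from k items with average inter-item correlation r_bar
k <- 10        # hypothetical 10-item scale
r_bar <- 0.30  # hypothetical average inter-item correlation
(k * r_bar) / (1 + (k - 1) * r_bar)   # about 0.81

Note that adding items or raising the average inter-item correlation both push alpha upward.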

Task:

Assess internal consistency using Cronbach’s alpha on personality survey responses.

  1. Load the bfi dataset (Big Five Inventory) from the psych package.

  2. Run the code that selects five items measuring extraversion (E1 to E5).

  3. Calculate Cronbach’s alpha using the alpha() function.

# Load the bfi dataset
data("bfi", package="psych")

# Compute Cronbach's alpha for extraversion items
extraversion_items <- bfi[, c("E1", "E2", "E3", "E4", "E5")]
# Insert your code to calculate Cronbach’s alpha
psych::alpha(extraversion_items, check.keys = TRUE)
## Warning in psych::alpha(extraversion_items, check.keys = TRUE): Some items were negatively correlated with the first principal component and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = extraversion_items, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
##       0.76      0.76    0.73      0.39 3.2 0.007  4.1 1.1     0.38
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.75  0.76  0.78
## Duhachek  0.75  0.76  0.78
## 
##  Reliability if an item is dropped:
##     raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## E1-      0.73      0.73    0.67      0.40 2.6   0.0084 0.0044  0.38
## E2-      0.69      0.69    0.63      0.36 2.3   0.0095 0.0028  0.35
## E3       0.73      0.73    0.67      0.40 2.7   0.0082 0.0071  0.40
## E4       0.70      0.70    0.65      0.37 2.4   0.0091 0.0033  0.38
## E5       0.74      0.74    0.69      0.42 2.9   0.0078 0.0043  0.42
## 
##  Item statistics 
##        n raw.r std.r r.cor r.drop mean  sd
## E1- 2777  0.72  0.70  0.59   0.52  4.0 1.6
## E2- 2784  0.78  0.76  0.69   0.61  3.9 1.6
## E3  2775  0.68  0.70  0.58   0.50  4.0 1.4
## E4  2791  0.75  0.75  0.66   0.58  4.4 1.5
## E5  2779  0.64  0.66  0.52   0.45  4.4 1.3
## 
## Non missing response frequency for each item
##       1    2    3    4    5    6 miss
## E1 0.24 0.23 0.15 0.16 0.13 0.09 0.01
## E2 0.19 0.24 0.12 0.22 0.14 0.09 0.01
## E3 0.05 0.11 0.15 0.30 0.27 0.13 0.01
## E4 0.05 0.09 0.10 0.16 0.34 0.26 0.00
## E5 0.03 0.08 0.10 0.22 0.34 0.22 0.01
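
One note on the warning above: E1 and E2 are negatively worded extraversion items, and check.keys = TRUE reverses them automatically (hence E1- and E2- in the output). To make the reversal explicit, you can reverse-score them yourself; bfi items use a 1-6 response scale, so the reversed score is 7 minus the original. A sketch:

# Manually reverse-key the negatively worded items (bfi items are scored 1-6)
extraversion_keyed <- extraversion_items
extraversion_keyed$E1 <- 7 - extraversion_keyed$E1
extraversion_keyed$E2 <- 7 - extraversion_keyed$E2
psych::alpha(extraversion_keyed)   # should match the check.keys = TRUE result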

📍 Write your interpretation here: Cronbach’s alpha for the five extraversion items is 0.76. This exceeds the conventional 0.70 cutoff, indicating acceptable internal consistency.

📌 Exercise 3: Inter-Rater Reliability (Using raters Data from irr)

Definition: Inter-rater reliability evaluates the level of agreement between two or more raters when assessing the same participants or behaviors. A test has high inter-rater reliability if different raters produce similar scores for the same observations.

💡 Example: If two therapists independently assess patients’ anxiety levels using a standardized rating scale, their ratings should be highly consistent. Cohen’s Kappa (κ) is used to measure how much agreement is beyond chance.
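
Kappa is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater’s marginal frequencies. A by-hand sketch with made-up categorical ratings for ten cases:

# Cohen's kappa by hand for two raters classifying 10 cases (made-up data)
r1 <- c("low","low","high","high","low","high","low","low","high","low")
r2 <- c("low","low","high","low","low","high","low","high","high","low")
tab <- table(r1, r2)
p_o <- sum(diag(tab)) / sum(tab)                      # observed agreement
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
(p_o - p_e) / (1 - p_e)                               # kappa, about 0.58 here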

Task:

Measure agreement between two raters evaluating 50 participants’ behavior scores.

  1. Load a dataset containing two sets of ratings from different raters.

  2. Compute Cohen’s Kappa to assess agreement.

# Run this chunk to create simulated raters' data
set.seed(2024)
rater1 <- rnorm(50, mean = 100, sd = 10)
rater2 <- rater1 + 6   # Rater 2 scores exactly 6 points higher on every subject

# Create a dataframe
rater_data <- data.frame(
  Rater_1 = rater1,
  Rater_2 = rater2
)
# Insert your code to calculate Cohen’s Kappa
cohen.kappa(rater_data)
## Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels, 
##     w.exp = w.exp)
## 
## Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries 
##                  lower estimate upper
## unweighted kappa  0.00     0.00  0.00
## weighted kappa    0.75     0.82  0.88
## 
##  Number of subjects = 50

📍 Write your interpretation here: The weighted Cohen’s kappa is 0.82, indicating strong inter-rater reliability. The unweighted kappa is 0 because Rater 2 is always exactly 6 points higher, so the two raters never give an identical score.
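
One caveat on these simulated data: the scores are continuous, and unweighted kappa treats every distinct value as its own category, which is part of why it comes out at 0. For continuous ratings, an intraclass correlation is a common alternative; a sketch using the icc() function from irr:

# ICC for continuous ratings; type = "agreement" penalizes the constant
# 6-point offset, while type = "consistency" would be close to 1 here
icc(rater_data, model = "twoway", type = "agreement", unit = "single")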

📌 Exercise 4: Convergent Validity (Using mtcars Dataset)

Definition: Convergent validity measures how strongly a test correlates with another test measuring the same or a closely related construct. High correlations indicate that the new test is actually measuring the intended concept.

💡 Example: If a new anxiety scale correlates highly (r > 0.70) with an already well-established anxiety scale, then the new scale has strong convergent validity.

Task:

Evaluate convergent validity by examining the correlation between two related variables: horsepower (hp) and engine displacement (disp).

  1. Load the built-in mtcars dataset. Compute the correlation between hp (horsepower) and disp (engine displacement).

  2. Interpret whether this shows convergent validity.

# Run this chunk to load the mtcars dataset
data("mtcars")
# Insert your code to calculate correlation
cor.test(mtcars$hp, mtcars$disp)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$hp and mtcars$disp
## t = 7.0801, df = 30, p-value = 7.143e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6106794 0.8932775
## sample estimates:
##       cor 
## 0.7909486
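
As in Exercise 1, a scatter plot makes the strength of the relationship visible before you interpret the coefficient; a quick sketch:

# Visualize the hp-disp relationship behind the convergent validity claim
ggplot(mtcars, aes(x = hp, y = disp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Convergent Validity Example", x = "Horsepower (hp)", y = "Displacement (cu. in.)") +
  theme_minimal()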

📍 Write your interpretation here: The correlation between horsepower and engine displacement is r = 0.79. That is a strong correlation (above the r > 0.70 benchmark), which indicates good convergent validity.

📌 Exercise 5: Divergent Validity (Using iris Dataset)

Definition: Divergent (or discriminant) validity ensures that a test does not correlate with unrelated constructs. A test should measure only what it intends to measure and should not be highly related to irrelevant variables.

💡 Example: A social anxiety scale should not be highly correlated (r < 0.30) with a math ability test, because the two constructs are unrelated. Low correlation suggests strong divergent validity.

Task:

Explore divergent validity by examining the relationship between petal width and sepal length, two biologically distinct features in flowers.

  1. Load the built-in iris dataset.

  2. Compute the correlation between Petal.Width and Sepal.Length.

  3. Interpret whether this shows divergent validity.

# Run this chunk to load the iris dataset
data("iris")
# Assess divergent validity by calculating the correlation
# Insert your code to calculate correlation
cor.test(iris$Petal.Width, iris$Sepal.Length)
## 
##  Pearson's product-moment correlation
## 
## data:  iris$Petal.Width and iris$Sepal.Length
## t = 17.296, df = 148, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7568971 0.8648361
## sample estimates:
##       cor 
## 0.8179411

📍 Write your interpretation here: Petal width and sepal length are in fact strongly correlated (r = 0.82), far above the r < 0.30 benchmark for unrelated measures. These two variables therefore do not demonstrate divergent validity.
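
To put this result in context, you can inspect all pairwise correlations among the four iris measurements at once; variables expected to diverge should show low correlations, which is clearly not the case here:

# Correlation matrix of the four numeric iris variables
round(cor(iris[, 1:4]), 2)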

Submission Instructions:

Knit your document to HTML and check that all content displays correctly before submission. Publish to RPubs, then submit the link on Canvas under Assignments.