Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Scores on a depression inventory (0-63): Interval – Equal intervals, but no true zero (a score of 0 does not mean a complete absence of depression).
Response time in milliseconds: Ratio – Equal intervals and a true zero (0 ms means no time has elapsed).
Likert scale ratings of agreement (1-7): Ordinal – Ordered categories, but the intervals may not be exactly equal.
Diagnostic categories (e.g., ADHD or anxiety disorder): Nominal – Categories without a numerical order.
Age in years: Ratio – Equal intervals and a true zero (0 years marks birth, so ratios such as “twice as old” are meaningful).
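In R, these scales usually map onto different data types. The sketch below uses made-up values, not data from the exam, to show one common convention. Interval and ratio variables are numeric vectors, the Likert item is an ordered factor, and the diagnostic category is an unordered factor.

```r
# Made-up example values, one vector per variable type
depression_score <- c(12, 30, 45)          # interval: numeric (0-63 inventory)
response_time_ms <- c(250, 310, 500)       # ratio: numeric with a true zero
agreement <- factor(c(2, 5, 7), levels = 1:7, ordered = TRUE)  # ordinal (1-7 Likert)
diagnosis <- factor(c("ADHD", "Anxiety disorder", "ADHD"))     # nominal
age_years <- c(19, 22, 38)                 # ratio: numeric with a true zero

# Ordered factors support order comparisons...
agreement[1] < agreement[2]                # TRUE: a rating of 2 is below 5
# ...while unordered factors only support equality checks
diagnosis[1] == diagnosis[3]               # TRUE: same category
# A true zero makes ratios meaningful: 500 ms is twice as long as 250 ms
response_time_ms[3] / response_time_ms[1]  # 2
```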
Referring to Chapter 3 (Measurement Errors in Psychological Research):
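Random Error: Mistakes that happen by chance and don’t follow a pattern. Example: In a memory test, a participant gets distracted by noise and forgets a word.
Systematic Error: Mistakes that happen the same way every time, pushing scores consistently in one direction. Example: A reaction-time program that adds the same delay to every trial makes all recorded times too long.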
Effect on Validity: Measurement errors make results less accurate. If stress or academic performance is not measured correctly, the study might show a weaker relationship than actually exists (random error) or a false or distorted one (systematic error).
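A small simulation can make these effects concrete. The sketch below invents a "true" stress score and a performance score that depends on it; all numbers are hypothetical. It then measures stress twice, once with random error and once with a constant systematic offset.

- Random error weakens (attenuates) the observed correlation.
- A constant offset leaves the correlation unchanged but biases the mean.

A systematic error that varies with another variable could instead create a false relationship. This sketch does not model that case.

```r
set.seed(42)  # reproducible illustration
n <- 1000

# Hypothetical "true" stress scores and a performance outcome that depends on them
true_stress <- rnorm(n, mean = 50, sd = 10)
performance <- 80 - 0.5 * true_stress + rnorm(n, sd = 5)

# Random error: unpatterned noise added to each stress measurement
stress_random <- true_stress + rnorm(n, sd = 10)

# Systematic error: every stress score reads 5 points too high
stress_systematic <- true_stress + 5

r_true <- cor(true_stress, performance)              # strong negative correlation
r_random <- cor(stress_random, performance)          # weaker (attenuated) than r_true
r_systematic <- cor(stress_systematic, performance)  # same as r_true: a constant shift does not change r
mean_bias <- mean(stress_systematic) - mean(true_stress)  # 5 (up to rounding): the mean is biased upward

round(c(true = r_true, random = r_random, systematic = r_systematic), 2)
```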
The code below creates a simulated dataset for a psychological experiment. Run this code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
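Before computing statistics, it can help to confirm how many missing values the simulation introduced. The sketch below uses a tiny made-up data frame (`df`) so it runs on its own.

Applied to `data`, `colSums(is.na(data))` should report 3 missing values for `reaction_time` and 2 for `accuracy`. This follows because `sample()` draws distinct rows without replacement. Those NAs may share rows, so between 45 and 47 complete rows should remain.

```r
# Tiny made-up data frame with a few missing values
df <- data.frame(
  reaction_time = c(280, NA, 300, NA),
  accuracy = c(85, 90, NA, 88)
)

na_counts <- colSums(is.na(df))        # missing values per column
n_complete <- sum(complete.cases(df))  # rows with no missing values

na_counts   # reaction_time = 2, accuracy = 1
n_complete  # 1
```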
Now, perform the following computations:
##
## Descriptive statistics by group
## group: Control
## vars n mean sd median trimmed mad min max range
## reaction_time 1 30 301.40 48.54 299.68 300.42 55.38 201.67 408.45 206.78
## accuracy 2 29 85.49 9.86 85.53 85.68 8.77 61.91 105.50 43.59
## skew kurtosis se
## reaction_time 0.14 -0.66 8.86
## accuracy -0.15 -0.35 1.83
## ------------------------------------------------------------
## group: Experimental
## vars n mean sd median trimmed mad min max range
## reaction_time 1 17 295.75 38.37 288.49 295.61 43.74 215.67 377.94 162.27
## accuracy 2 19 88.06 8.20 88.32 87.76 9.86 74.28 106.87 32.59
## skew kurtosis se
## reaction_time 0.00 -0.27 9.31
## accuracy 0.45 -0.45 1.88
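The chunk that produced this output is not shown. Its columns (vars, n, mean, sd, median, trimmed, mad, …) match the layout of `psych::describeBy()`. It was therefore probably a call such as `describeBy(data[, c("reaction_time", "accuracy")], group = data$condition)`, though that is an assumption.

The sketch below is a cross-check that does not need psych. It computes n, mean, SD, and median per group in base R, using a hypothetical helper, `summarise_by()`, and a small made-up data frame. Missing values are dropped separately for each variable, which is consistent with the different n values for reaction_time and accuracy above.

```r
# Hypothetical helper: per-group n, mean, SD, and median for one numeric variable,
# dropping missing values separately for each variable
summarise_by <- function(x, group) {
  groups <- split(x, group)
  data.frame(
    group = names(groups),
    n = sapply(groups, function(v) sum(!is.na(v))),
    mean = sapply(groups, mean, na.rm = TRUE),
    sd = sapply(groups, sd, na.rm = TRUE),
    median = sapply(groups, median, na.rm = TRUE)
  )
}

# Small made-up data frame standing in for `data`
df <- data.frame(
  condition = c("Control", "Control", "Experimental", "Experimental", "Control"),
  reaction_time = c(280, NA, 300, 320, 310),
  accuracy = c(85, 90, NA, 88, 80)
)

rt_stats <- summarise_by(df$reaction_time, df$condition)
acc_stats <- summarise_by(df$accuracy, df$condition)
rt_stats
acc_stats
```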
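anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
library(dplyr)
# Add the change score to every participant's row
data <- data %>%
mutate(anxiety_change = anxiety_pre - anxiety_post)
# Store the per-condition summary separately so `data` keeps all 50 participants
anxiety_summary <- data %>%
group_by(condition) %>%
summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
print(anxiety_summary)
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
Control group: 3.79; Experimental group: 8.64. Anxiety dropped by about 4.85 points more in the experimental condition than in the control condition.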
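As a quick check of the pre-minus-post logic without dplyr, the same per-condition means can be computed with base R's `tapply()`. The four participants below are made up for illustration.

```r
# Made-up pre/post anxiety scores for four participants
df <- data.frame(
  condition = c("Control", "Experimental", "Control", "Experimental"),
  anxiety_pre = c(30, 28, 20, 26),
  anxiety_post = c(27, 19, 16, 19)
)

# Positive values mean anxiety went down (pre minus post)
df$anxiety_change <- df$anxiety_pre - df$anxiety_post

mean_change <- tapply(df$anxiety_change, df$condition, mean)
mean_change  # Control 3.5, Experimental 8
```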
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
mean_rt <- 350 # Mean reaction time in ms
std_rt <- 75 # Standard deviation in ms
# a. Probability of reaction time > 450ms
p_gt_450 <- 1 - pnorm(450, mean = mean_rt, sd = std_rt)
# b. Probability of reaction time between 300ms and 400ms
p_300_to_400 <- pnorm(400, mean = mean_rt, sd = std_rt) - pnorm(300, mean = mean_rt, sd = std_rt)
# Display results
p_gt_450
## [1] 0.09121122
p_300_to_400
## [1] 0.4950149
(a) Probability of a reaction time greater than 450 ms: 0.0912 (or 9.12%)
(b) Probability of a reaction time between 300 ms and 400 ms: 0.4950 (or 49.50%)
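The same probabilities can be reached by first standardizing each cutoff with z = (x - mean) / sd and then reading areas from the standard normal. The sketch below reuses the exercise's mean (350 ms) and SD (75 ms). It uses `pnorm(..., lower.tail = FALSE)` for the upper tail instead of `1 - pnorm(...)`; the two are equivalent.

```r
mean_rt <- 350  # mean reaction time in ms (from the exercise)
std_rt <- 75    # standard deviation in ms

# Convert each cutoff to a z-score: z = (x - mean) / sd
z_450 <- (450 - mean_rt) / std_rt  # 1.333...
z_300 <- (300 - mean_rt) / std_rt  # -0.667...
z_400 <- (400 - mean_rt) / std_rt  #  0.667...

# Upper-tail probability on the standard normal
p_gt_450_z <- pnorm(z_450, lower.tail = FALSE)
# Area between the two z-scores
p_300_to_400_z <- pnorm(z_400) - pnorm(z_300)

round(c(p_gt_450_z, p_300_to_400_z), 4)  # 0.0912 0.4950
```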
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
clean_data.
performance_category that categorizes participants based on their accuracy:
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10), # Ensure accuracy is included
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # Placeholder
)
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
data$anxiety_post <- pmax(data$anxiety_post, 0)
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# I had to recreate the dataset because my code was not working; after recreating it, I cleaned the data.
# (set.seed() is not called again here, so these values differ from the Part 2 dataset.)
# Recreating clean_data
clean_data <- data %>%
filter(complete.cases(.))
# Verifying columns again
colnames(clean_data)
## [1] "participant_id" "reaction_time" "accuracy" "gender"
## [5] "condition" "anxiety_pre" "anxiety_post"
clean_data <- clean_data %>%
mutate(performance_category = case_when(
accuracy >= 90 ~ "High",
accuracy >= 70 & accuracy < 90 ~ "Medium",
accuracy < 70 ~ "Low",
TRUE ~ NA_character_
))
# View the updated dataset
head(clean_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 278.1420 76.06792 Male Control 32.337399
## 2 2 316.5590 88.33903 Male Control 3.712618
## 3 3 199.2895 89.11430 Male Control 33.882217
## 4 4 310.5990 84.66964 Male Control 21.120099
## 5 6 401.8787 110.71458 Male Experimental 22.638738
## 6 7 365.0588 82.94701 Female Control 31.975720
## anxiety_post performance_category
## 1 26.033569 Medium
## 2 3.630559 Medium
## 3 30.984813 Medium
## 4 19.173950 Medium
## 5 14.876373 High
## 6 30.643407 Medium
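The `case_when()` bands above can also be expressed with base R's `cut()`. The sketch below is an alternative, not the method used in the exam. Setting `right = FALSE` makes each band include its lower bound, so 70 becomes "Medium" and 90 becomes "High", matching the `>=` conditions. The accuracy values are made up to hit each cutoff, including one missing value.

```r
# Made-up accuracy scores, including values at each cutoff and a missing value
accuracy <- c(65, 70, 89.9, 90, 110, NA)

# right = FALSE makes each band include its lower bound: [70, 90) is "Medium"
performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE
)

as.character(performance_category)
# "Low" "Medium" "Medium" "High" "High" NA
```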
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
filtered_data <- clean_data %>%
filter(condition == "Experimental" & reaction_time < mean_reaction_time)
head(filtered_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 10 269.9247 95.24673 Male Experimental 21.87452
## 2 11 282.3977 93.17659 Female Experimental 16.25770
## 3 14 237.0676 75.54591 Female Experimental 38.79410
## 4 21 273.8544 90.10133 Female Experimental 23.28767
## 5 23 296.9589 75.03219 Female Experimental 38.69844
## 6 34 289.9609 73.64412 Male Experimental 20.40484
## anxiety_post performance_category
## 1 13.25773 High
## 2 10.10701 High
## 3 29.86405 Medium
## 4 11.94372 High
## 5 26.98841 Medium
## 6 11.32828 Medium
I cleaned the dataset by first removing any rows with missing values to ensure that all data was complete and usable. Then, I created a new variable to categorize participants based on their accuracy scores into High, Medium, and Low performance levels. After that, I calculated the overall mean reaction time and filtered the dataset to include only participants in the Experimental condition who had reaction times faster than this average.
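The filtering step compares each Experimental participant against the overall mean across both conditions, not the Experimental group's own mean. The base-R sketch below shows the same logic with `subset()` on five made-up participants.

```r
# Made-up reaction times (ms) for five participants
df <- data.frame(
  condition = c("Control", "Experimental", "Experimental", "Control", "Experimental"),
  reaction_time = c(400, 250, 350, 300, 280)
)

# Overall mean across both conditions, as in the dplyr code above
overall_mean <- mean(df$reaction_time, na.rm = TRUE)  # 316

# Keep Experimental participants who responded faster than that mean
fast_experimental <- subset(df, condition == "Experimental" & reaction_time < overall_mean)
fast_experimental$reaction_time  # 250 280
```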
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
corPlot() function to create a correlation plot.
# Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
clean_data <- clean_data %>%
mutate(anxiety_change = anxiety_pre - anxiety_post)
numeric_data <- clean_data %>%
select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
# Check if the selection worked
colnames(numeric_data)
## [1] "reaction_time" "accuracy" "anxiety_pre" "anxiety_post"
## [5] "anxiety_change"
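library(psych) # provides corPlot()
cor_matrix <- cor(numeric_data, use = "pairwise.complete.obs")
corPlot(cor_matrix,
main = "Correlation Plot of Psychological Variables",
scale = FALSE,
diag = FALSE,
cex = 0.7) # Adjust text size
The plot shows that pre- and post-experiment anxiety are strongly related, and that larger reductions in anxiety go with lower final scores (partly by construction, since anxiety_change is computed from anxiety_post). Because reaction_time, accuracy, and anxiety_pre were simulated independently, any correlations among them are expected to be near zero apart from sampling noise. If reaction time and accuracy were positively correlated (slower responses being more accurate), that would suggest a speed-accuracy tradeoff; a negative correlation would challenge that idea. If anxiety and accuracy aren’t related, it could mean anxiety doesn’t always hurt performance. These insights can help researchers explore ways to improve focus and reduce anxiety effects.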
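The direction of the correlation is what matters for a speed-accuracy tradeoff. In the Part 2 data, reaction time and accuracy were drawn independently, so no tradeoff is built in. The sketch below instead simulates a hypothetical tradeoff, in which slower responses are more accurate; the slope and noise are invented. It then uses `cor.test()` to show that a tradeoff appears as a positive correlation.

```r
set.seed(1)  # reproducible illustration
n <- 200

# Hypothetical tradeoff: slower responses (higher RT) are more accurate
reaction_time <- rnorm(n, mean = 300, sd = 50)
accuracy <- 60 + 0.08 * reaction_time + rnorm(n, sd = 3)

tradeoff_test <- cor.test(reaction_time, accuracy)
tradeoff_test$estimate  # positive r: consistent with a speed-accuracy tradeoff
tradeoff_test$p.value
```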
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
1. How does social media usage impact anxiety levels in college students? I would collect self-reported social media usage (hours per day) and anxiety scores (using a standardized scale like the GAD-7).
2. Learning R has helped me understand data manipulation, visualization, and statistical testing, making it much easier for me to work with datasets. The biggest advantage of R is its flexibility and powerful libraries, which allow for advanced analyses and automation.
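For the proposed study, a rank correlation or a simple regression would be one way to relate daily social media hours to GAD-7 scores. The sketch below runs both on simulated data. The sample size, effect size, and noise level are invented for illustration and are not findings.

- Spearman's rho is used because GAD-7 totals are bounded whole numbers (0-21).
- The regression slope estimates the change in GAD-7 per extra hour of use.

```r
set.seed(2024)  # reproducible illustration
n <- 150

# Hypothetical data: daily social media hours and GAD-7 totals (possible range 0-21)
hours_per_day <- runif(n, min = 0, max = 8)
gad7 <- 4 + 1.2 * hours_per_day + rnorm(n, sd = 3)
gad7 <- round(pmin(pmax(gad7, 0), 21))  # keep scores whole and inside 0-21

# Spearman's rho works on ranks; exact = FALSE avoids the warning about ties
rho_test <- cor.test(hours_per_day, gad7, method = "spearman", exact = FALSE)

# Simple linear regression: expected change in GAD-7 per extra hour of use
fit <- lm(gad7 ~ hours_per_day)
slope <- coef(fit)[["hours_per_day"]]

rho_test$estimate
summary(fit)$coefficients
```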
Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission. Publish your assignment to RPubs and submit the URL to Canvas.