Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Random error is an unpredictable mistake that happens by chance and affects measurements inconsistently. For example, in a memory experiment, some participants might be distracted by background noise while recalling words, causing occasional mistakes. Because these errors are as likely to push scores up as down, they tend to balance out across many measurements.
Systematic error is a consistent, predictable mistake that affects measurements in the same way every time. In a memory experiment, this could happen if the experimenter gives one group more encouragement or if the word lists for different groups differ in difficulty. This type of error skews the results in one direction and makes them less accurate. Overall, random error makes results less reliable, while systematic error makes them less valid.
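The contrast between the two error types can be illustrated with a small simulation. This is only a sketch under assumed values: the true score of 50, the noise SD of 5, the constant bias of 3, and the seed are hypothetical and are not taken from the chapter.

```r
set.seed(42)  # assumed seed, used only so the sketch is reproducible

true_score <- 50   # hypothetical true value being measured
n <- 10000         # number of repeated measurements

# Random error: noise centred on zero, different on every measurement
random_obs <- true_score + rnorm(n, mean = 0, sd = 5)

# Systematic error: the same constant bias (+3) on every measurement, plus noise
systematic_obs <- true_score + 3 + rnorm(n, mean = 0, sd = 5)

# Random error averages out, so this difference is close to 0
mean(random_obs) - true_score

# Systematic error does not average out, so this difference stays close to 3
mean(systematic_obs) - true_score
```

Collecting more measurements shrinks the first difference further, but it leaves the second one untouched, which is why a larger sample helps with random error and not with systematic error.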
Measurement error can affect the validity of a study by making the results less accurate and potentially misleading. If the tools used to measure stress or academic performance aren’t reliable, the study might not truly show how stress impacts performance. For example, if the stress levels of participants are measured poorly, it might seem like stress doesn’t affect academic performance, even though it actually does. To reduce these errors, researchers can use well-tested and reliable measurement tools, ensure that all participants are tested in the same way to avoid differences in procedures, and increase the sample size to reduce random errors. They can also pilot test their tools before the study to make sure they’re effective. These steps help make the study’s results more accurate and valid.
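A short simulation can show how a noisy stress measure hides a real relationship. This is a sketch with made-up numbers: the slope of -0.6, the noise levels, and the variable names stress_true, stress_noisy, and performance are assumptions for illustration, not values from any study.

```r
set.seed(1)  # assumed seed, used only so the sketch is reproducible
n <- 5000

# Hypothetical true stress levels
stress_true <- rnorm(n, mean = 25, sd = 8)

# Assumed relationship: higher stress lowers academic performance
performance <- 85 - 0.6 * stress_true + rnorm(n, mean = 0, sd = 5)

# A poor measurement tool adds a lot of random error to the stress scores
stress_noisy <- stress_true + rnorm(n, mean = 0, sd = 16)

# Correlation using the true stress scores
r_true <- cor(stress_true, performance)

# Correlation using the poorly measured stress scores (weaker, closer to 0)
r_noisy <- cor(stress_noisy, performance)

c(r_true = r_true, r_noisy = r_noisy)
```

The underlying effect is identical in both cases; only the quality of the measurement differs, which is why an unreliable tool can make a real effect look small.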
The code below creates a simulated dataset for a psychological experiment. Run the code chunk below without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations:
# Load the psych package, which provides describeBy()
library(psych)
# Calculate descriptive statistics of reaction_time grouped by condition
describeBy(data$reaction_time, data$condition)
##
## Descriptive statistics by group
## group: Control
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 30 301.4 48.54 299.68 300.42 55.38 201.67 408.45 206.78 0.14 -0.66
## se
## X1 8.86
## ------------------------------------------------------------
## group: Experimental
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 17 295.75 38.37 288.49 295.61 43.74 215.67 377.94 162.27 0 -0.27
## se
## X1 9.31
Create a new variable called anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
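# Load dplyr for %>%, mutate(), group_by(), and summarize()
library(dplyr)
# Create new variable named anxiety_change
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
# Calculate mean of anxiety change for each condition
# (stored in a separate object so the full dataset in data is not overwritten)
anxiety_summary <- data %>%
  group_by(condition) %>%
  summarize(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
anxiety_summary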
The mean for anxiety change in the control group is 3.79, and the mean for the experimental group is 8.64.
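The same group means can also be computed in base R with tapply(). The sketch below uses a tiny made-up data frame (the name toy and its four rows are hypothetical, chosen so the arithmetic is easy to check by hand) rather than the simulated dataset.

```r
# Hypothetical four-row dataset, not the simulated data from Part 2
toy <- data.frame(
  condition    = c("Control", "Control", "Experimental", "Experimental"),
  anxiety_pre  = c(20, 30, 20, 30),
  anxiety_post = c(18, 26, 12, 20)
)

# Pre minus post, so positive values mean anxiety went down
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post

# Mean change per condition: Control = (2 + 4) / 2, Experimental = (8 + 10) / 2
change_means <- tapply(toy$anxiety_change, toy$condition, mean, na.rm = TRUE)
change_means
```

Because tapply() returns a new named vector, the original data frame is left unchanged and remains available for later steps.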
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
## [1] 0.09121122
## [1] 0.4950149
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
Remove rows with missing values and save the result as a new dataset called clean_data.
# Re-create the first six rows of the Part 2 dataset
# (these six rows contain no missing values)
data <- data.frame(
  participant_id = c(1, 2, 3, 4, 5, 6),
  reaction_time = c(271.9762, 288.4911, 377.9354, 303.5254, 306.4644, 385.7532),
  accuracy = c(87.53319, 84.71453, 84.57130, 98.68602, 82.74229, 100.16471),
  gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  condition = c("Control", "Experimental", "Experimental", "Control", "Control", "Control"),
  anxiety_pre = c(31.30191, 31.15234, 27.65762, 16.93299, 24.04438, 22.75684),
  anxiety_post = c(29.05312, 19.21510, 20.45306, 13.75199, 17.84736, 19.93397)
)
# Remove rows with missing values using na.omit()
clean_data <- na.omit(data)
# View the cleaned dataset
print(clean_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
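Since the six rows above contain no missing values, na.omit() leaves them unchanged. To see what it does when values are missing, here is a minimal sketch on a hypothetical data frame (toy and its values are invented for illustration):

```r
# Hypothetical data with one missing reaction_time and one missing accuracy
toy <- data.frame(
  participant_id = 1:5,
  reaction_time  = c(310, NA, 295, 330, 280),
  accuracy       = c(88, 92, NA, 75, 81)
)

# Count missing values in each column before cleaning
colSums(is.na(toy))

# Drop every row that has at least one NA
clean_toy <- na.omit(toy)

# Number of rows removed by the cleaning step
nrow(toy) - nrow(clean_toy)
```

Checking colSums(is.na(...)) before and after cleaning is a quick way to confirm that the step actually removed something.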
Create a new variable called performance_category that categorizes participants based on their accuracy:
library(dplyr)
# Example data frame
data <- data.frame(
  participant_id = c(1, 2, 3, 4, 5, 6),
  reaction_time = c(271.9762, 288.4911, 377.9354, 303.5254, 306.4644, 385.7532),
  accuracy = c(87.53319, 84.71453, 84.57130, 98.68602, 82.74229, 100.16471),
  gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  condition = c("Control", "Experimental", "Experimental", "Control", "Control", "Control"),
  anxiety_pre = c(31.30191, 31.15234, 27.65762, 16.93299, 24.04438, 22.75684),
  anxiety_post = c(29.05312, 19.21510, 20.45306, 13.75199, 17.84736, 19.93397)
)
# Create performance_category based on accuracy
data <- data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",                   # Accuracy 90 and above
    accuracy >= 70 & accuracy < 90 ~ "Medium", # Accuracy from 70 up to, but not including, 90
    accuracy < 70 ~ "Low"                      # Accuracy below 70
  ))
# View the updated data
print(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post performance_category
## 1 29.05312 Medium
## 2 19.21510 Medium
## 3 20.45306 Medium
## 4 13.75199 High
## 5 17.84736 Medium
## 6 19.93397 High
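The same three-way split can be written with base R's cut(), which avoids spelling out each condition. This is an alternative sketch, not the approach used above; the accuracy values are hypothetical and chosen to sit on the category boundaries.

```r
# Hypothetical accuracy scores, including the boundary values 70 and 90
accuracy <- c(65, 70, 89.9, 90, 100)

# right = FALSE makes each interval closed on the left: [70, 90) and [90, Inf)
performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE
)

# Show each score next to its category
data.frame(accuracy, performance_category)
```

The right = FALSE argument matters here: with the default right = TRUE, a score of exactly 90 would fall in "Medium" instead of "High".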
library(dplyr)
# Example data frame
data <- data.frame(
  participant_id = c(1, 2, 3, 4, 5, 6),
  reaction_time = c(271.9762, 288.4911, 377.9354, 303.5254, 306.4644, 385.7532),
  accuracy = c(87.53319, 84.71453, 84.57130, 98.68602, 82.74229, 100.16471),
  gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  condition = c("Control", "Experimental", "Experimental", "Control", "Control", "Control"),
  anxiety_pre = c(31.30191, 31.15234, 27.65762, 16.93299, 24.04438, 22.75684),
  anxiety_post = c(29.05312, 19.21510, 20.45306, 13.75199, 17.84736, 19.93397)
)
# Calculate the overall mean reaction time (na.rm = TRUE ignores any missing values)
mean_reaction_time <- mean(data$reaction_time, na.rm = TRUE)
# Filter the dataset to include only participants in the Experimental condition with reaction times faster than the mean
filtered_data <- data %>%
  filter(condition == "Experimental", reaction_time < mean_reaction_time)
# View the filtered data
print(filtered_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 2 288.4911 84.71453 Female Experimental 31.15234
## anxiety_post
## 1 19.2151
To clean the data, I removed rows with missing values using the na.omit() function and saved the result as clean_data. Next, I created a new variable called performance_category to group participants by their accuracy scores: “High” for 90 or above, “Medium” for at least 70 but below 90, and “Low” for below 70. I did this with mutate() and case_when(). Then, I filtered the dataset to include only participants in the Experimental condition whose reaction times were faster than the overall mean. To do this, I first calculated the mean reaction time and then used filter() to keep only the matching rows.
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# Load the packages used below
library(psych) # corPlot()
library(dplyr) # %>%, mutate(), select()
# Example data
data <- data.frame(
  participant_id = c(1, 2, 3, 4, 5, 6),
  reaction_time = c(271.9762, 288.4911, 377.9354, 303.5254, 306.4644, 385.7532),
  accuracy = c(87.53319, 84.71453, 84.57130, 98.68602, 82.74229, 100.16471),
  gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  condition = c("Control", "Experimental", "Experimental", "Control", "Control", "Control"),
  anxiety_pre = c(31.30191, 31.15234, 27.65762, 16.93299, 24.04438, 22.75684),
  anxiety_post = c(29.05312, 19.21510, 20.45306, 13.75199, 17.84736, 19.93397)
)
# Create a new variable
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
# Select only numeric variables
numeric_data <- data %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
# Create the correlation plot
corPlot(cor(numeric_data))
## Error in plot.new(): figure margins too large
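The "figure margins too large" error is commonly reported when the plotting area is too small for the figure, for example a narrow Plots pane in RStudio. Whether that was the cause here is an assumption. One possible workaround, sketched below, is to draw into a file device with an explicit size. The toy data, the file name, and the 8 x 8 inch size are hypothetical, and the sketch falls back to base R's image() if the psych package is not installed.

```r
set.seed(7)  # assumed seed, used only so the sketch is reproducible

# Hypothetical numeric data standing in for the selected variables
toy <- data.frame(
  reaction_time = rnorm(20, mean = 300, sd = 50),
  accuracy      = rnorm(20, mean = 85, sd = 10),
  anxiety_pre   = rnorm(20, mean = 25, sd = 8)
)
toy_cor <- cor(toy)

# Open a PDF device with an explicit size so the margins have room
out_file <- file.path(tempdir(), "correlation_plot.pdf")
pdf(out_file, width = 8, height = 8)

if (requireNamespace("psych", quietly = TRUE)) {
  psych::corPlot(toy_cor)  # same function as above, drawn into the file
} else {
  image(toy_cor, main = "Correlation matrix (base R fallback)")
}

# Close the device so the file is written to disk
dev.off()
file.exists(out_file)
```

In an R Markdown document, setting the chunk options fig.width and fig.height is another way to give the plot more room.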
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
Be sure to knit your document to HTML format and check that all content is displayed correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.