Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Nominal data sort observations into distinct categories with no inherent order. Ordinal data sort observations into ranked categories, but the intervals between ranks are not necessarily equal. Interval data are ranked and have equal intervals, but no true zero. Lastly, ratio data have all of these properties plus a true zero. An example of nominal data is eye color. An example of ordinal data is people’s educational attainment. An example of interval data is IQ scores. Lastly, an example of ratio data is reaction time in milliseconds.
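As a minimal sketch of how these four levels might be represented in R: nominal data as an unordered factor, ordinal data as an ordered factor, and interval and ratio data as plain numeric vectors. All variable names and values below are hypothetical and chosen only for illustration.

```r
# Hypothetical example values; none of these come from the exam dataset.

# Nominal: categories with no order
eye_color <- factor(c("brown", "blue", "green", "brown"))

# Ordinal: ranked categories, spacing between ranks is not assumed equal
education <- factor(
  c("High school", "Bachelor's", "Master's", "High school"),
  levels = c("High school", "Bachelor's", "Master's"),
  ordered = TRUE
)

# Interval: equal intervals, no true zero (differences are meaningful)
iq <- c(95, 110, 102, 120)

# Ratio: equal intervals and a true zero (ratios are meaningful)
reaction_ms <- c(250, 500, 310, 420)

is.ordered(eye_color)            # FALSE: no ranking among eye colors
is.ordered(education)            # TRUE: levels carry a rank
education[2] > education[1]      # TRUE: Bachelor's ranks above High school
reaction_ms[2] / reaction_ms[1]  # 2: 500 ms is twice as long as 250 ms
```

R only allows comparisons such as `>` on ordered factors, which mirrors the distinction between nominal and ordinal data.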
Scores on a depression inventory are interval data: the scores are treated as having equal intervals between them, but a score of zero does not mean a complete absence of depression. Response time in milliseconds is ratio data because it has a true zero point and equal intervals between measurements. Likert scale ratings of agreement are ordinal data because the ratings are ordered, but the intervals between categories are not necessarily equal. Diagnostic categories are nominal data because the categories are names or labels with no specific order or ranking. Lastly, age in years is ratio data because age has a true zero (the moment of birth) and equal intervals between years.
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Random error is unpredictable: measured values fluctuate around the true value. For example, in a memory experiment, a participant may misremember a specific word or small detail by chance, or a researcher might accidentally record the wrong time for a trial. Systematic error is consistent and biases measurements in one direction, which reduces the accuracy of the results. For example, if the word list presented to participants in a memory experiment is built around a single theme, participants are likely to perform better regardless of their actual memory ability. This is a systematic error because the task ends up measuring how well people remember thematically related words rather than memory ability in general.
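The difference between the two error types can be illustrated with a small simulation. This is only a sketch: the true score of 100, the noise SD of 5, and the bias of +10 are hypothetical values picked for illustration.

```r
set.seed(42)  # arbitrary seed, so this sketch is reproducible

true_score <- 100   # hypothetical true value being measured
n_obs <- 10000      # number of simulated measurements

# Random error: noise centered on zero, so readings scatter around the truth
random_error <- true_score + rnorm(n_obs, mean = 0, sd = 5)

# Systematic error: the same kind of noise plus a constant bias of +10,
# so every reading is shifted in one direction
systematic_error <- true_score + 10 + rnorm(n_obs, mean = 0, sd = 5)

mean(random_error)      # close to 100: averaging cancels random error
mean(systematic_error)  # close to 110: averaging does not remove the bias
```

Collecting more observations shrinks the effect of random error on the mean, but it leaves a systematic bias untouched.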
Measurement errors can undermine the validity of a study examining stress and academic performance by introducing inaccuracies that bias the results and make the conclusions less credible. Researchers can minimize these errors through careful planning, instrument validation, and appropriate data analysis techniques.
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations*:
## item group1 vars n mean sd median trimmed mad min max
## X11 1 Control 1 30 301.40 48.54 299.68 300.42 55.38 201.67 408.45
## X12 2 Experimental 1 17 295.75 38.37 288.49 295.61 43.74 215.67 377.94
## range skew kurtosis se
## X11 206.78 0.14 -0.66 8.86
## X12 162.27 0.00 -0.27 9.31
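The chunk that produced the table above is not shown. Its layout resembles what describeBy() from the psych package returns with mat = TRUE, but that is an assumption. As a base-R sketch of the same idea (descriptives of reaction time by condition, ignoring missing values), using a small hypothetical data frame `toy` rather than the simulated dataset:

```r
# Hypothetical toy data; values are made up for illustration only.
toy <- data.frame(
  condition = c("Control", "Control", "Control",
                "Experimental", "Experimental", "Experimental"),
  reaction_time = c(300, 320, NA, 280, 290, 300)
)

# Summarise one numeric vector, dropping missing values first
describe_rt <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
}

# Split reaction times by condition, summarise each group,
# and stack the per-group results into one matrix
rt_by_condition <- split(toy$reaction_time, toy$condition)
rt_summary <- do.call(rbind, lapply(rt_by_condition, describe_rt))
rt_summary
```

Each row of `rt_summary` corresponds to one condition, and `n` counts only the non-missing reaction times in that condition.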
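Create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
library(dplyr) # provides %>%, mutate(), group_by(), and summarise()
# Positive values of anxiety_change mean anxiety went down after the session
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
# Mean change within each condition
data %>%
  group_by(condition) %>%
  summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))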
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
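The mean anxiety change (pre minus post) for the control group is 3.79 points. The mean anxiety change for the experimental group is 8.64 points, so anxiety dropped more in the experimental condition.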
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
mean_rt <- 350
sd_rt <- 75
# (a) Probability of reaction time > 450ms
p_greater_450 <- 1 - pnorm(450, mean = mean_rt, sd = sd_rt)
# (b) Probability of reaction time between 300ms and 400ms
p_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) - pnorm(300, mean = mean_rt, sd = sd_rt)
p_greater_450
## [1] 0.09121122
a) The probability that a randomly selected student will have a reaction time greater than 450ms is approximately 0.091. b) The probability that a participant will have a reaction time between 300ms and 400ms is approximately 0.495.
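The same probabilities can be checked by first standardizing to z-scores, z = (x - mean) / sd, and then using the standard normal distribution. A brief sketch, assuming the mean of 350 ms and SD of 75 ms used in the chunk above; the names `z_450`, `p_a`, and `p_b` are introduced here for illustration.

```r
mean_rt <- 350
sd_rt <- 75

# Convert raw reaction times to z-scores
z_450 <- (450 - mean_rt) / sd_rt   # about 1.33 SDs above the mean
z_300 <- (300 - mean_rt) / sd_rt   # about 0.67 SDs below the mean
z_400 <- (400 - mean_rt) / sd_rt   # about 0.67 SDs above the mean

# (a) Upper-tail probability beyond z_450
p_a <- pnorm(z_450, lower.tail = FALSE)

# (b) Area under the standard normal curve between z_300 and z_400
p_b <- pnorm(z_400) - pnorm(z_300)

round(c(p_greater_450 = p_a, p_between_300_400 = p_b), 3)
```

Because 300 ms and 400 ms sit the same distance from the mean, the interval in (b) is symmetric, which is why its probability can also be written as 2 * pnorm(z_400) - 1.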
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
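Remove all rows with missing values and save the result as a new dataset called clean_data.
Create a new variable called performance_category that categorizes participants based on their accuracy: high (90 or above), medium (between 70 and 90), and low (below 70).
Filter the dataset to include only participants in the Experimental condition whose reaction time is faster than the mean.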
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
filtered_data <- clean_data %>%
filter(condition == "Experimental" & reaction_time < mean_reaction_time)
I started by removing the rows that had missing values, which created a new dataset (clean_data). Next, I created a new variable (performance_category), which classifies participants based on their accuracy: 90 or above is high, between 70 and 90 is medium, and below 70 is low. Lastly, I filtered the dataset to include only participants in the experimental condition who had a faster reaction time than the mean.
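Only the filtering step appears in the code above, so the three steps are sketched here in base R. This is an illustration under stated assumptions, not the code used for the exam: the data frame `toy` and the names `clean_toy` and `fast_experimental` are hypothetical, and the cut-offs are taken as low below 70, medium from 70 up to (not including) 90, and high at 90 or above.

```r
# Hypothetical toy data standing in for the simulated dataset
toy <- data.frame(
  condition = c("Experimental", "Experimental", "Control",
                "Experimental", "Control"),
  reaction_time = c(250, 340, 300, NA, 310),
  accuracy = c(95, 82, 65, 88, 91)
)

# Step 1: drop every row that contains a missing value
clean_toy <- na.omit(toy)

# Step 2: bin accuracy into Low (< 70), Medium (70 to < 90), High (>= 90)
clean_toy$performance_category <- cut(
  clean_toy$accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE   # intervals are closed on the left, e.g. [70, 90)
)

# Step 3: keep Experimental participants faster than the overall mean
mean_rt <- mean(clean_toy$reaction_time)
fast_experimental <- subset(
  clean_toy,
  condition == "Experimental" & reaction_time < mean_rt
)
fast_experimental
```

Setting `right = FALSE` in cut() is what places an accuracy of exactly 90 in the High category and exactly 70 in Medium.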
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
library(psych) # provides corPlot()
# Keep only the numeric variables of interest
numeric_data <- clean_data %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
# Plot the correlation matrix
corPlot(cor(numeric_data, use = "pairwise.complete.obs"),
        numbers = TRUE, # Display correlation values
        upper = FALSE, # Show only lower triangle
        main = "Correlation Plot of Key Variables")
## Error in plot.new(): figure margins too large
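The "figure margins too large" error generally means the plotting region is too small for the margins requested, not that the correlations themselves are wrong. A sketch of one common workaround in base graphics is to shrink the margins with par(mar = ...) before plotting; in R Markdown, enlarging the figure with the fig.width and fig.height chunk options is another option. The data frame `toy` below is hypothetical, and image() stands in for corPlot() so the example runs without the psych package.

```r
set.seed(1)  # arbitrary seed for this sketch

# Hypothetical numeric data standing in for the cleaned dataset
toy <- data.frame(
  anxiety_pre = rnorm(30, mean = 25, sd = 8),
  accuracy = rnorm(30, mean = 85, sd = 10)
)
toy$anxiety_post <- toy$anxiety_pre - rnorm(30, mean = 5, sd = 2)

# Correlation matrix, using all available pairs of observations
cor_mat <- cor(toy, use = "pairwise.complete.obs")

pdf(NULL)                            # null device: nothing is written to disk
old_par <- par(mar = c(4, 4, 2, 1))  # smaller margins than the default
image(cor_mat, zlim = c(-1, 1), axes = FALSE, main = "Correlation matrix")
par(old_par)                         # restore the previous margins
invisible(dev.off())                 # close the device

round(cor_mat, 2)
```

Printing the rounded matrix gives the correlation values even when the plot itself cannot be drawn.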
There is a strong correlation between anxiety_pre and anxiety_post. Another strong correlation is between anxiety_change and anxiety_pre. An interesting relationship is the one between reaction time and accuracy. These correlations can inform further research in psychology, since anxiety can negatively affect performance. Future studies can explore ways to reduce anxiety in high-pressure situations.
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
1) How does the workload high school students deal with affect their anxiety levels? To study this, I would draw on data from previous studies of how workload affects anxiety in high school students, and I would conduct surveys to get an accurate report of students’ anxiety. I would use correlation tests to see whether a higher workload is linked with higher levels of anxiety, and regression analysis to estimate how strongly workload predicts anxiety. A potential measurement error I may need to address comes from the self-reported survey data: people may underestimate or overestimate their anxiety levels or the hours they dedicate to schoolwork and studying.
2) Learning R and applying these concepts to data analysis changed my understanding of psychological statistics because I now have a better idea of how they support different studies and help us form conclusions. The biggest advantage of R is that it is completely free and widely accessible. It also has a variety of tools for visualizing data. The biggest challenge is learning to code for those who have no prior programming experience and are starting from scratch.
Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission. Publish your assignment to RPubs and submit the URL to Canvas.