Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Nominal data sort observations into unordered, mutually exclusive categories, e.g., gender. Ordinal data are mutually exclusive categories with a meaningful order but unequal or unknown spacing, e.g., level of pain. Interval data are measured on a scale with equidistant points but no true zero, e.g., temperature in Celsius. Ratio data are like interval data but also have a meaningful true zero, e.g., height, weight, and length.
“Scores on a depression inventory (0-63)” are usually treated as interval: the scores are ordered and measured on a numeric scale, but a score of 0 does not guarantee a complete absence of depression, so the zero is not a true zero. “Response time in milliseconds” is ratio, as it is ordered, measured in equal units, and has a meaningful zero (0 ms means no time has elapsed). “Likert scale ratings of agreement (1-7)” is ordinal, as there is no meaningful zero point and the points are not guaranteed to be equidistant: the step from 2 to 3 may not be the same as the step from 6 to 7. “Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)” is nominal, as the categories are unordered and mutually exclusive. “Age in years” is ratio, as it has an order, equidistant units, and a meaningful zero point.
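As a rough illustration of how these scale types can be represented in R, here is a minimal sketch. All values and variable names below are hypothetical and are not part of the exam dataset.

```r
# Hypothetical example values; none of these come from the exam dataset

# Nominal: an unordered factor
diagnosis <- factor(c("ADHD", "anxiety disorder", "no diagnosis", "ADHD"))

# Ordinal: an ordered factor, so comparisons respect the stated order
pain <- factor(c("mild", "severe", "moderate", "mild"),
               levels = c("mild", "moderate", "severe"),
               ordered = TRUE)

# Ratio: plain numeric with a true zero, so ratios are meaningful
response_ms <- c(250, 500, 310, 420)

is.ordered(diagnosis)            # nominal categories carry no order
pain[2] > pain[1]                # "severe" ranks above "mild"
response_ms[2] / response_ms[1]  # the second response took twice as long
```

Storing ordinal data as an ordered factor keeps R from treating the categories as if the gaps between them were equal.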
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Random error: unpredictable variation in the data due to human error or chance circumstances, such as misreading a value while recording it. Systematic error: a consistent, predictable error that shifts every measurement in the same direction, such as a poorly calibrated scale.
Validity can be affected by measurement error because the people recording data may make mistakes while recording. This can skew the data and lead to wrong conclusions. To prevent this, you could have multiple recorders at once, or even a camera recording the session so the data can be reviewed, in order to reduce error and increase the validity of the data.
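The difference between the two error types can be seen in a small simulation. This is only a sketch under assumed values: the true weight, the bias of 1.5, and the noise level are all hypothetical.

```r
set.seed(42)          # arbitrary seed, for reproducibility only
true_weight <- 70     # hypothetical true value (kg)
n_obs <- 1000

# Random error: noise centred on zero, so it tends to average out
random_only <- true_weight + rnorm(n_obs, mean = 0, sd = 2)

# Systematic error: a hypothetical scale that always reads 1.5 kg too high
bias <- 1.5
biased <- true_weight + bias + rnorm(n_obs, mean = 0, sd = 2)

mean(random_only) - true_weight   # close to 0
mean(biased) - true_weight        # close to the bias of 1.5
```

Averaging many measurements shrinks random error, but it leaves the systematic shift untouched, which is why calibration matters.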
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations*:
## item group1 vars n mean sd median trimmed mad min max range
## X11 1 Control 1 29 85.49 9.86 85.53 85.68 8.77 61.91 105.50 43.59
## X12 2 Experimental 1 19 88.06 8.20 88.32 87.76 9.86 74.28 106.87 32.59
## skew kurtosis se
## X11 -0.15 -0.35 1.83
## X12 0.45 -0.45 1.88
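The call that produced the table above is not shown. Its layout resembles output from describeBy() in the psych package with mat = TRUE, and the means near 85 suggest the variable is accuracy grouped by condition, but both points are assumptions. A minimal base-R sketch of the same idea, on hypothetical toy data, might look like this:

```r
# Hypothetical toy data; not the exam's simulated dataset
toy <- data.frame(
  condition = c("Control", "Control", "Control", "Experimental", "Experimental"),
  accuracy  = c(80, 90, NA, 85, 95)
)

# n, mean, and SD of accuracy within each condition.
# The formula interface drops rows with missing accuracy by default.
group_stats <- aggregate(
  accuracy ~ condition,
  data = toy,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)
group_stats
```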
Create a variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
# Your code here
# dplyr provides %>%, mutate(), group_by(), and summarise()
library(dplyr)
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
data %>%
group_by(condition) %>%
summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
The mean anxiety change (pre minus post) is 3.79 for the Control condition and 8.64 for the Experimental condition, so anxiety dropped more in the Experimental condition.
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
# Your code here
mean_rt <- 350
sd_rt <- 75
# (a) Probability of reaction time > 450ms
p_greater_450 <- 1 - pnorm(450, mean = mean_rt, sd = sd_rt)
# (b) Probability of reaction time between 300ms and 400ms
p_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) - pnorm(300, mean = mean_rt, sd = sd_rt)
p_greater_450
## [1] 0.09121122
The probability of someone having a reaction time greater than 450 ms is about .09, and the probability of a reaction time between 300 and 400 ms is about .495.
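Only the first probability is printed above. The sketch below prints both, assuming the same normal distribution (mean 350, SD 75) used in the code; lower.tail = FALSE is just an alternative way of writing 1 - pnorm(...).

```r
mean_rt <- 350
sd_rt   <- 75

# (a) Upper tail: lower.tail = FALSE is equivalent to 1 - pnorm(...)
p_greater_450 <- pnorm(450, mean = mean_rt, sd = sd_rt, lower.tail = FALSE)

# (b) Area between two cut-offs = difference of two cumulative probabilities
p_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) -
  pnorm(300, mean = mean_rt, sd = sd_rt)

# Print both results, rounded to three decimals
round(c(greater_450 = p_greater_450, between_300_400 = p_between_300_400), 3)
```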
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
Create a cleaned dataset called clean_data.
Create a new variable called performance_category that categorizes participants based on their accuracy:
# Your code here
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
filtered_data <- clean_data %>%
filter(condition == "Experimental" & reaction_time < mean_reaction_time)
I first cleaned the data. Then I created a new variable (performance_category) that classifies participants based on accuracy: 90 and above is high, from 70 up to 90 is medium, and below 70 is low. Finally, I filtered for participants in the Experimental condition whose reaction time was below the mean.
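The code shown covers only the filtering step. The cleaning and categorizing steps described above could be sketched as follows, on hypothetical toy data. Using na.omit() for the cleaning, and treating exactly 70 as medium and exactly 90 as high, are assumptions based on the description rather than the original code.

```r
# Hypothetical toy data; not the exam's simulated dataset
toy <- data.frame(
  accuracy      = c(95, 90, 82, 70, 65, NA),
  reaction_time = c(280, 310, NA, 295, 330, 300)
)

# Step 1: drop every row that has a missing value in any column
clean_toy <- na.omit(toy)

# Step 2: bin accuracy; right = FALSE gives the intervals
# [-Inf, 70), [70, 90), and [90, Inf)
clean_toy$performance_category <- cut(
  clean_toy$accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right  = FALSE
)
clean_toy
```

With dplyr, case_when() inside mutate() would be an equivalent way to build the same categories.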
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
numeric_data <- clean_data %>%
select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
corPlot(cor(numeric_data, use = "pairwise.complete.obs"),
numbers = TRUE, # Display correlation values
upper = FALSE, # Show only lower triangle
main = "Correlation Plot of Key Variables")
## Error in plot.new(): figure margins too large
Pre- and post-anxiety scores have a strong correlation. Anxiety change and pre-anxiety scores do too. Reaction time and accuracy show a surprising lack of correlation. This may give better insight into how the nervous system reacts under anxiety in situations that demand a quick response.
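The "figure margins too large" error above usually means the plotting area is too small for the plot's margins. One possible workaround, sketched here on hypothetical toy data, is to draw on a device with explicit, generous dimensions. In an R Markdown chunk, the knitr options fig.width and fig.height serve the same purpose. The base-R image() call below is only a stand-in for the corPlot() call.

```r
# Hypothetical toy data standing in for numeric_data
set.seed(1)
toy <- data.frame(x = rnorm(30), y = rnorm(30), z = rnorm(30))
r <- cor(toy, use = "pairwise.complete.obs")

# Open a device with explicit dimensions (in inches);
# file = NULL means nothing is written to disk
pdf(file = NULL, width = 8, height = 8)

# Any plotting call goes here; image() is a base-R stand-in
image(r, axes = FALSE, main = "Toy correlation matrix")

# Close the device again
invisible(dev.off())
```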
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
(1.) One research question in psychology that interests me is the relationship between gaming experience and technical skills such as hand-eye coordination. You could test people who play a lot of video games on exercises or tasks that demand these skills and compare their performance against the average person's. One potential source of error is a person's prior skills: for example, if you tested coordination through throwing and catching, a gamer with baseball experience may skew the data.
(2.) Learning R has shown me how important the way you record data and run experiments is for getting the best results. It is a hard and thorough process. The advantage of R is its computational power, which lets data be organized and analyzed, but the disadvantage is its unintuitive and rather complex language and the systems it employs, which are difficult for the average user.
Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission. Publish your assignment to RPubs and submit the URL to Canvas.