Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
The key differences between nominal, ordinal, interval, and ratio data are as follows. Nominal data consists of categories without any inherent order, for example gender, eye color, or diagnostic categories such as ADHD and anxiety. Ordinal data has a meaningful order, but the intervals between values are not necessarily equal, for example education level (high school, master’s degree, PhD) or a pain rating from 1 to 10, which reflects a subjective perception. Interval data has equal intervals between values but no true zero point, for example temperature in Fahrenheit or Celsius, or year of birth. Ratio data is like interval data but has a true zero point, so ratio comparisons are meaningful, for example speed in miles per hour or weight in pounds.
Scores on a depression inventory (0-63) are interval data because the scores are treated as having equal intervals between values, and a score of 0 does not mean a complete absence of depression. Response time in milliseconds is ratio data because there is a true zero (0 milliseconds means no elapsed time) and the differences between values are meaningful. Likert scale ratings of agreement (1-7) are ordinal data because the scale has an order (strongly disagree to strongly agree), but the intervals between values might not be exactly equal. Diagnostic categories are nominal data because the categories have no inherent order. Age in years is ratio data because age has a true zero and the differences between values are meaningful.
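One way to make these four levels concrete in R is sketched below. The values and the variable names (`diagnosis`, `education`, `temp_f`, `rt_ms`) are made up for illustration and are not part of the exam dataset.

```r
# Nominal: unordered categories
diagnosis <- factor(c("ADHD", "Anxiety", "ADHD"))

# Ordinal: categories with a defined order
education <- factor(
  c("High School", "PhD", "Master's"),
  levels = c("High School", "Master's", "PhD"),
  ordered = TRUE
)

# Interval: differences are meaningful, ratios are not
temp_f <- c(40, 80)
temp_c <- (temp_f - 32) * 5 / 9

# Ratio: a true zero makes ratios meaningful
rt_ms <- c(250, 500)

is.ordered(diagnosis)        # FALSE: no order among diagnoses
is.ordered(education)        # TRUE
education[1] < education[2]  # TRUE: High School ranks below PhD

temp_f[2] / temp_f[1]  # 2 in Fahrenheit ...
temp_c[2] / temp_c[1]  # ... but about 6 in Celsius, so the ratio is not meaningful
rt_ms[2] / rt_ms[1]    # 2: twice as long, whatever the time unit
```

The temperature lines show why interval data does not support ratio statements: the same two temperatures give a different ratio once the unit changes.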
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Random error is unpredictable variation in measurement that happens by chance and does not consistently push results toward one outcome. For example, if participants remember different numbers of words from a memory list because of distractions or lapses in attention, the resulting inconsistencies make the data less reliable without shifting the results in a specific direction. Systematic error occurs consistently and in the same direction, and is usually caused by flaws in the measurement procedure or the design of the experiment. For example, if the words in a memory test are printed smaller for one group and larger for another, the participants who see the larger print might perform better simply because the words are easier to read. This introduces a bias that favors one group over the other.
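The contrast can be illustrated with a small simulation. This is a minimal sketch with assumed values: a hypothetical true score of 20 words recalled, random noise with a standard deviation of 3, and a constant bias of 2.

```r
set.seed(42)  # arbitrary seed, for reproducibility

n <- 10000
true_score <- rep(20, n)  # hypothetical true number of words recalled

# Random error: noise centered on zero, with no consistent direction
random_obs <- true_score + rnorm(n, mean = 0, sd = 3)

# Systematic error: every measurement is shifted by the same amount
bias <- 2
systematic_obs <- true_score + bias

mean(random_obs) - 20      # close to 0: random errors average out
sd(random_obs)             # close to 3: individual scores still vary
mean(systematic_obs) - 20  # 2: the bias does not average out
```

Collecting more observations shrinks the effect of random error on the mean, but it does nothing to remove the systematic shift.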
Measurement errors can distort the observed relationship between stress and academic performance. If stress levels are not measured precisely, or if academic performance is assessed poorly, the study might fail to detect a true correlation or might find a misleading one. To minimize these errors, researchers can use reliable measurement tools, minimize bias in data collection, increase the sample size to help even out random errors, and use objective measures for a more reliable assessment.
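A short simulation can show how imprecise measurement weakens an observed correlation. The sketch below assumes a hypothetical negative relationship between stress and performance and adds random measurement noise to the stress scores; all numbers are illustrative.

```r
set.seed(42)  # arbitrary seed, for reproducibility

n <- 5000
stress_true <- rnorm(n)

# Assumed relationship: higher stress goes with lower performance
performance <- -0.6 * stress_true + rnorm(n, sd = 0.8)

# Observed stress = true stress + random measurement error
stress_observed <- stress_true + rnorm(n, sd = 1)

r_true <- cor(stress_true, performance)
r_observed <- cor(stress_observed, performance)

round(c(true = r_true, observed = r_observed), 2)
```

The correlation computed from the noisy stress scores is closer to zero than the one computed from the true scores, which is how a real relationship can be underestimated or missed.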
The code below creates a simulated dataset for a psychological experiment. Run the code chunk below without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations:
# Your code here
# Load necessary libraries
library(dplyr)
library(psych)
# Compute descriptive statistics for reaction time and accuracy
data %>%
  group_by(condition) %>%
  summarise(
    mean_reaction = mean(reaction_time, na.rm = TRUE),
    median_reaction = median(reaction_time, na.rm = TRUE),
    sd_reaction = sd(reaction_time, na.rm = TRUE),
    min_reaction = min(reaction_time, na.rm = TRUE),
    max_reaction = max(reaction_time, na.rm = TRUE),
    mean_accuracy = mean(accuracy, na.rm = TRUE),
    median_accuracy = median(accuracy, na.rm = TRUE),
    sd_accuracy = sd(accuracy, na.rm = TRUE),
    min_accuracy = min(accuracy, na.rm = TRUE),
    max_accuracy = max(accuracy, na.rm = TRUE)
  )
## # A tibble: 2 × 11
## condition mean_reaction median_reaction sd_reaction min_reaction max_reaction
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 301. 300. 48.5 202. 408.
## 2 Experimen… 296. 288. 38.4 216. 378.
## # ℹ 5 more variables: mean_accuracy <dbl>, median_accuracy <dbl>,
## # sd_accuracy <dbl>, min_accuracy <dbl>, max_accuracy <dbl>
Create a new variable called anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
# Your code here
# Create new variable for anxiety change
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
# Compute the mean anxiety change for each condition
data %>%
  group_by(condition) %>%
  summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
The experimental group had a slightly faster reaction time and higher accuracy than the control group, with less variability in both measures. The experimental group also had a greater reduction in anxiety, suggesting that the experimental condition was more effective at reducing anxiety than the control condition. Results: The mean reaction time was 301.40 ms for the control group and 295.75 ms for the experimental group. The median reaction time was 299.68 ms for control and 288.49 ms for experimental. The standard deviation was 48.54 ms in the control group and 38.37 ms in the experimental group, showing more variability in the control group. The minimum reaction time was 201.67 ms for control and 215.67 ms for experimental. The maximum reaction time was 408.45 ms for control and 377.95 ms for experimental. The mean accuracy was 85.49% in the control group and 88.06% in the experimental group. The median accuracy was 85.53% for control and 88.32% for experimental. The standard deviation was 9.86% in the control group and 8.20% in the experimental group, showing higher variability in the control group. The minimum accuracy was 61.91% for control and 74.28% for experimental. The maximum accuracy was 105.50% for control and 106.87% for experimental.
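The printed tibble above shows only the reaction-time columns, so the accuracy statistics have to be displayed another way. One option is a base R summary that returns every statistic in a single matrix. This is a minimal sketch on a made-up data frame (`toy`), and `describe_var` is a hypothetical helper name, not a function from dplyr or psych.

```r
# Toy data standing in for the simulated dataset (values are made up)
toy <- data.frame(
  condition = rep(c("Control", "Experimental"), each = 3),
  accuracy = c(80, 90, NA, 85, 95, 90)
)

# Hypothetical helper: five descriptive statistics, ignoring missing values
describe_var <- function(x) {
  c(
    mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE)
  )
}

# One column per condition, one row per statistic
accuracy_summary <- sapply(split(toy$accuracy, toy$condition), describe_var)
round(accuracy_summary, 2)
```

Because the result is a plain matrix, all rows and columns are printed without truncation.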
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
# Your code here
# Given values
mean_rt <- 350 # Mean reaction time
sd_rt <- 75 # Standard deviation
# Probability of reaction time > 450ms
prob_greater_450 <- 1 - pnorm(450, mean = mean_rt, sd = sd_rt)
prob_greater_450
## [1] 0.09121122
# Probability of reaction time between 300ms and 400ms
prob_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) - pnorm(300, mean = mean_rt, sd = sd_rt)
prob_between_300_400
## [1] 0.4950149
The probability that a participant has a reaction time greater than 450 ms is 0.0912 (9.12%). The probability that a participant’s reaction time is between 300 ms and 400 ms is 0.4950 (49.50%).
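The same probabilities can be cross-checked by standardizing first. The sketch below converts each cutoff to a z-score, z = (x - mean) / sd, and then uses the standard normal distribution; the variable names (`z_450`, `p_above_450`, and so on) are illustrative.

```r
mean_rt <- 350 # Mean reaction time (ms)
sd_rt <- 75    # Standard deviation (ms)

# Convert each cutoff to a z-score
z_450 <- (450 - mean_rt) / sd_rt # about 1.33
z_300 <- (300 - mean_rt) / sd_rt # about -0.67
z_400 <- (400 - mean_rt) / sd_rt # about 0.67

# Upper-tail probability beyond 450 ms
p_above_450 <- pnorm(z_450, lower.tail = FALSE)

# Probability between 300 ms and 400 ms
p_between <- pnorm(z_400) - pnorm(z_300)

round(c(above_450 = p_above_450, between_300_400 = p_between), 4)
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(...)`, and the two cutoffs at 300 ms and 400 ms sit symmetrically around the mean, which is why their z-scores differ only in sign.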
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
Create a cleaned dataset called clean_data.
Create a new variable called performance_category that categorizes participants based on their accuracy:
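# Your code here
# Remove rows with missing values
# (assumed step: clean_data is used below, but its creation is not shown)
clean_data <- na.omit(data)
# Create performance_category variable based on accuracy
clean_data <- clean_data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 & accuracy < 90 ~ "Medium",
    accuracy < 70 ~ "Low"
  ))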
# Your code here
# Compute the overall mean reaction time
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
# Filter the dataset
filtered_data <- clean_data %>%
  filter(condition == "Experimental" & reaction_time < mean_reaction_time)
After the cleaning process, the dataset had 45 observations and 9 variables. The number of observations decreased because rows with missing values were removed, and the number of variables increased because of the addition of performance_category. The dataset was then filtered to include only participants in the experimental condition with reaction times faster than the overall mean reaction time. This resulted in 10 observations and 9 variables.
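The cleaning and categorizing steps can be sketched in base R on a small made-up data frame. This is an illustration under assumptions: `toy` and `toy_clean` are hypothetical names, `na.omit()` is one way to drop incomplete rows, and `cut()` stands in for the `case_when()` call used above with the same 70 and 90 cutoffs.

```r
# Toy data with missing values (values are made up)
toy <- data.frame(
  reaction_time = c(280, NA, 310, 295, 330),
  accuracy = c(92, 85, NA, 70, 65)
)

# Drop every row that has at least one missing value
toy_clean <- na.omit(toy)
nrow(toy) - nrow(toy_clean) # 2 rows removed

# Categorize accuracy: Low < 70, Medium from 70 up to (not including) 90, High >= 90
toy_clean$performance_category <- cut(
  toy_clean$accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE # intervals are closed on the left, e.g. [70, 90)
)

dim(toy_clean) # 3 rows, 3 columns
toy_clean
```

Comparing `dim()` before and after each step is a quick way to confirm how many observations and variables a cleaning step removed or added.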
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# Your code here. Hint: first, with dplyr, create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
# Load necessary packages
library(dplyr)
library(psych)
# Create a new dataset with only numeric variables
numeric_data <- clean_data %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
# Generate correlation plot
corPlot(numeric_data, cex = 1.2)
## Error in plot.new(): figure margins too large
The correlation plot shows a strong positive correlation between anxiety_pre and anxiety_post, indicating that participants with higher anxiety before the task also had higher anxiety afterward. There is a weak negative correlation between reaction_time and accuracy, showing that faster reaction times are slightly associated with better accuracy, but the relationship is small. anxiety_change has a weak negative correlation with anxiety_post, which shows that greater reductions in anxiety are linked to lower post-task anxiety.
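Because the corPlot() call above ended in a plotting error, the correlations can also be inspected numerically, which does not depend on the graphics device. The sketch below uses base R `cor()` on made-up data; the data frame `toy` and its simulated values are assumptions for illustration and will not reproduce the exam dataset's correlations.

```r
set.seed(1) # arbitrary seed, for reproducibility

n <- 200
toy <- data.frame(anxiety_pre = rnorm(n, mean = 25, sd = 8))
toy$anxiety_post <- toy$anxiety_pre - rnorm(n, mean = 5, sd = 3)
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post
toy$anxiety_pre[c(3, 10)] <- NA # a few missing values, as in real data

# Pairwise deletion: each correlation uses all rows complete for that pair
cor_matrix <- cor(toy, use = "pairwise.complete.obs")
round(cor_matrix, 2)
```

The matrix holds the same coefficients a correlation plot would color-code, so each interpretation can be checked against an actual number.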
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
1. A research question in psychology that interests me is “How does anxiety affect the performance and scores of college students?” This interests me because I am going through this process right now, and I feel my anxiety holds me back, especially when it comes to taking exams. I would collect self-reported test anxiety scores and academic performance measurements, such as GPA and exam grades. A correlation analysis would show the relationship between anxiety and performance, and a regression analysis could show whether test anxiety predicts a student’s future outcomes. Possible measurement issues include self-report bias in the anxiety scores and other factors and habits that could be affecting GPA.
2. Learning R for data analysis has truly been a dream come true. R has allowed me to better understand statistics. Although learning how to code is a challenge, for someone who can learn anything but math, it has made a huge difference. The advantages include better visualization of relationships in psychology and the ability to show correlations, and in my opinion a hands-on technique is always the best way to learn. A challenge of R is learning and understanding the code, along with possible technical issues on a computer, but it has luckily been pretty easy to learn. In my opinion, R has better flexibility and visualization, and feels more user friendly than other statistical software.
Be sure to knit your document to HTML format and check that all content is displayed correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.