Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
The key differences among nominal, ordinal, interval, and ratio data are the amount of information each level provides and the types of statistical analyses each permits. Nominal data consist of categories with no inherent order or ranking; gender (female, male, non-binary) is an example. Ordinal data consist of categories with a meaningful order, but the distances between categories are not necessarily equal; an example is a rating scale from 1 to 5, where 1 might mean “strongly agree” and 5 might mean “strongly disagree.” Interval data have a meaningful order and equal intervals between values, but no true zero point; IQ scores are an example. Finally, ratio data have all the properties of interval data plus a true zero point; reaction time is an example.
For scores on a depression inventory (0-63), the appropriate level of measurement is interval. The scores are treated as having equal intervals between values across the 0-63 range, but, unlike a ratio scale, the zero point is not treated as a true absence of depression. For response time in milliseconds, the appropriate level is ratio, because it has a true zero point, equal intervals between values, and meaningful ratios between values. For Likert-scale ratings of agreement (1-7), the appropriate level is ordinal, because the ratings have a meaningful order but the distances between adjacent points are not necessarily equal. For diagnostic categories, the appropriate level is nominal, because the groupings are distinct, unordered categories that classify people by their condition. For age in years, the appropriate level is ratio, because it has a true zero point, equal intervals between values, and meaningful ratios.
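As a rough illustration of how these levels map onto R data types, the sketch below stores made-up values (all hypothetical, not from the textbook) as a factor for nominal data, an ordered factor for ordinal data, and plain numeric vectors for interval and ratio data, then shows an operation that is meaningful at each level.

```r
# Nominal: unordered categories stored as a factor; counting is meaningful
diagnosis <- factor(c("Anxiety", "Depression", "None", "Anxiety"))
table(diagnosis)

# Ordinal: ordered categories stored as an ordered factor; rank comparisons
# are meaningful, but the distance between levels is not defined
agreement <- factor(c(6, 2, 4, 7), levels = 1:7, ordered = TRUE)
agreement >= "4"   # TRUE FALSE TRUE TRUE
max(agreement)     # highest rating observed

# Interval: equal steps but no true zero, stored as numeric;
# differences are meaningful
depression_score <- c(12, 30)
depression_score[2] - depression_score[1]   # 18 points apart

# Ratio: equal steps and a true zero, stored as numeric;
# ratios are meaningful
response_time_ms <- c(300, 600)
response_time_ms[2] / response_time_ms[1]   # 2, i.e., twice as long
```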
Referring to Chapter 3 (Measurement Errors in Psychological Research):
The difference between random and systematic error is whether the error follows a pattern. Random error is unpredictable variation that occurs by chance. It follows no pattern and pushes individual scores up or down in unpredictable ways, which makes results less consistent. An example is a memory test in which participants recall a list of numbers or words: a participant’s score may vary from one attempt to the next simply by chance. Systematic error is a consistent, predictable bias that shifts measurements in a particular direction. It arises from flaws in the measurement instrument, the study design, or the procedure, and it causes results to deviate consistently from the true value. An example is administering a computer-based test with the screen brightness set too low, which could make the stimuli harder to see for everyone tested.
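A small simulation can make the distinction concrete. In this sketch the true recall scores, the noise level (SD = 3), and the constant bias of -2 points are all hypothetical values chosen for illustration: random error averages out to roughly zero but adds spread, while systematic error shifts every score by the same amount.

```r
set.seed(1)
n <- 1000
# Hypothetical true recall scores for n participants
true_score <- rnorm(n, mean = 20, sd = 4)
# Random error: chance noise centered on zero, different for every person
random_obs <- true_score + rnorm(n, mean = 0, sd = 3)
# Systematic error: the same downward bias for everyone (e.g., a dim screen)
systematic_obs <- true_score - 2

random_err <- random_obs - true_score
systematic_err <- systematic_obs - true_score
mean(random_err)      # close to 0: errors cancel out on average
sd(random_err)        # close to 3: scores are noisier
mean(systematic_err)  # -2: every score is shifted in the same direction
```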
Measurement errors can threaten the validity of a study examining the relationship between stress and academic performance because they introduce inaccuracies that distort the conclusions drawn about that relationship. Random error in either measure tends to weaken (attenuate) the observed correlation, while systematic error shifts scores in one direction and, if it affects some participants more than others, can distort the apparent relationship. To minimize these errors, researchers can use reliable and valid measures, reduce sources of systematic bias, and standardize their data collection procedures.
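The attenuating effect of random error can be sketched with simulated data. The variable names, the true correlation of -0.5, and the amount of added error below are hypothetical assumptions, not values from any real study; adding random noise to both measures roughly halves the observed correlation in this setup.

```r
set.seed(2)
n <- 5000
# Hypothetical standardized true scores with a true correlation of -0.5
stress_true <- rnorm(n)
performance_true <- -0.5 * stress_true + rnorm(n, sd = sqrt(0.75))
# Observed scores = true scores + random measurement error
stress_obs <- stress_true + rnorm(n, sd = 1)
performance_obs <- performance_true + rnorm(n, sd = 1)

r_true <- cor(stress_true, performance_true)  # about -0.50
r_obs <- cor(stress_obs, performance_obs)     # about -0.25, weaker
c(true = r_true, observed = r_obs)
```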
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations:
# Your code here
library(psych)
# Calculate descriptive statistics for reaction_time grouped by condition
describeBy(data$reaction_time, data$condition, mat = TRUE, digits = 2)
## item group1 vars n mean sd median trimmed mad min max
## X11 1 Control 1 30 301.40 48.54 299.68 300.42 55.38 201.67 408.45
## X12 2 Experimental 1 17 295.75 38.37 288.49 295.61 43.74 215.67 377.94
## range skew kurtosis se
## X11 206.78 0.14 -0.66 8.86
## X12 162.27 0.00 -0.27 9.31
# Calculate descriptive statistics for accuracy grouped by condition
describeBy(data$accuracy, data$condition, mat = TRUE, digits = 2)
## item group1 vars n mean sd median trimmed mad min max range
## X11 1 Control 1 29 85.49 9.86 85.53 85.68 8.77 61.91 105.50 43.59
## X12 2 Experimental 1 19 88.06 8.20 88.32 87.76 9.86 74.28 106.87 32.59
## skew kurtosis se
## X11 -0.15 -0.35 1.83
## X12 0.45 -0.45 1.88
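The n column from describeBy() counts only non-missing values, which is why the group sizes for reaction time (30 + 17 = 47) and accuracy (29 + 19 = 48) fall short of 50: the simulation set three reaction times and two accuracy scores to NA. The sketch below reproduces that behavior in base R with tapply() on a small hypothetical data frame, as a cross-check rather than a replacement for describeBy().

```r
# Hypothetical toy data with one missing reaction time
toy <- data.frame(
  condition = c("Control", "Control", "Control", "Experimental", "Experimental"),
  reaction_time = c(280, NA, 320, 260, 300)
)
# n counts only non-missing values, like describeBy()'s n column
n_by_group <- tapply(!is.na(toy$reaction_time), toy$condition, sum)
# Group means with missing values removed
mean_by_group <- tapply(toy$reaction_time, toy$condition, mean, na.rm = TRUE)
n_by_group     # Control 2, Experimental 2
mean_by_group  # Control 300, Experimental 280
```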
Create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
# Your code here
library(dplyr)
# Create a new variable for anxiety change (pre minus post)
data <- data %>%
mutate(anxiety_change = anxiety_pre - anxiety_post)
# Calculate the mean anxiety change for each condition
data %>%
group_by(condition) %>%
summarise(mean_anxiety = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
## condition mean_anxiety
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
The mean anxiety change (pre minus post) is 3.79 for the Control group and 8.64 for the Experimental group, so anxiety decreased more, on average, in the Experimental condition.
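The same change-score calculation can be cross-checked in base R without dplyr. This sketch uses a tiny hypothetical data frame with the same column names as the simulated dataset; aggregate() plays the role of group_by() plus summarise().

```r
# Hypothetical toy data with the same column names as the simulated dataset
toy <- data.frame(
  condition = c("Control", "Experimental", "Control", "Experimental"),
  anxiety_pre = c(30, 28, 25, 20),
  anxiety_post = c(27, 20, 22, 12)
)
# Change score: pre minus post, so positive values mean anxiety went down
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post
# Mean change per condition (base-R counterpart of group_by() + summarise())
change_means <- aggregate(anxiety_change ~ condition, data = toy, FUN = mean)
change_means  # Control 3, Experimental 8
```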
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
# Your code here
mean_rt <- 350
sd_rt <- 75
# (a) Probability of reaction time > 450ms
p_greater_450 <- 1 - pnorm(450, mean = mean_rt, sd = sd_rt)
# (b) Probability of reaction time between 300ms and 400ms
p_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) - pnorm(300, mean = mean_rt, sd = sd_rt)
p_greater_450
## [1] 0.09121122
p_between_300_400
## [1] 0.4950149
The probability that a randomly chosen student has a reaction time greater than 450 ms is about 0.09. The probability that a randomly chosen student has a reaction time between 300 ms and 400 ms is about 0.50 (0.495).
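The same probabilities can be worked through explicitly with z-scores, which shows where pnorm()'s answers come from. This sketch reuses the mean (350 ms) and SD (75 ms) from the code above: convert each cutoff to a z-score, then take areas under the standard normal curve.

```r
mean_rt <- 350
sd_rt <- 75
# (a) z-score for 450 ms, then the upper-tail area of the standard normal
z_450 <- (450 - mean_rt) / sd_rt                  # 1.333...
p_greater_450 <- pnorm(z_450, lower.tail = FALSE)
# (b) z-scores for 300 ms and 400 ms, then the area between them
z_300 <- (300 - mean_rt) / sd_rt                  # -0.667
z_400 <- (400 - mean_rt) / sd_rt                  # 0.667
p_between_300_400 <- pnorm(z_400) - pnorm(z_300)
round(c(p_greater_450, p_between_300_400), 4)     # 0.0912 0.4950
```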
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
Remove the rows with missing values and save the result as clean_data. Then create a new variable, performance_category, that categorizes participants based on their accuracy.
# Your code here
library(dplyr)
# Calculate the overall mean reaction time
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
# Keep only Experimental participants who were faster than the overall mean
filtered_data <- clean_data %>%
filter(condition == "Experimental" & reaction_time < mean_reaction_time)
First, I removed the rows with missing values, which created a new dataset called clean_data. Next, I created a new variable, performance_category, that classifies participants by accuracy: an accuracy of 90 or above is high, an accuracy from 70 up to 90 is medium, and an accuracy below 70 is low. Finally, I filtered the dataset to include only participants in the Experimental condition whose reaction time was faster (lower) than the overall mean reaction time.
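The three cleaning steps described above can be sketched in base R on a small hypothetical data frame. The toy values are invented, and treating exactly 70 as medium and exactly 90 as high is an assumption read from the description; cut() with right = FALSE makes each bin include its lower boundary.

```r
# Hypothetical toy data with the same column names as the simulated dataset
toy <- data.frame(
  participant_id = 1:6,
  reaction_time = c(250, NA, 320, 280, 400, 260),
  accuracy = c(92, 85, NA, 65, 75, 90),
  condition = c("Experimental", "Control", "Experimental",
                "Experimental", "Control", "Control")
)
# Step 1: drop every row that has a missing value
clean <- na.omit(toy)
# Step 2: Low (< 70), Medium (70 to < 90), High (>= 90);
# right = FALSE makes each interval closed on the left
clean$performance_category <- cut(clean$accuracy,
                                  breaks = c(-Inf, 70, 90, Inf),
                                  right = FALSE,
                                  labels = c("Low", "Medium", "High"))
# Step 3: Experimental participants faster than the overall mean RT
mean_rt <- mean(clean$reaction_time)
filtered <- subset(clean, condition == "Experimental" & reaction_time < mean_rt)
filtered$participant_id   # 1 4
```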
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
library(dplyr)
library(psych)
numeric_data <- clean_data %>%
select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
# Create a correlation plot
corPlot(cor(numeric_data, use = "pairwise.complete.obs"),
numbers = TRUE, # Display correlation values
upper = FALSE, # Show only lower triangle
main = "Correlation Plot of Key Variables")
## Error in plot.new(): figure margins too large
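The “figure margins too large” error usually means the graphics device is too small to fit the plot and its margins; in R Markdown, giving the chunk larger fig.width and fig.height options often resolves it. The correlations can also be checked numerically with cor(). The sketch below uses hypothetical data that copies the way the anxiety variables were simulated in Part 2 (the seed and the larger sample size of 500 are assumptions) and prints the rounded correlation matrix.

```r
set.seed(42)
n <- 500
# Re-create the structure of the simulation: post = pre minus a random reduction
condition <- sample(c("Control", "Experimental"), n, replace = TRUE)
anxiety_pre <- rnorm(n, mean = 25, sd = 8)
reduction <- ifelse(condition == "Experimental",
                    rnorm(n, mean = 8, sd = 3),
                    rnorm(n, mean = 3, sd = 2))
anxiety_post <- pmax(anxiety_pre - reduction, 0)
sim <- data.frame(
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  anxiety_pre = anxiety_pre,
  anxiety_post = anxiety_post,
  anxiety_change = anxiety_pre - anxiety_post
)
# Correlation matrix as numbers, readable even when a plot fails to render
r <- cor(sim, use = "pairwise.complete.obs")
print(round(r, 2))
```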
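There is a strong positive correlation between anxiety_pre and anxiety_post, which is expected because anxiety_post was simulated by subtracting a random reduction from anxiety_pre. By contrast, anxiety_change is essentially that random reduction, which was generated independently of anxiety_pre, so the correlation between anxiety_pre and anxiety_change should be close to zero. Likewise, reaction_time and accuracy were generated independently of each other, so any correlation between them reflects sampling variability rather than a real relationship. Because the plot did not render (figure margins too large), these interpretations should be checked against the numeric correlation values. In real data, examining correlations like these could help advance psychological research on how anxiety affects performance.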
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
1. A specific research question in psychology that interests me is “How does anxiety affect sleep deprivation, and what underlying psychological factors connect them?” To study this, I would collect data on how many hours participants sleep and self-reports of their anxiety levels using a survey. I would also conduct personal interviews to gain more insight. The statistical analyses I believe would be appropriate are descriptive statistics and correlation analysis. Potential measurement errors I would need to address include self-report bias, measurement instability (unreliability), and social desirability bias.
2. Learning R for data analysis has changed my understanding of psychological statistics by giving me a hands-on perspective on statistical concepts, along with customization, flexibility, and transparency. The biggest advantages of R compared to other statistical software are that it is free and open-source and that it offers powerful tools for data cleaning and data exploration. The main challenges are troubleshooting errors, a steep learning curve, and performance with larger datasets.
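The proposed analysis (descriptive statistics plus a correlation) could look roughly like the sketch below. The data are entirely hypothetical: the variable names, sample size, and the negative relationship built into the simulation are assumptions for illustration only, not expected findings. If anxiety ratings were treated as ordinal, cor.test() also accepts method = "spearman".

```r
set.seed(3)
n <- 100
# Hypothetical data: nightly sleep hours and a self-reported anxiety score.
# A negative relationship is built in purely for illustration.
sleep_hours <- rnorm(n, mean = 7, sd = 1.2)
anxiety_score <- 40 - 2 * sleep_hours + rnorm(n, sd = 4)
# Descriptive statistics
summary(sleep_hours)
summary(anxiety_score)
# Correlation analysis (Pearson's r with a significance test)
sleep_test <- cor.test(sleep_hours, anxiety_score)
sleep_test
```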
Be sure to knit your document to HTML format and check that all content displays correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.