Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Write your answer(s) here
Key Differences Between Nominal, Ordinal, Interval, and Ratio Data
Nominal: Data that represent categories with no meaningful order or ranking. Any numbers or labels used serve only to identify the category. Example: gender (Male, Female) or diagnostic category (e.g., ADHD, anxiety disorder).
Ordinal: Data that represent categories with a meaningful order, but the intervals between categories are not necessarily equal. Example: Likert scale ratings (Strongly agree, Agree, Neutral, Disagree, Strongly disagree).
Interval: Data with meaningful, equal intervals between values but no true zero point (i.e., zero does not indicate an absence of the attribute). Example: temperature measured in Celsius or Fahrenheit. A temperature of 0°C does not mean “no temperature.”
Ratio: Data with meaningful intervals and a true zero point, so that zero indicates an absence of the attribute. Example: height, weight, or reaction time (e.g., reaction time measured in milliseconds).
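To illustrate why ratio statements ("twice as much") require a true zero, the short R sketch below uses hypothetical values (10 and 20 degrees Celsius; 200 and 400 ms) chosen only for illustration. Converting Celsius to Fahrenheit changes the ratio between the two temperatures, whereas converting milliseconds to seconds leaves the reaction-time ratio unchanged.

```r
# Hypothetical temperatures (interval scale in Celsius and Fahrenheit)
celsius <- c(10, 20)
fahrenheit <- celsius * 9 / 5 + 32   # 50 and 68
kelvin <- celsius + 273.15           # Kelvin has a true zero

# The apparent ratio depends on where each scale puts its (arbitrary) zero
celsius[2] / celsius[1]         # 2
fahrenheit[2] / fahrenheit[1]   # 1.36
kelvin[2] / kelvin[1]           # about 1.035

# Hypothetical reaction times (ratio scale): the ratio survives a change of units
rt_ms <- c(200, 400)
rt_s <- rt_ms / 1000
rt_ms[2] / rt_ms[1]   # 2
rt_s[2] / rt_s[1]     # 2
```

Only the Kelvin ratio reflects a physically meaningful "how many times hotter," because only Kelvin starts at an absolute zero.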
Write your answer(s) here
Scores on a depression inventory (0-63): Interval. The scores have meaningful intervals, but there is no true zero point indicating the complete absence of depression.
Response time in milliseconds: Ratio. Response time has a meaningful zero point, indicating no time elapsed.
Likert scale ratings of agreement (1-7): Ordinal. There is an order to the ratings, but the intervals between them may not be equal.
Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis): Nominal. These are categories without any intrinsic ordering.
Age in years: Ratio. Age has a true zero point (birth) and meaningful intervals.
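In R, the nominal/ordinal distinction can be encoded directly: an unordered factor for nominal variables and an ordered factor for ordinal ones. The sketch below uses made-up responses (an assumption, not data from this exam). Order comparisons such as `>` work on the ordered factor; on an unordered factor, R returns NA with a warning that the comparison is not meaningful.

```r
# Hypothetical nominal variable: diagnostic category (no inherent order)
diagnosis <- factor(c("ADHD", "Anxiety disorder", "No diagnosis", "ADHD"))

# Hypothetical ordinal variable: agreement ratings with an explicit level order
agreement <- factor(
  c("Agree", "Neutral", "Strongly agree", "Disagree"),
  levels = c("Strongly disagree", "Disagree", "Neutral",
             "Agree", "Strongly agree"),
  ordered = TRUE
)

table(diagnosis)          # nominal: counts (and the mode) are meaningful
agreement > "Neutral"     # ordinal: order comparisons are meaningful
```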
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Write your answer(s) here
Difference Between Random and Systematic Error
Random Error: Error caused by unpredictable factors that vary in size and direction from one measurement to the next. It adds noise (variability) to the scores but does not push them consistently in one direction. Example: in a memory experiment, moment-to-moment fluctuations in a participant’s focus or fatigue produce variability in recall performance.
Systematic Error: Consistent, repeatable error that shifts measurements in the same direction every time, often because of a flaw in the measurement instrument or procedure. Example: if the timer for a memory test runs fast, every participant receives less time than intended, so recall scores are consistently underestimated.
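A small simulation can make the distinction concrete. In the sketch below, the true recall scores, the random-error SD of 3, and the constant 2-item shift standing in for a mis-timed test are all hypothetical values chosen for illustration. Random error leaves the average deviation from the true score near zero but adds spread; systematic error moves every score by the same amount.

```r
set.seed(42)
n <- 10000
true_score <- rnorm(n, mean = 20, sd = 4)   # hypothetical true recall scores

# Random error: mean zero, different for every participant
random_obs <- true_score + rnorm(n, mean = 0, sd = 3)

# Systematic error: the same downward shift for everyone (hypothetical 2 items)
systematic_obs <- true_score - 2

mean(random_obs - true_score)       # close to 0: noise, not bias
sd(random_obs - true_score)         # close to 3: extra spread
mean(systematic_obs - true_score)   # -2 (up to rounding): bias, no extra spread
```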
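Write your answer(s) here
Impact: Measurement error can undermine the study’s validity. Random error adds noise that weakens (attenuates) the observed correlation between stress and academic performance, while systematic error biases the scores themselves. For example, if stress is measured with a flawed or inconsistent scale, the observed relationship between stress and academic performance may misrepresent the true relationship.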
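The attenuation effect of random error can be sketched with simulated data. The values below (a true correlation of -0.5 between stress and performance, and a stress measure with reliability 0.5) are assumptions for illustration. Under classical test theory, the expected observed correlation is r_observed = r_true × sqrt(r_xx × r_yy); treating performance as error-free, that is -0.5 × sqrt(0.5 × 1) ≈ -0.35.

```r
set.seed(1)
n <- 5000
stress <- rnorm(n)                                         # hypothetical true stress (variance 1)
performance <- -0.5 * stress + rnorm(n, sd = sqrt(0.75))   # variance 1, true r = -0.5

# Observed stress = true stress + random error of equal variance (reliability 0.5)
stress_measured <- stress + rnorm(n, sd = 1)

cor(stress, performance)            # about -0.50
cor(stress_measured, performance)   # about -0.35, i.e. -0.5 * sqrt(0.5)
```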
Steps to Minimize Error:
Use validated and reliable measurement tools for both stress and academic performance.
Test all participants under consistent conditions to avoid systematic errors.
Use statistical techniques to assess measurement error (e.g., reliability analysis such as Cronbach’s alpha) and, where appropriate, to correct for it (e.g., correcting correlations for attenuation).
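Reliability analysis for a multi-item scale often means computing Cronbach’s alpha, alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score). The base-R sketch below implements that formula; the five simulated items and the helper name cronbach_alpha() are hypothetical. In practice, psych::alpha() reports the same coefficient along with item-level diagnostics.

```r
set.seed(7)
n <- 1000   # hypothetical number of respondents
k <- 5      # hypothetical number of stress items

trait <- rnorm(n)   # hypothetical latent stress level
# Each item = latent trait + its own random measurement error
items <- sapply(seq_len(k), function(i) trait + rnorm(n, sd = 1))

# Hypothetical helper implementing the Cronbach's alpha formula
cronbach_alpha <- function(items) {
  k <- ncol(items)
  item_vars <- apply(items, 2, var)    # variance of each item
  total_var <- var(rowSums(items))     # variance of the summed scale score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

cronbach_alpha(items)   # near 0.83, the value implied by this simulated structure
```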
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations:
library(psych)
library(dplyr)
# 1. Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition
data %>%
group_by(condition) %>%
summarise(
reaction_time_mean = mean(reaction_time, na.rm = TRUE),
reaction_time_median = median(reaction_time, na.rm = TRUE),
reaction_time_sd = sd(reaction_time, na.rm = TRUE),
reaction_time_min = min(reaction_time, na.rm = TRUE),
reaction_time_max = max(reaction_time, na.rm = TRUE),
accuracy_mean = mean(accuracy, na.rm = TRUE),
accuracy_median = median(accuracy, na.rm = TRUE),
accuracy_sd = sd(accuracy, na.rm = TRUE),
accuracy_min = min(accuracy, na.rm = TRUE),
accuracy_max = max(accuracy, na.rm = TRUE)
) # closes summarise()
2. Create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
data <- data %>%
mutate(anxiety_change = anxiety_pre - anxiety_post)
data %>%
group_by(condition) %>%
summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
Write your answer(s) here
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
# Required library
library(stats)
# 1. a. Calculate the probability that reaction time is greater than 450ms
probability_450ms <- 1 - pnorm(450, mean = 350, sd = 75)
# 1. b. Calculate the probability that reaction time is between 300ms and 400ms
probability_300_400ms <- pnorm(400, mean = 350, sd = 75) - pnorm(300, mean = 350, sd = 75)
probability_450ms
## [1] 0.09121122
probability_300_400ms
## [1] 0.4950149
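The same probabilities can be obtained by first converting each cutoff to a z-score, z = (x - mean) / SD, and then using the standard normal distribution. The mean of 350 ms and SD of 75 ms come from the code above; this sketch simply makes the intermediate z-scores visible.

```r
mu <- 350     # mean reaction time used in the question (ms)
sigma <- 75   # standard deviation used in the question (ms)

z_450 <- (450 - mu) / sigma   # 1.333
z_300 <- (300 - mu) / sigma   # -0.667
z_400 <- (400 - mu) / sigma   # 0.667

# Upper tail of the standard normal: same as 1 - pnorm(450, 350, 75)
p_above_450 <- pnorm(z_450, lower.tail = FALSE)
# Area between the two z-scores
p_300_400 <- pnorm(z_400) - pnorm(z_300)

round(p_above_450, 4)   # 0.0912
round(p_300_400, 4)     # 0.495
```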
Write your answer(s) here
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
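1. Remove all rows with missing values and save the result as clean_data.
# 1. Remove rows with missing values (drop_na() comes from tidyr, not dplyr)
library(tidyr)
clean_data <- data %>%
drop_na()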
2. Create a new variable performance_category that categorizes participants based on their accuracy:
# 2. Create a new variable 'performance_category'
clean_data <- clean_data %>%
mutate(performance_category = case_when(
accuracy >= 90 ~ "High",
accuracy >= 70 & accuracy < 90 ~ "Medium",
accuracy < 70 ~ "Low"
))
# 3. Keep only Experimental-condition participants whose reaction time is faster than the overall mean
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
filtered_data <- clean_data %>%
filter(condition == "Experimental" & reaction_time < mean_reaction_time)
Data Cleaning Process
In this part of the task, I performed several key steps to clean and manipulate the dataset using the dplyr and tidyr packages in R.
Removing Rows with Missing Values: The dataset contained missing values in the reaction time and accuracy columns. To address this, I used the drop_na() function from the tidyr package to remove every row with at least one missing value, so that all remaining participants had complete data for analysis.
clean_data <- data %>% drop_na()
After this step, clean_data is a new dataset that contains no rows with missing values.
Creating a New Variable performance_category: I created a new variable, performance_category, to categorize participants based on their accuracy:
“High” for accuracy greater than or equal to 90.
“Medium” for accuracy of at least 70 but less than 90.
“Low” for accuracy less than 70.
I used the mutate() function to create this variable, with case_when() defining the condition for each category:
clean_data <- clean_data %>%
mutate(performance_category = case_when(
accuracy >= 90 ~ "High",
accuracy >= 70 & accuracy < 90 ~ "Medium",
accuracy < 70 ~ "Low"
))
Filtering Data Based on Conditions: Next, I filtered the dataset to include only participants in the Experimental condition whose reaction times were faster than the overall mean reaction time of the cleaned dataset. I first calculated the mean reaction time and then used the filter() function to subset the data:
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
filtered_data <- clean_data %>% filter(condition == "Experimental" & reaction_time < mean_reaction_time)
This produced a subset of Experimental-condition participants whose reaction times were faster than the average participant’s.
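As a cross-check on the category boundaries, base R’s cut() can produce the same three categories. This is an alternative sketch, not the approach used above; the accuracy values are hypothetical and chosen to sit on and around the cut points. Setting right = FALSE makes each interval closed on the left, so 70 falls in Medium and 90 in High, matching the case_when() conditions.

```r
# Hypothetical accuracy values placed on and around the cut points
accuracy <- c(65, 69.9, 70, 85, 89.9, 90, 100.2)

performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE   # intervals are [lower, upper): 70 -> Medium, 90 -> High
)

table(performance_category)
```

Because the simulated accuracy scores are drawn from rnorm(), values above 100 (such as 100.16471 in row 6 of the dataset) can occur; both approaches place them in High.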
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# 1. Select numeric variables
numeric_data <- clean_data %>%
select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
# 2. Create the correlation plot
library(psych)
corPlot(cor(numeric_data, use = "complete.obs"), main = "Correlation Plot")
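# Interpretation of the plot:
# - A strong positive correlation between anxiety_pre and anxiety_post is expected, because post scores
#   were simulated as pre scores minus a relatively small random reduction.
# - reaction_time and accuracy were simulated independently, so their correlation here should be close to zero.
#   In real data they might correlate negatively, with faster (lower) reaction times accompanying higher accuracy.
# - These relationships can inform future research, such as exploring how reducing anxiety affects reaction time and accuracy in tasks.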
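The expected pattern can be checked with a simplified version of the Part 2 simulation. The sketch below reuses the Part 2 means and SDs but, as a simplifying assumption, applies a single reduction (mean 8, SD 3) to everyone and omits the zero floor. With a reduction SD of 3 and a pre-score SD of 8, the pre/post correlation should be close to 8 / sqrt(8^2 + 3^2) ≈ 0.94, while the independently generated reaction_time and accuracy should correlate near zero.

```r
set.seed(123)
n <- 5000   # larger than Part 2 so the estimates are stable

anxiety_pre <- rnorm(n, mean = 25, sd = 8)
# Simplified: one reduction distribution for everyone, no pmax() floor
anxiety_post <- anxiety_pre - rnorm(n, mean = 8, sd = 3)

reaction_time <- rnorm(n, mean = 300, sd = 50)   # generated independently
accuracy <- rnorm(n, mean = 85, sd = 10)         # generated independently

round(cor(anxiety_pre, anxiety_post), 2)   # about 0.94
round(cor(reaction_time, accuracy), 2)     # near 0
```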
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
Question 7: Reflection
Research Question in Psychology
Research Question: How does sleep deprivation affect cognitive performance?
Data to Collect: Reaction time, accuracy, and subjective sleepiness scores (e.g., on a Likert scale).
Appropriate Statistical Analyses: t-tests or ANOVA to compare cognitive performance between sleep-deprived and well-rested participants, and regression analysis to examine the relationship between sleepiness and performance.
Potential Measurement Errors: Self-reported sleepiness may be biased, and reaction time measurements may be influenced by distractions or test conditions.
Learning R for Data Analysis
Impact on Understanding: Learning R has helped me better understand the importance of data visualization and statistical analysis in psychological research. I now feel more confident handling complex datasets and performing analyses efficiently.
Advantages of R: R offers powerful packages and flexibility for statistical analysis and visualization.
Challenges: R’s steep learning curve can be intimidating, especially when troubleshooting errors or writing complex code.
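The t-test/ANOVA plan above can be sketched in R. The group sizes (30 per group), means (300 ms vs 340 ms), and SD (50 ms) below are hypothetical assumptions, not results. With a formula, t.test() runs a Welch two-sample test by default; aov() fits a one-way ANOVA, which extends naturally to three or more groups (for example, adding a partial-sleep group).

```r
set.seed(2024)
# Hypothetical reaction times (ms) for two groups
rested <- rnorm(30, mean = 300, sd = 50)
deprived <- rnorm(30, mean = 340, sd = 50)

sleep_data <- data.frame(
  group = rep(c("Rested", "Deprived"), each = 30),
  reaction_time = c(rested, deprived)
)

# Welch two-sample t-test (var.equal = FALSE is the default)
t_result <- t.test(reaction_time ~ group, data = sleep_data)
t_result$statistic
t_result$p.value

# One-way ANOVA on the same data
anova_result <- aov(reaction_time ~ group, data = sleep_data)
summary(anova_result)
```

With only two groups, the ANOVA F statistic equals the square of the Student t statistic from t.test(..., var.equal = TRUE), not the Welch t reported above.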
Be sure to knit your document to HTML and check that all content displays correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.