Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Nominal data is a form of data that is named and separated into categories with no inherent order; the values are labels only and cannot be ranked. An example from psychological research could be a study examining gender differences in hobbies, where the nominal variable is gender (e.g., male or female).
Ordinal data has a meaningful order, but the distances between values are not equal or not known. An example is a mental health assessment where questions ask you to respond with “Strongly agree, Agree slightly, Neutral, Disagree slightly, or Strongly disagree”.
Interval data is numeric data that has known, equal intervals between values but no true zero. An example could be an IQ test, where the difference between scores (like 125 and 130, and 130 and 135) is consistent, but 0 does not mean zero intelligence.
Last, ratio data is numeric data that has equal intervals and a true zero point. An example could be a psychological experiment that tracks how many errors participants make on instructed tasks. If a participant makes 0 errors, they have made truly zero errors. If one participant makes 2 errors and another makes 4, the second participant has made exactly 2 more errors, and twice as many, as the first.
Scores on a depression inventory (0-63): This is an example of interval data because scores on this inventory are numerical and have equal intervals between values, but a score of zero would not mean a total absence of depression.
Response time in milliseconds: This is an example of ratio data because response time is a numerical variable with equal intervals and 0 would truly mean zero time passed.
Likert scale ratings of agreement (1-7): This is an example of ordinal data because the numbers on the Likert scale represent ranked categories, like “Strongly disagree”, but the difference between adjacent categories is not necessarily equal or clear.
Diagnostic categories (e.g. ADHD, anxiety disorder, no diagnosis): This is an example of nominal data because the values are categories defined by diagnosis. There is no order, and one category doesn’t mean “more” than another.
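As a rough illustration of the four levels of measurement, the sketch below shows one way they might be represented in R. All variable names and values are made up for this example and are not part of the exam dataset.

```r
# Hypothetical values for illustration only (not from the exam dataset)

# Nominal: unordered categories
diagnosis <- factor(c("ADHD", "anxiety disorder", "no diagnosis", "ADHD"))

# Ordinal: ordered categories with unknown spacing between them
agreement <- factor(
  c("Agree", "Neutral", "Strongly agree", "Disagree"),
  levels = c("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"),
  ordered = TRUE
)

# Interval: equal spacing, no true zero
iq_score <- c(100, 115, 130, 85)

# Ratio: equal spacing and a true zero
response_time_ms <- c(250, 500, 320, 410)

# Nominal: only counts per category make sense
table(diagnosis)

# Ordinal: categories can be compared and ranked
agreement[1] > agreement[2]  # is "Agree" above "Neutral"?
max(agreement)

# Interval: differences are meaningful
iq_score[3] - iq_score[2]

# Ratio: ratios are meaningful
response_time_ms[2] / response_time_ms[1]
```

Storing ordinal responses as an ordered factor lets R compare and rank them while still refusing arithmetic such as taking a mean, which matches the idea that the spacing between categories is unknown.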
Referring to Chapter 3 (Measurement Errors in Psychological Research):
A random error is an error caused by unpredictable and uncontrollable factors that follow no pattern. It adds noise to the results of a study. An example could be a participant in a memory recall experiment being distracted by a beeping fire alarm in the room. It is an unaccounted-for factor that distorts that participant’s results.
A systematic error is a consistent, repeated error caused by a flaw in the study’s measurement system. It typically pushes results in the same direction every time. An example is a study that measures time in seconds with a handheld stopwatch that always runs a few milliseconds slow. This flaw in the measurement system biases every measurement in the study.
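The difference between the two kinds of error can be illustrated with a small simulation. This is only a sketch under assumed numbers: the true duration, the noise level, and the size of the bias are all hypothetical.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n_trials <- 1000
true_time <- rep(10, n_trials)  # hypothetical true duration in seconds

# Random error: unpredictable noise centered on zero
random_measured <- true_time + rnorm(n_trials, mean = 0, sd = 0.5)

# Systematic error: every reading is off by the same amount (a slow stopwatch)
bias <- 0.3  # hypothetical constant offset in seconds
systematic_measured <- true_time + bias

# Random error roughly averages out across many trials...
mean(random_measured) - 10

# ...but systematic error shifts every measurement in the same direction
mean(systematic_measured) - 10

# Random error adds spread; a constant bias adds none
sd(random_measured)
sd(systematic_measured)
```

Averaging over many trials helps with random error but does nothing for a constant bias, which is why systematic error has to be addressed by fixing or calibrating the measurement itself.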
In a study examining the relationship between stress and academic performance, measurement errors could affect the results in different ways. In my experience, surveys on this topic are emailed to the teacher, who distributes them during class time for students to complete on the spot. Students are often distracted by peers or worry about others seeing their answers. This would be classified as a random error. An example of a systematic error is framing questions in a black-and-white way, for example assessing only test anxiety and not other forms of anxiety or their sources. Researchers could minimize these errors by encouraging participants to complete the survey in a private, comfortable area free of distractions. Additionally, carefully crafting questions with input from people who actually experience test anxiety can improve the assessment’s accuracy.
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397
Now, perform the following computations:
## reaction_time accuracy
## 1 271.9762 87.53319
## 2 288.4911 84.71453
## 3 377.9354 84.57130
## 4 303.5254 98.68602
## 5 306.4644 82.74229
## 6 385.7532 100.16471
## 7 323.0458 69.51247
## 8 236.7469 90.84614
## 9 NA 86.23854
## 10 277.7169 87.15942
## 11 NA 88.79639
## 12 317.9907 79.97677
## 13 320.0386 81.66793
## 14 305.5341 74.81425
## 15 272.2079 74.28209
## 16 NA 88.03529
## 17 324.8925 89.48210
## 18 201.6691 85.53004
## 19 335.0678 94.22267
## 20 276.3604 105.50085
## 21 246.6088 80.08969
## 22 289.1013 61.90831
## 23 248.6998 95.05739
## 24 263.5554 77.90799
## 25 268.7480 78.11991
## 26 215.6653 95.25571
## 27 341.8894 82.15227
## 28 307.6687 72.79282
## 29 243.0932 86.81303
## 30 362.6907 NA
## 31 321.3232 85.05764
## 32 285.2464 88.85280
## 33 344.7563 81.29340
## 34 343.9067 91.44377
## 35 341.0791 82.79513
## 36 334.4320 88.31782
## 37 327.6959 95.96839
## 38 296.9044 89.35181
## 39 284.7019 81.74068
## 40 280.9764 96.48808
## 41 265.2647 94.93504
## 42 289.6041 90.48397
## 43 236.7302 NA
## 44 408.4478 78.72094
## 45 360.3981 98.60652
## 46 243.8446 78.99740
## 47 279.8558 106.87333
## 48 276.6672 100.32611
## 49 338.9983 82.64300
## 50 295.8315 74.73579
Create a new variable called anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
library(dplyr) # Needed for %>%, mutate(), group_by(), and summarise()
# Create anxiety_change and calculate mean by condition
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
# Mean anxiety change for each condition
data %>%
  group_by(condition) %>%
  summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
##   condition    mean_anxiety_change
##   <chr>                      <dbl>
## 1 Control                     3.79
## 2 Experimental                8.64
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
# Your code here
```# Given values
mean_rt <- 350
sd_rt <- 75
# a. Probability that reaction time > 450ms
p_greater_450 <- 1 - pnorm(450, mean = mean_rt, sd = sd_rt)
# b. Probability that reaction time is between 300ms and 400ms
p_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) - pnorm(300, mean = mean_rt, sd = sd_rt)
# Print the results
p_greater_450
p_between_300_400
**Write your answer(s) here**
---
## Part 3: Data Cleaning and Manipulation
### Question 5: Data Cleaning with dplyr
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
1. Remove all rows with missing values and create a new dataset called `clean_data`.
# install.packages("dplyr") # Run once in the console if dplyr is not installed
# Load the dplyr package
library(dplyr)
# Remove rows with missing values and create a new dataset called clean_data
clean_data <- na.omit(data)
# View the first few rows of the cleaned dataset
head(clean_data)
## Error in parse(text = input): attempt to use zero-length variable name
Create a new variable called performance_category that categorizes participants based on their accuracy:
library(dplyr)
clean_data <- na.omit(data) # assuming 'data' exists from part 2
# Create the performance_category variable based on accuracy
clean_data <- clean_data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High", # High if accuracy is >= 90
    accuracy >= 70 & accuracy < 90 ~ "Medium", # Medium if accuracy is at least 70 and below 90
    accuracy < 70 ~ "Low" # Low if accuracy is < 70
  ))
head(clean_data)
library(dplyr)
# Calculate the overall mean reaction time, excluding NA values
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
## Error: object 'clean_data' not found
# Filter the dataset: Experimental condition and reaction time faster than the mean
filtered_data <- clean_data %>%
  filter(condition == "Experimental" & reaction_time < mean_reaction_time)
## Error: object 'clean_data' not found
## Error: object 'filtered_data' not found
I removed missing values from the dataset using na.omit(), created a new variable (performance_category), and filtered the data by condition and reaction time.
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# install.packages("psych") # Run once in the console if psych is not installed
library(psych)
data$anxiety_change <- data$anxiety_pre - data$anxiety_post
cor_data <- data[, c("reaction_time", "accuracy", "anxiety_pre", "anxiety_post", "anxiety_change")]
corPlot(cor_data, upper = TRUE, main = "Correlation Plot for Selected Variables")
```
anxiety_pre and anxiety_post seem to be positively correlated. anxiety_change is negatively correlated with anxiety_post. If the plot shows higher anxiety going with slower reaction times, that pattern (which is correlational, not causal) could inform future experiments exploring how stress impacts cognitive performance.
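One way to check an interpretation like this numerically is to print the correlation matrix itself. The sketch below uses hypothetical stand-in variables rather than the exam data, so its numbers will not match the plot; it only shows the technique, with pairwise deletion assumed as the way to handle missing values.

```r
set.seed(1)  # arbitrary seed for a reproducible sketch

n <- 200
# Hypothetical stand-ins for the exam variables (not the actual exam data)
anxiety_pre    <- rnorm(n, mean = 25, sd = 8)
reduction      <- rnorm(n, mean = 5, sd = 3)
anxiety_post   <- anxiety_pre - reduction
anxiety_change <- anxiety_pre - anxiety_post

toy <- data.frame(anxiety_pre, anxiety_post, anxiety_change)
toy$anxiety_pre[c(3, 17)] <- NA  # a few missing values, as in the exam data

# Pairwise deletion keeps every complete pair instead of dropping whole rows
cor_mat <- cor(toy, use = "pairwise.complete.obs")
round(cor_mat, 2)
```

Reading the signs and sizes off the matrix makes it easier to state which pairs are positively or negatively related before describing the plot in words.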
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
A specific research question in psychology that interests me is: What factors make people more likely to lie in everyday social situations? To study this, I would collect quantitative data through self-report questionnaires, scales that measure personality traits, and a lab experiment that puts participants in a situation where lying benefits them so their behavior can be observed. Potential measurement errors could be bias from participants who lie to appear better socially or who behave differently because they know they are being studied.
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
Learning R for data analysis has changed my understanding of psychological studies because it allows me to apply statistical ideas in a real-world way. When running chunks, I can see results right away, giving me a better and more thorough understanding of the data. The biggest advantage is how customizable the workspace is, and the biggest challenge is running into errors in the code and having to correct them myself.
Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission. Publish your assignment to RPubs and submit the URL to Canvas.