Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Write your answer(s) here Nominal data consist of categories or labels that cannot be ranked or ordered. In psychological research, an example is the type of therapy a participant receives (for example, cognitive or humanistic therapy) in a study comparing the effectiveness of each. Ordinal data can be ranked and ordered, but the distances between ranks are not necessarily equal. An example is a Likert scale measuring a participant’s agreement with a statement, with choices ranging from “highly agree” to “highly disagree”. Interval data are ordered, with equal distances between values but no true zero point. An example is clock time, such as the time of day a session starts: hours are equally spaced, but 0:00 does not mean an absence of time. Lastly, ratio data have equal intervals and a true zero point. An example is reaction time, which measures how quickly someone responds to a stimulus, with zero meaning no elapsed time.
Write your answer(s) here The Likert scale rating of agreement (1-7) is ordinal data: responses can be ranked and ordered, but the distance between adjacent ratings is not guaranteed to be equal. Scores on a depression inventory (0-63) are also ordinal, since the total score ranks depression severity without guaranteeing that each point reflects the same change in severity. Response time in milliseconds is ratio data because it has equal intervals and a true zero point. Age in years is also ratio data: zero corresponds to birth, and the interval between five and six years old is the same as the interval between any other two consecutive ages. Lastly, diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis) are nominal data, since they are clear labels with no order or ranking.
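The four measurement scales described above can be sketched in R as follows. This is only an illustration: every value and variable name below is hypothetical and does not come from the exam dataset.

```r
# All values below are hypothetical examples, not exam data.

# Nominal: unordered labels, stored as a plain factor
therapy <- factor(c("Cognitive", "Humanistic", "Cognitive"))

# Ordinal: ranked labels, stored as an ordered factor
agreement <- factor(
  c("Agree", "Strongly disagree", "Neutral"),
  levels = c("Strongly disagree", "Disagree", "Neutral",
             "Agree", "Strongly agree"),
  ordered = TRUE
)

# Interval: equal spacing, no true zero (room temperature in Celsius)
temperature_c <- c(20.5, 22.0, 19.0)

# Ratio: equal spacing and a true zero (reaction time in milliseconds)
reaction_ms <- c(272, 288, 378)

# Ordered factors support rank comparisons; plain factors are not ordered
agreement[1] > agreement[2]
is.ordered(therapy)

# Differences are meaningful on an interval scale
diff(temperature_c)

# Ratios are meaningful only on a ratio scale
reaction_ms[3] / reaction_ms[1]
```

Storing ordinal responses as an ordered factor keeps R from treating the labels as numbers with equal spacing.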
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Write your answer(s) here Random error is unpredictable variation in measurements that happens due to unknown factors or chance. For example, in a memory experiment a participant may unknowingly misread one number on a list of numbers, leading to random variation in the results. Systematic error is a constant, repeatable mistake that affects all measurements in the same way, usually because of a flaw in the equipment or the overall experimental design. In a memory experiment, unclear directions could cause every participant to misunderstand the task in the same way, producing a systematic error in the results.
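The difference between the two error types can be illustrated with a small simulation. This is a sketch under stated assumptions: the true score of 12 recalled words, the noise level, and the bias of 3 words are all hypothetical numbers chosen for illustration.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility

n <- 1000
true_recall <- rep(12, n)  # hypothetical true score: 12 words recalled

# Random error: zero-mean noise that differs from trial to trial
random_only <- true_recall + rnorm(n, mean = 0, sd = 2)

# Systematic error: the same hypothetical bias (3 words too low) on every
# trial, for example from unclear instructions, plus the same random noise
biased <- true_recall - 3 + rnorm(n, mean = 0, sd = 2)

# Random error averages out, so the mean stays near the true score
mean(random_only)

# Systematic error shifts the mean by roughly the size of the bias
mean(biased)

# Both sets of scores show a similar spread
sd(random_only)
sd(biased)
```

Averaging over many trials reduces the effect of random error, but it does nothing to remove a systematic bias.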
Write your answer(s) here Measurement errors can affect a study examining the relationship between stress and academic performance by compromising the validity of the data, making it hard to draw sound conclusions. Faulty data can in turn mislead future research, clinical practice, and policy-making. What steps can researchers take to prevent this? It comes down to making sure everything is checked and recorded correctly each time. Having multiple people record the data helps, because if one of them makes a mistake it can be found easily. Researchers should also make sure that their measurement tools are effective and take every precaution to keep errors to a minimum by testing and reviewing their items. Instruments should likewise be inspected for faults so that problems are caught before they affect the data.
The code below creates a simulated data set for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations*:
## [1] 322.3333
## [1] 305
## [1] 47.89015
## [1] 386
## [1] 272
# Calculate mode
get_mode <- function(x) {
  uniqv <- unique(x)
  uniqv[which.max(tabulate(match(x, uniqv)))]
}
get_mode(reaction_times)
## [1] 272
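The code that produced the mean, median, standard deviation, maximum, and minimum printed above is not shown, and neither is the definition of reaction_times. A minimal sketch, assuming reaction_times holds the six reaction times from head(data) rounded to whole milliseconds (an assumption, although these six values are consistent with the printed results):

```r
# ASSUMPTION: reaction_times is not defined in the visible code. These are
# the six reaction times shown by head(data), rounded to whole milliseconds.
reaction_times <- c(272, 288, 378, 304, 306, 386)

mean(reaction_times)    # arithmetic mean
median(reaction_times)  # middle value (average of the two middle values)
sd(reaction_times)      # sample standard deviation (n - 1 denominator)
max(reaction_times)     # largest value
min(reaction_times)     # smallest value
```

Because every value in this vector occurs exactly once, a function that returns the first most frequent value, as get_mode() does, simply returns the first element, so the mode is not very informative here.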
Create a new variable anxiety_change that represents the difference between pre
and post anxiety scores (pre minus post). Then calculate the mean
anxiety change for each condition.
library(dplyr) # Provides %>%, mutate(), group_by(), and summarize()
# Creating a new variable using mutate()
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
# Calculate the mean anxiety change for each condition
data %>% group_by(condition) %>%
  summarize(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
Write your answer(s) here In the Control condition, the mean anxiety change was 3.79, meaning anxiety dropped by about 3.8 points from pre to post. In the Experimental condition, the mean anxiety change was 8.64, more than twice the reduction seen in the Control condition. This suggests that the experimental condition reduced anxiety more than the control condition did.
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
# Define parameters (named rt_mean and rt_sd to avoid masking mean() and sd())
rt_mean <- 350
rt_sd <- 75
# Calculate the probability of a reaction time greater than 450ms
prob_more_than <- 1 - pnorm(450, rt_mean, rt_sd)
print(paste("Probability of a reaction time greater than 450ms:", prob_more_than))
## [1] "Probability of a reaction time greater than 450ms: 0.0912112197258679"
# Calculate the probability of a reaction time between 300ms and 400ms
prob_between_300_and_400 <- pnorm(400, rt_mean, rt_sd) - pnorm(300, rt_mean, rt_sd)
print(paste("Probability of a reaction time between 300ms and 400ms:", round(prob_between_300_and_400, 3)))
## [1] "Probability of a reaction time between 300ms and 400ms: 0.495"
Write your answer(s) here For the first question, “What is the probability that a randomly selected participant will have a reaction time greater than 450ms?”, the probability is about 0.09. Only about 9% of participants are expected to respond this slowly, so these are relatively rare, slow responses. For the second question, “What is the probability that a participant will have a reaction time between 300ms and 400ms?”, the probability is about 0.495. Roughly half of participants are expected to fall in this range, which spans about two-thirds of a standard deviation on either side of the mean of 350ms.
—
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
Remove rows with missing values to create clean_data.
Create a new variable performance_category that
categorizes participants based on their accuracy:
data <- data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 & accuracy < 90 ~ "Medium",
    accuracy < 70 ~ "Low",
    TRUE ~ NA_character_
  ))
# Calculate the overall mean reaction time
mean_reaction_time <- mean(data$reaction_time, na.rm = TRUE)
# Filter the dataset for the Experimental condition and reaction times below the overall mean
filtered_data <- data %>%
  filter(condition == "Experimental", reaction_time < mean_reaction_time)
Write your answer(s) here describing your data cleaning process. To start, I added the comment “Remove rows with NA values and create ‘clean_data’” followed by “clean_data <- data %>% na.omit()”, which removed every row with a missing value and stored the result in a new dataset called ‘clean_data’. To categorize participants by their accuracy scores, I added a “performance_category” variable. After that, I filtered the data to participants in the Experimental condition whose reaction times were below the overall mean, which narrows the data for easier analysis. Overall, these steps matter because they organize the data before analysis.
—
## Part 4: Visualization and Correlation Analysis
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post anxiety_change performance_category
## 1 29.05312 2.248794 Medium
## 2 19.21510 11.937239 Medium
## 3 20.45306 7.204565 Medium
## 4 13.75199 3.180993 High
## 5 17.84736 6.197018 Medium
## 6 19.93397 2.822870 High
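The correlation plot step described above could look like the sketch below. It assumes the psych package is installed and uses a small stand-in data frame (demo, a hypothetical name) so that it runs on its own. The stand-in values are not the exam's data, and the plot title and missing-value positions are illustrative choices.

```r
set.seed(123)  # arbitrary seed for this stand-in data

# Stand-in data using the same numeric column names as the exam dataset.
# These values are NOT the exam's data.
n <- 50
demo <- data.frame(
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy      = rnorm(n, mean = 85, sd = 10),
  anxiety_pre   = rnorm(n, mean = 25, sd = 8)
)
demo$anxiety_post <- pmax(demo$anxiety_pre - rnorm(n, mean = 5, sd = 3), 0)
demo$reaction_time[c(5, 17, 40)] <- NA  # hypothetical missing values

# Correlations are only defined for numeric variables, so drop the rest
numeric_vars <- demo[sapply(demo, is.numeric)]

# Pairwise deletion uses every available pair of observations
cor_matrix <- cor(numeric_vars, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# Draw the plot only if the psych package is available
if (requireNamespace("psych", quietly = TRUE)) {
  psych::corPlot(cor_matrix, numbers = TRUE,
                 main = "Correlations (stand-in data)")
}
```

Categorical columns such as gender and condition are excluded here because Pearson correlations require numeric input. Comparing groups on a numeric variable calls for group means instead.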
Write your answer(s) here One surprising relationship was between pre and post anxiety for males: the pre score was already low compared with the females in the study, and the post score dropped even further. The variables I found to be related are gender and pre anxiety level: females tended to have much higher pre anxiety scores than their male counterparts. How might these correlations inform further research in psychology? They suggest that females and males tend to have similar reaction times but that females tend to have higher pre anxiety than males. Patterns like these highlight gender differences and similarities that future research can examine.
—
## Part 5: Reflection and Application
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
Write your answer(s) here How common is OCD in the
United States? To answer this, I would first need the total number of
people living in the United States (the population) as the base for a
percentage. From researchers and psychologists, I would then collect the
number of people reported as diagnosed with OCD. The appropriate
analysis would be descriptive: the percentage of the population with an
OCD diagnosis. To measure symptom severity I would use the “Yale-Brown
Obsessive Compulsive Scale”, a ten-item measure of the severity of
obsessive-compulsive symptoms regardless of symptom presentation that is
considered the “gold standard” outcome measure in the OCD literature.
Potential measurement errors include incorrect diagnoses that were never
re-evaluated before entering the data. Another is people who may have
OCD but were never evaluated by a professional, so their data go
unaccounted for. Both make the result an estimate, not an exact count,
of the number of people in America with Obsessive Compulsive Disorder.
Learning R has opened my eyes to the many ways people approach data
analysis in psychological statistics. Before this, I never really knew
the names of the different types of psychological statistics, and now I
have much more appreciation for this work! With R, it is much easier to
compile everything I need in one place than with other software. I
don’t need to use multiple apps to run my code, which makes it less
stressful and a great overall tool. I also love that it flags my
spelling mistakes, since I have difficulty in that area. One drawback is
that I only see the final product after “knitting” the whole document,
so even when I look over my project I usually still find errors after
I have knitted my work.
—
Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission. Publish your assignment to RPubs and submit the URL to Canvas.