Take-Home Midterm Exam: Introductory Psychological Statistics

Replace “Your Name” with your actual name.

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

Write your answer(s) here Nominal data consist of categories or labels that have no inherent order or rank. In psychological research, an example is the type of therapy a participant receives (e.g., cognitive or humanistic) in a study comparing the effectiveness of each. Ordinal data can be ranked and ordered, but the distances between ranks are not necessarily equal. An example is a Likert item measuring a participant's agreement with a statement, with choices ranging from “strongly agree” to “strongly disagree”. Interval data are ordered, with equal distances between adjacent values, but they have no true zero point. An example is temperature in degrees Celsius recorded in a study of how room temperature affects mood: each degree is the same size, but 0 °C does not mean an absence of temperature. (Elapsed time in seconds, by contrast, has a true zero and is therefore ratio data.) Lastly, ratio data have equal intervals and a true zero point, so statements such as “twice as long” are meaningful. An example is reaction time, which measures how quickly someone responds to a stimulus; zero represents no elapsed time.

For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
- Scores on a depression inventory (0-63)
- Response time in milliseconds
- Likert scale ratings of agreement (1-7)
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
- Age in years

Write your answer(s) here Likert scale ratings of agreement (1-7) are ordinal: the responses can be ranked, but the distance between adjacent ratings is not guaranteed to be equal. Scores on a depression inventory (0-63) are also best described as ordinal, because the total score ranks depression severity without guaranteeing equal steps between scores (in practice, such totals are often treated as interval data). Response time in milliseconds is ratio data: it has equal intervals and a true zero point. Age in years is also ratio data: zero corresponds to birth, and the interval between five and six years old is the same as between any other two consecutive ages. Lastly, diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis) are nominal: they are labels with no inherent order or ranking.
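The levels of measurement map loosely onto R's data types. The sketch below is only an illustration: the values and variable names are made up and are not part of the exam dataset.

```r
# Hypothetical values for illustration only (not from the exam dataset)

# Nominal: unordered categories -> plain factor
diagnosis <- factor(c("ADHD", "Anxiety disorder", "No diagnosis", "ADHD"))

# Ordinal: ranked categories -> ordered factor
agreement <- factor(c(2, 7, 5, 5), levels = 1:7, ordered = TRUE)

# Ratio: equal intervals and a true zero -> numeric
response_ms <- c(412.5, 388.0, 530.2, 301.9)

# Ordered factors support rank comparisons; unordered factors do not
agreement[1] < agreement[2]      # TRUE: a rating of 2 ranks below a rating of 7
is.ordered(diagnosis)            # FALSE: diagnoses have no ranking

# Ratios are only interpretable for ratio data
response_ms[3] / response_ms[4]  # how many times slower response 3 was than response 4
```

Storing ordinal variables as ordered factors is one common choice; many analyses instead treat Likert ratings as numeric, which assumes roughly equal spacing between ratings.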

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

Write your answer(s) here Random error is unpredictable variation in measurements that happens due to unknown factors or chance. In a memory experiment, for example, a participant might accidentally misread one number on a list of numbers, producing a random variation in that participant's results. Systematic error, in contrast, is a constant and repeatable error that affects all measurements in the same way, usually because of a flaw in the equipment or in the overall experimental design. In a memory experiment, for example, unclear instructions might lead every participant to misunderstand the task in the same way, shifting all of the results in the same direction.
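The distinction can be illustrated with a small simulation. This is a rough sketch under assumed numbers: the true recall score of 12, the bias of 3 words, and the noise level are all hypothetical.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical "true" recall score (words recalled out of 20) for 1000 participants
true_score <- rep(12, 1000)

# Random error: unpredictable noise centred on zero
random_only <- true_score + rnorm(1000, mean = 0, sd = 2)

# Systematic error: the same bias applied to everyone
# (e.g., unclear instructions cost every participant about 3 words)
biased <- true_score - 3 + rnorm(1000, mean = 0, sd = 2)

mean(random_only)  # close to 12: random error tends to average out
mean(biased)       # close to 9: the bias does not average out
```

Collecting more data shrinks the effect of random error on the mean, but it does nothing to remove a systematic bias.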

How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

Write your answer(s) here Measurement error can threaten the validity of a study of stress and academic performance because the recorded scores no longer reflect participants' actual stress or performance. Random error adds noise that tends to weaken the observed relationship, while systematic error can bias scores in one direction, so it becomes hard to draw sound conclusions. Faulty conclusions can in turn mislead future research, clinical practice, and policy-making. To minimize these errors, researchers should check and record every measurement carefully and consistently; having more than one person collect or verify the data makes it easier to catch an individual mistake. They should also use measurement tools that have been shown to be effective, pilot test or review their items, and inspect instruments for faults before use.
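One way to see how random measurement error can weaken an observed relationship is a short simulation. This is a sketch under assumed values: the true correlation of about -0.5 and the amount of noise are hypothetical, not estimates from any real study.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible
n_sim <- 5000

# Hypothetical standardized "true" stress scores
stress_true <- rnorm(n_sim)

# Assumed true relationship: higher stress goes with lower performance (r about -0.5)
performance <- -0.5 * stress_true + rnorm(n_sim, sd = sqrt(1 - 0.5^2))

# Observed stress = true stress + random measurement error
stress_noisy <- stress_true + rnorm(n_sim, sd = 1)

cor(stress_true, performance)   # near -0.5
cor(stress_noisy, performance)  # weaker (closer to 0) because of the added noise
```

The noisier the stress measure, the closer the observed correlation drifts toward zero, even though the underlying relationship has not changed.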

Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated data set for a psychological experiment. Run the below code chunk without making any changes:

# Create a simulated dataset
set.seed(123)  # For reproducibility

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # We'll fill this in based on condition
)

# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View the first few rows of the dataset
head(data)

##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397

Now, perform the following computations:

Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition (hint: use the psych package).

# Sample accuracy values (first six rows of the dataset, rounded)
accuracy <- c(87.5, 84.7, 84.6, 98.7, 82.7, 100.2)

# Sample reaction times in ms (first six rows of the dataset, rounded)
reaction_times <- c(272, 288, 378, 304, 306, 386)

# Calculate mean
mean(reaction_times)

## [1] 322.3333

# Calculate median
median(reaction_times)

## [1] 305

# Calculate standard deviation
sd(reaction_times)

## [1] 47.89015

#Calculate maximum
max(reaction_times)

## [1] 386

#Calculate minimum
min(reaction_times)

## [1] 272

# Calculate mode (every value here is unique, so the first value is returned)
get_mode <- function(x) {
  uniqv <- unique(x)
  uniqv[which.max(tabulate(match(x, uniqv)))]
}

get_mode(reaction_times)

## [1] 272
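The task asks for statistics on the full dataset, grouped by condition. One possible way to do that is sketched below. The block first recreates the simulated dataset with the same code as the exam chunk so that it runs on its own; `five_stats` is a hypothetical helper name, and the `psych::describeBy()` call runs only if the psych package is installed.

```r
# Recreate the exam's simulated dataset (same code as the chunk above)
set.seed(123)
n <- 50
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA
)
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)
)
data$anxiety_post <- pmax(data$anxiety_post, 0)
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# Hypothetical helper: the five requested statistics, ignoring missing values
five_stats <- function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE), min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

# One row of statistics per condition, for each outcome
rt_by_condition <- t(sapply(split(data$reaction_time, data$condition), five_stats))
acc_by_condition <- t(sapply(split(data$accuracy, data$condition), five_stats))
print(rt_by_condition)
print(acc_by_condition)

# The psych package reports these statistics (and more) in a single call
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(data[, c("reaction_time", "accuracy")],
                          group = data$condition))
}
```

The base-R version is shown first only so the sketch does not depend on an installed package; `describeBy()` is the approach the hint points to.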

Using dplyr and piping, create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.

# Load dplyr for the pipe (%>%), mutate(), group_by(), and summarize()
library(dplyr)

# Create a new variable using mutate()
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)

# Calculate the mean anxiety change for each condition
data %>%
  group_by(condition) %>%
  summarize(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))

## # A tibble: 2 × 2
##   condition    mean_anxiety_change
##   <chr>                      <dbl>
## 1 Control                     3.79
## 2 Experimental                8.64

Write your answer(s) here In the Control condition, the mean anxiety change was about 3.8 points. In the Experimental condition, it was about 8.6 points. Because the change is computed as pre minus post, larger values indicate a greater reduction in anxiety, so the Experimental group showed more than twice the average reduction of the Control group.

Question 4: Probability Calculations

Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):

If reaction times in a cognitive task are normally distributed with a mean of 350ms and a standard deviation of 75ms:
1. What is the probability that a randomly selected participant will have a reaction time greater than 450ms?
2. What is the probability that a participant will have a reaction time between 300ms and 400ms?

# Define parameters (named rt_mean and rt_sd so they don't mask R's mean() and sd())
rt_mean <- 350
rt_sd <- 75

# Calculate the probability of a reaction time greater than 450ms
prob_more_than <- 1 - pnorm(450, rt_mean, rt_sd)
print(paste("Probability of a reaction time greater than 450ms:", prob_more_than))

## [1] "Probability of a reaction time greater than 450ms: 0.0912112197258679"

# Calculate the probability of a reaction time between 300ms and 400ms
prob_between_300_and_400 <- pnorm(400, rt_mean, rt_sd) - pnorm(300, rt_mean, rt_sd)
print(paste("Probability of a reaction time between 300ms and 400ms:", round(prob_between_300_and_400, 3)))

## [1] "Probability of a reaction time between 300ms and 400ms: 0.495"

Write your answer(s) here For the first question, “What is the probability that a randomly selected participant will have a reaction time greater than 450ms?”, the probability is about 0.09. Only about 9% of participants would be expected to exceed 450ms, which is roughly 1.33 standard deviations above the mean. For the second question, “What is the probability that a participant will have a reaction time between 300ms and 400ms?”, the probability is about 0.50, because 300ms and 400ms lie about 0.67 standard deviations below and above the mean. Roughly half of participants would be expected to fall in that range.

Part 3: Data Cleaning and Manipulation

Question 5: Data Cleaning with dplyr

Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:

Remove all rows with missing values and create a new dataset called clean_data.

# Remove rows with NA values and create 'clean_data'
clean_data <- data %>% na.omit()

Create a new variable performance_category that categorizes participants based on their accuracy:
- “High” if accuracy is greater than or equal to 90
- “Medium” if accuracy is between 70 and 90
- “Low” if accuracy is less than 70

data <- data %>% 
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 & accuracy < 90 ~ "Medium",
    accuracy < 70 ~ "Low",
    TRUE ~ NA_character_ ))
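As a cross-check on the boundary handling, the same three categories can be produced with base R's `cut()`. This is a sketch on made-up values; `accuracy_demo` and `performance_demo` are hypothetical names and are not part of the exam dataset.

```r
# Hypothetical accuracy scores, including boundary values and a missing value
accuracy_demo <- c(65.2, 70, 89.9, 90, 101.3, NA)

# right = FALSE closes each interval on the left: [-Inf, 70), [70, 90), [90, Inf)
performance_demo <- cut(
  accuracy_demo,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE
)

# Show each score next to its category
data.frame(accuracy_demo, performance_demo)
```

With these breaks, a score of exactly 70 is “Medium” and a score of exactly 90 is “High”, matching the `case_when()` rules above, and a missing accuracy value stays missing.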

Filter the dataset to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time.

# Calculate the overall mean reaction time
mean_reaction_time <- mean(data$reaction_time, na.rm = TRUE)

#Filter the dataset for the Experimental condition and reaction times
filtered_data <- data %>% 
  filter(condition == "Experimental", reaction_time < mean_reaction_time)

Write your answer(s) here describing your data cleaning process. First, I removed every row containing a missing value with na.omit() and stored the result in a new dataset called clean_data. Next, I used mutate() with case_when() to add a performance_category variable that labels each participant's accuracy as High, Medium, or Low. Finally, I computed the overall mean reaction time and filtered the data to keep only participants in the Experimental condition whose reaction times were faster (lower) than that mean. These steps matter because they organize the data and make later analyses easier.

Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package

Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
Use the psych package’s corPlot() function to create a correlation plot.
Interpret the resulting plot by addressing:
- Which variables appear to be strongly correlated?
- Are there any surprising relationships?
- How might these correlations inform further research in psychology?

clean_data <- data %>% na.omit()
head(data)

##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post anxiety_change performance_category
## 1     29.05312       2.248794               Medium
## 2     19.21510      11.937239               Medium
## 3     20.45306       7.204565               Medium
## 4     13.75199       3.180993                 High
## 5     17.84736       6.197018               Medium
## 6     19.93397       2.822870                 High
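A correlation plot along the lines the question describes might look like the sketch below. To keep the block runnable on its own, it uses stand-in data with the same column names rather than the exam dataset, so the resulting correlations will differ from those in the real data. The plot is drawn only if the psych package is installed.

```r
set.seed(42)  # arbitrary seed; stand-in data, NOT the exam dataset
n_demo <- 50
demo <- data.frame(
  reaction_time = rnorm(n_demo, mean = 300, sd = 50),
  accuracy      = rnorm(n_demo, mean = 85, sd = 10),
  anxiety_pre   = rnorm(n_demo, mean = 25, sd = 8)
)
demo$anxiety_post   <- pmax(demo$anxiety_pre - rnorm(n_demo, mean = 5, sd = 3), 0)
demo$anxiety_change <- demo$anxiety_pre - demo$anxiety_post

# Correlations among the numeric variables, using all available pairs
cor_matrix <- cor(demo, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# Draw the correlation plot only if psych is available
if (requireNamespace("psych", quietly = TRUE)) {
  psych::corPlot(cor_matrix, numbers = TRUE,
                 main = "Correlations (stand-in data)")
}
```

To apply this to the exam data, the stand-in data frame would be replaced by the five numeric columns of `data`. Categorical variables such as gender and condition are left out because a Pearson correlation plot expects numeric input.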

Write your answer(s) here One surprising relationship was the male participant's anxiety before versus after. Among the rows displayed above, his pre anxiety score was already low compared with the female participants' scores, and his post score dropped even further. The variables I find related are gender and pre anxiety level: the females tended to have much higher pre anxiety scores than their male counterpart. How might these relationships inform further research in psychology? They suggest that females have reaction times similar to males but higher pre anxiety, which points to gender differences and similarities that future research could examine. Because these impressions come from only the first six rows, they would need to be checked against the full dataset.

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?

Write your answer(s) here A research question that interests me is: how common is OCD in the United States? I would need the size of the U.S. population as a baseline, so that the number of cases can be expressed as a percentage of the population. From researchers and psychologists, I would collect the number of people they report as having been diagnosed with OCD. To measure symptom severity, I would use the Yale-Brown Obsessive Compulsive Scale, a ten-item measure of the severity of obsessive-compulsive symptoms regardless of symptom presentation, which is considered the “gold standard” outcome measure in the OCD literature. Potential measurement errors include incorrect diagnoses that were never reevaluated and still entered the data, as well as people who have OCD but were never evaluated by a professional, so their cases go uncounted. Together, these errors mean the result would be only a rough estimate of the number of people in America with obsessive-compulsive disorder.

Learning R has opened my eyes to the many ways people approach data analysis in psychological statistics. Before this course I never really knew the names of the different types of psychological statistics, and now I have much more appreciation for this work. Compared with other software, R makes it much easier to keep everything I need in one place: I don't have to switch between multiple apps to run my code, which makes the work less stressful. I also love that it flags my spelling mistakes, since I have difficulty in that area. A drawback for me is that I can only see the final product by knitting the whole document, so I often find errors only after I have already knitted my work.

Submission Instructions:

Be sure to knit your document to HTML format, checking that all content is displayed correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.