Take-Home Midterm Exam: Introductory Psychological Statistics

Replace “Your Name” with your actual name.

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

Nominal data are unordered, mutually exclusive categories, e.g., gender. Ordinal data are mutually exclusive categories with a meaningful order but without guaranteed equal spacing, e.g., rated level of pain. Interval data are measured on a scale with equidistant points but no true zero, e.g., temperature in Celsius. Ratio data are like interval data but also have a meaningful true zero, e.g., height and weight.

For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
- Scores on a depression inventory (0-63)
- Response time in milliseconds
- Likert scale ratings of agreement (1-7)
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
- Age in years

“Scores on a depression inventory (0-63)” are usually treated as interval: the scores are ordered and summed on a numeric scale, but a score of 0 does not guarantee a complete absence of depression, so the zero is not a true zero. “Response time in milliseconds” is ratio, as it is ordered, has equal intervals, and has a meaningful zero (0 ms means no time elapsed). “Likert scale ratings of agreement (1-7)” are ordinal, as there is no meaningful zero point and the steps are not guaranteed to be equidistant: the gap from 2 to 3 may not be the same as the gap from 6 to 7. “Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)” are nominal, as they are unordered and mutually exclusive. “Age in years” is ratio, as it has an order, equidistant measurement, and a meaningful zero point.
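
As a rough illustration of how these levels of measurement map onto R data types, the sketch below uses made-up values (not taken from the exam dataset). The variable names are hypothetical, and the point is only that each level supports different operations.

```r
# Hypothetical example values, not taken from the exam dataset
diagnosis <- factor(c("ADHD", "anxiety disorder", "no diagnosis", "ADHD"))  # nominal
agreement <- factor(c(2, 7, 5, 5), levels = 1:7, ordered = TRUE)            # ordinal
response_ms <- c(412, 388, 530, 467)                                         # ratio

# Nominal: only counting categories is meaningful
table(diagnosis)

# Ordinal: order comparisons and the middle category are meaningful
agreement[1] < agreement[2]
median(as.integer(agreement))  # level codes equal the ratings because levels are 1:7

# Ratio: means and ratios are meaningful
mean(response_ms)
response_ms[3] / response_ms[1]
```

An unordered factor refuses `<` comparisons (R returns NA with a warning), which mirrors the idea that nominal categories have no order.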

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

Random error: an unpredictable variation in data due to human error or chance circumstances, like misreading a piece of data while recording it. Systematic error: a consistent, predictable error that shifts every measurement in the same direction, like a poorly calibrated scale.
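
The distinction can be sketched with a small simulation of a memory experiment. All numbers here are assumptions for illustration: a hypothetical true recall score of 12 words, random noise with a standard deviation of 2, and a scorer who consistently misses 3 words.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

true_recall <- rep(12, 1000)  # hypothetical true score: 12 words recalled

# Random error: noise centered on zero (e.g., momentary lapses in scoring)
random_only <- true_recall + rnorm(1000, mean = 0, sd = 2)

# Systematic error: a constant bias (e.g., a scorer who always misses 3 words)
systematic_only <- true_recall - 3

mean(random_only) - 12      # should be near 0: random error tends to average out
sd(random_only)             # should be near 2: random error adds spread
mean(systematic_only) - 12  # the bias does not average out
sd(systematic_only)         # a constant bias adds no spread
```

Under these assumptions, collecting more observations helps with random error but does nothing for systematic error, which has to be fixed at the source.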

How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

Measurement error could affect validity because the people recording data may make mistakes, and unreliable measures of stress or academic performance can skew the data and distort the observed relationship. To prevent this, researchers could use multiple recorders at once, or record sessions on camera so the data can be reviewed, in order to reduce error and increase the validity of the data.

Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated dataset for a psychological experiment. Run the code chunk below without making any changes:

# Create a simulated dataset
set.seed(123)  # For reproducibility

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # We'll fill this in based on condition
)

# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View the first few rows of the dataset
head(data)

##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397

Now, perform the following computations:

Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition (hint: use the psych package).

# Your code here
library(psych)  # provides describeBy()
# Descriptive statistics for accuracy, grouped by condition
describeBy(data$accuracy, data$condition, mat = TRUE, digits = 2)

##     item       group1 vars  n  mean   sd median trimmed  mad   min    max range
## X11    1      Control    1 29 85.49 9.86  85.53   85.68 8.77 61.91 105.50 43.59
## X12    2 Experimental    1 19 88.06 8.20  88.32   87.76 9.86 74.28 106.87 32.59
##      skew kurtosis   se
## X11 -0.15    -0.35 1.83
## X12  0.45    -0.45 1.88
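
The question asks for reaction time as well as accuracy, and the output above covers accuracy only. A minimal base-R sketch of the same grouped summary is shown below on a hypothetical toy data frame. The helper name `describe_basic` and the toy values are assumptions for illustration, not part of the exam.

```r
# Hypothetical toy data; in the exam this would be the `data` frame created above
toy <- data.frame(
  condition = c("Control", "Control", "Control", "Experimental", "Experimental"),
  reaction_time = c(310, NA, 290, 275, 325)
)

# Summary statistics that ignore missing values
describe_basic <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

# Apply the summary to reaction_time within each condition
rt_summary <- sapply(split(toy$reaction_time, toy$condition), describe_basic)
round(rt_summary, 2)
```

With psych loaded, the equivalent for the exam data would presumably mirror the accuracy call, with data$reaction_time as the first argument to describeBy().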

Using dplyr and piping, create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.

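# Your code here
library(dplyr)  # provides %>%, mutate(), group_by(), summarise()

# Pre minus post, so positive values mean anxiety decreased
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)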

data %>%
  group_by(condition) %>%
  summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))

## # A tibble: 2 × 2
##   condition    mean_anxiety_change
##   <chr>                      <dbl>
## 1 Control                     3.79
## 2 Experimental                8.64

The mean anxiety change (pre minus post) is 3.79 for the Control condition and 8.64 for the Experimental condition, so anxiety dropped more in the Experimental condition.

Question 4: Probability Calculations

Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):

If reaction times in a cognitive task are normally distributed with a mean of 350ms and a standard deviation of 75ms:
1. What is the probability that a randomly selected participant will have a reaction time greater than 450ms?
2. What is the probability that a participant will have a reaction time between 300ms and 400ms?

# Your code here
mean_rt <- 350  
sd_rt <- 75     

# (a) Probability of reaction time > 450ms
p_greater_450 <- 1 - pnorm(450, mean = mean_rt, sd = sd_rt)

# (b) Probability of reaction time between 300ms and 400ms
p_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) - pnorm(300, mean = mean_rt, sd = sd_rt)


p_greater_450

## [1] 0.09121122

The probability of a reaction time greater than 450 ms is about .09, and the probability of a reaction time between 300 ms and 400 ms is about .495 (roughly .50).
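
As a cross-check, the same probabilities can be worked through z-scores on the standard normal distribution. This is a sketch of that route. The variable names `z_450`, `p_upper`, and so on are introduced here for illustration.

```r
mean_rt <- 350
sd_rt <- 75

# Convert the raw cutoffs to z-scores: z = (x - mean) / sd
z_450 <- (450 - mean_rt) / sd_rt
z_300 <- (300 - mean_rt) / sd_rt
z_400 <- (400 - mean_rt) / sd_rt

# Upper-tail probability for reaction times above 450 ms
p_upper <- pnorm(z_450, lower.tail = FALSE)

# Area between the two z-scores for reaction times from 300 ms to 400 ms
p_middle <- pnorm(z_400) - pnorm(z_300)

round(c(p_upper, p_middle), 3)
```

Because 300 ms and 400 ms sit the same distance from the mean, the two z-scores are mirror images of each other, which is why the middle area is symmetric around 350 ms.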

Part 3: Data Cleaning and Manipulation

Question 5: Data Cleaning with dplyr

Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:

Remove all rows with missing values and create a new dataset called clean_data.

# Your code here
clean_data <- na.omit(data)

Create a new variable performance_category that categorizes participants based on their accuracy:
- “High” if accuracy is greater than or equal to 90
- “Medium” if accuracy is between 70 and 90
- “Low” if accuracy is less than 70

# Your code here
# case_when() checks conditions in order, so each row gets the first match
clean_data <- clean_data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 ~ "Medium",
    TRUE ~ "Low"
  ))

Filter the dataset to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time.

# Your code here
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)

filtered_data <- clean_data %>%
  filter(condition == "Experimental" & reaction_time < mean_reaction_time)

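I first cleaned the data by removing rows with missing values. I then created a new variable (performance_category) that classifies participants by accuracy: 90 and above is High, 70 up to (but not including) 90 is Medium, and below 70 is Low. Finally, I filtered for participants in the Experimental condition whose reaction times were faster than (below) the overall mean reaction time.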
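
The accuracy cutoffs described above can also be expressed with base R's cut(). The sketch below uses hypothetical scores chosen to sit on and around the boundaries, so the handling of exactly 70 and exactly 90 is visible.

```r
# Hypothetical accuracy scores chosen to sit on and around the cutoffs
accuracy <- c(65, 70, 89.9, 90, 100)

# right = FALSE makes each interval closed on the left: [70, 90) and [90, Inf)
performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE
)

data.frame(accuracy, performance_category)
```

With the default right = TRUE, a score of exactly 90 would fall into "Medium" instead, so the argument matters for matching the "greater than or equal to 90" rule.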

Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package

Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
Use the psych package’s corPlot() function to create a correlation plot.
Interpret the resulting plot by addressing:
- Which variables appear to be strongly correlated?
- Are there any surprising relationships?
- How might these correlations inform further research in psychology?

# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variable (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
numeric_data <- clean_data %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)

corPlot(cor(numeric_data, use = "pairwise.complete.obs"), 
        numbers = TRUE,  # Display correlation values
        upper = FALSE,   # Show only lower triangle
        main = "Correlation Plot of Key Variables")

## Error in plot.new(): figure margins too large

anxiety_pre and anxiety_post appear to be strongly correlated, as do anxiety_change and anxiety_pre. Reaction time and accuracy show a surprising lack of correlation. These correlations may give better insight into how the nervous system responds under anxiety in situations that demand quick reactions.
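
Since the plotting call above stopped with a figure-margins error, one way to still inspect the relationships is to print the correlation matrix itself. The sketch below does this on hypothetical toy data standing in for the exam variables. The seed, sample size, and the `reduction` column are assumptions for illustration only.

```r
# Hypothetical toy data standing in for the exam's numeric variables
set.seed(42)  # arbitrary seed
toy <- data.frame(
  anxiety_pre = rnorm(30, mean = 25, sd = 8),
  reduction = rnorm(30, mean = 5, sd = 3)
)
toy$anxiety_post <- toy$anxiety_pre - toy$reduction
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post

# Pairwise Pearson correlations, rounded for reading
cor_matrix <- round(
  cor(toy[, c("anxiety_pre", "anxiety_post", "anxiety_change")],
      use = "pairwise.complete.obs"),
  2
)
cor_matrix
```

Reading the numbers directly also makes it easier to back up statements such as "strongly correlated" with a specific value of r.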

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?

(1.) One research question that interests me in psychology is whether gaming experience is related to technical skills like hand-eye coordination. I would test people who play a lot of video games on exercises or tasks that demand these skills and compare their performance with that of the average person. One source of error is prior skill: for example, if coordination were tested by throwing and catching, a gamer with baseball experience could skew the data.

(2.) Learning R has shown me how much the way you record data and run experiments matters for getting the best results. It is a hard and thorough process. The advantage of R is computational: it allows data to be organized and analyzed. The disadvantage is the unintuitive and rather complex language and systems it employs, which are difficult for the average user.

Submission Instructions:

Be sure to knit your document to HTML format, checking that all content is displayed correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.