Replace “Your Name” with your actual name.

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

  1. Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

Write your answer(s) here The key difference is how much information each level carries. Nominal data are unordered categories, for example diagnosis type (e.g., anxiety vs. no anxiety). Ordinal data are ranked, for example a 1-10 rating scale, where the ranks are ordered but the gaps between them are not necessarily equal. Interval data have equal spacing but no true zero, for example an IQ score. Ratio data have equal spacing and a true zero, for example response time on a startle (flinch) test.
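As a quick illustration (a hedged sketch; the variable names below are made up and not part of the exam data), the four levels map onto R types like this:

# Hypothetical examples of each measurement level in R
diagnosis <- factor(c("ADHD", "anxiety", "none"))               # nominal: unordered labels
severity  <- ordered(c("mild", "severe", "moderate"),
                     levels = c("mild", "moderate", "severe"))  # ordinal: ranked categories
iq        <- c(95, 110, 120)   # interval: equal spacing, but no true zero
rt_ms     <- c(250, 310, 480)  # ratio: a true zero means "no time elapsed"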

  2. For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
    • Scores on a depression inventory (0-63)
    • Response time in milliseconds
    • Likert scale ratings of agreement (1-7)
    • Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
    • Age in years

Write your answer(s) here Depression score (0-63): interval data, because what matters is the difference between scores and the scale has no true zero point (a score of 0 does not mean a total absence of depression). Response time in milliseconds: ratio data, because the intervals are equal and there is a true zero, meaning no response at all. Likert scale rating (1-7): ordinal data, because the ratings are ordered but the spaces between points are not necessarily equal. Diagnostic categories: nominal data, because they are labels with no numerical meaning. Age in years: ratio data, because it has equal intervals and a true zero at birth.

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

  1. Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

Write your answer(s) here Random error happens by chance and produces small, unsystematic variations, for example trial-to-trial fluctuations in how fast someone responds on a memory test. Systematic error is a consistent bias in the measuring process, for example a stopwatch that always runs slow, which biases every recorded reaction time in the same direction.
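A small simulation makes the distinction concrete (a hedged sketch; the 500ms true score and error sizes are invented for illustration):

set.seed(42)
true_rt    <- 500                                          # participant's true reaction time (ms)
random_err <- true_rt + rnorm(10, mean = 0, sd = 20)       # random error: scatters around 500
system_err <- true_rt + 30 + rnorm(10, mean = 0, sd = 20)  # systematic error: constant +30ms bias
mean(random_err)  # averages out near 500
mean(system_err)  # stays biased upward by about 30ms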

  2. How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

Write your answer(s) here If stress levels are self-reported inconsistently from one occasion to the next, random error could make the data unreliable and weaken any observed relationship. If academic performance is measured with a biased grading system, systematic error could make stress look more or less influential than it actually is. To minimize these errors, researchers should use validated stress scales and a consistent, objective grading scheme.
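One concrete step, sketched below with simulated items (the four-item scale and all numbers are hypothetical), is to check a stress scale's internal consistency with psych::alpha() before trusting it:

library(psych)
set.seed(1)
true_stress  <- rnorm(100)              # participants' underlying stress
stress_items <- data.frame(             # four noisy items measuring it
  item1 = true_stress + rnorm(100, sd = 0.5),
  item2 = true_stress + rnorm(100, sd = 0.5),
  item3 = true_stress + rnorm(100, sd = 0.5),
  item4 = true_stress + rnorm(100, sd = 0.5)
)
alpha(stress_items)$total$raw_alpha     # Cronbach's alpha; higher = less random error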


Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated dataset for a psychological experiment. Run the code chunk below without making any changes:

# Create a simulated dataset
set.seed(123)  # For reproducibility

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # We'll fill this in based on condition
)

# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View the first few rows of the dataset
head(data)
##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397

Now, perform the following computations:

  1. Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition (hint: use the psych package).
# Your code here
library(dplyr)  # provides %>%, group_by(), summarise(), mutate(), filter()

# na.rm = TRUE is required because reaction_time and accuracy contain
# missing values; without it every statistic comes back NA
data %>%
  group_by(condition) %>%
  summarise(
    mean_reaction_time   = mean(reaction_time, na.rm = TRUE),
    median_reaction_time = median(reaction_time, na.rm = TRUE),
    sd_reaction_time     = sd(reaction_time, na.rm = TRUE),
    min_reaction_time    = min(reaction_time, na.rm = TRUE),
    max_reaction_time    = max(reaction_time, na.rm = TRUE),
    mean_accuracy        = mean(accuracy, na.rm = TRUE),
    median_accuracy      = median(accuracy, na.rm = TRUE),
    sd_accuracy          = sd(accuracy, na.rm = TRUE),
    min_accuracy         = min(accuracy, na.rm = TRUE),
    max_accuracy         = max(accuracy, na.rm = TRUE)
  )
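Since the hint points to the psych package, an equivalent grouped summary (assuming psych is loaded) uses describeBy(), whose describe() statistics drop NAs by default:

# psych alternative: grouped descriptives for the same two variables
describeBy(data[, c("reaction_time", "accuracy")], group = data$condition)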
  2. Using dplyr and piping, create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
# Your code here
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
data %>%
  group_by(condition) %>%
  summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
##   condition    mean_anxiety_change
##   <chr>                      <dbl>
## 1 Control                     3.79
## 2 Experimental                8.64

Write your answer(s) here The results show that the experimental and control groups differed in reaction times and accuracy, with the spread of scores captured by the standard deviation. The anxiety_change variable shows that anxiety decreased more in the experimental group (about 8.6 points versus 3.8 in the control group), suggesting the intervention worked.
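As an optional check beyond the descriptive statistics asked for here, a two-sample comparison of the group difference could look like:

# Not required by the prompt: compare mean anxiety_change across conditions
t.test(anxiety_change ~ condition, data = data)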

Question 4: Probability Calculations

Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):

  1. If reaction times in a cognitive task are normally distributed with a mean of 350ms and a standard deviation of 75ms:
    1. What is the probability that a randomly selected participant will have a reaction time greater than 450ms?
    2. What is the probability that a participant will have a reaction time between 300ms and 400ms?
# Your code here
mean_rt <- 350
sd_rt <- 75

# P(X > 450) = 1 - P(X <= 450)
prob_gt_450 <- 1 - pnorm(450, mean_rt, sd_rt)
prob_gt_450
## [1] 0.09121122

# P(300 < X < 400) = P(X <= 400) - P(X <= 300)
prob_300_400 <- pnorm(400, mean_rt, sd_rt) - pnorm(300, mean_rt, sd_rt)
prob_300_400
## [1] 0.4950149

Write your answer(s) here The probability of a reaction time above 450ms (about 0.09) shows how uncommon very slow responses are, while the probability of a reaction time between 300ms and 400ms (about 0.50) shows how much of the distribution falls in the typical range. Together these let psychologists characterize performance patterns and flag potential outliers.
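A related calculation (an illustrative extension, not asked for in the prompt) uses qnorm() to find a cutoff for flagging unusually slow responses; the 0.95 quantile is an arbitrary example threshold:

# Reaction time slower than 95% of participants, under the same distribution
qnorm(0.95, mean = 350, sd = 75)  # roughly 473ms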


Part 3: Data Cleaning and Manipulation

Question 5: Data Cleaning with dplyr

Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:

  1. Remove all rows with missing values and create a new dataset called clean_data.
# Your code here
clean_data <- data %>%
  filter(!is.na(reaction_time), !is.na(accuracy))  # the only columns containing NAs
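Because reaction_time and accuracy are the only columns with missing values, this matches the more general form below (clean_data_alt is just an illustrative name):

# General form: drop rows with an NA in any column
clean_data_alt <- na.omit(data)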
  2. Create a new variable performance_category that categorizes participants based on their accuracy:
    • “High” if accuracy is greater than or equal to 90
    • “Medium” if accuracy is between 70 and 90
    • “Low” if accuracy is less than 70
# Your code here
clean_data <- clean_data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 & accuracy < 90 ~ "Medium",
    accuracy < 70 ~ "Low"
  ))
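A quick optional sanity check confirms the categories were assigned as intended:

# Count participants in each performance category
table(clean_data$performance_category)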
  3. Filter the dataset to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time.
# Your code here
mean_reaction_time <- mean(clean_data$reaction_time)  # overall mean (clean_data has no NAs)
filtered_data <- clean_data %>%
  filter(condition == "Experimental", reaction_time < mean_reaction_time)
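An optional verification that the filter behaved as intended:

nrow(filtered_data)               # participants remaining after filtering
max(filtered_data$reaction_time)  # should be below mean_reaction_time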

Write your answer(s) here describing your data cleaning process. I cleaned the dataset by removing rows with missing values, categorizing accuracy into performance levels, and filtering to participants in the experimental condition with faster-than-average reaction times, producing a more structured and reliable dataset.


Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package

Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

  1. Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
  2. Use the psych package’s corPlot() function to create a correlation plot.
  3. Interpret the resulting plot by addressing:
    • Which variables appear to be strongly correlated?
    • Are there any surprising relationships?
    • How might these correlations inform further research in psychology?
# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variable (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
library(psych)   # corPlot(), pairs.panels()

numeric_data <- data %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)

cor_matrix <- cor(numeric_data, use = "complete.obs")  # listwise-delete rows with NAs

# The knitted "figure margins too large" / invalid "pin" errors come from a
# plotting device that is too small, not from the code; enlarge the device
# (or set fig.width/fig.height in the chunk header) and the plots render.
corPlot(cor_matrix, numbers = TRUE, main = "Correlation Matrix of Key Variables")

pairs.panels(numeric_data,
             method = "pearson",   # Pearson correlations
             hist.col = "blue",    # histogram color on the diagonal
             density = TRUE,       # overlay density curves
             ellipses = TRUE)      # add correlation ellipses
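To attach a formal estimate to any single relationship in the plot, one option (not required by the prompt) is cor.test():

# Example: test the pre/post anxiety correlation visible in the plot
cor.test(data$anxiety_pre, data$anxiety_post)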

Write your answer(s) here The strongest relationship in the plot is the positive correlation between pre- and post-anxiety levels, and anxiety_change is related to both by construction, since it is their difference. Reaction time and accuracy show essentially no correlation, which is expected here because they were simulated independently. Reading a correlation matrix this way helps psychologists see which variables move together, spot surprising (or absent) relationships, and distinguish real performance patterns from artifacts of how variables were computed.

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

  1. Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?

  2. How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?

Write your answer(s) here 1. My research question would be “Does listening to music while studying affect concentration in college students?” I would collect data on music habits, self-rated focus levels, and GPA. A group comparison would contrast focus between students who study with music and those who don’t, and a correlation analysis would check for trends between listening habits and GPA. I would also use a controlled design to improve accuracy and reduce measurement error, such as inconsistent self-reports of focus. 2. R makes data analysis faster and more flexible; it feels like having a statistical calculator that gives answers immediately instead of requiring everything by hand. The main challenge is remembering the specific syntax, but it is worth learning because R displays code and results together, produces graphs on demand, and offers a great deal of customization that can make research easier.
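A minimal sketch of the analysis plan described in part 1, using invented names (study_data, music_group, focus_score, and gpa are all hypothetical):

# Hypothetical analysis for the music-and-studying question
t.test(focus_score ~ music_group, data = study_data)  # compare focus between groups
cor(study_data$focus_score, study_data$gpa,
    use = "complete.obs")                             # trend between focus and GPA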


Submission Instructions:

Knit your document to HTML format and check that all content displays correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.