Take-Home Midterm Exam: Introductory Psychological Statistics

Replace “Your Name” with your actual name.

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

Write your answer(s) here Nominal, ordinal, interval, and ratio data differ in how much information their values carry. Nominal data consists of categories with no inherent order, for example gender, eye color, or diagnostic categories such as ADHD and anxiety. Ordinal data has a meaningful order, but the intervals between values are not necessarily equal, for example education level (high school, master’s degree, PhD) or a pain rating from 1 to 10, which reflects a subjective perception. Interval data has equal intervals between values but no true zero point, for example temperature in Fahrenheit or Celsius, or year of birth. Ratio data has equal intervals and a true zero point, so ratio comparisons are meaningful, for example speed in miles per hour or weight in pounds.
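As a rough illustration of how the four levels of measurement can be represented in R, the sketch below stores nominal data as an unordered factor, ordinal data as an ordered factor, and interval and ratio data as numeric vectors. The variable names and values are hypothetical and are not part of the exam dataset.

```r
# Hypothetical example values; not taken from the exam dataset.

# Nominal: categories with no order -> unordered factor
diagnosis <- factor(c("ADHD", "Anxiety", "None", "ADHD"))

# Ordinal: ordered categories, unequal spacing -> ordered factor
education <- factor(
  c("High School", "Master's", "PhD", "High School"),
  levels = c("High School", "Master's", "PhD"),
  ordered = TRUE
)

# Interval: equal spacing, no true zero -> numeric (differences are meaningful)
temperature_c <- c(20, 25, 30, 10)

# Ratio: equal spacing and a true zero -> numeric (ratios are meaningful)
reaction_ms <- c(250, 500, 375, 410)

is.ordered(diagnosis)                # FALSE: nominal categories cannot be ranked
education[1] < education[3]          # TRUE: High School ranks below PhD
temperature_c[2] - temperature_c[1]  # a 5-degree difference is interpretable
reaction_ms[2] / reaction_ms[1]      # 2: the second response took twice as long
```

Comparisons such as `<` only work on the ordered factor, which mirrors the idea that ordering is meaningful for ordinal data but not for nominal data.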

For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
- Scores on a depression inventory (0-63)
- Response time in milliseconds
- Likert scale ratings of agreement (1-7)
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
- Age in years

Write your answer(s) here Scores on a depression inventory (0-63) are usually treated as interval data: the scores are assumed to have equal intervals between values, but a score of 0 does not mean a complete absence of depression, so there is no true zero. Response time in milliseconds is ratio data because it has a true zero (0 milliseconds means no elapsed time) and both differences and ratios between values are meaningful. Likert scale ratings of agreement (1-7) are ordinal data because the scale has an order (strongly disagree to strongly agree), but the intervals between values may not be exactly equal. Diagnostic categories are nominal data because the categories have no inherent order. Age in years is ratio data because age has a true zero and differences between values are meaningful.

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

Write your answer(s) here Random error is unpredictable variation in measurement that occurs by chance; it does not consistently push scores in one direction. For example, if participants recall different numbers of words from a memory list because of momentary distractions or lapses in attention, the scores become noisier and less reliable, but the results are not shifted in any particular direction. Systematic error occurs consistently and in the same direction, usually because of a flaw in the design or the measurement procedure. For example, if the words in a memory test are printed in a smaller font for one group than for another, the group with the larger font may perform better simply because the words are easier to read. This introduces a bias that favors one group over the other.
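The distinction between the two kinds of error can be sketched with a small simulation. All values here are assumptions chosen for illustration (a true recall score of 12 words, random noise with a standard deviation of 2, and a constant bias of 3 fewer words); none of them come from the exam.

```r
# Simulated memory scores; every number below is a hypothetical assumption.
set.seed(1)
n <- 10000

true_recall <- rep(12, n)                   # assumed true score: 12 words recalled
random_error <- rnorm(n, mean = 0, sd = 2)  # chance variation centred on zero
systematic_error <- -3                      # assumed constant bias (e.g. small font)

observed_random <- true_recall + random_error
observed_biased <- true_recall + random_error + systematic_error

# Random error adds spread but leaves the average close to the true score
mean(observed_random)
sd(observed_random)

# Systematic error shifts the average by roughly the size of the bias
mean(observed_biased)
sd(observed_biased)
```

In this sketch the spread of the two sets of scores is the same; only the biased scores have a mean that sits away from the true value.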

How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

Write your answer(s) here Measurement error can distort the observed relationship between stress and academic performance. Random error in either measure tends to weaken the observed correlation, so the study might fail to detect a relationship that really exists; systematic error can bias scores in one direction and produce a misleading relationship. Either way, the validity of the conclusions suffers. To minimize these errors, researchers can use reliable, validated measurement tools, standardize data collection to reduce bias, increase the sample size to help average out random error, and use objective measures (such as recorded grades) alongside self-report.
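A minimal simulation can show how random measurement error weakens an observed correlation. The true correlation of about -0.5 between stress and performance, and the amount of noise added to each measure, are assumptions made only for this sketch.

```r
# Hypothetical simulation: all parameters are assumptions for illustration.
set.seed(42)
n <- 5000

# "True" standardized scores with a correlation of roughly -0.5
true_stress <- rnorm(n)
true_performance <- -0.5 * true_stress + rnorm(n, sd = sqrt(1 - 0.25))

# Observed scores = true scores + random measurement error
noisy_stress <- true_stress + rnorm(n, sd = 1)
noisy_performance <- true_performance + rnorm(n, sd = 1)

r_true <- cor(true_stress, true_performance)
r_observed <- cor(noisy_stress, noisy_performance)

round(c(true = r_true, observed = r_observed), 2)
```

The observed correlation should come out noticeably closer to zero than the true one, which is why unreliable measures can hide a real relationship.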

Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated dataset for a psychological experiment. Run the code chunk below without making any changes:

# Create a simulated dataset
set.seed(123)  # For reproducibility

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # We'll fill this in based on condition
)

# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View the first few rows of the dataset
head(data)

##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397

Now, perform the following computations:

Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition (hint: use the psych package).

# Your code here
# Load necessary libraries
library(dplyr)
library(psych)
# Compute descriptive statistics for reaction time and accuracy
data %>%
  group_by(condition) %>%
  summarise(
    mean_reaction = mean(reaction_time, na.rm = TRUE),
    median_reaction = median(reaction_time, na.rm = TRUE),
    sd_reaction = sd(reaction_time, na.rm = TRUE),
    min_reaction = min(reaction_time, na.rm = TRUE),
    max_reaction = max(reaction_time, na.rm = TRUE),
    mean_accuracy = mean(accuracy, na.rm = TRUE),
    median_accuracy = median(accuracy, na.rm = TRUE),
    sd_accuracy = sd(accuracy, na.rm = TRUE),
    min_accuracy = min(accuracy, na.rm = TRUE),
    max_accuracy = max(accuracy, na.rm = TRUE)
  )

## # A tibble: 2 × 11
##   condition  mean_reaction median_reaction sd_reaction min_reaction max_reaction
##   <chr>              <dbl>           <dbl>       <dbl>        <dbl>        <dbl>
## 1 Control             301.            300.        48.5         202.         408.
## 2 Experimen…          296.            288.        38.4         216.         378.
## # ℹ 5 more variables: mean_accuracy <dbl>, median_accuracy <dbl>,
## #   sd_accuracy <dbl>, min_accuracy <dbl>, max_accuracy <dbl>
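Since the prompt hints at the psych package, the same grouped summary could also be sketched with `describeBy()`. The data frame `toy` below is a small hypothetical stand-in for the exam dataset, and the column selection at the end is just one way to keep the requested statistics.

```r
library(psych)

# Hypothetical stand-in for the exam dataset (not the real simulated data)
set.seed(2024)
toy <- data.frame(
  condition = rep(c("Control", "Experimental"), each = 10),
  reaction_time = rnorm(20, mean = 300, sd = 50),
  accuracy = rnorm(20, mean = 85, sd = 10)
)
toy$reaction_time[3] <- NA  # one missing value in the Control group

# describeBy() returns one table of descriptives per group
by_condition <- describeBy(
  toy[, c("reaction_time", "accuracy")],
  group = toy$condition
)

# Keep only the statistics requested in the question
wanted <- c("n", "mean", "median", "sd", "min", "max")
by_condition[["Control"]][, wanted]
by_condition[["Experimental"]][, wanted]
```

The `n` column makes it easy to see how many non-missing values each statistic is based on.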

Using dplyr and piping, create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.

# Your code here
# Create new variable for anxiety change
data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)

# Compute the mean anxiety change for each condition
data %>%
  group_by(condition) %>%
  summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))

## # A tibble: 2 × 2
##   condition    mean_anxiety_change
##   <chr>                      <dbl>
## 1 Control                     3.79
## 2 Experimental                8.64

Write your answer(s) here Descriptively, the experimental group had a slightly faster mean reaction time and a higher mean accuracy than the control group, with less variability in both measures. The experimental group also showed a larger mean reduction in anxiety (8.64 vs. 3.79 for control), which suggests that the experimental condition reduced anxiety more than the control condition. These are descriptive comparisons only; no significance test was run.

Results: The mean reaction time was 301.40 ms for the control group and 295.75 ms for the experimental group. The median reaction time was 299.68 ms for control and 288.49 ms for experimental. The standard deviation was 48.54 ms in the control group and 38.37 ms in the experimental group, showing more variability in the control group. The minimum reaction time was 201.67 ms for control and 215.67 ms for experimental. The maximum reaction time was 408.45 ms for control and 377.95 ms for experimental. The mean accuracy was 85.49% in the control group and 88.06% in the experimental group. The median accuracy was 85.53% for control and 88.32% for experimental. The standard deviation was 9.86% in the control group and 8.20% in the experimental group, showing higher variability in the control group. The minimum accuracy was 61.91% for control and 74.28% for experimental. The maximum accuracy was 105.50% for control and 106.87% for experimental; values above 100% arise because accuracy was simulated from a normal distribution with no upper bound.

Question 4: Probability Calculations

Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):

If reaction times in a cognitive task are normally distributed with a mean of 350ms and a standard deviation of 75ms:
1. What is the probability that a randomly selected participant will have a reaction time greater than 450ms?
2. What is the probability that a participant will have a reaction time between 300ms and 400ms?

# Your code here
# Given values
mean_rt <- 350  # Mean reaction time
sd_rt <- 75     # Standard deviation

# Probability of reaction time > 450ms
prob_greater_450 <- 1 - pnorm(450, mean = mean_rt, sd = sd_rt)
prob_greater_450

## [1] 0.09121122

# Probability of reaction time between 300ms and 400ms
prob_between_300_400 <- pnorm(400, mean = mean_rt, sd = sd_rt) - pnorm(300, mean = mean_rt, sd = sd_rt)
prob_between_300_400

## [1] 0.4950149

Write your answer(s) here The probability that a participant has a reaction time greater than 450 ms is 0.0912 (9.12%). The probability that a participant’s reaction time is between 300 ms and 400 ms is 0.4950 (49.50%).
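As a cross-check, the same probabilities can be obtained by first converting the cutoffs to z-scores and then using the standard normal distribution. This is only an alternative route to the same answers; the variable names are chosen for this sketch.

```r
mean_rt <- 350  # mean reaction time in ms
sd_rt <- 75     # standard deviation in ms

# Convert each cutoff to a z-score: z = (x - mean) / sd
z_450 <- (450 - mean_rt) / sd_rt  # about 1.33
z_300 <- (300 - mean_rt) / sd_rt  # about -0.67
z_400 <- (400 - mean_rt) / sd_rt  # about 0.67

# P(RT > 450): upper tail of the standard normal
p_greater_450 <- pnorm(z_450, lower.tail = FALSE)

# P(300 < RT < 400): area between the two z-scores
p_between <- pnorm(z_400) - pnorm(z_300)

round(c(p_greater_450, p_between), 4)
```

Using `lower.tail = FALSE` is equivalent to subtracting the lower-tail probability from 1.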

Part 3: Data Cleaning and Manipulation

Question 5: Data Cleaning with dplyr

Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:

Remove all rows with missing values and create a new dataset called clean_data.

# Your code here
# Remove missing values using na.omit()
clean_data <- na.omit(data)

Create a new variable performance_category that categorizes participants based on their accuracy:
- “High” if accuracy is greater than or equal to 90
- “Medium” if accuracy is between 70 and 90
- “Low” if accuracy is less than 70

# Your code here
# Create performance_category variable based on accuracy
clean_data <- clean_data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 & accuracy < 90 ~ "Medium",
    accuracy < 70 ~ "Low"
  ))
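One way to check how the category boundaries behave is to classify a few values that sit right at the cutoffs. The sketch below uses base R's `cut()` as an alternative to `case_when()`; the test values are hypothetical.

```r
# Hypothetical accuracy values placed at and around the cutoffs
accuracy <- c(69.9, 70, 89.9, 90)

# right = FALSE makes each interval include its lower bound:
# [-Inf, 70) -> Low, [70, 90) -> Medium, [90, Inf) -> High
performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE
)

data.frame(accuracy, performance_category)
```

With these settings a score of exactly 70 falls in "Medium" and a score of exactly 90 falls in "High", matching the rules in the question.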

Filter the dataset to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time.

# Your code here
# Compute the overall mean reaction time
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)

# Filter the dataset
filtered_data <- clean_data %>%
  filter(condition == "Experimental" & reaction_time < mean_reaction_time)

Write your answer(s) here describing your data cleaning process. Removing rows with missing values left 45 of the original 50 observations. The cleaned dataset has 9 variables: the 7 original variables plus anxiety_change and performance_category. The dataset was then filtered to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time (computed from clean_data), which left 10 observations and 9 variables.
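The row counts reported in a cleaning summary can be verified directly in R. The sketch below uses a small hypothetical data frame, not the exam dataset, to show how missing values can be counted before and after `na.omit()`.

```r
# Hypothetical data frame with a few missing values
toy <- data.frame(
  id = 1:6,
  reaction_time = c(310, NA, 295, 280, NA, 330),
  accuracy = c(88, 92, NA, 75, 81, 95)
)

colSums(is.na(toy))         # missing values per column
sum(!complete.cases(toy))   # rows with at least one missing value

clean_toy <- na.omit(toy)   # drop every row that contains an NA

nrow(toy) - nrow(clean_toy) # number of rows removed
dim(clean_toy)              # rows and columns remaining
```

Comparing `nrow()` before and after cleaning confirms how many observations were dropped, and `dim()` gives the observation and variable counts to report.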

Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package

Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
Use the psych package’s corPlot() function to create a correlation plot.
Interpret the resulting plot by addressing:
- Which variables appear to be strongly correlated?
- Are there any surprising relationships?
- How might these correlations inform further research in psychology?

# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
# Load necessary packages
library(dplyr)
library(psych)

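# Create a new dataset with only numeric variables
numeric_data <- clean_data %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)

# Generate correlation plot.
# Note: "figure margins too large" means the plotting device is too small for
# the plot; enlarge the Plots pane, or set fig.width and fig.height in the
# chunk options, and rerun.
corPlot(numeric_data, cex = 1.2)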

## Error in plot.new(): figure margins too large

Write your answer(s) here The correlation plot shows a strong positive correlation between anxiety_pre and anxiety_post, meaning that participants with higher anxiety before the task also tended to have higher anxiety afterward. There is a weak negative correlation between reaction time and accuracy, meaning that faster reaction times are slightly associated with better accuracy, although the relationship is small. Anxiety_change has a weak negative correlation with anxiety_post, meaning that greater reductions in anxiety are linked to lower post-task anxiety.
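The correlations behind such a plot can also be inspected as numbers, which is useful when the plot itself cannot be drawn. The sketch below builds a small hypothetical dataset (not the exam data) and prints its correlation matrix with base R's `cor()`.

```r
# Hypothetical anxiety scores; parameters are assumptions for illustration
set.seed(123)
n <- 45
anxiety_pre <- rnorm(n, mean = 25, sd = 8)
anxiety_post <- anxiety_pre - rnorm(n, mean = 5, sd = 3)

toy_numeric <- data.frame(
  anxiety_pre,
  anxiety_post,
  anxiety_change = anxiety_pre - anxiety_post
)

# Pairwise Pearson correlations, ignoring missing values if any are present
r <- cor(toy_numeric, use = "pairwise.complete.obs")
round(r, 2)
```

Each cell of the matrix corresponds to one tile of a correlation plot, so the printed values can be used to confirm which relationships are strong or weak.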

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?

Write your answer(s) here
1. A research question in psychology that interests me is “How does anxiety affect the performance and scores of college students?” This interests me because I am going through this process right now, and I feel my anxiety holds me back, especially when taking exams. I would collect self-reported test anxiety scores and measures of academic performance, such as GPA and exam grades. A correlation analysis would show the relationship between anxiety and performance, and a regression analysis could show whether test anxiety predicts a student’s later outcomes. Possible measurement issues include self-report bias in the anxiety scores and other factors and habits that could be affecting GPA.
2. Learning R for data analysis has been a dream come true. R has allowed me to understand statistics better. Although learning to code is a challenge, for someone who can learn anything but math, it has made a huge difference. The advantages include better visualization of relationships in psychology and the ability to show correlations, and hands-on learning is, in my opinion, always the best way to learn. The challenges of R are learning and understanding the code and dealing with occasional technical issues on a computer, but it has luckily been fairly easy to learn. In my opinion, R offers better flexibility and visualization and feels more user-friendly than other statistical software.

Submission Instructions:

Be sure to knit your document to HTML and check that all content displays correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.