Take-Home Midterm Exam: Introductory Psychological Statistics

Replace “Your Name” with your actual name.

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

Key Differences Between Nominal, Ordinal, Interval, and Ratio Data

Nominal: Data that represent categories with no meaningful order or ranking. The numbers or labels used are purely for identification. Example: Gender (Male, Female) or diagnostic categories (e.g., ADHD, anxiety disorder).

Ordinal: Data that represent categories with a meaningful order, but the intervals between the categories are not consistent or meaningful. Example: Likert scale ratings (Strongly agree, Agree, Neutral, Disagree, Strongly disagree).

Interval: Data that have meaningful intervals between values, but no true zero point (i.e., zero does not indicate an absence of the attribute). Example: Temperature measured in Celsius or Fahrenheit. A temperature of 0°C doesn’t mean “no temperature.”

Ratio: Data with meaningful intervals and a true zero point, meaning that zero indicates an absence of the attribute. Example: Height, weight, or reaction time (e.g., reaction time measured in milliseconds).

For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
- Scores on a depression inventory (0-63)
- Response time in milliseconds
- Likert scale ratings of agreement (1-7)
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
- Age in years

Scores on a depression inventory (0-63): Interval, by common convention. The summed scores are treated as having meaningful intervals, but a score of 0 does not establish the complete absence of depression, so there is no true zero point.

Response time in milliseconds: Ratio. Response time has a meaningful zero point, indicating no time elapsed.

Likert scale ratings of agreement (1-7): Ordinal. There is an order to the ratings, but the intervals between them may not be equal.

Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis): Nominal. These are categories without any intrinsic ordering.

Age in years: Ratio. Age has a true zero point (birth) and meaningful intervals.
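
The classifications above can be mirrored in how variables are stored in R. The following is a minimal sketch, assuming a small invented dataset (the data frame `scores` and all of its values are hypothetical and are not part of the exam data). It shows which operations make sense at each level of measurement.

```r
# Hypothetical mini-dataset: every value below is invented for illustration
scores <- data.frame(
  diagnosis  = factor(c("ADHD", "Anxiety disorder", "No diagnosis", "ADHD")),  # nominal
  agreement  = factor(c(2, 7, 4, 5), levels = 1:7, ordered = TRUE),            # ordinal
  depression = c(12, 30, 5, 21),                                               # treated as interval
  rt_ms      = c(412, 388, 530, 467)                                           # ratio
)

# Nominal: counting categories is meaningful; ordering them is not
table(scores$diagnosis)

# Ordinal: order comparisons and the median are meaningful
scores$agreement[1] < scores$agreement[2]
median(as.integer(scores$agreement))

# Interval: differences between scores are meaningful
diff(range(scores$depression))

# Ratio: ratios are meaningful because zero means "no time elapsed"
scores$rt_ms[3] / scores$rt_ms[2]
```

Storing the Likert item as an ordered factor is one possible choice; many analyses instead treat such ratings as numeric, which implicitly assumes equal spacing between response options.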

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

Difference Between Random and Systematic Error

Random Error: Occurs due to unpredictable factors, producing fluctuations in measurements that vary in direction from one observation to the next and do not push the results consistently one way. Example: In a memory experiment, random errors could occur due to variations in a participant’s focus or fatigue during testing, leading to variability in recall performance.

Systematic Error: Consistent, repeatable errors that affect the measurement in the same direction every time, often due to a flaw in the measurement instrument or procedure. Example: If the timer used for the memory test runs slow, every participant receives more time than intended, so recall scores are inflated in the same direction for everyone.
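
The distinction can be illustrated with a short simulation. This is only a sketch under stated assumptions: the true recall scores, the noise level, and the constant bias of 1.5 words are all invented for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n_participants <- 1000
true_recall <- rnorm(n_participants, mean = 12, sd = 3)  # hypothetical true scores

# Random error: zero-mean noise added independently to each measurement
random_only <- true_recall + rnorm(n_participants, mean = 0, sd = 2)

# Systematic error: the same shift applied to every measurement
# (an assumed constant bias of +1.5 words)
systematic_only <- true_recall + 1.5

# Random error leaves the mean roughly unchanged but inflates the spread
c(mean_shift = mean(random_only) - mean(true_recall),
  sd_ratio   = sd(random_only) / sd(true_recall))

# Systematic error shifts the mean by the bias and leaves the spread unchanged
c(mean_shift = mean(systematic_only) - mean(true_recall),
  sd_ratio   = sd(systematic_only) / sd(true_recall))
```

In this sketch, averaging over many participants largely cancels the random error, whereas the systematic bias survives any amount of averaging.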

How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

Impact: Measurement errors can undermine the study’s validity by introducing bias or noise into the data. For example, if stress is measured using a flawed or inconsistent scale, it could distort the true relationship between stress and academic performance.

Steps to Minimize Error:

- Use validated and reliable measurement tools for both stress and academic performance.
- Ensure that all participants are tested under consistent conditions to avoid systematic errors.
- Use statistical techniques (e.g., reliability analysis) to assess and correct for measurement error when possible.
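
One way to see how noise distorts the stress and performance relationship is to simulate it. The sketch below assumes a true correlation of about -0.5 and a stress measure whose reliability is about 0.5; both figures, and all variable names, are invented for illustration.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

n_students <- 5000
true_stress <- rnorm(n_students)  # hypothetical standardized true stress

# Performance built so that its correlation with true stress is about -0.5
performance <- -0.5 * true_stress + rnorm(n_students, sd = sqrt(1 - 0.25))

# Unreliable stress measure: true score plus random error of equal variance,
# which corresponds to a reliability of about 0.5
noisy_stress <- true_stress + rnorm(n_students, sd = 1)

r_true     <- cor(true_stress, performance)
r_observed <- cor(noisy_stress, performance)

# The observed correlation is weaker (closer to zero) than the true one
round(c(r_true = r_true, r_observed = r_observed), 2)
```

Random measurement error pulls the observed correlation toward zero, which is why reliable instruments matter even when the error has no consistent direction.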

Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated dataset for a psychological experiment. Run the code chunk below without making any changes:

# Create a simulated dataset
set.seed(123)  # For reproducibility

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # We'll fill this in based on condition
)

# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View the first few rows of the dataset
head(data)

##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397

Now, perform the following computations:

Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition (hint: use the psych package).

library(psych)
library(dplyr)

# 1. Calculate the mean, median, standard deviation, minimum, and maximum
#    for reaction time and accuracy, grouped by condition
data %>%
  group_by(condition) %>%
  summarise(
    reaction_time_mean = mean(reaction_time, na.rm = TRUE),
    reaction_time_median = median(reaction_time, na.rm = TRUE),
    reaction_time_sd = sd(reaction_time, na.rm = TRUE),
    reaction_time_min = min(reaction_time, na.rm = TRUE),
    reaction_time_max = max(reaction_time, na.rm = TRUE),
    accuracy_mean = mean(accuracy, na.rm = TRUE),
    accuracy_median = median(accuracy, na.rm = TRUE),
    accuracy_sd = sd(accuracy, na.rm = TRUE),
    accuracy_min = min(accuracy, na.rm = TRUE),
    accuracy_max = max(accuracy, na.rm = TRUE)
  )
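
The hint in this question points to the psych package. A minimal sketch of that route uses `psych::describeBy()`, shown here on a small stand-in data frame (`toy` and its values are hypothetical, not the exam dataset). The call is guarded so the chunk still runs if psych is not installed.

```r
# Hypothetical stand-in data with the same column names as the exam dataset
set.seed(1)
toy <- data.frame(
  condition     = rep(c("Control", "Experimental"), each = 10),
  reaction_time = rnorm(20, mean = 300, sd = 50),
  accuracy      = rnorm(20, mean = 85, sd = 10)
)
toy$reaction_time[3] <- NA  # one missing value, as in the exam data

# describeBy() reports n, mean, sd, median, min, max (and more) per group
if (requireNamespace("psych", quietly = TRUE)) {
  by_condition <- psych::describeBy(
    toy[, c("reaction_time", "accuracy")],
    group = toy$condition
  )
  print(by_condition)
}
```

With the exam data, the same call would take `data[, c("reaction_time", "accuracy")]` and `data$condition` as its arguments.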

Using dplyr and piping, create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.

data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)

data %>%
  group_by(condition) %>%
  summarise(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))

## # A tibble: 2 × 2
##   condition    mean_anxiety_change
##   <chr>                      <dbl>
## 1 Control                     3.79
## 2 Experimental                8.64

Write your answer(s) here

Question 4: Probability Calculations

Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):

If reaction times in a cognitive task are normally distributed with a mean of 350ms and a standard deviation of 75ms:
1. What is the probability that a randomly selected participant will have a reaction time greater than 450ms?
2. What is the probability that a participant will have a reaction time between 300ms and 400ms?

# stats is attached by default in R; loading it explicitly is harmless
library(stats)

# 1. Probability that reaction time is greater than 450ms
probability_450ms <- 1 - pnorm(450, mean = 350, sd = 75)

# 2. Probability that reaction time is between 300ms and 400ms
probability_300_400ms <- pnorm(400, mean = 350, sd = 75) - pnorm(300, mean = 350, sd = 75)

probability_450ms

## [1] 0.09121122

probability_300_400ms

## [1] 0.4950149
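
The same probabilities can be cross-checked by standardizing first. This sketch converts each cutoff to a z-score and then uses the standard normal distribution; the variable names are chosen here for illustration.

```r
mu    <- 350  # mean reaction time in ms (from the question)
sigma <- 75   # standard deviation in ms (from the question)

# z = (x - mu) / sigma converts a raw score to standard-deviation units
z_450 <- (450 - mu) / sigma
z_300 <- (300 - mu) / sigma
z_400 <- (400 - mu) / sigma

# Upper-tail probability beyond 450 ms
p_above_450 <- pnorm(z_450, lower.tail = FALSE)

# Probability between 300 ms and 400 ms
p_between <- pnorm(z_400) - pnorm(z_300)

round(c(p_above_450 = p_above_450, p_between = p_between), 4)
```

Both values agree with the results printed above (about 0.0912 and 0.4950), so roughly 9% of participants are expected to be slower than 450ms and about half to fall between 300ms and 400ms.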

Write your answer(s) here

Part 3: Data Cleaning and Manipulation

Question 5: Data Cleaning with dplyr

Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:

Remove all rows with missing values and create a new dataset called clean_data.

# 1. Remove rows with missing values and create a new dataset
# drop_na() is exported by tidyr, not dplyr, so tidyr must be loaded first
library(tidyr)

clean_data <- data %>%
  drop_na()

Create a new variable performance_category that categorizes participants based on their accuracy:
- “High” if accuracy is greater than or equal to 90
- “Medium” if accuracy is between 70 and 90
- “Low” if accuracy is less than 70

# 2. Create a new variable 'performance_category'
clean_data <- clean_data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 & accuracy < 90 ~ "Medium",
    accuracy < 70 ~ "Low"
  ))

Filter the dataset to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time.

# 3. Compute the overall mean reaction time of the cleaned data
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)

# Keep Experimental participants who were faster than that mean
filtered_data <- clean_data %>%
  filter(condition == "Experimental" & reaction_time < mean_reaction_time)

Data Cleaning Process

In this part of the task, I performed several key steps to clean and manipulate the dataset using the dplyr and tidyr packages in R.

Removing Rows with Missing Values: The dataset contained missing values in certain columns (reaction time and accuracy). To address this, I used the drop_na() function from the tidyr package to remove any rows that had missing values. This ensured that all participants in the dataset had complete data for analysis.

clean_data <- data %>% drop_na()

After this step, clean_data is a new dataset that does not include any rows with missing values.

Creating a New Variable performance_category: I created a new variable, performance_category, to categorize participants based on their accuracy. The categories were:

- “High” for accuracy greater than or equal to 90.
- “Medium” for accuracy between 70 and 90.
- “Low” for accuracy less than 70.

I used the mutate() function to create this variable, with the case_when() function defining the conditions for each category:

clean_data <- clean_data %>%
  mutate(performance_category = case_when(
    accuracy >= 90 ~ "High",
    accuracy >= 70 & accuracy < 90 ~ "Medium",
    accuracy < 70 ~ "Low"
  ))

Filtering Data Based on Conditions: The next step involved filtering the dataset to include only participants in the “Experimental” condition who had reaction times faster than the overall mean reaction time for the dataset. I first calculated the mean reaction time and then used the filter() function to subset the data accordingly.

mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)

filtered_data <- clean_data %>%
  filter(condition == "Experimental" & reaction_time < mean_reaction_time)

This gave me a subset of the data where participants in the Experimental condition had faster reaction times than the average participant in the dataset.
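
For comparison, the same three cleaning steps can be sketched in base R alone. The data frame `toy` below is a hypothetical stand-in with invented values; `complete.cases()`, `cut()`, and logical subsetting play the roles of `drop_na()`, `case_when()`, and `filter()`.

```r
# Hypothetical stand-in data: values invented for illustration
toy <- data.frame(
  condition     = c("Control", "Experimental", "Experimental",
                    "Control", "Experimental", "Experimental"),
  reaction_time = c(310, 250, NA, 280, 340, 270),
  accuracy      = c(92, 68, 88, NA, 75, 90)
)

# Step 1: keep only rows with no missing values
clean_toy <- toy[complete.cases(toy), ]

# Step 2: bin accuracy; right = FALSE makes the intervals [70, 90) and [90, Inf),
# so an accuracy of exactly 90 is "High" and exactly 70 is "Medium"
clean_toy$performance_category <- cut(
  clean_toy$accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right  = FALSE
)

# Step 3: Experimental participants faster than the overall mean
mean_rt <- mean(clean_toy$reaction_time)
fast_experimental <- clean_toy[
  clean_toy$condition == "Experimental" & clean_toy$reaction_time < mean_rt,
]

fast_experimental
```

The `right = FALSE` argument matters here: with the default `right = TRUE`, an accuracy of exactly 90 would fall in the “Medium” bin, which would not match the category definitions in the question.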

Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package

Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
Use the psych package’s corPlot() function to create a correlation plot.
Interpret the resulting plot by addressing:
- Which variables appear to be strongly correlated?
- Are there any surprising relationships?
- How might these correlations inform further research in psychology?

# 1. Select numeric variables
# Taken from `data`, which already contains anxiety_change; rows with missing
# values are excluded by use = "complete.obs" in the cor() call below
numeric_data <- data %>%
  dplyr::select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)

# 2. Create the correlation plot
library(psych)
corPlot(cor(numeric_data, use = "complete.obs"), main = "Correlation Plot")

# Interpretation of the plot:
# - Strong correlations between anxiety_pre and anxiety_post are expected.
# - Reaction time and accuracy may show a negative correlation, as faster reaction times could correlate with higher accuracy.
# - These relationships can inform future research, such as exploring how reducing anxiety impacts reaction time and accuracy in tasks.
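
To back an interpretation with numbers rather than colors, the correlation matrix can be flattened into a table sorted by strength. The sketch below uses invented stand-in data (`toy`) built in the same spirit as the exam dataset; none of its values come from the exam.

```r
set.seed(7)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in data: post anxiety is pre anxiety minus a noisy reduction
n_obs <- 200
toy <- data.frame(
  anxiety_pre   = rnorm(n_obs, mean = 25, sd = 8),
  reaction_time = rnorm(n_obs, mean = 300, sd = 50)
)
toy$anxiety_post   <- toy$anxiety_pre - rnorm(n_obs, mean = 5, sd = 3)
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post

r_matrix <- cor(toy, use = "complete.obs")

# Each variable pair appears once in the upper triangle of the matrix
pairs <- which(upper.tri(r_matrix), arr.ind = TRUE)
r_table <- data.frame(
  var1 = rownames(r_matrix)[pairs[, "row"]],
  var2 = colnames(r_matrix)[pairs[, "col"]],
  r    = r_matrix[pairs]
)

# Strongest relationships first, regardless of sign
r_table <- r_table[order(-abs(r_table$r)), ]
r_table
```

Because anxiety_change is computed directly from anxiety_pre and anxiety_post, any correlation between it and those two variables is partly built in by construction and should be interpreted with caution.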

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?

Research Question in Psychology

Research Question: How does sleep deprivation affect cognitive performance?

Data to Collect: Reaction time, accuracy, and subjective sleepiness scores (e.g., on a Likert scale).

Appropriate Statistical Analyses: T-tests or ANOVA to compare cognitive performance between sleep-deprived and well-rested participants, and regression analysis to examine relationships.

Potential Measurement Errors: Self-reported sleepiness may be biased, and reaction time measurements may be influenced by distractions or test conditions.

Learning R for Data Analysis

Impact on Understanding: Learning R has helped me better understand the importance of data visualization and statistical analyses in psychological research. I now feel more confident in handling complex datasets and performing analyses efficiently.

Advantages of R: R offers powerful packages and flexibility for statistical analysis and visualization.

Challenges: R’s steep learning curve can be intimidating, especially when troubleshooting errors or writing complex code.
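
As a rough illustration of the group comparison proposed for the sleep deprivation question, the sketch below runs a Welch two-sample t-test on simulated reaction times. The group sizes, means, and standard deviations are invented assumptions, not findings.

```r
set.seed(99)  # arbitrary seed so the sketch is reproducible

# Hypothetical reaction times in ms for two independent groups
n_per_group    <- 40
well_rested    <- rnorm(n_per_group, mean = 300, sd = 40)
sleep_deprived <- rnorm(n_per_group, mean = 360, sd = 40)

# Welch's t-test (the t.test() default) does not assume equal variances
sleep_test <- t.test(sleep_deprived, well_rested)

sleep_test$estimate  # group means
sleep_test$conf.int  # 95% confidence interval for the mean difference
sleep_test$p.value
```

With real data, the same call would compare the measured reaction times of the two groups; an ANOVA would be the natural extension if more than two sleep conditions were included.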

Submission Instructions:

Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission. Publish your assignment to RPubs and submit the URL to Canvas.