Take-Home Midterm Exam: Introductory Psychological Statistics

Replace “Your Name” with your actual name.

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

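- Nominal data: categories with no inherent order (e.g., left- or right-handedness, gender, diagnostic category).
- Ordinal data: categories with a meaningful order, but the distances between them are unequal or unknown (e.g., Likert-type survey ratings, class rank).
- Interval data: equal intervals between values but no true zero (e.g., temperature in degrees Celsius or Fahrenheit, IQ scores).
- Ratio data: equal intervals and a true zero, so ratios are meaningful (e.g., weight, height, reaction time).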

For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
- Scores on a depression inventory (0-63)
- Response time in milliseconds
- Likert scale ratings of agreement (1-7)
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
- Age in years

- Depression inventory scores (0-63): interval. The scale is usually treated as having equal intervals, so differences are meaningful: a score of 30 is 20 points higher than a score of 10 and indicates more depressive symptoms. However, a score of 0 does not mean a complete absence of depression; it means no symptoms were endorsed on this particular inventory. Because there is no true zero, ratios are not meaningful (a score of 30 is not "three times as depressed" as a score of 10).
- Response time in milliseconds: ratio. There is a true zero (0 ms = no time has passed) and equal intervals, so both differences and ratios are meaningful (30 ms is three times as long as 10 ms).
- Likert scale ratings of agreement (1-7): ordinal. The numbers represent rank order (e.g., 1 = strongly disagree, 4 = neither agree nor disagree, 7 = strongly agree), so a higher number means more agreement. However, the distance between adjacent points is not guaranteed to be equal: the gap between 2 and 3 may be larger or smaller than the gap between 5 and 6.
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis): nominal. Diagnoses such as social anxiety disorder, or no diagnosis at all, are distinct categories with no rank order; any numbers assigned to them are only labels.
- Age in years: ratio. Age has a true zero (0 = birth) and equal intervals, so differences and ratios are meaningful (a 20-year-old is twice as old as a 10-year-old; a 60-year-old is three times as old as a 20-year-old).
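A minimal R sketch of how these levels of measurement map onto R data types, using small hypothetical values rather than the exam dataset. The temperature lines show why interval data do not support ratios: the same two temperatures give a ratio of 2 in degrees Celsius but only about 1.035 on the Kelvin scale, which has a true zero.

```r
# Hypothetical example values, not taken from the exam dataset

# Nominal: unordered categories, stored as a plain factor
diagnosis <- factor(c("ADHD", "Anxiety disorder", "No diagnosis", "ADHD"))
table(diagnosis)                # counts per category are meaningful

# Ordinal: ordered categories (1 = strongly disagree, 7 = strongly agree)
likert <- factor(c(2, 7, 4), levels = 1:7, ordered = TRUE)
likert[2] > likert[1]           # order comparisons work for ordered factors
median(as.integer(likert))      # the median is a safe summary for ordinal data

# Interval: equal units but an arbitrary zero (degrees Celsius)
temp_c <- c(10, 20)
temp_c[2] / temp_c[1]                         # 2, yet 20 C is not "twice as hot"
(temp_c[2] + 273.15) / (temp_c[1] + 273.15)   # about 1.035 on the Kelvin scale

# Ratio: equal units and a true zero (response time in ms)
rt_ms <- c(250, 500)
rt_ms[2] / rt_ms[1]             # 2: 500 ms really is twice as long as 250 ms
```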

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

Random error: unpredictable fluctuations caused by uncontrolled factors. Its direction varies from trial to trial and from participant to participant, so it reduces precision (reliability) but tends to average out across many observations. Example: a participant is briefly distracted during the study phase, so their recall score on that trial is lower than it otherwise would be.

Systematic error: a consistent bias that pushes scores in the same direction, caused by a flaw in the instrument or procedure. It reduces accuracy (validity) and does not average out. Example: a loud, constant noise in the testing room causes every participant to mishear part of a recorded word list, so recall scores are consistently lower than participants' true memory ability.

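The contrast can be simulated in R. This is a sketch with made-up numbers: `true_recall` stands for each hypothetical participant's true number of words recalled, random error is mean-zero noise, and systematic error is a constant 2-word penalty applied to everyone (as with the constant room noise).

```r
set.seed(1)
n <- 1000

# Hypothetical true recall scores (words recalled from a list)
true_recall <- rnorm(n, mean = 12, sd = 2)

# Random error: mean-zero noise whose direction varies from person to person
with_random_error <- true_recall + rnorm(n, mean = 0, sd = 1.5)

# Systematic error: the same downward bias for every participant
with_systematic_error <- true_recall - 2

# Random error leaves the mean roughly unchanged but inflates the spread
mean(with_random_error) - mean(true_recall)   # close to 0
sd(with_random_error) / sd(true_recall)       # greater than 1 (less precision)

# Systematic error shifts every score by the same amount and never averages out
mean(with_systematic_error) - mean(true_recall)   # -2
sd(with_systematic_error) / sd(true_recall)       # 1: the spread is unchanged
```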
How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

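Measurement error can blur or distort the relationship between stress and academic performance. Random error in either measure (for example, an unreliable stress questionnaire, or a single exam score taken on an off day) adds noise, which weakens (attenuates) the observed correlation, so a real relationship can look smaller than it is or fail to reach significance. Systematic error (for example, students under-reporting stress because of social desirability, or over-reporting their GPA) biases scores in one direction and can create, hide, or distort an apparent relationship. In both cases, conclusions about stress and academic performance become less valid.

To reduce these errors and obtain more reliable findings, researchers can:
- train research staff so that data are collected and recorded consistently
- use standardized data collection procedures for every participant
- pilot-test instruments before the main study
- use reliable, validated instruments (e.g., established stress questionnaires, physiological measures such as cortisol, calibrated equipment)
- combine multiple measures or sources and triangulate the data
- collect responses anonymously and use indirect questions to reduce response bias.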
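To illustrate the attenuation point, the sketch below simulates a hypothetical true negative relationship between stress and GPA and then adds random measurement error to the stress scores. All values are made up. With these settings the population correlation is about -0.55, and the classical attenuation formula (observed r is approximately true r times the square root of the measure's reliability) predicts about -0.39 for the error-laden measure.

```r
set.seed(2024)
n <- 500

# Hypothetical true stress scores and a true negative stress-GPA relationship
true_stress <- rnorm(n, mean = 20, sd = 5)
gpa <- 3.5 - 0.04 * true_stress + rnorm(n, mean = 0, sd = 0.3)

# Observed stress = true stress + random measurement error of equal variance
observed_stress <- true_stress + rnorm(n, mean = 0, sd = 5)

r_true <- cor(true_stress, gpa)
r_observed <- cor(observed_stress, gpa)

# Reliability of the observed measure = true variance / total variance = 25 / 50
reliability <- 5^2 / (5^2 + 5^2)

# The observed correlation is weaker, close to what the formula predicts
round(c(true = r_true, observed = r_observed,
        predicted = r_true * sqrt(reliability)), 2)
```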

Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated dataset for a psychological experiment. Run the following code chunk without making any changes:

# Load necessary libraries
library(psych)
library(dplyr)

# Set seed for reproducibility
set.seed(123)

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # Placeholder for post-anxiety values
)

# Modify anxiety_post based on condition
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety_post values do not drop below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Introduce missing values for realism
set.seed(42)
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View first few rows of the dataset
head(data)

Descriptive statistics for reaction time and accuracy by condition:

# Calculate mean, median, standard deviation, minimum, and maximum for reaction time and accuracy grouped by condition
descriptive_stats <- data %>%
  group_by(condition) %>%
  summarise(
    mean_reaction_time = mean(reaction_time, na.rm = TRUE),
    median_reaction_time = median(reaction_time, na.rm = TRUE),
    sd_reaction_time = sd(reaction_time, na.rm = TRUE),
    min_reaction_time = min(reaction_time, na.rm = TRUE),
    max_reaction_time = max(reaction_time, na.rm = TRUE),
    mean_accuracy = mean(accuracy, na.rm = TRUE),
    median_accuracy = median(accuracy, na.rm = TRUE),
    sd_accuracy = sd(accuracy, na.rm = TRUE),
    min_accuracy = min(accuracy, na.rm = TRUE),
    max_accuracy = max(accuracy, na.rm = TRUE)
  )

# Print results
print(descriptive_stats)

# Compute the overall mean reaction time (excluding NA values)
overall_mean_rt <- mean(data$reaction_time, na.rm = TRUE)

# Filter participants in the Experimental condition with reaction times faster than the overall mean
filtered_data <- data %>%
  filter(condition == "Experimental" & reaction_time < overall_mean_rt)

# Print filtered results
print(filtered_data)
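One detail worth documenting in the data cleaning answer is how the missing reaction times are handled. `mean(..., na.rm = TRUE)` ignores them when computing the overall mean, and `dplyr::filter()` keeps only rows where the condition is `TRUE`, so rows where `reaction_time` is `NA` are silently dropped. A minimal sketch on a small data frame; `toy` and `fast_experimental` are hypothetical names for illustration, not the exam data.

```r
library(dplyr)

# Hypothetical toy data: one Experimental participant has a missing reaction time
toy <- data.frame(
  condition = c("Experimental", "Experimental", "Control", "Experimental"),
  reaction_time = c(250, NA, 280, 320)
)

# Count missing values per column before filtering
colSums(is.na(toy))

# Overall mean ignoring NA: (250 + 280 + 320) / 3
overall_mean_rt <- mean(toy$reaction_time, na.rm = TRUE)

# The NA row evaluates to NA (not TRUE), so filter() drops it
fast_experimental <- toy %>%
  filter(condition == "Experimental" & reaction_time < overall_mean_rt)

nrow(fast_experimental)   # 1: only the 250 ms participant remains
```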

**Write your answer(s) here describing your data cleaning process.**


Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package
Using the **psych** package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

1. Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
2. Use the **psych** package's `corPlot()` function to create a correlation plot.
3. Interpret the resulting plot by addressing:
   - Which variables appear to be strongly correlated?
   - Are there any surprising relationships?
   - How might these correlations inform further research in psychology?


# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
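A minimal sketch of the requested steps. To keep it self-contained, it simulates a small stand-in data frame, `sim` (a hypothetical name), with the same variable names; in the exam you would start from the `data` object created in Part 2 instead. `anxiety_change` is assumed here to be `anxiety_pre - anxiety_post`. Because that variable is computed from the two anxiety scores, its correlations with them are partly built in by construction.

```r
library(psych)
library(dplyr)

# Hypothetical stand-in for the Part 2 dataset (replace `sim` with `data`)
set.seed(123)
n <- 50
sim <- data.frame(
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  anxiety_pre = rnorm(n, mean = 25, sd = 8)
)
sim$anxiety_post <- pmax(sim$anxiety_pre - rnorm(n, mean = 5, sd = 3), 0)

# Step 1: keep only the numeric variables (anxiety_change is an assumed definition)
numeric_data <- sim %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post) %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)

# Pairwise deletion keeps as many observations as possible if values are missing
cor_matrix <- cor(numeric_data, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# Step 2: plot the correlation matrix with psych::corPlot()
corPlot(cor_matrix, numbers = TRUE, main = "Correlations among numeric variables")
```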

Write your answer(s) here

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?

Research Question in Psychology

“How does chronic stress impact cognitive performance in college students?”

Data:
    Student demographics; stress, measured both physiologically (cortisol levels) and through self-reported perceived stress; and cognitive performance, measured through memory tasks and reaction time.
Statistical Analyses:
    Correlations, multiple regression, and t-tests/ANOVA.
Potential Measurement Errors:
    Natural variation in cortisol levels (e.g., across the time of day), practice or testing effects, and bias in self-reported stress.
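A sketch of how the planned analyses could look in R, using entirely simulated data. The variable names (`cortisol`, `perceived_stress`, `memory_score`, `stress_group`), their scales, and the effect sizes are assumptions for illustration, not real findings. The t-test uses a median split on perceived stress only because the plan lists t-tests; keeping stress continuous in the regression is generally more informative.

```r
set.seed(7)
n <- 120

# Hypothetical simulated students (all values and effect sizes are made up)
students <- data.frame(
  cortisol = rnorm(n, mean = 12, sd = 4),
  perceived_stress = rnorm(n, mean = 20, sd = 6)
)
students$memory_score <- 30 - 0.3 * students$cortisol -
  0.2 * students$perceived_stress + rnorm(n, mean = 0, sd = 3)
students$stress_group <- factor(
  ifelse(students$perceived_stress > median(students$perceived_stress), "High", "Low")
)

# Correlation between perceived stress and memory
cor.test(students$perceived_stress, students$memory_score)

# Multiple regression: both stress measures predicting memory
model <- lm(memory_score ~ cortisol + perceived_stress, data = students)
summary(model)

# t-test comparing memory between high- and low-stress groups
t.test(memory_score ~ stress_group, data = students)
```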

Psychological statistics and the impact of learning R.

Understanding: Using R has strengthened my critical thinking and increased my understanding of data visualization, data manipulation, and statistical modeling.
Advantages: R is flexible, free, and produces reproducible results, all of which suit research in this field.
Challenges: R requires understanding complex code, debugging errors, and working through a steep learning curve.

Submission Instructions:

Be sure to knit your document to HTML and check that all content displays correctly before submission. Then publish your assignment to RPubs and submit the URL to Canvas.