Take-Home Midterm Exam: Introductory Psychological Statistics

Replace “Your Name” with your actual name.

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

Write your answer(s) here Nominal data represent different categories without any order. An example would be the type of therapy a participant receives (e.g., cognitive-behavioral, psychodynamic, or humanistic): the categories are distinct but are not ranked. Ordinal data have categories that are ordered or ranked, but the distances between ranks are not necessarily equal. An example would be measuring agreement (strongly disagree, disagree, agree, strongly agree): the responses can be put in order, but the difference between each pair of adjacent points is not necessarily the same. Interval data have ordered values with equal intervals between them but lack a true zero point. An example is temperature measured in Fahrenheit or Celsius: we can say one temperature is warmer than another, and by how many degrees, but 0 degrees does not indicate the absence of temperature. Ratio data have equal intervals and a true zero, so ratios between values are meaningful. An example from research would be the number of hours spent studying, where zero indicates no study time at all and 10 hours is twice as much as 5 hours.
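You can check the difference between interval and ratio scales numerically. The base-R sketch below uses illustrative values only (10 and 20 degrees Celsius, 5 and 10 hours). It shows that a ratio of temperatures changes when the unit changes, because the zero point is arbitrary, while a ratio of study times does not.

```r
# Interval scale: Celsius and Fahrenheit both have arbitrary zero points
celsius <- c(10, 20)
fahrenheit <- celsius * 9 / 5 + 32        # 50 and 68

ratio_c <- celsius[2] / celsius[1]        # 2 in Celsius...
ratio_f <- fahrenheit[2] / fahrenheit[1]  # ...but 1.36 in Fahrenheit
# Because the ratio depends on the unit, "twice as warm" is not meaningful.

# Differences are preserved up to the unit conversion
diff_c <- diff(celsius)                   # 10 degrees C
diff_f <- diff(fahrenheit)                # 18 degrees F (= 10 * 9/5)

# Ratio scale: hours studied has a true zero (no studying at all)
hours <- c(5, 10)
minutes <- hours * 60
ratio_hours <- hours[2] / hours[1]        # 2
ratio_minutes <- minutes[2] / minutes[1]  # still 2: ratios survive a change of unit

c(ratio_c = ratio_c, ratio_f = ratio_f,
  ratio_hours = ratio_hours, ratio_minutes = ratio_minutes)
```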

For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
- Scores on a depression inventory (0-63)
- Response time in milliseconds
- Likert scale ratings of agreement (1-7)
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
- Age in years

Write your answer(s) here Scores on the depression inventory (0-63) are interval. The scores fall along an ordered range that indicates the severity of depression, and the steps between scores are treated as equal, but a score of 0 does not indicate a true absence of the underlying trait. Response time in milliseconds is ratio because it has a true zero point, and both differences and ratios between values are meaningful. Likert ratings of agreement (1-7) are ordinal because the ratings rank the degree of agreement, but the distances between adjacent points may not be equal; in practice, researchers often treat them as approximately interval. Diagnostic categories are nominal because the categories are distinct but have no meaningful order or ranking. Age in years is ratio because age has a true zero point (birth), and both differences and ratios are meaningful.
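The sketch below is one way these levels of measurement might be encoded in R. It stores the five variables for three hypothetical participants, with values invented for illustration. Interval and ratio variables are numeric vectors, the Likert item is an ordered factor, and diagnosis is an unordered factor.

```r
# Hypothetical values for three participants (illustration only)
depression_score <- c(12, 30, 5)                   # inventory total, 0-63 (treated as interval)
response_time_ms <- c(412.5, 388.0, 501.2)         # ratio: true zero, ratios meaningful
agreement <- factor(c(5, 2, 7), levels = 1:7, ordered = TRUE)       # Likert 1-7: ordinal
diagnosis <- factor(c("ADHD", "No diagnosis", "Anxiety disorder"))  # nominal
age_years <- c(19, 22, 35)                         # ratio

# Ordered factors support order comparisons...
agreement[1] > agreement[2]    # TRUE: a rating of 5 ranks above a rating of 2
# ...whereas unordered factors do not (this would return NA with a warning):
# diagnosis[1] > diagnosis[2]

# Ratio variables support meaningful ratios
response_time_ms[3] / response_time_ms[2]
```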

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

Write your answer(s) here Random errors are unpredictable fluctuations that can arise from many sources and do not push scores consistently in one direction. For example, when participants are asked to recall a list of words, a random error might occur if one participant is distracted by something as small as a noise, causing them to mishear or misremember a word. Systematic errors are consistent, repeatable inaccuracies that bias scores in the same direction. For example, a faulty scoring procedure in a memory experiment that consistently undercounts the number of words recalled would underestimate every participant's recall.
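A small simulation makes the contrast concrete. The sketch below gives each of 1,000 simulated participants a hypothetical true recall score. It then adds either zero-centred random noise or a constant 2-word undercount, which stands in for the faulty scoring procedure. The numbers (mean 12, noise SD 2, bias of 2 words) are arbitrary choices.

```r
set.seed(42)  # arbitrary seed for reproducibility

n <- 1000
true_recall <- rnorm(n, mean = 12, sd = 3)  # hypothetical true number of words recalled

# Random error: unpredictable noise centred on zero (e.g., a distracting noise)
observed_random <- true_recall + rnorm(n, mean = 0, sd = 2)

# Systematic error: a consistent bias in one direction
# (e.g., a scoring procedure that always misses 2 recalled words)
observed_systematic <- true_recall - 2

bias_random <- mean(observed_random) - mean(true_recall)          # close to 0
bias_systematic <- mean(observed_systematic) - mean(true_recall)  # -2 (up to rounding)

# Random error inflates variability; a constant shift does not
sd_ratio_random <- sd(observed_random) / sd(true_recall)          # greater than 1
sd_ratio_systematic <- sd(observed_systematic) / sd(true_recall)  # 1 (up to rounding)

round(c(bias_random = bias_random, bias_systematic = bias_systematic,
        sd_ratio_random = sd_ratio_random, sd_ratio_systematic = sd_ratio_systematic), 3)
```

Random error mostly adds noise and averages out. Systematic error shifts every score, so averaging more participants does not remove it.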

How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

Write your answer(s) here Measurement error can have a significant impact on the validity of a study examining the relationship between stress and academic performance. If stress is measured inaccurately, for example through self-reports or surveys, the data may not accurately reflect participants' true stress levels. Random error in either measure tends to weaken (attenuate) the observed correlation, while systematic error can bias it in one direction. To minimize these errors, researchers should use validated and reliable instruments for measuring stress, such as standardized questionnaires that have been tested for reliability and validity. Researchers should also use multiple methods of measurement, which together can provide a more comprehensive view of stress levels.
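The attenuation point can be sketched with simulated, standardized scores. The code below assumes a true stress-performance correlation of -0.5 and a stress measure with reliability 0.5, meaning half of its variance is random error. Both values are arbitrary illustrations, not estimates from any study. Under these assumptions, the observed correlation should shrink to about -0.5 * sqrt(0.5), or roughly -0.35.

```r
set.seed(2024)  # arbitrary seed for reproducibility

n <- 5000
true_stress <- rnorm(n)  # standardized "true" stress (variance 1)
# Performance built so that its true correlation with stress is -0.5 (variance 1)
performance <- -0.5 * true_stress + rnorm(n, sd = sqrt(1 - 0.25))

# Measured stress = true stress + random measurement error.
# Error variance 1 gives reliability = 1 / (1 + 1) = 0.5.
measured_stress <- true_stress + rnorm(n, sd = 1)

r_true <- cor(true_stress, performance)          # about -0.50
r_observed <- cor(measured_stress, performance)  # about -0.35: attenuated toward 0

round(c(r_true = r_true, r_observed = r_observed), 3)
```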

Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:

# Create a simulated dataset
set.seed(123)  # For reproducibility

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # We'll fill this in based on condition
)

# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View the first few rows of the dataset
head(data)

##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397

Now, perform the following computations:

Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition (hint: use the psych package).

# Your code here

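library(psych)
# describeBy() reports n, mean, sd, median, min, and max (among others) for each condition
describeBy(data[, c("reaction_time", "accuracy")], group = data$condition)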
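If the psych output is hard to read, you can cross-check the same five statistics in base R. The sketch below uses a small hypothetical data frame named toy, which stands in for data and includes one missing reaction time. It also uses a helper function, summarise_vec(), defined here for illustration; it is not part of psych or dplyr.

```r
# Toy data standing in for `data` (hypothetical values, including one NA)
toy <- data.frame(
  condition = c("Control", "Control", "Control",
                "Experimental", "Experimental", "Experimental"),
  reaction_time = c(280, 300, NA, 310, 330, 350),
  accuracy = c(80, 90, 85, 70, 75, 95)
)

# Hypothetical helper: five summary statistics, ignoring missing values
summarise_vec <- function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE), min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
}

# Result: one column per condition, one row per statistic
rt_summary <- sapply(split(toy$reaction_time, toy$condition), summarise_vec)
acc_summary <- sapply(split(toy$accuracy, toy$condition), summarise_vec)
rt_summary
```

Replacing toy with data should give the same mean, median, sd, min, and max that describeBy() reports, since both drop missing values before summarizing.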

Using dplyr and piping, create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.

# Your code here

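library(dplyr)
data <- data %>% mutate(anxiety_change = anxiety_pre - anxiety_post)  # pre minus post
data %>% group_by(condition) %>% summarise(mean_anxiety_change = mean(anxiety_change))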
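The base-R sketch below cross-checks the pipeline on four hypothetical participants with invented values. Because the change is pre minus post, a positive value means anxiety went down. In the full dataset, the Experimental condition would be expected to show the larger mean change, because the simulation subtracts a larger reduction in that condition (mean 8 vs. 3).

```r
# Hypothetical pre/post anxiety scores (illustration only)
toy <- data.frame(
  condition = c("Control", "Control", "Experimental", "Experimental"),
  anxiety_pre = c(30, 20, 28, 24),
  anxiety_post = c(27, 18, 19, 17)
)

# Pre minus post: positive values mean anxiety went down
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post  # 3, 2, 9, 7

# Mean change per condition
mean_change <- tapply(toy$anxiety_change, toy$condition, mean)
mean_change  # Control 2.5, Experimental 8.0
```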

Write your answer(s) here

Question 4: Probability Calculations

Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):

If reaction times in a cognitive task are normally distributed with a mean of 350ms and a standard deviation of 75ms:
1. What is the probability that a randomly selected participant will have a reaction time greater than 450ms?
2. What is the probability that a participant will have a reaction time between 300ms and 400ms?

# Your code here

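# pnorm() gives the area under the normal curve below a given value
pnorm(450, mean = 350, sd = 75, lower.tail = FALSE)                 # P(RT > 450 ms)
pnorm(400, mean = 350, sd = 75) - pnorm(300, mean = 350, sd = 75)   # P(300 < RT < 400 ms)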

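Write your answer(s) here 1. The probability that a randomly selected participant has a reaction time greater than 450ms is about 0.091 (9.1%), because 450ms is (450 - 350)/75, or about 1.33, standard deviations above the mean. 2. The probability that a participant has a reaction time between 300ms and 400ms is about 0.495 (49.5%), because these bounds lie about 0.67 standard deviations below and above the mean.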
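Both answers come from standardizing with z = (x - μ)/σ and reading areas from the standard normal cumulative distribution function Φ. Values of Φ are rounded to four decimals.

```latex
\begin{aligned}
z &= \frac{x - \mu}{\sigma}, \qquad \mu = 350,\ \sigma = 75 \\[4pt]
P(X > 450) &= 1 - \Phi\!\left(\frac{450 - 350}{75}\right) = 1 - \Phi(4/3)
  \approx 1 - 0.9088 = 0.0912 \\[4pt]
P(300 < X < 400) &= \Phi\!\left(\frac{400 - 350}{75}\right) - \Phi\!\left(\frac{300 - 350}{75}\right)
  = \Phi(2/3) - \Phi(-2/3) \\
&= 2\,\Phi(2/3) - 1 \approx 2(0.7475) - 1 = 0.4950
\end{aligned}
```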

Part 3: Data Cleaning and Manipulation

Question 5: Data Cleaning with dplyr

Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:

Remove all rows with missing values and create a new dataset called clean_data.

# Your code here

clean_data <- data %>%
  na.omit()  # drop every row that has a missing value in any column
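For reference, the same row-wise rule can be written in base R with complete.cases(), shown here on a tiny hypothetical data frame. In the Part 2 data, three reaction times and two accuracy scores were set to NA. These may fall in the same rows, so between three and five rows would be dropped.

```r
# Hypothetical data with missing values in different columns
toy <- data.frame(
  reaction_time = c(300, NA, 280, 310),
  accuracy = c(85, 90, NA, 88)
)

complete_rows <- complete.cases(toy)  # TRUE only where no column is NA
clean_toy <- toy[complete_rows, ]
nrow(clean_toy)  # 2 rows remain
```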

Create a new variable performance_category that categorizes participants based on their accuracy:
- “High” if accuracy is greater than or equal to 90
- “Medium” if accuracy is between 70 and 90
- “Low” if accuracy is less than 70

# Your code here

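cleaned_data <- data %>%
  mutate(performance_category = case_when(accuracy >= 90 ~ "High",
                                          accuracy >= 70 ~ "Medium",
                                          accuracy < 70 ~ "Low"))  # NA accuracy stays NA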
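An alternative to chained conditions is base R's cut(), sketched below on hypothetical scores. The scores include the boundary values 70 and 90 and one missing value. The key design choice is right = FALSE, which closes each interval on the left. That way exactly 90 counts as High and exactly 70 counts as Medium, matching the rules above. A missing score stays NA rather than being assigned a category.

```r
accuracy <- c(95, 90, 89.9, 70, 69.5, NA)  # hypothetical scores, including boundary cases

# right = FALSE gives intervals [-Inf, 70), [70, 90), [90, Inf)
performance_category <- cut(accuracy,
                            breaks = c(-Inf, 70, 90, Inf),
                            labels = c("Low", "Medium", "High"),
                            right = FALSE)
performance_category  # High High Medium Medium Low <NA>
```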

Filter the dataset to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time.

# Your code here

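mean_rt <- mean(data$reaction_time, na.rm = TRUE)  # overall mean RT across both conditions
fast_experimental <- data %>% filter(condition == "Experimental", reaction_time < mean_rt)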
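You can check the filtering rule by hand on a small hypothetical data frame, as in the base-R sketch below. The overall mean is computed across both conditions. In base-R indexing, missing reaction times must be excluded explicitly; dplyr's filter() drops rows whose condition evaluates to NA automatically.

```r
# Hypothetical data (illustration only)
toy <- data.frame(
  condition = c("Control", "Experimental", "Experimental", "Experimental", "Control"),
  reaction_time = c(250, 280, 340, NA, 330)
)

# Overall mean across all participants: (250 + 280 + 340 + 330) / 4 = 300
overall_mean <- mean(toy$reaction_time, na.rm = TRUE)

# Experimental participants who were faster (lower RT) than the overall mean
fast_exp <- toy[toy$condition == "Experimental" &
                  !is.na(toy$reaction_time) &
                  toy$reaction_time < overall_mean, ]
fast_exp  # only the participant with RT 280
```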

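Write your answer(s) here describing your data cleaning process. First, I removed every row that had a missing value (the simulation set three reaction times and two accuracy scores to NA) and saved the result as clean_data. Next, I created performance_category by classifying each participant's accuracy as High (90 or above), Medium (70 to below 90), or Low (below 70). Finally, I computed the overall mean reaction time across all participants, ignoring missing values, and filtered the data to keep only participants in the Experimental condition whose reaction times were faster (lower) than that mean.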

Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package

Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
Use the psych package’s corPlot() function to create a correlation plot.
Interpret the resulting plot by addressing:
- Which variables appear to be strongly correlated?
- Are there any surprising relationships?
- How might these correlations inform further research in psychology?

# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).

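library(psych)
numeric_data <- data %>% select(reaction_time, accuracy, anxiety_pre, anxiety_post, any_of("anxiety_change"))
corPlot(cor(numeric_data, use = "pairwise.complete.obs"), numbers = TRUE)  # NAs handled pairwise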
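Because the dataset is simulated, the main correlations can be anticipated from the code that generated it. The base-R sketch below reuses the same generating rules with a larger, arbitrary n of 1,000 so the pattern is stable. It is a rough guide to what the plot should show, and the values for the 50-participant dataset will differ. By construction, anxiety_pre and anxiety_post should be strongly positively correlated, since post is pre minus a modest random reduction. Reaction_time and accuracy are generated independently, so their correlation should be near zero. Because anxiety_change is computed from anxiety_post, some correlation between those two is also built in.

```r
set.seed(123)
n <- 1000  # larger than the exam's n = 50 so the pattern is stable

# Same structure as the Part 2 simulation: post = pre minus a random reduction
condition <- sample(c("Control", "Experimental"), n, replace = TRUE)
anxiety_pre <- rnorm(n, mean = 25, sd = 8)
reduction <- ifelse(condition == "Experimental", rnorm(n, 8, 3), rnorm(n, 3, 2))
anxiety_post <- pmax(anxiety_pre - reduction, 0)  # floor at 0, as in Part 2
anxiety_change <- anxiety_pre - anxiety_post

# Reaction time and accuracy are drawn independently of everything else
reaction_time <- rnorm(n, 300, 50)
accuracy <- rnorm(n, 85, 10)

r <- cor(cbind(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change))
round(r, 2)
```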

Write your answer(s) here I am not sure what I am supposed to be writing here.

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?

Write your answer(s) here I don’t think I have come across a research question in psychology that has really interested me yet. Learning R for data analysis has changed my understanding of psychological statistics by showing me how much goes into every little thing. The smallest details really do matter and make a huge difference: if you forget something or misspell something, it can change everything or give you nothing at all. You have to pay close attention to everything, and it is hard to do everything, understand everything, and on top of that remember everything. The biggest advantage, I would say, is gaining a better understanding of the statistical side of psychology, which I previously knew nothing about. The biggest challenge is figuring everything out, especially because when you first start using R it seems like so much and feels overwhelming, and you don’t realize how important it is to pay extra attention so that you get everything right and don’t miss, forget to add, or forget to remove anything.

Submission Instructions:

Be sure to knit your document to HTML format and check that all content is displayed correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.