Take-Home Midterm Exam: Introductory Psychological Statistics

Instructions

Please complete this exam on your own. Include your R code, interpretations, and answers within this document.

Part 1: Types of Data and Measurement Errors

Question 1: Data Types in Psychological Research

Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:

Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.

Nominal data represents labels or categories without any quantitative information or ranking, such as gender. Ordinal data is arranged in an order or rank, but the distances between ranks are not quantified; the categories are separate and follow a set sequence, such as happiness or depression rated as high, medium, or low. Interval data is labeled and ordered like nominal and ordinal data, but it also has known intervals. These intervals are equal and consistent, which makes the data quantifiable and mathematically usable, but there is no true zero point. Examples are test scores or IQ scores: on an interval scale, zero is an arbitrary point that does not mean the absence of the variable, in this case an absence of intelligence. Ratio data has all of the properties of nominal, ordinal, and interval data and also has a true zero point, meaning the absence of the variable or quantity, with equal intervals between values. This means all mathematical operations, including ratios, can be meaningfully applied. Examples are age, reaction time, or duration.

For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
- Scores on a depression inventory (0-63)
- Response time in milliseconds
- Likert scale ratings of agreement (1-7)
- Diagnostic categories (e.g., ADHD, anxiety disorder, no diagnosis)
- Age in years

Age and response time are ratio data because they have true zeros and equal intervals. Likert scale ratings (1-7) are ordinal data because the distance in meaning between adjacent responses may not be equal. Scores on a depression inventory would be interval data, as the scores are quantitative and treated as equally spaced, but a score of zero does not show a complete absence of depression. Diagnostic categories are nominal data because they are labels with no numerical value or ranking.
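As a rough illustration, the levels of measurement above could be stored in R as follows. This is only a sketch with made-up values; the variable names are hypothetical and are not part of the exam dataset.

```r
# Hypothetical example values; none of these come from the exam dataset
diagnosis <- factor(c("ADHD", "anxiety disorder", "no diagnosis", "ADHD"))  # nominal
agreement <- factor(c(2, 7, 4, 4), levels = 1:7, ordered = TRUE)            # ordinal
depression_score <- c(12, 30, 5, 18)                                        # treated as interval here
response_time_ms <- c(512.4, 430.9, 610.2, 389.7)                           # ratio

# Nominal: only counting how often each category occurs is meaningful
table(diagnosis)

# Ordinal: order comparisons and the median are meaningful
agreement[1] < agreement[2]
median(as.integer(as.character(agreement)))

# Ratio: ratios are meaningful because zero means "no time elapsed"
response_time_ms[3] / response_time_ms[4]
```

Storing the Likert ratings as an ordered factor keeps R from treating the gaps between ratings as equal unless they are explicitly converted to numbers.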

Question 2: Measurement Error

Referring to Chapter 3 (Measurement Errors in Psychological Research):

Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.

A systematic error occurs when the system in place to collect data makes the same kind of mistake every time it measures something in the same way. This means consistent, predictable biases in measurements. A random error simply occurs due to chance and shows up as unpredictable changes in measurements. Systematic errors affect the accuracy of results, and random errors affect the precision of results. In a memory experiment, distractions, state of mind, or ability to concentrate can all affect one’s performance on a recall test, and this leads to random variation in the data that can’t be predicted. In the same test, the items being recalled (words, images, etc.) could be more or less familiar to a certain demographic of people. Children will do systematically worse at remembering complex words or phrases they aren’t used to, while older, educated adults will have an easier time because they are more accustomed to those kinds of words or phrases. Systematic errors are most likely to be caused by the system itself, including types of bias or miscalibrated equipment. Random errors are more likely to be caused by participants.
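The difference between the two kinds of error can be sketched with a small simulation. All numbers here are assumptions chosen for illustration (a true recall score of 12 words, a bias of 3 words, noise with a standard deviation of 2); they do not come from any real memory experiment.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n <- 10000
true_recall <- 12  # hypothetical true number of words recalled

# Random error: unpredictable noise centered on zero
random_only <- true_recall + rnorm(n, mean = 0, sd = 2)

# Systematic error: the same assumed bias (-3 words) on every measurement,
# in addition to the random noise
biased <- true_recall - 3 + rnorm(n, mean = 0, sd = 2)

c(mean = mean(random_only), sd = sd(random_only))  # mean stays near 12
c(mean = mean(biased), sd = sd(biased))            # mean shifts to about 9, similar spread
```

In this sketch the random error spreads the scores out around the true value (lower precision), while the systematic error moves the whole distribution away from it (lower accuracy).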

How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?

The biggest issue would be in the actual measurement of the level of stress. Academic performance is easily measured because a solid system is already in place, but level of stress is subjective for both the observer and the participant. Every person uses a different internal scale, so an experiment would need a quantitative way to measure stress equally across the whole group. If the measure is consistently off in one direction, this would be a systematic error, and the study would no longer be measuring what it claims to measure. Even on a test like the Perceived Stress Scale, where a 0-4 rating may seem simple, people will differ in how they interpret “very often,” “fairly often,” “never,” and so on. To minimize these errors, researchers could pretest the experiment beforehand to find sources of error, increase the sample size, ensure consistent measurement and consistent explanation of the measures, ensure the validity of the systems used (for example, use GPAs based on the correct academic level), use more than one method of measurement per data type, and reduce human error (blinding, training, testing).

Part 2: Descriptive Statistics and Basic Probability

Question 3: Descriptive Analysis

The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:

# Create a simulated dataset
set.seed(123)  # For reproducibility

# Number of participants
n <- 50

# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA  # We'll fill this in based on condition
)

# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),  # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)   # Smaller reduction
)

# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)

# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA

# View the first few rows of the dataset
head(data)

##   participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1              1      271.9762  87.53319 Female      Control    31.30191
## 2              2      288.4911  84.71453 Female Experimental    31.15234
## 3              3      377.9354  84.57130 Female Experimental    27.65762
## 4              4      303.5254  98.68602   Male      Control    16.93299
## 5              5      306.4644  82.74229 Female      Control    24.04438
## 6              6      385.7532 100.16471 Female      Control    22.75684
##   anxiety_post
## 1     29.05312
## 2     19.21510
## 3     20.45306
## 4     13.75199
## 5     17.84736
## 6     19.93397

Now, perform the following computations*:

Calculate the mean, median, standard deviation, minimum, and maximum for reaction time and accuracy, grouped by condition (hint: use the psych package).

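library(dplyr)  # provides %>%, group_by(), and summarize()

# Note: reaction_time and accuracy contain missing values, so these functions
# return NA unless na.rm = TRUE is supplied
summary_stats <- data %>%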
  group_by(condition) %>%
  summarize(
    mean_reaction_time = mean(reaction_time),
    median_reaction_time = median(reaction_time),
    sd_reaction_time = sd(reaction_time),
    min_reaction_time = min(reaction_time),
    max_reaction_time = max(reaction_time),
    mean_accuracy = mean(accuracy),
    median_accuracy = median(accuracy),
    sd_accuracy = sd(accuracy),
    min_accuracy = min(accuracy),
    max_accuracy = max(accuracy)
  )
print(summary_stats)

## # A tibble: 2 × 11
##   condition    mean_reaction_time median_reaction_time sd_reaction_time
##   <chr>                     <dbl>                <dbl>            <dbl>
## 1 Control                      NA                   NA               NA
## 2 Experimental                 NA                   NA               NA
## # ℹ 7 more variables: min_reaction_time <dbl>, max_reaction_time <dbl>,
## #   mean_accuracy <dbl>, median_accuracy <dbl>, sd_accuracy <dbl>,
## #   min_accuracy <dbl>, max_accuracy <dbl>
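
The NA values in this output come from the missing values that were added to reaction_time and accuracy: mean(), median(), sd(), min(), and max() all return NA by default when any input is missing. A minimal sketch of the same kind of grouped summary with na.rm = TRUE is shown here on a small made-up data frame (toy_data and toy_summary are hypothetical names, not part of the exam dataset):

```r
library(dplyr)

# Hypothetical data with one missing reaction time per condition
toy_data <- data.frame(
  condition = c("Control", "Control", "Control",
                "Experimental", "Experimental", "Experimental"),
  reaction_time = c(300, NA, 320, 280, 290, NA)
)

toy_summary <- toy_data %>%
  group_by(condition) %>%
  summarize(
    mean_reaction_time = mean(reaction_time, na.rm = TRUE),
    median_reaction_time = median(reaction_time, na.rm = TRUE),
    sd_reaction_time = sd(reaction_time, na.rm = TRUE),
    min_reaction_time = min(reaction_time, na.rm = TRUE),
    max_reaction_time = max(reaction_time, na.rm = TRUE)
  )
print(toy_summary)
```

With na.rm = TRUE each statistic is computed from the non-missing values in its own column, which can give slightly different results from removing every row that has any missing value.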

Using dplyr and piping, create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.

data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post)
mean_anxiety_change <- data %>%
  group_by(condition) %>%
  summarize(mean_anxiety_change = mean(anxiety_change))

Question 4: Probability Calculations

Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):

If reaction times in a cognitive task are normally distributed with a mean of 350ms and a standard deviation of 75ms:
1. What is the probability that a randomly selected participant will have a reaction time greater than 450ms?
2. What is the probability that a participant will have a reaction time between 300ms and 400ms?

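# Part 1: P(reaction time > 450 ms)
Q4_1a <- 1 - pnorm(450, mean = 350, sd = 75)
# Part 2: P(300 ms < reaction time < 400 ms)
Q4_1b <- pnorm(400, mean = 350, sd = 75) - pnorm(300, mean = 350, sd = 75)
# Convert the probabilities to percentages
Q4_1a_percent <- Q4_1a * 100
Q4_1b_percent <- Q4_1b * 100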

Question 4, part 1: there is about a 9.12 percent chance that a randomly selected participant will have a reaction time greater than 450ms. Part 2: there is about a 49.5 percent chance that a participant will have a reaction time between 300ms and 400ms.
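
As a cross-check, the same probabilities can be obtained by first converting each cutoff to a z-score, z = (x - mean) / sd, and then using the standard normal distribution. This is a sketch of that approach; the variable names are my own.

```r
mu <- 350    # mean reaction time in ms (from the question)
sigma <- 75  # standard deviation in ms (from the question)

# Convert each cutoff to a z-score
z_450 <- (450 - mu) / sigma
z_300 <- (300 - mu) / sigma
z_400 <- (400 - mu) / sigma

# Upper-tail probability for part 1, area between the two cutoffs for part 2
p_greater_450 <- pnorm(z_450, lower.tail = FALSE)
p_between <- pnorm(z_400) - pnorm(z_300)

round(c(p_greater_450, p_between) * 100, 2)
```

The z-scores are about 1.33 for 450ms and about -0.67 and 0.67 for 300ms and 400ms, so the second probability is the area within two thirds of a standard deviation on either side of the mean.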

Part 3: Data Cleaning and Manipulation

Question 5: Data Cleaning with dplyr

Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:

Remove all rows with missing values and create a new dataset called clean_data.

cleaned_data <- data %>%
  na.omit()
summary_stats <- cleaned_data %>%
  group_by(condition) %>%
  summarize(
    mean_reaction_time = mean(reaction_time),
    median_reaction_time = median(reaction_time),
    sd_reaction_time = sd(reaction_time),
    min_reaction_time = min(reaction_time),
    max_reaction_time = max(reaction_time),
    mean_accuracy = mean(accuracy),
    median_accuracy = median(accuracy),
    sd_accuracy = sd(accuracy),
    min_accuracy = min(accuracy),
    max_accuracy = max(accuracy)
  )
print(summary_stats)

## # A tibble: 2 × 11
##   condition    mean_reaction_time median_reaction_time sd_reaction_time
##   <chr>                     <dbl>                <dbl>            <dbl>
## 1 Control                    302.                 300.             47.3
## 2 Experimental               296.                 288.             38.4
## # ℹ 7 more variables: min_reaction_time <dbl>, max_reaction_time <dbl>,
## #   mean_accuracy <dbl>, median_accuracy <dbl>, sd_accuracy <dbl>,
## #   min_accuracy <dbl>, max_accuracy <dbl>

Create a new variable performance_category that categorizes participants based on their accuracy:
- “High” if accuracy is greater than or equal to 90
- “Medium” if accuracy is between 70 and 90
- “Low” if accuracy is less than 70

cleaned_data <- data %>%
  na.omit() %>%
  mutate(performance_category = ifelse(accuracy >= 90, "High",
                                ifelse(accuracy >= 70 & accuracy < 90, "Medium", "Low")))
print(cleaned_data)

##    participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1               1      271.9762  87.53319 Female      Control    31.30191
## 2               2      288.4911  84.71453 Female Experimental    31.15234
## 3               3      377.9354  84.57130 Female Experimental    27.65762
## 4               4      303.5254  98.68602   Male      Control    16.93299
## 5               5      306.4644  82.74229 Female      Control    24.04438
## 6               6      385.7532 100.16471 Female      Control    22.75684
## 7               7      323.0458  69.51247 Female      Control    29.50392
## 8               8      236.7469  90.84614   Male      Control    22.02049
## 10             10      277.7169  87.15942 Female      Control    22.00335
## 12             12      317.9907  79.97677   Male Experimental    16.60658
## 13             13      320.0386  81.66793   Male Experimental    14.91876
## 14             14      305.5341  74.81425 Female      Control    50.92832
## 15             15      272.2079  74.28209 Female Experimental    21.66514
## 17             17      324.8925  89.48210 Female Experimental    30.09256
## 18             18      201.6691  85.53004   Male      Control    21.12975
## 19             19      335.0678  94.22267 Female      Control    29.13490
## 20             20      276.3604 105.50085   Male      Control    27.95172
## 21             21      246.6088  80.08969 Female      Control    23.27696
## 22             22      289.1013  61.90831   Male      Control    25.52234
## 23             23      248.6998  95.05739   Male      Control    24.72746
## 24             24      263.5554  77.90799   Male Experimental    42.02762
## 25             25      268.7480  78.11991 Female      Control    19.06931
## 26             26      215.6653  95.25571 Female Experimental    16.23203
## 27             27      341.8894  82.15227   Male      Control    25.30231
## 28             28      307.6687  72.79282   Male      Control    27.48385
## 29             29      243.0932  86.81303 Female      Control    28.49219
## 31             31      321.3232  85.05764   Male Experimental    16.49339
## 32             32      285.2464  88.85280 Female Experimental    35.10548
## 33             33      344.7563  81.29340 Female      Control    22.20280
## 34             34      343.9067  91.44377   Male      Control    18.07590
## 35             35      341.0791  82.79513 Female      Control    23.10976
## 36             36      334.4320  88.31782 Female Experimental    23.42259
## 37             37      327.6959  95.96839 Female Experimental    33.87936
## 38             38      296.9044  89.35181 Female Experimental    25.67790
## 39             39      284.7019  81.74068 Female      Control    31.03243
## 40             40      280.9764  96.48808   Male Experimental    21.00566
## 41             41      265.2647  94.93504   Male      Control    26.71556
## 42             42      289.6041  90.48397 Female      Control    22.40251
## 44             44      408.4478  78.72094 Female      Control    17.83709
## 45             45      360.3981  98.60652   Male      Control    14.51359
## 46             46      243.8446  78.99740   Male Experimental    40.97771
## 47             47      279.8558 106.87333   Male Experimental    29.80567
## 48             48      276.6672 100.32611 Female Experimental    14.98983
## 49             49      338.9983  82.64300 Female      Control    20.11067
## 50             50      295.8315  74.73579 Female      Control    15.51616
##    anxiety_post anxiety_change performance_category
## 1     29.053117     2.24879426               Medium
## 2     19.215099    11.93723893               Medium
## 3     20.453056     7.20456483               Medium
## 4     13.751994     3.18099329                 High
## 5     17.847362     6.19701754               Medium
## 6     19.933968     2.82286978                 High
## 7     24.342317     5.16159899                  Low
## 8     17.758982     4.26150823                 High
## 10    22.069157    -0.06580401               Medium
## 12     7.875522     8.73106229               Medium
## 13     3.221330    11.69742764               Medium
## 14    45.327922     5.60039736               Medium
## 15    16.642661     5.02247855               Medium
## 17    23.416047     6.67651035               Medium
## 18    21.642810    -0.51305479               Medium
## 19    26.912456     2.22244027                 High
## 20    24.773302     3.17841445                 High
## 21    18.586930     4.69002601               Medium
## 22    20.597288     4.92505594                  Low
## 23    20.358843     4.36861886                 High
## 24    31.904850    10.12276506               Medium
## 25    14.370025     4.69928609               Medium
## 26     8.052780     8.17924981                 High
## 27    21.952702     3.34960540               Medium
## 28    24.334744     3.14910235               Medium
## 29    24.635854     3.85633353               Medium
## 31     2.627509    13.86588190               Medium
## 32    27.376440     7.72904122               Medium
## 33    18.430744     3.77205314               Medium
## 34    15.607200     2.46869675                 High
## 35    19.873474     3.23628902               Medium
## 36    19.373641     4.04895160               Medium
## 37    26.428138     7.45122383                 High
## 38    16.420951     9.25694721               Medium
## 39    28.470531     2.56189924               Medium
## 40    15.350273     5.65539054                 High
## 41    21.378795     5.33676775                 High
## 42    17.294151     5.10836205                 High
## 44    15.992029     1.84506400               Medium
## 45     7.508622     7.00496546                 High
## 46    27.270622    13.70708547               Medium
## 47    22.108595     7.69707534                 High
## 48    11.069351     3.92047789                 High
## 49    17.068705     3.04196717               Medium
## 50    10.016330     5.49982914               Medium
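
The same three-way categorization can also be written with base R's cut() instead of nested ifelse() calls. This is an alternative sketch, not the method used above; the accuracy values are made up and include the boundary cases.

```r
accuracy <- c(69.5, 70, 89.9, 90, 105.5)  # hypothetical scores, including the boundaries

performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE  # intervals are closed on the left, so [70, 90) is "Medium"
)

data.frame(accuracy, performance_category)
```

Setting right = FALSE matters here: it places a score of exactly 90 in “High” and a score of exactly 70 in “Medium”, matching the rules in the question.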

Filter the dataset to include only participants in the Experimental condition with reaction times faster than the overall mean reaction time.

# Overall mean reaction time, ignoring missing values
mean_overall <- mean(data$reaction_time, na.rm = TRUE)
filtered_data <- data %>%
  filter(condition == "Experimental" & reaction_time < mean_overall) %>%
  mutate(performance_category = ifelse(accuracy >= 90, "High",
                                ifelse(accuracy >= 70 & accuracy < 90, "Medium", "Low")))
print(filtered_data)

##    participant_id reaction_time  accuracy gender    condition anxiety_pre
## 1               2      288.4911  84.71453 Female Experimental    31.15234
## 2              15      272.2079  74.28209 Female Experimental    21.66514
## 3              24      263.5554  77.90799   Male Experimental    42.02762
## 4              26      215.6653  95.25571 Female Experimental    16.23203
## 5              32      285.2464  88.85280 Female Experimental    35.10548
## 6              38      296.9044  89.35181 Female Experimental    25.67790
## 7              40      280.9764  96.48808   Male Experimental    21.00566
## 8              46      243.8446  78.99740   Male Experimental    40.97771
## 9              47      279.8558 106.87333   Male Experimental    29.80567
## 10             48      276.6672 100.32611 Female Experimental    14.98983
##    anxiety_post anxiety_change performance_category
## 1      19.21510      11.937239               Medium
## 2      16.64266       5.022479               Medium
## 3      31.90485      10.122765               Medium
## 4       8.05278       8.179250                 High
## 5      27.37644       7.729041               Medium
## 6      16.42095       9.256947               Medium
## 7      15.35027       5.655391                 High
## 8      27.27062      13.707085               Medium
## 9      22.10860       7.697075                 High
## 10     11.06935       3.920478                 High

I first created a new dataset called “cleaned_data” with na.omit() and then reran the summarizing code to create a new set of summary_stats that were up to date with the clean data. I then used mutate to create the new performance_category variable from this clean data. Next, I used filter to make a new dataset called “filtered_data” that keeps only the “Experimental” condition and, within that, only participants with reaction times faster (lower) than the overall mean reaction time, which I stored as “mean_overall”. Finally, I re-ran the performance_category code so that the filtered data shows that category as well.
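
One point to keep in mind about the overall mean: averaging the two condition means equals the overall mean only when both conditions have the same number of observations. The sketch below uses made-up reaction times with unequal group sizes (toy_rt is a hypothetical data frame) to show the difference.

```r
# Hypothetical reaction times with unequal group sizes and one missing value
toy_rt <- data.frame(
  condition = c("Control", "Control", "Control", "Control", "Experimental"),
  reaction_time = c(300, 310, 320, NA, 250)
)

# Overall mean across all non-missing observations
overall_mean <- mean(toy_rt$reaction_time, na.rm = TRUE)

# Mean of the two condition means
group_means <- tapply(toy_rt$reaction_time, toy_rt$condition, mean, na.rm = TRUE)
mean_of_group_means <- mean(group_means)

overall_mean         # 295
mean_of_group_means  # 280

# Experimental participants faster than the overall mean
toy_rt[which(toy_rt$condition == "Experimental" &
             toy_rt$reaction_time < overall_mean), ]
```

Computing the mean directly from the reaction_time column also avoids copying rounded values from printed output.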

Part 4: Visualization and Correlation Analysis

Question 6: Correlation Analysis with the psych Package

Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:

Select the numeric variables from the dataset (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
Use the psych package’s corPlot() function to create a correlation plot.
Interpret the resulting plot by addressing:
- Which variables appear to be strongly correlated?
- Are there any surprising relationships?
- How might these correlations inform further research in psychology?

library(psych)  # provides corPlot()

plotable_data <- cleaned_data %>%
  select(where(is.numeric), -participant_id)

corPlot(cor(plotable_data))

Anxiety pre and post are strongly positively correlated: the more anxiety before, the more after. Anxiety pre and anxiety change are slightly positively correlated. Reaction time is slightly negatively correlated with anxiety pre, anxiety post, anxiety change, and accuracy. Accuracy is slightly negatively correlated with all of the other variables. Anxiety change is slightly negatively correlated with anxiety post, reaction time, and accuracy. I would have expected higher anxiety to go with worse accuracy and reaction time, and anxiety post to be lower in every case because the test would be over. The plot seems to show that anxiety doesn’t affect reaction time or accuracy very strongly.
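
To read exact values rather than judging them from the colors of a plot, the correlation matrix can be printed directly, and a single correlation can be tested with cor.test(). The sketch below uses simulated data of my own (the data frame toy and its parameters are assumptions, not the exam dataset), so its numbers do not describe the plot above.

```r
set.seed(42)  # arbitrary seed for the made-up data

toy <- data.frame(
  anxiety_pre = rnorm(40, mean = 25, sd = 8),
  accuracy = rnorm(40, mean = 85, sd = 10)
)
toy$anxiety_post <- toy$anxiety_pre - rnorm(40, mean = 5, sd = 3)
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post

# Numeric correlation matrix, rounded for reading
cor_matrix <- cor(toy, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# Test whether one correlation differs reliably from zero
cor.test(toy$anxiety_pre, toy$accuracy)
```

Because anxiety_change is computed from anxiety_pre and anxiety_post, its correlations with those two variables are partly built in by construction and should be interpreted with that in mind.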

Part 5: Reflection and Application

Question 7: Reflection

Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:

I would like to determine whether a healthy amount of physical activity each day improves stress levels and other negative factors. I also want to see at what point it becomes too much activity, if that is possible. This would be an experiment comparing how much someone works out, for example, with how stressed they are before and after, as well as on days when they do not work out, to find the ideal intensity and amount of time spent to achieve minimum stress. This would have to take into account different body types, familiarity with exercise, mental strength, and genetic differences.
I have basically no experience with software other than R, but so far it has been very useful and I am having a surprisingly fun time coding. It seems simple so far, so it may not have the reach of other programming languages.