Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Nominal data represent labels or categories without any quantitative information or ranking/order, such as gender. Ordinal data are ordered or ranked, but the distances between ranks are not quantified. The categories follow a set sequence, such as levels of happiness or depression rated high, medium, or low. Interval data are labeled and ordered like nominal and ordinal data, but they also have known intervals. These intervals are equal and consistent, which makes the data quantifiable and usable in arithmetic. However, interval data have no true zero point, as with test scores or IQ scores: on an interval scale, zero does not mean the variable is absent (here, it would not mean an absence of intelligence). Ratio data have all the properties of nominal, ordinal, and interval data and also have a true zero point. Their numerical values have equal intervals, and zero means the absence of the variable or quantity, so all mathematical operations, including ratios, can be applied. Examples include age, reaction time, and duration.
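A small R illustration of why the true zero matters, using hypothetical values. Converting reaction times from milliseconds to seconds only rescales them, so ratios survive. Re-expressing IQ scores as T-scores (mean 50, SD 10, an illustrative choice of a second interval scale) rescales and also shifts the zero point, so ratios change while ratios of differences do not.

```r
# Ratio scale: reaction time. Converting ms to s only rescales (zero stays zero).
rt_ms <- c(250, 500)
rt_s <- rt_ms / 1000
rt_ratio_ms <- rt_ms[2] / rt_ms[1]   # 2
rt_ratio_s <- rt_s[2] / rt_s[1]      # still 2: "twice as long" is meaningful

# Interval scale: IQ (mean 100, SD 15) re-expressed as T-scores (mean 50, SD 10).
# This conversion rescales AND moves the zero point.
iq <- c(100, 115, 130)
t_score <- 50 + 10 * (iq - 100) / 15  # 50, 60, 70

iq_ratio <- iq[3] / iq[2]             # 130 / 115, about 1.13
t_ratio <- t_score[3] / t_score[2]    # 70 / 60, about 1.17: depends on the scale chosen

# Ratios of differences do not depend on the scale chosen (equal intervals)
iq_diff_ratio <- (iq[3] - iq[1]) / (iq[2] - iq[1])                    # 2
t_diff_ratio <- (t_score[3] - t_score[1]) / (t_score[2] - t_score[1]) # 2

cat("RT ratio (ms, s):", rt_ratio_ms, rt_ratio_s, "\n")
cat("IQ vs T-score ratio:", round(iq_ratio, 3), round(t_ratio, 3), "\n")
cat("Difference ratio (IQ, T):", iq_diff_ratio, t_diff_ratio, "\n")
```

Differences carry over between interval scales, but statements like "twice as intelligent" do not, because the zero point is arbitrary.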
Age and response time are ratio data because they have true zeros and equal intervals. Likert scale ratings (e.g., 1-10) are ordinal data because the distance between adjacent responses may not be equal. A depression inventory score would be interval data because it has equal intervals and quantitative values but no true zero. Diagnostic categories are nominal data because they are labels with no numerical values or ranking.
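One way these classifications could be encoded in R, sketched with hypothetical values (the diagnosis labels "MDD" and "GAD" are illustrative only). Ratio and interval variables are numeric, ordinal ratings become ordered factors, and nominal categories become unordered factors, which determines which summaries make sense.

```r
# Hypothetical values, one per participant, for the five variables above
age <- c(19, 24, 31)                          # ratio: plain numeric
response_time <- c(312.5, 287.1, 405.9)       # ratio: numeric, in ms
likert <- factor(c(7, 3, 10), levels = 1:10,
                 ordered = TRUE)              # ordinal: ordered factor
depression_score <- c(12, 5, 21)              # interval (as classified above): numeric
diagnosis <- factor(c("MDD", "GAD", "None"))  # nominal: unordered factor

# Ordered factors support order comparisons (a rating of 7 is above 3)...
likert_higher <- likert[1] > likert[2]
# ...while nominal data are summarized by counting each label
diagnosis_counts <- table(diagnosis)

# Numeric variables support arithmetic summaries such as the mean
mean_age <- mean(age)

print(likert_higher)
print(diagnosis_counts)
print(mean_age)
```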
Referring to Chapter 3 (Measurement Errors in Psychological Research):
A systematic error occurs when the method used to collect data makes the same kind of mistake every time it measures something in the same way. This produces consistent, predictable biases in measurements. A random error occurs by chance and produces unpredictable changes in measurements. Systematic errors affect the accuracy of results, while random errors affect their precision. For example, in a recall test, distractions, state of mind, or ability to concentrate can all affect a person's performance, leading to random variation in the data that cannot be predicted. In the same test, a systematic error could occur if the items being recalled (words, images, etc.) are more familiar to one demographic group than another. Children would likely do systematically worse at remembering complex words or phrases they are not used to, while older, educated adults would do better because they are more accustomed to those kinds of words or phrases. Systematic errors are most likely to be caused by the measurement system itself, including biased materials or miscalibrated equipment. Random errors are more likely to be caused by participants.
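A minimal simulation of the distinction, using hypothetical numbers: a true recall score of 20 words, random noise with SD 3, and a systematic shift of 4 words for an unfamiliar word list. Random error spreads scores around the true value (hurting precision), while systematic error moves the whole distribution (hurting accuracy).

```r
set.seed(42)
true_score <- 20   # hypothetical true number of words recalled
n_trials <- 1000

# Random error only: noise centered on zero (e.g., a momentary distraction)
random_only <- true_score + rnorm(n_trials, mean = 0, sd = 3)

# Systematic error: every score shifted down by 4 words (e.g., an unfamiliar
# word list), plus a small amount of random noise
systematic <- true_score - 4 + rnorm(n_trials, mean = 0, sd = 1)

error_summary <- data.frame(
  error_type = c("random only", "systematic"),
  mean_score = c(mean(random_only), mean(systematic)),  # accuracy: how close to 20
  sd_score = c(sd(random_only), sd(systematic))          # precision: how spread out
)
print(error_summary)
```

The random-only scores average close to 20 but vary widely; the systematic scores are tightly clustered but centered near 16, so averaging more trials would not remove the bias.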
The biggest issue would be measuring the level of stress. Academic performance is easily measured because a solid system is already in place, but stress is subjective for both the observer and the participant. Each person uses a rating scale differently, so an experiment would need a quantitative way to measure stress consistently across the whole group. This would be a kind of systematic error. Even on a test like the Perceived Stress Scale, where the 0-4 rating is simple, people will interpret anchors such as "never," "fairly often," and "very often" differently. To minimize this error, possible solutions include: pretesting (running a pilot test before the experiment to find sources of error), increasing the sample size (which reduces random error, though not systematic bias), ensuring consistent measurement and consistent explanations of the measurement, ensuring the validity of the measures (e.g., using GPAs based on the correct academic level), using more than one method of measurement per variable, and reducing human error (blinding, training, testing).
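Increasing the sample size shrinks the standard error of the mean but leaves a systematic bias untouched. A hypothetical sketch, assuming a true mean stress score of 20, a constant over-reporting bias of 2 points caused by how the scale is worded, and random noise with SD 5:

```r
set.seed(7)
true_stress <- 20   # hypothetical true mean stress score
bias <- 2           # hypothetical constant over-reporting caused by scale wording
sample_sizes <- c(10, 100, 10000)

results <- data.frame(n = sample_sizes, mean_score = NA_real_, std_error = NA_real_)
for (i in seq_along(sample_sizes)) {
  # Every observed score carries the same bias plus independent random noise
  scores <- true_stress + bias + rnorm(sample_sizes[i], mean = 0, sd = 5)
  results$mean_score[i] <- mean(scores)
  results$std_error[i] <- sd(scores) / sqrt(sample_sizes[i])
}
print(results)
```

As n grows, the standard error drops toward zero, but the mean settles near 22 rather than 20: more participants make the estimate more precise, not more accurate.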
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
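Now, perform the following computations:
library(dplyr)  # provides %>%, group_by(), and summarize()
# Summary statistics for reaction time and accuracy in each condition.
# reaction_time and accuracy contain missing values, so these statistics return
# NA for any condition that contains a missing value (as the output shows for
# reaction time) unless na.rm = TRUE is added or the NAs are removed first.
summary_stats <- data %>%
  group_by(condition) %>%
  summarize(
    mean_reaction_time = mean(reaction_time),
    median_reaction_time = median(reaction_time),
    sd_reaction_time = sd(reaction_time),
    min_reaction_time = min(reaction_time),
    max_reaction_time = max(reaction_time),
    mean_accuracy = mean(accuracy),
    median_accuracy = median(accuracy),
    sd_accuracy = sd(accuracy),
    min_accuracy = min(accuracy),
    max_accuracy = max(accuracy)
  )
print(summary_stats)
## # A tibble: 2 × 11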
## condition mean_reaction_time median_reaction_time sd_reaction_time
## <chr> <dbl> <dbl> <dbl>
## 1 Control NA NA NA
## 2 Experimental NA NA NA
## # ℹ 7 more variables: min_reaction_time <dbl>, max_reaction_time <dbl>,
## # mean_accuracy <dbl>, median_accuracy <dbl>, sd_accuracy <dbl>,
## # min_accuracy <dbl>, max_accuracy <dbl>
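The NA values above come from the missing reaction times: R's mean(), median(), sd(), min(), and max() return NA when any input is missing. A minimal base-R sketch on hypothetical toy data; in the dplyr code, the equivalent change would be writing, for example, mean(reaction_time, na.rm = TRUE) inside summarize().

```r
# Hypothetical toy data with one missing reaction time per condition
toy <- data.frame(
  condition = c("Control", "Control", "Control",
                "Experimental", "Experimental", "Experimental"),
  reaction_time = c(300, NA, 280, 310, 290, NA)
)

# Default behavior: any NA in a group makes that group's mean NA
means_default <- tapply(toy$reaction_time, toy$condition, mean)

# na.rm = TRUE is passed through to mean() and drops the missing values first
means_narm <- tapply(toy$reaction_time, toy$condition, mean, na.rm = TRUE)

print(means_default)
print(means_narm)
```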
Create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
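# Reaction times are modeled as normal with mean 350 ms and SD 75 ms
# 1a: probability that reaction time is greater than 450 ms
Q4_1a <- 1 - pnorm(450, mean = 350, sd = 75)
# 1b: probability that reaction time is between 300 ms and 400 ms
Q4_1b <- pnorm(400, mean = 350, sd = 75) - pnorm(300, mean = 350, sd = 75)
# Convert the probabilities to percentages
Q4_1A <- round(Q4_1a * 100, 2)
Q4_1B <- round(Q4_1b * 100, 2)
Question 4, 1a: There is a 9.12 percent chance that a randomly selected participant will have a reaction time greater than 450 ms. 1b: There is a 49.5 percent chance that a participant will have a reaction time between 300 ms and 400 ms.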
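The same probabilities can be reached by converting each cutoff to a z-score, z = (x - 350) / 75, and using the standard normal distribution, which is what pnorm() uses by default. A sketch that also checks both answers by simulation (the seed and the number of draws are arbitrary choices):

```r
mu <- 350    # mean reaction time in ms, from the question's parameters
sigma <- 75  # standard deviation in ms

# Standardize each cutoff
z_450 <- (450 - mu) / sigma  # about 1.33
z_300 <- (300 - mu) / sigma  # about -0.67
z_400 <- (400 - mu) / sigma  # about 0.67

# Standard normal probabilities (pnorm defaults to mean 0, sd 1)
p_above_450 <- 1 - pnorm(z_450)
p_300_to_400 <- pnorm(z_400) - pnorm(z_300)

# Simulation check: draw many reaction times and count the proportions
set.seed(1)
rt_sim <- rnorm(1e5, mean = mu, sd = sigma)
sim_above_450 <- mean(rt_sim > 450)
sim_300_to_400 <- mean(rt_sim > 300 & rt_sim < 400)

print(round(c(p_above_450, p_300_to_400), 4))
print(round(c(sim_above_450, sim_300_to_400), 4))
```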
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
# Remove every row that has a missing value in any column
cleaned_data <- data %>%
  na.omit()
summary_stats <- cleaned_data %>%
group_by(condition) %>%
summarize(
mean_reaction_time = mean(reaction_time),
median_reaction_time = median(reaction_time),
sd_reaction_time = sd(reaction_time),
min_reaction_time = min(reaction_time),
max_reaction_time = max(reaction_time),
mean_accuracy = mean(accuracy),
median_accuracy = median(accuracy),
sd_accuracy = sd(accuracy),
min_accuracy = min(accuracy),
max_accuracy = max(accuracy)
)
print(summary_stats)
## # A tibble: 2 × 11
## condition mean_reaction_time median_reaction_time sd_reaction_time
## <chr> <dbl> <dbl> <dbl>
## 1 Control 302. 300. 47.3
## 2 Experimental 296. 288. 38.4
## # ℹ 7 more variables: min_reaction_time <dbl>, max_reaction_time <dbl>,
## # mean_accuracy <dbl>, median_accuracy <dbl>, sd_accuracy <dbl>,
## # min_accuracy <dbl>, max_accuracy <dbl>
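na.omit() performs listwise deletion: it drops a participant if any column is missing, so the reaction-time statistics above also exclude participants whose accuracy score alone was missing. A hypothetical toy example comparing that with na.rm = TRUE, which drops missing values one variable at a time:

```r
# Hypothetical toy data: row 2 is missing reaction time, row 3 is missing accuracy
toy <- data.frame(
  reaction_time = c(300, NA, 280, 310),
  accuracy = c(85, 90, NA, 88),
  anxiety_pre = c(25, 30, 22, 27)
)

# Rows with at least one missing value
n_incomplete <- sum(!complete.cases(toy))

# Listwise deletion keeps only fully complete rows
cleaned_toy <- na.omit(toy)

# Per-variable deletion keeps every observed reaction time (3 values)
mean_rt_all <- mean(toy$reaction_time, na.rm = TRUE)
# Listwise deletion also drops row 3's reaction time (2 values left)
mean_rt_complete <- mean(cleaned_toy$reaction_time)

print(n_incomplete)
print(nrow(cleaned_toy))
print(c(mean_rt_all, mean_rt_complete))
```

Neither choice is wrong, but they can give different summaries, so it is worth stating which one a report uses.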
Create a new variable performance_category that categorizes participants based on their accuracy:
# Remove rows with missing values, then label accuracy as
# High (90 or above), Medium (70 to below 90), or Low (below 70)
cleaned_data <- data %>%
  na.omit() %>%
  mutate(performance_category = ifelse(accuracy >= 90, "High",
                                ifelse(accuracy >= 70 & accuracy < 90, "Medium", "Low")))
print(cleaned_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## 7 7 323.0458 69.51247 Female Control 29.50392
## 8 8 236.7469 90.84614 Male Control 22.02049
## 10 10 277.7169 87.15942 Female Control 22.00335
## 12 12 317.9907 79.97677 Male Experimental 16.60658
## 13 13 320.0386 81.66793 Male Experimental 14.91876
## 14 14 305.5341 74.81425 Female Control 50.92832
## 15 15 272.2079 74.28209 Female Experimental 21.66514
## 17 17 324.8925 89.48210 Female Experimental 30.09256
## 18 18 201.6691 85.53004 Male Control 21.12975
## 19 19 335.0678 94.22267 Female Control 29.13490
## 20 20 276.3604 105.50085 Male Control 27.95172
## 21 21 246.6088 80.08969 Female Control 23.27696
## 22 22 289.1013 61.90831 Male Control 25.52234
## 23 23 248.6998 95.05739 Male Control 24.72746
## 24 24 263.5554 77.90799 Male Experimental 42.02762
## 25 25 268.7480 78.11991 Female Control 19.06931
## 26 26 215.6653 95.25571 Female Experimental 16.23203
## 27 27 341.8894 82.15227 Male Control 25.30231
## 28 28 307.6687 72.79282 Male Control 27.48385
## 29 29 243.0932 86.81303 Female Control 28.49219
## 31 31 321.3232 85.05764 Male Experimental 16.49339
## 32 32 285.2464 88.85280 Female Experimental 35.10548
## 33 33 344.7563 81.29340 Female Control 22.20280
## 34 34 343.9067 91.44377 Male Control 18.07590
## 35 35 341.0791 82.79513 Female Control 23.10976
## 36 36 334.4320 88.31782 Female Experimental 23.42259
## 37 37 327.6959 95.96839 Female Experimental 33.87936
## 38 38 296.9044 89.35181 Female Experimental 25.67790
## 39 39 284.7019 81.74068 Female Control 31.03243
## 40 40 280.9764 96.48808 Male Experimental 21.00566
## 41 41 265.2647 94.93504 Male Control 26.71556
## 42 42 289.6041 90.48397 Female Control 22.40251
## 44 44 408.4478 78.72094 Female Control 17.83709
## 45 45 360.3981 98.60652 Male Control 14.51359
## 46 46 243.8446 78.99740 Male Experimental 40.97771
## 47 47 279.8558 106.87333 Male Experimental 29.80567
## 48 48 276.6672 100.32611 Female Experimental 14.98983
## 49 49 338.9983 82.64300 Female Control 20.11067
## 50 50 295.8315 74.73579 Female Control 15.51616
## anxiety_post anxiety_change performance_category
## 1 29.053117 2.24879426 Medium
## 2 19.215099 11.93723893 Medium
## 3 20.453056 7.20456483 Medium
## 4 13.751994 3.18099329 High
## 5 17.847362 6.19701754 Medium
## 6 19.933968 2.82286978 High
## 7 24.342317 5.16159899 Low
## 8 17.758982 4.26150823 High
## 10 22.069157 -0.06580401 Medium
## 12 7.875522 8.73106229 Medium
## 13 3.221330 11.69742764 Medium
## 14 45.327922 5.60039736 Medium
## 15 16.642661 5.02247855 Medium
## 17 23.416047 6.67651035 Medium
## 18 21.642810 -0.51305479 Medium
## 19 26.912456 2.22244027 High
## 20 24.773302 3.17841445 High
## 21 18.586930 4.69002601 Medium
## 22 20.597288 4.92505594 Low
## 23 20.358843 4.36861886 High
## 24 31.904850 10.12276506 Medium
## 25 14.370025 4.69928609 Medium
## 26 8.052780 8.17924981 High
## 27 21.952702 3.34960540 Medium
## 28 24.334744 3.14910235 Medium
## 29 24.635854 3.85633353 Medium
## 31 2.627509 13.86588190 Medium
## 32 27.376440 7.72904122 Medium
## 33 18.430744 3.77205314 Medium
## 34 15.607200 2.46869675 High
## 35 19.873474 3.23628902 Medium
## 36 19.373641 4.04895160 Medium
## 37 26.428138 7.45122383 High
## 38 16.420951 9.25694721 Medium
## 39 28.470531 2.56189924 Medium
## 40 15.350273 5.65539054 High
## 41 21.378795 5.33676775 High
## 42 17.294151 5.10836205 High
## 44 15.992029 1.84506400 Medium
## 45 7.508622 7.00496546 High
## 46 27.270622 13.70708547 Medium
## 47 22.108595 7.69707534 High
## 48 11.069351 3.92047789 High
## 49 17.068705 3.04196717 Medium
## 50 10.016330 5.49982914 Medium
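The anxiety_change column in this output is not created by the na.omit() and mutate() code above, which only adds performance_category, so it was presumably added to data in an earlier step. A minimal base-R sketch of that step (pre minus post, so positive values mean anxiety decreased) and of the per-condition mean the anxiety-change task asks for, on hypothetical toy data. With dplyr, mutate(anxiety_change = anxiety_pre - anxiety_post) followed by group_by(condition) and summarize(mean_change = mean(anxiety_change)) would be the analogous pipeline.

```r
# Hypothetical toy data: two participants per condition
toy <- data.frame(
  condition = c("Control", "Control", "Experimental", "Experimental"),
  anxiety_pre = c(30, 22, 28, 35),
  anxiety_post = c(27, 20, 19, 26)
)

# Pre minus post: positive values mean anxiety went down
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post

# Mean anxiety change within each condition
mean_change <- tapply(toy$anxiety_change, toy$condition, mean)

print(toy)
print(mean_change)
```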
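# Overall mean reaction time across all cleaned participants. Averaging the two
# rounded group means ((302 + 296) / 2) is not the same thing, because the
# Control and Experimental groups have different numbers of participants.
mean_overall <- mean(cleaned_data$reaction_time)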
filtered_data <- data %>%
filter(condition == "Experimental" & reaction_time < mean_overall) %>%
mutate(performance_category = ifelse(accuracy >= 90, "High",
ifelse(accuracy >= 70 & accuracy < 90, "Medium", "Low")))
print(filtered_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 2 288.4911 84.71453 Female Experimental 31.15234
## 2 15 272.2079 74.28209 Female Experimental 21.66514
## 3 24 263.5554 77.90799 Male Experimental 42.02762
## 4 26 215.6653 95.25571 Female Experimental 16.23203
## 5 32 285.2464 88.85280 Female Experimental 35.10548
## 6 38 296.9044 89.35181 Female Experimental 25.67790
## 7 40 280.9764 96.48808 Male Experimental 21.00566
## 8 46 243.8446 78.99740 Male Experimental 40.97771
## 9 47 279.8558 106.87333 Male Experimental 29.80567
## 10 48 276.6672 100.32611 Female Experimental 14.98983
## anxiety_post anxiety_change performance_category
## 1 19.21510 11.937239 Medium
## 2 16.64266 5.022479 Medium
## 3 31.90485 10.122765 Medium
## 4 8.05278 8.179250 High
## 5 27.37644 7.729041 Medium
## 6 16.42095 9.256947 Medium
## 7 15.35027 5.655391 High
## 8 27.27062 13.707085 Medium
## 9 22.10860 7.697075 High
## 10 11.06935 3.920478 High
I first created a new dataset called “cleaned_data” with na.omit() and then reran the summarizing code to create a new summary_stats that was up to date with the clean data. I then used mutate() to create the new performance_category variable from this clean data. Next, I used filter() to make a new dataset called “filtered_data” containing only participants in the Experimental condition whose reaction times were less than the overall mean reaction time, which I stored as “mean_overall” using mean(). I reran the performance_category code so that the filtered data would also display that category.
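An alternative to the nested ifelse() calls for performance_category is base R's cut(), which assigns each accuracy score to an interval in one step. A sketch with hypothetical accuracy values; right = FALSE makes each interval include its lower bound, so 70 becomes Medium and 90 becomes High, matching the ifelse() rules, and ordered_result = TRUE records that the categories are ordinal.

```r
# Hypothetical accuracy values, including the exact boundary scores 70 and 90
accuracy <- c(69.5, 70, 89.9, 90, 106.9)

performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),     # intervals: [-Inf, 70), [70, 90), [90, Inf)
  labels = c("Low", "Medium", "High"),
  right = FALSE,                     # include the lower bound of each interval
  ordered_result = TRUE              # Low < Medium < High
)

print(data.frame(accuracy, performance_category))
```

Inside the dplyr pipeline, the same call could replace the nested ifelse() in mutate(performance_category = cut(accuracy, ...)).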
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
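Use the corPlot() function to create a correlation plot.
library(psych)  # provides corPlot()
# Keep the numeric variables, dropping the participant ID column
plotable_data <- cleaned_data %>%
  select(where(is.numeric), -participant_id)
corPlot(cor(plotable_data))
Anxiety pre and anxiety post are strongly positively correlated: the more anxiety before, the more anxiety after. Anxiety pre and anxiety change are slightly positively correlated. Reaction time is slightly negatively correlated with anxiety pre, anxiety post, anxiety change, and accuracy. Accuracy is slightly negatively correlated with all the other variables. Anxiety change is slightly negatively correlated with anxiety post, reaction time, and accuracy. I expected higher anxiety to go with lower accuracy and slower reaction times, and I expected post-test anxiety to drop in all situations because the test would be over. The plot suggests that anxiety is not strongly related to reaction time or accuracy, which fits how the data were simulated: reaction time, accuracy, and pre-test anxiety were each generated independently with rnorm().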
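A quick base-R check of this interpretation, re-simulating variables the way the Part 2 code does (with an arbitrary different seed, a larger n so the pattern is easy to see, and the two condition-specific anxiety reductions collapsed into one for simplicity). Because anxiety_post is built from anxiety_pre, those two correlate strongly, while reaction time and accuracy are generated independently of anxiety, so their correlations sit near zero. corPlot() from psych would draw this same matrix.

```r
set.seed(2024)
n <- 5000
anxiety_pre <- rnorm(n, mean = 25, sd = 8)
# Post-test anxiety is built from pre-test anxiety, as in the Part 2 code
anxiety_post <- anxiety_pre - rnorm(n, mean = 3, sd = 2)
# Reaction time and accuracy are generated independently of anxiety
reaction_time <- rnorm(n, mean = 300, sd = 50)
accuracy <- rnorm(n, mean = 85, sd = 10)

# Correlation matrix; column names come from the variable names
r <- cor(cbind(anxiety_pre, anxiety_post, reaction_time, accuracy))
print(round(r, 2))
```

With only 45 participants in cleaned_data, "slight" correlations like the ones described above can easily arise by chance even when the variables are unrelated.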
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
I would like to determine whether a healthy amount of daily physical activity reduces stress levels and other negative factors. I also want to see at what point activity becomes too much, if that is possible. This would be an experiment comparing how much someone works out, for example, with how stressed they are before and after, as well as when they do not work out, to find the intensity and amount of time that minimize stress. It would have to take into account different body types, how used to exercise people are, mental strength, and genetic differences.
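To find where activity stops helping, one common approach (not the only one) is to add a squared term to a regression and solve for where the fitted curve bottoms out, at -b1 / (2 * b2). A sketch on simulated, hypothetical data in which stress is lowest at 50 minutes of activity per day; all numbers are made up for illustration.

```r
set.seed(10)
n <- 500
# Hypothetical minutes of physical activity per day, spread from 0 to 120
minutes <- runif(n, min = 0, max = 120)
# Hypothetical U-shaped relationship: stress falls, bottoms out at 50 minutes,
# then rises again (0.4 / (2 * 0.004) = 50)
stress <- 30 - 0.4 * minutes + 0.004 * minutes^2 + rnorm(n, mean = 0, sd = 2)

# Quadratic regression: stress = b0 + b1 * minutes + b2 * minutes^2
fit <- lm(stress ~ minutes + I(minutes^2))
b <- coef(fit)

# The fitted curve is lowest where its slope, b1 + 2 * b2 * minutes, equals zero
optimal_minutes <- -b[["minutes"]] / (2 * b[["I(minutes^2)"]])
print(round(optimal_minutes, 1))
```

A positive b2 indicates a U shape (stress eventually rises again); if b2 were near zero, the data would not support a "too much" point within the observed range.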
I have basically no experience with software other than R. So far it has been very useful, and I am having a surprisingly fun time coding. It seems simple so far, so it may not have the reach of other programming languages.