Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Nominal data are categorical data with no inherent order; they allow only counting and frequency analysis. For example, gender, ethnicity, and diagnostic categories are all nominal data. Ordinal data are also categorical, but they DO have a meaningful order, with unequal intervals between categories; education levels and Likert scales are examples. Ordinal data allow ranked comparisons (greater than or less than), but not arithmetic operations. Interval data are numerical data with equal intervals but NO true zero point, such as IQ scores and temperatures in Celsius. In contrast, ratio data are numerical data that DO HAVE a true zero point as well as equal intervals; reaction time and weight, for instance, are ratio data. Both interval and ratio data allow arithmetic operations such as addition and subtraction, but only ratio data allow meaningful ratios (for example, saying that one value is twice another).
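To see why only ratio data support meaningful ratios, here is a small R sketch built on the Celsius example above. The two temperatures are illustrative assumptions, and the Kelvin conversion (adding 273.15) stands in for a scale with a true zero point.

```r
# Two illustrative temperatures on the Celsius (interval) scale
temp_c <- c(10, 20)

# The raw ratio is 2, but 20 °C is not "twice as hot" as 10 °C,
# because 0 °C is an arbitrary zero point, not an absence of heat
ratio_celsius <- temp_c[2] / temp_c[1]

# Kelvin has a true zero, so its ratio is physically meaningful (about 1.035)
temp_k <- temp_c + 273.15
ratio_kelvin <- temp_k[2] / temp_k[1]

# Differences (equal intervals) are the same on both scales
diff_c <- diff(temp_c)
diff_k <- diff(temp_k)

ratio_celsius
ratio_kelvin
isTRUE(all.equal(diff_c, diff_k))
```

The same 10-degree difference appears on both scales, which is why interval data still support addition and subtraction.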
Scores on a depression inventory (0-63): Ratio data, because the scores are numerical data with equal intervals and a true zero point (a score of 0). Because the scale has a true zero point with equal intervals, ratio comparisons are possible; for example, a score of 40 is twice a score of 20, and a higher score indicates more severe depression than a lower score.
Response time in milliseconds: Ratio data, because response time is numerical data with a true zero point (0 ms) and equal intervals. This makes ratios meaningful; for example, a response time of 600 ms is twice as long as a response time of 300 ms.
Likert scale ratings of agreement (1-7): Ordinal data, because the ratings are categorical responses with a meaningful order but without guaranteed equal intervals. For example, a 7 could indicate strong agreement with a statement and a 2 could indicate disagreement, but the psychological distance between two adjacent points (say, 6 and 7) is not necessarily the same as the distance between two other adjacent points (say, 2 and 3).
Diagnostic categories: Nominal data, because diagnostic categories have no inherent order and permit only counting and frequency analysis. For instance, categories like “ADHD,” “anxiety,” and “multiple sclerosis” cannot be meaningfully ranked.
Age in years: Ratio data, because age in years is numerical data with a true zero point and equal intervals. For instance, a 32-year-old is twice as old as a 16-year-old.
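One way to carry these classifications into R is to store each variable with a matching type: an unordered factor for nominal data, an ordered factor for ordinal data, and a numeric vector for ratio data. This is a sketch; all values below are made up for illustration.

```r
# Nominal: diagnostic categories have no order, so counting is the natural summary
diagnosis <- factor(c("ADHD", "anxiety", "ADHD", "multiple sclerosis"))
diagnosis_counts <- table(diagnosis)

# Ordinal: Likert ratings (1-7) have an order but not guaranteed equal intervals
likert <- factor(c(2, 7, 5, 5), levels = 1:7, ordered = TRUE)
higher_agreement <- likert[2] > likert[1]  # order comparisons are allowed

# An order-based summary such as the median suits ordinal data better than the mean
likert_median <- median(as.integer(likert))

# Ratio: response times in ms have a true zero, so ratios are meaningful
rt_ms <- c(300, 600)
rt_ratio <- rt_ms[2] / rt_ms[1]  # the second response took twice as long

diagnosis_counts
higher_agreement
likert_median
rt_ratio
```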
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Random error consists of unpredictable fluctuations in measurement that reduce reliability but do not systematically bias the results. For example, in a memory experiment, random error may occur if participants are distracted by other stimuli during testing. Systematic error, on the other hand, consists of consistent, predictable deviations in measurement that bias results in a specific direction, whereas random fluctuations push scores up and down unpredictably. For example, in a memory experiment, experimenter bias may occur if the experimenter guides participants' answers, which distorts the results in one direction. Another key difference is that the effect of random error on averages can be reduced by increasing the sample size, because the fluctuations tend to cancel out, whereas systematic error cannot be reduced by increasing the sample size.
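The claim that larger samples shrink random error but not systematic error can be illustrated with a small simulation. This is a hypothetical sketch: the true score of 20, the noise SD of 5, and the constant bias of 3 (standing in for experimenter cueing) are assumptions, not values from the chapter.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

true_score <- 20  # hypothetical true memory score
bias <- 3         # hypothetical constant bias, e.g., from experimenter cueing

# Average measurement error for a sample of size n:
# random error is noise with mean 0; systematic error adds the same bias to everyone
average_error <- function(n) {
  random_only <- true_score + rnorm(n, mean = 0, sd = 5)
  with_bias <- true_score + bias + rnorm(n, mean = 0, sd = 5)
  c(random = mean(random_only) - true_score,
    systematic = mean(with_bias) - true_score)
}

small_sample <- average_error(10)
large_sample <- average_error(100000)

small_sample  # both errors can be noticeably off in a small sample
large_sample  # random part is close to 0; systematic part stays close to 3
```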
Measurement errors may threaten the validity of a study examining the relationship between stress and academic performance through reliability problems and through random and systematic errors. If stress or academic performance is not measured reliably, the results cannot be valid, because the study is not consistently measuring what it is intended to measure. Random error adds noise that tends to weaken (attenuate) the observed correlation between stress and academic performance, whereas systematic error can bias the results in a particular direction. Researchers can minimize these errors by re-calibrating instruments, increasing sample size, and carefully checking and analyzing their data.
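The attenuation effect of random error on a correlation can be sketched with simulated data. The sketch assumes standardized, hypothetical stress and performance scores with a true correlation of -0.5, and adds random error to the stress measure so that its reliability is 0.5; classical test theory then predicts an observed correlation of about -0.5 * sqrt(0.5), roughly -0.35.

```r
set.seed(1)  # arbitrary seed for reproducibility
n <- 10000

# Hypothetical standardized true scores with a true correlation of -0.5
stress <- rnorm(n)
performance <- -0.5 * stress + rnorm(n, sd = sqrt(0.75))

# Measured stress = true stress + random error of equal variance (reliability 0.5)
stress_measured <- stress + rnorm(n, sd = 1)

r_true <- cor(stress, performance)
r_observed <- cor(stress_measured, performance)

r_true      # close to -0.5
r_observed  # closer to 0, near -0.35
```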
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations:
library(psych)
# Calculate descriptive statistics of reaction_time grouped by condition
describeBy(data$reaction_time, data$condition)
##
## Descriptive statistics by group
## group: Control
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 30 301.4 48.54 299.68 300.42 55.38 201.67 408.45 206.78 0.14 -0.66
## se
## X1 8.86
## ------------------------------------------------------------
## group: Experimental
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 17 295.75 38.37 288.49 295.61 43.74 215.67 377.94 162.27 0 -0.27
## se
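## X1 9.31
# Calculate descriptive statistics of accuracy grouped by condition
describeBy(data$accuracy, data$condition)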
##
## Descriptive statistics by group
## group: Control
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 29 85.49 9.86 85.53 85.68 8.77 61.91 105.5 43.59 -0.15 -0.35 1.83
## ------------------------------------------------------------
## group: Experimental
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 19 88.06 8.2 88.32 87.76 9.86 74.28 106.87 32.59 0.45 -0.45 1.88
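The se column that describeBy() reports is the standard error of the mean, sd / sqrt(n). As a quick arithmetic check, the sketch below recomputes it from the rounded sd and n values printed above. The n values also show that the 3 missing reaction times and 2 missing accuracy scores were excluded (30 + 17 = 47 and 29 + 19 = 48 out of 50 participants).

```r
# Rounded sd and n copied from the describeBy() output
sd_rt <- c(Control = 48.54, Experimental = 38.37)
n_rt <- c(Control = 30, Experimental = 17)
sd_acc <- c(Control = 9.86, Experimental = 8.2)
n_acc <- c(Control = 29, Experimental = 19)

# Standard error of the mean = sd / sqrt(n)
se_rt <- round(sd_rt / sqrt(n_rt), 2)
se_acc <- round(sd_acc / sqrt(n_acc), 2)

se_rt   # matches the se values printed for reaction_time
se_acc  # matches the se values printed for accuracy
```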
Create a new variable called anxiety_change that represents the difference between pre
and post anxiety scores (pre minus post). Then calculate the mean
anxiety change for each condition.
# Load dplyr for the pipe (%>%), mutate(), group_by(), and summarize()
library(dplyr)
# Create new variable named anxiety_change (pre minus post)
data <- data %>%
mutate(anxiety_change = anxiety_pre - anxiety_post)
# Calculate mean of anxiety change for each condition
data %>%
group_by(condition) %>%
summarize(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
The mean anxiety change was 3.79 for the control group and 8.64 for the experimental group, so anxiety decreased more, on average, in the experimental condition.
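The gap between the two means is built into the simulation: the Part 2 code subtracts a reduction drawn with mean 8 in the Experimental condition and mean 3 in the Control condition. The sketch below reuses the same generating rules with a much larger, hypothetical sample of 100,000 participants and an arbitrary seed; the condition means settle near 8 and 3 (the floor at 0 from pmax() pulls them very slightly lower), so 3.79 and 8.64 can be read as sample estimates that vary around the simulated effects.

```r
set.seed(2024)  # arbitrary seed, different from the Part 2 code
n_big <- 100000

condition <- sample(c("Control", "Experimental"), n_big, replace = TRUE)
anxiety_pre <- rnorm(n_big, mean = 25, sd = 8)

# Same generating rules as the Part 2 simulation
anxiety_post <- ifelse(
  condition == "Experimental",
  anxiety_pre - rnorm(n_big, mean = 8, sd = 3),  # larger reduction
  anxiety_pre - rnorm(n_big, mean = 3, sd = 2)   # smaller reduction
)
anxiety_post <- pmax(anxiety_post, 0)  # anxiety cannot go below 0

# Mean anxiety change (pre minus post) by condition
mean_change <- tapply(anxiety_pre - anxiety_post, condition, mean)
mean_change
```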
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
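#(a) Probability of reaction time greater than 450ms
1 - pnorm(450, mean = 350, sd = 75)
## [1] 0.09121122
#(b) Probability of reaction time between 300ms and 400ms
pnorm(400, mean = 350, sd = 75) - pnorm(300, mean = 350, sd = 75)
## [1] 0.4950149
a. The probability that a randomly selected participant will have a reaction time greater than 450ms is about 0.091, or 9.1%.
b. The probability that a randomly selected participant will have a reaction time between 300ms and 400ms is about 0.495, or 49.5%.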
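Both probabilities can also be reached by standardizing first, z = (x - mean) / sd, and then using the standard normal distribution. The sketch below uses the same mean of 350ms and SD of 75ms as the pnorm() calls and should reproduce the same values.

```r
mu <- 350     # mean reaction time (ms), as in the pnorm() calls
sigma <- 75   # standard deviation (ms)

# (a) P(RT > 450): convert 450ms to a z-score, then take the upper tail
z_450 <- (450 - mu) / sigma                      # 4/3, about 1.33
p_above_450 <- pnorm(z_450, lower.tail = FALSE)

# (b) P(300 < RT < 400): area between the two z-scores (-2/3 and 2/3)
z_300 <- (300 - mu) / sigma
z_400 <- (400 - mu) / sigma
p_300_to_400 <- pnorm(z_400) - pnorm(z_300)

round(c(p_above_450, p_300_to_400), 4)
```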
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
Remove all rows with missing values and store the result in a new dataset called clean_data.
# Remove rows with any NA values and create new dataset clean_data
clean_data <- data %>%
na.omit()
print(clean_data)
Create a new variable called performance_category that
categorizes participants based on their accuracy:
# Create new variable performance_category in clean_data based on accuracy
clean_data <- clean_data %>%
mutate(
performance_category = case_when(
accuracy >= 90 ~ "High",
accuracy >= 70 & accuracy < 90 ~ "Medium",
accuracy < 70 ~ "Low"
))
print(clean_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## 7 7 323.0458 69.51247 Female Control 29.50392
## 8 8 236.7469 90.84614 Male Control 22.02049
## 10 10 277.7169 87.15942 Female Control 22.00335
## 12 12 317.9907 79.97677 Male Experimental 16.60658
## 13 13 320.0386 81.66793 Male Experimental 14.91876
## 14 14 305.5341 74.81425 Female Control 50.92832
## 15 15 272.2079 74.28209 Female Experimental 21.66514
## 17 17 324.8925 89.48210 Female Experimental 30.09256
## 18 18 201.6691 85.53004 Male Control 21.12975
## 19 19 335.0678 94.22267 Female Control 29.13490
## 20 20 276.3604 105.50085 Male Control 27.95172
## 21 21 246.6088 80.08969 Female Control 23.27696
## 22 22 289.1013 61.90831 Male Control 25.52234
## 23 23 248.6998 95.05739 Male Control 24.72746
## 24 24 263.5554 77.90799 Male Experimental 42.02762
## 25 25 268.7480 78.11991 Female Control 19.06931
## 26 26 215.6653 95.25571 Female Experimental 16.23203
## 27 27 341.8894 82.15227 Male Control 25.30231
## 28 28 307.6687 72.79282 Male Control 27.48385
## 29 29 243.0932 86.81303 Female Control 28.49219
## 31 31 321.3232 85.05764 Male Experimental 16.49339
## 32 32 285.2464 88.85280 Female Experimental 35.10548
## 33 33 344.7563 81.29340 Female Control 22.20280
## 34 34 343.9067 91.44377 Male Control 18.07590
## 35 35 341.0791 82.79513 Female Control 23.10976
## 36 36 334.4320 88.31782 Female Experimental 23.42259
## 37 37 327.6959 95.96839 Female Experimental 33.87936
## 38 38 296.9044 89.35181 Female Experimental 25.67790
## 39 39 284.7019 81.74068 Female Control 31.03243
## 40 40 280.9764 96.48808 Male Experimental 21.00566
## 41 41 265.2647 94.93504 Male Control 26.71556
## 42 42 289.6041 90.48397 Female Control 22.40251
## 44 44 408.4478 78.72094 Female Control 17.83709
## 45 45 360.3981 98.60652 Male Control 14.51359
## 46 46 243.8446 78.99740 Male Experimental 40.97771
## 47 47 279.8558 106.87333 Male Experimental 29.80567
## 48 48 276.6672 100.32611 Female Experimental 14.98983
## 49 49 338.9983 82.64300 Female Control 20.11067
## 50 50 295.8315 74.73579 Female Control 15.51616
## anxiety_post anxiety_change performance_category
## 1 29.053117 2.24879426 Medium
## 2 19.215099 11.93723893 Medium
## 3 20.453056 7.20456483 Medium
## 4 13.751994 3.18099329 High
## 5 17.847362 6.19701754 Medium
## 6 19.933968 2.82286978 High
## 7 24.342317 5.16159899 Low
## 8 17.758982 4.26150823 High
## 10 22.069157 -0.06580401 Medium
## 12 7.875522 8.73106229 Medium
## 13 3.221330 11.69742764 Medium
## 14 45.327922 5.60039736 Medium
## 15 16.642661 5.02247855 Medium
## 17 23.416047 6.67651035 Medium
## 18 21.642810 -0.51305479 Medium
## 19 26.912456 2.22244027 High
## 20 24.773302 3.17841445 High
## 21 18.586930 4.69002601 Medium
## 22 20.597288 4.92505594 Low
## 23 20.358843 4.36861886 High
## 24 31.904850 10.12276506 Medium
## 25 14.370025 4.69928609 Medium
## 26 8.052780 8.17924981 High
## 27 21.952702 3.34960540 Medium
## 28 24.334744 3.14910235 Medium
## 29 24.635854 3.85633353 Medium
## 31 2.627509 13.86588190 Medium
## 32 27.376440 7.72904122 Medium
## 33 18.430744 3.77205314 Medium
## 34 15.607200 2.46869675 High
## 35 19.873474 3.23628902 Medium
## 36 19.373641 4.04895160 Medium
## 37 26.428138 7.45122383 High
## 38 16.420951 9.25694721 Medium
## 39 28.470531 2.56189924 Medium
## 40 15.350273 5.65539054 High
## 41 21.378795 5.33676775 High
## 42 17.294151 5.10836205 High
## 44 15.992029 1.84506400 Medium
## 45 7.508622 7.00496546 High
## 46 27.270622 13.70708547 Medium
## 47 22.108595 7.69707534 High
## 48 11.069351 3.92047789 High
## 49 17.068705 3.04196717 Medium
## 50 10.016330 5.49982914 Medium
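The same three accuracy bands can also be written with base R's cut(), which makes the interval boundaries explicit. With right = FALSE, each band includes its lower bound, matching the >= comparisons in case_when(). This is a sketch on a few made-up accuracy values, not a replacement for the code above; note that cut() returns a factor, while case_when() returns a character vector.

```r
# Made-up accuracy scores, including values exactly on the 70 and 90 cutoffs
accuracy <- c(61.9, 69.5, 70, 89.99, 90, 105.5)

# Intervals [-Inf, 70), [70, 90), [90, Inf) map to Low, Medium, High
performance_category <- cut(
  accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  right = FALSE,
  labels = c("Low", "Medium", "High")
)

data.frame(accuracy, performance_category)
```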
# Keep Experimental-condition participants whose reaction time exceeds the overall mean reaction time
data %>%
filter(condition == "Experimental" & reaction_time > mean(reaction_time, na.rm = TRUE))
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 3 377.9354 84.57130 Female Experimental 27.65762
## 2 12 317.9907 79.97677 Male Experimental 16.60658
## 3 13 320.0386 81.66793 Male Experimental 14.91876
## 4 17 324.8925 89.48210 Female Experimental 30.09256
## 5 31 321.3232 85.05764 Male Experimental 16.49339
## 6 36 334.4320 88.31782 Female Experimental 23.42259
## 7 37 327.6959 95.96839 Female Experimental 33.87936
## anxiety_post anxiety_change
## 1 20.453056 7.204565
## 2 7.875522 8.731062
## 3 3.221330 11.697428
## 4 23.416047 6.676510
## 5 2.627509 13.865882
## 6 19.373641 4.048952
## 7 26.428138 7.451224
First, I removed all rows with missing (NA) values by passing the data through na.omit() with the dplyr pipe, saved the result as a new dataset called clean_data, and printed it. Then, I created a new variable called performance_category that categorized participants by accuracy: greater than or equal to 90 (High), from 70 up to 90 (Medium), and less than 70 (Low). I did this by using the mutate function to create the new variable and the case_when function to assign each category, and then I printed the updated data, including performance_category. Lastly, I used the filter function to keep only participants in the Experimental condition whose reaction times were longer (that is, slower) than the overall mean reaction time.
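A detail worth checking in the filtering step is which mean is used as the cutoff. In an ungrouped filter() call, mean(reaction_time, na.rm = TRUE) is computed over every row of the data frame, so the cutoff is the overall mean rather than the Experimental group's mean. The toy data below use made-up numbers to show the difference; group_by() would be needed to compare each participant with their own condition's mean.

```r
library(dplyr)

# Made-up reaction times: Control is slower on average than Experimental
toy <- data.frame(
  condition = c("Control", "Control", "Experimental", "Experimental"),
  reaction_time = c(400, 380, 250, 300)
)

# Ungrouped: mean() uses all four rows (overall mean = 332.5), so no Experimental RT passes
above_overall <- toy %>%
  filter(condition == "Experimental" & reaction_time > mean(reaction_time, na.rm = TRUE))

# Grouped: mean() is computed within each condition (Control 390, Experimental 275)
above_group <- toy %>%
  group_by(condition) %>%
  filter(reaction_time > mean(reaction_time, na.rm = TRUE)) %>%
  ungroup()

above_overall
above_group
```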
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# Select numeric variables from the dataset and create corPlot. Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
numeric_data <- clean_data %>%
select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
# Plot the lower triangle of the correlation matrix for the numeric variables
corPlot(numeric_data, upper = FALSE)
The variables that appear to be most strongly correlated are anxiety_pre and anxiety_post: participants with higher pre-test anxiety also tended to have higher post-test anxiety. A surprising relationship was the one between anxiety_pre and accuracy, because the plot suggested a noticeable relationship between the two. These correlations may inform further research in psychology by highlighting the relationship between accuracy and anxiety change, which appeared strong in the plot; in real data, such a relationship could help researchers understand how anxiety affects overall performance accuracy. In this simulated dataset, however, accuracy was generated independently of the anxiety scores, so any correlation between them reflects chance rather than a real effect.
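Because the Part 2 code draws accuracy and the anxiety scores with separate rnorm() calls, any correlation between them can only arise by chance, while anxiety_post is computed directly from anxiety_pre. The simplified sketch below mimics that structure with 50 hypothetical participants (one reduction rule for everyone instead of two conditions, and an arbitrary seed) and uses cor.test(), which also reports a p-value, to estimate each correlation.

```r
set.seed(7)  # arbitrary seed
n <- 50

# Simplified version of the Part 2 generating process (one reduction rule for all)
accuracy <- rnorm(n, mean = 85, sd = 10)
anxiety_pre <- rnorm(n, mean = 25, sd = 8)
anxiety_post <- pmax(anxiety_pre - rnorm(n, mean = 5, sd = 3), 0)

# Built-in dependence: post is computed from pre, so r is large and positive
pre_post <- cor.test(anxiety_pre, anxiety_post)

# No built-in dependence: any correlation here is sampling noise
acc_pre <- cor.test(accuracy, anxiety_pre)

round(c(r_pre_post = unname(pre_post$estimate),
        r_acc_pre = unname(acc_pre$estimate)), 2)
```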
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
A specific research question in psychology that interests me is: did the COVID-19 pandemic increase social anxiety in individuals? I would collect data on lifestyle habits, anxiety symptoms, and change in anxiety over time, measuring anxiety symptoms with a standardized questionnaire such as the GAD-7. Appropriate statistical analyses would include correlational analyses (for example, between lifestyle habits and anxiety scores) and comparisons of anxiety scores from before and after the pandemic began. However, some potential measurement errors I may need to address are observer bias and random error. Observer bias may occur if I, for example, had preconceived notions about social anxiety and the pandemic, which could skew the results and data. Moreover, random error, such as fluctuations in participants' emotional states on the day of testing, could reduce the reliability of the data and, in turn, the validity of the study's conclusions.
Learning R for data analysis has changed my understanding of psychological statistics drastically, because R lets me run analyses much faster than calculating everything by hand. The biggest advantage of R, in my opinion, is that it can create graphs and tables in a matter of seconds, which makes it easy to visualize data analyses. On the other hand, the biggest challenge of using R, for me, is understanding the different programming concepts. I get overwhelmed by information easily, and R sometimes confuses me because many of its functions look similar but do different things.
Be sure to knit your document to HTML format and check that all content displays correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.