Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Nominal data is categorical data used to classify variables in no order and without assigning numeric values. Personality types are an example of nominal data because they categorize individuals into distinct groups that don’t have a numerical or meaningful sequence. When labeling individuals as introverted or extroverted, the labels classify individuals based on unique characteristics; neither group is considered “greater” or “lesser” than the other in a way that would apply to a measurable scale such as temperature.
Ordinal data is categorical data with an order or ranking, but the intervals between them are not guaranteed to be consistent. An example of ordinal data is the Likert scale, which is used to evaluate attitudes or opinions by having participants provide their rating on a series of ranked choices, such as “Agree,” “Neutral,” and “Disagree.” The differences in agreement between these answers on the scale likely aren’t equal or consistent. There is a meaningful order, which makes it ordinal data.
Interval data is numerical data with equal intervals between values, so differences between scores can be compared meaningfully. Zero doesn’t denote the complete absence of the measured variable because this type of data lacks a true zero point. IQ test scores are an example because the difference between scores is consistent and meaningful. An IQ score of 120 is the same distance from 100 as an IQ score of 80, but a score of 0 doesn’t represent “no intelligence,” so the scale lacks a true zero.
Ratio data is numerical data with all the properties of interval data plus a true zero point, meaning zero indicates the absence of the measured quantity. The time a participant takes to respond to a stimulus is an example: a reaction time of 0 seconds represents no delay, and ratios between values are meaningful, so a 600 ms response takes twice as long as a 300 ms response.
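Scores on a depression inventory are usually treated as interval data. The differences between scores are treated as consistent, so meaningful comparisons can be made between scores, but a score of 0 means no symptoms were endorsed rather than a true absence of the underlying construct, so ratio statements such as “twice as depressed” aren’t justified.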
Response time in milliseconds is ratio data because it has a true zero point (0 milliseconds means no time passed). Since the differences between times are consistent, ratio comparisons are valid: one response can take twice as long, or half as long, as another.
Likert scale ratings are ordinal because they rank responses (e.g., “Agree” to “Disagree”), but the distance between these responses isn’t guaranteed to be equal. The order matters, but we can’t assume the difference between adjacent levels is the same.
Diagnostic categories are nominal because they are used to label and categorize individuals; when analyzing data, they are functionally similar to personality types or eye colors.
Age in years is ratio data. It has a true zero point at birth, so ratio comparisons are meaningful (a 40-year-old is twice as old as a 20-year-old).
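As a rough illustration of how the four scales described above could be represented in R, here is a minimal sketch. All values and variable names are hypothetical and invented only for the example.

```r
# Hypothetical toy values, invented only to illustrate the four scales.
personality <- factor(c("Introvert", "Extrovert", "Introvert"))   # nominal
agreement <- factor(c("Agree", "Disagree", "Neutral"),
                    levels = c("Disagree", "Neutral", "Agree"),
                    ordered = TRUE)                                # ordinal
iq <- c(80, 100, 120)                                              # interval
reaction_ms <- c(250, 500, 1000)                                   # ratio

# Ordered factors support rank comparisons; unordered factors do not.
agreement[1] > agreement[2]        # TRUE: "Agree" ranks above "Disagree"

# Interval data: differences are meaningful.
iq[3] - iq[2] == iq[2] - iq[1]     # TRUE: both gaps are 20 points

# Ratio data: ratios are meaningful.
reaction_ms[2] / reaction_ms[1]    # 2: twice as long
```

Storing ordinal data as an ordered factor rather than as plain numbers keeps R from treating the gaps between levels as equal.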
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Random errors are unpredictable variations that occur when measuring something. They arise from small, uncontrollable factors, such as a participant’s sleep quality or slight changes in testing conditions. These errors do not follow a pattern and can push results above or below the true value. Systematic errors are consistent mistakes that occur the same way every time and stem from a flaw in the method, the equipment, or the researcher’s biases. The main distinction is that taking more measurements or increasing the sample size reduces the influence of random errors, because they tend to cancel out. Repeated measurement does not correct systematic errors, which call for fixing the flawed technique or equipment or addressing the bias.
Consider a memory experiment in which a researcher tests memory by showing participants a series of images. Random errors occur when some participants recall fewer images because they were tired or momentarily distracted, or when others misremember an image because it reminded them of something similar. These errors happen by chance and do not follow a pattern. Over many trials, they balance out. A systematic error occurs if the researcher always presents the images for too short a time, which makes it harder for all participants to process and remember them. Because this mistake happens the same way for everyone, it consistently lowers memory scores. Unlike random errors, systematic errors do not balance out and can lead to misleading results.
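The difference between the two error types can be sketched with a small simulation. The true score, the noise level, and the size of the bias below are hypothetical values chosen only for illustration.

```r
set.seed(1)              # arbitrary seed so the sketch is reproducible
true_score <- 10         # hypothetical true number of images recalled
n_trials <- 10000

# Random error: noise centered on zero, so it averages out over many trials.
random_only <- true_score + rnorm(n_trials, mean = 0, sd = 2)

# Systematic error: the same kind of noise plus a constant bias of -3
# (hypothetical, standing in for images shown too briefly).
with_bias <- true_score - 3 + rnorm(n_trials, mean = 0, sd = 2)

mean(random_only)   # close to 10
mean(with_bias)     # close to 7, however many trials are added
```

Adding trials shrinks the scatter around each mean, but only a change to the procedure can remove the constant offset.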
Poorly written questionnaires can lead to misleading conclusions about correlations. Additionally, factors like health, family support, and school environment contribute to both stress and academic performance and should be considered when drawing a conclusion. It is also essential to consider where the sample originated; the answers will differ depending on which school the students attend. A study that looks beyond grades, draws a large sample of students from many schools and socioeconomic backgrounds, and pairs surveys with physiological data would leave less room for error and improve the validity of the correlation. In establishing a correlation between any two variables, researchers must consider the impact of external variables.
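To illustrate why an external variable matters, the sketch below simulates a hypothetical confounder (labeled `support`) that influences both stress and grades. The effect sizes are invented, and the residual-based partial correlation is one common way to adjust for a third variable, not necessarily the approach the chapter has in mind.

```r
set.seed(42)                              # arbitrary seed for reproducibility
n <- 2000
support <- rnorm(n)                       # hypothetical confounder
stress  <- -0.7 * support + rnorm(n)      # more support, less stress
grades  <-  0.7 * support + rnorm(n)      # more support, higher grades

# Raw correlation: stress and grades look related.
raw_r <- cor(stress, grades)

# Partial correlation: correlate what is left of each variable
# after removing the part explained by the confounder.
partial_r <- cor(resid(lm(stress ~ support)), resid(lm(grades ~ support)))

round(c(raw = raw_r, partial = partial_r), 2)
```

In this simulation stress has no direct effect on grades, so the raw correlation is negative while the partial correlation sits near zero.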
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
set.seed(123)
n <- 50
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA
)
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3),
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2)
)
data$anxiety_post <- pmax(data$anxiety_post, 0)
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations:
## item group1 vars n mean sd median trimmed mad
## reaction_time1 1 Control 1 30 301.40 48.54 299.68 300.42 55.38
## reaction_time2 2 Experimental 1 17 295.75 38.37 288.49 295.61 43.74
## accuracy1 3 Control 2 29 85.49 9.86 85.53 85.68 8.77
## accuracy2 4 Experimental 2 19 88.06 8.20 88.32 87.76 9.86
## min max range skew kurtosis se
## reaction_time1 201.67 408.45 206.78 0.14 -0.66 8.86
## reaction_time2 215.67 377.94 162.27 0.00 -0.27 9.31
## accuracy1 61.91 105.50 43.59 -0.15 -0.35 1.83
## accuracy2 74.28 106.87 32.59 0.45 -0.45 1.88
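The call that produced the table above is not shown. Its layout resembles what the psych package's `describeBy()` prints with `mat = TRUE`, but that is an assumption. As a rough base R sketch of the same idea (grouped descriptive statistics that skip missing values), using a small made-up data frame and a hypothetical helper `describe_group()`:

```r
# Hypothetical toy data, not the simulated exam dataset.
toy <- data.frame(
  condition = c("Control", "Control", "Control", "Experimental", "Experimental"),
  reaction_time = c(300, 320, NA, 280, 290)
)

# Hypothetical helper: n, mean, and sd of the non-missing values.
describe_group <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Split the scores by condition, summarize each group, and stack the results.
group_stats <- do.call(
  rbind,
  lapply(split(toy$reaction_time, toy$condition), describe_group)
)
group_stats
```

Counting only non-missing values mirrors the table above, where the `n` column is smaller than the group size whenever a score is `NA`.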
Create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
library(dplyr)  # provides %>%, mutate(), group_by(), and summarize()
data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post) %>%
  group_by(condition) %>%
  summarize(mean_anxiety_change = mean(anxiety_change, na.rm = TRUE))
## # A tibble: 2 × 2
## condition mean_anxiety_change
## <chr> <dbl>
## 1 Control 3.79
## 2 Experimental 8.64
The mean anxiety change (pre minus post) is 3.79 for the control group and 8.64 for the experimental group, so anxiety dropped by about 4.85 points more in the experimental condition.
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
mean_reaction_time <- 350
sd_reaction_time <- 75
z_450 <- (450 - mean_reaction_time) / sd_reaction_time
probability_greater_than_450 <- 1 - pnorm(450, mean = mean_reaction_time, sd = sd_reaction_time)
probability_between_300_and_400 <- pnorm(400, mean = mean_reaction_time, sd = sd_reaction_time) - pnorm(300, mean = mean_reaction_time, sd = sd_reaction_time)
probability_greater_than_450
## [1] 0.09121122
probability_between_300_and_400
## [1] 0.4950149
The probability that a randomly selected participant will have a reaction time greater than 450 ms is about 0.09 (9.1%). The probability that a participant will have a reaction time between 300 ms and 400 ms is about 0.50 (49.5%).
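The same probabilities can be cross-checked by standardizing first and using the standard normal distribution. This is a minimal sketch that reuses the mean (350) and standard deviation (75) from the code above; the variable names are chosen only for this example.

```r
mu <- 350      # mean used in the answer above
sigma <- 75    # standard deviation used in the answer above

# Convert each cutoff to a z-score: z = (x - mu) / sigma
z_450 <- (450 - mu) / sigma    # about  1.33
z_300 <- (300 - mu) / sigma    # about -0.67
z_400 <- (400 - mu) / sigma    # about  0.67

# Upper-tail area beyond z_450, and the area between z_300 and z_400.
p_above_450  <- pnorm(z_450, lower.tail = FALSE)
p_300_to_400 <- pnorm(z_400) - pnorm(z_300)

round(c(p_above_450, p_300_to_400), 3)   # 0.091 0.495
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(...)` and avoids a subtraction when the tail probability is very small.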
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
Remove rows with missing values to create clean_data, then create a new variable performance_category that categorizes participants based on their accuracy:
data <- data %>%
  mutate(
    performance_category = case_when(
      accuracy >= 90 ~ "High",
      accuracy >= 70 & accuracy < 90 ~ "Medium",
      accuracy < 70 ~ "Low"
    )
  )
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post performance_category
## 1 29.05312 Medium
## 2 19.21510 Medium
## 3 20.45306 Medium
## 4 13.75199 High
## 5 17.84736 Medium
## 6 19.93397 High
mean_reaction_time <- mean(clean_data$reaction_time, na.rm = TRUE)
filtered_data <- clean_data %>%
  filter(condition == "Experimental" & reaction_time < mean_reaction_time)
head(filtered_data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 2 288.4911 84.71453 Female Experimental 31.15234
## 2 15 272.2079 74.28209 Female Experimental 21.66514
## 3 24 263.5554 77.90799 Male Experimental 42.02762
## 4 26 215.6653 95.25571 Female Experimental 16.23203
## 5 32 285.2464 88.85280 Female Experimental 35.10548
## 6 38 296.9044 89.35181 Female Experimental 25.67790
## anxiety_post
## 1 19.21510
## 2 16.64266
## 3 31.90485
## 4 8.05278
## 5 27.37644
## 6 16.42095
I removed rows with missing values to create a clean dataset and added a new variable (performance_category) that categorizes participants by accuracy score (High, Medium, Low). Finally, I filtered the dataset to include only participants in the experimental condition whose reaction time was faster than the mean, providing a more structured dataset ready for further analysis.
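The step that created `clean_data` is not shown above. One possible way to do it, offered only as an assumption, is base R's `na.omit()`. The sketch below applies it to a small made-up data frame and rebuilds the same accuracy cutoffs with `cut()` as an alternative to `case_when()`.

```r
# Hypothetical toy data, not the simulated exam dataset.
toy <- data.frame(
  participant_id = 1:5,
  reaction_time = c(310, NA, 295, 330, 280),
  accuracy = c(92, 85, NA, 68, 75)
)

# Drop every row that has at least one missing value.
clean_toy <- na.omit(toy)

# right = FALSE makes the bins [-Inf, 70), [70, 90), [90, Inf),
# matching the cutoffs in the case_when() call above.
clean_toy$performance_category <- cut(
  clean_toy$accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE
)
clean_toy
```

Participants 2 and 3 are dropped because each has a missing value, leaving three rows labeled High, Low, and Medium.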
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
library(dplyr)
library(psych)
# anxiety_change was never saved to clean_data, so compute it here before selecting it.
numeric_data <- clean_data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post) %>%
  select(reaction_time, accuracy, anxiety_pre, anxiety_post, anxiety_change)
corPlot(cor(numeric_data, use = "pairwise.complete.obs"),
        numbers = TRUE,
        upper = FALSE,
        main = "Correlation Plot of Key Variables")
Anxiety_pre and anxiety_post are the most highly correlated, which is expected because each participant’s post score was generated from their pre score. By contrast, accuracy and reaction time have a very low correlation, consistent with their having been simulated independently of each other.
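A reading like this can also be checked programmatically by pulling the strongest pair out of a correlation matrix. The sketch below uses made-up variables (`pre`, `post`, `other`) rather than the exam dataset, so it illustrates the technique only.

```r
set.seed(7)                                    # arbitrary seed for reproducibility
pre <- rnorm(200, mean = 25, sd = 8)
toy <- data.frame(
  pre = pre,
  post = pre - rnorm(200, mean = 5, sd = 3),   # built from pre, so strongly related
  other = rnorm(200)                           # independent of both
)

r <- cor(toy)
r[upper.tri(r, diag = TRUE)] <- NA             # keep each pair once, drop the diagonal

# Locate the largest absolute correlation among the remaining pairs.
strongest <- which(abs(r) == max(abs(r), na.rm = TRUE), arr.ind = TRUE)
pair <- c(rownames(r)[strongest[1, "row"]], colnames(r)[strongest[1, "col"]])
pair
```

Blanking the upper triangle and diagonal avoids reporting the trivial correlation of a variable with itself.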