Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following: 1. Describe the key differences between nominal, ordinal, interval, and ratio data. Provide one example of each from psychological research.
Nominal data consist of categories with no inherent order or ranking. An example is a person’s gender identity: male, female, or non-binary. Ordinal data are ranked in a meaningful order, but the distances between ranks are not necessarily equal. An example is a response to the question “Does social media badly impact a student’s study habits?” on the scale Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree. Interval data are ordered with equal spacing between values but have no true zero point. An example is a student’s SAT score. Ratio data are ordered with equal spacing and have a true zero. An example is the number of hours a person slept, such as 0, 4, or 8 hours.
Scores on a depression inventory (0-63) are treated as Interval because they are numerical and assumed to have equal spacing between values. Response time in milliseconds is Ratio because it is a continuous numerical variable with equal intervals and a true zero. Likert scale ratings of agreement (1-7) are Ordinal because the ratings are ranked in order, but the distance between adjacent ratings is not guaranteed to be equal. Diagnostic categories are Nominal because they are categories with no inherent ranking. Age in years is Ratio because age is a numerical variable with equal intervals and a true zero.
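As a rough illustration of how these levels of measurement can be represented in R, the sketch below stores nominal data as an unordered factor, ordinal data as an ordered factor, and ratio data as a plain numeric vector. The values and variable names are hypothetical and are not taken from the exam dataset.

```r
# Hypothetical example values, not taken from the exam dataset
diagnosis <- factor(c("GAD", "MDD", "GAD", "PTSD"))   # nominal: labels only

agreement <- factor(
  c("Agree", "Neutral", "Strongly Agree"),
  levels = c("Strongly Disagree", "Disagree", "Neutral",
             "Agree", "Strongly Agree"),
  ordered = TRUE                                      # ordinal: ranked levels
)

hours_slept <- c(0, 4, 8)                             # ratio: true zero

is.ordered(diagnosis)            # FALSE, categories have no ranking
is.ordered(agreement)            # TRUE, levels are ranked
agreement[1] > agreement[2]      # TRUE, "Agree" ranks above "Neutral"
hours_slept[3] / hours_slept[2]  # 2, ratios are meaningful with a true zero
```

Comparisons such as `>` only work on the ordered factor, which mirrors the idea that nominal categories cannot be ranked.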
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Random error is caused by unpredictable factors that affect a measurement, making results inconsistent from one measurement to the next. For example, a researcher counting the red cars that pass through a street may miss a fast car or miscount on some occasions, so the count varies unpredictably. Systematic error is caused by a consistent flaw in the measurement system. For example, a researcher measuring the speed of cars may get results that are consistently off in the same direction because of a fault in how time was measured.
Measurement error occurs when the recorded data do not accurately reflect the variable being studied. In this situation, measurement error can affect the results when students do not answer truthfully. Survey questions can also distort the results if students misinterpret them.
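The difference between the two kinds of error can be sketched with a small simulation. Everything here is an assumption made for illustration: the true speed of 60, the noise level, and the constant bias of 4 are hypothetical values, not figures from the chapter.

```r
set.seed(42)        # arbitrary seed so the sketch is repeatable
true_speed <- 60    # hypothetical true value being measured
n_obs <- 1000       # number of simulated measurements

# Random error: noise centered on zero, so it averages out
random_only <- true_speed + rnorm(n_obs, mean = 0, sd = 5)

# Systematic error: the same kind of noise plus a constant bias of +4
systematic <- true_speed + 4 + rnorm(n_obs, mean = 0, sd = 5)

mean(random_only) - true_speed  # close to 0
mean(systematic) - true_speed   # close to 4, the bias does not average out
```

Collecting more measurements shrinks the effect of random error on the mean, but it does nothing to remove the systematic bias.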
The code below creates a simulated dataset for a psychological experiment. Run the below code chunk without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
participant_id = 1:n,
reaction_time = rnorm(n, mean = 300, sd = 50),
accuracy = rnorm(n, mean = 85, sd = 10),
gender = sample(c("Male", "Female"), n, replace = TRUE),
condition = sample(c("Control", "Experimental"), n, replace = TRUE),
anxiety_pre = rnorm(n, mean = 25, sd = 8),
anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
data$condition == "Experimental",
data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations*:
## [1] 321.6667
## [1] 304.5
## [1] 2273.467
## [1] 47.68088
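Create a new variable called anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
#create cleaned_data Ctrl + shift + M for pipe
library(dplyr)
cleaned_data <- data %>%
  mutate(anxiety_change = anxiety_pre - anxiety_post) # pre minus post
cleaned_data %>% group_by(condition) %>%
  summarise(mean_change = mean(anxiety_change)) # mean change per condition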
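On average, the post anxiety scores are lower than the pre anxiety scores in both conditions. The reduction is larger in the Experimental condition than in the Control condition, which matches how the dataset was simulated (an average reduction of about 8 points versus about 3).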
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
# Calculate mode
get_mode <- function(x) {
  uniqv <- unique(x) # distinct values in x
  uniqv[which.max(tabulate(match(x, uniqv)))] # value with the highest count
}
get_mode(reaction_times)
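As a cross-check on a mode function, the same idea can be sketched with table(), which counts how often each value occurs. The vector below is hypothetical and stands in for reaction_times, which is not shown in this document.

```r
# Hypothetical reaction times in ms; not the exam's reaction_times vector
rt_example <- c(310, 295, 310, 320, 295, 310)

counts <- table(rt_example)  # frequency of each distinct value
mode_value <- as.numeric(names(counts)[which.max(counts)])
mode_value                   # 310, which appears three times
```

If two values tie for the highest count, which.max() returns only the first one, so this sketch reports a single mode.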
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
clean_data.
#create cleaned_data Ctrl + Shift + M for pipe
library(dplyr)
cleaned_data <- data %>%
  na.omit()
# View the cleaned data
print(cleaned_data)
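Before dropping rows, it can help to check how many values are missing in each column. The sketch below uses a hypothetical four-row data frame (toy) rather than the simulated exam data. In the exam dataset, 3 reaction_time values and 2 accuracy values were set to NA, so na.omit() would remove between 3 and 5 rows there.

```r
# Hypothetical mini data frame with the same kind of gaps as the exam data
toy <- data.frame(
  reaction_time = c(271.9, NA, 377.9, 303.5),
  accuracy      = c(87.5, 84.7, NA, 98.7),
  condition     = c("Control", "Experimental", "Experimental", "Control")
)

colSums(is.na(toy))          # missing values per column
toy_clean <- na.omit(toy)    # drop every row with at least one NA
nrow(toy) - nrow(toy_clean)  # number of rows removed
```

Note that na.omit() must be the last step of a pipe chain or be followed by another function call; a trailing pipe operator with nothing valid after it will cause an error.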
Create a new variable called performance_category that categorizes participants based on their accuracy:
#create cleaned_data Ctrl + shift + M for pipe
library(dplyr)
cleaned_data <- data %>%
  na.omit() %>%
  # Label each participant by accuracy, then make "Low" the reference level
  mutate(performance_category = ifelse(accuracy >= 90, "High", "Low")) %>%
  mutate(performance_category = relevel(factor(performance_category), ref = "Low"))
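When more than two categories are needed, cut() is one way to bin a numeric variable. This is only a sketch: the accuracy scores and the cutoffs of 70 and 90 below are hypothetical, since the assignment's own cutoffs are not shown in this document.

```r
# Hypothetical accuracy scores and cutoffs, for illustration only
accuracy_example <- c(65, 78, 90, 95, 84)

performance <- cut(
  accuracy_example,
  breaks = c(-Inf, 70, 90, Inf),       # bins: below 70, 70 to under 90, 90 and up
  labels = c("Low", "Medium", "High"),
  right  = FALSE                       # each bin includes its lower bound
)

table(performance)  # Low 1, Medium 2, High 2
```

Setting right = FALSE means a score of exactly 90 falls in "High", which matches an `accuracy >= 90` rule.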
Write your answer(s) here describing your data cleaning process.
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
1. A research question that interests me in psychology is “How Does Early Childhood Trauma Affect Adult Relationships?” I would collect self-reported data using a questionnaire and a survey. These data would let me sort and filter participants’ responses into groups and categories that would help my research. The disadvantage of this research method is response bias. Participants may answer inaccurately because they want to be accepted and to fit what is socially acceptable. Recall inaccuracies can also occur because participants may have forgotten what happened in their early childhood, partly from wanting to forget the bad things that happened in the past. They may also simply not remember and unintentionally create false memories of past experiences.
2. R has deepened my understanding of psychological statistics. I don’t have any background in coding, and this is the first time I have experienced it. It is quite complex, yet it is amazing how psychologists use the program to run the statistical analyses they use for research. The biggest advantage of R is that it is easily accessible and not hidden behind a paywall, as many programs are. It is also helpful in psychology because it has many packages that psychologists can benefit from. The disadvantage of R for me is learning the code. As someone who hasn’t had any coding experience, I find it quite hard to remember and input the code needed for the homework.
Be sure to knit your document to HTML format and check that all content is displayed correctly before submission. Publish your assignment to RPubs and submit the URL to Canvas.