Replace “Your Name” with your actual name.
Please complete this exam on your own. Include your R code, interpretations, and answers within this document.
Read Chapter 2 (Types of Data Psychologists Collect) and answer the following:
Nominal: Uses numbers, words, or symbols to classify or categorize. These numbers or symbols serve only as labels and do not indicate an amount or quantity of what is being counted. Examples of nominal categorization include labeling by gender or ethnicity. Ordinal: Categorizes by order but not by interval. An example would be listing contestants who ran in a race by their place in the race without their times. One would know where each person placed, but not by how much they finished ahead or behind. Interval: Ranks values with a definite, equal interval between them, but without a true zero point. Examples would be temperature in Celsius or Fahrenheit, or IQ scores. Ratio: A scale with a true zero point that has equal intervals, is in rank order, and is fully quantitative. Examples would be the finishing times of runners in a race, or taking the incomes of 100 people and placing them in order from zero income to highest income.
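The four levels can be sketched in R, where the choice of data type controls which operations make sense. This is a minimal illustration only; all values and variable names below are hypothetical and are not part of the exam.

```r
# Hypothetical example values, for illustration only
gender <- factor(c("Male", "Female", "Female", "Male"))      # nominal: labels only
race_place <- factor(c("1st", "3rd", "2nd", "1st"),
                     levels = c("1st", "2nd", "3rd"),
                     ordered = TRUE)                          # ordinal: order, no distances
temp_celsius <- c(20, 25, 30, 10)                             # interval: equal steps, arbitrary zero
income <- c(0, 25000, 50000, 100000)                          # ratio: true zero point

# Nominal: counting category membership is the meaningful summary
table(gender)

# Ordinal: order comparisons work (here "1st" comes before "3rd")
race_place[1] < race_place[2]

# Interval: differences are meaningful, ratios are not
temp_celsius[3] - temp_celsius[1]

# Ratio: ratios are meaningful because zero means "none"
income[4] / income[3]
```

Storing an ordinal variable as an ordered factor rather than a number keeps R from computing a mean on ranks by accident.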
For each of the following variables, identify the appropriate level of measurement (nominal, ordinal, interval, or ratio) and explain your reasoning:
Referring to Chapter 3 (Measurement Errors in Psychological Research):
Explain the difference between random and systematic error, providing an example of each in the context of a memory experiment.
Random errors are just that: random. They are unpredictable and cause measurements to vary from one another, but the measurements generally cluster around the true value. Systematic errors, often caused by faulty equipment or a flawed procedure, shift measurements consistently in one direction and can leave them far from the true value. An example of each in a memory experiment is as follows. For random error, participants might be tired on the test day because of a long day of partying, as can happen with college students, so their scores would differ unpredictably from a re-test taken on a day when they had not partied. For systematic error, the program used to test the participants might have a timing glitch that consistently adds time to, or subtracts time from, every response time. If this happens, all the results would be off in the same direction from what they should be and could give totally wrong conclusions.
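A small simulation can make the contrast concrete. This is a sketch under stated assumptions: the true response time of 500 ms, the 30 ms noise, and the 40 ms timing glitch are all made-up values chosen for illustration.

```r
set.seed(42)        # arbitrary seed, for reproducibility
true_rt <- 500      # hypothetical true response time in ms
n_trials <- 1000    # hypothetical number of trials

# Random error: unpredictable noise centered on zero
random_only <- true_rt + rnorm(n_trials, mean = 0, sd = 30)

# Systematic error: an assumed timing glitch adds a constant 40 ms to
# every trial, on top of the same kind of random noise
with_glitch <- true_rt + 40 + rnorm(n_trials, mean = 0, sd = 30)

# Random error averages out, so this is close to 0
mean(random_only) - true_rt

# The constant bias does not average out, so this is close to 40
mean(with_glitch) - true_rt
```

Collecting more trials shrinks the first difference toward zero but leaves the second one near 40, which is why systematic error has to be fixed at the source, for example by calibrating the software.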
How might measurement error affect the validity of a study examining the relationship between stress and academic performance? What steps could researchers take to minimize these errors?
Validity is the ability of a test to measure what it intends to measure. Measurement error can lead to data being mismeasured or inaccurately recorded. Possible culprits of measurement error are uncalibrated instruments, poorly designed measuring tools, and human error. One should first make sure that the procedures for measuring stress and academic performance are carried out accurately in order to measure the relationship between those two variables. It is important to make sure that sleep, finances, and diet are all accounted for, along with possibly many other variables, so that there are no procedural errors in the study. In addition, a proper sample and control group should be chosen for the test. Both should represent the population to be studied. Researchers should separate themselves from the testing so as not to introduce observer bias. All these steps are important for running a valid examination of the relationship between stress and academic performance. If these things are not taken into account, reliability and validity may be reduced, which in turn can lead to misleading conclusions or, worse, a completely wasted study. That is to say, the study might have to be thrown away entirely because of poor management.
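One way measurement error can mislead a stress and performance study is by weakening the observed correlation. The sketch below simulates this; the stress scale, the GPA relationship, and the amount of noise are all hypothetical values assumed for illustration.

```r
set.seed(1)         # arbitrary seed, for reproducibility
n <- 5000           # hypothetical number of students

# Assumed "true" stress scores and an assumed negative relationship with GPA
stress_true <- rnorm(n, mean = 50, sd = 10)
gpa_true <- 3.5 - 0.03 * (stress_true - 50) + rnorm(n, mean = 0, sd = 0.3)

# The same stress scores measured with a noisy (unreliable) questionnaire
stress_noisy <- stress_true + rnorm(n, mean = 0, sd = 10)

r_true <- cor(stress_true, gpa_true)     # correlation with error-free stress scores
r_noisy <- cor(stress_noisy, gpa_true)   # correlation with noisy stress scores

# The noisy measure gives a correlation closer to zero
c(true = r_true, noisy = r_noisy)
```

In this simulation the underlying relationship is identical in both cases; only the quality of the stress measure changes, yet the noisy version makes the relationship look weaker than it is.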
The code below creates a simulated dataset for a psychological experiment. Run the code chunk below without making any changes:
# Create a simulated dataset
set.seed(123) # For reproducibility
# Number of participants
n <- 50
# Create the data frame
data <- data.frame(
  participant_id = 1:n,
  reaction_time = rnorm(n, mean = 300, sd = 50),
  accuracy = rnorm(n, mean = 85, sd = 10),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  condition = sample(c("Control", "Experimental"), n, replace = TRUE),
  anxiety_pre = rnorm(n, mean = 25, sd = 8),
  anxiety_post = NA # We'll fill this in based on condition
)
# Make the experimental condition reduce anxiety more than control
data$anxiety_post <- ifelse(
  data$condition == "Experimental",
  data$anxiety_pre - rnorm(n, mean = 8, sd = 3), # Larger reduction
  data$anxiety_pre - rnorm(n, mean = 3, sd = 2) # Smaller reduction
)
# Ensure anxiety doesn't go below 0
data$anxiety_post <- pmax(data$anxiety_post, 0)
# Add some missing values for realism
data$reaction_time[sample(1:n, 3)] <- NA
data$accuracy[sample(1:n, 2)] <- NA
# View the first few rows of the dataset
head(data)
## participant_id reaction_time accuracy gender condition anxiety_pre
## 1 1 271.9762 87.53319 Female Control 31.30191
## 2 2 288.4911 84.71453 Female Experimental 31.15234
## 3 3 377.9354 84.57130 Female Experimental 27.65762
## 4 4 303.5254 98.68602 Male Control 16.93299
## 5 5 306.4644 82.74229 Female Control 24.04438
## 6 6 385.7532 100.16471 Female Control 22.75684
## anxiety_post
## 1 29.05312
## 2 19.21510
## 3 20.45306
## 4 13.75199
## 5 17.84736
## 6 19.93397
Now, perform the following computations*:
## vars n mean sd median trimmed mad min max range
## participant_id 1 50 25.50 14.58 25.50 25.50 18.53 1.00 50.00 49.00
## reaction_time 2 47 299.36 44.78 295.83 298.57 43.09 201.67 408.45 206.78
## accuracy 3 48 86.50 9.23 86.53 86.52 8.65 61.91 106.87 44.97
## gender* 4 50 1.40 0.49 1.00 1.38 0.00 1.00 2.00 1.00
## condition* 5 50 1.38 0.49 1.00 1.35 0.00 1.00 2.00 1.00
## anxiety_pre 6 50 25.31 7.45 24.39 24.66 6.69 14.51 50.93 36.41
## anxiety_post 7 50 19.67 7.37 19.90 19.81 5.50 2.63 45.33 42.70
## skew kurtosis se
## participant_id 0.00 -1.27 2.06
## reaction_time 0.15 -0.36 6.53
## accuracy -0.05 -0.06 1.33
## gender* 0.40 -1.88 0.07
## condition* 0.48 -1.80 0.07
## anxiety_pre 0.98 1.39 1.05
## anxiety_post 0.30 1.87 1.04
Create a new variable anxiety_change that represents the difference between pre and post anxiety scores (pre minus post). Then calculate the mean anxiety change for each condition.
## vars n mean sd median trimmed mad min max range
## participant_id 1 50 25.50 14.58 25.50 25.50 18.53 1.00 50.00 49.00
## reaction_time 2 47 299.36 44.78 295.83 298.57 43.09 201.67 408.45 206.78
## accuracy 3 48 86.50 9.23 86.53 86.52 8.65 61.91 106.87 44.97
## gender* 4 50 1.40 0.49 1.00 1.38 0.00 1.00 2.00 1.00
## condition* 5 50 1.38 0.49 1.00 1.35 0.00 1.00 2.00 1.00
## anxiety_pre 6 50 25.31 7.45 24.39 24.66 6.69 14.51 50.93 36.41
## anxiety_post 7 50 19.67 7.37 19.90 19.81 5.50 2.63 45.33 42.70
## anxiety_change 8 50 5.64 3.30 5.07 5.30 2.86 -0.51 13.87 14.38
## skew kurtosis se
## participant_id 0.00 -1.27 2.06
## reaction_time 0.15 -0.36 6.53
## accuracy -0.05 -0.06 1.33
## gender* 0.40 -1.88 0.07
## condition* 0.48 -1.80 0.07
## anxiety_pre 0.98 1.39 1.05
## anxiety_post 0.30 1.87 1.04
## anxiety_change 0.79 0.19 0.47
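The mean difference between pre and post anxiety (pre minus post) is 5.64, with a standard deviation of 3.3. Because the mean change is positive, anxiety was lower on average at post than at pre. The standard deviation of 3.3 is fairly large relative to that mean, which implies that the size of the reduction varied considerably from participant to participant (from -0.51 to 13.87).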
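The change score and its mean for each condition could be computed along these lines. This is a minimal sketch on a small hypothetical data frame (`toy`), not the exam dataset, so its numbers are not the exam's results; the column names follow the simulated data.

```r
# Hypothetical toy data, for illustration only (not the exam dataset)
toy <- data.frame(
  condition    = c("Control", "Control", "Experimental", "Experimental"),
  anxiety_pre  = c(30, 20, 28, 26),
  anxiety_post = c(27, 18, 19, 20)
)

# Change score: pre minus post, so positive values mean anxiety went down
toy$anxiety_change <- toy$anxiety_pre - toy$anxiety_post

# Mean change within each condition, using base R
change_by_condition <- aggregate(anxiety_change ~ condition, data = toy, FUN = mean)
change_by_condition
```

The same pattern should apply to the full dataset by replacing `toy` with `data`. A dplyr version with `group_by()` and `summarise()` would be an equivalent alternative.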
Using the concepts from Chapter 4 (Descriptive Statistics and Basic Probability in Psychological Research):
## [1] 0.2524925
## [1] 0.4950149
The probability of a participant having a reaction time of 450ms or greater is approximately 25%. The probability of a participant having a reaction time between 300ms and 400ms is approximately 50%.
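Probabilities like these come from the normal cumulative distribution function, which R exposes as `pnorm()`. The sketch below shows the technique with an assumed mean of 300 ms and standard deviation of 50 ms (the values used to simulate reaction_time in Part 2). The exam question's own parameters are not reproduced here, so these results are not expected to match the values reported above.

```r
# Assumed parameters, for illustration only
mu <- 300      # assumed mean reaction time in ms
sigma <- 50    # assumed standard deviation in ms

# P(X >= 450): upper-tail area beyond 450 ms
p_above_450 <- pnorm(450, mean = mu, sd = sigma, lower.tail = FALSE)

# P(300 <= X <= 400): area below 400 ms minus area below 300 ms
p_300_to_400 <- pnorm(400, mean = mu, sd = sigma) - pnorm(300, mean = mu, sd = sigma)

p_above_450
p_300_to_400
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(...)` and makes it explicit that the upper tail is wanted.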
Using the dataset created in Part 2, perform the following data cleaning and manipulation tasks:
clean_data.
## Error in parse(text = input): <text>:8:0: unexpected end of input
## 6:
## 7:
## ^
Create a new variable performance_category that categorizes participants based on their accuracy:
Write your answer(s) here describing your data cleaning process.
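The parse error above ("unexpected end of input") usually means a chunk ended with an unclosed parenthesis, bracket, or quote. A minimal sketch of the two cleaning steps follows, using a small hypothetical data frame. The accuracy cut points (70 and 90) and the category labels are assumptions made for illustration, since the exam's own thresholds are not shown here.

```r
# Hypothetical toy data with missing values, for illustration only
toy <- data.frame(
  participant_id = 1:6,
  reaction_time  = c(310, NA, 280, 295, 330, 301),
  accuracy       = c(92, 85, NA, 68, 79, 95)
)

# Keep only rows with no missing values in any column
clean_toy <- toy[complete.cases(toy), ]

# Categorize accuracy; the cut points 70 and 90 are assumed, not the exam's
clean_toy$performance_category <- cut(
  clean_toy$accuracy,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE   # intervals are [lower, upper), so exactly 90 counts as "High"
)

clean_toy
```

`na.omit(toy)` would remove the same rows. `complete.cases()` is used here because it makes the row filter explicit.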
Using the psych package, create a correlation plot for the simulated dataset created in Part 2. Include the following steps:
Use the corPlot() function to create a correlation plot.
# Your code here. Hint: first, with dplyr create a new dataset that selects only the numeric variables (reaction_time, accuracy, anxiety_pre, anxiety_post, and anxiety_change if you created it).
Write your answer(s) here
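The steps in the hint could look roughly like the sketch below. It uses a hypothetical toy data frame and base R to pick the numeric columns (the hint's dplyr `select()` would do the same job), and it only draws the plot if the psych package is installed.

```r
set.seed(7)   # arbitrary seed, for reproducibility

# Hypothetical toy data, for illustration only (not the exam dataset)
toy <- data.frame(
  reaction_time = rnorm(30, mean = 300, sd = 50),
  accuracy      = rnorm(30, mean = 85, sd = 10),
  anxiety_pre   = rnorm(30, mean = 25, sd = 8),
  gender        = sample(c("Male", "Female"), 30, replace = TRUE)
)
toy$anxiety_post <- toy$anxiety_pre - rnorm(30, mean = 5, sd = 3)
toy$reaction_time[c(3, 11)] <- NA   # a couple of missing values

# Keep only the numeric columns (drops gender)
numeric_vars <- toy[sapply(toy, is.numeric)]

# Correlation matrix, using all available pairs despite missing values
cor_matrix <- cor(numeric_vars, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# Draw the correlation plot only when psych is available
if (requireNamespace("psych", quietly = TRUE)) {
  psych::corPlot(cor_matrix, numbers = TRUE, main = "Correlations (toy data)")
}
```

Without the `use` argument, `cor()` returns NA for any pair involving a column with missing values, so choosing how to handle missing data is part of the analysis decision.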
Reflect on how the statistical concepts and R techniques covered in this course apply to psychological research:
Describe a specific research question in psychology that interests you. What type of data would you collect, what statistical analyses would be appropriate, and what potential measurement errors might you need to address?
How has learning R for data analysis changed your understanding of psychological statistics? What do you see as the biggest advantages and challenges of using R compared to other statistical software?
Answer #1: I think a great psychological question to ask is: does kissing your significant other immediately after arriving home from work lower stress levels? There are several ways to collect the data: questionnaires, interviews, and observation. A large sample of the city's population would have to be randomly selected for both the control group and the group that kisses their significant other immediately after arriving home. The easiest and most promising way to collect data would be to choose addresses at random for both groups and have participants fill out a questionnaire about their stress levels when they get home. One group would begin kissing their significant other when they arrive and the other would not. As time progresses, follow-up questionnaires would be given every 2 months for 6 months, and the two groups would be compared. The data collected would include their ages, genders, how long they have been a couple, their income level, number of children, and whether they are in good general health. All of these variables may play a part in their stress levels from the beginning of the data collection to the end.
The statistical analyses that would have to be done are descriptive statistics and basic probability. It would be important to have the mean (or average), median, mode, and variability. These statistics give a general understanding of where the sample stands on stress levels and how the findings might apply to the population in general. In addition to the descriptive statistics, basic probability adds to the understanding of the data. With it we can understand how wide the range of stress levels is for the population and how the standard deviation compares between groups. The standard deviation may show differences in improvement by age, health level, or gender.
The possible errors that can affect the reliability and validity of the experiment can come from a variety of places. The first can come from sampling error: if participants come from a specific neighborhood, economic class, or any other niche, the results may not represent the population as a whole but only a minority of it. An additional error can come from measurement error, which can result from a poorly designed questionnaire or human error. The researcher could word the questions so that those filling out the questionnaire do not understand them, or those collecting the data may not tabulate the data according to the design of the experiment. Part of that human error can also be observer bias. If researchers believe that people will naturally have lower stress by kissing their significant other, they may subconsciously grade the questionnaires to better fit their ideal result.
Answer #2: Learning R has given me a greater understanding of what psychologists must do in order to show that their theories have, or do not have, merit. Because of this tool, psychologists have the advantage of being able to concentrate on the difficulties of the psychological methods instead of trying to write their own programs. They simply do not have to reinvent the wheel. By sticking to psychological methods of creating tests, questionnaires, and experiments, they can analyze their results quickly without needing to learn or build spreadsheets of their own. R has already been created for easy use. More importantly, because R is so widely used, many other psychologists can read and understand the way the data were analyzed and can re-test any experiment done. This supports the scientific method by making it possible to reconstruct an experiment and validate it. When results are shown to be reliable and valid, theories are either supported by the data or shown to be faulty. In either case, R is a powerful tool that can be used by psychologists and other researchers. The biggest challenge to using R is learning the program. Because it is such a powerful tool, it requires learning and understanding. Although it can provide both descriptive and quantitative results quickly, it still requires an understanding of both the program and psychological test methods, and of how to combine them. Once R is mastered, it becomes a valuable tool for any researcher wanting to turn raw data into easily understood quantities and graphs.
Be sure to knit your document to HTML format, checking that all content is correctly displayed before submission. Publish your assignment to RPubs and submit the URL to Canvas.