In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.
What is our independent variable? What is our dependent variable?
Independent variable: congruency condition (does the word match the color it is printed in?). Dependent variable: time (the time it takes the participant to name the ink colors in the list).
What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform?
Null hypothesis (H0): the population mean time to name the ink colors is the same for the congruent and incongruent conditions. Alternative hypothesis (H1): the population mean times differ between the two conditions. A statistically significant difference between the average congruent and incongruent times would indicate that the Stroop effect exists.
The sample is small and does not include every potential participant, so the population means and standard deviations are unknown and must be estimated from the sample means and standard deviations. For that reason we use a t-test rather than a z-test, and we make it two-sided because we are checking for a difference in either direction. A t-test on a small sample assumes the data are approximately normally distributed, which the Q-Q plots below are used to check. Because each participant is timed in both conditions, the two samples are dependent, so a paired t-test is the variant that best matches this design.
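As a sketch of what a two-sided t-test looks like when it is run on per-participant differences (the paired form, which is an assumption about how the test is set up), the hypotheses and test statistic can be written as below. The symbols $d_i$, $\bar{d}$, $s_d$, $\mu_D$ and $n$ are introduced here for illustration and do not appear in the original analysis.

```latex
% d_i: difference for participant i (Congruent time minus Incongruent time)
% \bar{d}: sample mean of the differences, s_d: their sample standard deviation
% n: number of participants, \mu_D: population mean difference
\[
H_0 : \mu_D = 0
\qquad\text{versus}\qquad
H_1 : \mu_D \neq 0
\]
\[
d_i = x_{\text{Congruent},\,i} - x_{\text{Incongruent},\,i},
\qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}},
\qquad
\text{df} = n - 1
\]
```

The null hypothesis is rejected at the 5% level when $|t|$ exceeds the critical value of the t distribution with $n - 1$ degrees of freedom.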
# Load dependencies
suppressWarnings(library(tidyr))
suppressWarnings(library(dplyr))
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
suppressWarnings(library(nortest))
suppressWarnings(library(ggplot2))
# Read the data
data <- read.csv("/Users/SuhasYelluru/Documents/DataScience/Udacity Nanodegree/1. Statistics/stroopdata.csv")
head(data)
## Congruent Incongruent
## 1 12.079 19.278
## 2 16.791 18.741
## 3 9.564 21.214
## 4 8.630 15.687
## 5 14.669 22.803
## 6 12.238 20.878
# Summarize data
summary(data)
## Congruent Incongruent
## Min. : 8.63 Min. :15.69
## 1st Qu.:11.90 1st Qu.:18.72
## Median :14.36 Median :21.02
## Mean :14.05 Mean :22.02
## 3rd Qu.:16.20 3rd Qu.:24.05
## Max. :22.33 Max. :35.26
# Check shape of Congruent data
qqnorm(data$Congruent)
qqline(data$Congruent, col="red")
# Check shape of Incongruent data
qqnorm(data$Incongruent)
qqline(data$Incongruent, col="red")
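Since every participant is timed in both conditions, one could also check the shape of the per-participant differences, not only each condition separately. The following is a minimal sketch of that idea. It uses only the six rows printed by `head(data)` above, so its output is illustrative and says nothing about the full dataset; the names `stroop_head`, `diffs` and `sw` are hypothetical.

```r
# Illustrative subset: the six rows shown by head(data), NOT the full dataset.
stroop_head <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Per-participant difference: Congruent minus Incongruent
diffs <- stroop_head$Congruent - stroop_head$Incongruent

# Visual check: Q-Q plot of the differences against a normal distribution
qqnorm(diffs)
qqline(diffs, col = "red")

# Formal check: Shapiro-Wilk normality test from base R's stats package
sw <- shapiro.test(diffs)
print(sw)
```

With the full data frame, the same steps would apply to `data$Congruent - data$Incongruent`. A Shapiro-Wilk test has little power on very small samples, so the Q-Q plot remains the more informative check.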
# Add column to identify subjects
data.subject <- mutate(data, subject=1:nrow(data))
# Group data congruency into one variable and find average time
neat.data <- gather(data.subject, congruency, time, -subject)
neat.data %>% group_by(congruency) %>% summarise(mean(time),median(time),sd(time),var(time))
## # A tibble: 2 × 5
## congruency `mean(time)` `median(time)` `sd(time)` `var(time)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Congruent 14.05113 14.3565 3.559358 12.66903
## 2 Incongruent 22.01592 21.0175 4.797057 23.01176
# Plot Time vs Condition
ggplot(data=neat.data,
       aes(x=subject, y=time, color=congruency)) +
  geom_line()
ggplot(neat.data,
       aes(x=congruency, y=time, fill=congruency)) +
  geom_boxplot()
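The two-tailed p-value (6.51e-08) is far below 0.0001, so the difference is statistically significant. The mean of Congruent minus Incongruent is approximately -7.96, and the 95% confidence interval for this difference runs from -10.42 to -5.50. The null hypothesis can therefore be rejected, which supports the existence of the Stroop effect. Note that the call below runs Welch's two-sample t-test, which treats the two conditions as independent samples; because each participant was timed in both conditions, a paired test (`paired=TRUE`) would match the design more closely.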
t.test(data$Congruent,data$Incongruent)
##
## Welch Two Sample t-test
##
## data: data$Congruent and data$Incongruent
## t = -6.5323, df = 42.434, p-value = 6.51e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -10.424698 -5.504885
## sample estimates:
## mean of x mean of y
## 14.05113 22.01592
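Because the same participants appear in both columns, a paired t-test is a natural alternative to the Welch test shown above. The following is a minimal sketch using only the six rows printed by `head(data)`, so its numbers will not match the full-data results; the names `stroop_head`, `paired_result`, `diffs` and `one_sample_result` are hypothetical.

```r
# Illustrative subset: the six rows shown by head(data), NOT the full dataset.
stroop_head <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Paired two-sided t-test: each row is one participant measured twice
paired_result <- t.test(stroop_head$Congruent, stroop_head$Incongruent,
                        paired = TRUE)

# Equivalent formulation: one-sample t-test on the per-participant differences
diffs <- stroop_head$Congruent - stroop_head$Incongruent
one_sample_result <- t.test(diffs, mu = 0)

print(paired_result)
```

On the full data the call would be `t.test(data$Congruent, data$Incongruent, paired=TRUE)`. The paired form has n - 1 degrees of freedom and removes between-participant variability from the comparison.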