1. Background Information

In a Stroop task, participants are presented with a list of words, each displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which each word is printed. The task has two conditions: a congruent words condition and an incongruent words condition. In the congruent words condition, the words displayed are color words whose names match the colors in which they are printed, for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed, for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally sized lists. Each participant completes both conditions and records a time for each, so the two measurements are paired within participant.

2. Questions for Investigation
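
The question examined in the rest of this report can be written as a pair of hypotheses. This is a sketch rather than the author's own wording: the symbols $\mu_C$ and $\mu_I$ (population mean naming times for congruent and incongruent lists) are notation assumed here, and the two-sided alternative mirrors the "not equal to 0" alternative reported by the t-test in Section 3.

```latex
\begin{aligned}
H_0 &: \mu_C - \mu_I = 0 && \text{(congruency does not change the mean naming time)} \\
H_A &: \mu_C - \mu_I \neq 0 && \text{(congruency changes the mean naming time)}
\end{aligned}
```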

# Load dependencies
suppressWarnings(library(tidyr))
suppressWarnings(library(dplyr))
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
suppressWarnings(library(nortest))
suppressWarnings(library(ggplot2))

# Read the data
data <- read.csv("/Users/SuhasYelluru/Documents/DataScience/Udacity Nanodegree/1. Statistics/stroopdata.csv")
head(data)
##   Congruent Incongruent
## 1    12.079      19.278
## 2    16.791      18.741
## 3     9.564      21.214
## 4     8.630      15.687
## 5    14.669      22.803
## 6    12.238      20.878
# Summarize data
summary(data)
##    Congruent      Incongruent   
##  Min.   : 8.63   Min.   :15.69  
##  1st Qu.:11.90   1st Qu.:18.72  
##  Median :14.36   Median :21.02  
##  Mean   :14.05   Mean   :22.02  
##  3rd Qu.:16.20   3rd Qu.:24.05  
##  Max.   :22.33   Max.   :35.26
# Check shape of Congruent data
qqnorm(data$Congruent)
qqline(data$Congruent, col="red")

# Check shape of Incongruent data
qqnorm(data$Incongruent)
qqline(data$Incongruent, col="red")
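
The Q-Q plots above are a visual check only. As a possible complement, a formal normality test can be run on the per-participant differences, which is the quantity a paired analysis relies on. The sketch below uses `shapiro.test` from base R and, so that it runs without the CSV file, only the six rows printed by `head(data)`. The name `stroop_head` is hypothetical, and with six values the test has very little power, so this is illustrative rather than a result for the full data set.

```r
# Illustrative subset: the six rows shown by head(data), NOT the full data set
stroop_head <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Per-participant difference, using the report's sign convention
diffs <- stroop_head$Congruent - stroop_head$Incongruent

# Shapiro-Wilk test: the null hypothesis is that the differences are normally distributed
sw <- shapiro.test(diffs)
print(sw)
```

A small p-value would cast doubt on normality; a large one does not prove it, especially at this sample size.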

# Add column to identify subjects
data.subject <- mutate(data, subject=1:nrow(data))
# Gather the two condition columns into one variable and summarise time by condition
neat.data <- gather(data.subject, congruency, time, -subject)
neat.data %>% group_by(congruency) %>% summarise(mean(time),median(time),sd(time),var(time))
## # A tibble: 2 × 5
##    congruency `mean(time)` `median(time)` `sd(time)` `var(time)`
##         <chr>        <dbl>          <dbl>      <dbl>       <dbl>
## 1   Congruent     14.05113        14.3565   3.559358    12.66903
## 2 Incongruent     22.01592        21.0175   4.797057    23.01176
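
In current versions of tidyr, `gather()` is superseded by `pivot_longer()`, and naming the summary columns makes the resulting tibble easier to reuse. The following is a sketch of the same reshape-and-summarise step under those assumptions. It uses only the six rows printed by `head(data)` so that it runs without the CSV file; `stroop_head`, `long_data`, `condition_summary`, and the `*_time` column names are hypothetical.

```r
library(tidyr)
library(dplyr)

# Illustrative subset: the six rows shown by head(data), NOT the full data set
stroop_head <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Add a subject id, then reshape to one row per subject and condition
long_data <- stroop_head %>%
  mutate(subject = row_number()) %>%
  pivot_longer(cols = -subject, names_to = "congruency", values_to = "time")

# Summarise time by condition with explicit column names
condition_summary <- long_data %>%
  group_by(congruency) %>%
  summarise(
    mean_time   = mean(time),
    median_time = median(time),
    sd_time     = sd(time),
    var_time    = var(time)
  )

print(condition_summary)
```

The printed values differ from the table above because only six participants are included here.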
# Plot Time vs Condition
ggplot(data = neat.data,
       aes(x = subject, y = time, color = congruency)) +
  geom_line()

ggplot(neat.data,
       aes(x = congruency, y = time, fill = congruency)) +
  geom_boxplot()

3. Results and Discussion

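A Welch two-sample t-test gives t = -6.53 on about 42.4 degrees of freedom, with a two-tailed p-value of 6.51e-08, so the difference is highly statistically significant. The mean of Congruent minus Incongruent is approximately -7.96, and the 95% confidence interval for this difference runs from -10.42 to -5.50. The null hypothesis of equal mean times can therefore be rejected, which supports the existence of the Stroop effect: naming ink colors takes longer in the incongruent condition. Note that each participant contributes a time in both conditions, so a paired t-test would match the design more closely than the two-sample test reported here.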

t.test(data$Congruent,data$Incongruent)
## 
##  Welch Two Sample t-test
## 
## data:  data$Congruent and data$Incongruent
## t = -6.5323, df = 42.434, p-value = 6.51e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.424698  -5.504885
## sample estimates:
## mean of x mean of y 
##  14.05113  22.01592
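
Since the design is within-subject, a paired variant of the test above can be sketched as follows. This is not the analysis reported in this section; it is a minimal illustration that uses only the six rows printed by `head(data)` so that it runs without the CSV file, and the names `stroop_head`, `paired_result`, and `one_sample_result` are hypothetical. Its numbers therefore do not correspond to the full data set.

```r
# Illustrative subset: the six rows shown by head(data), NOT the full data set
stroop_head <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Paired t-test: each row is one participant measured under both conditions
paired_result <- t.test(stroop_head$Congruent, stroop_head$Incongruent,
                        paired = TRUE)

# Equivalent formulation: one-sample t-test on the per-participant differences
diffs <- stroop_head$Congruent - stroop_head$Incongruent
one_sample_result <- t.test(diffs, mu = 0)

print(paired_result)
```

On the full data, the same call with `data$Congruent` and `data$Incongruent` would use the number of participants minus one as degrees of freedom, rather than the Welch approximation shown in the output above.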