The Science of Decisions

1. Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.

2. Questions for Investigation

What is our independent variable? What is our dependent variable?

Independent variable: congruency condition (does the word match the color it is printed in?)

Dependent variable: time (the time it takes the participant to name the ink colors)

What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform?

Null hypothesis (H0): the population mean time to name the ink colors is the same in the congruent and incongruent conditions. Alternative hypothesis (H1): the two population means differ. A significant difference between the average times of the two conditions would indicate that the Stroop effect exists.

The sample is small and does not include every potential participant, so the population standard deviation is unknown, and the sample means and standard deviations have to be used to make inferences about the population means. For that reason a t-test is used rather than a z-test. The t-test does assume that the data are approximately normally distributed, which the Q-Q plots below check. The test is two-sided because we are checking for a difference in either direction. Since each participant is timed under both conditions, the two samples are dependent, so a paired t-test would be the natural fit; the Welch two-sample test reported in section 3 ignores that pairing.
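
Because every participant contributes one time per condition, one option is to test the per-participant differences with a paired t-test. The following is a minimal sketch, not the analysis run in this report: it uses only the six rows that head(data) prints further down, so its numbers are illustrative and will not match a test on the full data set. The names congruent, incongruent, diffs, paired.result and manual.t are introduced here for illustration.

```r
# Six rows copied from the head(data) output in this report (a subset only)
congruent   <- c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238)
incongruent <- c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)

# Paired, two-sided t-test on the per-participant differences
paired.result <- t.test(congruent, incongruent, paired = TRUE)

# The same statistic computed by hand: mean difference divided by its
# standard error, with n - 1 degrees of freedom
diffs    <- congruent - incongruent
manual.t <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))

print(paired.result)
print(manual.t)
```

On the full data set, the corresponding call would be t.test(data$Congruent, data$Incongruent, paired = TRUE).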

# Load dependencies
suppressWarnings(library(tidyr))
suppressWarnings(library(dplyr))

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

suppressWarnings(library(nortest))
suppressWarnings(library(ggplot2))

# Read the data
data <- read.csv("/Users/SuhasYelluru/Documents/DataScience/Udacity Nanodegree/1. Statistics/stroopdata.csv")
head(data)

##   Congruent Incongruent
## 1    12.079      19.278
## 2    16.791      18.741
## 3     9.564      21.214
## 4     8.630      15.687
## 5    14.669      22.803
## 6    12.238      20.878

# Summarize data
summary(data)

##    Congruent      Incongruent   
##  Min.   : 8.63   Min.   :15.69  
##  1st Qu.:11.90   1st Qu.:18.72  
##  Median :14.36   Median :21.02  
##  Mean   :14.05   Mean   :22.02  
##  3rd Qu.:16.20   3rd Qu.:24.05  
##  Max.   :22.33   Max.   :35.26

# Check shape of Congruent data
qqnorm(data$Congruent)
qqline(data$Congruent, col="red")

# Check shape of Incongruent data
qqnorm(data$Incongruent)
qqline(data$Incongruent, col="red")
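
The Q-Q plots give a visual check of normality. A numeric check could complement them; the sketch below uses shapiro.test from base R's stats package, which is a choice made here and not something this report ran. It uses only the six rows shown by head(data) above, so with so few values the test has little power and the result is illustrative only. The names sample.data and shapiro.p are hypothetical.

```r
# Six rows copied from the head(data) output in this report (a subset only)
sample.data <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Shapiro-Wilk test per condition; a small p-value would suggest
# a departure from normality
shapiro.p <- sapply(sample.data, function(x) shapiro.test(x)$p.value)
print(shapiro.p)
```

On the full data set, the same call could be applied to data$Congruent and data$Incongruent.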

# Add column to identify subjects
data.subject <- mutate(data, subject=1:nrow(data))
# Group data congruency into one variable and find average time
neat.data <- gather(data.subject, congruency, time, -subject)
neat.data %>% group_by(congruency) %>% summarise(mean(time),median(time),sd(time),var(time))

## # A tibble: 2 × 5
##    congruency `mean(time)` `median(time)` `sd(time)` `var(time)`
##         <chr>        <dbl>          <dbl>      <dbl>       <dbl>
## 1   Congruent     14.05113        14.3565   3.559358    12.66903
## 2 Incongruent     22.01592        21.0175   4.797057    23.01176
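
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The sketch below shows the same reshape and summary with the newer function, assuming a tidyr version that provides it. It runs on the six rows shown by head(data) above, so the summary values will differ from the full-data table. The names sample.data, long.data, condition.summary, mean_time and sd_time are introduced here for illustration.

```r
library(tidyr)
library(dplyr)

# Six rows copied from the head(data) output in this report (a subset only)
sample.data <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# One row per subject and condition: the two time columns become a
# 'congruency' label column and a 'time' value column
long.data <- sample.data %>%
  mutate(subject = seq_len(n())) %>%
  pivot_longer(cols = -subject, names_to = "congruency", values_to = "time")

# Mean and standard deviation of time per condition
condition.summary <- long.data %>%
  group_by(congruency) %>%
  summarise(mean_time = mean(time), sd_time = sd(time))

print(condition.summary)
```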

# Plot Time vs Condition
ggplot(data=neat.data,
        aes(x=subject, y=time, color=congruency))+
     geom_line()

ggplot(neat.data,
        aes(x=congruency, y=time, fill=congruency))+
   geom_boxplot()

3. Results and Discussion

The two-tailed p-value is less than 0.0001 (p = 6.51e-08), so the difference is statistically significant. The mean of Congruent minus Incongruent is approximately -7.96479. The 95% confidence interval of this difference runs from -10.42 to -5.50. The null hypothesis can therefore be rejected, which supports the existence of the Stroop effect.

t.test(data$Congruent,data$Incongruent)

## 
##  Welch Two Sample t-test
## 
## data:  data$Congruent and data$Incongruent
## t = -6.5323, df = 42.434, p-value = 6.51e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.424698  -5.504885
## sample estimates:
## mean of x mean of y 
##  14.05113  22.01592
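
To show where the Welch output comes from, the sketch below recomputes the statistic, degrees of freedom, p-value and confidence interval from the means and variances reported in the summary table above. The number of participants is not stated in the text: n = 24 per condition is an assumption, chosen because it is consistent with the reported degrees of freedom. All variable names are introduced here for illustration.

```r
# Summary statistics as reported in this report
mean.congruent   <- 14.05113
mean.incongruent <- 22.01592
var.congruent    <- 12.66903
var.incongruent  <- 23.01176

# ASSUMPTION: 24 participants per condition (not stated in the text)
n <- 24

# Squared standard error of each sample mean, and the combined standard error
se2.congruent   <- var.congruent / n
se2.incongruent <- var.incongruent / n
std.error       <- sqrt(se2.congruent + se2.incongruent)

# Welch t statistic: difference in means over its standard error
mean.diff <- mean.congruent - mean.incongruent
t.stat    <- mean.diff / std.error

# Welch-Satterthwaite approximation to the degrees of freedom
welch.df <- (se2.congruent + se2.incongruent)^2 /
  (se2.congruent^2 / (n - 1) + se2.incongruent^2 / (n - 1))

# Two-sided p-value and 95% confidence interval for the difference
p.value <- 2 * pt(-abs(t.stat), df = welch.df)
ci      <- mean.diff + c(-1, 1) * qt(0.975, df = welch.df) * std.error

print(c(t = t.stat, df = welch.df, p = p.value))
print(ci)
```

Under that assumption, the results should land close to the t, df and confidence interval printed by t.test above, up to rounding of the summary statistics.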

The Science of Decisions - Stroop Effect

Suhas Yelluru, Fremont-CA

4/27/2017
