Test a Perceptual Phenomenon

Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.

1. What is our independent variable? What is our dependent variable?

The independent variable is the congruency condition - whether the name of the color matches with the ink color.

The dependent variable is the time it takes to name the ink colors in equally-sized lists.

2. What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.

One hypothesis we can use is: there is a difference between the time used to recognize colors under congruent words condition and incongruent words condition, namely, the Stroop Effect is in existence.

Specifically, here we referring to the population means of congruence words group and incongruence words group - average times for the respective groups to recognize the colors. By comparing these means directly, we’ll be able to tell whether there is a difference between the the two groups’ color recognition times. However, it wouldn’t be possible to do the experiment with all potential subjects in the world, so we need to work with the sample we have on hand to make inference about the population means, i.e., to use the observation means, sd and other statistics to infer about the population means. In this case, the observation is the difference between the two groups’ times. With this new data, we can construct new statistics such as means and standard errors.

To achieve this, We can use a two-sided paired student T-test to verify. This is because: one, we need to address the uncertainty in sample standard error resulted from the unknown population standard deviation; two, we are comparing the means of two groups that are dependent; three, the same subject is involved under both conditions.

Below is the hypothesis to test:

H0: mu_diff = 0 (The real difference between group population means is zero)

HA: mu_diff != 0 (The real difference between group population means is not zero)

3. Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.

# Read in the data
dat <- read.csv("C:/Users/George/Dropbox/WorkingDir/Udacity/P1/stroopdata.csv")
# Tidy up the data for later analysis
library(tidyr); suppressMessages(library(dplyr)) 
# Add a column identifying subjects
dat.subject <- mutate(dat, subject = 1:nrow(dat))
# Tidy up data by keeping one variable in one column
tidy.dat <- gather(dat.subject, congruency, time, -subject)
# Calculate the average time for both groups
tidy.dat %>%
    group_by(congruency) %>%
    summarise(mean(time), median(time), sd(time), var(time))

## Source: local data frame [2 x 5]
## 
##    congruency mean(time) median(time) sd(time) var(time)
##        (fctr)      (dbl)        (dbl)    (dbl)     (dbl)
## 1   Congruent   14.05113      14.3565 3.559358  12.66903
## 2 Incongruent   22.01592      21.0175 4.797057  23.01176

4. Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.

library(ggplot2)
b <- ggplot(tidy.dat, aes(y = time, x = congruency, fill = congruency))
b + geom_boxplot()

h <- ggplot(tidy.dat, aes(x = time, fill = congruency))
h + geom_histogram()

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

The boxplot indicates that the two groups have significant difference in median times, and the two groups also have different ranges - with the Incongruent words group presenting much longer times.

The histograms confirms the previous observation. It also shows that both groups have evident outliers.

5. Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?

Hypothesis: the two groups have significant difference in their mean times.

# H0: mu_diff = 0
# HA: mu_diff != 0

mu_diff <- 0 # the null value
dat.diff <- mutate(dat.subject, diff = Congruent - Incongruent) # add a new diff column
diff <- dat.diff$diff # grab all the diff values into a vector
sigma <- sd(diff) # sample sd
diff_bar <- mean(diff) # sample mean
n <- length(diff) # sample size
DF <- n - 1 # degree of freedom
SE <- sigma/sqrt(n) # standard error
# Calculate the T-statistic:
T <- (diff_bar- mu_diff)/SE; T

## [1] -8.020707

# Calculate the p-value
p_value <- pt(T, df = DF, lower.tail = TRUE) * 2; p_value

## [1] 4.103001e-08

# Build the confidence interval based on 5% confidence level
diff_bar + c(1, -1) * qt(.975, df = DF, lower.tail = FALSE) * SE

## [1] -10.019028  -5.910555

# Verify using the t.test() function
t.test(x=dat$Congruent, y=dat$Incongruent, alternative = "two.sided", mu = 0, paired = TRUE, conf.level = 0.95)

## 
##  Paired t-test
## 
## data:  dat$Congruent and dat$Incongruent
## t = -8.0207, df = 23, p-value = 4.103e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.019028  -5.910555
## sample estimates:
## mean of the differences 
##               -7.964792

Since the p-value is less than 0.05, we reject the null hypothesis and conclude that the difference between congruence and incongruence group time difference is statistically significant, namely, the stroop effect is present. This is in line with my expectation.

Based on the confidence intervals, we’re 95% confident that the true difference between the congruence and incongruence group average times is between -10.019028 and -5.910555.

6. Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!

I think this effect is caused by the distraction resulted from the presence of the words and particularly, the wrongly labled words. Since humans are so sensitive to words that there is already a “conditioned reflex” established. Therefore, whenever a a word is present, eyes and brain will automatically capture and deciper them, which leads to a delay and a interference with the color recognition.

As per Wikipedia, “Warped words”" is a similiar situation. A similiear experiment involving hard to read words can be used to test the effect.

Test a Perceptual Phenomenon - Stroop Effect

George Liu

October 16, 2015