- University - Western Governors University
- Course - C749 - Introduction to Data Science
- Publish Date - 09/05/2017
Overview
The Stroop task presents a list of color words, each printed in an ink (or on a background) whose color may differ from the printed word, and asks the participant to name the ink color of each word. The purpose of the test is to measure how long it takes to work through a given word set. The test generally consists of two word conditions: congruent, where the ink color matches the printed word, and incongruent, where it does not.
This document analyzes the Stroop effect using sample data that contains response times under the congruent and incongruent conditions.
The project contains a set of Questions for Investigation. For consistency, the questions are listed below, and each answer is introduced by a header of the form Q#: Question.
Questions for Investigation
- What is our independent variable? What is our dependent variable?
- What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.
- Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.
- Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.
- Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?
Q1: What is our independent variable? What is our dependent variable?
The given data set stroopdata.csv contains both an independent and a dependent variable. The independent variable is not stored as a column of its own, but from context we can determine that it is the word condition (congruent or incongruent) of the word list used to generate each sample.
The dependent variable is the response time, in seconds, recorded for each of the 24 participants.
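One way to make the two variables explicit is to reshape the data from one column per condition into one row per measurement. The sketch below assumes the two-column layout (Congruent, Incongruent) used later in this report; the values in `toy` are hypothetical placeholders, not rows from stroopdata.csv, and the names `toy` and `long` are introduced here only for illustration.

```r
# Hypothetical toy values, NOT taken from stroopdata.csv.
# The two columns only mirror the layout of the real file.
toy <- data.frame(
  Congruent   = c(12.0, 16.5, 9.5),
  Incongruent = c(19.0, 18.5, 21.0)
)

# Reshape wide -> long so that each row holds a single measurement.
long <- data.frame(
  participant = rep(seq_len(nrow(toy)), times = 2),
  # independent variable: the word condition
  condition   = factor(rep(c("Congruent", "Incongruent"), each = nrow(toy))),
  # dependent variable: the response time in seconds
  seconds     = c(toy$Congruent, toy$Incongruent)
)

print(long)
```

In the long layout, `condition` is the independent variable and `seconds` is the dependent variable, with every participant contributing one row per condition.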
Q3: Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.
To report descriptive statistics accurately, we must first define the variables that will be used for reporting.
Define Variables for Descriptive Analysis
# stringr provides str_interp() for string interpolation
library(stringr)
# load sample data and extract the congruent and incongruent columns
sampledata <- read.csv("stroopdata.csv")
congruent <- sampledata$Congruent
incongruent <- sampledata$Incongruent
# define measures of central tendency (mean, median) and variability (sd)
congruent_mean <- round(mean(congruent), 2)
congruent_median <- round(median(congruent), 2)
congruent_sd <- round(sd(congruent), 2)
incongruent_mean <- round(mean(incongruent), 2)
incongruent_median <- round(median(incongruent), 2)
incongruent_sd <- round(sd(incongruent), 2)
# print each statistic with its label
cat(str_interp("
Congruent Mean: ${congruent_mean}
Congruent Median: ${congruent_median}
Congruent Standard Deviation: ${congruent_sd}
Incongruent Mean: ${incongruent_mean}
Incongruent Median: ${incongruent_median}
Incongruent Standard Deviation: ${incongruent_sd}
"))
# quartile summary for each condition
summary(congruent)
summary(incongruent)
Congruent Mean: 14.05
Congruent Median: 14.36
Congruent Standard Deviation: 3.56
Incongruent Mean: 22.02
Incongruent Median: 21.02
Incongruent Standard Deviation: 4.8
Congruent:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   8.63   11.90   14.36   14.05   16.20   22.33
Incongruent:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  15.69   18.72   21.02   22.02   24.05   35.26
# show the data in a table for quick analysis
DT::datatable(data=sampledata)
Q4: Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.
# box plots comparing the distribution of congruent vs. incongruent response times
library(plotly)
# width is set in plot_ly() because specifying it in layout() is deprecated
p1 <- plot_ly(sampledata, x = ~Congruent, type = "box", name = "Congruent", width = 800) %>%
  add_trace(x = ~Incongruent, type = "box", name = "Incongruent") %>%
  layout(xaxis = list(title = "Time in Seconds"), autosize = FALSE, margin = list(l = 100, b = 50))
p1
Observing the Plot
- The box plots show two distinct outliers in the Incongruent data.
- The upper end of the Congruent data (maximum 22.33 s) is close to the median of the Incongruent data (21.02 s), whereas the Congruent third quartile (16.20 s) falls below the Incongruent first quartile (18.72 s).
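The outlier observation can be cross-checked with the common 1.5 × IQR rule, using only the quartiles and maxima reported in the summary output for Q3. This is a minimal sketch: the data frame `q` and its column names are introduced here for illustration, and the plotting library may compute quartiles slightly differently from R's summary(), so the fences are approximate.

```r
# Quartiles and maxima as reported in the Q3 summary output (seconds).
q <- data.frame(
  condition = c("Congruent", "Incongruent"),
  q1  = c(11.90, 18.72),
  q3  = c(16.20, 24.05),
  max = c(22.33, 35.26)
)

# Interquartile range and upper fence under the 1.5 * IQR rule.
q$iqr <- q$q3 - q$q1
q$upper_fence <- q$q3 + 1.5 * q$iqr

# A maximum above the upper fence means at least one high outlier.
q$max_is_outlier <- q$max > q$upper_fence

print(q)
```

Under this rule only the Incongruent maximum exceeds its upper fence, which agrees with the outliers appearing in the Incongruent box plot. The summary values alone cannot show how many points lie above the fence; that count comes from the plot itself.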
Conclusion
During the statistical analysis I started by evaluating the t-statistic, which was determined to be 8.026. If we take another look at the formula defined above, the t-statistic would need to be below the t-critical value of 1.714 for us to fail to reject the null hypothesis. Our result is \(t_{statistic} = 8.026 > t_{critical} = 1.714\). Given these findings, we reject the null hypothesis, which stated that there is no significant difference in response time between the congruent and incongruent conditions.
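The comparison above can be reproduced in a few lines. The sketch below assumes a one-tailed paired test at a significance level of 0.05 with 24 participants (23 degrees of freedom); these settings are an assumption inferred from the critical value of 1.714, and the t-statistic is taken as reported rather than recomputed from the data.

```r
# Assumed test settings: one-tailed, alpha = 0.05, paired design.
n <- 24          # number of participants
alpha <- 0.05    # assumed significance level
df <- n - 1      # degrees of freedom for a paired t-test

# Critical value of the t distribution for the assumed settings.
t_critical <- qt(1 - alpha, df = df)

# t-statistic as reported in the conclusion (not recomputed here).
t_statistic <- 8.026

# Decision rule: reject the null hypothesis when the statistic
# exceeds the critical value.
reject_null <- t_statistic > t_critical

cat(sprintf("t-critical: %.3f\n", t_critical))
cat(sprintf("reject null hypothesis: %s\n", reject_null))

# With the data loaded, the statistic itself could be obtained via:
# t.test(sampledata$Incongruent, sampledata$Congruent,
#        paired = TRUE, alternative = "greater")
```

The commented `t.test()` call shows how the statistic could be computed directly from stroopdata.csv under the same assumptions.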
