Dataset Overview: In a Stroop task, participants are presented with a list of words, each displayed in a particular ink color. The participant’s task is to say out loud the color of the ink in which each word is printed. The task has two conditions: a congruent words condition and an incongruent words condition. In the congruent words condition, the words are color words whose names match the ink colors in which they are printed. In the incongruent words condition, the words are color words whose names do not match the ink colors in which they are printed. In each condition, the time taken to name the ink colors in equally sized lists was measured. Each row of the dataset contains one participant's times for the two word types.

Variables:
ID: participant id
Congruent: total time for a participant in the congruent words condition
Incongruent: total time for a participant in the incongruent words condition

Scope of Analysis: Our initial assumption is that, on average, a participant takes longer to name the ink colors in the incongruent words condition than in the congruent words condition. We will also examine what relationships exist among the variables of interest in our analysis.

Checking the first 6 rows of dataframe

##   ID Congruent Incongruent
## 1  1    12.079      19.278
## 2  2    16.791      18.741
## 3  3     9.564      21.214
## 4  4     8.630      15.687
## 5  5    14.669      22.803
## 6  6    12.238      20.878

Summary statistics on the dataframe

##        ID          Congruent      Incongruent   
##  Min.   : 1.00   Min.   : 8.63   Min.   :15.69  
##  1st Qu.: 6.75   1st Qu.:11.90   1st Qu.:18.72  
##  Median :12.50   Median :14.36   Median :21.02  
##  Mean   :12.50   Mean   :14.05   Mean   :22.02  
##  3rd Qu.:18.25   3rd Qu.:16.20   3rd Qu.:24.05  
##  Max.   :24.00   Max.   :22.33   Max.   :35.26

First 6 rows of reshaped data

##   ID  WordType TotalTime
## 1  1 Congruent    12.079
## 2  2 Congruent    16.791
## 3  3 Congruent     9.564
## 4  4 Congruent     8.630
## 5  5 Congruent    14.669
## 6  6 Congruent    12.238
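The wide-to-long reshaping shown above can be sketched in base R as follows. This is only an illustration, not necessarily the code used to produce the report: the data frame holds just the six rows printed in the first table, and the names `stroop` and `stroop_long` are assumptions introduced here.

```r
# Six rows copied from the head() output above; the full data has 24 participants.
stroop <- data.frame(
  ID          = 1:6,
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Stack the two timing columns into one and tag each value with its word type.
stroop_long <- data.frame(
  ID        = rep(stroop$ID, times = 2),
  WordType  = factor(rep(c("Congruent", "Incongruent"), each = nrow(stroop))),
  TotalTime = c(stroop$Congruent, stroop$Incongruent)
)

head(stroop_long)
```

Each participant now contributes two rows, one per word type, which is the layout the grouped summaries and plots rely on.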

The dependent variable in our analysis is ‘TotalTime’ and the independent variable is ‘WordType’, because the time it takes to name the ink colors depends on the word type.

Summary statistics on reshaped data

##        ID               WordType    TotalTime    
##  Min.   : 1.00   Congruent  :24   Min.   : 8.63  
##  1st Qu.: 6.75   Incongruent:24   1st Qu.:14.42  
##  Median :12.50                    Median :17.73  
##  Mean   :12.50                    Mean   :18.03  
##  3rd Qu.:18.25                    3rd Qu.:21.17  
##  Max.   :24.00                    Max.   :35.26

Univariate Analysis

Observation: The above visualization combines 2 univariate plots: a histogram and a density plot. The y axis uses the density scale. As we can see from the visualization, the Congruent variable is neither uniformly nor normally distributed. In fact, the distribution appears right-skewed. Hence, we will apply a transformation to try to bring it closer to a normal distribution.
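To make the "density scale" concrete: on that scale the histogram bars are rescaled so that their total area is 1, which is what allows a density curve to be overlaid on the same axis. The sketch below uses only the six Congruent values printed earlier, and the bin breaks are an arbitrary choice made for this illustration.

```r
# Six Congruent times copied from the head() output; the full data has 24 values.
congruent <- c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238)

# Compute the histogram without drawing it; breaks are chosen for illustration.
h <- hist(congruent, breaks = seq(8, 18, by = 2), plot = FALSE)

# On the density scale, bar height * bar width sums to 1 across all bins.
bar_area <- sum(h$density * diff(h$breaks))
bar_area

# Kernel density estimate that can be drawn over the histogram.
dens <- density(congruent)

# Draw only in an interactive session, so the script also runs in batch mode.
if (interactive()) {
  plot(h, freq = FALSE, main = "Congruent (first six rows)", xlab = "Total time")
  lines(dens, col = "blue")
}
```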

Summary Statistics before log transformation

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.63   11.90   14.36   14.05   16.20   22.33

Observation: From the above visualization we can infer that the distribution is still not normal after the log transformation, so an alternative transformation would be needed to bring it closer to normality.

Summary statistics after log transformation

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.155   2.476   2.664   2.612   2.785   3.106

Observation: From the above summary statistics table we can infer that the log transformation compresses the scale of the Congruent times from 8.63-22.33 to 2.155-3.106.
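The change of scale can be checked directly from the reported minimum and maximum, assuming the transformation is the natural logarithm (which is what the reported values are consistent with):

```r
# Minimum and maximum of Congruent as printed in the summary before transformation.
congruent_range <- c(min = 8.63, max = 22.33)

# Natural log of both endpoints.
log_range <- log(congruent_range)
round(log_range, 3)

# exp() undoes the transformation and recovers the original endpoints.
exp(log_range)
```

Because the logarithm is monotonic, the ordering of participants is unchanged; only the spacing between large values is compressed.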

Observation: The above visualization combines 2 univariate plots: a histogram and a density plot. The y axis uses the density scale. As we can see from the visualization, the Incongruent variable is neither uniformly nor normally distributed. In fact, the distribution appears right-skewed. Hence, we will apply a transformation to try to bring it closer to a normal distribution.

Summary Statistics before log transformation

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.69   18.72   21.02   22.02   24.05   35.26

Observation: From the above visualization we can infer that the distribution is still not normal after the log transformation, so an alternative transformation would be needed to bring it closer to normality.

Summary Statistics after log transformation

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.753   2.929   3.045   3.072   3.180   3.563

Observation: From the above summary statistics table we can infer that the log transformation compresses the scale of the Incongruent times from 15.69-35.26 to 2.753-3.563.

Bivariate Analysis

Observation: From the above visualization we can infer that there is a positive linear relationship between the Congruent and Incongruent variables in our dataframe. A fitted regression line is overlaid to show the linear trend.

Correlation coefficient between the Congruent and Incongruent variables

## [1] 0.3518195

Observation: The correlation coefficient is 0.35, which indicates a weak positive (uphill) linear relationship between the two times.
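For reference, a Pearson correlation of this kind can be computed with `cor()` and reproduced by hand from its definition. The sketch below uses only the six rows printed in the first table, so its result will differ from the 0.3518 reported for all 24 participants; the variable names are assumptions made for the example.

```r
# Six rows copied from the head() output above; the full data has 24 participants.
stroop <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Built-in Pearson correlation (the default method of cor()).
r <- cor(stroop$Congruent, stroop$Incongruent)

# The same quantity from its definition: covariation over the product of spreads.
x <- stroop$Congruent
y <- stroop$Incongruent
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

c(cor = r, manual = r_manual)
```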

Observation: The above visualization overlays the distributions of the 2 word types, congruent and incongruent (nominal data). The y axis uses the density scale. We are comparing the distributions by word type in our dataframe. In the visualization the Incongruent distribution appears less right-skewed than the Congruent one, although the summary statistics (Incongruent mean above its median, Congruent mean slightly below its median) suggest that the Incongruent times have the longer right tail.

## $stats
##         [,1]    [,2]
## [1,]  8.6300 15.6870
## [2,] 11.7115 18.6925
## [3,] 14.3565 21.0175
## [4,] 16.3975 24.2090
## [5,] 22.3280 26.2820
## 
## $n
## [1] 24 24
## 
## $conf
##          [,1]     [,2]
## [1,] 12.84519 19.23834
## [2,] 15.86781 22.79666
## 
## $out
## [1] 35.255 34.288
## 
## $group
## [1] 2 2
## 
## $names
## [1] "Congruent"   "Incongruent"

Observation: The above visualization helps us check for outliers in each word type. The Incongruent times have 2 outliers (35.255 and 34.288), both with a total time above 30, whereas the Congruent times have no outliers.
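The two flagged points follow from the usual boxplot rule, under which values more than 1.5 times the box length beyond a hinge are drawn as outliers. A quick check using the Incongruent hinges from the `$stats` output above:

```r
# Lower and upper hinges for Incongruent, copied from column 2 of $stats.
lower_hinge <- 18.6925
upper_hinge <- 24.2090

# Box length (hinge spread) and the upper fence at 1.5 box lengths.
box_length  <- upper_hinge - lower_hinge
upper_fence <- upper_hinge + 1.5 * box_length
upper_fence

# Values reported under $out, and the upper whisker reported in $stats.
flagged       <- c(35.255, 34.288)
upper_whisker <- 26.2820

flagged > upper_fence         # both lie beyond the fence
upper_whisker <= upper_fence  # the whisker stops at the largest value inside it
```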

Measure of Variability:

Range: Congruent: 13.698 Incongruent: 19.568 (10.595 when the two outliers are excluded, which is the whisker-to-whisker span of the boxplot)

IQR (Inter Quartile Range), computed from the boxplot hinges: Congruent: Q3-Q1 = 4.686 Incongruent: Q3-Q1 = 5.5165
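These measures can be reproduced from numbers already printed in the report. The sketch below is plain arithmetic on the `$stats` and `$out` values. It also shows that the full Incongruent range, including the two outliers, is wider than the whisker-to-whisker span, and that the hinge-based IQR differs slightly from the one implied by the `summary()` quartiles.

```r
# Values copied from the boxplot output ($stats columns and $out).
congruent   <- c(min = 8.630,  q1 = 11.7115, q3 = 16.3975, max = 22.328)
incongruent <- c(min = 15.687, q1 = 18.6925, q3 = 24.2090,
                 upper_whisker = 26.282, max = 35.255)

# Range = max - min. For Incongruent the maximum is one of the two outliers.
range_congruent   <- unname(congruent["max"] - congruent["min"])
range_incongruent <- unname(incongruent["max"] - incongruent["min"])
whisker_span      <- unname(incongruent["upper_whisker"] - incongruent["min"])

# IQR from the boxplot hinges.
iqr_congruent   <- unname(congruent["q3"] - congruent["q1"])
iqr_incongruent <- unname(incongruent["q3"] - incongruent["q1"])

# IQR implied by the rounded summary() quartiles, for comparison.
iqr_congruent_summary   <- 16.20 - 11.90
iqr_incongruent_summary <- 24.05 - 18.72

c(range_congruent, range_incongruent, whisker_span)
c(iqr_congruent, iqr_incongruent)
c(iqr_congruent_summary, iqr_incongruent_summary)
```

Hinges and `summary()` quartiles use slightly different interpolation rules, which is why the two IQR figures do not coincide.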

Measure of Central Tendency:

Mean: Congruent : 14.05 Incongruent: 22.02

Median: Congruent : 14.3565 Incongruent: 21.0175

Observation: We have used a regression line to fit a linear model for each word type. From the above visualization we can infer that, across the plotted points, incongruent words take more time than congruent words. This aligns with our assumption.

Hypothesis Testing

For the hypothesis testing we will use the stroop dataframe. The sample size is 24 for each of the Congruent and Incongruent word types, which is less than 30, so we will use a t-test. Because each participant was measured under both conditions, every observation in one sample can be paired with an observation in the other, so a paired t-test is the appropriate way to compare the two population means.

The hypotheses are as follows, where u1 = population mean time to classify the congruent word type and u2 = population mean time to classify the incongruent word type:

Null Hypothesis (H0): u1 = u2 (the population mean times for the two word types are the same)

Alternate Hypothesis (H1): u1 < u2 (the population mean time for the incongruent word type is greater than for the congruent word type)

T-Test Results

## 
##  Paired t-test
## 
## data:  Stroop$Congruent and Stroop$Incongruent
## t = -8.0207, df = 23, p-value = 4.103e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.019028  -5.910555
## sample estimates:
## mean of the differences 
##               -7.964792

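Observation: We performed a paired t-test at the 95% confidence level. The p-value (4.103e-08) is far below 0.05, so we reject the null hypothesis. Note that the test reported above is two-sided; because the t statistic is negative, which is the direction stated in H1 (u1 < u2), the corresponding one-sided p-value is half the reported value and leads to the same decision. The data therefore support the alternate hypothesis that the mean time to classify the incongruent word type is greater than for the congruent word type, which confirms our initial assumption.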
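The reported output is internally consistent, which can be checked from three of its numbers alone: the t statistic, the degrees of freedom, and the mean difference. The sketch below recovers the standard error, rebuilds the 95% confidence interval, and derives the one-sided p-value for H1: u1 < u2. The variable names are chosen for this example.

```r
# Values copied from the paired t-test output above.
t_stat    <- -8.0207
df        <- 23
mean_diff <- -7.964792

# t = mean_diff / se, so the standard error of the mean difference is:
se <- mean_diff / t_stat

# 95% confidence interval: mean_diff +/- t quantile * se.
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se
ci

# Two-sided p-value (as reported) and one-sided p-value for "less".
p_two_sided <- 2 * pt(-abs(t_stat), df)
p_one_sided <- pt(t_stat, df)
c(two_sided = p_two_sided, one_sided = p_one_sided)
```

Small differences in the last digits are expected because the inputs are the rounded values printed in the report.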

Observation: A simple plot of the difference (Congruent - Incongruent) for each participant. Points below the blue line are observations where the Incongruent time is greater than the Congruent time, that is, where the difference is negative. As we can observe from the above visualization, in every case the time taken for the Incongruent word type is greater than for the Congruent word type.

Observation: The above visualization is a histogram of the paired differences (Congruent - Incongruent) used in the paired t-test. Bins with negative values correspond to observations where the Incongruent time is higher than the Congruent time.
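A paired t-test is a one-sample t-test on the per-participant differences, which is why the difference plots above are the natural diagnostic. The sketch below shows this on the six rows printed in the first table only, so the statistic will not match the full-data result. It uses `alternative = "less"` to match H1 (u1 < u2); the report's own output used the two-sided default.

```r
# Six rows copied from the head() output above; the full data has 24 participants.
stroop <- data.frame(
  Congruent   = c(12.079, 16.791, 9.564, 8.630, 14.669, 12.238),
  Incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878)
)

# Per-participant differences; negative means Incongruent took longer.
d <- stroop$Congruent - stroop$Incongruent
d

# t statistic by hand: mean difference over its standard error.
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# The same test through t.test(), one-sided in the direction of H1.
fit <- t.test(stroop$Congruent, stroop$Incongruent,
              paired = TRUE, alternative = "less")

c(manual = t_manual, t.test = unname(fit$statistic))
```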

Conclusion

  1. The dependent variable in our analysis is ‘TotalTime’ and the independent variable is ‘WordType’, because the time it takes to name the ink colors depends on the word type.

  2. The appropriate set of hypotheses for this task is as follows: Null Hypothesis (H0): u1 = u2 (the population mean times for the two word types are the same); Alternate Hypothesis (H1): u1 < u2 (the population mean time for the incongruent word type is greater than for the congruent word type). We performed a paired t-test; the p-value of 4.103e-08 is less than 0.05, so we rejected the null hypothesis in favor of the alternate hypothesis.

  3. Summary statistics are as follows: Measure of Central Tendency: Mean (Congruent: 14.05, Incongruent: 22.02), Median (Congruent: 14.3565, Incongruent: 21.0175). Measure of Variability: Range (Congruent: 13.698, Incongruent: 19.568, or 10.595 when the two outliers are excluded), Inter Quartile Range (Congruent: 4.686, Incongruent: 5.5165).

  4. The paired t-test supports the conclusion that the mean time to classify the incongruent word type is greater than for the congruent word type, which confirms our initial assumption.