Rensselaer Polytechnic Institute

Version 1

1. Setting

System under test:

In this recipe we will consider data from an online personality test. This personality test asks 50 questions that the participant must answer with a choice of five levels of varying agreement (1 = disagree, 2 = slightly disagree, 3 = neutral, 4 = slightly agree, 5 = agree). Each question corresponds to one of five personality traits that are being tested. Scores for each trait are then calculated using the answers provided. For the purposes of this example we are only concerned with one personality trait, extroversion, and the possible effect different attributes of the participants might have on the extroversion score they achieve.

Factors and Levels:

For my analysis I chose the following four factors:

  • Age: 6 levels
    +(13-19),(20-29),(30-39),(40-49),(50-59),(>=60)
  • Gender: 3 levels
    +Male = 1, Female = 2, Other = 3
  • Dominant Hand: 3 levels
    +right = 1, left = 2, both = 3
  • Native English Speaker: 2 levels
    +yes = 1, no = 2

Continuous Variables:

There are no continuous variables in this recipe.

Response Variable:

The response variable in this recipe is extroversion score. This variable was not included in the original data set but was calculated using the standard formula for the total extroversion score based on the responses to the 10 extroversion questions (E1-E10) in the personality test. When every question is answered on the 1 to 5 scale, the values for extroversion score (E_Total) range from 0 to 40. They were calculated using the formula below.
\[E_{Total} = 20 + E1 - E2 + E3 - E4 + E5 - E6 + E7 - E8 + E9 - E10\]
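As a quick check of the formula, here is a minimal sketch that applies it to the first participant shown in the `head(extro)` output. The helper name `e_total` and the vector `item_signs` are my own illustrative names and are not part of the original analysis.

```r
# Signs of the ten extroversion items: odd-numbered items add, even-numbered items subtract.
item_signs <- rep(c(1, -1), times = 5)

# Hypothetical helper: total extroversion score for one set of E1..E10 answers.
e_total <- function(answers) {
  stopifnot(length(answers) == 10)
  20 + sum(item_signs * answers)
}

# First participant in head(extro): E1..E10 = 4 2 5 2 5 1 4 3 5 1
first_row <- c(4, 2, 5, 2, 5, 1, 4, 3, 5, 1)
e_total(first_row)  # 34, matching the E_Total column

# Extremes when every answer lies between 1 and 5
e_total(rep(c(5, 1), times = 5))  # 40
e_total(rep(c(1, 5), times = 5))  # 0
```

The two extreme cases show where the 0 to 40 range comes from.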

The Data:

The original data set includes 19,719 observations of 57 variables. The first 7 variables collect information about the test participant and the rest are the participant’s responses to the 50 questions regarding the five personality traits. Since we are only looking at variables relating to extroversion and will only consider four independent factors, the original data set was modified to include only the relevant variables.

head(extro)
##   age engnat gender hand E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E_Total
## 1  53      1      1    1  4  2  5  2  5  1  4  3  5   1      34
## 2  46      1      2    1  2  2  3  3  3  3  1  5  1   5      12
## 3  14      2      2    1  5  1  1  4  5  1  1  5  5   1      25
## 4  19      2      2    1  2  5  2  4  3  4  3  4  4   5      12
## 5  25      2      2    1  3  1  3  3  3  1  3  1  3   5      24
## 6  31      1      2    1  1  5  2  4  1  3  2  4  1   5       6

A closer look at the data frame revealed that there were many impossible values for the age factor as well as some impossible values for gender, dominant hand, and native English speaker. As you can see below, the mean value of age was 50767 and the maximum value was 999999999. You can also see that the minimum value for the other three factors is zero, which does not correspond to any of the prescribed levels.

summary(extro)
##       age                engnat          gender           hand     
##  Min.   :       13   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:       18   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.00  
##  Median :       22   Median :1.000   Median :2.000   Median :1.00  
##  Mean   :    50767   Mean   :1.365   Mean   :1.617   Mean   :1.13  
##  3rd Qu.:       31   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:1.00  
##  Max.   :999999999   Max.   :2.000   Max.   :3.000   Max.   :3.00  
##        E1              E2             E3              E4       
##  Min.   :0.000   Min.   :0.00   Min.   :0.000   Min.   :0.000  
##  1st Qu.:2.000   1st Qu.:2.00   1st Qu.:3.000   1st Qu.:2.000  
##  Median :3.000   Median :3.00   Median :4.000   Median :3.000  
##  Mean   :2.629   Mean   :2.76   Mean   :3.417   Mean   :3.152  
##  3rd Qu.:4.000   3rd Qu.:4.00   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.00   Max.   :5.000   Max.   :5.000  
##        E5              E6              E7              E8       
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :4.000   Median :2.000   Median :3.000   Median :3.000  
##  Mean   :3.432   Mean   :2.453   Mean   :2.867   Mean   :3.376  
##  3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##        E9             E10           E_Total     
##  Min.   :0.000   Min.   :0.000   Min.   : 0.00  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:13.00  
##  Median :3.000   Median :4.000   Median :20.00  
##  Mean   :3.094   Mean   :3.585   Mean   :20.11  
##  3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:27.00  
##  Max.   :5.000   Max.   :5.000   Max.   :40.00

To fix this, all rows with an age value greater than 122 were deleted from the data set. I chose 122 as the cutoff because that is the age of the oldest person ever recorded. I also removed all rows that had a value of zero for any of the other factors.

clean_extro <- subset(extro, age <= 122)
clean_extro <- subset(clean_extro, gender > 0)
clean_extro <- subset(clean_extro, hand >0)
clean_extro <- subset(clean_extro, engnat >0)
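The same filter can also be written as a single logical condition, which makes it easy to count how many rows are dropped. The sketch below uses a small made-up data frame (`toy`) rather than the real data, so the values are purely illustrative.

```r
# Toy rows (not from the data set): one valid row and three with impossible values.
toy <- data.frame(
  age    = c(25, 999999999, 40, 17),
  gender = c(1, 2, 0, 3),
  hand   = c(1, 1, 2, 0),
  engnat = c(1, 2, 1, 1)
)

# TRUE only where every factor holds a possible value.
valid <- with(toy, age <= 122 & gender > 0 & hand > 0 & engnat > 0)

clean_toy <- toy[valid, ]
c(kept = nrow(clean_toy), removed = sum(!valid))  # kept 1, removed 3
```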

Finally, I grouped the many levels of age into the six levels I described above. I determined the levels based on 10 year intervals from 20 to 60. Participants younger than 20 were grouped together and participants 60 and older were grouped together.
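One way to do this kind of grouping is with `cut()`. The following is a minimal sketch on a made-up vector of ages, assuming the same six labels that appear in the `str()` output; it is an illustration rather than the exact code used to produce the results.

```r
# Toy ages (not from the data set) covering each boundary of the six groups.
ages <- c(13, 19, 20, 29, 30, 45, 59, 60, 118)

# right = FALSE makes each interval closed on the left: [20, 30) holds ages 20-29.
age_group <- cut(
  ages,
  breaks = c(-Inf, 20, 30, 40, 50, 60, Inf),
  labels = c("<20", "20-29", "30-39", "40-49", "50-59", "60<="),
  right  = FALSE
)

table(age_group)
```

Because `cut()` returns a factor with the levels in the order given, no separate conversion step is needed.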

The resulting data had the below structure.

str(clean_extro)
## 'data.frame':    19452 obs. of  15 variables:
##  $ age    : Factor w/ 6 levels "<20","20-29",..: 5 4 1 1 2 3 2 2 3 1 ...
##  $ engnat : int  1 1 2 2 2 1 1 2 1 1 ...
##  $ gender : int  1 2 2 2 2 2 2 1 2 2 ...
##  $ hand   : int  1 1 1 1 1 1 1 1 3 1 ...
##  $ E1     : int  4 2 5 2 3 1 5 4 3 1 ...
##  $ E2     : int  2 2 1 5 1 5 1 3 1 4 ...
##  $ E3     : int  5 3 1 2 3 2 5 5 5 2 ...
##  $ E4     : int  2 3 4 4 3 4 1 3 1 5 ...
##  $ E5     : int  5 3 5 3 3 1 5 5 5 2 ...
##  $ E6     : int  1 3 1 4 1 3 1 1 1 4 ...
##  $ E7     : int  4 1 1 3 3 2 5 4 5 1 ...
##  $ E8     : int  3 5 5 4 1 4 4 3 2 4 ...
##  $ E9     : int  5 1 5 4 3 1 4 4 5 1 ...
##  $ E10    : int  1 5 1 5 5 5 1 3 3 5 ...
##  $ E_Total: num  34 12 25 12 24 6 36 29 35 5 ...

2. Experimental Design

This experiment will be conducted as a 6 X 3 X 3 X 2 Factorial Design. 1-Way ANOVA tests will be conducted for each of the four factors.

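2-Way ANOVA tests will be conducted for all of the 2 factor interactions. In each case the null hypothesis is that the factor or interaction being tested has no effect on the mean extroversion score, so that any observed differences between levels are due to natural variation alone.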

While I am treating this experiment as a factorial design, there are numerous assumption violations and design flaws that I will now discuss.

Randomize:

Participants in this personality test were self selected. In a true factorial design participants would be randomly selected from a normal population. For each combination of factors there would be the same number of participants. Then there would be a randomly determined run order for all of the combinations.

Because we have no control over the participants and the order in which they took the test, we do not satisfy the randomization requirements of factorial design.
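For comparison, here is a rough sketch of what a randomized run order for a full 6 X 3 X 3 X 2 design could look like. The level labels follow the Factors and Levels list; the seed and the object names `design` and `run_order` are arbitrary choices for illustration.

```r
# Every combination of the four factors' levels.
design <- expand.grid(
  age    = c("<20", "20-29", "30-39", "40-49", "50-59", "60<="),
  gender = c("Male", "Female", "Other"),
  hand   = c("Right", "Left", "Both"),
  engnat = c("Yes", "No")
)
nrow(design)  # 6 * 3 * 3 * 2 = 108 treatment combinations

# Shuffle the rows to obtain a random run order (seed chosen arbitrarily for reproducibility).
set.seed(1)
run_order <- design[sample(nrow(design)), ]
head(run_order)
```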

Replicate:

There is replication as there were multiple observations for each combination of factors. However, because there is no control over who participated in the test, there are not the same number of replications for each combination of factors.
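Whether a design is balanced can be checked by counting the observations in each cell. The sketch below does this for two factors on a small made-up data frame; with the real data, `clean_extro` would take the place of `toy`.

```r
# Toy data frame standing in for clean_extro (values are illustrative only).
toy <- data.frame(
  gender = c(1, 1, 1, 2, 2, 2, 2, 3),
  engnat = c(1, 2, 1, 1, 1, 2, 2, 1)
)

# Number of observations in each gender x engnat cell.
cell_counts <- table(gender = toy$gender, engnat = toy$engnat)
cell_counts

# A balanced design has the same count in every cell.
is_balanced <- length(unique(as.vector(cell_counts))) == 1
is_balanced  # FALSE for this toy data
```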

There are not repeated measures since every participant only took the test once. Although it is possible that the same participant took the test multiple times, for the purposes of this experiment we will assume that is not the case.

Block:

I did not use blocking in my design. In a true factorial design I would have identified nuisance factors and implemented controls to limit the potential effects on the response variable.

3. Statistical Analysis

Statistical analysis was done in RStudio and included graphical analysis and ANOVA testing.

Exploratory Data Analysis:

In this section we will look at descriptive summary statistics and a series of graphical representations of the data.

Descriptive Statistics Summary

summary(clean_extro)
##     age           engnat          gender           hand      
##  <20  :6705   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  20-29:7471   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  30-39:2533   Median :1.000   Median :2.000   Median :1.000  
##  40-49:1524   Mean   :1.369   Mean   :1.619   Mean   :1.135  
##  50-59: 877   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:1.000  
##  60<= : 342   Max.   :2.000   Max.   :3.000   Max.   :3.000  
##        E1              E2             E3              E4       
##  Min.   :0.000   Min.   :0.00   Min.   :0.000   Min.   :0.000  
##  1st Qu.:2.000   1st Qu.:2.00   1st Qu.:3.000   1st Qu.:2.000  
##  Median :3.000   Median :3.00   Median :4.000   Median :3.000  
##  Mean   :2.629   Mean   :2.76   Mean   :3.418   Mean   :3.153  
##  3rd Qu.:4.000   3rd Qu.:4.00   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.00   Max.   :5.000   Max.   :5.000  
##        E5              E6              E7              E8       
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :4.000   Median :2.000   Median :3.000   Median :3.000  
##  Mean   :3.433   Mean   :2.452   Mean   :2.868   Mean   :3.377  
##  3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##        E9             E10           E_Total     
##  Min.   :0.000   Min.   :0.000   Min.   : 0.00  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:13.00  
##  Median :3.000   Median :4.000   Median :20.00  
##  Mean   :3.094   Mean   :3.587   Mean   :20.11  
##  3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:27.00  
##  Max.   :5.000   Max.   :5.000   Max.   :40.00

Graphical Analysis

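First we will look at a simple histogram of our response variable, Extroversion Score.

As you can see, Extroversion Score appears approximately normally distributed over its range of 0 to 40. The summary statistics agree: the mean (20.11) and median (20) nearly coincide, and the quartiles (13 and 27) are symmetric about them.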
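A histogram alone is a fairly loose check of normality. One common complement is a normal Q-Q comparison, sketched below. The `scores` vector is simulated placeholder data, not the real `E_Total` values, and the correlation summary is just one informal way to read a Q-Q plot.

```r
# Simulated placeholder scores (NOT the real E_Total values): rounded and clipped to 0-40.
set.seed(42)
scores <- pmin(pmax(round(rnorm(1000, mean = 20, sd = 9)), 0), 40)

# qqnorm() pairs each sample value with the matching standard normal quantile.
qq <- qqnorm(scores, plot.it = FALSE)

# A correlation close to 1 means the points fall near a straight line.
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

With the real data, `clean_extro$E_Total` would replace `scores`, and calling `qqnorm()` followed by `qqline()` would draw the plot itself.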

Next we will analyze the main effects of the four factors using a boxplot for each factor.

From the Age boxplot we observe some difference in the median response values for each level, but overall the responses seem fairly consistent.

From the Gender boxplot there seems to be a noticeable difference between participants who selected Other for Gender and those who selected either Male or Female.

From the Dominant Hand boxplot there seems to be almost no difference in response among the three levels.

From the Native English Speaker boxplot we observe a slight difference in response between Native and Non-Native English speakers.

Testing

ANOVA test for one factor

For each of these tests we will assume an alpha value equal to 0.05.

ANOVA test for Age factor:

##                    Df  Sum Sq Mean Sq F value Pr(>F)    
## clean_extro$age     5   18649    3730    44.3 <2e-16 ***
## Residuals       19446 1636998      84                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This resulted in a p-value of <2e-16, which allows us to reject our null hypothesis and conclude that extroversion score differs between the levels of age by more than natural variation would explain.

ANOVA test for Gender factor:

##                       Df  Sum Sq Mean Sq F value  Pr(>F)    
## clean_extro$gender     1    2778    2778   32.69 1.1e-08 ***
## Residuals          19450 1652869      85                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

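This resulted in a p-value of 1.1e-8, which allows us to reject our null hypothesis and conclude that extroversion score varies with gender by more than natural variation would explain. Note that the table reports 1 degree of freedom for gender even though the factor has three levels. This indicates that the integer-coded column was treated as a numeric predictor, so the test assesses a linear trend across the codes 1, 2, 3 rather than differences among three group means.

ANOVA test for Dominant Hand factor: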

##                     Df  Sum Sq Mean Sq F value Pr(>F)  
## clean_extro$hand     1     250  250.12   2.939 0.0865 .
## Residuals        19450 1655396   85.11                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

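This resulted in a p-value of 0.0865, which does not allow us to reject our null hypothesis that there is no explained variance between levels of Hand Dominance. The 1 degree of freedom in the table shows that the integer-coded hand column was treated as a numeric predictor rather than a three-level factor.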

ANOVA test for Native English Speaker factor:

##                       Df  Sum Sq Mean Sq F value   Pr(>F)    
## clean_extro$engnat     1    2212    2212   26.02 3.41e-07 ***
## Residuals          19450 1653435      85                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This resulted in a p-value of 3.41e-7, which allows us to reject our null hypothesis and conclude that extroversion score differs between Native and Non-Native English Speakers by more than natural variation would explain.
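The Gender and Dominant Hand tables above each report 1 degree of freedom although both factors have three levels, because `gender` and `hand` are stored as integers. The sketch below uses a small made-up data frame to show how the degrees of freedom change once the column is converted with `factor()`. It is an illustration of the difference only; I have not rerun the analysis above this way.

```r
# Toy data (illustrative only): a three-level group coded 1, 2, 3 as in the data set.
toy <- data.frame(
  score  = c(10, 12, 11, 20, 22, 21, 15, 14, 16),
  gender = rep(c(1L, 2L, 3L), each = 3)
)

# Integer predictor: aov() fits a single slope, so the term has 1 degree of freedom.
fit_numeric <- aov(score ~ gender, data = toy)
df_numeric  <- summary(fit_numeric)[[1]][["Df"]][1]

# Factor predictor: one mean per level, so the term has (3 - 1) = 2 degrees of freedom.
toy$gender_f <- factor(toy$gender, levels = 1:3, labels = c("Male", "Female", "Other"))
fit_factor   <- aov(score ~ gender_f, data = toy)
df_factor    <- summary(fit_factor)[[1]][["Df"]][1]

c(numeric = df_numeric, factor = df_factor)  # 1 and 2
```

For the two-level Native English Speaker factor the distinction does not change the degrees of freedom, since 2 - 1 = 1 either way.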

ANOVA test for two factors and interaction plots.

Next we will look at interaction effects using ANOVA tests for two factors at a time. Since there are four factors, we will conduct six two-factor ANOVA tests. We will also look at interaction plots for each pair of factors. Again we will assume an alpha value of 0.05 in each case.
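As a small self-contained illustration of a two-factor test with interaction, here is a sketch on a made-up 2 x 2 layout. The factor names `a` and `b` and all of the values are hypothetical and unrelated to the personality data.

```r
# Toy 2 x 2 layout with three replicates per cell (values are illustrative only).
toy <- data.frame(
  score = c(10, 11, 12, 14, 15, 16, 20, 21, 22, 18, 17, 19),
  a     = factor(rep(c("a1", "a2"), each = 6)),
  b     = factor(rep(rep(c("b1", "b2"), each = 3), times = 2))
)

# a * b expands to a + b + a:b, i.e. both main effects plus their interaction.
fit <- aov(score ~ a * b, data = toy)
anova_table <- summary(fit)[[1]]
anova_table

# Cell means plotted against a, one line per level of b; non-parallel lines hint at interaction.
interaction.plot(toy$a, toy$b, toy$score, xlab = "a", ylab = "Mean score", trace.label = "b")
```

The table has one row per main effect, one for the `a:b` interaction, and one for the residuals; the interaction row is the one whose p-value is compared with alpha in each of the tests below.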

Age and Gender

##                                       Df  Sum Sq Mean Sq F value   Pr(>F)
## clean_extro$age                        5   18649    3730  44.438  < 2e-16
## clean_extro$gender                     1    3442    3442  41.014 1.55e-10
## clean_extro$age:clean_extro$gender     5    1944     389   4.632 0.000316
## Residuals                          19440 1631612      84                 
##                                       
## clean_extro$age                    ***
## clean_extro$gender                 ***
## clean_extro$age:clean_extro$gender ***
## Residuals                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Recall that for Gender 1 = Male, 2 = Female, and 3 = Other.

From the ANOVA test we get a p-value of 0.000316, which allows us to reject the null hypothesis that the interaction of Age and Gender produces only unexplained differences in variation.

By looking at the interaction plot we see that while the Male and Female lines run parallel to each other, the line for Other intersects them. Also, we observe that the Female line is always above the Male line, which could suggest a main effect for Gender. Likewise, the positive slope of the Male and Female gender lines suggests that there could be a main effect for Age as well.

Age and Dominant Hand

##                                     Df  Sum Sq Mean Sq F value Pr(>F)    
## clean_extro$age                      5   18649    3730  44.307 <2e-16 ***
## clean_extro$hand                     1     185     185   2.202  0.138    
## clean_extro$age:clean_extro$hand     5     373      75   0.886  0.490    
## Residuals                        19440 1636440      84                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Recall that for Dominant Hand 1 = Right, 2 = Left, 3 = Both.

From the ANOVA test we get a p-value of 0.490 which does not allow us to reject the null hypothesis.

By looking at the interaction plot there is nothing that points to either interaction or main effects.

Age and Native English Speaker

##                                       Df  Sum Sq Mean Sq F value   Pr(>F)
## clean_extro$age                        5   18649    3730  44.385  < 2e-16
## clean_extro$engnat                     1    2126    2126  25.304 4.94e-07
## clean_extro$age:clean_extro$engnat     5    1300     260   3.095  0.00853
## Residuals                          19440 1633571      84                 
##                                       
## clean_extro$age                    ***
## clean_extro$engnat                 ***
## clean_extro$age:clean_extro$engnat ** 
## Residuals                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Recall that for Native English Speaker 1 = Yes, 2 = No.

From the ANOVA test we get a p-value of 0.00853, which allows us to reject the null hypothesis that the interaction of Age and Native English Speaker produces only unexplained differences in variation.

Also, by looking at the interaction plot we see that the lines for Native and Non-Native English Speakers cross twice, which suggests a possible interaction effect. There does not seem to be a main effect for either factor based on the plot.

Gender and Dominant Hand

##                                        Df  Sum Sq Mean Sq F value  Pr(>F)
## clean_extro$gender                      1    2778  2777.6  32.691 1.1e-08
## clean_extro$hand                        1     235   234.7   2.762  0.0965
## clean_extro$gender:clean_extro$hand     1     216   216.2   2.545  0.1107
## Residuals                           19448 1652418    85.0                
##                                        
## clean_extro$gender                  ***
## clean_extro$hand                    .  
## clean_extro$gender:clean_extro$hand    
## Residuals                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Recall that for Gender 1 = Male, 2 = Female, and 3 = Other. Recall that for Dominant Hand 1 = Right, 2 = Left, 3 = Both.

From the ANOVA test we get a p-value of 0.1107 which does not allow us to reject the null hypothesis.

Also, by looking at the interaction plot we see no obvious indications of interaction or main effects.

Gender and Native English Speaker

##                                          Df  Sum Sq Mean Sq F value
## clean_extro$gender                        1    2778  2777.6  32.718
## clean_extro$engnat                        1    1831  1830.7  21.564
## clean_extro$gender:clean_extro$engnat     1       7     6.7   0.079
## Residuals                             19448 1651031    84.9        
##                                         Pr(>F)    
## clean_extro$gender                    1.08e-08 ***
## clean_extro$engnat                    3.44e-06 ***
## clean_extro$gender:clean_extro$engnat    0.779    
## Residuals                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Recall that for Gender 1 = Male, 2 = Female, and 3 = Other. Recall that for Native English Speaker 1 = Yes, 2 = No.

From the ANOVA test we get a p-value of 0.779 which does not allow us to reject the null hypothesis.

Also, by looking at the interaction plot we see no obvious indications of interaction or main effects.

Dominant Hand and Native English Speaker

##                                        Df  Sum Sq Mean Sq F value   Pr(>F)
## clean_extro$hand                        1     250   250.1   2.943   0.0863
## clean_extro$engnat                      1    2324  2323.6  27.338 1.73e-07
## clean_extro$hand:clean_extro$engnat     1      69    68.8   0.810   0.3682
## Residuals                           19448 1653004    85.0                 
##                                        
## clean_extro$hand                    .  
## clean_extro$engnat                  ***
## clean_extro$hand:clean_extro$engnat    
## Residuals                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Recall that for Dominant Hand 1 = Right, 2 = Left, 3 = Both. Recall that for Native English Speaker 1 = Yes, 2 = No.

From the ANOVA test we get a p-value of 0.3682 which does not allow us to reject the null hypothesis.

Also, by looking at the interaction plot we see no obvious indications of interaction effects. However, there does seem to be a main effect for Native English Speakers.

Estimation

From the statistical analysis done above, we can make the following estimations on the main and interaction effects of the four factors.

Main Effects

  • Age
    +main effect observed
    +increased age seems to positively affect Extroversion
  • Gender
    +there seemed to be a main effect when only the male and female levels were considered
    +the inclusion of Other as a level for gender made the analysis of gender more difficult
  • Dominant Hand
    +no observed main effect
  • Native English Speaker
    +main effect observed
    +being a native English speaker seems to positively influence Extroversion

Interaction Effects

Of the six two-factor interactions assessed, interaction effects were observed for the following:

  • Age and Gender
    +compared to Male and Female, the Other level seemed to interact with the higher Age levels to positively influence Extroversion
    +due to the relatively small number of Other observations, this estimation could easily be incorrect
  • Age and Native English Speaker
    +possible interaction effect

4. References to the Literature

Literature: Montgomery, Douglas C. Design and Analysis of Experiments, 8th Edition (PDF)
Source of data: http://personality-testing.info/_rawdata/
Explanation of test: http://personality-testing.info/printable/big-five-personality-test.pdf

5. Appendices

Raw Data

The raw data for this experiment can be obtained by downloading the Big Five Personality Test data from http://personality-testing.info/_rawdata/.

Complete R Code

#import dataframe
personality_df <- read.delim("~/Design of Experiments/data.csv")

#delete inapplicable columns from dataframe. Leave only extroversion score columns and factors
extro <- subset(personality_df, select = -c(race, source, country, N1:O10))

#make sure it worked
str(extro)

#define new vector to represent total extroversion score
Escore <- c(20 + extro$E1 - extro$E2 + extro$E3 - extro$E4 + extro$E5 - extro$E6 + extro$E7 - extro$E8 + extro$E9 - extro$E10)

#verify it worked
Escore[1]

#add extroversion total score as column to dataframe
extro$E_Total <- Escore

#check to see column was added
head(extro)

#clean data by extracting rows with impossible values
clean_extro <- subset(extro, age <= 122)
clean_extro <- subset(clean_extro, gender > 0)
clean_extro <- subset(clean_extro, hand >0)
clean_extro <- subset(clean_extro, engnat >0)

#group age into 6 levels with cut(); the result is already a factor
#right = FALSE makes the intervals [20,30), [30,40), ... so that e.g. age 30 falls in "30-39"
clean_extro$age <- cut(clean_extro$age,
                       breaks = c(-Inf, 20, 30, 40, 50, 60, Inf),
                       labels = c("<20", "20-29", "30-39", "40-49", "50-59", "60<="),
                       right = FALSE)

#check structure of data
str(clean_extro)

#display summary statistics
summary(clean_extro)

#graphical analysis
#histogram
hist(clean_extro$E_Total, main = "Extroversion Score", xlab = "Score")

#boxplots of 4 factors
boxplot(clean_extro$E_Total ~ clean_extro$age, main = "Extroversion response to Age", ylab = "Extroversion Score", xlab = "Age")
boxplot(clean_extro$E_Total ~ clean_extro$gender, main = "Extroversion response to Gender", ylab = "Extroversion Score", xlab = "Gender", names = c("Male", "Female", "Other"))
boxplot(clean_extro$E_Total ~ clean_extro$hand, main = "Extroversion response to Dominant Hand", ylab = "Extroversion Score", xlab = "Dominant Hand", names = c("Right", "Left", "Both"))
boxplot(clean_extro$E_Total ~ clean_extro$engnat, main = "Extroversion response to Native English Speaker", ylab = "Extroversion Score", xlab = "Native English Speaker", names = c("Yes", "No"))

#one factor anova tests
summary(aov(clean_extro$E_Total ~ clean_extro$age))
summary(aov(clean_extro$E_Total ~ clean_extro$gender))
summary(aov(clean_extro$E_Total ~ clean_extro$hand))
summary(aov(clean_extro$E_Total ~ clean_extro$engnat))

#two factor anova tests and interaction plots
#age and gender
summary(aov(clean_extro$E_Total ~ clean_extro$age*clean_extro$gender))
interaction.plot(clean_extro$age, clean_extro$gender, clean_extro$E_Total, xlab = "Age", ylab = "Extroversion Score", trace.label = "Gender")

#age and hand dominance
summary(aov(clean_extro$E_Total ~ clean_extro$age*clean_extro$hand))
interaction.plot(clean_extro$age, clean_extro$hand, clean_extro$E_Total, xlab = "Age", ylab = "Extroversion Score", trace.label = "Hand")

#age and native english speaker
summary(aov(clean_extro$E_Total ~ clean_extro$age*clean_extro$engnat))
interaction.plot(clean_extro$age, clean_extro$engnat, clean_extro$E_Total, xlab = "Age", ylab = "Extroversion Score", trace.label = "Native English Speaker")

#gender and hand dominance
summary(aov(clean_extro$E_Total ~ clean_extro$gender*clean_extro$hand))
interaction.plot(clean_extro$gender, clean_extro$hand, clean_extro$E_Total, xlab = "Gender", ylab = "Extroversion Score", trace.label = "Dominant Hand")

#gender and native english speaker
summary(aov(clean_extro$E_Total ~ clean_extro$gender*clean_extro$engnat))
interaction.plot(clean_extro$gender, clean_extro$engnat, clean_extro$E_Total, xlab = "Gender", ylab = "Extroversion Score", trace.label = "Native English Speaker")

#hand dominance and native english speaker
summary(aov(clean_extro$E_Total ~ clean_extro$hand*clean_extro$engnat))
interaction.plot(clean_extro$hand, clean_extro$engnat, clean_extro$E_Total, xlab = "Dominant Hand", ylab = "Extroversion Score", trace.label = "Native English Speaker")