The main analysis of interest in most randomized experiments is the ‘Average Treatment Effect’ (ATE), which looks at differences on the key dependent variable(s) by randomly assigned experimental group. In this tutorial, we will look at a variety of standard ways to estimate the ATE from any experimental study. Specifically, we will cover…
One important thing to keep in mind is that when analyzing randomized experiments you do not need, nor should you use, control variables. Randomization gives us groups that are equivalent, in expectation, on everything that could influence the DV except the randomly assigned experimental condition.
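To see why, here is a toy simulation (not the tutorial data; the age covariate is hypothetical): with random assignment, any pre-treatment characteristic is balanced across groups in expectation, so it cannot confound the group comparison.
set.seed(42) #For reproducibility
age <- rnorm(1000, mean = 45, sd = 12) #A hypothetical pre-treatment covariate
group <- sample(c("control", "treatment"), 1000, replace = TRUE) #Random assignment
tapply(age, group, mean) #The two group means should be nearly identical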
Another important factor to keep in mind prior to estimating the ATE is that the structure of the dependent variable and the number of groups in the experiment will change what analyses should be conducted. This tutorial will discuss how to identify the analytic technique that you should use based on these factors.
In this tutorial, we will be looking at an experiment that randomly assigned participants to view an ice cream store flying one of four flags - Biden, Trump, BLM, or Thin Blue Line. Participants were then asked whether they would visit the store if they were in the mood for ice cream and how appropriate it would be for people to boycott the store.
To assess the number of groups and scale points in your experiment, use the freq function from the descr package. It will quickly reveal the number of scale points as well as how frequently each value occurs in your data.
freq(data$group_boycott)
## data$group_boycott
## Frequency Percent Valid Percent
## biden 41 22.778 23.70
## blm 36 20.000 20.81
## thin blue 57 31.667 32.95
## trump 39 21.667 22.54
## NA's 7 3.889
## Total 180 100.000 100.00
freq(data$group_boycott_2)
## data$group_boycott_2
## Frequency Percent Valid Percent
## 1 39 21.67 48.75
## 2 41 22.78 51.25
## NA's 100 55.56
## Total 180 100.00 100.00
freq(data$Q22.5) #DV 1 - Likelihood Visit Store
## How likely would you be to visit this specific ice cream store if you came across it on vacation and were in the mood for ice cream?
## Frequency Percent Valid Percent
## 1 21 11.667 12.139
## 2 15 8.333 8.671
## 3 40 22.222 23.121
## 4 55 30.556 31.792
## 5 42 23.333 24.277
## NA's 7 3.889
## Total 180 100.000 100.000
freq(data$Q22.6) #DV 2 - Appropriate to Fly the Flag
## How appropriate do you think it is for small businesses to display overtly political signs in their stores?
## Frequency Percent Valid Percent
## 1 28 15.556 16.18
## 2 29 16.111 16.76
## 3 45 25.000 26.01
## 4 38 21.111 21.97
## 5 33 18.333 19.08
## NA's 7 3.889
## Total 180 100.000 100.00
Results show that the group_boycott experimental assignment variable has 4 values, which correspond to the 4 flags participants could have been randomly assigned to see. This means that a technique like a t-test or a 2-group proportions test is not appropriate for this variable, since both require exactly 2 groups. However, the group_boycott_2 variable does have 2, and only 2, groups, so it would be appropriate to use with those techniques (note that group_boycott_2 is a subset of the group_boycott measure and is only used for pedagogical purposes).
Looking at the two dependent variables, we see that they both take on 5 ordered scale points, meaning we can use a variety of techniques to analyze the ATE for these DVs.
This should always be the first thing you determine before starting your analysis: how many groups your experiment has, whether the DV is ordered or a factor, and how many scale points it takes on. Also, don't forget to rely on the survey codebook to assist you in determining these things, since value labels will not always appear in R (as is the case here).
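If the codebook is ambiguous, a quick programmatic check of the distinct values a DV takes on can also help (a minimal sketch; na.omit drops the missing cases first):
length(unique(na.omit(data$Q22.5))) #Number of distinct scale points - should be 5 here
sort(unique(na.omit(data$Q22.5))) #The actual values the DV takes on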
T-tests are an appropriate statistical technique to use when analyzing experimental data when you have 2, and only 2, groups (think control and treatment) and the dependent variable is either continuous or ordinal with 5+ scale points. Because we determined earlier that the group_boycott_2 IV has 2 groups and the DVs for this experiment are 5-point ordered scales, we can use a t-test to estimate the ATE.
Also note that this analysis only looks at Republicans in the sample, but the results are the same for Democrats, just reversed.
#t.test(dv ~ iv, data=data) # General format for t-tests from the stats package
t.test(Q22.5 ~ group_boycott_2, data = rep_data)
##
## Welch Two Sample t-test
##
## data: Q22.5 by group_boycott_2
## t = 3.687, df = 13.709, p-value = 0.00252
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## 0.7963142 3.0218677
## sample estimates:
## mean in group 1 mean in group 2
## 4.000000 2.090909
t.test(Q22.6 ~ group_boycott_2, data = rep_data)
##
## Welch Two Sample t-test
##
## data: Q22.6 by group_boycott_2
## t = 1.4333, df = 15.214, p-value = 0.172
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -0.3915198 2.0051562
## sample estimates:
## mean in group 1 mean in group 2
## 3.625000 2.818182
What a t-test is effectively doing is testing whether there is a significant difference on the DV of interest between your two groups. Here, we see one significant finding - for DV 1 - and one non-significant finding - for DV 2. We evaluate significance by examining the p-value reported in the results, looking for it to be less than .05 (or less than .1 if you wrote that into your pre-analysis plan and have a justification for using a higher threshold than standard).
For the DV1 model, we see a p-value of .0025, which is less than .05, indicating a significant difference in the likelihood that respondents would visit the ice cream store depending on which flag was flying. We also see the reported means: group1=4 and group2=2.09. These are Republican respondents, and group1=Trump flag while group2=Biden flag, indicating that Republicans would be significantly more likely to visit an ice cream store flying a Trump flag than one flying a Biden flag.
For the DV2 model, we see a p-value of .172, which is larger than .05, indicating a non-significant finding. This question dealt with the appropriateness of small businesses displaying overtly political signs and indicates that, regardless of the flag flying, Republicans rate the appropriateness of displaying it about the same. While there is a mean difference between the two groups on DV2, the statistical test tells us that the difference does not rise to the level of significance, which means we fail to reject the null hypothesis of no difference.
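If you want to pull individual numbers out for reporting, note that t.test returns a list (class htest) whose pieces can be extracted directly - a minimal sketch:
t_res <- t.test(Q22.5 ~ group_boycott_2, data = rep_data) #Save the test object
t_res$p.value #The p-value reported above (.0025)
t_res$estimate #The two group means
t_res$conf.int #The 95% confidence interval for the difference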
ANOVA is an appropriate statistical technique to use when analyzing experimental data when you have 3+ groups and the dependent variable is either continuous or ordinal with 5+ scale points. Because we determined earlier that the group_boycott IV has 4 groups and the DVs for this experiment are 5-point ordered scales, we can use an ANOVA to estimate the ATE.
Here we will use the aov function from the stats package to conduct the analysis of variance test.
With ANOVA, you have to run additional lines of code to get the mean values for individual groups and to test pairwise significance. These are important because the ANOVA test gives you the overall significance but not the direct comparisons you are interested in with your experiments. The significance test used for the pairwise comparisons varies based on the specific test you specify. Here we will look at Tukey's Honestly Significant Differences (HSD), but know there are other types as well.
#Estimate the ANOVA & save the model for review
anova_result <- aov(Q22.5 ~ group_boycott, data = rep_data)
summary(anova_result) #Without using summary you will not get the model p-value
## Df Sum Sq Mean Sq F value Pr(>F)
## group_boycott 3 28.18 9.392 6.473 0.00072 ***
## Residuals 60 87.06 1.451
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Extract the mean of the DV for each group
means <- aggregate(Q22.5 ~ group_boycott, data = rep_data, FUN = mean) #FUN = mean tells R to run the mean function for the DV by the specified groups
means
## group_boycott Q22.5
## 1 biden 2.090909
## 2 blm 3.142857
## 3 thin blue 3.739130
## 4 trump 4.000000
# Add Pairwise Comparison Tests - Tukey
tukey_result <- TukeyHSD(anova_result) #Must have saved the ANOVA results for use here
print(tukey_result) #Shows the results for each pairwise comparison
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Q22.5 ~ group_boycott, data = rep_data)
##
## $group_boycott
## diff lwr upr p adj
## blm-biden 1.0519481 -0.2305515 2.334448 0.1441790
## thin blue-biden 1.6482213 0.4813398 2.815103 0.0023347
## trump-biden 1.9090909 0.6623588 3.155823 0.0008501
## thin blue-blm 0.5962733 -0.4827244 1.675271 0.4675840
## trump-blm 0.8571429 -0.3077452 2.022031 0.2208285
## trump-thin blue 0.2608696 -0.7753600 1.297099 0.9097167
There are quite a few numbers produced with this code but not all are necessary to review. Starting at the top, we see the overall ANOVA test, which reveals a p-value of .00072 indicating the model overall is significant. The means table shows us the average of the DV for each individual group (higher values here = higher likelihood to visit the ice cream shop). The Biden flag treatment group had the lowest mean, 2.09, indicating that Republicans were the least likely to say they would go to the ice cream shop with a Biden flag flying while the Trump flag treatment group had the highest mean, 4. The other two groups were in between these two means. However, this does not tell us if there are significant differences between these groups.
We turn to the Tukey HSD results to examine the pairwise significant differences for each of the combinations. With k groups there will always be k(k-1)/2 pairwise comparisons. Here, with 4 groups, that gives us 4(4-1)/2 = 6 comparison points. The Tukey results give you the difference between groups, the lower and upper bounds of the difference, as well as the p-value. We see two significant differences in the results (p<.05): the Biden group versus the Thin Blue Line group and the Biden group versus the Trump group. This shows that Republicans were significantly more likely to say they would visit an ice cream shop flying either the Trump or Thin Blue Line flag compared to the Biden flag, but that the other comparisons were not significant. The Biden group versus the Black Lives Matter group was close to significance here but did not quite reach standard levels.
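You can verify the comparison count with base R's choose function, which counts the unique pairs among k groups:
choose(4, 2) #Unique pairwise comparisons among 4 groups
## [1] 6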
Linear regression is an appropriate statistical technique to use when analyzing experimental data when you have 2+ groups and the dependent variable is either continuous or ordinal with 5+ scale points. Because we determined earlier that the group_boycott IV has 4 groups and the DVs for this experiment are 5-point ordered scales, we can use linear regression to estimate the ATE. We could also use linear regression on the group_boycott_2 variable, but we will just focus on the full experimental conditions.
For linear regression to work appropriately with our non-ordered experimental groups, we need to ensure that R considers the variable to be either a factor or a character (categorical) variable. If R considers it to be numerical, then it will return biased results because it assumes there is an order to the group assignment where none naturally exists. We check this with the Base R function class.
#Check the class of the experimental group variable - should be factor or categorical
class(data$group_boycott)
## [1] "character"
library(ggeffects)
## Warning: package 'ggeffects' was built under R version 4.2.3
#Estimate the linear regression
lm_results<-lm(Q22.5 ~ group_boycott, data = rep_data)
#Call the results for review
summary(lm_results)
##
## Call:
## lm(formula = Q22.5 ~ group_boycott, data = rep_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.739 -1.000 0.000 1.000 2.909
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.0909 0.3632 5.757 3.11e-07 ***
## group_boycottblm 1.0519 0.4853 2.167 0.034177 *
## group_boycottthin blue 1.6482 0.4416 3.733 0.000423 ***
## group_boycotttrump 1.9091 0.4718 4.046 0.000151 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.205 on 60 degrees of freedom
## Multiple R-squared: 0.2445, Adjusted R-squared: 0.2067
## F-statistic: 6.473 on 3 and 60 DF, p-value: 0.0007204
#Create nicer table - note we will review visualization of results later in semester
ggpredict(lm_results, terms=c("group_boycott")) #More on this command when we get to visualization
## # Predicted values of How likely would you be to visit this specific ice cream store if you came across it on vacation and were in the mood for ice cream?
##
## group_boycott | Predicted | 95% CI
## ----------------------------------------
## blm | 3.14 | [2.50, 3.79]
## thin blue | 3.74 | [3.24, 4.24]
## biden | 2.09 | [1.36, 2.82]
## trump | 4.00 | [3.40, 4.60]
By estimating the linear regression and then using the summary or stargazer commands, we can evaluate the results. The first thing to know about the results is that the comparisons in the results table are always to the group that is NOT displayed. Here that is the Biden flag experimental group. We can change the reference group - see below - but the overall results will not vary, just what is displayed in the table.
The estimates in a bivariate model are regression coefficients and simply reflect the mean differences between each group and the reference group. We want to evaluate the p-values for each comparison. In this approach, we see significant differences between the Biden group and all three other groups. Remember, with Tukey's HSD we only saw significance for Biden-TBL and Biden-Trump. The difference here is caused by Tukey's adjustment for multiple comparisons.
So which one is correct? I have published multiple papers using linear regression rather than ANOVA with Tukey's HSD without any issues from methodological reviewers, so both can be considered correct. If you see a p-value that is close in a linear regression, like we do here for Biden-BLM, go ahead and run the ANOVA with Tukey's HSD comparison. If the conclusions do not match, then it is your determination as the researcher which one to use.
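One way to see that the adjustment drives the discrepancy is to run the same pairwise comparisons with and without a correction, using pairwise.t.test from the stats package (a sketch; output omitted, and note it pools the standard deviation so the p-values will only roughly match):
pairwise.t.test(rep_data$Q22.5, rep_data$group_boycott, p.adjust.method = "none") #Roughly tracks the regression table
pairwise.t.test(rep_data$Q22.5, rep_data$group_boycott, p.adjust.method = "bonferroni") #Conservative, in the spirit of Tukey's HSD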
You can easily change the reference group when running regressions if you want a specific comparison to be returned in the model summary. But note that changing the reference group does not change anything else about the model - just what is displayed in the results table.
We will use the relevel function from the stats package for this.
#Tells you what the levels of your experimental groups are called so you can update
freq(rep_data$group_boycott)
## rep_data$group_boycott
## Frequency Percent
## biden 11 17.19
## blm 14 21.88
## thin blue 23 35.94
## trump 16 25.00
## Total 64 100.00
#For relevel to work, your experimental variable MUST be a factor, not a character (string) variable
rep_data$group_boycott<-as.factor(rep_data$group_boycott)
# Relevel the factor with "thin blue" as the new reference group
rep_data$group_boycott_b <- relevel(rep_data$group_boycott, ref = "thin blue")
#Estimate the new linear regression with the new reference group
lm_results<-lm(Q22.5 ~ group_boycott_b, data = rep_data)
#Call the results for review
summary(lm_results)
##
## Call:
## lm(formula = Q22.5 ~ group_boycott_b, data = rep_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.739 -1.000 0.000 1.000 2.909
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7391 0.2512 14.887 < 2e-16 ***
## group_boycott_bbiden -1.6482 0.4416 -3.733 0.000423 ***
## group_boycott_bblm -0.5963 0.4083 -1.460 0.149423
## group_boycott_btrump 0.2609 0.3921 0.665 0.508439
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.205 on 60 degrees of freedom
## Multiple R-squared: 0.2445, Adjusted R-squared: 0.2067
## F-statistic: 6.473 on 3 and 60 DF, p-value: 0.0007204
Now we see a different set of comparisons than we saw before, and that is entirely because we changed the reference group. Note that the overall model is exactly the same as before; the only difference is which comparisons are shown in the table.
When your DV is exactly 2 scale points, you need to change the analytic technique you use to account for the different structure of the DV. There are two approaches that can be used here depending on the number of groups in your experiment. If you have 2 groups, you can use the 2-sample proportions test but with 3+ you will need to use logistic regression.
We will start by using the 2-sample proportions test. The first thing we need to do is check the DV to ensure it has 2 scale points and then recode them to 0 and 1 if they are not already (note that Qualtrics usually exports dichotomous variables as 1 and 2, so it is likely that this adjustment will need to be made).
#Always check your DV before estimating any model to ensure it has appropriate number of scale points. Here we need to see 2 possible values coded as 0 & 1.
freq(rep_data$Q23.6) #Results show that the values are 1 & 2 so we need to recode
## Have you ever been diagnosed by a doctor with a mental health condition? Your individual response will not be shared nor associated with your personally identifiable information.
## Frequency Percent
## 1 16 25
## 2 48 75
## Total 64 100
rep_data$dv_2 <- ifelse(rep_data$Q23.6 == 1, 0, 1) #Here we use the ifelse command to recode values of 1 to 0 and everything else to 1
freq(rep_data$dv_2)
## rep_data$dv_2
## Frequency Percent
## 0 16 25
## 1 48 75
## Total 64 100
#Basic code prop.test(table(data$IV, data$DV)) Note the DV comes SECOND in this code
prop_result <- prop.test(table(rep_data$group_boycott_2, rep_data$dv_2))
## Warning in prop.test(table(rep_data$group_boycott_2, rep_data$dv_2)):
## Chi-squared approximation may be incorrect
# Print the test result
print(prop_result)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: table(rep_data$group_boycott_2, rep_data$dv_2)
## X-squared = 0.98894, df = 1, p-value = 0.32
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.6425595 0.1652868
## sample estimates:
## prop 1 prop 2
## 0.1250000 0.3636364
Now we have run the 2-sample proportions test, which returns a variety of metrics. The two key things to look at are the p-value (.32 here, which is not <.05, indicating no significant difference between groups) and the sample proportion estimates for each group (here .12 in group 1 and .36 in group 2). The sample estimates tell you the proportion of respondents who are a 0 on the DV, so in group 1, 12.5% answered 0 to the DV question, while in group 2, 36% did. The sample size is low in this example, which is why differences that large are not significant.
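You can reproduce those sample estimates yourself with prop.table, which converts a crosstab into proportions; margin = 1 computes them within each row (experimental group):
prop.table(table(rep_data$group_boycott_2, rep_data$dv_2), margin = 1) #First column (dv_2 = 0) matches prop 1 and prop 2 above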
With this data structure, you can also use a logistic regression. If you have more than 2 groups, you will have to run a logit, since the 2-sample proportions test requires exactly 2 groups.
To estimate a logit model, the code is similar to a linear regression except for a few alterations to account for the dichotomous outcome variable. Instead of using lm we now use glm - take my class in the fall to understand the differences - but we also have to specify the family of GLM, which here is binomial (since the DV has 2 scale points), and the link, which is logit (we could also use probit).
library(ggeffects) #Allows us to easily pull out the predicted proportions by group
logit<-glm(dv_2 ~ group_boycott, data=rep_data, family=binomial(link="logit"))
summary(logit)
##
## Call:
## glm(formula = dv_2 ~ group_boycott, family = binomial(link = "logit"),
## data = rep_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.03933 0.03199 0.69450 0.85195 0.95077
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.5596 0.6268 0.893 0.372
## group_boycottblm 0.7397 0.9039 0.818 0.413
## group_boycottthin blue 0.2671 0.7734 0.345 0.730
## group_boycotttrump 1.3863 0.9820 1.412 0.158
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 71.979 on 63 degrees of freedom
## Residual deviance: 69.293 on 60 degrees of freedom
## AIC: 77.293
##
## Number of Fisher Scoring iterations: 4
ggpredict(logit, terms=c("group_boycott")) #More on this command when we get to visualization
## # Predicted probabilities of dv_2
##
## group_boycott | Predicted | 95% CI
## ----------------------------------------
## biden | 0.64 | [0.34, 0.86]
## blm | 0.79 | [0.51, 0.93]
## thin blue | 0.70 | [0.48, 0.85]
## trump | 0.87 | [0.61, 0.97]
Results from our logit model look similar to the standard linear regression model at first glance. However, the coefficients are no longer directly interpretable as they are in linear regression: they represent the change in the log-odds of the outcome rather than the direct change in the DV.
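If you want something more interpretable than log-odds, exponentiating the coefficients converts them to odds ratios (values above 1 mean higher odds of a 1 on the DV than the reference group) - a minimal sketch:
exp(coef(logit)) #Coefficients as odds ratios rather than log-odds
exp(confint(logit)) #Matching 95% confidence intervals (profile likelihood)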
We use the ggeffects package to easily calculate the predicted proportions by group (here when the value = 1, unlike above where it was when the value = 0). While none of the differences are significant, as seen in the regression table, the proportions give you the raw value by group, which can then be used to report the results.
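These predicted values can also be reproduced by hand from the logit coefficients with plogis, the inverse-logit function that converts log-odds back to probabilities:
plogis(coef(logit)["(Intercept)"]) #Predicted probability for the biden (reference) group, ~.64
plogis(coef(logit)["(Intercept)"] + coef(logit)["group_boycotttrump"]) #Trump group, ~.87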
The final technique we will cover in this tutorial is running crosstabs and chi-squared calculations for when you have 2 factor/categorical variables. We use the chi-squared calculation here because we are not interested in pairwise comparisons but in whether, overall, we have the same proportions in each experimental group. This should be used when running your balance tests to ensure randomization worked roughly as desired.
We will use the CrossTable function from the descr package for this. You can do this in tidyverse as well, but it takes more steps, so for efficiency's sake I use CrossTable.
Here we look at differences in some demographics by experimental group for the entire dataset rather than just Republicans as above. In CrossTable, you specify exactly what percentages you want to see and whether you want the chi-squared calculation to be conducted. As long as you ensure the IV comes first and the DV second, you will not need to change any of the other values in the code.
Here we tell R to return percentages by row with prop.r=TRUE, which gives us the percentage of each gender identity within each experimental group (Q23.3 = gender identity). We do not want to use prop.c=TRUE, since that gives us the percentage of each experimental group within each gender identity category, which is not what we are interested in. We also tell R to return the chi-squared value with chisq=TRUE. The prop.chisq option controls whether each cell's contribution to the overall chi-squared calculation is displayed; we set it to FALSE since we are only interested in overall significance.
#CrossTable(data$IV, data$DV, expected = FALSE, chisq=TRUE,
#prop.c=FALSE, prop.r=TRUE, prop.t=FALSE, prop.chisq = FALSE)
CrossTable(data$group_boycott, data$Q23.3, expected = FALSE, chisq=TRUE,
prop.c=FALSE, prop.r=TRUE, prop.t=FALSE, prop.chisq = FALSE)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation may
## be incorrect
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 173
##
##
## | data$Q23.3
## data$group_boycott | 1 | 2 | 3 | Row Total |
## -------------------|-----------|-----------|-----------|-----------|
## biden | 19 | 22 | 0 | 41 |
## | 0.463 | 0.537 | 0.000 | 0.237 |
## -------------------|-----------|-----------|-----------|-----------|
## blm | 17 | 19 | 0 | 36 |
## | 0.472 | 0.528 | 0.000 | 0.208 |
## -------------------|-----------|-----------|-----------|-----------|
## thin blue | 27 | 29 | 1 | 57 |
## | 0.474 | 0.509 | 0.018 | 0.329 |
## -------------------|-----------|-----------|-----------|-----------|
## trump | 22 | 17 | 0 | 39 |
## | 0.564 | 0.436 | 0.000 | 0.225 |
## -------------------|-----------|-----------|-----------|-----------|
## Column Total | 85 | 87 | 1 | 173 |
## -------------------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 3.072236 d.f. = 6 p = 0.7997274
##
##
##
The value to review is the p-value listed at the bottom of the table. Here it is .799, which is not significant. We would say in our paper that we ran a balance test and there was not a significant difference in gender identity between groups.
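As the warning above hints, CrossTable is calling chisq.test under the hood with correct = FALSE, so if you only need the test statistic you can reproduce it directly in base R without the full table:
chisq.test(table(data$group_boycott, data$Q23.3), correct = FALSE) #Should match Chi^2 = 3.07, p = .80 above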
Next, we look at educational attainment. With more values in the education variable, the table might render a little off, but we are still mostly interested in the p-value. Once again, the p-value is large, .71, indicating there are no significant differences between the experimental groups on educational attainment.
CrossTable(data$group_boycott, data$Q23.4, expected = FALSE, chisq=TRUE,
prop.c=FALSE, prop.r=TRUE, prop.t=FALSE, prop.chisq = FALSE)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation may
## be incorrect
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 173
##
##
## | data$Q23.4
## data$group_boycott | 1 | 2 | 3 | 4 | 5 | Row Total |
## -------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## biden | 13 | 17 | 7 | 4 | 0 | 41 |
## | 0.317 | 0.415 | 0.171 | 0.098 | 0.000 | 0.237 |
## -------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## blm | 11 | 10 | 11 | 3 | 1 | 36 |
## | 0.306 | 0.278 | 0.306 | 0.083 | 0.028 | 0.208 |
## -------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## thin blue | 17 | 23 | 13 | 4 | 0 | 57 |
## | 0.298 | 0.404 | 0.228 | 0.070 | 0.000 | 0.329 |
## -------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## trump | 11 | 15 | 12 | 1 | 0 | 39 |
## | 0.282 | 0.385 | 0.308 | 0.026 | 0.000 | 0.225 |
## -------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## Column Total | 52 | 65 | 43 | 12 | 1 | 173 |
## -------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 8.862156 d.f. = 12 p = 0.7146569
##
##
##