========================================
CHI-SQUARE TEST OF INDEPENDENCE OVERVIEW
========================================
PURPOSE
To test if there is an association between two categorical
variables.
NOTES
The normality assumption does not apply to Chi-Square tests because the
data are categorical.
==========
HYPOTHESES
==========
NULL HYPOTHESIS
There is no association between the two categorical variables.
ALTERNATE HYPOTHESIS
There is an association between the two categorical variables.
………………………………………………………..
QUESTION
What are the null and alternate hypotheses for your research?
H0:
H1:
………………………………………………………..
======================
IMPORT EXCEL FILE CODE
======================
PURPOSE OF THIS CODE
Imports your Excel dataset automatically into R Studio.
You need to import your dataset every time you want to analyze your
data in R Studio.
INSTALL REQUIRED PACKAGE
The package only needs to be installed once.
The code for this task is provided below. Remove the hashtag below
to convert the note into code.
# install.packages("readxl")
LOAD THE PACKAGE
You must load the package in every new R session before you use it.
The code for this task is provided below. Remove the hashtag below
to convert the note into code.
# library(readxl)
IMPORT THE EXCEL FILE INTO R STUDIO
Download the Excel file from OneDrive and save it to your
desktop.
Right-click the Excel file and click “Copy as path” from the
menu.
In R Studio, replace the example path below with your actual
path.
Replace backslashes \ with forward slashes / or double them \\:
✘ WRONG "C:\Users\Joseph\Desktop\mydata.xlsx"
✔ CORRECT "C:/Users/Joseph/Desktop/mydata.xlsx"
✔ CORRECT "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"
Replace “dataset” with the name of your Excel file (without the
.xlsx)
An example of the code for this task is provided below.
Edit the code below and remove the hashtag to use it.
# dataset <- read_excel("C:/Users/Joseph/Desktop/dataset.xlsx")
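The backslash rule above can also be handled in code rather than by retyping. This is a minimal sketch, assuming R 4.0.0 or later (required for the raw string syntax). The path is the example path from this section, and the names pasted_path and fixed_path are made up for illustration.

```r
# Raw strings (R 4.0.0 or later) keep backslashes exactly as pasted,
# so the Windows path can be copied in without editing it by hand.
pasted_path <- r"(C:\Users\Joseph\Desktop\mydata.xlsx)"

# Swap every backslash for a forward slash.
# fixed = TRUE treats "\\" as one literal backslash, not a regular expression.
fixed_path <- gsub("\\", "/", pasted_path, fixed = TRUE)

print(fixed_path)
```

The resulting fixed_path could then be passed to read_excel() in place of a hand-typed path.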
=========================
VISUALLY DISPLAY THE DATA
=========================
PURPOSE
Visually display the data.
A frequency table can be used instead of a bar graph to visually
display the data.
CREATE A FREQUENCY TABLE
Also called a “contingency table” for Chi-Square Test of
Independence.
Replace “dataset” with the name of your dataset (without the
.xlsx)
Replace “Variable1” with the R code name of your first variable
Replace “Variable2” with the R code name of your second
variable
Remove the hashtag to use the code.
# contingencytable <- table(dataset$Variable1, dataset$Variable2)
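To see what table() produces, here is a small self-contained sketch. The data frame, its column names (Gender, Flavor), and the counts are all made up for illustration; they are not from any real dataset.

```r
# Made-up data: 100 rows, one row per participant.
group_sizes <- c(35, 15, 12, 38)
example_data <- data.frame(
  Gender = rep(c("Male", "Male", "Female", "Female"), times = group_sizes),
  Flavor = rep(c("Chocolate", "Vanilla", "Chocolate", "Vanilla"), times = group_sizes)
)

# Cross-tabulate the two categorical variables.
# Rows are the first variable, columns are the second.
example_table <- table(example_data$Gender, example_data$Flavor)
print(example_table)

# The cell counts should add up to the sample size.
sum(example_table)
```

Each cell shows how many participants fall into that combination of categories, which is the input the chi-square test needs.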
====================================
CHI-SQUARE TEST OF INDEPENDENCE CODE
====================================
PURPOSE
Determine if the null or alternate hypothesis was supported.
CONDUCT THE TEST
Do NOT edit the code.
Remove the hashtags to use the code.
# chisq_indep <- chisq.test(contingencytable)
# print(chisq_indep)
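The values needed for the report can also be pulled out of the test result one at a time. The sketch below uses a made-up 2 x 2 table (the counts and the names example_table and example_test are illustrative assumptions) so that it runs on its own.

```r
# Made-up 2 x 2 table of counts (filled column by column).
example_table <- matrix(
  c(12, 35, 38, 15),
  nrow = 2,
  dimnames = list(
    Gender = c("Female", "Male"),
    Flavor = c("Chocolate", "Vanilla")
  )
)

# For a 2 x 2 table, chisq.test() applies Yates' continuity
# correction by default (correct = TRUE).
example_test <- chisq.test(example_table)

example_test$statistic  # chi-square value
example_test$parameter  # degrees of freedom (df)
example_test$p.value    # p-value
sum(example_table)      # sample size (N)
```

The same `$statistic`, `$parameter`, and `$p.value` elements are available on chisq_indep after running the test on your own table.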
DETERMINE STATISTICAL SIGNIFICANCE
If results were statistically significant (p < .05), continue to
the effect size section below.
If results were NOT statistically significant (p > .05), do NOT
calculate the effect size.
Instead, skip to the reporting section below.
NOTE: Getting results that are not statistically significant does
NOT mean you switch to a different test.
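The decision rule above can be written as a short check. This is only a sketch: the p-value here is a made-up placeholder, and in practice it would come from chisq_indep$p.value.

```r
# Hypothetical p-value for illustration; replace with chisq_indep$p.value.
p_value <- 0.008
alpha <- 0.05

# Significant results continue to the effect size section;
# non-significant results go straight to the report.
if (p_value < alpha) {
  next_step <- "Calculate the effect size"
} else {
  next_step <- "Skip to the reporting section"
}

print(next_step)
```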
================
EFFECT SIZE CODE
================
PURPOSE
Determine how strong the relationship was between the two
variables.
INSTALL REQUIRED PACKAGE
The package only needs to be installed once.
Remove the hashtag to use the code below.
# install.packages("lsr")
LOAD THE PACKAGE
Always reload the package you want to use.
Remove the hashtag to use the code below.
# library(lsr)
CALCULATE CRAMER’S V
Do NOT edit the code.
Remove the hashtags to use the code below.
# cramers_v <- cramersV(contingencytable)
# cat("Cramer's V (effect size):", round(cramers_v, 3), "\n")
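For reference, Cramér's V is commonly defined as the square root of χ² / (N × (k - 1)), where k is the smaller of the number of rows and columns. The sketch below applies that textbook formula by hand to the numbers in this document's example report (χ² = 20.01, N = 100, a 2 x 2 table). It is not the lsr package's own code, and the variable names are made up for illustration.

```r
# Values taken from the example report in this document.
chi_square <- 20.01  # chi-square statistic
n <- 100             # sample size
smaller_dim <- 2     # smaller of (number of rows, number of columns)

# Textbook formula: V = sqrt(chi-square / (N * (k - 1)))
v <- sqrt(chi_square / (n * (smaller_dim - 1)))

round(v, 2)  # 0.45
```

This matches the Cramér's V of 0.45 reported in the example paragraph.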
DETERMINE YOUR DEGREES OF FREEDOM (DF)
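Check your chi-square test of independence output for “df”.
NOTE: The cutoffs below are Cohen's benchmarks, which are defined on the
smaller of (rows - 1) and (columns - 1). That value matches the “df” in
your output only for a 2 x 2 table; for larger tables, use the smaller
value to pick the set of cutoffs.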
DETERMINE THE SIZE OF THE EFFECT
Ranges from 0 (no relationship) to 1.00 (perfect relationship).
Example for df of 1: A Cramer’s V of 0.31 indicates a medium
association between the two variables.
df = 1
0.00 to 0.09 = Negligible
0.10 to 0.29 = Small
0.30 to 0.49 = Medium
0.50 and above = Large
df = 2
0.00 to 0.06 = Negligible
0.07 to 0.20 = Small
0.21 to 0.34 = Medium
0.35 and above = Large
df = 3
0.00 to 0.05 = Negligible
0.06 to 0.16 = Small
0.17 to 0.28 = Medium
0.29 and above = Large
df can be larger than this, but for our class we will go no larger
than 3
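The cutoff tables above can be turned into a small lookup. This is a sketch only: the function name interpret_cramers_v is made up, and it treats each lower bound as the start of the next category, so a value such as 0.295 (which falls between the printed ranges) is labelled Small.

```r
# Hypothetical helper: label a Cramer's V value using the cutoffs above.
interpret_cramers_v <- function(v, df) {
  # Lower bounds of Small, Medium, and Large for each df in the tables.
  cutoffs <- list(
    "1" = c(0.10, 0.30, 0.50),
    "2" = c(0.07, 0.21, 0.35),
    "3" = c(0.06, 0.17, 0.29)
  )
  key <- as.character(df)
  if (!key %in% names(cutoffs)) {
    stop("Cutoffs are only listed for df = 1, 2, or 3")
  }
  labels <- c("Negligible", "Small", "Medium", "Large")
  # findInterval() counts how many cutoffs v has reached (0 to 3).
  labels[findInterval(v, cutoffs[[key]]) + 1]
}

interpret_cramers_v(0.31, 1)  # "Medium"
interpret_cramers_v(0.45, 1)  # "Medium"
```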
==========================
RESEARCH REPORT ON RESULTS
==========================
………………………………………….
QUESTION
What were the results? Write them in a paragraph.
Put the paragraph in a Word Document.
………………………………………….
DIRECTIONS
Collect the information listed below and turn it into a
paragraph.
For your results summary, you should report the following
information:
1. The name of the inferential test used (Chi-Square Test of
Independence)
2. The names of the two categorical variables you analyzed (use
proper labels, not R code names)
3. The sample size (n)
4. Whether there WAS a statistically significant association (p <
.05) or NOT (p > .05)
5. The degrees of freedom (df)
6. The chi-square statistic value (χ²)
7. The p-value (report exact p-value if p less than .05 but greater
than .001, otherwise write p > .05 or p < .001)
8. If the result was significant, indicate which categories were
associated
To put 3, 5, 6, and 7 together:
Format: χ²(df, N = ##) = #.##, p = .###
Example: χ²(2, N = 90) = 9.67, p = .008
9. The effect size using Cramér’s V
10. Interpret the strength of the association (small, medium,
large), based on df
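The Format line in the list above can be assembled with sprintf() so the numbers are rounded consistently. This is a minimal sketch: the function names format_p and format_chi_square are made up, and the p-value rule follows item 7 above. The χ² symbol is written with Unicode escapes so the code does not depend on file encoding.

```r
# Hypothetical helper: format a p-value following item 7.
format_p <- function(p) {
  if (p < 0.001) return("p < .001")
  if (p > 0.05) return("p > .05")
  # Exact value to three decimals, without the leading zero.
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Hypothetical helper: build the full results string.
format_chi_square <- function(chi_square, df, n, p) {
  sprintf(
    "\u03c7\u00b2(%d, N = %d) = %.2f, %s",
    as.integer(df), as.integer(n), chi_square, format_p(p)
  )
}

# Reproduces the Example line from the list above.
format_chi_square(9.67, 2, 90, 0.008)
```

With a real test result, the arguments could come from chisq_indep$statistic, chisq_indep$parameter, sum(contingencytable), and chisq_indep$p.value.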
EXAMPLE REPORT
LIST OF DATA
1. Chi-Square Test of Independence
2. Gender (male, female) and ice cream flavor preference (chocolate,
vanilla)
3. 100 participants
4. Statistically significant (p < .05)
5. df = 1
6. χ² = 20.01
7. p = .000000443
8. Males preferred chocolate and females preferred vanilla.
9. Cramér’s V = 0.45
10. Medium effect size (df = 1)
PARAGRAPH
A Chi-Square Test of Independence was conducted to examine the
association between
gender (male, female) and ice cream flavor preference (chocolate,
vanilla)
among 100 participants.
There was a statistically significant association between gender and
ice cream flavor preference,
χ²(1, N = 100) = 20.01, p < .001.
Males preferred chocolate and females preferred vanilla.
The effect size was medium (Cramér’s V = 0.45).