========================================

CHI-SQUARE TEST OF INDEPENDENCE OVERVIEW

========================================

PURPOSE

To test whether there is an association between two categorical variables.

NOTES

The normality assumption does not apply to Chi-Square tests because the data are categorical, not numeric.

==========

HYPOTHESES

==========

NULL HYPOTHESIS

There is no association between the two categorical variables.

ALTERNATE HYPOTHESIS

There is an association between the two categorical variables.

………………………………………………………..

QUESTION

What are the null and alternate hypotheses for your research?

H0:

H1:

………………………………………………………..

======================

IMPORT EXCEL FILE CODE

======================

PURPOSE OF THIS CODE

Imports your Excel dataset automatically into R Studio.

You need to import your dataset every time you want to analyze your data in R Studio.

INSTALL REQUIRED PACKAGE

The package only needs to be installed once.

The code for this task is provided below. Remove the hashtag below to convert the note into code.

# install.packages("readxl")

LOAD THE PACKAGE

You must load the package again every time you start a new R session.

The code for this task is provided below. Remove the hashtag below to convert the note into code.

# library(readxl)

IMPORT THE EXCEL FILE INTO R STUDIO

Download the Excel file from OneDrive and save it to your desktop.

Right-click the Excel file and click “Copy as path” from the menu.

In R Studio, replace the example path below with your actual path.

Replace backslashes \ with forward slashes / or double them \\:

✘ WRONG "C:\Users\Joseph\Desktop\mydata.xlsx"

✔ CORRECT "C:/Users/Joseph/Desktop/mydata.xlsx"

✔ CORRECT "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"

Replace “dataset” with the name of your Excel file (without the .xlsx).

An example of the code for this task is provided below.

Edit the code below, then remove the hashtag to use it.

# dataset <- read_excel("C:/Users/Joseph/Desktop/dataset.xlsx")
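
As a minimal sketch of the path fix described above, the code below converts a copied Windows path to forward slashes and only attempts the import when the file and the readxl package are both available. The path and the helper name to_forward_slashes are made up for illustration; substitute your own.

```r
# Hypothetical helper: swap every backslash for a forward slash.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

# A copied Windows path. Inside an R string each backslash must be doubled.
copied_path <- "C:\\Users\\Joseph\\Desktop\\dataset.xlsx"
clean_path <- to_forward_slashes(copied_path)
print(clean_path)

# Import only if the file exists and readxl is installed (assumed example path).
if (file.exists(clean_path) && requireNamespace("readxl", quietly = TRUE)) {
  dataset <- readxl::read_excel(clean_path)
  print(head(dataset))
} else {
  message("File or readxl package not found. Check the path and the installation.")
}
```

If the message appears, the most likely cause is a path that does not match the file's actual location.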

=========================

VISUALLY DISPLAY THE DATA

=========================

PURPOSE

Visually display the data.

A frequency table can be used instead of a bar graph to visually display the data.

CREATE A FREQUENCY TABLE

For a Chi-Square Test of Independence, the frequency table is also called a “contingency table”.

Replace “dataset” with the name of your dataset (without the .xlsx)

Replace “Variable1” with the R code name of your first variable

Replace “Variable2” with the R code name of your second variable

Remove the hashtag to use the code.

# contingencytable <- table(dataset$Variable1, dataset$Variable2)
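
To show what the table looks like, here is a small sketch using made-up data. The data frame, the variable names Gender and Flavor, and the ten rows are all invented for illustration and are not from the class dataset.

```r
# Made-up example data: 10 participants, two categorical variables.
dataset <- data.frame(
  Gender = c("Male", "Male", "Female", "Female", "Male",
             "Female", "Male", "Female", "Female", "Male"),
  Flavor = c("Chocolate", "Chocolate", "Vanilla", "Vanilla", "Vanilla",
             "Vanilla", "Chocolate", "Chocolate", "Vanilla", "Chocolate")
)

# Cross-tabulate the two variables: rows are Gender, columns are Flavor.
contingencytable <- table(dataset$Gender, dataset$Flavor)
print(contingencytable)

# Add row and column totals to check that every participant was counted.
print(addmargins(contingencytable))
```

The grand total in the bottom-right corner of the second table should equal your sample size.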

====================================

CHI-SQUARE TEST OF INDEPENDENCE CODE

====================================

PURPOSE

Determine if the null or alternate hypothesis was supported.

CONDUCT THE TEST

Do NOT edit the code.

Remove the hashtags to use the code.

# chisq_indep <- chisq.test(contingencytable)

# print(chisq_indep)
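
The printed output contains everything you need, but the individual values can also be pulled out of the result by name. The sketch below assumes a made-up 2 x 2 table of counts (not real data) and shows one way to do this.

```r
# Made-up counts for a 2 x 2 table (100 participants in total).
contingencytable <- matrix(
  c(35, 15,
    12, 38),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Male", "Female"),
                  Flavor = c("Chocolate", "Vanilla"))
)

# For 2 x 2 tables, chisq.test() applies Yates' continuity correction by default.
chisq_indep <- chisq.test(contingencytable)

chi_sq  <- unname(chisq_indep$statistic)  # chi-square statistic
df      <- unname(chisq_indep$parameter)  # degrees of freedom
p_value <- chisq_indep$p.value            # p-value
n       <- sum(chisq_indep$observed)      # sample size (N)

cat("chi-square =", round(chi_sq, 2), "\n")
cat("df =", df, "\n")
cat("p =", signif(p_value, 3), "\n")
cat("N =", n, "\n")

# Compare the p-value with the .05 cutoff used in this guide.
if (p_value < .05) {
  cat("Statistically significant: continue to the effect size.\n")
} else {
  cat("Not statistically significant: skip to the reporting section.\n")
}
```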

DETERMINE STATISTICAL SIGNIFICANCE

If results were statistically significant (p < .05), continue to the effect size section below.

If results were NOT statistically significant (p > .05), do NOT calculate the effect size.

Instead, skip to the reporting section below.

NOTE: Getting results that are not statistically significant does NOT mean you switch to a different test.

================

EFFECT SIZE CODE

================

PURPOSE

Determine how strong the relationship was between the two variables.

INSTALL REQUIRED PACKAGE

The package only needs to be installed once.

Remove the hashtag to use the code below.

# install.packages("lsr")

LOAD THE PACKAGE

Always reload the package you want to use.

Remove the hashtag to use the code below.

# library(lsr)

CALCULATE CRAMER’S V

Do NOT edit the code.

Remove the hashtags to use the code below.

# cramers_v <- cramersV(contingencytable)

# cat("Cramer's V (effect size):", round(cramers_v, 3), "\n")
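
To see where the number comes from, Cramér's V can also be computed by hand from the chi-square statistic: V = sqrt(χ² / (N × (k − 1))), where k is the smaller of the number of rows and the number of columns. The sketch below applies that formula to the values in this guide's example report (χ² = 20.01, N = 100, 2 x 2 table). The function name is hypothetical, and this is meant as a cross-check, not a replacement for cramersV().

```r
# Hypothetical helper: Cramer's V from a chi-square statistic.
cramers_v_by_hand <- function(chi_sq, n, n_rows, n_cols) {
  k <- min(n_rows, n_cols)  # smaller table dimension
  sqrt(chi_sq / (n * (k - 1)))
}

v <- cramers_v_by_hand(chi_sq = 20.01, n = 100, n_rows = 2, n_cols = 2)
cat("Cramer's V (effect size):", round(v, 3), "\n")
```

Rounded to two decimals this gives 0.45, the value reported in the example paragraph.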

DETERMINE YOUR DEGREES OF FREEDOM (DF)

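Check your chi-square test of independence output for “df”. Note: the ranges below follow Cohen's convention, which keys them to the smaller of (rows − 1) and (columns − 1). That value equals the output df only for a 2 × 2 table.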

DETERMINE THE SIZE OF THE EFFECT

Ranges from 0 (no relationship) to 1.00 (perfect relationship).

Example for df of 1: A Cramér’s V of 0.31 indicates a medium association between the two variables.

df = 1

0.00 to 0.09 = Negligible

0.10 to 0.29 = Small

0.30 to 0.49 = Medium

0.50 and above = Large

df = 2

0.00 to 0.06 = Negligible

0.07 to 0.20 = Small

0.21 to 0.34 = Medium

0.35 and above = Large

df = 3

0.00 to 0.05 = Negligible

0.06 to 0.16 = Small

0.17 to 0.28 = Medium

0.29 and above = Large

df can be larger than this, but for our class we will go no larger than 3
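
The lookup above can be sketched as a small function. The function name is hypothetical, the cutoffs are copied from the ranges listed in this section, and the sketch assumes that a value falling between two listed ranges (for example 0.095 with df = 1) belongs to the lower category.

```r
# Hypothetical helper: label a Cramer's V using the cutoffs listed above.
interpret_cramers_v <- function(v, df) {
  cutoffs <- list(
    "1" = c(0.10, 0.30, 0.50),
    "2" = c(0.07, 0.21, 0.35),
    "3" = c(0.06, 0.17, 0.29)
  )
  key <- as.character(df)
  if (!key %in% names(cutoffs)) {
    stop("This sketch only covers df of 1, 2, or 3.")
  }
  labels <- c("Negligible", "Small", "Medium", "Large")
  # findInterval() counts how many cutoffs v has reached (0 to 3).
  labels[findInterval(v, cutoffs[[key]]) + 1]
}

print(interpret_cramers_v(0.31, df = 1))  # "Medium"
```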

==========================

RESEARCH REPORT ON RESULTS

==========================

………………………………………….

QUESTION

What were the results? Write them in a paragraph.

Put the paragraph in a Word Document.

………………………………………….

DIRECTIONS

Collect the information listed below and turn it into a paragraph.

For your results summary, you should report the following information:

1. The name of the inferential test used (Chi-Square Test of Independence)

2. The names of the two categorical variables you analyzed (use proper labels, not R code names)

3. The sample size (n)

4. Whether there WAS a statistically significant association (p < .05) or NOT (p > .05)

5. The degrees of freedom (df)

6. The chi-square statistic value (χ²)

7. The p-value (report the exact p-value if it is between .001 and .05; otherwise write p > .05 or p < .001)

8. If the result was significant, indicate which categories were associated

To put 3, 5, 6, and 7 together:

Format: χ²(df, N = ##) = #.##, p = .###

Example: χ²(2, N = 90) = 9.67, p = .008

9. The effect size using Cramér’s V

10. Interpret the strength of the association (small, medium, large), based on df
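
The reporting format above can be assembled in R so the numbers are not retyped by hand. This is only a sketch: the helper names format_p and format_chisq are hypothetical, and the values passed in are the ones from the format example above.

```r
# Hypothetical helper: apply the p-value reporting rule from the directions.
format_p <- function(p) {
  if (p < .001) return("p < .001")
  if (p > .05)  return("p > .05")
  # Exact p-value to three decimals, without the leading zero.
  sub("0.", ".", sprintf("p = %.3f", p), fixed = TRUE)
}

# Hypothetical helper: build the chi-square result string.
# "\u03c7\u00b2" is the chi-square symbol written with Unicode escapes.
format_chisq <- function(chi_sq, df, n, p) {
  paste0("\u03c7\u00b2(", df, ", N = ", n, ") = ",
         sprintf("%.2f", chi_sq), ", ", format_p(p))
}

cat(format_chisq(chi_sq = 9.67, df = 2, n = 90, p = .008), "\n")
```

The same function can be fed the values extracted from your own chisq.test() output.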

EXAMPLE REPORT

LIST OF DATA

1. Chi-Square Test of Independence

2. Gender (male, female) and ice cream flavor preference (chocolate, vanilla)

3. 100 participants

4. Statistically significant (p < .05)

5. df = 1

6. χ² = 20.01

7. p = .000000443 (reported as p < .001)

8. Males preferred chocolate and females preferred vanilla.

9. Cramér’s V = 0.45

10. Medium effect size

PARAGRAPH

A Chi-Square Test of Independence was conducted to examine the association between

gender (male, female) and ice cream flavor preference (chocolate, vanilla)

among 100 participants.

There was a statistically significant association between gender and ice cream flavor preference,

χ²(1, N = 100) = 20.01, p < .001.

Males preferred chocolate and females preferred vanilla.

The effect size was medium (Cramér’s V = 0.45).
