Chi-Squared Test Assignment

Rationale

The goal of this analysis is to compare how often viewers of CNN and Fox News, two major U.S. news networks that cater to different ideological and political audiences, name immigration as a “top issue.” Comparing the proportion of CNN viewers who consider immigration a top issue with the proportion of Fox News viewers who do offers insight into how media consumption relates to political opinion on a contentious issue like immigration.

Hypothesis

The proportion of CNN viewers who consider immigration a top issue differs from the proportion of Fox News viewers who do.

Variables & Method

In this analysis, the dependent variable (DV) was whether respondents considered immigration a “top issue” or “not a top issue,” and the independent variable (IV) was the respondents’ preferred news network, either CNN or Fox. The data were loaded from a CSV file, and the relevant columns were assigned to the DV and IV. To explore the relationship between the two variables, a stacked bar chart displayed the distribution of immigration views by preferred network. A crosstabulation then reported the count in each cell along with column percentages, that is, the share of each network's viewers in each category. Finally, a Chi-squared test of independence assessed whether immigration views were associated with network preference. Because the table is 2x2, R's chisq.test() applies Yates' continuity correction by default, and the reported statistic reflects that correction.

Results & Discussion

The bar chart and crosstabulation show a clear difference between the two audiences. Among CNN viewers, 35 of 300 (11.7%) named immigration a top issue, compared with 115 of 300 (38.3%) Fox viewers. The Chi-squared test of independence was statistically significant (Chi-squared = 55.476, df = 1, p < 0.001), well below the conventional 0.05 threshold. This indicates that immigration salience varies systematically between CNN and Fox viewers and supports the hypothesis that the proportion of viewers who see immigration as a top issue differs by preferred network. The test establishes an association, not a direction of influence: viewers may adopt a network's priorities, or they may choose the network that matches priorities they already hold. Overall, the combination of descriptive statistics, visualization, and inferential testing provides a well-rounded look at how news media preference relates to issue prioritization.

Crosstabulation of DV by IV
Counts and (Column Percentages)

	CNN	Fox
1 Top issue	35 (11.7%)	115 (38.3%)
2 Not top issue	265 (88.3%)	185 (61.7%)

Chi-squared Test Results
Test of Independence between DV and IV

Test	Chi-squared Statistic	Degrees of Freedom	p-value
Chi-squared Test of Independence	55.476	1	<0.001
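
The reported statistic can be checked directly from the counts in the crosstabulation, without the original CSV file. The following is a minimal sketch in base R, not part of the assignment script: the object names (observed, corrected, uncorrected, manual) are illustrative, and the table is rebuilt by hand from the four cell counts above. It also shows the expected counts under independence and how the default continuity correction changes the statistic.

```r
# Rebuild the 2x2 table from the reported counts
# (rows = immigration view, columns = preferred network).
# matrix() fills column by column: CNN first, then Fox.
observed <- matrix(
  c(35, 265, 115, 185),
  nrow = 2,
  dimnames = list(
    Immigration = c("Top issue", "Not top issue"),
    Network     = c("CNN", "Fox")
  )
)

# Column percentages: share of each network's viewers in each category
col_pct <- round(100 * prop.table(observed, margin = 2), 1)

# chisq.test() applies Yates' continuity correction to 2x2 tables by default
corrected   <- chisq.test(observed)
uncorrected <- chisq.test(observed, correct = FALSE)

# Expected counts under independence: row total * column total / N
expected <- corrected$expected

# The corrected statistic by hand: sum of (|O - E| - 0.5)^2 / E over all cells
manual <- sum((abs(observed - expected) - 0.5)^2 / expected)

print(col_pct)
print(expected)
print(round(unname(corrected$statistic), 3))    # 55.476
print(round(unname(uncorrected$statistic), 3))  # 56.889
print(round(manual, 3))                         # 55.476
```

Every expected count is far above 5 (75 in the top row, 225 in the bottom row), so the usual large-sample condition for the Chi-squared test is met. With or without the correction, the statistic is large and the p-value is far below 0.001, so the conclusion does not depend on that choice.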

Code

# ------------------------------
# Setup: Install and load packages
# ------------------------------
if (!require("tidyverse")) install.packages("tidyverse")   # Data wrangling & plotting
if (!require("gmodels")) install.packages("gmodels")       # Crosstabs
if (!require("gt")) install.packages("gt")                 # Table formatting

library(tidyverse)
library(gmodels)
library(gt)

# ------------------------------
# Load the data
# ------------------------------
# TopIssue.csv is assumed to be in the working directory
mydata <- read.csv("TopIssue.csv")

# ------------------------------
# Define Dependent (DV) and Independent (IV) variables
# ------------------------------
# DV: whether immigration is a top issue; IV: preferred news network (CNN or Fox)
mydata$DV <- mydata$Immigration
mydata$IV <- mydata$PreferredNetwork

# ------------------------------
# Visualization: Stacked bar chart of DV by IV
# ------------------------------
graph <- ggplot(mydata, aes(x = IV, fill = DV)) +
  geom_bar(colour = "black") +
  scale_fill_brewer(palette = "Paired") +
  labs(
    title = "Distribution of DV by IV",
    x = "Independent Variable",
    y = "Count",
    fill = "Dependent Variable"
  )

#Show the graph
graph

# ------------------------------
# Crosstabulation of DV by IV (DV = rows, IV = columns)
# ------------------------------

crosstab <- mydata %>%
  count(DV, IV) %>%
  # Group by IV (the columns) so each percentage is a share of that network's viewers
  group_by(IV) %>%
  mutate(ColPct = 100 * n / sum(n)) %>%
  ungroup() %>%
  mutate(Cell = paste0(n, "\n(", round(ColPct, 1), "%)")) %>%
  select(DV, IV, Cell) %>%
  pivot_wider(names_from = IV, values_from = Cell)

# Format into gt table
crosstab_table <- crosstab %>%
  gt(rowname_col = "DV") %>%
  tab_header(
    title = "Crosstabulation of DV by IV",
    subtitle = "Counts and (Column Percentages)"
  ) %>%
  cols_label(
    DV = "Dependent Variable"
  )

# Show the polished crosstab table
crosstab_table

# ------------------------------
# Chi-squared test of independence
# ------------------------------
options(scipen = 999)  # Prevents scientific notation
# For a 2x2 table, chisq.test() applies Yates' continuity correction by default
chitestresults <- chisq.test(mydata$DV, mydata$IV)

# ------------------------------
# Format Chi-squared test results into a table
# ------------------------------
chitest_summary <- tibble(
  Test   = "Chi-squared Test of Independence",
  Chi_sq = chitestresults$statistic,
  df     = chitestresults$parameter,
  # Show very small p-values as "<0.001" instead of a rounded "0.000"
  p      = format.pval(chitestresults$p.value, digits = 3, eps = 0.001)
)

chitest_table <- chitest_summary %>%
  gt() %>%
  # Round χ² to 3 decimals and df to an integer; p is already formatted as text
  fmt_number(columns = Chi_sq, decimals = 3) %>%
  fmt_number(columns = df, decimals = 0) %>%
  tab_header(
    title = "Chi-squared Test Results",
    subtitle = "Test of Independence between DV and IV"
  ) %>%
  cols_label(
    Test   = "Test",
    Chi_sq = "Chi-squared Statistic",
    df     = "Degrees of Freedom",
    p      = "p-value"
  )

# Show the formatted results table
chitest_table