Set the default theme & font for plots and tables

font_size <- 12

af_theme <- theme_bw(base_size = font_size) +
  theme(
    plot.title = element_text(size = font_size),        # Plot title
    axis.title = element_text(size = font_size),        # Axis titles
    axis.text = element_text(size = font_size),         # Axis tick labels
    legend.title = element_text(size = font_size),      # Legend title
    legend.text = element_text(size = font_size),       # Legend text
    strip.text = element_text(size = font_size)         # Facet variable's name
    )

ggplot2::theme_set(af_theme)

Load survey dataset

df <- as.data.frame(readRDS("Israel Survey/data/il_pe.RDS"))

# Create age group variable (move to measures)
df$age_group <- af_create_groups(df$age, c(18, 30, 45, 60, Inf), 
                                 c("18-30", "31-45", "46-60", "60+"))
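The helper af_create_groups is project-specific and its source is not shown here. As a rough base-R analogue, assuming it bins ages into right-closed intervals and that an age of exactly 18 belongs to the first group, the same grouping could be sketched with cut(). The names ages and age_group_sketch are illustrative only.

```r
# Illustrative ages (made up), chosen to hit the bin edges
ages <- c(18, 30, 31, 45, 60, 61, 90)

# Right-closed bins: [18,30], (30,45], (45,60], (60,Inf)
# include.lowest = TRUE keeps age 18 in the first bin instead of returning NA
age_group_sketch <- cut(
  ages,
  breaks = c(18, 30, 45, 60, Inf),
  labels = c("18-30", "31-45", "46-60", "60+"),
  right = TRUE,
  include.lowest = TRUE
)

print(data.frame(age = ages, age_group = age_group_sketch))
```

Whether af_create_groups treats the boundaries the same way (for example, whether 60 falls in "46-60" or "60+") is an assumption that should be checked against its definition.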

1 Similarity Score Calculation Method

This analysis calculates similarity scores between different groups (e.g., Traditional vs Secular, Traditional vs Religious) to determine which groups have more similar characteristics across various variables.

1.1 For Nominal (Categorical) Variables

Method: Cramér’s V with Similarity Transformation

  1. Create contingency table: Cross-tabulate the comparison groups with the variable categories

  2. Remove empty rows and columns: Drop rows and columns with zero observations so that every expected frequency is defined

  3. Calculate Cramér’s V: This measures the strength of association between the two variables

    • Uses chi-square test statistic: χ² = Σ((observed - expected)²/expected)
    • Cramér’s V = √(χ²/n) / √(min(rows-1, cols-1))
    • Where n = total sample size

  4. Convert to similarity: Similarity = 1 - Cramér’s V

    • Cramér’s V ranges from 0 (no association) to 1 (perfect association)
    • Our similarity score ranges from 0 (completely different) to 1 (identical distributions)

Interpretation: Higher similarity scores indicate that the two groups have more similar voting patterns, gender distributions, etc.
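The four steps above can be sketched in base R as follows. This is a minimal illustration, not the code inside af_compare_religiosity_grouping; the function name cramers_v_similarity and the toy vectors are hypothetical, and the sketch assumes no continuity correction is applied to the chi-square statistic.

```r
# Hypothetical helper: similarity = 1 - Cramér's V for two categorical vectors
cramers_v_similarity <- function(group, value) {
  # Step 1: contingency table of groups by categories
  tab <- table(group, value)
  # Step 2: drop rows and columns with no observations
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  if (nrow(tab) < 2 || ncol(tab) < 2) return(NA_real_)
  # Step 3: chi-square statistic (no continuity correction), then Cramér's V
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  v <- sqrt(chi2 / n) / sqrt(min(nrow(tab) - 1, ncol(tab) - 1))
  # Step 4: convert association to similarity
  1 - unname(v)
}

# Toy data (made up): two groups of four respondents each
group <- rep(c("Traditional", "Secular"), each = 4)
vote_same <- rep(c("A", "A", "B", "B"), times = 2)  # identical vote shares
vote_diff <- rep(c("A", "B"), each = 4)             # no overlap at all

sim_same <- cramers_v_similarity(group, vote_same)
sim_diff <- cramers_v_similarity(group, vote_diff)
print(c(same = sim_same, different = sim_diff))
```

In the toy data, identical vote shares give χ² = 0 and therefore a similarity of 1, while fully separated groups give χ² = n and a similarity of 0.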

1.2 For Numerical Variables

Method: Kolmogorov-Smirnov Test P-value

  1. Extract data: Get the numerical values for each comparison group

  2. Perform KS test: Compare the distributions of the two groups

    • Tests the null hypothesis that both samples come from the same distribution

  3. Use p-value as similarity: Higher p-values indicate more similar distributions

    • P-value near 1.0 = very similar distributions
    • P-value near 0.0 = very different distributions

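Interpretation: Higher p-values indicate that the age distributions (or other numerical variables) of the two groups are more similar. Because the p-value also depends on sample size, it is a rough proxy for similarity rather than an effect size.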
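A minimal sketch of this step with stats::ks.test is shown below. The helper name ks_similarity and the simulated ages are assumptions made for illustration; the project's own implementation may handle missing values and small samples differently.

```r
set.seed(1)  # simulated data, for illustration only

# Two samples from the same distribution and one shifted sample
age_a <- rnorm(200, mean = 40, sd = 12)
age_b <- rnorm(200, mean = 40, sd = 12)
age_c <- rnorm(200, mean = 55, sd = 12)

# Hypothetical helper: two-sample KS p-value used as a similarity score
ks_similarity <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  if (length(x) < 2 || length(y) < 2) return(NA_real_)
  suppressWarnings(ks.test(x, y)$p.value)
}

p_same <- ks_similarity(age_a, age_b)  # samples from the same distribution
p_diff <- ks_similarity(age_a, age_c)  # mean shifted by 15 years
print(c(same = p_same, shifted = p_diff))
```

The shifted sample yields a p-value very close to 0. With samples of several hundred respondents, even modest differences can push the p-value toward 0, which is worth keeping in mind when averaging it with other scores.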

1.3 Overall Similarity Calculation

For each wave and overall analysis:

  1. Calculate individual similarity scores for each variable
  2. Compute the average similarity score across all variables for each comparison pair
  3. The comparison pair with the highest average similarity score is recommended for grouping
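The averaging and selection steps can be sketched as below. The scores are made-up values, the pair and variable names are placeholders, and dropping NA scores before averaging is an assumption about how missing results are treated.

```r
# Made-up per-variable similarity scores for two comparison pairs
scores <- data.frame(
  pair       = rep(c("A vs B", "A vs C"), each = 3),
  variable   = rep(c("var1", "var2", "var3"), times = 2),
  similarity = c(0.60, 0.70, 0.50, 0.80, NA, 0.60)
)

# Step 2: average similarity per pair (assumption: NA scores are ignored)
avg <- tapply(scores$similarity, scores$pair, mean, na.rm = TRUE)

# Step 3: recommend the pair with the highest average
best <- names(which.max(avg))
print(round(avg, 3))
print(best)
```

Here "A vs B" averages 0.6 over three variables and "A vs C" averages 0.7 over the two variables with a score, so "A vs C" would be recommended.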

1.4 Statistical Considerations

  • Missing data: Variables with insufficient data return NA values
  • Small sample warnings: Chi-square tests may use simulation when expected frequencies are low
  • Tied values: Kolmogorov-Smirnov tests handle tied numerical values appropriately
  • Scale: All similarity scores range from 0 to 1, so they can be averaged across variable types, although 1 - Cramér’s V and a KS p-value are not the same quantity

This method provides a standardized way to compare how similar different groups are across multiple types of variables, helping to make evidence-based decisions about category grouping.
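Two of the considerations above can be checked directly in base R. The sketch below uses made-up numbers and assumes the standard stats::chisq.test and stats::ks.test functions; it does not reproduce the project's own error handling.

```r
set.seed(42)  # the simulated p-value is random

# Toy 2x2 table with low expected counts (made-up numbers)
tab <- matrix(c(3, 1, 2, 4), nrow = 2)

# Asymptotic test (warns about low expected counts) vs. Monte Carlo p-value
asym <- suppressWarnings(chisq.test(tab, correct = FALSE))
sim  <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

# The chi-square statistic is the same in both cases, so Cramér's V is
# unaffected by simulation; only the p-value is computed differently
same_stat <- isTRUE(all.equal(unname(asym$statistic), unname(sim$statistic)))

# Tied values: ks.test still returns a p-value (it may warn, depending on R version)
p_tied <- suppressWarnings(
  ks.test(c(1, 2, 2, 3, 3, 3), c(2, 3, 3, 4, 4, 4))$p.value
)

print(same_stat)
print(p_tied)
```

Since the similarity score for nominal variables is built from the statistic rather than the p-value, switching to simulation should not change that score.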

2 Similarity of Religiosity Groups

2.1 Traditional vs. Secular & Religious

# Run the analysis
result <- af_compare_religiosity_grouping(
  df = df,
  nominal_vars = c("vote", "vote2022", "pe_left_center_right"),
  numerical_vars = c(),
  comparison_var = "religiosity",
  comparison_levels = list(
    c("Traditional", "Secular"),
    c("Traditional", "Religious")
  ),
  wave_var = "Wave"  
)

# Display recommendation
cat(result$recommendation)

Based on overall similarity scores, the most similar pair is: Traditional vs Religious (similarity score: 0.704)


# Display similarity scores table
knitr::kable(result$summary_by_wave, 
             caption = "Average Similarity Scores by Wave",
             digits = 3)
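
Average Similarity Scores by Wave

| wave        | Traditional vs Secular_similarity | Traditional vs Religious_similarity |
|:------------|----------------------------------:|------------------------------------:|
| Overall     | 0.631 | 0.704 |
| Wave Fifth  | 0.499 | 0.759 |
| Wave First  | 0.628 | 0.776 |
| Wave Fourth | 0.700 | 0.665 |
| Wave Second | 0.616 | 0.713 |
| Wave Sixth  | 0.630 | 0.709 |
| Wave Third  | 0.696 | 0.662 |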

# # Display detailed results table
# knitr::kable(result$similarity_scores, 
#              caption = "Detailed Similarity Scores",
#              digits = 3)

# Display plots
# Access plots by variable name
result$plots$overall

result$plots[["vote"]]

result$plots[["vote2022"]]

result$plots[["pe_left_center_right"]]

2.2 National Ultra-Orthodox vs. Orthodox & National Religious

# Run the analysis
result <- af_compare_religiosity_grouping(
  df = df,
  nominal_vars = c("vote", "vote2022", "pe_left_center_right"),
  numerical_vars = c(),
  comparison_var = "religiosity",
  comparison_levels = list(
    c("National Ultra-Orthodox", "Orthodox"),
    c("National Ultra-Orthodox", "National Religious")
  ),
  wave_var = "Wave"  
)

# Display recommendation
cat(result$recommendation)

Based on overall similarity scores, the most similar pair is: National Ultra-Orthodox vs Orthodox (similarity score: 0.711)


# Display similarity scores table
knitr::kable(result$summary_by_wave, 
             caption = "Average Similarity Scores by Wave",
             digits = 3)
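
Average Similarity Scores by Wave

| wave        | National Ultra-Orthodox vs Orthodox_similarity | National Ultra-Orthodox vs National Religious_similarity |
|:------------|-----------------------------------------------:|---------------------------------------------------------:|
| Overall     | 0.711 | 0.590 |
| Wave Fifth  | 0.868 | 0.686 |
| Wave First  | 0.779 | 0.575 |
| Wave Fourth | 0.470 | 0.549 |
| Wave Second | 0.859 | 0.491 |
| Wave Sixth  | 0.623 | 0.911 |
| Wave Third  | 0.479 | 0.561 |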

# # Display detailed results table
# knitr::kable(result$similarity_scores, 
#              caption = "Detailed Similarity Scores",
#              digits = 3)

# Display plots
# Access plots by variable name
result$plots$overall

result$plots[["vote"]]

result$plots[["vote2022"]]

result$plots[["pe_left_center_right"]]

3 Detailed Voting & Political Orientation Patterns

af_plot_combinations_chart(df, c("religiosity", "vote"))

af_plot_combinations_chart(df, c("religiosity", "vote2022"))

af_plot_combinations_chart(df, c("religiosity", "pe_left_center_right"))