Load dataset from IL project file

df <- as.data.frame(readRDS("Israel Survey/data/il_pe.RDS"))

1 A Trick of the Tail

Analyzing levels of extremism and their trends requires identifying the more extremist part of the population i.e. the tail of the distribution. The selected tail should have enough samples for statistical analysis and enough resolution (number of unique values) differentiating the samples levels of extremism. In this analysis we compare three methods of calculating a threshold that will identify the tail: Quantile, Median + K x MAD, and Median + K x Qn. The population with extremism level above the threshold will be considered as the more extremist part of the population.

1.1 The Main Challenge

Calculating the extremist threshold create a challenge due to two main reasons: + The size of the tail is too small (we want ~ 100 samples) + The size of the tail is too high (non-extremist population is included in it) + The tails does not have enough granularity between samples (low number of unique values)

1.2 Threshold Calculation Methods

Quantile-based (Top N%) approach simply defines the tail as the top N% of the data. For example, the top 15% might be considered the tail. It’s straightforward to understand and implement but doesn’t account for the shape of the distribution.

Pros:

  • Simple to understand and communicate
  • Directly controls the proportion of data in the tail
  • Consistent across different datasets

Cons:

  • Doesn’t account for the shape of the distribution

Median + K x MAD approach uses the median and the Median Absolute Deviation (MAD) to establish the extremism threshold. It’s robust against outliers and works well for non-normal distributions. The threshold is typically set at Median + k * MAD, where k is set to allow big enough size of the extremism tail.

Pros:

  • Robust to outliers, as it uses the median instead of the mean
  • Accounts for the spread of the data
  • Works well for non-normal distributions

Cons:

  • May not be as intuitive to interpret as percentile-based methods
  • The choice K * MAD can be somewhat arbitrary

Median + K x Qn approach uses the median and the Rousseeuw-Croux Qn estimator to establish the extremism threshold. The Qn estimator is more efficient than MAD and better handles skewed distributions common in extremism research. The threshold is set at Median + k * Qn.

Pros:

  • More efficient than MAD for non-normal distributions
  • Robust to outliers and asymmetric distributions
  • Better suited for extremism data which is often skewed

Cons:

  • Requires the robustbase package
  • Less commonly known than MAD
  • The choice of K factor can be somewhat arbitrary

2 Select most suitable K

# Define the three methods to compare
methods <- c("Quantile", "MAD", "Tau")
q_pct <- 0.85
k_factor <- 1.5
k_range <- seq(1, 2.5, 0.1)
min_tail_size <- 100
min_unique_values <- 5
moderate_threshold <- NULL
note <- paste0("Minimal tail size = ", min_tail_size, " | Minimal unique values = ", min_unique_values)
threshold_subtitle <- paste0("k factor = ", k_factor, " | Quantile percentage = ", q_pct*100, "%")
external_legend <- list(c("red", "dashed"), c("blue", "dashed"), c("darkgreen", "dashed"))
legend_labels <- c("Quantile", "Median+MAD", "Median+TAU")


# Set method parameters
method_params <- list(
  q_pct = q_pct,          # percentile for Quantile method
  k_factor = k_factor     # Factor for MAD, Sn, Qn, TAu methods
)

group_selections <- list(
  Wave = c("First", "Second", "Third", "Fourth", "Fifth", "Sixth")
)

2.1 Ideology

results <- af_test_robust_k_suitability(df, "pe_ideology", wave_var = "Wave", method = "MAD", 
                                        k_factor = k_factor, k_range = k_range,
                                        min_tail_size = min_tail_size, min_unique_values = min_unique_values, 
                                        moderate_threshold = moderate_threshold, suggest_optimal = TRUE)

# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "MAD K Factor Analysis Summary", note = note)
comprehensive_table
MAD K Factor Analysis Summary
K Factor: 1.5 | Waves: 6
Metric First Second Third Fourth Fifth Sixth
Sample Size 1608 1607 886 697 1524 1114
Tail Size 341 288 183 162 255 247
Tail % 21.2 17.9 20.7 23.2 16.7 22.2
Tail Mean 6.852 6.345 6.199 6.247 6.599 6.523
Unique Values 2 8 13 16 10 11
Suitable ✗ Not Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable
Suggested K 1 1.7 1.8 2 1.8 1.8
Minimal tail size = 100 | Minimal unique values = 5

results <- af_test_robust_k_suitability(df, "pe_ideology", wave_var = "Wave", method = "Tau", 
                                        k_factor = k_factor, k_range = k_range,
                                        min_tail_size = min_tail_size, min_unique_values = min_unique_values, 
                                        moderate_threshold = moderate_threshold, suggest_optimal = TRUE)

# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "Tau K Factor Analysis Summary", note = note)
comprehensive_table
Tau K Factor Analysis Summary
K Factor: 1.5 | Waves: 6
Metric First Second Third Fourth Fifth Sixth
Sample Size 1608 1607 886 697 1524 1114
Tail Size 282 187 130 106 172 180
Tail % 17.5 11.6 14.7 15.2 11.3 16.2
Tail Mean 7 6.567 6.425 6.574 6.842 6.754
Unique Values 1 2 4 5 4 4
Suitable ✗ Not Suitable ✗ Not Suitable ✗ Not Suitable ✓ Suitable ✗ Not Suitable ✗ Not Suitable
Suggested K 1 1.4 1.4 1.5 1.4 1.4
Minimal tail size = 100 | Minimal unique values = 5
result <- af_compare_tail_methods(df, 
                                  group_selections = group_selections, 
                                  measure_var = "pe_ideology",
                                  methods = methods,
                                  method_params = method_params)
  
# gt::gt(result$results)
# af_combine_plots(result$plots)
af_combine_plots(result$plots, 
                 main_title = "Cognitive (Ideology) Dimension - Tail Threshold",
                 subtitle = threshold_subtitle, note = "", common_legend = FALSE,
                 external_legend = external_legend, legend_labels = legend_labels)

2.2 Violence

results <- af_test_robust_k_suitability(df, "pe_violence", wave_var = "Wave", method = "MAD", 
                                        k_factor = k_factor, k_range = k_range,
                                        min_tail_size = min_tail_size, min_unique_values = min_unique_values, 
                                        moderate_threshold = moderate_threshold, suggest_optimal = TRUE)

# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "MAD K Factor Analysis Summary", note = note)
comprehensive_table
MAD K Factor Analysis Summary
K Factor: 1.5 | Waves: 6
Metric First Second Third Fourth Fifth Sixth
Sample Size 1608 1607 886 697 1524 1114
Tail Size 578 505 282 201 467 360
Tail % 35.9 31.4 31.8 28.8 30.6 32.3
Tail Mean 2.383 2.718 2.608 2.616 2.678 2.677
Unique Values 438 469 272 195 449 345
Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable
Suggested K 2.5 2.5 2.5 2.5 2.5 2.5
Minimal tail size = 100 | Minimal unique values = 5

results <- af_test_robust_k_suitability(df, "pe_violence", wave_var = "Wave",  method = "Tau", 
                                        k_factor = k_factor, k_range = k_range,
                                        min_tail_size = min_tail_size, min_unique_values = min_unique_values, 
                                        moderate_threshold = moderate_threshold, suggest_optimal = TRUE)

# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "Tau K Factor Analysis Summary", note = note)
comprehensive_table
Tau K Factor Analysis Summary
K Factor: 1.5 | Waves: 6
Metric First Second Third Fourth Fifth Sixth
Sample Size 1608 1607 886 697 1524 1114
Tail Size 481 427 236 173 398 302
Tail % 29.9 26.6 26.6 24.8 26.1 27.1
Tail Mean 2.594 2.95 2.843 2.815 2.884 2.902
Unique Values 399 407 230 170 387 292
Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable
Suggested K 2.5 2.5 2.5 2.5 2.5 2.5
Minimal tail size = 100 | Minimal unique values = 5
result <- af_compare_tail_methods(df, 
                                  group_selections = group_selections, 
                                  measure_var = "pe_violence",
                                  methods = methods,
                                  method_params = method_params)
  
# gt::gt(result$results)
af_combine_plots(result$plots, 
                 main_title = "Behavioral (Violence) Dimension - Tail Threshold",
                 subtitle = threshold_subtitle, note = "", common_legend = FALSE,
                 external_legend = external_legend, legend_labels = legend_labels)

2.3 Intolernace

results <- af_test_robust_k_suitability(df, "pe_intolerance", wave_var = "Wave", method = "MAD", 
                                        k_factor = k_factor, k_range = k_range,
                                        min_tail_size = min_tail_size, min_unique_values = min_unique_values, 
                                        moderate_threshold = moderate_threshold, suggest_optimal = TRUE)

# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "MAD K Factor Analysis Summary", note = note)
comprehensive_table
MAD K Factor Analysis Summary
K Factor: 1.5 | Waves: 6
Metric First Second Third Fourth Fifth Sixth
Sample Size 1608 1607 886 697 1524 1114
Tail Size 260 273 131 111 231 174
Tail % 16.2 17 14.8 15.9 15.2 15.6
Tail Mean 6.285 6.251 6.446 6.301 6.407 6.272
Unique Values 208 199 103 90 186 114
Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable ✓ Suitable
Suggested K 2.2 2.1 1.7 1.6 2.1 1.9
Minimal tail size = 100 | Minimal unique values = 5

results <- af_test_robust_k_suitability(df, "pe_intolerance", wave_var = "Wave", method = "Tau", 
                                        k_factor = k_factor, k_range = k_range,
                                        min_tail_size = min_tail_size, min_unique_values = min_unique_values, 
                                        moderate_threshold = moderate_threshold, suggest_optimal = TRUE)

# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "Tau K Factor Analysis Summary", note = note)
comprehensive_table
Tau K Factor Analysis Summary
K Factor: 1.5 | Waves: 6
Metric First Second Third Fourth Fifth Sixth
Sample Size 1608 1607 886 697 1524 1114
Tail Size 154 170 71 61 121 94
Tail % 9.6 10.6 8 8.8 7.9 8.4
Tail Mean 6.583 6.482 6.765 6.592 6.774 6.512
Unique Values 103 96 44 40 76 35
Suitable ✓ Suitable ✓ Suitable ✗ Not Suitable ✗ Not Suitable ✓ Suitable ✗ Not Suitable
Suggested K 1.6 1.6 1.3 1.2 1.6 1.4
Minimal tail size = 100 | Minimal unique values = 5
result <- af_compare_tail_methods(df, 
                                  group_selections = group_selections, 
                                  measure_var = "pe_intolerance",
                                  methods = methods,
                                  method_params = method_params)
  
# gt::gt(result$results)
af_combine_plots(result$plots, 
                 main_title = "Social (Intolerance) Dimension - Tail Threshold",
                 subtitle = threshold_subtitle, note = "", common_legend = FALSE,
                 external_legend = external_legend, legend_labels = legend_labels)

3 Community Check

3.1 Political Orientation

# Generate the warning table
warning_table <- af_extremism_tail_warnings(
  df = df,  # Your original survey data
  community_var = "pe_left_center_right",
  wave_var = "Wave",
  dimensions = c("pe_ideology", "pe_violence", "pe_intolerance"),
  threshold_type = "MAD",
  k_factor = 1.5,
  tail_size_warning = 10
)

# Display the table
warning_table
Extremism Tail Size Analysis by Community and Wave
Method: MAD | K-factor: 1.5 | Warning threshold: 10 individuals1
Community First Second Third Fourth Fifth Sixth
Cognitive
right 198 (22%) 157 (17.7%) 90 (16.6%) 70 (16.7%) 122 (13.7%) 107 (16.2%)
center 60 (12.9%) 47 (9.9%) 29 (12.2%) 41 (21.7%) 73 (16%) 80 (24.4%)
left 83 (34.2%) 84 (34.3%) 64 (60.4%) 51 (57.3%) 60 (33.7%) 60 (48.4%)
Behavioral
right 357 (39.6%) 342 (38.6%) 194 (35.8%) 116 (27.7%) 268 (30.1%) 184 (27.8%)
center 141 (30.4%) 114 (24%) 69 (29%) 59 (31.2%) 140 (30.6%) 127 (38.7%)
left 80 (32.9%) 49 (20%) 19 (17.9%) 26 (29.2%) 59 (33.1%) 49 (39.5%)
Social
right 204 (22.6%) 233 (26.3%) 110 (20.3%) 94 (22.4%) 188 (21.1%) 140 (21.1%)
center 49 (10.6%) 31 (6.5%) 18 (7.6%) 17 (9%) 38 (8.3%) 28 (8.5%)
left 7 (2.9%) 9 (3.7%) 3 (2.8%) 0 (0%) 5 (2.8%) 6 (4.8%)
1 Values shown as: count (percentage of community). Bold red indicates tail size below warning threshold.

3.2 Religiosity

# Generate the warning table
warning_table <- af_extremism_tail_warnings(
  df = df,  # Your original survey data
  community_var = "pe_religiosity",
  wave_var = "Wave",
  dimensions = c("pe_ideology", "pe_violence", "pe_intolerance"),
  threshold_type = "MAD",
  k_factor = 1.5,
  tail_size_warning = 10
)

# Display the table
warning_table
Extremism Tail Size Analysis by Community and Wave
Method: MAD | K-factor: 1.5 | Warning threshold: 10 individuals1
Community First Second Third Fourth Fifth Sixth
Cognitive
Secular 146 (21.3%) 151 (20.7%) 101 (27.2%) 90 (30.7%) 128 (21.1%) 148 (33%)
Religious 126 (18.1%) 84 (12.7%) 58 (15.3%) 57 (19.1%) 80 (11.6%) 68 (13.8%)
National Religious 24 (41.4%) 16 (30.8%) 8 (17.8%) 6 (15.8%) 11 (16.7%) 10 (19.2%)
Ultra-Orthodox 45 (26.9%) 37 (23%) 16 (17.6%) 9 (13.2%) 36 (22.2%) 21 (17.6%)
Behavioral
Secular 231 (33.7%) 209 (28.6%) 121 (32.5%) 84 (28.7%) 198 (32.7%) 166 (37%)
Religious 273 (39.2%) 225 (33.9%) 131 (34.7%) 100 (33.6%) 219 (31.7%) 150 (30.4%)
National Religious 18 (31%) 10 (19.2%) 7 (15.6%) 5 (13.2%) 12 (18.2%) 9 (17.3%)
Ultra-Orthodox 56 (33.5%) 61 (37.9%) 23 (25.3%) 12 (17.6%) 38 (23.5%) 35 (29.4%)
Social
Secular 56 (8.2%) 66 (9%) 42 (11.3%) 38 (13%) 34 (5.6%) 40 (8.9%)
Religious 129 (18.5%) 126 (19%) 59 (15.6%) 50 (16.8%) 144 (20.9%) 84 (17%)
National Religious 12 (20.7%) 15 (28.8%) 7 (15.6%) 7 (18.4%) 15 (22.7%) 14 (26.9%)
Ultra-Orthodox 63 (37.7%) 66 (41%) 23 (25.3%) 16 (23.5%) 38 (23.5%) 36 (30.3%)
1 Values shown as: count (percentage of community). Bold red indicates tail size below warning threshold.