Load dataset from IL project file
Analyzing levels of extremism and their trends requires identifying the more extremist part of the population i.e. the tail of the distribution. The selected tail should have enough samples for statistical analysis and enough resolution (number of unique values) differentiating the samples levels of extremism. In this analysis we compare three methods of calculating a threshold that will identify the tail: Quantile, Median + K x MAD, and Median + K x Qn. The population with extremism level above the threshold will be considered as the more extremist part of the population.
Calculating the extremist threshold create a challenge due to two main reasons: + The size of the tail is too small (we want ~ 100 samples) + The size of the tail is too high (non-extremist population is included in it) + The tails does not have enough granularity between samples (low number of unique values)
Quantile-based (Top N%) approach simply defines the tail as the top N% of the data. For example, the top 15% might be considered the tail. It’s straightforward to understand and implement but doesn’t account for the shape of the distribution.
Pros:
Cons:
Median + K x MAD approach uses the median and the Median Absolute Deviation (MAD) to establish the extremism threshold. It’s robust against outliers and works well for non-normal distributions. The threshold is typically set at Median + k * MAD, where k is set to allow big enough size of the extremism tail.
Pros:
Cons:
Median + K x Qn approach uses the median and the Rousseeuw-Croux Qn estimator to establish the extremism threshold. The Qn estimator is more efficient than MAD and better handles skewed distributions common in extremism research. The threshold is set at Median + k * Qn.
Pros:
Cons:
# Define the three methods to compare
methods <- c("Quantile", "MAD", "Tau")
q_pct <- 0.85
k_factor <- 1.5
k_range <- seq(1, 2.5, 0.1)
min_tail_size <- 100
min_unique_values <- 5
moderate_threshold <- NULL
note <- paste0("Minimal tail size = ", min_tail_size, " | Minimal unique values = ", min_unique_values)
threshold_subtitle <- paste0("k factor = ", k_factor, " | Quantile percentage = ", q_pct*100, "%")
external_legend <- list(c("red", "dashed"), c("blue", "dashed"), c("darkgreen", "dashed"))
legend_labels <- c("Quantile", "Median+MAD", "Median+TAU")
# Set method parameters
method_params <- list(
q_pct = q_pct, # percentile for Quantile method
k_factor = k_factor # Factor for MAD, Sn, Qn, TAu methods
)
group_selections <- list(
Wave = c("First", "Second", "Third", "Fourth", "Fifth", "Sixth")
)
results <- af_test_robust_k_suitability(df, "pe_ideology", wave_var = "Wave", method = "MAD",
k_factor = k_factor, k_range = k_range,
min_tail_size = min_tail_size, min_unique_values = min_unique_values,
moderate_threshold = moderate_threshold, suggest_optimal = TRUE)
# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "MAD K Factor Analysis Summary", note = note)
comprehensive_table
MAD K Factor Analysis Summary | ||||||
K Factor: 1.5 | Waves: 6 | ||||||
Metric | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Sample Size | 1608 | 1607 | 886 | 697 | 1524 | 1114 |
Tail Size | 341 | 288 | 183 | 162 | 255 | 247 |
Tail % | 21.2 | 17.9 | 20.7 | 23.2 | 16.7 | 22.2 |
Tail Mean | 6.852 | 6.345 | 6.199 | 6.247 | 6.599 | 6.523 |
Unique Values | 2 | 8 | 13 | 16 | 10 | 11 |
Suitable | ✗ Not Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable |
Suggested K | 1 | 1.7 | 1.8 | 2 | 1.8 | 1.8 |
Minimal tail size = 100 | Minimal unique values = 5 |
results <- af_test_robust_k_suitability(df, "pe_ideology", wave_var = "Wave", method = "Tau",
k_factor = k_factor, k_range = k_range,
min_tail_size = min_tail_size, min_unique_values = min_unique_values,
moderate_threshold = moderate_threshold, suggest_optimal = TRUE)
# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "Tau K Factor Analysis Summary", note = note)
comprehensive_table
Tau K Factor Analysis Summary | ||||||
K Factor: 1.5 | Waves: 6 | ||||||
Metric | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Sample Size | 1608 | 1607 | 886 | 697 | 1524 | 1114 |
Tail Size | 282 | 187 | 130 | 106 | 172 | 180 |
Tail % | 17.5 | 11.6 | 14.7 | 15.2 | 11.3 | 16.2 |
Tail Mean | 7 | 6.567 | 6.425 | 6.574 | 6.842 | 6.754 |
Unique Values | 1 | 2 | 4 | 5 | 4 | 4 |
Suitable | ✗ Not Suitable | ✗ Not Suitable | ✗ Not Suitable | ✓ Suitable | ✗ Not Suitable | ✗ Not Suitable |
Suggested K | 1 | 1.4 | 1.4 | 1.5 | 1.4 | 1.4 |
Minimal tail size = 100 | Minimal unique values = 5 |
result <- af_compare_tail_methods(df,
group_selections = group_selections,
measure_var = "pe_ideology",
methods = methods,
method_params = method_params)
# gt::gt(result$results)
# af_combine_plots(result$plots)
af_combine_plots(result$plots,
main_title = "Cognitive (Ideology) Dimension - Tail Threshold",
subtitle = threshold_subtitle, note = "", common_legend = FALSE,
external_legend = external_legend, legend_labels = legend_labels)
results <- af_test_robust_k_suitability(df, "pe_violence", wave_var = "Wave", method = "MAD",
k_factor = k_factor, k_range = k_range,
min_tail_size = min_tail_size, min_unique_values = min_unique_values,
moderate_threshold = moderate_threshold, suggest_optimal = TRUE)
# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "MAD K Factor Analysis Summary", note = note)
comprehensive_table
MAD K Factor Analysis Summary | ||||||
K Factor: 1.5 | Waves: 6 | ||||||
Metric | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Sample Size | 1608 | 1607 | 886 | 697 | 1524 | 1114 |
Tail Size | 578 | 505 | 282 | 201 | 467 | 360 |
Tail % | 35.9 | 31.4 | 31.8 | 28.8 | 30.6 | 32.3 |
Tail Mean | 2.383 | 2.718 | 2.608 | 2.616 | 2.678 | 2.677 |
Unique Values | 438 | 469 | 272 | 195 | 449 | 345 |
Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable |
Suggested K | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 |
Minimal tail size = 100 | Minimal unique values = 5 |
results <- af_test_robust_k_suitability(df, "pe_violence", wave_var = "Wave", method = "Tau",
k_factor = k_factor, k_range = k_range,
min_tail_size = min_tail_size, min_unique_values = min_unique_values,
moderate_threshold = moderate_threshold, suggest_optimal = TRUE)
# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "Tau K Factor Analysis Summary", note = note)
comprehensive_table
Tau K Factor Analysis Summary | ||||||
K Factor: 1.5 | Waves: 6 | ||||||
Metric | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Sample Size | 1608 | 1607 | 886 | 697 | 1524 | 1114 |
Tail Size | 481 | 427 | 236 | 173 | 398 | 302 |
Tail % | 29.9 | 26.6 | 26.6 | 24.8 | 26.1 | 27.1 |
Tail Mean | 2.594 | 2.95 | 2.843 | 2.815 | 2.884 | 2.902 |
Unique Values | 399 | 407 | 230 | 170 | 387 | 292 |
Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable |
Suggested K | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 |
Minimal tail size = 100 | Minimal unique values = 5 |
result <- af_compare_tail_methods(df,
group_selections = group_selections,
measure_var = "pe_violence",
methods = methods,
method_params = method_params)
# gt::gt(result$results)
af_combine_plots(result$plots,
main_title = "Behavioral (Violence) Dimension - Tail Threshold",
subtitle = threshold_subtitle, note = "", common_legend = FALSE,
external_legend = external_legend, legend_labels = legend_labels)
results <- af_test_robust_k_suitability(df, "pe_intolerance", wave_var = "Wave", method = "MAD",
k_factor = k_factor, k_range = k_range,
min_tail_size = min_tail_size, min_unique_values = min_unique_values,
moderate_threshold = moderate_threshold, suggest_optimal = TRUE)
# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "MAD K Factor Analysis Summary", note = note)
comprehensive_table
MAD K Factor Analysis Summary | ||||||
K Factor: 1.5 | Waves: 6 | ||||||
Metric | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Sample Size | 1608 | 1607 | 886 | 697 | 1524 | 1114 |
Tail Size | 260 | 273 | 131 | 111 | 231 | 174 |
Tail % | 16.2 | 17 | 14.8 | 15.9 | 15.2 | 15.6 |
Tail Mean | 6.285 | 6.251 | 6.446 | 6.301 | 6.407 | 6.272 |
Unique Values | 208 | 199 | 103 | 90 | 186 | 114 |
Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable | ✓ Suitable |
Suggested K | 2.2 | 2.1 | 1.7 | 1.6 | 2.1 | 1.9 |
Minimal tail size = 100 | Minimal unique values = 5 |
results <- af_test_robust_k_suitability(df, "pe_intolerance", wave_var = "Wave", method = "Tau",
k_factor = k_factor, k_range = k_range,
min_tail_size = min_tail_size, min_unique_values = min_unique_values,
moderate_threshold = moderate_threshold, suggest_optimal = TRUE)
# Comprehensive multi-metric table
comprehensive_table <- af_create_comprehensive_k_table(results, title = "Tau K Factor Analysis Summary", note = note)
comprehensive_table
Tau K Factor Analysis Summary | ||||||
K Factor: 1.5 | Waves: 6 | ||||||
Metric | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Sample Size | 1608 | 1607 | 886 | 697 | 1524 | 1114 |
Tail Size | 154 | 170 | 71 | 61 | 121 | 94 |
Tail % | 9.6 | 10.6 | 8 | 8.8 | 7.9 | 8.4 |
Tail Mean | 6.583 | 6.482 | 6.765 | 6.592 | 6.774 | 6.512 |
Unique Values | 103 | 96 | 44 | 40 | 76 | 35 |
Suitable | ✓ Suitable | ✓ Suitable | ✗ Not Suitable | ✗ Not Suitable | ✓ Suitable | ✗ Not Suitable |
Suggested K | 1.6 | 1.6 | 1.3 | 1.2 | 1.6 | 1.4 |
Minimal tail size = 100 | Minimal unique values = 5 |
result <- af_compare_tail_methods(df,
group_selections = group_selections,
measure_var = "pe_intolerance",
methods = methods,
method_params = method_params)
# gt::gt(result$results)
af_combine_plots(result$plots,
main_title = "Social (Intolerance) Dimension - Tail Threshold",
subtitle = threshold_subtitle, note = "", common_legend = FALSE,
external_legend = external_legend, legend_labels = legend_labels)
# Generate the warning table
warning_table <- af_extremism_tail_warnings(
df = df, # Your original survey data
community_var = "pe_left_center_right",
wave_var = "Wave",
dimensions = c("pe_ideology", "pe_violence", "pe_intolerance"),
threshold_type = "MAD",
k_factor = 1.5,
tail_size_warning = 10
)
# Display the table
warning_table
Extremism Tail Size Analysis by Community and Wave | ||||||
Method: MAD | K-factor: 1.5 | Warning threshold: 10 individuals1 | ||||||
Community | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Cognitive | ||||||
right | 198 (22%) | 157 (17.7%) | 90 (16.6%) | 70 (16.7%) | 122 (13.7%) | 107 (16.2%) |
center | 60 (12.9%) | 47 (9.9%) | 29 (12.2%) | 41 (21.7%) | 73 (16%) | 80 (24.4%) |
left | 83 (34.2%) | 84 (34.3%) | 64 (60.4%) | 51 (57.3%) | 60 (33.7%) | 60 (48.4%) |
Behavioral | ||||||
right | 357 (39.6%) | 342 (38.6%) | 194 (35.8%) | 116 (27.7%) | 268 (30.1%) | 184 (27.8%) |
center | 141 (30.4%) | 114 (24%) | 69 (29%) | 59 (31.2%) | 140 (30.6%) | 127 (38.7%) |
left | 80 (32.9%) | 49 (20%) | 19 (17.9%) | 26 (29.2%) | 59 (33.1%) | 49 (39.5%) |
Social | ||||||
right | 204 (22.6%) | 233 (26.3%) | 110 (20.3%) | 94 (22.4%) | 188 (21.1%) | 140 (21.1%) |
center | 49 (10.6%) | 31 (6.5%) | 18 (7.6%) | 17 (9%) | 38 (8.3%) | 28 (8.5%) |
left | 7 (2.9%) | 9 (3.7%) | 3 (2.8%) | 0 (0%) | 5 (2.8%) | 6 (4.8%) |
1 Values shown as: count (percentage of community). Bold red indicates tail size below warning threshold. |
# Generate the warning table
warning_table <- af_extremism_tail_warnings(
df = df, # Your original survey data
community_var = "pe_religiosity",
wave_var = "Wave",
dimensions = c("pe_ideology", "pe_violence", "pe_intolerance"),
threshold_type = "MAD",
k_factor = 1.5,
tail_size_warning = 10
)
# Display the table
warning_table
Extremism Tail Size Analysis by Community and Wave | ||||||
Method: MAD | K-factor: 1.5 | Warning threshold: 10 individuals1 | ||||||
Community | First | Second | Third | Fourth | Fifth | Sixth |
---|---|---|---|---|---|---|
Cognitive | ||||||
Secular | 146 (21.3%) | 151 (20.7%) | 101 (27.2%) | 90 (30.7%) | 128 (21.1%) | 148 (33%) |
Religious | 126 (18.1%) | 84 (12.7%) | 58 (15.3%) | 57 (19.1%) | 80 (11.6%) | 68 (13.8%) |
National Religious | 24 (41.4%) | 16 (30.8%) | 8 (17.8%) | 6 (15.8%) | 11 (16.7%) | 10 (19.2%) |
Ultra-Orthodox | 45 (26.9%) | 37 (23%) | 16 (17.6%) | 9 (13.2%) | 36 (22.2%) | 21 (17.6%) |
Behavioral | ||||||
Secular | 231 (33.7%) | 209 (28.6%) | 121 (32.5%) | 84 (28.7%) | 198 (32.7%) | 166 (37%) |
Religious | 273 (39.2%) | 225 (33.9%) | 131 (34.7%) | 100 (33.6%) | 219 (31.7%) | 150 (30.4%) |
National Religious | 18 (31%) | 10 (19.2%) | 7 (15.6%) | 5 (13.2%) | 12 (18.2%) | 9 (17.3%) |
Ultra-Orthodox | 56 (33.5%) | 61 (37.9%) | 23 (25.3%) | 12 (17.6%) | 38 (23.5%) | 35 (29.4%) |
Social | ||||||
Secular | 56 (8.2%) | 66 (9%) | 42 (11.3%) | 38 (13%) | 34 (5.6%) | 40 (8.9%) |
Religious | 129 (18.5%) | 126 (19%) | 59 (15.6%) | 50 (16.8%) | 144 (20.9%) | 84 (17%) |
National Religious | 12 (20.7%) | 15 (28.8%) | 7 (15.6%) | 7 (18.4%) | 15 (22.7%) | 14 (26.9%) |
Ultra-Orthodox | 63 (37.7%) | 66 (41%) | 23 (25.3%) | 16 (23.5%) | 38 (23.5%) | 36 (30.3%) |
1 Values shown as: count (percentage of community). Bold red indicates tail size below warning threshold. |