Understanding why removing outliers with the 1.5 interquantile range is a probably a bad idea …

Case 1 - strategic voters

Once upon a time there was the elections in the IFSA Forest there are two candidates: Ash annd Fagus

The job to do is to expand the a nearby abandoned farmland, and cleraly Ash is better at it than Fagus. Let’s consider that that Ash real score should be 9 while Fagus 6.

As we all know Fagus is extremely arrogant and cannot tolerate other trees, so asks their supporters to give him 10 and 0 to Ash.

This doesn’t look good so outliers are removed…

Let’s consider n_F the number of Fagus supporters (hence strategic voters) and see what happens for different numbers of n_F. To simply calculations there are 100 voters in the forest election.

# Define voting system
no_outlier <- function(votes) mean(votes)

remove_outliers <-  function(votes) {
  low <- quantile(votes, .25) - 1.5 * IQR(votes)
  high <- quantile(votes, .75) + 1.5 * IQR(votes)

  ifelse(between(votes, low, high), votes, NA)
}

outlier_simple <- function(votes){
  remove_outliers(votes) %>%
    mean(na.rm = TRUE)
}

outlier_mean <- function(votes){
  votes <- remove_outliers(votes)
  votes %>%
    replace_na(mean(votes, na.rm = TRUE)) %>%
    mean()
}

outlier_median <- function(votes){
  votes <- remove_outliers(votes)
  votes %>%
    replace_na(median(votes, na.rm = TRUE)) %>%
    mean()
}
calc_elections <- function(votes){
    votes %>%
    pivot_longer(everything(), names_to = "Tree", values_to = "votes") %>%
    group_by(Tree) %>%
    summarize(
      no_outlier = no_outlier(votes),
      outlier_simple = outlier_simple(votes),
      outlier_mean = outlier_mean(votes),
      outlier_median = outlier_median(votes),
      n_out = remove_outliers(votes) %>% is.na() %>%  sum()
    )
}
# Compare voting systems
voting_outcome_strategic <- function(n_F) {
  tibble(Ash = c(rep(9, (100 - n_F)), rep(0, n_F)),
         Fagus = c(rep(6, (100 - n_F)), rep(10, n_F))) %>%
         calc_elections()
}
voting_outcome_strategic(5)
scenarios_forest <- tibble(
  n_F = seq(1, 100),
  outcome = map(n_F, voting_outcome_strategic)
)

The three different outliers replacement methods basic provide the same results so for simplicy are omitted in the plot

And actually we can see that the outlier removal has an effect only when there is a small number of strategic votes

scenarios_forest %>%
  unnest(outcome) %>%
  select(-outlier_mean, -outlier_median) %>%
  pivot_longer(c(-n_F, -Tree, -n_out), names_to = "type", values_to = "score") %>%
  ggplot(aes(n_F, score, colour = type, linetype = Tree)) +
  geom_line() +
  ggthemes::scale_color_colorblind() +
  labs(x = "Number of strategic votes (only for beech)")

If the number of strategic votes removing outliers first has no impact and then it actually help the Fagus!

scenarios_forest %>%
    unnest(outcome) %>%
    filter(Tree == "Ash") %>%
    mutate(outlier_effect = outlier_simple - no_outlier ) %>%
    ggplot(aes(n_F, outlier_effect)) +
        geom_line()

Case 2 - Honest voting

let’s now imagine that everyone got annoyed by the Fagus behaviour (we can dream right …) and the Fagus is replaced the Birch, which actually is as good as the Ash.

Now the inhabitants of the forest are all very honest and just give the candidate the score that they deserve depending on the benefit they receive from each Tree. So everyone gives 9 to the Ash and 9 to the Birch.

Now imagine that there is the mushroom that lives at the edge of the forest and he is allergic to the pollen of the Ash, poor mushroom for him having a new ash would be big problem (that’s honest voting that’s not strategy), but they understand that the ash is also a good tree so decided at the end of giving a score of 7 to the ash.

Let’s see what happens with the outliers vs no_outliers voting system…

votes <- tibble(
  Ash = c(rep(9, 99), 7),
  Birch = rep(9, 100)
)

calc_elections(votes)

Poor mushroom! He had an honest concern and now their completely valid vote was removed (will there be a mushroom revolution)

LS0tCnRpdGxlOiAiT3V0bGllciByZW1vdmFsIGZvciBJRlNBIGVsZWN0aW9ucyIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKYGBge3IsIGluY2x1ZGUgPSBGfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKdGhlbWVfc2V0KHRoZW1lX2J3KCkpCmBgYAoKVW5kZXJzdGFuZGluZyB3aHkgcmVtb3Zpbmcgb3V0bGllcnMgd2l0aCB0aGUgMS41IGludGVycXVhbnRpbGUgcmFuZ2UgaXMgYSBwcm9iYWJseSBhIGJhZCBpZGVhIC4uLgoKIyBDYXNlIDEgLSBzdHJhdGVnaWMgdm90ZXJzCgpPbmNlIHVwb24gYSB0aW1lIHRoZXJlIHdhcyB0aGUgZWxlY3Rpb25zIGluIHRoZSBJRlNBIEZvcmVzdCB0aGVyZSBhcmUgdHdvIGNhbmRpZGF0ZXM6IEFzaCBhbm5kIEZhZ3VzCgpUaGUgam9iIHRvIGRvIGlzIHRvIGV4cGFuZCB0aGUgYSBuZWFyYnkgYWJhbmRvbmVkIGZhcm1sYW5kLCBhbmQgY2xlcmFseSBBc2ggaXMgYmV0dGVyIGF0IGl0IHRoYW4gRmFndXMuIExldCdzIGNvbnNpZGVyIHRoYXQgdGhhdCBBc2ggcmVhbCBzY29yZSBzaG91bGQgYmUgOSB3aGlsZSBGYWd1cyA2LgoKQXMgd2UgYWxsIGtub3cgRmFndXMgaXMgZXh0cmVtZWx5IGFycm9nYW50IGFuZCBjYW5ub3QgdG9sZXJhdGUgb3RoZXIgdHJlZXMsIHNvIGFza3MgdGhlaXIgc3VwcG9ydGVycyB0byBnaXZlIGhpbSAxMCBhbmQgMCB0byBBc2guCgpUaGlzIGRvZXNuJ3QgbG9vayBnb29kIHNvIG91dGxpZXJzIGFyZSByZW1vdmVkLi4uCgoKTGV0J3MgY29uc2lkZXIgYG5fRmAgdGhlIG51bWJlciBvZiBGYWd1cyBzdXBwb3J0ZXJzIChoZW5jZSBzdHJhdGVnaWMgdm90ZXJzKSBhbmQgc2VlIHdoYXQgaGFwcGVucyBmb3IgZGlmZmVyZW50IG51bWJlcnMgb2YgYG5fRmAuClRvIHNpbXBseSBjYWxjdWxhdGlvbnMgdGhlcmUgYXJlIDEwMCB2b3RlcnMgaW4gdGhlIGZvcmVzdCBlbGVjdGlvbi4KCgpgYGB7cn0KIyBEZWZpbmUgdm90aW5nIHN5c3RlbQpub19vdXRsaWVyIDwtIGZ1bmN0aW9uKHZvdGVzKSBtZWFuKHZvdGVzKQoKcmVtb3ZlX291dGxpZXJzIDwtICBmdW5jdGlvbih2b3RlcykgewogIGxvdyA8LSBxdWFudGlsZSh2b3RlcywgLjI1KSAtIDEuNSAqIElRUih2b3RlcykKICBoaWdoIDwtIHF1YW50aWxlKHZvdGVzLCAuNzUpICsgMS41ICogSVFSKHZvdGVzKQoKICBpZmVsc2UoYmV0d2Vlbih2b3RlcywgbG93LCBoaWdoKSwgdm90ZXMsIE5BKQp9CgpvdXRsaWVyX3NpbXBsZSA8LSBmdW5jdGlvbih2b3Rlcyl7CiAgcmVtb3ZlX291dGxpZXJzKHZvdGVzKSAlPiUKICAgIG1lYW4obmEucm0gPSBUUlVFKQp9CgpvdXRsaWVyX21lYW4gPC0gZnVuY3Rpb24odm90ZXMpewogIHZvdGVzIDwtIHJlbW92ZV9vdXRsaWVycyh2b3RlcykKICB2b3RlcyAlPiUKICAgIHJlcGxhY2VfbmEobWVhbih2b3RlcywgbmEucm0gPSBUUlVFKSkgJT4lCiAgICBtZWFuKCkKfQoKb3V0bGllcl9tZWRpYW4gPC0gZnVuY3Rpb24odm90ZXMpewogIHZvdGVzIDwtIHJlbW92ZV9vdXRsaWVycyh2b3RlcykKICB2b3RlcyAlPiUKICAgIHJlcGxhY2VfbmEobWVkaWFuKHZvdGVzLCBuYS5ybSA9IFRSVUUpKSAlPiUKICAgIG1lYW4oKQp9CmBgYAoKCmBgYHtyfQpjYWxjX2VsZWN0aW9ucyA8LSBmdW5jdGlvbih2b3Rlcyl7CiAgICB2b3RlcyAlPiUKICAgIHBpdm90X2xvbmdlcihldmVyeXRoaW5nKCksIG5hbWVzX3RvID0gIlRyZWUiLCB2YWx1ZXNfdG8gPSAidm90ZXMiKSAlPiUKICAgIGdyb3VwX2J5KFRyZWUpICU+JQogICAgc3VtbWFyaXplKAogICAgICBub19vdXRsaWVyID0gbm9fb3V0bGllcih2b3RlcyksCiAgICAgIG91dGxpZXJfc2ltcGxlID0gb3V0bGllcl9zaW1wbGUodm90ZXMpLAogICAgICBvdXRsaWVyX21lYW4gPSBvdXRsaWVyX21lYW4odm90ZXMpLAogICAgICBvdXRsaWVyX21lZGlhbiA9IG91dGxpZXJfbWVkaWFuKHZvdGVzKSwKICAgICAgbl9vdXQgPSByZW1vdmVfb3V0bGllcnModm90ZXMpICU+JSBpcy5uYSgpICU+JSAgc3VtKCkKICAgICkKfQojIENvbXBhcmUgdm90aW5nIHN5c3RlbXMKdm90aW5nX291dGNvbWVfc3RyYXRlZ2ljIDwtIGZ1bmN0aW9uKG5fRikgewogIHRpYmJsZShBc2ggPSBjKHJlcCg5LCAoMTAwIC0gbl9GKSksIHJlcCgwLCBuX0YpKSwKICAgICAgICAgRmFndXMgPSBjKHJlcCg2LCAoMTAwIC0gbl9GKSksIHJlcCgxMCwgbl9GKSkpICU+JQogICAgICAgICBjYWxjX2VsZWN0aW9ucygpCn0KYGBgCgpgYGB7cn0Kdm90aW5nX291dGNvbWVfc3RyYXRlZ2ljKDUpCmBgYAoKYGBge3J9CnNjZW5hcmlvc19mb3Jlc3QgPC0gdGliYmxlKAogIG5fRiA9IHNlcSgxLCAxMDApLAogIG91dGNvbWUgPSBtYXAobl9GLCB2b3Rpbmdfb3V0Y29tZV9zdHJhdGVnaWMpCikKYGBgCgpUaGUgdGhyZWUgZGlmZmVyZW50IG91dGxpZXJzIHJlcGxhY2VtZW50IG1ldGhvZHMgYmFzaWMgcHJvdmlkZSB0aGUgc2FtZSByZXN1bHRzIHNvIGZvciBzaW1wbGljeSBhcmUgb21pdHRlZCBpbiB0aGUgcGxvdAoKQW5kIGFjdHVhbGx5IHdlIGNhbiBzZWUgdGhhdCB0aGUgb3V0bGllciByZW1vdmFsIGhhcyBhbiBlZmZlY3Qgb25seSB3aGVuIHRoZXJlIGlzIGEgc21hbGwgbnVtYmVyIG9mIHN0cmF0ZWdpYyB2b3RlcwoKYGBge3J9CnNjZW5hcmlvc19mb3Jlc3QgJT4lCiAgdW5uZXN0KG91dGNvbWUpICU+JQogIHNlbGVjdCgtb3V0bGllcl9tZWFuLCAtb3V0bGllcl9tZWRpYW4pICU+JQogIHBpdm90X2xvbmdlcihjKC1uX0YsIC1UcmVlLCAtbl9vdXQpLCBuYW1lc190byA9ICJ0eXBlIiwgdmFsdWVzX3RvID0gInNjb3JlIikgJT4lCiAgZ2dwbG90KGFlcyhuX0YsIHNjb3JlLCBjb2xvdXIgPSB0eXBlLCBsaW5ldHlwZSA9IFRyZWUpKSArCiAgZ2VvbV9saW5lKCkgKwogIGdndGhlbWVzOjpzY2FsZV9jb2xvcl9jb2xvcmJsaW5kKCkgKwogIGxhYnMoeCA9ICJOdW1iZXIgb2Ygc3RyYXRlZ2ljIHZvdGVzIChvbmx5IGZvciBiZWVjaCkiKQoKYGBgCgoKSWYgdGhlIG51bWJlciBvZiBzdHJhdGVnaWMgdm90ZXMgcmVtb3Zpbmcgb3V0bGllcnMgZmlyc3QgaGFzIG5vIGltcGFjdCBhbmQgdGhlbiBpdCBhY3R1YWxseSBoZWxwIHRoZSBGYWd1cyEKCmBgYHtyfQpzY2VuYXJpb3NfZm9yZXN0ICU+JQogICAgdW5uZXN0KG91dGNvbWUpICU+JQogICAgZmlsdGVyKFRyZWUgPT0gIkFzaCIpICU+JQogICAgbXV0YXRlKG91dGxpZXJfZWZmZWN0ID0gb3V0bGllcl9zaW1wbGUgLSBub19vdXRsaWVyICkgJT4lCiAgICBnZ3Bsb3QoYWVzKG5fRiwgb3V0bGllcl9lZmZlY3QpKSArCiAgICAgICAgZ2VvbV9saW5lKCkKYGBgCgoKIyBDYXNlIDIgLSBIb25lc3Qgdm90aW5nCgpsZXQncyBub3cgaW1hZ2luZSB0aGF0IGV2ZXJ5b25lIGdvdCBhbm5veWVkIGJ5IHRoZSBGYWd1cyBiZWhhdmlvdXIgKHdlIGNhbiBkcmVhbSByaWdodCAuLi4pIGFuZCB0aGUgRmFndXMgaXMgcmVwbGFjZWQgdGhlIEJpcmNoLCB3aGljaCBhY3R1YWxseSBpcyBhcyBnb29kIGFzIHRoZSBBc2guCgpOb3cgdGhlIGluaGFiaXRhbnRzIG9mIHRoZSBmb3Jlc3QgYXJlIGFsbCB2ZXJ5IGhvbmVzdCBhbmQganVzdCBnaXZlIHRoZSBjYW5kaWRhdGUgdGhlIHNjb3JlIHRoYXQgdGhleSBkZXNlcnZlIGRlcGVuZGluZyBvbiB0aGUgYmVuZWZpdCB0aGV5IHJlY2VpdmUgZnJvbSBlYWNoIFRyZWUuClNvIGV2ZXJ5b25lIGdpdmVzIDkgdG8gdGhlIEFzaCBhbmQgOSB0byB0aGUgQmlyY2guCgpOb3cgaW1hZ2luZSB0aGF0IHRoZXJlIGlzIHRoZSBtdXNocm9vbSB0aGF0IGxpdmVzIGF0IHRoZSBlZGdlIG9mIHRoZSBmb3Jlc3QgYW5kIGhlIGlzIGFsbGVyZ2ljIHRvIHRoZSBwb2xsZW4gb2YgdGhlIEFzaCwgcG9vciBtdXNocm9vbSBmb3IgaGltIGhhdmluZyBhIG5ldyBhc2ggd291bGQgYmUgYmlnIHByb2JsZW0gKHRoYXQncyBob25lc3Qgdm90aW5nIHRoYXQncyBub3Qgc3RyYXRlZ3kpLCBidXQgdGhleSB1bmRlcnN0YW5kIHRoYXQgdGhlIGFzaCBpcyBhbHNvIGEgZ29vZCB0cmVlIHNvIGRlY2lkZWQgYXQgdGhlIGVuZCBvZiBnaXZpbmcgYSBzY29yZSBvZiA3IHRvIHRoZSBhc2guCgpMZXQncyBzZWUgd2hhdCBoYXBwZW5zIHdpdGggdGhlIG91dGxpZXJzIHZzIG5vX291dGxpZXJzIHZvdGluZyBzeXN0ZW0uLi4KCmBgYHtyfQp2b3RlcyA8LSB0aWJibGUoCiAgQXNoID0gYyhyZXAoOSwgOTkpLCA3KSwKICBCaXJjaCA9IHJlcCg5LCAxMDApCikKCmNhbGNfZWxlY3Rpb25zKHZvdGVzKQpgYGAKClBvb3IgbXVzaHJvb20hIEhlIGhhZCBhbiBob25lc3QgY29uY2VybiBhbmQgbm93IHRoZWlyIGNvbXBsZXRlbHkgdmFsaWQgdm90ZSB3YXMgcmVtb3ZlZCAod2lsbCB0aGVyZSBiZSBhIG11c2hyb29tIHJldm9sdXRpb24p