Measuring Toxicity, Threats, and Attacks on Identity

Overview and Introduction

Basic bag-of-words approaches can measure the overall sentiment of tweets by identifying the proportion of negative versus positive words surrounding keywords. A more nuanced analysis is possible with the Perspective API, accessed here through the peRspective package. Using several trained machine learning models, it measures the perceived impact of a comment (in this case, a tweet) on a conversation. The API was designed to analyze “comments”, that is, “a single post to a web page’s comments section, a forum post, a message to a mailing list, a chat message…”.

The models used to score comments are Convolutional Neural Networks (CNNs) trained with GloVe word embeddings. They were built from thousands of comments drawn from online forums such as Wikipedia and The New York Times, each of which was human-coded to train the models. The API offers several models, including alpha models and experimental models. This analysis uses the alpha ‘TOXICITY’ model along with two experimental models: ‘IDENTITY_ATTACK’ and ‘THREAT’. The ‘SEXUALLY_EXPLICIT’ and ‘FLIRTATION’ experimental models are also explored, specifically in relation to tweets that mention women candidates.

The Models

The five models used in this analysis are defined as follows:

  • TOXICITY
    • rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion
  • IDENTITY_ATTACK
    • negative or hateful comments targeting someone because of their identity
  • THREAT
    • describes an intention to inflict pain, injury, or violence against an individual or group
  • SEXUALLY_EXPLICIT
    • contains references to sexual acts, body parts, or other lewd content
  • FLIRTATION
    • pickup lines, complimenting appearance, subtle sexual innuendos, etc.
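
To make the list above concrete, here is a minimal Python sketch of how these attribute names appear in a request to the Perspective API and how a score is read back. It is an illustration under stated assumptions, not the code used for this analysis: the helper names `build_request` and `extract_scores` are hypothetical, the JSON field names follow the API's documented request and response shape as I understand it and should be checked against the current documentation, and the sample response is hand-written with invented values. No network call is made.

```python
# Attribute names taken from the list above.
ATTRIBUTES = ["TOXICITY", "IDENTITY_ATTACK", "THREAT",
              "SEXUALLY_EXPLICIT", "FLIRTATION"]


def build_request(text, attributes=ATTRIBUTES, language="en"):
    """Build the JSON body for one comment (hypothetical helper).

    Each requested attribute is a key mapped to an empty config object.
    """
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_scores(response):
    """Return {attribute: summary score} from a parsed response body."""
    return {
        name: result["summaryScore"]["value"]
        for name, result in response.get("attributeScores", {}).items()
    }


# Hand-written response in the assumed shape; the values are invented.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.82, "type": "PROBABILITY"}},
        "THREAT": {"summaryScore": {"value": 0.07, "type": "PROBABILITY"}},
    }
}

scores = extract_scores(sample_response)
# scores == {"TOXICITY": 0.82, "THREAT": 0.07}
```

In practice the request body would be sent, together with an API key, to the API's comment-analysis endpoint, and the `language` argument would be set to `"fr"` for French tweets.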

Each of these models scores individual comments on a continuous probability scale from 0 to 1. The models return “model attribute scores” for each tweet, where a higher value indicates a greater likelihood that the tweet will be perceived as having that attribute. For example, the TOXICITY score is the predicted probability that a tweet will be perceived as rude, disrespectful, or unreasonable. Using these scores, we can investigate which topics elicit higher TOXICITY, THREAT, and IDENTITY_ATTACK scores and from whom, whether men or women are more likely to use toxic language, which parties have the most candidates using toxic language, and more.

Below I investigate each of the core issues outlined in the key issue sentiment section: the economy, immigration, the environment, Indigenous issues, and gender and feminism. Tweets have been grouped by topic according to the following custom topic dictionaries (note that French and English tweets are both scored using the Perspective API):

Custom English Topic Dictionary

  • Immigration
    • immigrant(s), immigrate, immigration, immigrations
    • migrant, migrate, migration
    • refuge, refugee, refugees, border
  • Economy
    • economic(s), economically, economy
    • job(s), employ, employee, employer, employment
    • gdp, worker(s), workforce, workplace
    • tax, taxation, taxable, taxes
    • budget, budget2019, business, businesses
  • Environment
    • environment(al), environmentalist, environmentally
    • climate, global warming, pipeline(s), sustainability, sustainable
    • carbon, kinder morgan, transmountain, trans mountain
  • Indigenous Issues
    • indigenous, native, metis, inuit, first nation(s), firstnation(s), aboriginal
    • treaty, chief, white paper, self government, reconciliation
  • Gender/Feminism
    • gender, feminism, feminist, women(s), girl, woman

Custom French Topic Dictionary

  • Immigration
    • immigrant(s), immigrate, immigration, immigrations, l’immigration
    • migration, migrant, migrateur, migratoire
    • refuge, refugee, refugees, border
  • Economy
    • economic(s), economically, economy, economie
    • job, emploi, emplois, employer(s), employeur, pib
    • ouvrir, ouvrier, workforce, workplace
    • tax, taxation, taxable, taxes, taxer
    • budget, budget2019, business, businesses, impôt, impôts
  • Environment
    • environment(al), environmentalist, environnement, environnemental
    • environnementaliste, environnementalistes, climat, climate, climatechange, climate change
    • global warming, pipeline, sustainable, sustainability, durabilité
    • carbon, carbone, carbone taxe, kinder morgan, transmountain, trans mountain
  • Indigenous Issues
    • indigène, indigenous, metis, inuit, inuites, inuits, first nation(s), firstnation(s), aboriginal
    • traité, treaty, chief, white paper, self government, reconciliation, réconciliation
  • Gender/Feminism
    • gender, sexes, féminisme, féministe, femme, women, girl, fille, woman
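
The dictionaries above can be applied to tweets with simple keyword matching. The Python sketch below shows one way this might be done; it is not the procedure used for this analysis, which the text does not specify. It assumes case-insensitive, whole-word matching, treats a parenthesized suffix such as `(s)` as optional, and lets a tweet belong to more than one topic. The function names are hypothetical, `TOPIC_TERMS` holds only an excerpt of the English dictionary, and the example tweets in the comments are invented.

```python
import re

# Excerpt of the English topic dictionary; "(s)" marks an optional suffix.
TOPIC_TERMS = {
    "Immigration": ["immigrant(s)", "immigration", "migrant", "refugee", "border"],
    "Economy": ["economy", "job(s)", "tax", "budget"],
    "Environment": ["environment(al)", "climate", "global warming", "pipeline(s)"],
}


def term_to_regex(term):
    """Turn a dictionary entry such as 'job(s)' into 'job(?:s)?'."""
    match = re.fullmatch(r"(.+?)\((\w+)\)", term)
    if match:
        stem, suffix = match.groups()
        return re.escape(stem) + "(?:" + re.escape(suffix) + ")?"
    return re.escape(term)


def compile_topics(topic_terms):
    """Compile one case-insensitive, whole-word pattern per topic."""
    compiled = {}
    for topic, terms in topic_terms.items():
        alternation = "|".join(term_to_regex(t) for t in terms)
        compiled[topic] = re.compile(r"\b(?:" + alternation + r")\b",
                                     re.IGNORECASE)
    return compiled


def topics_for(text, compiled):
    """Return every topic whose pattern appears in the text."""
    return [topic for topic, pattern in compiled.items()
            if pattern.search(text)]


compiled = compile_topics(TOPIC_TERMS)
# Invented example tweets:
# topics_for("New jobs and a lower tax rate", compiled) -> ["Economy"]
# topics_for("Good morning everyone", compiled) -> []
```

Whole-word matching keeps a term like "tax" from matching inside unrelated words, but it also means that inflected forms are matched only if they are listed in the dictionary, which may be why the dictionaries above enumerate several variants of each term.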

Tweets on Immigration

Measuring Toxicity, Threats, and Identity Attacks

Tweets on Indigenous Issues

Measuring Toxicity, Threats, and Identity Attacks