Set up Python within the R environment:

knitr::opts_chunk$set(echo = TRUE)

#reticulate allows Python code to run inside the R environment
library(reticulate)

#point reticulate at the conda environment to use
use_condaenv("r-reticulate", required=TRUE)

Import packages:

import pandas as pd
import numpy as np
import string
import re
import html
import unicodedata
import nltk
from nltk.corpus import stopwords
import emoji
import matplotlib.pyplot as plt
import datetime as dt
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.svm import SVC
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import tensorflow as tf
from transformers import (
    AutoTokenizer,
    TFBertForSequenceClassification,
    TFTrainingArguments
)
import datasets
import evaluate

Bring in data:

df = pd.read_csv('data_huang_devansh.csv', usecols=['Content','Label'])

See here for information about the data: https://data.mendeley.com/datasets/9sxpkmm8xn/1

This is a compilation of over 800 thousand text snippets, each hand-labeled for whether or not it constitutes hate speech.

The data is made up of only two columns:

- Content: the text of the tweet
- Label: the hand-coded value for whether the tweet is considered hate speech (1) or not (0)

Let’s lightly tidy and then glance at the data:

#retain only string values
df = df[df['Content'].apply(type)==str]

#remove duplicate Content values
df.drop_duplicates(subset='Content', inplace=True)
df.sample(10, random_state=11219)['Content']
## 740181    RT @geechiegal843: I act like ion ee kno dat b...
## 448765    XD FUCK! FUCK! FUUUUUUUUUUUUCK This is amazing...
## 515694    "\n\nand your atual post:\n\n Hello, Sorry I c...
## 410886    You suck Kat and Andre. Poorest form ever!\nOn...
## 742422                     Talk about pussy power 😒
## 663120                                              niggers
## 789583    Saturday night spent in bed on my own cus ever...
## 81354        ITS NOT A PERSONAL ATTACK HES MY BRO STOP B...
## 92492     Quick, tell someone with real power to block m...
## 401243    @iqy007 @alwalawalbaraaa @DanieleRaineri One w...
## Name: Content, dtype: object

Immediately I can tell there are some aspects of the text worth cleaning up for model-building purposes, including:

- Capitalization
- Stray spaces and punctuation
- User @ mentions
- The "RT" signifier for retweets

I also know from the data documentation that there are emojis, datetimes, and hyperlinks in the data. I should account for all of these; a quick spot check of how common these patterns are follows below.
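A rough, optional check of pattern frequency (the regexes here are illustrative only, slightly simpler than the pre-compiled ones used in the cleaning function later):

#rough share of rows containing each pattern (illustrative, not part of the pipeline)
print(df['Content'].str.startswith('RT').mean())                    #retweet marker
print(df['Content'].str.contains(r'@\w+', regex=True).mean())       #@mentions
print(df['Content'].str.contains(r'https?://', regex=True).mean())  #hyperlinks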

Let’s also quickly check our class distribution:

df['Label'].value_counts()
## Label
## 0    467033
## 1     93352
## Name: count, dtype: int64

While there are notably fewer instances of hate speech than non-hate speech, this is still a workable distribution. Further, in real-world social media moderation, it is common that only a small minority of content violates a hate speech policy.
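To see the imbalance as proportions rather than raw counts, pandas can normalize the counts directly (a quick optional check):

df['Label'].value_counts(normalize=True)  #roughly 0.83 non-hate vs 0.17 hate, given the counts above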

At this point, I think it’s best to sample the data down to about 50k rows. In an ideal world we would train on all of it, but via trial and error I have determined that training on the full ~500k rows is untenable with my setup.

At this stage, I’ll take my samples based on the proportions in the dataset.

#number of hate speech rows needed to preserve the full-data proportion in 50k
hs_sample_count = int(df['Label'].value_counts()[1]/len(df)*50000)
non_hs_sample_count = 50000 - hs_sample_count
#sample hate speech rows proportionally
df_hs = df[df['Label']==1].sample(hs_sample_count, random_state=11219)
#sample non hate speech rows to fill out the 50k
df_non_hs = df[df['Label']==0].sample(non_hs_sample_count, random_state=11219)
#concatenate
df1 = pd.concat([df_hs, df_non_hs], axis=0)
#shuffle to randomly order (note: shuffling df1, not re-sampling the full df)
df1 = df1.sample(frac=1, random_state=11219)

Cleaning

Our SVM and neural network may require different cleaning techniques, but both would benefit from some top-line, model-agnostic cleaning.

First let’s see how the emojis are presented in our text data, since sometimes they appear as the actual symbol and other times as a standardized code.

df1['emoji_list'] = df1['Content'].apply(emoji.emoji_list)

df1['emoji_count'] = df1['emoji_list'].apply(len)

df1.sort_values(by='emoji_count', ascending=False).iloc[5,0]
## 'Šupak meraklije haha😂😂😂😂😂😟😟😟😂😂😂😂'

This shows that the emojis themselves appear in the text and should be handled accordingly.
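One option worth noting (a hedged sketch only; the shared cleaner below leaves emojis in place) is to convert each symbol to a readable token with the emoji package, so downstream models can treat emojis like words:

#illustrative only, not applied in this pipeline
print(emoji.demojize('haha😂'))  #-> 'haha:face_with_tears_of_joy:' (exact name depends on emoji version)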

Now I can do some model-agnostic cleaning:

# Pre-compiled patterns
URL_RE      = re.compile(r"https?://\S+|www\.\S+", flags=re.IGNORECASE)
USERHANDLE_RE = re.compile(r"@\w+")             # matches @username
RT_RE       = re.compile(r"^(RT)\b\s*", flags=re.IGNORECASE)
WS_RE       = re.compile(r"\s+")

def clean_shared(text):
  
    text = str(text)
    # 1. HTML entity & Unicode NFC normalization
    text = html.unescape(text)
    text = unicodedata.normalize("NFC", text)

    # 2. Remove 'RT' at start
    text = RT_RE.sub("", text)

    # 3. Remove @mentions anywhere
    text = USERHANDLE_RE.sub("", text)

    # 4. Replace URLs
    text = URL_RE.sub("<URL>", text)

    # 5. Collapse whitespace
    text = WS_RE.sub(" ", text)

    # 6. Final trim & lowercase
    return text.strip().lower()

Run cleaning function:

#clean data
df1['text_clean'] = df1['Content'].apply(clean_shared)

#reduce back to only relevant columns
df1 = df1[['Content', 'text_clean', 'Label']]

See cleaned data:

df1.sample(20, random_state=11219)['text_clean']
## 596334                                         {{unblock|yo
## 363211    ` peace, shaad iko, thank you for referring th...
## 72297     ` == ha, ha! == you are jealous? dont be, no n...
## 616358       " you're welcome and good luck with the cup! "
## 253212    we are currently making the unreferenced claim...
## 586368    " radiation therapy implications jellytussle, ...
## 343630    ` == mind explaining this edit? == this one? i...
## 600782    i was last in newry in 2003, and they were cel...
## 750978    * lives in chicago * thinks gun control works ...
## 98321     ` :::i am not going to do biasing here but the...
## 39241     == evidence on adil's sockpuppetry == hi mores...
## 650232    venanalysis is not my main information source,...
## 6004      ` :::::::::oh mark, for shame! why lie? i didn...
## 334039    ` :::``1894...simple transformer substations, ...
## 391399    == wikicup 2015 fp nomination == hi there, jus...
## 250952    ` == ``please read`` notice == i'm curious why...
## 62127     == marc mysterio == hi, this kww editor appear...
## 414862    so twitter is calling kat crazy eyes!!!! bless...
## 726615    shit i wish it was socially acceptable for me ...
## 754426    i know she is very good friends with siobhan a...
## Name: text_clean, dtype: object

There is still a lot of stray punctuation: we can take that out for our SVM, but retain it for our neural network, where punctuation tends to carry semantic value.

SVM

I will borrow a function from my previous assignment for cleaning and lemmatizing my text.

#pre-compile once
NUM_RE       = re.compile(r"\d+")
PUNCT_REMOVE = string.punctuation.replace("<", "").replace(">", "")

def clean_and_lemmatize(text, lemmatizer):
    #remove digits
    text = NUM_RE.sub("", text)
    #remove punctuation (but keep < and >)
    text = text.translate(str.maketrans("", "", PUNCT_REMOVE))
    #collapse any new whitespace and trim
    text = re.sub(r"\s+", " ", text).strip()
    #tokenize & lemmatize
    tokens = nltk.word_tokenize(text)
    lemmas = [lemmatizer.lemmatize(t) for t in tokens]
    return " ".join(lemmas)
#note: nltk.word_tokenize and WordNetLemmatizer require the 'punkt' and 'wordnet'
#resources (nltk.download('punkt'); nltk.download('wordnet'))
lemmatizer = WordNetLemmatizer()

df1['text_clean_SVM'] = df1['text_clean'].apply(lambda x: clean_and_lemmatize(x, lemmatizer))
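A quick optional spot check on one row makes the difference between the two cleaned columns visible (assumes df1 from above):

#compare shared cleaning vs. SVM-specific cleaning for a single row
row = df1.sample(1, random_state=11219).iloc[0]
print(row['text_clean'])      #punctuation retained
print(row['text_clean_SVM'])  #digits/punctuation stripped, lemmatized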

Now I can vectorize the data. I’ll use TF-IDF vectorization, which weights each term by its frequency within a document, discounted by how common it is across all documents, rather than using raw counts. I’ll use a max_features value of 5000, so as not to overdo it given my relatively small sample.
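A tiny illustrative example of that weighting (not part of the pipeline): a word shared by every document gets a lower TF-IDF weight than words unique to one document.

#toy corpus: 'you' appears in every document
toy = ['you are great', 'you are awful', 'you again']
demo = TfidfVectorizer()
weights = demo.fit_transform(toy).toarray()
print(demo.get_feature_names_out())  #['again' 'are' 'awful' 'great' 'you']
print(weights.round(2))              #'you' gets the smallest weight in each row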

#create the vectorizer
vectorizer_1g_5k = TfidfVectorizer(
    stop_words='english',
    max_features=5000     
)

#fit and transform the text column
X_SVM = vectorizer_1g_5k.fit_transform(df1['text_clean_SVM'])

And set y:

y_SVM = df1['Label']

Now we can split the data:

X_train_SVM, X_test_SVM, y_train_SVM, y_test_SVM = train_test_split(
    X_SVM, y_SVM, test_size=0.2, random_state=11219
)

While in the past I have experimented with different kernels for SVMs, the linear kernel is generally understood to be the best option for high-dimensional, sparse text features.
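As an aside, sklearn’s LinearSVC fits a similar linear decision boundary via liblinear and typically trains far faster on sparse TF-IDF matrices; a hedged alternative I am not using below:

#optional faster alternative to SVC(kernel='linear') for sparse text features (not used here)
from sklearn.svm import LinearSVC
clf_fast = LinearSVC()  #uses squared hinge loss by default, so results differ slightly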

#linear kernel
clf_linear1 = SVC(kernel='linear')
clf_linear1.fit(X_train_SVM, y_train_SVM)
pred_linear1 = clf_linear1.predict(X_test_SVM)
print("Linear Kernel")
## Linear Kernel
print(classification_report(y_test_SVM, pred_linear1))
##               precision    recall  f1-score   support
## 
##            0       0.89      0.97      0.93      8333
##            1       0.71      0.39      0.51      1667
## 
##     accuracy                           0.87     10000
##    macro avg       0.80      0.68      0.72     10000
## weighted avg       0.86      0.87      0.86     10000

Not bad accuracy overall, but performance is much lower for hate speech, especially recall (0.39). This could be an issue of class distribution and, specifically, the comparatively low number of hate speech examples, even if the overall sample of 50k should be sufficient.

Experiment 1: Class Distribution & Majority Undersampling

Since I’ll be downsampling already, it may be best to simultaneously under-sample the majority class to retain more examples of hate speech. While class balance isn’t required for my models, it may help ensure any accuracy gains are well distributed across classes.
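For reference, an alternative that avoids discarding majority-class data is to reweight the classes instead of resampling; sklearn supports this directly (a hedged sketch, not used in this write-up):

#reweight errors inversely proportional to class frequency instead of undersampling
clf_weighted = SVC(kernel='linear', class_weight='balanced')
#clf_weighted.fit(X_train_SVM, y_train_SVM) would then train on the full, imbalanced sample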

#select 25k hate speech samples
df_hs = df[df['Label']==1].sample(25000, random_state=11219)
#select 25k non hate speech samples
df_non_hs = df[df['Label']==0].sample(25000, random_state=11219)
#concatenate
df2 = pd.concat([df_hs, df_non_hs], axis=0)
#shuffle to randomly order
df2 = df2.sample(frac=1, random_state=11219)

Re-clean:

#clean data
df2['text_clean'] = df2['Content'].apply(clean_shared)

#reduce back to only relevant columns
df2 = df2[['Content', 'text_clean', 'Label']]

Re-clean (specific for SVM) and lemmatize:

lemmatizer = WordNetLemmatizer()

df2['text_clean_SVM'] = df2['text_clean'].apply(lambda x: clean_and_lemmatize(x, lemmatizer))

Re-vectorize:

#create the vectorizer
vectorizer_1g_5k = TfidfVectorizer(
    stop_words='english',
    max_features=5000     
)

#fit and transform the text column
X_SVM2 = vectorizer_1g_5k.fit_transform(df2['text_clean_SVM'])

And set y:

y_SVM2 = df2['Label']

Re-split the data:

X_train_SVM2, X_test_SVM2, y_train_SVM2, y_test_SVM2 = train_test_split(
    X_SVM2, y_SVM2, test_size=0.2, random_state=11219
)

Re-train model:

#linear kernel
clf_linear2 = SVC(kernel='linear')
clf_linear2.fit(X_train_SVM2, y_train_SVM2)
pred_linear2 = clf_linear2.predict(X_test_SVM2)

Accuracy report:

print("Linear Kernel")
## Linear Kernel
print(classification_report(y_test_SVM2, pred_linear2))
##               precision    recall  f1-score   support
## 
##            0       0.80      0.82      0.81      4977
##            1       0.82      0.80      0.81      5023
## 
##     accuracy                           0.81     10000
##    macro avg       0.81      0.81      0.81     10000
## weighted avg       0.81      0.81      0.81     10000

We can see a major positive impact on hate speech identification from the re-sampling, though there is a small dip in performance for non-hate speech.

2-grams and Max Features

Let’s see if we can boost performance by expanding the features to include bi-grams. While this massively increases the number of candidate features, the cap on overall max features should keep the analysis from getting out of hand. (A quick check of how many bi-grams survive the cap follows the vectorization step.)

Re-vectorize:

#create the vectorizer
vectorizer_2g_5k = TfidfVectorizer(
    stop_words='english',
    ngram_range=(1,2),
    max_features=5000
)

#fit and transform the text column
X_SVM3 = vectorizer_2g_5k.fit_transform(df2['text_clean_SVM'])

y_SVM3 = df2['Label']
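As the quick optional check promised above (assumes the fit just performed), we can count how many of the capped 5,000 features are actually bi-grams:

#bi-grams contain a space in their feature name
names_2g = vectorizer_2g_5k.get_feature_names_out()
print(sum(' ' in t for t in names_2g), 'of', len(names_2g), 'features are bi-grams')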

Re-split the data:

X_train_SVM3, X_test_SVM3, y_train_SVM3, y_test_SVM3 = train_test_split(
    X_SVM3, y_SVM3, test_size=0.2, random_state=11219
)

Re-train model:

#linear kernel
clf_linear3 = SVC(kernel='linear')
clf_linear3.fit(X_train_SVM3, y_train_SVM3)
pred_linear3 = clf_linear3.predict(X_test_SVM3)

Accuracy report:

print("Linear Kernel")
## Linear Kernel
print(classification_report(y_test_SVM3, pred_linear3))
##               precision    recall  f1-score   support
## 
##            0       0.80      0.82      0.81      4977
##            1       0.81      0.80      0.81      5023
## 
##     accuracy                           0.81     10000
##    macro avg       0.81      0.81      0.81     10000
## weighted avg       0.81      0.81      0.81     10000

Interestingly, we see very little change here. We can stick with 1-grams.

Let’s look at the most important features:

#extract feature names and class coefficients
feature_names = np.array(vectorizer_1g_5k.get_feature_names_out())
coefs = clf_linear2.coef_.toarray()  #shape (1, n_features) for a binary SVC
#coefs[0] holds the weights for the class-1 (hate) vs class-0 decision boundary

#identify top-k features for each class
def top_features(class_coef, names, k=20):
    #largest positive = pushes toward class 1 (hate)
    top_pos_idxs = np.argsort(class_coef)[-k:][::-1]
    #largest negative = pushes toward class 0 (non-hate)
    top_neg_idxs = np.argsort(class_coef)[:k]
    return pd.DataFrame({
        "token_pos": names[top_pos_idxs],
        "coef_pos":  class_coef[top_pos_idxs],
        "token_neg": names[top_neg_idxs],
        "coef_neg":  class_coef[top_neg_idxs],
    })

top20 = top_features(coefs[0], feature_names, k=20)
print(top20)
##    token_pos  coef_pos     token_neg  coef_neg
## 0      idiot  5.672962            xd -2.086868
## 1    asshole  4.778240       claimed -2.002501
## 2     retard  4.448382            ex -1.878344
## 3     faggot  4.401133        bihday -1.674916
## 4     stupid  4.126624         staff -1.672323
## 5   retarded  4.086403            rt -1.663113
## 6      loser  3.897166     tradition -1.656991
## 7       fuck  3.855585     mentioned -1.651375
## 8     sexist  3.763206     happiness -1.651329
## 9       twat  3.742369            pp -1.645938
## 10      hell  3.672426        stated -1.642616
## 11     moron  3.572866       youtube -1.634592
## 12    nigger  3.468446      affected -1.633192
## 13  bullshit  3.452559    stormfront -1.618396
## 14      spic  3.412869       average -1.608695
## 15      piss  3.353462     published -1.600724
## 16      crap  3.347687       pattern -1.594814
## 17   bastard  3.310819       article -1.580812
## 18     whore  3.237551          bias -1.572208
## 19     mongy  3.227393  particularly -1.572154

Neural Network

Building a FastText-style FFNN model: an embedding layer whose token vectors are averaged into a single document vector (GlobalAveragePooling1D), followed by a small dense classifier.
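As a toy illustration of the averaging step (entirely made-up 3-dimensional embedding values):

#what GlobalAveragePooling1D computes, shown on fake embeddings for three tokens
toy_emb = np.array([[0.1, 0.2, 0.3],
                    [0.4, 0.0, 0.1],
                    [0.2, 0.1, 0.0]])
print(toy_emb.mean(axis=0))  #a single fixed-length document vector, whatever the text length

One caveat: with padding='post' and no mask_zero, the padded positions are averaged in too; a known quirk of this simple setup.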

#prepare data
texts = df2['text_clean'].tolist()
labels = df2['Label'].values

#tokenize into integer sequences
max_words = 10000
maxlen    = 100

tokenizer = Tokenizer(num_words=max_words, oov_token='<OOV>')
tokenizer.fit_on_texts(texts)
seqs = tokenizer.texts_to_sequences(texts)
X = pad_sequences(seqs, maxlen=maxlen, padding='post')
y = labels

#train/val split
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=11219
)

#build FastText-style model
model = models.Sequential([
    layers.Embedding(input_dim=max_words, output_dim=100, input_length=maxlen),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

#train
es = callbacks.EarlyStopping(patience=2, restore_best_weights=True)
model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=128,
    callbacks=[es],
    verbose=2
)
## Epoch 1/10
## 313/313 - 2s - loss: 0.5745 - accuracy: 0.6957 - val_loss: 0.4624 - val_accuracy: 0.7813 - 2s/epoch - 6ms/step
## Epoch 2/10
## 313/313 - 2s - loss: 0.4265 - accuracy: 0.8115 - val_loss: 0.4059 - val_accuracy: 0.8260 - 2s/epoch - 5ms/step
## Epoch 3/10
## 313/313 - 1s - loss: 0.3717 - accuracy: 0.8416 - val_loss: 0.3994 - val_accuracy: 0.8234 - 1s/epoch - 5ms/step
## Epoch 4/10
## 313/313 - 1s - loss: 0.3387 - accuracy: 0.8571 - val_loss: 0.4000 - val_accuracy: 0.8249 - 1s/epoch - 5ms/step
## Epoch 5/10
## 313/313 - 1s - loss: 0.3221 - accuracy: 0.8663 - val_loss: 0.4074 - val_accuracy: 0.8240 - 1s/epoch - 5ms/step
## <tf_keras.src.callbacks.History object at 0x38ef320b0>
#evaluate
loss, acc = model.evaluate(X_val, y_val, verbose=0)
print(f'Validation accuracy: {acc:.3f}')
## Validation accuracy: 0.823

Full eval:

#predict on the validation set
y_pred = (model.predict(X_val) >= 0.5).astype(int).ravel()
#print a report
print(classification_report(y_val, y_pred, target_names=['non-hate','hate']))
##               precision    recall  f1-score   support
## 
##     non-hate       0.86      0.77      0.81      5000
##         hate       0.79      0.87      0.83      5000
## 
##     accuracy                           0.82     10000
##    macro avg       0.83      0.82      0.82     10000
## weighted avg       0.83      0.82      0.82     10000

Now let’s make the data as big as possible while retaining class balance:

df['Label'].value_counts()
## Label
## 0    467033
## 1     93352
## Name: count, dtype: int64
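The hard-coded numbers below are just the hate speech row count (93,352) and twice that (186,704): the largest balanced dataset takes every hate speech row plus an equal-sized sample of non-hate rows. A generic way to derive them (optional sketch):

#the balanced maximum is twice the size of the smaller class
n_min = df['Label'].value_counts().min()
print(n_min, 2*n_min)  #93352 186704 for this data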
df3 = pd.concat([df[df['Label']==0].sample(93352, random_state=11219),  #match the 93,352 hate speech rows
           df[df['Label']==1]],                                         #all hate speech rows
           axis=0).sample(frac=1, random_state=11219)                   #shuffle the 186,704 balanced rows
           
df3['text_clean'] = df3['Content'].apply(clean_shared)

df3['Label'].value_counts()
## Label
## 1    93352
## 0    93352
## Name: count, dtype: int64
#prepare data
texts = df3['text_clean'].tolist()
labels = df3['Label'].values

#tokenize into integer sequences
max_words = 10000
maxlen    = 100

tokenizer = Tokenizer(num_words=max_words, oov_token='<OOV>')
tokenizer.fit_on_texts(texts)
seqs = tokenizer.texts_to_sequences(texts)
X = pad_sequences(seqs, maxlen=maxlen, padding='post')
y = labels

#train/val split
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=11219
)

#build FastText-style model
model = models.Sequential([
    layers.Embedding(input_dim=max_words, output_dim=100, input_length=maxlen),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

#train
es = callbacks.EarlyStopping(patience=2, restore_best_weights=True)
model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=128,
    callbacks=[es],
    verbose=2
)
## Epoch 1/10
## 1167/1167 - 5s - loss: 0.4616 - accuracy: 0.7823 - val_loss: 0.3834 - val_accuracy: 0.8330 - 5s/epoch - 5ms/step
## Epoch 2/10
## 1167/1167 - 5s - loss: 0.3728 - accuracy: 0.8405 - val_loss: 0.3722 - val_accuracy: 0.8374 - 5s/epoch - 5ms/step
## Epoch 3/10
## 1167/1167 - 6s - loss: 0.3532 - accuracy: 0.8484 - val_loss: 0.3702 - val_accuracy: 0.8384 - 6s/epoch - 5ms/step
## Epoch 4/10
## 1167/1167 - 6s - loss: 0.3399 - accuracy: 0.8521 - val_loss: 0.3767 - val_accuracy: 0.8345 - 6s/epoch - 5ms/step
## Epoch 5/10
## 1167/1167 - 5s - loss: 0.3285 - accuracy: 0.8564 - val_loss: 0.3754 - val_accuracy: 0.8360 - 5s/epoch - 4ms/step
## <tf_keras.src.callbacks.History object at 0x38e314130>
#evaluate
loss, acc = model.evaluate(X_val, y_val, verbose=0)
print(f'Validation accuracy: {acc:.3f}')
## Validation accuracy: 0.838
#predict on the validation set
y_pred = (model.predict(X_val) >= 0.5).astype(int).ravel()
#print a report
print(classification_report(y_val, y_pred, target_names=['non-hate','hate']))
##               precision    recall  f1-score   support
## 
##     non-hate       0.85      0.82      0.84     18671
##         hate       0.83      0.86      0.84     18670
## 
##     accuracy                           0.84     37341
##    macro avg       0.84      0.84      0.84     37341
## weighted avg       0.84      0.84      0.84     37341

Change the dense layer width from 64 to 128:

#build FastText-style model
model = models.Sequential([
    layers.Embedding(input_dim=max_words, output_dim=100, input_length=maxlen),
    layers.GlobalAveragePooling1D(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

#train
es = callbacks.EarlyStopping(patience=2, restore_best_weights=True)
model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=128,
    callbacks=[es],
    verbose=2
)
## Epoch 1/10
## 1167/1167 - 6s - loss: 0.4508 - accuracy: 0.7881 - val_loss: 0.3860 - val_accuracy: 0.8314 - 6s/epoch - 5ms/step
## Epoch 2/10
## 1167/1167 - 5s - loss: 0.3675 - accuracy: 0.8413 - val_loss: 0.3715 - val_accuracy: 0.8385 - 5s/epoch - 5ms/step
## Epoch 3/10
## 1167/1167 - 6s - loss: 0.3491 - accuracy: 0.8492 - val_loss: 0.4087 - val_accuracy: 0.8183 - 6s/epoch - 5ms/step
## Epoch 4/10
## 1167/1167 - 6s - loss: 0.3358 - accuracy: 0.8541 - val_loss: 0.3702 - val_accuracy: 0.8380 - 6s/epoch - 5ms/step
## Epoch 5/10
## 1167/1167 - 6s - loss: 0.3231 - accuracy: 0.8575 - val_loss: 0.3742 - val_accuracy: 0.8375 - 6s/epoch - 5ms/step
## Epoch 6/10
## 1167/1167 - 6s - loss: 0.3148 - accuracy: 0.8605 - val_loss: 0.3810 - val_accuracy: 0.8357 - 6s/epoch - 5ms/step
## <tf_keras.src.callbacks.History object at 0x38e391630>
#evaluate
loss, acc = model.evaluate(X_val, y_val, verbose=0)
print(f'Validation accuracy: {acc:.3f}')
## Validation accuracy: 0.838
#predict on the validation set
y_pred = (model.predict(X_val) >= 0.5).astype(int).ravel()
#print a report
print(classification_report(y_val, y_pred, target_names=['non-hate','hate']))
##               precision    recall  f1-score   support
## 
##     non-hate       0.86      0.81      0.83     18671
##         hate       0.82      0.87      0.84     18670
## 
##     accuracy                           0.84     37341
##    macro avg       0.84      0.84      0.84     37341
## weighted avg       0.84      0.84      0.84     37341