Feature Exploration in Ad-Level CS Data

HTI Labs

Purpose/Context

This analysis explores the features in our online commercial sex (CS) data, and the relationships between them, both in aggregate and within specific domains. It currently focuses on features that are especially relevant to our matching algorithm. The analyses rely on a sample of approximately 100k ads of online CS data pulled in August 2021.

Missingness

Missingness by Feature

Note: values consisting only of empty brackets (i.e., '[]', an empty list) are counted as missing.
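The missingness rule above can be sketched in pandas as follows. This is a minimal illustration, not the pipeline used for this report: it assumes the ads sit in a pandas DataFrame in which list-valued features are stored either as Python lists or as the literal string '[]', and the column names in the example (`age`, `phone_numbers`, `eye_color`) and all values are hypothetical.

```python
import pandas as pd


def is_missing(value) -> bool:
    """Treat None/NaN, empty lists, and the literal string '[]' as missing."""
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    if isinstance(value, str):
        return value.strip() == "[]"
    return bool(pd.isna(value))


def missingness_by_feature(df: pd.DataFrame) -> pd.Series:
    """Share of ads (0 to 1) in which each feature is missing, highest first."""
    missing = df.apply(lambda col: col.map(is_missing))
    return missing.mean().sort_values(ascending=False)


# Hypothetical toy data; column names are illustrative only.
ads = pd.DataFrame({
    "age": [24, None, 31, 28],
    "phone_numbers": [["555-0100"], [], "[]", ["555-0101"]],
    "eye_color": ["brown", "blue", None, "[]"],
})
print(missingness_by_feature(ads))
```

Here `phone_numbers` and `eye_color` are each missing in half of the toy ads, because both an empty list and the string '[]' count as missing, while `age` is missing in a quarter of them.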

Feature Presence Correlation Matrix - Skip The Games

Feature Presence Correlation Matrix - List Crawler

Feature Presence Correlation Matrix - Erotic Review
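One way to build a per-domain feature presence correlation matrix is to turn each feature into a 0/1 presence indicator and take the Pearson correlation of those indicators (for binary variables this is the phi coefficient). The sketch below assumes that approach, the same '[]'-as-missing rule, and a hypothetical `domain` column; it is not necessarily how the matrices in this report were computed. A feature that is present (or absent) in every ad of a domain has zero variance, so its correlations come out as NaN.

```python
import pandas as pd


def is_present(value) -> bool:
    """A feature is present unless it is None/NaN, an empty list, or '[]'."""
    if isinstance(value, (list, tuple)):
        return len(value) > 0
    if isinstance(value, str):
        return value.strip() != "[]"
    return not bool(pd.isna(value))


def presence_correlation(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Pearson correlation of 0/1 presence indicators (phi coefficient)."""
    presence = df[features].apply(lambda col: col.map(is_present)).astype(int)
    return presence.corr()


def presence_correlation_by_domain(df, features, domain_col="domain"):
    """One presence correlation matrix per domain (domain_col is hypothetical)."""
    return {domain: presence_correlation(group, features)
            for domain, group in df.groupby(domain_col)}


# Hypothetical toy data; domain labels and columns are illustrative only.
ads = pd.DataFrame({
    "domain": ["skipthegames"] * 3 + ["listcrawler"] * 3,
    "age": [24, None, 31, 22, 29, None],
    "phone_numbers": [["555-0100"], [], ["555-0102"], ["555-0103"], "[]", []],
})
matrices = presence_correlation_by_domain(ads, ["age", "phone_numbers"])
print(matrices["listcrawler"])
```

In the toy data, age and phone numbers are always present or absent together on the first domain (correlation 1.0) but only partly co-occur on the second (correlation 0.5).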

Categorical Features

Domain

Gender

Racial Category

Hispanic/Latino

Multiple Sex Providers

Eye Color

Numeric Features

Age

Weight

Racial Category Count

Phone Number Count

Image Count

Feature Relationships: Categorical ~ Categorical
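A common way to inspect the relationship between two categorical features is a row-normalized contingency table. The sketch below uses `pandas.crosstab` for this; it is an assumed approach rather than the report's own code, and the `domain` and `gender` columns and their values are hypothetical. Ads with a missing value in either feature are excluded from the counts.

```python
import pandas as pd


def category_crosstab(df: pd.DataFrame, row: str, col: str) -> pd.DataFrame:
    """Share of each `col` level within each `row` level (rows sum to 1)."""
    return pd.crosstab(df[row], df[col], normalize="index")


# Hypothetical toy data; labels are illustrative only.
ads = pd.DataFrame({
    "domain": ["skipthegames", "skipthegames",
               "listcrawler", "listcrawler", "listcrawler"],
    "gender": ["female", "male", "female", "female", None],
})
table = category_crosstab(ads, "domain", "gender")
print(table)
```

Normalizing by row makes domains with very different ad volumes directly comparable; pass `normalize=False` (the default) to see raw counts instead.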

Feature Relationships: Categorical ~ Numeric

Gender & Age

Hispanic/Latino & Age

Survival Indicator Status & Age

Foreign Indicator Status & Age
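The categorical ~ numeric comparisons above (e.g., age by gender or by indicator status) can be summarized with per-level descriptive statistics. The sketch below is an assumed approach, not the report's code: it groups a numeric feature by a categorical one and reports count, mean, median, and quartiles, dropping ads where either value is missing. The `gender` and `age` columns and their values are hypothetical.

```python
import pandas as pd


def numeric_by_category(df: pd.DataFrame, category: str,
                        numeric: str = "age") -> pd.DataFrame:
    """Count, mean, median, and quartiles of `numeric` for each `category` level.

    Rows where either the category or the numeric value is missing are dropped.
    """
    subset = df[[category, numeric]].dropna()
    grouped = subset.groupby(category)[numeric]
    summary = grouped.agg(["count", "mean", "median"])
    summary["q1"] = grouped.quantile(0.25)
    summary["q3"] = grouped.quantile(0.75)
    return summary.sort_values("count", ascending=False)


# Hypothetical toy data; labels and values are illustrative only.
ads = pd.DataFrame({
    "gender": ["female", "female", "male", "female", None, "male"],
    "age": [24, 30, 27, None, 35, 33],
})
print(numeric_by_category(ads, "gender"))
```

Reporting the median and quartiles alongside the mean helps when advertised ages are skewed or clustered, since those statistics are less sensitive to a few extreme values.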

Feature Relationships: Numeric ~ Numeric