Feature Exploration in Ad-Level CS Data

HTI Labs

Purpose/Context

We want to understand the features in our online commercial sex (CS) data and the relationships between them, both in the aggregate and for specific domains. For now, we consider only the features that are especially relevant to our matching algorithm. The current analyses rely on a sample of approximately 100k online CS ads pulled in August 2021.

Missingness

Missingness by Feature

Note: values consisting only of empty brackets (i.e., '[]') are counted as missing.
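As a rough illustration of the rule in the note, the sketch below computes a per-feature missingness rate in plain Python. It is not the code used to produce these figures. The field names (`age`, `venue`, `eye_color`) and the helper names are hypothetical. Treating `None`, NaN, empty strings, and empty containers as missing is an assumption; the note itself only states the rule for '[]'.

```python
from typing import Any, Dict, Iterable, List, Mapping


def is_missing(value: Any) -> bool:
    """Return True if a raw ad field should be counted as missing.

    Only the '[]' rule comes from the note; the other cases are assumptions.
    """
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    if isinstance(value, str):
        # '[]' is how an empty list appears once serialized to text
        return value.strip() in ("", "[]")
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missingness_by_feature(
    ads: Iterable[Mapping[str, Any]], features: List[str]
) -> Dict[str, float]:
    """Return the share of ads in which each feature is missing."""
    ads = list(ads)
    if not ads:
        return {feature: 0.0 for feature in features}
    return {
        feature: sum(is_missing(ad.get(feature)) for ad in ads) / len(ads)
        for feature in features
    }


# Hypothetical records, for illustration only (not drawn from the real sample)
sample_ads = [
    {"age": 25, "venue": "['incall']", "eye_color": "[]"},
    {"age": None, "venue": "[]", "eye_color": "[]"},
    {"age": 31, "venue": "['outcall']", "eye_color": "brown"},
    {"age": 40, "venue": [], "eye_color": ""},
]

rates = missingness_by_feature(sample_ads, ["age", "venue", "eye_color"])
print(rates)
```

A feature that is absent from a record is also counted as missing here, because `ad.get(feature)` returns `None` for an absent key.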

Feature Presence Correlation Matrix - Skip The Games

Feature Presence Correlation Matrix - List Crawler

Feature Presence Correlation Matrix - Erotic Review

Categorical Features

Domain

Gender

Racial Category

Venue Category

Hispanic/Latino

Eye Color

Numeric Features

Age

Price Per Hour

Weight

Racial Category Count

Venue Count

Phone Number Count

Image Count

Feature Relationships: Categorical ~ Categorical

Venue Correlation - All Ads

Venue Correlation - Skip The Games Ads

Venue Correlation - List Crawler Ads

Venue Correlation - Erotic Review Ads

Feature Relationships: Categorical ~ Numeric

Gender & Age

Gender & Price

Venue & Age

Venue & Price

Hispanic/Latino Status & Age

Hispanic/Latino Status & Price

Survival Indicator Status & Age

Survival Indicator Status & Price

Foreign Indicator Status & Age

Foreign Indicator Status & Price

Feature Relationships: Numeric ~ Numeric

Age & Price - All Ages

Age & Price - Less than 50

Age & Price - Less than 30

Weight & Price