Feature Exploration in CS Profile Data

HTI Labs

Purpose/Context

We want to understand individual features, and the relationships between them, in our online CS profile data, both in aggregate and within specific domains. For now, we focus on features that are especially relevant to our matching algorithm. The current analyses rely on a sample of approximately 50k online CS profiles pulled in August 2021.

Missingness

Missingness by Feature

Note: values consisting only of empty brackets (i.e., '[]') are counted as missing.
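A minimal sketch of per-feature missingness under the empty-bracket rule, assuming each profile is a Python dict mapping feature names to raw values. The field names and records are hypothetical, and also treating None, NaN, empty strings, empty lists, and absent keys as missing is an assumption rather than the documented pipeline.

```python
import math

# Hypothetical string markers treated as "missing".
MISSING_STRINGS = {"", "[]"}


def is_missing(value):
    """True for None, NaN, empty lists/tuples, and empty or '[]' strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, (list, tuple)) and len(value) == 0:
        return True
    if isinstance(value, str) and value.strip() in MISSING_STRINGS:
        return True
    return False


def missingness_by_feature(profiles, features):
    """Fraction of profiles in which each feature is missing or absent."""
    n = len(profiles)
    return {f: sum(is_missing(p.get(f)) for p in profiles) / n for f in features}


# Toy records with hypothetical field names (not real data).
profiles = [
    {"age": 24, "eye_color": "[]", "images": ["a.jpg"]},
    {"age": None, "eye_color": "brown", "images": []},
    {"age": 31, "eye_color": "blue", "images": "[]"},
    {"age": 27, "images": ["b.jpg", "c.jpg"]},
]

print(missingness_by_feature(profiles, ["age", "eye_color", "images"]))
# {'age': 0.25, 'eye_color': 0.5, 'images': 0.5}
```

Here a key absent from a record counts the same as an empty value; whether the original analysis did the same is not stated.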

Feature Presence Correlation Matrix - Sip Sap
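The method behind the presence correlation matrix is not stated. One common approach, assumed in this sketch, is to turn each feature into a 0/1 presence indicator and take pairwise Pearson correlations, which for two binary indicators equals the phi coefficient. Field names and records are hypothetical; the same function can be run on any subset of profiles.

```python
import math


def is_present(value):
    """1 if the value is present; 0 for None, '', '[]', or an empty list."""
    if value is None or (isinstance(value, list) and not value):
        return 0
    if isinstance(value, str) and value.strip() in {"", "[]"}:
        return 0
    return 1


def pearson(x, y):
    """Pearson correlation of equal-length sequences; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def presence_correlation(profiles, features):
    """Pairwise correlation of presence indicators, keyed by (feature, feature)."""
    ind = {f: [is_present(p.get(f)) for p in profiles] for f in features}
    return {(f, g): pearson(ind[f], ind[g]) for f in features for g in features}


# Toy records with hypothetical field names (not real data).
profiles = [
    {"age": 24, "weight": 120, "eye_color": "blue"},
    {"age": 31, "weight": 135, "eye_color": "[]"},
    {"age": None, "weight": "", "eye_color": "brown"},
    {"eye_color": None},
]
corr = presence_correlation(profiles, ["age", "weight", "eye_color"])
print(corr[("age", "weight")], corr[("age", "eye_color")])
# 1.0 0.0
```

A value near 1 means two features tend to be filled in (or left empty) together; a feature that is never missing has a constant indicator, so its correlations come back as None.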

Categorical Features

Domain

Gender

Racial Category

Hispanic/Latino

Multiple Sex Providers

Eye Color
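The categorical features above are presumably summarized as category frequencies. This is a minimal sketch of such a frequency table, assuming dict records with hypothetical field names. It also assumes that multi-valued features (for example, a profile listing several racial categories) are stored as lists, with each listed category counted once, so shares for those features can sum to more than 1.

```python
from collections import Counter


def category_shares(profiles, feature, missing_label="(missing)"):
    """Share of profiles in each category of `feature`, most common first.

    Assumption: list-valued fields add one count per listed category.
    """
    counts = Counter()
    for p in profiles:
        value = p.get(feature)
        if value is None or value in ("", "[]") or value == []:
            counts[missing_label] += 1
        elif isinstance(value, list):
            counts.update(value)
        else:
            counts[value] += 1
    n = len(profiles)
    return {cat: c / n for cat, c in counts.most_common()}


# Toy records with hypothetical field names and values (not real data).
profiles = [
    {"gender": "female", "eye_color": "brown"},
    {"gender": "female", "eye_color": "[]"},
    {"gender": "male", "eye_color": "blue"},
    {"gender": None, "eye_color": "brown"},
]
print(category_shares(profiles, "gender"))
# {'female': 0.5, 'male': 0.25, '(missing)': 0.25}
```

Keeping an explicit "(missing)" bucket ties these distributions back to the missingness figures.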

Numeric Features

Age

Price Per Hour

Weight

Racial Category Count

Phone Number Count

Image Count
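The count features (Racial Category Count, Phone Number Count, Image Count) are presumably derived from list-valued fields. This sketch assumes that derivation, treats a missing list as zero items (an assumption), and computes a basic distribution summary that applies equally to Age, Price Per Hour, and Weight. The helper names and toy values are hypothetical.

```python
import statistics


def list_count(value):
    """Hypothetical count feature: number of items in a list-valued field.

    Assumption: None, '[]', or an empty list all count as 0 items.
    """
    return len(value) if isinstance(value, list) else 0


def summarize(values):
    """Count, mean, median, min, and max, skipping None; None if nothing is left."""
    vals = [v for v in values if v is not None]
    if not vals:
        return None
    return {
        "n": len(vals),
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
        "min": min(vals),
        "max": max(vals),
    }


# Toy values (not real data).
ages = [22, 25, None, 31, 40]
image_fields = [["a.jpg", "b.jpg"], [], "[]", None, ["c.jpg"]]

print(summarize(ages))
print(summarize([list_count(v) for v in image_fields]))
```

Counting a missing list as zero lowers the mean of a count feature, so it may be worth reporting the summary both with and without those rows.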

Feature Relationships: Categorical ~ Categorical
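The statistic used to relate pairs of categorical features is not stated. As one common choice, this sketch uses Cramér's V, which rescales the chi-squared statistic of a contingency table to lie between 0 (no association) and 1 (perfect association). Pairs with a missing value are dropped. The feature values are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences.

    Pairs with a None on either side are skipped; returns None when either
    variable has fewer than two observed categories.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    observed = Counter(pairs)                 # contingency table cells
    rows = Counter(x for x, _ in pairs)       # row totals
    cols = Counter(y for _, y in pairs)       # column totals
    k = min(len(rows), len(cols))
    if k < 2:
        return None
    chi2 = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Toy values (not real data).
gender = ["female", "female", "male", "male", None]
hispanic = ["yes", "yes", "no", "no", "yes"]
print(cramers_v(gender, hispanic))
# 1.0
```

Cramér's V does not indicate direction, so the contingency table itself is still needed to see which categories co-occur.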

Feature Relationships: Categorical ~ Numeric

Gender & Age

Gender & Price

Hispanic/Latino Status & Age

Hispanic/Latino Status & Price

Survival Indicator Status & Age

Survival Indicator Status & Price

Foreign Indicator Status & Age

Foreign Indicator Status & Price
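Each comparison above relates a numeric feature (age or price) to the levels of a categorical one (gender, Hispanic/Latino status, or an indicator flag). A grouped summary is a minimal way to tabulate such comparisons, and it works the same way with a boolean indicator as the group key. This sketch assumes dict records; the field names and values are hypothetical.

```python
import statistics
from collections import defaultdict


def grouped_summary(rows, group_key, value_key):
    """Count, median, and mean of `value_key` within each level of `group_key`.

    Rows where either field is None or absent are skipped.
    """
    groups = defaultdict(list)
    for r in rows:
        g, v = r.get(group_key), r.get(value_key)
        if g is None or v is None:
            continue
        groups[g].append(v)
    return {
        g: {"n": len(vs), "median": statistics.median(vs), "mean": statistics.mean(vs)}
        for g, vs in groups.items()
    }


# Toy records with hypothetical field names (not real data).
rows = [
    {"gender": "female", "age": 22},
    {"gender": "female", "age": 26},
    {"gender": "female", "age": 30},
    {"gender": "male", "age": 35},
    {"gender": "male", "age": 41},
    {"gender": "male", "age": None},
]
print(grouped_summary(rows, "gender", "age"))
```

Reporting the median alongside the mean helps when a feature such as price may be skewed by a few extreme values.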

Feature Relationships: Numeric ~ Numeric

Age & Price - All Ages

Age & Price - Ages Under 50

Age & Price - Ages Under 30

Weight & Price
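The correlation measure behind the numeric pairs is not stated. This sketch computes both Pearson correlation and Spearman rank correlation (Pearson on average ranks, which is less sensitive to outliers), with an optional strict age cutoff mirroring the under-50 and under-30 views above. Field names and toy values are hypothetical; Weight & Price works the same way with different keys.

```python
import math


def pearson(x, y):
    """Pearson correlation; None if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def age_price_correlations(rows, max_age=None):
    """Pearson and Spearman correlations of age vs. price, optionally for age < max_age."""
    pairs = [
        (r["age"], r["price"])
        for r in rows
        if r.get("age") is not None and r.get("price") is not None
        and (max_age is None or r["age"] < max_age)
    ]
    if len(pairs) < 2:
        return {"n": len(pairs), "pearson": None, "spearman": None}
    ages = [a for a, _ in pairs]
    prices = [p for _, p in pairs]
    return {
        "n": len(pairs),
        "pearson": pearson(ages, prices),
        "spearman": pearson(average_ranks(ages), average_ranks(prices)),
    }


# Toy records with hypothetical field names (not real data).
rows = [
    {"age": 22, "price": 100}, {"age": 25, "price": 150},
    {"age": 28, "price": 200}, {"age": 45, "price": 250},
    {"age": 60, "price": 300}, {"age": None, "price": 180},
]
for cutoff in (None, 50, 30):
    print(cutoff, age_price_correlations(rows, cutoff))
```

Comparing the full-range and truncated correlations shows whether an age-price relationship is driven mainly by older profiles or holds within younger ages as well.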