Purpose/Context:
Establish baseline person-level metrics using the Devontez dataset, which we consider to reflect reality ("golden") because it was manually vetted. These metrics serve as a reference point for evaluating our v2 matching processes: we apply those processes to the same data, calculate the same metrics, and compare the results.
A note on terminology/methods/decisions:
A “person” is inferred from the Worker.ID field. The Devontez data represents 208 persons.
Because each statistic is person-centric, ads with no Worker.ID are excluded.
"Ad counts across persons" counts every ad associated with a person, including duplicates, not just unique ads.
Both the mean and the median are reported. The median is the better measure of central tendency here because these distributions are right-skewed: it is robust to outliers, whereas a few persons with very large values pull the mean upward (in several of the summaries below, the mean even exceeds the 3rd quartile).
Age-related stats omit ads with no associated age.
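The person-level rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code that produced these statistics: the record layout and the keys `worker_id` (standing in for Worker.ID) and `age` are hypothetical.

```python
from collections import defaultdict


def ads_per_person(ads):
    """Count every ad per person (duplicates included); skip ads with no worker ID.

    Assumes each ad is a dict with a hypothetical "worker_id" key (for Worker.ID).
    """
    counts = defaultdict(int)
    for ad in ads:
        worker_id = ad.get("worker_id")
        if worker_id in (None, ""):
            continue  # an ad without a Worker.ID cannot be attributed to a person
        counts[worker_id] += 1  # all ads count, not only unique ones
    return dict(counts)


def ages_per_person(ads):
    """Collect the ages listed for each person, omitting ads with no age.

    Assumes a hypothetical "age" key that is None when no age is listed.
    """
    ages = defaultdict(list)
    for ad in ads:
        worker_id = ad.get("worker_id")
        age = ad.get("age")
        if worker_id in (None, "") or age is None:
            continue  # person-less or age-less ads are excluded
        ages[worker_id].append(age)
    return dict(ages)


# Toy records (hypothetical data, for illustration only).
ads = [
    {"ad_id": 1, "worker_id": "A", "age": 24},
    {"ad_id": 1, "worker_id": "A", "age": 24},   # duplicate ad: still counted
    {"ad_id": 2, "worker_id": "B", "age": None},  # no age: left out of age stats
    {"ad_id": 3, "worker_id": None, "age": 30},   # no Worker.ID: dropped entirely
]
print(ads_per_person(ads))   # {'A': 2, 'B': 1}
print(ages_per_person(ads))  # {'A': [24, 24]}
```

Keeping the missing-ID filter inside each function means every person-level statistic applies the same exclusion rule, which matters when the same functions are later run on v2 output.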
Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
1.0     4.0       13.0     57.7     43.0      1103.0

Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
1.000   1.000     2.000    4.034    4.000     66.000

Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
1.000   1.000     1.000    2.298    2.000     21.000

Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
1.000   1.000     3.000    4.226    5.000     53.000

Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
0.00    6.75      54.00    115.81   172.25    757.00

Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
1.000   1.000     2.000    2.274    3.000     12.000

Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
0.000   0.000     1.000    2.625    3.000     30.000
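These summaries appear to be the output of R's `summary()`, whose quartiles use R's default linear-interpolation quantile (type 7). That is why fractional quartiles such as 6.75 and 172.25 appear even when the underlying values are whole numbers. To compute the same six numbers for v2 results outside R, a minimal Python sketch is below. It assumes R's defaults were used; the function names are hypothetical. R also rounds the printed values for display, so comparisons should allow a small tolerance.

```python
import math


def r_quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (R type 7)."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)  # guard the upper edge (p == 1 or n == 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def r_summary(values):
    """Return (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) in the order R prints them."""
    if not values:
        raise ValueError("summary of an empty sequence is undefined")
    xs = sorted(values)
    return (
        xs[0],
        r_quantile(xs, 0.25),
        r_quantile(xs, 0.5),
        sum(xs) / len(xs),
        r_quantile(xs, 0.75),
        xs[-1],
    )


# A right-skewed toy sample: one outlier drags the mean past the 3rd quartile,
# while the median is unaffected.
print(r_summary([1, 2, 3, 4, 100]))  # (1, 2.0, 3.0, 22.0, 4.0, 100)
```

Feeding the per-person counts from the v2 matching output into `r_summary` would yield rows directly comparable to the golden summaries above.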