This is the best talk I’ve attended in over a year.

Data Journalism Principles:

  1. Story leads, data follows.
  2. Use rigorous but interpretable methods.
  3. Be accurate, be fast, be transparent.

The tidyverse is the tool of choice for working with data.

There is a fivethirtyeight R package.

Six Types of Data Stories

  1. Novelty
  2. Outlier
  3. Archetype
  4. Trend
  5. Debunking
  6. Forecast

Novelty Stories

Basic questions come first.

Danger: Triviality

Tactic: Simple Summaries

Ask yourself: Is this data meaningful to others?
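A minimal sketch of what a "simple summary" might look like. The talk worked in R with the tidyverse; this version uses Python's standard library instead, and the values are made up purely for illustration.

```python
import statistics

# Hypothetical values from a newly obtained dataset (illustration only).
values = [12, 15, 9, 22, 18, 15, 11]

def simple_summary(xs):
    """Return the basic descriptive statistics a novelty story starts from."""
    return {
        "n": len(xs),
        "min": min(xs),
        "median": statistics.median(xs),
        "mean": statistics.mean(xs),
        "max": max(xs),
    }

print(simple_summary(values))
```

If the summary itself is not surprising or useful to anyone outside the team, that is a sign the story may be trivial.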

Outlier Stories

Danger: Spurious Result

Tactic: Characters. Talk about who the outlier is: which person, which company, and so on.

Profile one of the characters from the outlier group, then introduce the statistics.

Ask yourself: Is this really so different?
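One way to pressure-test "Is this really so different?" before building a story around a character is a conventional outlier rule. The sketch below uses Tukey's 1.5 × IQR fences, a standard technique chosen here as an assumption (the talk did not name a method); the company names and numbers are hypothetical.

```python
import statistics

# Hypothetical metric for six companies (illustrative numbers, not real data).
scores = {
    "Company A": 10, "Company B": 12, "Company C": 11,
    "Company D": 13, "Company E": 12, "Company F": 40,
}

def iqr_outliers(data, k=1.5):
    """Flag entries outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data.values(), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {name: v for name, v in data.items() if v < low or v > high}

print(iqr_outliers(scores))  # {'Company F': 40}
```

A flagged value is a lead, not a conclusion: it can also be a data-entry error, so check the underlying record before profiling the character behind it.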

Archetype Stories

Danger: Oversimplification

Tactic: Modeling

Ask yourself: What variables am I leaving out?
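To make "What variables am I leaving out?" concrete, here is a small, hypothetical omitted-variable example in Python. The outcome is built as exactly 2·x + 3·z, with z rising alongside x. Fitting y on x alone credits x with part of z's effect; including z recovers both coefficients. The data and helper names are illustrative assumptions, not from the talk.

```python
# Hypothetical data: y truly depends on both x and z, and z rises with x.
x = [1, 2, 3, 4, 5]
z = [1, 1, 2, 2, 3]
y = [2 * xi + 3 * zi for xi, zi in zip(x, z)]  # y = 2x + 3z exactly

def centered(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def slope_x_only(x, y):
    """OLS slope of y on x alone (z omitted)."""
    cx, cy = centered(x), centered(y)
    return dot(cx, cy) / dot(cx, cx)

def slopes_x_and_z(x, z, y):
    """OLS slopes of y on x and z, solving the 2x2 normal equations on centered data."""
    cx, cz, cy = centered(x), centered(z), centered(y)
    sxx, szz, sxz = dot(cx, cx), dot(cz, cz), dot(cx, cz)
    sxy, szy = dot(cx, cy), dot(cz, cy)
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det

print(round(slope_x_only(x, y), 6))                     # 3.5: x absorbs part of z's effect
print([round(b, 6) for b in slopes_x_and_z(x, z, y)])  # [2.0, 3.0]
```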

Trend Stories

Example: Terrorism in the EU is declining overall, but religiously inspired attacks are rising.

This was done with dplyr and ggplot2: data %>% group_by(...) %>% summarize(...) %>% ggplot(...)

Note: This is the same workflow we use for Deferred Maintenance.
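For readers outside R, here is a rough standard-library Python analogue of the group_by() and summarize(n = n()) steps, with the ggplot() step left out. The records and categories are invented to mirror the shape of the EU example, not its actual data.

```python
from collections import defaultdict

# Hypothetical incident records as (year, category) pairs; not real data.
records = (
    [(2014, "religious")] + [(2014, "other")] * 4
    + [(2015, "religious")] * 2 + [(2015, "other")] * 2
    + [(2016, "religious")] * 3
)

def total_by_year(rows):
    """Roughly group_by(year) %>% summarize(n = n())."""
    totals = defaultdict(int)
    for year, _ in rows:
        totals[year] += 1
    return dict(sorted(totals.items()))

def count_by_year_and_category(rows):
    """Roughly group_by(year, category) %>% summarize(n = n())."""
    counts = defaultdict(int)
    for year, category in rows:
        counts[(year, category)] += 1
    return dict(sorted(counts.items()))

print(total_by_year(records))  # {2014: 5, 2015: 4, 2016: 3}
print(count_by_year_and_category(records))
```

The overall total falls while the "religious" subgroup rises, which is the shape of the trend story above.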

Danger: Variance (regression to the mean)

Tactic: Be Conservative

Ask yourself: Is this signal or noise?
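One conservative check on "signal or noise?" is a permutation test on a trend's slope: shuffle the yearly values many times and see how often random orderings produce a slope at least as steep as the observed one. This is a swapped-in technique, not one the talk prescribed, and the yearly counts are hypothetical.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation test: how often does shuffling ys (destroying any
    real time ordering) give a slope at least as steep as the observed one?"""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # +1 in numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

years = list(range(2010, 2018))
counts = [7, 9, 6, 8, 7, 10, 6, 8]  # hypothetical yearly counts
print(permutation_p_value(years, counts))
```

A large p-value means random orderings easily produce a trend this steep; it does not prove there is no trend, but it is a reason to be conservative in the write-up.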

Fun quote: "If you can always tell a valid trend, you should be trading on Wall Street, not telling data stories."

Note: FiveThirtyEight has its own ggplot2 theme. Maybe we should develop a CANA theme.

Debunking Stories

Bechdel test: Examines how women are portrayed in movies. A movie passes only if the answer to all three questions is yes:

  1. Are there two or more women?
  2. Do they talk to each other?
  3. Do they talk to each other about something other than men?
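The three questions translate directly into a small function. The Scene record is a hypothetical simplification: it treats two or more women speaking in the same scene as "talking to each other", which anyone coding real films would need to verify by hand.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    """Hypothetical, simplified record of one conversation in a film."""
    speakers: frozenset  # names of everyone talking in the scene
    about_men: bool      # True if the conversation is only about men

def passes_bechdel(women, scenes):
    """Return True only if a film answers yes to all three Bechdel questions.

    women:  set of names of the women characters
    scenes: list of Scene records
    """
    # 1. Are there two or more women?
    if len(women) < 2:
        return False
    for scene in scenes:
        women_talking = scene.speakers & women
        # 2. Do two or more of them talk to each other (here: share a scene)...
        # 3. ...about something other than men?
        if len(women_talking) >= 2 and not scene.about_men:
            return True
    return False

women = {"Ana", "Beth"}
scenes = [
    Scene(frozenset({"Ana", "Beth"}), about_men=True),
    Scene(frozenset({"Ana", "Beth", "Carl"}), about_men=False),
]
print(passes_bechdel(women, scenes))      # True
print(passes_bechdel(women, scenes[:1]))  # False
```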

Danger: Confirmation bias, driven by your own belief in the debunking.

Tactic: Showcase Failures

Ask yourself: How much do I want to debunk this?

Quote about p-hacking: "Warning: This is evil (statistical) work. Do not go to the dark side. Do not try this at home."

Example of p-hacking: Eating potato chips leads to higher SAT math scores.
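A quick simulation shows how results like the potato-chip finding appear: test enough foods against pure noise and roughly 5% will look "significant" at the usual threshold. Everything here is a hypothetical setup (normally distributed scores, 30 people per group, a z-test with known standard deviation).

```python
import math
import random

def fake_study(rng, n=30):
    """Compare scores of 'eaters' vs 'non-eaters' of some food when, by
    construction, the food has no effect (both groups are pure noise).
    Returns True if a two-sided z-test (known sd = 1) calls it significant at 5%."""
    eaters = [rng.gauss(0, 1) for _ in range(n)]
    others = [rng.gauss(0, 1) for _ in range(n)]
    z = (sum(eaters) / n - sum(others) / n) / math.sqrt(2 / n)
    return abs(z) > 1.96

def count_false_positives(n_foods=100, seed=42):
    """Test many foods and count how many 'effects' appear by chance alone."""
    rng = random.Random(seed)
    return sum(fake_study(rng) for _ in range(n_foods))

hits = count_false_positives()
print(f"{hits} of 100 null 'foods' look significant")
```

Reporting only the hits, without mentioning the other tests, is exactly the p-hacking the quote warns about.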

Forecast Stories

You walk a narrow path here.

Danger: Overfitting

Tactic: Simulations and scenarios

Ask yourself: Am I properly conveying the uncertainty in my model?
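A minimal Monte Carlo sketch of "simulations and scenarios": rather than publishing a single predicted margin, draw many plausible outcomes and report a probability and a range. The normal error model, the 2-point lead, and the error sizes are all hypothetical assumptions; re-running under different error sizes is one simple way to present scenarios.

```python
import random
import statistics

def simulate_margins(lead, error_sd, n_sims=20_000, seed=538):
    """Draw plausible final margins around a polling lead, assuming
    (hypothetically) normally distributed polling error."""
    rng = random.Random(seed)
    return [rng.gauss(lead, error_sd) for _ in range(n_sims)]

def forecast_summary(margins):
    """Summarize simulations as a win probability plus an 80% range (10th to 90th percentile)."""
    cuts = statistics.quantiles(margins, n=10)  # 9 cut points, 10th through 90th percentile
    return {
        "win_prob": sum(m > 0 for m in margins) / len(margins),
        "p10": cuts[0],
        "p90": cuts[-1],
    }

# Scenarios: the same 2-point lead under small, medium, and large polling error.
for sd in (2.0, 3.0, 5.0):
    print(sd, forecast_summary(simulate_margins(lead=2.0, error_sd=sd)))
```

Larger assumed error widens the range and pulls the win probability toward 50%, which is the uncertainty the reader should see.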