Story leads, data follows.
Use rigorous but interpretable methods.
Be accurate, be fast, be transparent.
Tidyverse is the tool of choice for data.
There is a fivethirtyeight R package.
New data story. Danger: Triviality
Remedy: Simple Summaries
Ask yourself: Is this data meaningful to others?
Danger: Spurious Result
Tactic: Characters - talk about who the outlier is: which person, which company, etc.
Profile one of the characters from the outlier group, then introduce the statistics.
Ask yourself: Is this really so different?
Danger: Oversimplification
Tactic: Modeling
Ask yourself: What variables am I leaving out?
Trends: Terrorism overall declining in the EU, but religiously inspired attacks rising.
Done using dplyr: data %>% group_by %>% summarize %>% ggplot
Note: This is our workflow with Deferred Maintenance.
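Sketch: the pipeline in the notes is R/dplyr. The block below is only a rough standard-library Python analogue of the group-then-summarize step, not the speaker's code. The counts are invented for illustration (they are not real terrorism figures), and `records` and `summarize_by` are hypothetical names. It shows how a declining overall total can hide a rising subgroup.

```python
from collections import defaultdict

# Hypothetical, made-up counts for illustration only (not real data).
records = [
    {"year": 2014, "category": "religious", "attacks": 2},
    {"year": 2014, "category": "other", "attacks": 40},
    {"year": 2015, "category": "religious", "attacks": 5},
    {"year": 2015, "category": "other", "attacks": 30},
    {"year": 2016, "category": "religious", "attacks": 9},
    {"year": 2016, "category": "other", "attacks": 20},
]


def summarize_by(rows, keys, value):
    """Sum `value` within each group defined by `keys` (group_by + summarize)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    return dict(totals)


# Same data, two groupings: the overall trend and the per-category trend.
overall = summarize_by(records, ["year"], "attacks")
by_category = summarize_by(records, ["year", "category"], "attacks")

for (year,), total in sorted(overall.items()):
    print(year, "total:", total, "religious:", by_category[(year, "religious")])
```

Grouping by more than one key is what surfaces the subgroup trend; the overall summary alone would only show the decline.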
Danger: Variance - regression to the mean
Tactic: Be Conservative
Ask yourself: Is this signal or noise?
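Sketch: a small simulation of regression to the mean, assuming each unit has a stable level plus independent noise in each period. The seed, sample size, and 10% cutoff are arbitrary choices for illustration, not values from the talk.

```python
import random
from statistics import mean

random.seed(538)  # arbitrary seed so the sketch is repeatable

N = 10_000
# Each unit has a stable "true" level plus independent noise in each period.
true_level = [random.gauss(0, 1) for _ in range(N)]
period1 = [t + random.gauss(0, 1) for t in true_level]
period2 = [t + random.gauss(0, 1) for t in true_level]

# Select the top 10% in period 1, as a "standout performers" story would.
cutoff = sorted(period1, reverse=True)[N // 10]
top = [i for i in range(N) if period1[i] > cutoff]

top_mean_1 = mean(period1[i] for i in top)
top_mean_2 = mean(period2[i] for i in top)
print(f"top group, period 1: {top_mean_1:.2f}")
print(f"top group, period 2: {top_mean_2:.2f}")
```

The selected group still scores above average in period 2, but noticeably lower than in period 1, even though nothing about the units changed. A "decline" like that is noise, not signal.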
Fun quote: If you can always tell a valid trend, you should be trading on Wall Street, not telling data stories.
Note: FiveThirtyEight has its own ggplot2 theme. Maybe we should develop a CANA theme.
Bechdel test: Examines how women are portrayed in movies. (1) Are there two or more women? (2) Do they talk to each other? (3) Do they talk to each other about something other than men?
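Sketch: the three questions can be read as a simple all-or-nothing check. The `Movie` fields below are hypothetical hand-coded flags, and a real dataset may record these criteria differently.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    # Hypothetical hand-coded fields, assumed for illustration.
    title: str
    women_count: int
    women_talk_to_each_other: bool
    talk_about_something_other_than_men: bool


def passes_bechdel(movie: Movie) -> bool:
    """A film passes only if all three criteria hold."""
    return (
        movie.women_count >= 2
        and movie.women_talk_to_each_other
        and movie.talk_about_something_other_than_men
    )


# Made-up example: meets the first two criteria but not the third.
example = Movie("Hypothetical Film", 3, True, False)
print(example.title, "passes:", passes_bechdel(example))
```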
Danger: Confirmation Bias - your own wish to believe the debunking.
Tactic: Showcase Failures
Ask yourself: How much do I want to debunk this?
Quote about p-hacking: "Warning: This is evil (statistical) work. Do not go to the dark side. Do not try this at home."
Example of p-hacking: Eating potato chips leads to higher SAT Math scores
You work a narrow path here
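Sketch: why p-hacking "works". Here test scores and 200 candidate "foods" are all pure random noise (an assumption made for this illustration, with arbitrary sizes and seed), yet testing every food against the scores still turns up some that clear the usual p < 0.05 bar.

```python
import random
from math import sqrt

random.seed(2024)  # arbitrary seed so the sketch is repeatable


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


N_STUDENTS = 30
N_FOODS = 200
# |r| above about 0.361 corresponds to p < 0.05 (two-sided) when n = 30.
R_CRITICAL = 0.361

# Scores and food consumption are generated independently: no real effect exists.
scores = [random.gauss(500, 100) for _ in range(N_STUDENTS)]
hits = []
for food in range(N_FOODS):
    servings = [random.gauss(0, 1) for _ in range(N_STUDENTS)]
    r = pearson_r(servings, scores)
    if abs(r) > R_CRITICAL:
        hits.append((food, r))

print(f"{len(hits)} of {N_FOODS} pure-noise 'foods' look significant at p < 0.05")
```

Reporting only the hits, and not the number of comparisons tried, is what turns noise into a headline.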
Danger: Overfitting
Tactic: Simulations and scenarios
Ask yourself: Am I properly conveying the uncertainty in my model?
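Sketch: one simple way to use simulation to show a range instead of a single number is a bootstrap, resampling the observed data with replacement. This is an assumed illustration of the "simulations" tactic, not the method shown in the talk, and the observations are invented.

```python
import random
from statistics import mean

random.seed(30)  # arbitrary seed so the sketch is repeatable

# Hypothetical observations, invented for illustration only.
observed = [12.1, 9.8, 14.3, 11.0, 10.5, 13.7, 8.9, 12.6, 11.8, 10.2]


def bootstrap_means(data, n_sims=5000):
    """Resample the data with replacement and record each resample's mean."""
    return [mean(random.choices(data, k=len(data))) for _ in range(n_sims)]


sims = sorted(bootstrap_means(observed))
# Take the 5th and 95th percentiles of the simulated means.
low = sims[len(sims) * 5 // 100]
high = sims[len(sims) * 95 // 100 - 1]

print(f"point estimate: {mean(observed):.2f}")
print(f"90% of simulated means fall between {low:.2f} and {high:.2f}")
```

Presenting the interval alongside the point estimate makes the uncertainty visible to the reader instead of leaving it implicit.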