Lecture 4: Displaying & Describing Data
Chapters 2 & 3, Whitlock and Schluter, 2nd Ed.
Key questions:
What makes a good graph?
What makes a bad graph?
Types of graphs for different types of data
Describing “central tendency”: mean & median
Describing variation: variance & stdev
Basic rules for plotting data
1) Show the raw data if possible
2) Show distributional info if possible
3) ALWAYS Include error bars around means
4) ALWAYS Include error bars around means
5) Make patterns in the data easy to see
6) Represent magnitude honestly
7) Draw graphical elements clearly
8) Include a legend and label things clearly
Plotting Rule #1: Show the raw data
For the next several slides we will consider this classic dataset made popular by R.A. Fisher

Plotting Rule #1: Show the raw data
Compare the information content of these two graphs

Plotting Rule #1: Show the raw data
Many different datasets can result in exactly the same bar plot & error bars
All of these datasets have exactly the same mean, SD, and SE.
Bar plots, even with error bars, can therefore reveal very little about the underlying data
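To see why, here is a minimal Python sketch (the lecture's own examples use R) with two small made-up datasets: one has most values high and a single low value, the other is its mirror image. Their mean, SD, and SE come out identical, so they would produce the same bar and error bar. The `summarize` helper and both datasets are hypothetical, written only for this illustration.

```python
import math
import statistics


def summarize(data):
    """Return (mean, sample SD, standard error of the mean)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD, uses an n - 1 denominator
    se = sd / math.sqrt(len(data))  # SE = SD / sqrt(n)
    return mean, sd, se


# Two made-up datasets with very different shapes
low_outlier = [2, 6, 6, 6]   # most values high, one low value
high_outlier = [4, 4, 4, 8]  # most values low, one high value

for name, data in [("low_outlier", low_outlier), ("high_outlier", high_outlier)]:
    mean, sd, se = summarize(data)
    print(f"{name}: mean={mean:.2f}, SD={sd:.2f}, SE={se:.2f}")
```

Plotting the four raw points of each dataset would show the difference immediately, which is the point of Rule #1.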
Plotting Rule #3 & 4: ALWAYS use error bars for means
-Means MUST always have an estimate of uncertainty around them
-The range doesn’t count!
-Typically use “standard error” OR “confidence interval”
-Rarely use “standard deviation”
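As a sketch of the two recommended error-bar choices, the Python snippet below computes the standard error, SE = s / sqrt(n), and a 95% confidence interval, mean ± t × SE. The measurements are hypothetical, and so is the helper name `mean_with_uncertainty`. To avoid a SciPy dependency, the critical t value is passed in by hand: 2.262 is the tabulated two-sided 95% value for 9 degrees of freedom (n = 10); a different n needs a different value.

```python
import math
import statistics


def mean_with_uncertainty(data, t_crit):
    """Return the mean, its standard error, and a confidence interval.

    t_crit is the two-sided critical value of Student's t for the desired
    confidence level with len(data) - 1 degrees of freedom.
    """
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # SE = s / sqrt(n)
    half_width = t_crit * se
    return mean, se, (mean - half_width, mean + half_width)


# Hypothetical measurements (n = 10, so 9 degrees of freedom)
data = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 11.9, 13.2, 12.7, 12.4]
T_CRIT_95_DF9 = 2.262  # from a t table: 95% CI, df = 9

mean, se, (lower, upper) = mean_with_uncertainty(data, T_CRIT_95_DF9)
sd = statistics.stdev(data)
print(f"mean = {mean:.2f}")
print(f"SD   = {sd:.2f}  (spread of the data)")
print(f"SE   = {se:.2f}  (uncertainty in the mean)")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The SD describes how spread out the individual observations are, while the SE and CI describe how uncertain the estimate of the mean is, which is why they are the usual choice for error bars around means.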

Plotting Rule #5: Make Patterns Easy to See
Keep it as simple as possible
Add labels, annotations etc. to the plot
Use both color AND pattern/shape to distinguish groups
Avoid 3D
Use color-blind friendly palettes
Don’t use most of the fancy stuff in Excel!
Plot from Susan Kalisz et al 2014 PNAS
It is very difficult to read the actual values from this plot
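One way to follow both “color AND pattern/shape” and “color-blind friendly palettes” is to assign every group its own color from the Okabe-Ito palette, a widely used color-blind-friendly set, *and* its own marker shape. The Python sketch below only builds that group-to-style mapping. The group names are hypothetical, and the marker strings follow matplotlib's single-character codes, but no plotting library is called here. The same hex color strings can be used in R as well.

```python
# Okabe-Ito color-blind-friendly palette (hex codes)
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]

# Marker codes in matplotlib's notation: circle, square, triangle, diamond, ...
MARKERS = ["o", "s", "^", "D", "v", "P", "X", "*"]


def group_styles(groups):
    """Give each group its own color AND its own marker shape."""
    if len(groups) > len(OKABE_ITO):
        raise ValueError("more groups than available colors")
    return {g: (OKABE_ITO[i], MARKERS[i]) for i, g in enumerate(groups)}


# Hypothetical group names
styles = group_styles(["control", "treatment A", "treatment B"])
for group, (color, marker) in styles.items():
    print(f"{group}: color={color}, marker={marker}")
```

Because shape and color both change together, the groups stay distinguishable even when printed in grayscale or viewed by someone with color vision deficiency.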

Plotting Rule #5: Make Patterns Easy to See (example)
In this plot:
-The type of error bar is clearly indicated on the plot
-Sample size is also indicated
-This information could go in a caption, but it is even easier to find on the plot itself
-All of this can be done in R, but it is often easier to annotate plots in PowerPoint

Plotting Rule #6: Represent Magnitudes Honestly
This plot emphasizes a certain aspect of the data
Some critics think this is deceptive

What else is missing?
Original paper: Oreopoulos & Salvanes. 2011. Priceless: The Nonpecuniary Benefits of Schooling. Journal of Economic Perspectives. aeaweb.org/articles?id=10.1257/jep.25.1.159
Critiques:
-econlog.econlib.org/archives/2011/07/job_satisfactio.html
-scienceblogs.com/principles/2011/07/10/great-moments-in-deceptive-gra/
See statisticshowto.com/misleading-graphs/ for more interesting examples
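A common way a graph can misrepresent magnitude is a bar chart whose y-axis does not start at zero. A bar's drawn height is then (value − axis start), so the ratio of two bar heights no longer matches the ratio of the values. The Python arithmetic below uses two hypothetical values, 80 and 85, to show the effect. It is a general illustration, not a reconstruction of the plot on this slide.

```python
def visual_ratio(small, large, axis_start):
    """Ratio of bar heights as drawn when the y-axis starts at axis_start."""
    return (large - axis_start) / (small - axis_start)


def true_ratio(small, large):
    """Ratio of the actual values (what a zero-based axis shows)."""
    return large / small


# Hypothetical values for two groups
a, b = 80, 85

print("true ratio:", true_ratio(a, b))
print("drawn ratio, axis starting at 0:", visual_ratio(a, b, 0))
print("drawn ratio, axis starting at 75:", visual_ratio(a, b, 75))
```

With the axis starting at 75, a difference of about 6% is drawn as one bar twice as tall as the other.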
Histograms of discrete data
Number of extinct birds from each Hawaiian island
-This shows the frequency of each category
-Error bars are not possible w/these data
-This type of plot is useful for general descriptions of data
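The numbers behind a plot like this are just a frequency table: one count per category. Here is a minimal Python sketch using `collections.Counter`. The category labels and records are hypothetical placeholders, not the Hawaiian extinction data.

```python
from collections import Counter

# Hypothetical raw records: one entry per observation, naming its category
records = ["Island A", "Island B", "Island A", "Island C", "Island A", "Island B"]

counts = Counter(records)  # frequency of each category

# Crude text version of the plot, most frequent category first
for category, n in counts.most_common():
    print(f"{category:<10} {'#' * n} ({n})")
```

Each category ends up with a single count, which is why there is nothing to compute an error bar from in this kind of plot.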

Histograms of CONTINUOUS data
Histograms are very useful when you
-explore datasets you are seeing for the 1st time
-display data that are skewed or oddly shaped
Histograms
-convey a similar idea to a boxplot
-often have vertical lines added to show the mean, median, etc.
-are made with hist() in R
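Under the hood, a histogram just sorts each value into a bin of fixed width and counts the values per bin. The Python sketch below does this by hand for hypothetical birth weights and also computes the mean and median, the values often marked with vertical lines. The helper `histogram_counts`, the 500 g bin width, and the data are assumptions for illustration. Note that this sketch puts a value lying exactly on a bin edge into the bin to its right, whereas R's hist() by default includes the right edge of each bin and also chooses the breaks for you.

```python
import math
import statistics


def histogram_counts(data, start, bin_width):
    """Count values per bin [left, left + bin_width), keyed by left edge."""
    counts = {}
    for x in data:
        k = math.floor((x - start) / bin_width)  # index of the bin holding x
        counts[k] = counts.get(k, 0) + 1
    # Include empty bins between the lowest and highest occupied bins
    return {start + k * bin_width: counts.get(k, 0)
            for k in range(min(counts), max(counts) + 1)}


# Hypothetical birth weights in grams
weights = [2900, 3100, 3350, 3400, 3420, 3600, 3650, 3900, 4100, 4600]

for left, n in histogram_counts(weights, start=2500, bin_width=500).items():
    print(f"{left}-{left + 500} g: {'#' * n}")

# Values often marked with vertical lines on a histogram
print("mean:", statistics.mean(weights))
print("median:", statistics.median(weights))
```

In these made-up data the single heavy value pulls the mean above the median, the kind of asymmetry a histogram makes easy to spot.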
This plot shows the distribution of birthweights from a set of babies born in a hospital. Jude Hendrik Brouwer is shown with the red line.
How could this plot be improved?
Graphing association between “categorical” variables
a.k.a. turning a contingency table into a graph (Example 2.3A in the textbook)

How could this plot be improved?
What would we do if we had 3+ years of data?
From: Chitwood et al. 2015. Do Biological and Bedsite Characteristics Influence Survival of Neonatal White-Tailed Deer? PLoS ONE. doi:10.1371/journal.pone.0119070
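A contingency table counts how often each combination of two categorical variables occurs. A grouped bar graph or mosaic plot then displays those counts, often as proportions within each row. The Python sketch below builds such a table from hypothetical (year, fate) records with `collections.Counter` and computes row proportions. The records are made up, not taken from the cited study, and the helper `row_proportions` is hypothetical.

```python
from collections import Counter

# Hypothetical (year, fate) records, one per individual; not real data
observations = [
    ("Year 1", "survived"), ("Year 1", "died"), ("Year 1", "survived"),
    ("Year 1", "survived"), ("Year 2", "died"), ("Year 2", "died"),
    ("Year 2", "survived"), ("Year 2", "died"),
]

table = Counter(observations)  # count for each (row, column) combination
rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})


def row_proportions(table, rows, cols):
    """Proportion of each row falling in each column (each row sums to 1)."""
    props = {}
    for r in rows:
        total = sum(table[(r, c)] for c in cols)
        for c in cols:
            props[(r, c)] = table[(r, c)] / total
    return props


props = row_proportions(table, rows, cols)
print("".ljust(8) + "".join(c.ljust(16) for c in cols))
for r in rows:
    cells = [f"{table[(r, c)]} ({props[(r, c)]:.0%})".ljust(16) for c in cols]
    print(r.ljust(8) + "".join(cells))
```

Because the rows and columns are read from the data, records from additional years would simply add rows to the table.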
2 Continuous variables: Scatter Plots
Scatter plots are the standard way of plotting two continuous variables against each other
Frequently used to visualize data for “regression” analysis
We frequently take the log of numerical data (more on this later)
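A quick illustration of why logs help on scatter plots: values that span several orders of magnitude crowd together at one end of a linear axis, but after a log transform equal ratios become equal distances. The Python sketch below uses hypothetical values and base-10 logs. The log is only defined for positive values, and in R `log()` is the natural log while `log10()` is base 10.

```python
import math

# Hypothetical values spanning several orders of magnitude
values = [1, 10, 100, 1000, 10000]
log_values = [math.log10(v) for v in values]

raw_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(log_values, log_values[1:])]

print("raw gaps:", raw_gaps)  # each gap is 10 times the previous one
print("log10 gaps:", [round(g, 6) for g in log_gaps])  # every gap is 1
```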

Numerical responses vs categorical variables
Frequently used w/ t-tests, “ANOVA”
X-axis is some kind of category that groups the data (treatments, years, species)
Plotting raw data is useful when there is a small to moderate amount of data (say <50)
Box plots are better when there are lots of data
Both the mean and the median can be used
Raw data are sometimes plotted along w/ error bars
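A box plot summarizes each group with its quartiles, whiskers, and outliers. The Python sketch below computes these with `statistics.quantiles` and the common 1.5 × IQR rule for flagging outliers. The data are hypothetical and so is the helper name. Software packages use slightly different quartile conventions (Python's `quantiles` offers "exclusive" and "inclusive" methods, and R's boxplot uses its own hinge rule), so for some sample sizes the numbers may differ a little from an R box plot.

```python
import statistics


def box_plot_stats(data, k=1.5):
    """Quartiles, whisker ends, and outliers using the k x IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),   # most extreme values inside the fences
        "whisker_high": max(inside),
        "outliers": outliers,         # drawn as individual points
    }


# Hypothetical measurements from one group, with one unusually large value
group = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(box_plot_stats(group))
```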

Trends over time
The slope of the “best fit” line through these points is an estimate of the mean rate of change in the number of birds over time.
This relates to the topic known as “Regression”

Song Sparrow (Melospiza melodia) counts in Darrtown, OH, USA. From USGS Breeding Bird Survey (BBS)
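For a straight line fit by ordinary least squares, the slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². The Python sketch below applies that formula to hypothetical yearly counts (not the Darrtown survey data), so the slope reads directly as the estimated mean change in birds per year. The helper name is hypothetical. Regression itself is covered in more depth later in the course.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical yearly counts (not real survey data)
years = [2000, 2001, 2002, 2003, 2004]
counts = [30, 28, 27, 25, 24]

slope = least_squares_slope(years, counts)
print(f"estimated mean change: {slope:.2f} birds per year")
```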
Maps can display data
Maps ’r pretty
R can make maps
Tables vs. Graphs?
Graphs are generally better than tables
Tables should follow the same principles as graphs
Tables are best for highly detailed info (e.g., p-values, t-statistics)
Many papers now include tables of raw data in an appendix
See Gelman et al. 2002. Let’s Practice What We Preach: Turning Tables into Graphs. The American Statistician.
Measures of variation
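As a worked example of the two measures of variation listed in the key questions, the sample variance is s² = Σ(x − x̄)² / (n − 1) and the standard deviation is s = √s², which is in the same units as the data. The Python sketch below computes both by hand for a small hypothetical dataset and prints the standard library's results for comparison.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1, not n
variance = sum(squared_deviations) / (n - 1)
sd = math.sqrt(variance)  # standard deviation, same units as the data

print(f"mean = {mean}, variance = {variance:.3f}, SD = {sd:.3f}")
print("statistics module:", statistics.variance(data), statistics.stdev(data))
```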
Misc. References
Websites:
-www.biostat.wisc.edu/~kbroman/topten_worstgraphs/
-www.americanscientist.org/issues/pub/population-growth-technology-and-tricky-graphs
Papers:
-Wainer. 1984. How to Display Data Badly. The American Statistician.