The first article, John Boyd’s “How to Graph Badly or What NOT to Do”, aims to teach us how to create ‘accurate and clear’ graphs by first learning how not to. As Boyd says, “much of the skill lies in NOT making mistakes.”
1.1-1.2
Boyd introduces the idea of chart junk, a concept familiar to many artists: chart junk is often not based in mathematics or science, but rather is fluff added to graphs to make them look more attractive. Decoration that qualifies as chart junk is unnecessary and distracts from the data displayed. At the same time, a graph should not be so plain that it is mundane and unappealing.
Ways in which chart junk can be added include the misuse of the following concepts:
Fonts
Moiré Shading
Pseudo 3-D
Hype, I: Overinterpretation
Hype, II: Graphical Carpet Bombing
1.3
Wainer’s Rules for Bad Graphs
1. Show as little data as possible [minimize the data density]
2. Hide what data you do show [minimize the ratio of data/ink]
3. Show the data inaccurately [ignore the visual metaphor and randomize the connection between graphical elements and the numbers]
4. Use length as the visual metaphor when the area of two-dimensional icons is what is actually perceived
5. Graph data out of context [sparse captions and vague text]
6. Obfuscation #1: Change scales in mid-axis
7. Obfuscation #2: Emphasize the trivial [ignore the important]
8. Obfuscation #3: Jiggle the baseline [use different axis ranges for two graphs which will be printed side-by-side and need to be compared]
9. Obfuscation #4: Alabama first! [Order the data by some criterion, such as alphabetical order, which is irrelevant to all of the interesting patterns in the data]
10. Obfuscation #5: Label: (a) illegibly (b) incompletely (c) incorrectly (d) ambiguously
11. Obfuscation #6: More is murkier: (a) more decimal places and (b) more dimensions
12. If it has been done well in the past, think of a new way to do it [New graph types are sometimes needed, but they require a lot of concentration from the reader, and should be used sparingly in good graphics]
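The “Jiggle the baseline” rule can be avoided mechanically by computing one axis range from both data sets before plotting them side by side. Here is a minimal sketch in Python; the function name `shared_axis_range` and the sample values are my own hypothetical choices, not something taken from Boyd's article.

```python
def shared_axis_range(*series, pad=0.05):
    """Return one (low, high) range that covers every series.

    `pad` is the fraction of the total span added as a margin on each side.
    """
    values = [v for s in series for v in s]
    low, high = min(values), max(values)
    margin = (high - low) * pad
    return low - margin, high + margin


# Hypothetical data for two panels that will be compared side by side.
left_panel = [2.0, 3.5, 4.0]
right_panel = [10.0, 12.0, 18.0]

# Use the same limits on both panels so the baselines agree.
axis_range = shared_axis_range(left_panel, right_panel, pad=0.0)
print(axis_range)
```

Passing the same pair of limits to both plots keeps the visual comparison honest, at the cost of some empty space in the panel with the smaller values.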
1.4
High Data Density
- To my surprise, a theme reiterated by Tufte is that “high-density illustrations are good: it is possible to pack a tremendous amount of information in a single picture if it is designed carefully.”
- Moreover, section 1.4 claims that low-density graphs are not preferred, though there is a time and a place for them when one is needed to make a point.
1.5
Data-Hiding
- This section covers data hiding, which occurs when the information content of a graph is obscured by too much complexity or by extraneous elements.
- As examples, Boyd shows two extremes, one with far too much complexity and one that conveys nothing at all, along with a third graph whose grid lines are too bold for the data points on the grid to be seen.
1.5.1
Data-Hiding by Graphing Disparate Quantities on the Same Scale
- Boyd gives a great example in this section, in which he plots two quantities of different scale on the same graph. Though the quantities are different, they appear to be the same or nearly so on the graph. He notes that the most effective way to show two related quantities of different scale is to plot them in separate graphs.
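To see why a shared scale hides the smaller quantity, it helps to measure how much of the axis each series actually uses. The sketch below is my own illustration with made-up numbers (the function `fraction_of_axis_used` and both data lists are hypothetical, not Boyd's example).

```python
def fraction_of_axis_used(series, axis_low, axis_high):
    """Fraction of the axis height spanned by one series' own variation."""
    return (max(series) - min(series)) / (axis_high - axis_low)


# Hypothetical quantities whose scales differ by about three orders of magnitude.
large = [1000.0, 4000.0, 9000.0]
small = [1.0, 3.0, 5.0]

# On a shared axis, the limits are set by both series together.
low = min(large + small)
high = max(large + small)
shared = fraction_of_axis_used(small, low, high)

# On its own axis, the small series fills the full height.
separate = fraction_of_axis_used(small, min(small), max(small))

print(f"shared axis: {shared:.4%} of the height; own axis: {separate:.0%}")
```

On the shared axis the small series is squeezed into a sliver near the bottom, which is why separate graphs are the safer choice.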
1.6
1.7
1.8
Context-Free Data
- A good graph-with-caption will label all the elements of the graph and specify the key parameters of the numerical calculation or experiment that generated the graph.
- It is not sufficient to clearly label a curve “Supercalifragilousness” and state in the caption that the “humdinger” was set at 360 “klingons”.
- It has been argued that the primary function of graphs is to facilitate comparisons. A graph will fail through lack of context if its curves fail to make all the important comparisons.
- The most important characteristic of a good graph is that it show enough curves — and the article as a whole contain enough information — so that these kinds of questions can be answered. A graph showing error versus the number of points N is meaningless. A graph showing three curves for three different algorithms may make you immortal.
1.9
Label Woes
- Illegible labels usually result from the following causes: #1. Too small type size. #2. Poor placement. #3. Too few labels.
Chapter 2
The Gospel According to Tufte
Definition 1 (DATA-INK) The non-erasable core of a graphic.
Definition 2 (DATA-INK RATIO)
- data-ink ratio = data-ink / total ink used to print the graphic
- = the proportion of a graphic’s ink devoted to the non-redundant display of data-information
- = 1.0 − proportion of a graphic that can be erased without loss of data-information
The concept of “data-ink” doesn’t completely solve the problem because the question of what actually is “non-erasable” depends on both the problem and the readership.
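As a quick check on Definition 2, the ratio and its "erasable" complement can be computed directly. This is only a sketch: the function names are my own, and "ink" is assumed to be measured in some consistent unit such as a pixel count, which the reading does not specify.

```python
def data_ink_ratio(data_ink, total_ink):
    """data-ink ratio = data-ink / total ink used to print the graphic."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def erasable_proportion(data_ink, total_ink):
    """Proportion of the graphic that can be erased without losing data."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)


# Hypothetical graphic: 30 units of ink carry data, out of 120 units in total.
ratio = data_ink_ratio(30, 120)
erasable = erasable_proportion(30, 120)
print(ratio, erasable)
```

Deciding what counts as data-ink is still a judgment call that depends on the problem and the readership, so the inputs here are the hard part, not the arithmetic.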
Tufte’s Five Laws of Data-Ink:
- Above all else show the data.
- Maximize the data-ink ratio
- Erase non-data-ink.
- Erase redundant data-ink.
- Revise and edit.
2.1.1 Show the data:
- This is the most important of the five maxims because the “data-ink” is undefined until one has first developed a purpose for the graphic.
2.1.2 Emphasize the Data:
- “Maximize the data-ink ratio” is a very general precept that motivates Tufte’s remaining three maxims.
- One way to emphasize the data is to draw the data curves using thicker lines than the axis lines and frames.
- Design touches do matter because scientists and engineers always have too many papers to read and too little time. A paper with clear, easy-to-decode graphs will make a much more lasting impression than one with confusing illustrations that require a lot of concentrated attention.
The key guidelines for a grid are:
- Don’t use a grid unless you really have to.
- Make the grid lines faint compared to the data-curves by drawing the grid as thin lines or dotted lines or by using a thick line for the data.
2.1.4 Erase Non-Data-Ink: Hurrah for Half-Framing!
- Tufte and some other technical artists such as Mary Helen Briscoe (1996) are advocates of another simplification: half-framing, which is to say, drawing only the usual horizontal and vertical axes and omitting framing lines on the top and the right.
2.1.7 Erasing: Eliminating the Graph Entirely
- Sometimes the best way to cope with a flawed graph is to eliminate the illustration entirely, and use a table instead.
2.2 High Data Density
Definition 3 (Data Density)
- data density = number of entries in data array/area of data graphic
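Definition 3 is simple enough to compute by hand, but a small helper makes the units explicit. A minimal sketch, assuming the data array is rectangular and the graphic's area is measured in square inches (both assumptions are mine, as is the function name `data_density`):

```python
def data_density(n_rows, n_cols, width_in, height_in):
    """data density = number of entries in data array / area of data graphic."""
    area = width_in * height_in
    if area <= 0:
        raise ValueError("the graphic must have a positive area")
    return (n_rows * n_cols) / area


# Hypothetical example: a 12 x 50 data array drawn in a 6 in x 4 in graphic.
density = data_density(12, 50, 6.0, 4.0)
print(density, "entries per square inch")
```

Shrinking the graphic or plotting more of the array both raise the density, which matches the earlier point that high-density illustrations are good when designed carefully.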
2.3 Multifunctioning Graphical Elements
- “Mobilize every graphical element, perhaps several times over, to show the data.” Tufte(1983), pg. 139
- This good advice is not easily implemented, but on those rare occasions when it can be, the results can be very useful to the reader.
2.4 Small Multiples or Animations-on-a-Page
- “Illustrations of postage-stamp size are indexed by category or a label, sequenced over time like the frames of a movie, or ordered by a quantitative variable not used in the single image. Information slices are positioned within the eyespan, so that viewers make comparisons at a glance — uninterrupted visual reasoning. Constancy of design puts the emphasis on changes in data, not changes in data frame.”
2.5 One Plus One Is Three
- What Tufte means by this zen-like koan is that elements of a graph interact with one another. Because of this, a good graph is more than the simple sum of its various pieces.
2.7 Word-Labels Are Better Than Letter-Labels
- Another general theme emphasized by Tufte and others: make labels as clear and explicit as possible on the graph itself. For very complicated figures, it may be necessary to use a legend box or to provide a verbal key to the lines (solid: gold, dashed: silver, dotted: brass, etc.) in the caption. However, as much as possible, one should write out labels as whole words or numbers.
2.10 Wide is Wonderful: Aesthetics of Aspect Ratio
- Definition 5 (External Aspect Ratio of a Graph) The “external aspect ratio” of a graph is the ratio of its width to its height as it appears on the printed page:
RE ≡ width on page/height on page
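The definition above can be turned into a pair of small helpers, one to measure the ratio and one to pick a height for a chosen ratio. This is a sketch of my own; the function names and the 6 x 4 figure size are hypothetical and not taken from the reading.

```python
def external_aspect_ratio(width_on_page, height_on_page):
    """RE = width on page / height on page."""
    if height_on_page <= 0:
        raise ValueError("height must be positive")
    return width_on_page / height_on_page


def height_for_ratio(width_on_page, ratio):
    """Height that gives the requested external aspect ratio at a fixed width."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return width_on_page / ratio


# Hypothetical figure that is 6 units wide and 4 units high on the page.
r_e = external_aspect_ratio(6.0, 4.0)
print(r_e)                         # wider than tall when RE > 1
print(height_for_ratio(6.0, r_e))  # recovers the original height
```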
2.12 Parallelism
- When multiple images are combined in parallel, the message is easier to grasp because the axes, format and so on are constant and only the data varies. Parallelism is closely related to “small multiples” and “animations-on-a-page”. In the most favorable cases, the parallelism implicit in these concepts can be translated into explicit geometry.
2.13 The Friendly Graphic
Here is a helpful image found at the end of the reading that summarizes the ways to create visualizations in a ‘friendly’ manner:
gozagsdatavizsp20::opendatavizfolder()