Graph Types

QUARC data visualization workshop

Graphing by hand

But…why?

How would you graph this?

Number of emergency department visits due to energy drink consumption.

Year   Number of ED visits
2007   10056
2009   13110
2011   20132

(2011 SAMHSA.GOV)

How about this?

Number of days energy drinks are consumed (in last 30 days) and score on Perceived Stress Scale.

Number of energy drinks in last 30 days   Score on PSS
2                                         20
1                                         30
10                                        31
2                                         17
4                                         34

(Pettit & DeBarr, 2011)

And this?

Emergency department visits due to adverse reactions from or abuse/misuse of energy drinks.

Year   ED visits due to adverse reactions   ED visits due to abuse/misuse
2007   6996                                 3060
2009   8798                                 4312
2011   14042                                6090

(2011 SAMHSA.GOV)

Data preparation

Make sure your data is “tidy”:

  • Observations are rows.
  • Variables are columns.
  • First row (and only first row) consists of variable names.
  • No extra stuff outside the rectangle.
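One way to check the "rectangle" rule is to confirm that every row has the same number of fields as the header row. The following is a minimal Python sketch, not the workshop's own tooling; the function name `is_rectangular` and the stray note line in `UNTIDY` are assumptions made for illustration, while the header and data rows are copied from the birth-weight data set used in this workshop.

```python
import csv
import io

# Header and first three rows of the birth-weight data, written as CSV.
TIDY = """low,age,lwt,race,smoke,ptl,ht,ui,ftv,bwt
0,19,182,2,0,0,0,1,0,2523
0,33,155,3,0,0,0,0,3,2551
0,20,105,1,1,0,0,0,1,2557
"""

# Same data with a hypothetical note typed below the table ("extra stuff").
UNTIDY = TIDY + "Note: weights rechecked by hand\n"

def is_rectangular(text):
    """Return True if every data row has as many fields as the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return False
    width = len(rows[0])
    return all(len(row) == width for row in rows[1:])

print(is_rectangular(TIDY))    # every row has 10 fields
print(is_rectangular(UNTIDY))  # the note line has only 1 field
```

A check like this only catches stray rows; it does not tell you whether rows really are observations or columns really are variables.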

Identifying variable types

  • Categorical (nominal, qualitative, factor)
    Classifies data by category.
    Examples: color, species, sex

  • Quantitative (numeric, scale, interval/ratio)
    Numerical measurements, usually with meaningful units.
    Examples: height, GDP, score

Identifying variable types

CAREFUL!

Numbers are not always quantitative.

Do you own a car?

0 = “No”, 1 = “Yes”

What is your zip code?
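
A short Python sketch of how such variables might be handled in practice: numeric codes are replaced by their labels, and zip codes are kept as text. The responses and zip codes below are hypothetical values chosen for illustration, and `decode` is an assumed helper name.

```python
# Codes from the slide: 0 = "No", 1 = "Yes".
CAR_LABELS = {0: "No", 1: "Yes"}

def decode(codes, labels):
    """Replace numeric codes with their category labels."""
    return [labels[code] for code in codes]

# Hypothetical survey responses to "Do you own a car?"
owns_car = decode([1, 0, 1, 1], CAR_LABELS)
print(owns_car)

# Hypothetical zip codes, stored as text: arithmetic on them is
# meaningless, and converting to a number drops a leading zero.
zip_codes = ["01103", "97202"]
print(zip_codes[0], int(zip_codes[0]))
```

Averaging the car codes happens to give the proportion of "Yes" answers, but averaging zip codes gives nothing meaningful, which is a quick test of whether a number is really quantitative.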

Data set

Risk factors associated with low infant birth weight, collected at Baystate Medical Center, Springfield, Massachusetts, during 1986.

189 women measured on 10 variables.

Data set

low age lwt race smoke ptl ht ui ftv bwt
0 19 182 2 0 0 0 1 0 2523
0 33 155 3 0 0 0 0 3 2551
0 20 105 1 1 0 0 0 1 2557
0 21 108 1 1 0 0 1 2 2594
0 18 107 1 1 0 0 1 0 2600
0 21 124 3 0 0 0 0 0 2622
0 22 118 1 0 0 0 0 1 2637
0 17 103 3 0 0 0 0 1 2637
0 29 123 1 1 0 0 0 1 2663
0 26 113 1 1 0 0 0 0 2665
0 19 95 3 0 0 0 0 0 2722
0 19 150 3 0 0 0 0 1 2733
0 22 95 3 0 0 1 0 0 2751
0 30 107 3 0 1 0 1 2 2750
0 18 100 1 1 0 0 0 0 2769
0 18 100 1 1 0 0 0 0 2769
0 15 98 2 0 0 0 0 0 2778
0 25 118 1 1 0 0 0 3 2782
0 20 120 3 0 0 0 1 0 2807
0 28 120 1 1 0 0 0 1 2821
0 32 121 3 0 0 0 0 2 2835
0 31 100 1 0 0 0 1 3 2835
0 36 202 1 0 0 0 0 1 2836
0 28 120 3 0 0 0 0 0 2863
0 25 120 3 0 0 0 1 2 2877
0 28 167 1 0 0 0 0 0 2877
0 17 122 1 1 0 0 0 0 2906
0 29 150 1 0 0 0 0 2 2920
0 26 168 2 1 0 0 0 0 2920
0 17 113 2 0 0 0 0 1 2920
0 17 113 2 0 0 0 0 1 2920
0 24 90 1 1 1 0 0 1 2948
0 35 121 2 1 1 0 0 1 2948
0 25 155 1 0 0 0 0 1 2977
0 25 125 2 0 0 0 0 0 2977
0 29 140 1 1 0 0 0 2 2977
0 19 138 1 1 0 0 0 2 2977
0 27 124 1 1 0 0 0 0 2922
0 31 215 1 1 0 0 0 2 3005
0 33 109 1 1 0 0 0 1 3033
0 21 185 2 1 0 0 0 2 3042
0 19 189 1 0 0 0 0 2 3062
0 23 130 2 0 0 0 0 1 3062
0 21 160 1 0 0 0 0 0 3062
0 18 90 1 1 0 0 1 0 3062
0 18 90 1 1 0 0 1 0 3062
0 32 132 1 0 0 0 0 4 3080
0 19 132 3 0 0 0 0 0 3090
0 24 115 1 0 0 0 0 2 3090
0 22 85 3 1 0 0 0 0 3090
0 22 120 1 0 0 1 0 1 3100
0 23 128 3 0 0 0 0 0 3104
0 22 130 1 1 0 0 0 0 3132
0 30 95 1 1 0 0 0 2 3147
0 19 115 3 0 0 0 0 0 3175
0 16 110 3 0 0 0 0 0 3175
0 21 110 3 1 0 0 1 0 3203
0 30 153 3 0 0 0 0 0 3203
0 20 103 3 0 0 0 0 0 3203
0 17 119 3 0 0 0 0 0 3225
0 17 119 3 0 0 0 0 0 3225
0 23 119 3 0 0 0 0 2 3232
0 24 110 3 0 0 0 0 0 3232
0 28 140 1 0 0 0 0 0 3234
0 26 133 3 1 2 0 0 0 3260
0 20 169 3 0 1 0 1 1 3274
0 24 115 3 0 0 0 0 2 3274
0 28 250 3 1 0 0 0 6 3303
0 20 141 1 0 2 0 1 1 3317
0 22 158 2 0 1 0 0 2 3317
0 22 112 1 1 2 0 0 0 3317
0 31 150 3 1 0 0 0 2 3321
0 23 115 3 1 0 0 0 1 3331
0 16 112 2 0 0 0 0 0 3374
0 16 135 1 1 0 0 0 0 3374
0 18 229 2 0 0 0 0 0 3402
0 25 140 1 0 0 0 0 1 3416
0 32 134 1 1 1 0 0 4 3430
0 20 121 2 1 0 0 0 0 3444
0 23 190 1 0 0 0 0 0 3459
0 22 131 1 0 0 0 0 1 3460
0 32 170 1 0 0 0 0 0 3473
0 30 110 3 0 0 0 0 0 3544
0 20 127 3 0 0 0 0 0 3487
0 23 123 3 0 0 0 0 0 3544
0 17 120 3 1 0 0 0 0 3572
0 19 105 3 0 0 0 0 0 3572
0 23 130 1 0 0 0 0 0 3586
0 36 175 1 0 0 0 0 0 3600
0 22 125 1 0 0 0 0 1 3614
0 24 133 1 0 0 0 0 0 3614
0 21 134 3 0 0 0 0 2 3629
0 19 235 1 1 0 1 0 0 3629
0 25 95 1 1 3 0 1 0 3637
0 16 135 1 1 0 0 0 0 3643
0 29 135 1 0 0 0 0 1 3651
0 29 154 1 0 0 0 0 1 3651
0 19 147 1 1 0 0 0 0 3651
0 19 147 1 1 0 0 0 0 3651
0 30 137 1 0 0 0 0 1 3699
0 24 110 1 0 0 0 0 1 3728
0 19 184 1 1 0 1 0 0 3756
0 24 110 3 0 1 0 0 0 3770
0 23 110 1 0 0 0 0 1 3770
0 20 120 3 0 0 0 0 0 3770
0 25 241 2 0 0 1 0 0 3790
0 30 112 1 0 0 0 0 1 3799
0 22 169 1 0 0 0 0 0 3827
0 18 120 1 1 0 0 0 2 3856
0 16 170 2 0 0 0 0 4 3860
0 32 186 1 0 0 0 0 2 3860
0 18 120 3 0 0 0 0 1 3884
0 29 130 1 1 0 0 0 2 3884
0 33 117 1 0 0 0 1 1 3912
0 20 170 1 1 0 0 0 0 3940
0 28 134 3 0 0 0 0 1 3941
0 14 135 1 0 0 0 0 0 3941
0 28 130 3 0 0 0 0 0 3969
0 25 120 1 0 0 0 0 2 3983
0 16 95 3 0 0 0 0 1 3997
0 20 158 1 0 0 0 0 1 3997
0 26 160 3 0 0 0 0 0 4054
0 21 115 1 0 0 0 0 1 4054
0 22 129 1 0 0 0 0 0 4111
0 25 130 1 0 0 0 0 2 4153
0 31 120 1 0 0 0 0 2 4167
0 35 170 1 0 1 0 0 1 4174
0 19 120 1 1 0 0 0 0 4238
0 24 116 1 0 0 0 0 1 4593
0 45 123 1 0 0 0 0 1 4990
1 28 120 3 1 1 0 1 0 709
1 29 130 1 0 0 0 1 2 1021
1 34 187 2 1 0 1 0 0 1135
1 25 105 3 0 1 1 0 0 1330
1 25 85 3 0 0 0 1 0 1474
1 27 150 3 0 0 0 0 0 1588
1 23 97 3 0 0 0 1 1 1588
1 24 128 2 0 1 0 0 1 1701
1 24 132 3 0 0 1 0 0 1729
1 21 165 1 1 0 1 0 1 1790
1 32 105 1 1 0 0 0 0 1818
1 19 91 1 1 2 0 1 0 1885
1 25 115 3 0 0 0 0 0 1893
1 16 130 3 0 0 0 0 1 1899
1 25 92 1 1 0 0 0 0 1928
1 20 150 1 1 0 0 0 2 1928
1 21 200 2 0 0 0 1 2 1928
1 24 155 1 1 1 0 0 0 1936
1 21 103 3 0 0 0 0 0 1970
1 20 125 3 0 0 0 1 0 2055
1 25 89 3 0 2 0 0 1 2055
1 19 102 1 0 0 0 0 2 2082
1 19 112 1 1 0 0 1 0 2084
1 26 117 1 1 1 0 0 0 2084
1 24 138 1 0 0 0 0 0 2100
1 17 130 3 1 1 0 1 0 2125
1 20 120 2 1 0 0 0 3 2126
1 22 130 1 1 1 0 1 1 2187
1 27 130 2 0 0 0 1 0 2187
1 20 80 3 1 0 0 1 0 2211
1 17 110 1 1 0 0 0 0 2225
1 25 105 3 0 1 0 0 1 2240
1 20 109 3 0 0 0 0 0 2240
1 18 148 3 0 0 0 0 0 2282
1 18 110 2 1 1 0 0 0 2296
1 20 121 1 1 1 0 1 0 2296
1 21 100 3 0 1 0 0 4 2301
1 26 96 3 0 0 0 0 0 2325
1 31 102 1 1 1 0 0 1 2353
1 15 110 1 0 0 0 0 0 2353
1 23 187 2 1 0 0 0 1 2367
1 20 122 2 1 0 0 0 0 2381
1 24 105 2 1 0 0 0 0 2381
1 15 115 3 0 0 0 1 0 2381
1 23 120 3 0 0 0 0 0 2410
1 30 142 1 1 1 0 0 0 2410
1 22 130 1 1 0 0 0 1 2410
1 17 120 1 1 0 0 0 3 2414
1 23 110 1 1 1 0 0 0 2424
1 17 120 2 0 0 0 0 2 2438
1 26 154 3 0 1 1 0 1 2442
1 20 105 3 0 0 0 0 3 2450
1 26 190 1 1 0 0 0 0 2466
1 14 101 3 1 1 0 0 0 2466
1 28 95 1 1 0 0 0 2 2466
1 14 100 3 0 0 0 0 2 2495
1 23 94 3 1 0 0 0 0 2495
1 17 142 2 0 0 1 0 0 2495
1 21 130 1 1 0 1 0 3 2495

Single variable

  • Single variables usually won't answer very interesting questions by themselves.
  • Graphs of single variables are often valuable for exploring your data, but generally not suitable for inclusion in the final product.

Single categorical variable

Mother's race (white, black, or other)

# A tibble: 189 x 1
   race 
   <fct>
 1 Black
 2 Other
 3 White
 4 White
 5 White
 6 Other
 7 White
 8 Other
 9 White
10 White
# ... with 179 more rows

Single categorical variable

  • Frequency table

Race    Count
White      96
Black      26
Other      67
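
A frequency table is just a count of how often each category occurs. As a minimal Python sketch (using only the ten race values printed earlier, so the counts differ from the full 189-row table):

```python
from collections import Counter

# The first ten race values shown in the listing above.
races = ["Black", "Other", "White", "White", "White",
         "Other", "White", "Other", "White", "White"]

frequency = Counter(races)

# most_common() lists categories from most to least frequent.
for race, count in frequency.most_common():
    print(f"{race:<6} {count:>3}")
```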

Single categorical variable

  • Bar chart

Single categorical variable

  • Pie chart (Danger! Danger!)

Single categorical variable

  • Relative frequency table

Race    Count   Proportion
White      96         0.51
Black      26         0.14
Other      67         0.35
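
Relative frequencies are each count divided by the total number of observations. A minimal Python sketch using the counts from the table (the variable names are illustrative):

```python
# Counts taken from the frequency table of mother's race.
counts = {"White": 96, "Black": 26, "Other": 67}

total = sum(counts.values())  # 189 women in all

# Divide each count by the total and round to two decimals.
relative = {race: round(n / total, 2) for race, n in counts.items()}

for race, share in relative.items():
    print(f"{race:<6} {counts[race]:>3} {share:.2f}")
```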

Single quantitative variable

Infant birth weight in grams

# A tibble: 189 x 1
     bwt
   <int>
 1  2523
 2  2551
 3  2557
 4  2594
 5  2600
 6  2622
 7  2637
 8  2637
 9  2663
10  2665
# ... with 179 more rows

Single quantitative variable

  • Histogram
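
A histogram sorts values into equal-width bins and shows how many fall in each. The Python sketch below does the counting and prints a rough text version; the 14 birth weights are a hand-picked subset of the data set, and the 1000 g bin width and the name `bin_counts` are assumptions made for illustration.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((value // width) * width for value in values)
    return dict(sorted(counts.items()))

# A hand-picked subset of birth weights (grams) from the data set.
sample = [709, 1021, 1588, 1928, 2055, 2410, 2523,
          2977, 3062, 3487, 3651, 3997, 4593, 4990]

hist = bin_counts(sample, 1000)

# One row of '#' marks per bin.
for lower, n in hist.items():
    print(f"{lower:>4}-{lower + 999:<4} {'#' * n}")
```

Changing the bin width changes the shape of the picture, so it is worth trying a few widths before settling on one.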

Single quantitative variable

  • Tabular summaries

(“Five-number summary” or other quantiles)

Quantile   Birth weight (g)
0%                      709
5%                     1801
25%                    2414
50%                    2977
75%                    3487
95%                    3997
100%                   4990
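
A five-number summary (minimum, quartiles, maximum) can be computed with Python's standard library, as in the sketch below. It uses only the ten birth weights printed earlier, so the numbers differ from the full-data table above. Quantile definitions vary between tools; the `method="inclusive"` setting is one common choice and is an assumption here, not necessarily the method behind the table.

```python
import statistics

# The first ten birth weights (grams) shown in the listing above.
weights = [2523, 2551, 2557, 2594, 2600, 2622, 2637, 2637, 2663, 2665]

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(weights, n=4, method="inclusive")

five_number = [min(weights), q1, median, q3, max(weights)]
print(five_number)
```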

Single quantitative variable

  • A bunch of other types I don't prefer:

boxplot, stem-and-leaf plot, dotplot

Multiple variables

There are at least six elements of a plot that can be assigned to variables:

  • x-axis (horizontal axis)
  • y-axis (vertical axis)
  • facets
  • color/fill
  • size
  • shape (e.g., dots vs crosses, solid vs dashed lines, etc.)
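
To make "assigning variables to plot elements" concrete, here is a small Python sketch that stores the assignment as a dictionary and uses it to split rows into facets. No plotting library is involved; the `mapping` dictionary and the `build_panels` helper are hypothetical names, and the five rows are the first five records of the birth-weight data with race written as a label.

```python
from collections import defaultdict

# First five records of the birth-weight data (race shown as a label).
rows = [
    {"age": 19, "bwt": 2523, "race": "Black", "smoke": 0},
    {"age": 33, "bwt": 2551, "race": "Other", "smoke": 0},
    {"age": 20, "bwt": 2557, "race": "White", "smoke": 1},
    {"age": 21, "bwt": 2594, "race": "White", "smoke": 1},
    {"age": 18, "bwt": 2600, "race": "White", "smoke": 1},
]

# Hypothetical assignment of plot elements to variable names.
mapping = {"x": "age", "y": "bwt", "color": "smoke", "facet": "race"}

def build_panels(rows, mapping):
    """Group rows into facets, keeping (x, y, color) for each point."""
    panels = defaultdict(list)
    for row in rows:
        point = (row[mapping["x"]], row[mapping["y"]], row[mapping["color"]])
        panels[row[mapping["facet"]]].append(point)
    return dict(panels)

panels = build_panels(rows, mapping)
for facet, points in panels.items():
    print(facet, points)
```

Swapping the entries in `mapping` (for example, faceting by smoking status and coloring by race) gives a different plot of the same four variables.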

Two categorical variables

Mother's race and the number of previous premature labors.

# A tibble: 189 x 2
   race  ptl  
   <fct> <fct>
 1 Black 0    
 2 Other 0    
 3 White 0    
 4 White 0    
 5 White 0    
 6 Other 0    
 7 White 0    
 8 Other 0    
 9 White 0    
10 White 0    
# ... with 179 more rows

Two categorical variables

  • Contingency table (okay)

Previous premature labors   White   Black   Other
0                              82      22      55
1                              10       4      10
2                               3       0       2
3                               1       0       0
Total                          96      26      67

Two categorical variables

  • Contingency table (better): percent within each race

Previous premature labors    White    Black    Other
0                            85.42    84.62    82.09
1                            10.42    15.38    14.93
2                             3.12     0.00     2.99
3                             1.04     0.00     0.00
Total                       100.00   100.00   100.00
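
The percentages come from dividing each count by its column total, which makes groups of different sizes comparable. A minimal Python sketch of that step, using the counts from the contingency table of race and previous premature labors (`column_percentages` is an illustrative name):

```python
# Counts of 0, 1, 2, and 3 previous premature labors, by mother's race.
counts = {
    "White": [82, 10, 3, 1],
    "Black": [22, 4, 0, 0],
    "Other": [55, 10, 2, 0],
}

def column_percentages(counts):
    """Convert each column of counts to percentages of that column's total."""
    return {
        race: [100 * n / sum(column) for n in column]
        for race, column in counts.items()
    }

percentages = column_percentages(counts)

for race, column in percentages.items():
    print(race, [f"{p:.2f}" for p in column])
```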

Two categorical variables

  • Side-by-side bar chart (okay)

Two categorical variables

  • Side-by-side bar chart (better)

Two categorical variables

  • Stacked bar chart (Danger! Danger!)

One categorical and one quantitative variable

Mother's race and infant birth weight in grams

# A tibble: 189 x 2
   race    bwt
   <fct> <int>
 1 Black  2523
 2 Other  2551
 3 White  2557
 4 White  2594
 5 White  2600
 6 Other  2622
 7 White  2637
 8 Other  2637
 9 White  2663
10 White  2665
# ... with 179 more rows

One categorical and one quantitative variable

  • Side-by-side boxplot

One categorical and one quantitative variable

  • Stacked histogram (okay)

One categorical and one quantitative variable

  • Stacked histogram (better)

Two quantitative variables

Mother's age and infant birth weight

# A tibble: 189 x 2
     age   bwt
   <int> <int>
 1    19  2523
 2    33  2551
 3    20  2557
 4    21  2594
 5    18  2600
 6    21  2622
 7    22  2637
 8    17  2637
 9    29  2663
10    26  2665
# ... with 179 more rows

Two quantitative variables

  • Scatterplot

Two quantitative variables

  • If one variable is ordered (like time) and there is only one observation of y for each x value, use a line plot.
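
The "one y per x" part of this rule is easy to check before choosing a plot type. The Python sketch below tests only that condition (whether x is meaningfully ordered is still a judgment call); `one_y_per_x` is an assumed helper name, and both data sets are taken from tables shown earlier in the workshop.

```python
def one_y_per_x(points):
    """Return True if no x value appears more than once."""
    xs = [x for x, _ in points]
    return len(xs) == len(set(xs))

# Year and number of ED visits: one value per year.
ed_visits = [(2007, 10056), (2009, 13110), (2011, 20132)]

# Mother's age and birth weight (first six rows): age 21 appears twice.
age_bwt = [(19, 2523), (33, 2551), (20, 2557),
           (21, 2594), (18, 2600), (21, 2622)]

print(one_y_per_x(ed_visits))  # a line plot is reasonable
print(one_y_per_x(age_bwt))    # repeated x values: use a scatterplot
```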

Three or more variables
