Graph types

Sean Raleigh
Westminster University

Data preparation

  • Make sure your data is “tidy”:
    • Observations are rows.
    • Variables are columns.
    • Every cell is a single value.
    • First row (and only first row) consists of variable names.
    • No extra stuff outside the rectangle.
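
The rectangle rules above can be checked mechanically. The following is a minimal sketch, assuming the table is held as a header list plus a list of rows; `is_tidy` is a hypothetical helper written for this illustration, not part of any library, and it checks only the shape of the rectangle and the single-value-per-cell rule.

```python
# Hypothetical helper (an assumption for illustration): checks two of the
# tidy-data rules on a table given as a header plus a list of rows.
def is_tidy(header, rows):
    """Return True if every row has one cell per variable and every cell is a single value."""
    for row in rows:
        # Each observation (row) needs exactly one cell per variable (column).
        if len(row) != len(header):
            return False
        for cell in row:
            # A cell holding a collection is more than a single value.
            if isinstance(cell, (list, tuple, set, dict)):
                return False
    return True


header = ["species", "island", "body_mass_g"]

tidy_rows = [
    ["Adelie", "Torgersen", 3750],
    ["Adelie", "Torgersen", 3800],
]

untidy_rows = [
    ["Adelie", "Torgersen", [3750, 3800]],  # two measurements crammed into one cell
]

print(is_tidy(header, tidy_rows))
print(is_tidy(header, untidy_rows))
```

Whether rows really are observations and columns really are variables still needs a human judgment; a check like this only catches structural problems.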

Data preparation

Measurements for penguins in the Palmer Archipelago, Antarctica.

species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
Adelie   Torgersen  39.1            18.7           181                3750         male    2007
Adelie   Torgersen  39.5            17.4           186                3800         female  2007
Adelie   Torgersen  40.3            18.0           195                3250         female  2007
Adelie   Torgersen  36.7            19.3           193                3450         female  2007
Adelie   Torgersen  39.3            20.6           190                3650         male    2007
Adelie   Torgersen  38.9            17.8           181                3625         female  2007
Adelie   Torgersen  39.2            19.6           195                4675         male    2007
Adelie   Torgersen  41.1            17.6           182                3200         female  2007
Adelie   Torgersen  38.6            21.2           191                3800         male    2007
Adelie   Torgersen  34.6            21.1           198                4400         male    2007
Adelie   Torgersen  36.6            17.8           185                3700         female  2007
Adelie   Torgersen  38.7            19.0           195                3450         female  2007
Adelie   Torgersen  42.5            20.7           197                4500         male    2007
Adelie   Torgersen  34.4            18.4           184                3325         female  2007
Adelie   Torgersen  46.0            21.5           194                4200         male    2007
Adelie   Biscoe     37.8            18.3           174                3400         female  2007
Adelie   Biscoe     37.7            18.7           180                3600         male    2007
Adelie   Biscoe     35.9            19.2           189                3800         female  2007
Adelie   Biscoe     38.2            18.1           185                3950         male    2007
Adelie   Biscoe     38.8            17.2           180                3800         male    2007

Identifying variable types

  • Categorical (qualitative, nominal, factor)
    • Classifies data by category.
    • color, species, sex
  • Numerical (quantitative, scale, interval/ratio)
    • Numerical measurements, usually with meaningful units.
    • height, GDP, score

Identifying variable types

CAREFUL!

Numbers are not always numerical.

  • Do you own a car?
    • 0 = “No”, 1 = “Yes”
  • What is your zip code?
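
A short sketch of why these numbers behave like categories. The survey answers below are made up for illustration (they are not from the penguin data), and storing zip codes as strings is one reasonable choice, not the only one.

```python
from collections import Counter

# Hypothetical survey answers, invented for this illustration.
owns_car = [0, 1, 1, 0, 1]  # 0 = "No", 1 = "Yes"
zip_codes = ["84105", "84105", "02139", "84101", "02139"]  # strings keep the leading zero

# The codes 0 and 1 are labels, so translate them and count categories.
car_labels = {0: "No", 1: "Yes"}
car_counts = Counter(car_labels[code] for code in owns_car)

# Averaging zip codes would be meaningless; counting them is not.
zip_counts = Counter(zip_codes)

print(car_counts)
print(zip_counts)
```

A useful test is to ask whether arithmetic on the values means anything. The "average zip code" does not describe anyone's location, so the variable is categorical even though it is written with digits.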

Single variable

  • Single variables usually won’t answer very interesting questions by themselves.
  • Graphs of single variables are often valuable for exploring your data, but generally not suitable for inclusion in the final product.

Single categorical variable

Penguin species.

species
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie
Adelie

Single categorical variable

  • Frequency table
species    n
Adelie     146
Chinstrap  68
Gentoo     119

Single categorical variable

  • Bar chart

Single categorical variable

  • Pie chart (Danger! Danger!)

Single categorical variable

  • Relative frequency table
species    n    proportion
Adelie     146  0.4384384
Chinstrap  68   0.2042042
Gentoo     119  0.3573574
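
The last column of the relative frequency table is each count divided by the total number of penguins. A minimal sketch using the counts from the frequency table (the variable names are my own):

```python
# Species counts taken from the frequency table in this deck.
counts = {"Adelie": 146, "Chinstrap": 68, "Gentoo": 119}

total = sum(counts.values())  # 146 + 68 + 119 = 333

# Relative frequency = count / total.
proportions = {species: n / total for species, n in counts.items()}

for species, n in counts.items():
    print(f"{species:<10} {n:>4} {proportions[species]:.7f}")
```

Multiplying each value by 100 gives the same information as a percentage.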

Single numerical variable

Penguin body mass in grams.

body_mass_g
3750
3800
3250
3450
3650
3625
4675
3200
3800
4400
3700
3450
4500
3325
4200
3400
3600
3800
3950
3800

Single numerical variable

  • Histogram
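
A histogram groups a numerical variable into bins and counts the observations in each bin. The sketch below uses only the 20 body masses listed in this deck (not the full dataset), and the bin width of 500 g is an assumption chosen for illustration; `histogram_counts` is a hypothetical helper.

```python
# The 20 body masses (g) shown in this deck; the full dataset has more rows.
body_mass_g = [
    3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400,
    3700, 3450, 4500, 3325, 4200, 3400, 3600, 3800, 3950, 3800,
]

BIN_WIDTH = 500  # assumed bin width in grams


def histogram_counts(values, bin_width):
    """Count values per bin; each bin is keyed by its left edge."""
    counts = {}
    for value in values:
        left_edge = (value // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


bins = histogram_counts(body_mass_g, BIN_WIDTH)

# Draw a rough text histogram, one '#' per penguin.
for left_edge, n in bins.items():
    print(f"{left_edge}-{left_edge + BIN_WIDTH - 1}: {'#' * n}")
```

Changing the bin width changes the shape of the picture, so it is worth trying a few values before settling on one.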

Single numerical variable

  • Tabular summaries

“Five-number summary”

Percentile  body_mass_g
0%          2700
25%         3550
50%         4050
75%         4775
100%        6300
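
A five-number summary is the minimum, the three quartiles, and the maximum. The sketch below computes one for only the 20 body masses listed in this deck, so its results differ from the full-data table above. The interpolation method (`method="inclusive"` in Python's `statistics.quantiles`) is my assumption; other quartile conventions give slightly different values.

```python
from statistics import quantiles

# The 20 body masses (g) shown in this deck; the full dataset has more rows.
body_mass_g = [
    3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400,
    3700, 3450, 4500, 3325, 4200, 3400, 3600, 3800, 3950, 3800,
]

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = quantiles(body_mass_g, n=4, method="inclusive")

five_number_summary = {
    "0%": min(body_mass_g),
    "25%": q1,
    "50%": median,
    "75%": q3,
    "100%": max(body_mass_g),
}

for percentile, value in five_number_summary.items():
    print(f"{percentile:>4} {value}")
```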

Single numerical variable

  • A bunch of other types I don’t prefer:

boxplot, stem-and-leaf plot, dotplot

Multiple variables

There are at least six elements of a plot that can be assigned to variables:

  • x-axis (horizontal axis)
  • y-axis (vertical axis)
  • facets
  • color/fill
  • size
  • shape (e.g., dots vs crosses, solid vs dashed lines, etc.)

Two categorical variables

Penguin species and island.

species island
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Torgersen
Adelie Biscoe
Adelie Biscoe
Adelie Biscoe
Adelie Biscoe
Adelie Biscoe

Two categorical variables

  • Contingency table (okay)
species    Biscoe  Dream  Torgersen
Adelie     44      55     47
Chinstrap  0       68     0
Gentoo     119     0      0
Total      163     123    47

Two categorical variables

  • Contingency table (better)
species    Biscoe  Dream   Torgersen
Adelie     27.0%   44.7%   100.0%
Chinstrap  0.0%    55.3%   0.0%
Gentoo     73.0%   0.0%    0.0%
Total      100.0%  100.0%  100.0%
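
The percentages in this table are column percentages: each count divided by its island's total. A minimal sketch that derives them from the counts in the first contingency table (the nested-dictionary layout is just one convenient way to hold the table):

```python
# Counts from the contingency table of species by island.
counts = {
    "Adelie":    {"Biscoe": 44,  "Dream": 55, "Torgersen": 47},
    "Chinstrap": {"Biscoe": 0,   "Dream": 68, "Torgersen": 0},
    "Gentoo":    {"Biscoe": 119, "Dream": 0,  "Torgersen": 0},
}
islands = ["Biscoe", "Dream", "Torgersen"]

# Column totals: number of penguins observed on each island.
column_totals = {
    island: sum(row[island] for row in counts.values()) for island in islands
}

# Column percentages: share of each species within an island.
column_percent = {
    species: {island: 100 * row[island] / column_totals[island] for island in islands}
    for species, row in counts.items()
}

for species, row in column_percent.items():
    cells = "  ".join(f"{row[island]:5.1f}%" for island in islands)
    print(f"{species:<10} {cells}")
```

Percentages make the islands comparable even though they have very different numbers of penguins, which is why this version of the table is easier to read than the raw counts.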

Two categorical variables

  • Side-by-side bar chart (okay)

Two categorical variables

  • Side-by-side bar chart (better)

Two categorical variables

  • Stacked bar chart (okay)

One categorical and one numerical variable

Penguin species and body mass (g).

species body_mass_g
Adelie 3750
Adelie 3800
Adelie 3250
Adelie 3450
Adelie 3650
Adelie 3625
Adelie 4675
Adelie 3200
Adelie 3800
Adelie 4400
Adelie 3700
Adelie 3450
Adelie 4500
Adelie 3325
Adelie 4200
Adelie 3400
Adelie 3600
Adelie 3800
Adelie 3950
Adelie 3800

One categorical and one numerical variable

  • Side-by-side boxplot (okay)

One categorical and one numerical variable

  • Stacked histogram (better)

One categorical and one numerical variable

  • Dynamite plot (Danger! Danger!)

One categorical and one numerical variable

  • Beeswarm (better)

Two numerical variables

Penguin flipper length (mm) and body mass (g).

flipper_length_mm body_mass_g
181 3750
186 3800
195 3250
193 3450
190 3650
181 3625
195 4675
182 3200
191 3800
198 4400
185 3700
195 3450
197 4500
184 3325
194 4200
174 3400
180 3600
189 3800
185 3950
180 3800

Two numerical variables

  • Scatterplot

Two numerical variables

  • If one variable is ordered (like time) and there is only one observation of y for each x value, use a lineplot.
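
The "one observation of y for each x" condition can be checked by looking for repeated x values. The sketch below is a rough aid, not a rule: `suggest_plot` is a hypothetical helper, it checks only for repeats, and whether x is meaningfully ordered remains a judgment call. The yearly series is invented for illustration.

```python
from collections import Counter


def suggest_plot(x_values):
    """Suggest 'lineplot' only when every x value occurs exactly once."""
    repeated = [x for x, n in Counter(x_values).items() if n > 1]
    return "scatterplot" if repeated else "lineplot"


# Flipper lengths (mm) for the 20 penguins shown in this deck.
flipper_length_mm = [
    181, 186, 195, 193, 190, 181, 195, 182, 191, 198,
    185, 195, 197, 184, 194, 174, 180, 189, 185, 180,
]

# Hypothetical ordered x variable with one measurement per year.
years = [2007, 2008, 2009]

print(suggest_plot(flipper_length_mm))  # several lengths repeat
print(suggest_plot(years))              # each year appears once
```

Several flipper lengths occur more than once with different body masses, so connecting those points with a line would zigzag between penguins rather than show a trend.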

Three or more variables
