Assignment04 Charts

Variable Width Column Chart

Using the mekko package

  1. Load the packages
  2. Create the data (from the cereal dataset)
  3. Plot with barmekko() (https://rdrr.io/cran/mekko/man/barmekko.html)
  4. Final edits: remove the x-axis ticks and tick labels with element_blank() (https://stackoverflow.com/questions/35090883/remove-all-of-x-axis-labels-in-ggplot)
library(mekko)
library(ggplot2)

nutritioninfo <- data.frame(
  brand = c('All_Bran', 'Corn_Flakes', 'Cheerios', 'Fruit_Loops', 'Crispix'),
  calories = c(70, 100, 110, 110, 110),
  rating = c(59.4, 45.9, 50.8, 32.2, 46.9)
)

# x = brand (one bar per brand), y = calories (bar height), width = rating (bar width)
barmekko(nutritioninfo, brand, calories, rating) +
  labs(x = "Consumer Rating", y = "Calories per Serving",
       title = "Calories and Rating for Cereal Brands", caption = "by: Genna Campain") +
  # Hide the x-axis ticks and tick labels, and set the font
  theme(axis.ticks.x = element_blank(), axis.text.x = element_blank(),
        text = element_text(family = "Times New Roman"))
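To see what a variable-width column chart is doing, the same idea can be sketched in plain ggplot2 by laying rectangles edge to edge, each as wide as its rating. This is a minimal sketch under that assumption, not the mekko package's own implementation; the helper columns xmin and xmax are introduced here for illustration.

```r
library(ggplot2)

nutritioninfo <- data.frame(
  brand = c('All_Bran', 'Corn_Flakes', 'Cheerios', 'Fruit_Loops', 'Crispix'),
  calories = c(70, 100, 110, 110, 110),
  rating = c(59.4, 45.9, 50.8, 32.2, 46.9)
)

# Illustrative helper columns: each bar starts where the previous one ends,
# so its width along the x axis equals its rating
nutritioninfo$xmax <- cumsum(nutritioninfo$rating)
nutritioninfo$xmin <- nutritioninfo$xmax - nutritioninfo$rating

p <- ggplot(nutritioninfo) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = calories, fill = brand),
            color = "white") +
  labs(x = "Consumer Rating", y = "Calories per Serving",
       title = "Calories and Rating for Cereal Brands") +
  theme_bw()
print(p)
```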

Bar Chart

Use stat = 'identity' to indicate that I am supplying the y values (otherwise geom_bar() counts the rows in each category and uses those counts as the bar heights). coord_flip() rotates the whole chart so the bars run horizontally.

library(ggplot2)

sugarsinfo <- data.frame(
  brand = c('All_Bran', 'Corn_Flakes', 'Cheerios', 'Fruit_Loops', 'Crispix'),
  sugars = c(5, 2, 1, 13, 3)
)

p <- ggplot(data = sugarsinfo, mapping = aes(x = brand, y = sugars, fill = brand))
p + geom_bar(stat = 'identity') +
  labs(x = "Brand Name", y = "Grams of Sugar", title = "Grams of Sugar In Cereal Brands") + 
  theme(text = element_text(family = "Times New Roman")) +
  coord_flip()
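A short sketch of the stat = 'identity' point: geom_col() is ggplot2's shorthand for geom_bar(stat = 'identity'), while geom_bar() with its default stat counts rows instead. The names p_bar, p_col and p_count are illustrative, and layer_data() is used only to inspect the bar heights each version computes.

```r
library(ggplot2)

sugarsinfo <- data.frame(
  brand = c('All_Bran', 'Corn_Flakes', 'Cheerios', 'Fruit_Loops', 'Crispix'),
  sugars = c(5, 2, 1, 13, 3)
)

# Both of these use the sugars column directly as the bar heights
p_bar <- ggplot(sugarsinfo, aes(x = brand, y = sugars)) + geom_bar(stat = "identity")
p_col <- ggplot(sugarsinfo, aes(x = brand, y = sugars)) + geom_col()

# With its default stat, geom_bar() counts rows per brand (one row each here)
p_count <- ggplot(sugarsinfo, aes(x = brand)) + geom_bar()
```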

Column Chart

The data must first be reshaped from wide to long format with the tidyr package. The position argument (position_dodge()) places each brand's bars next to each other instead of stacking them.

library(ggplot2)

nutrition <- data.frame(
  newbrand = c("All_Bran", "Cheerios", "Fruit_Loops"),
  sugars = c(5, 1, 13),
  protein = c(4, 6, 2),
  fat = c(1, 2, 2),
  fiber = c(9, 2, 1)
)

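library(tidyr)
# gather() reshapes to long format: one row per brand-nutrient pair (3 brands x 4 nutrients = 12 rows)
nutrition_long <- gather(nutrition, variable, nutrients, -newbrand)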

p <- ggplot(nutrition_long, mapping = aes(x = newbrand, y = nutrients, fill = variable))
p + geom_col(position = position_dodge()) +
  labs(x = "Brand Name", y = "Nutrient Values", title = "Nutrition Information for Various Cereal Brands") +
  theme(text = element_text(family = "Times New Roman"))
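gather() still works, but tidyr now describes it as superseded by pivot_longer(). A minimal sketch of the same reshape with pivot_longer(), assuming the same nutrition data frame; it produces the same 12 brand-nutrient rows, although the row order can differ from gather().

```r
library(tidyr)

nutrition <- data.frame(
  newbrand = c("All_Bran", "Cheerios", "Fruit_Loops"),
  sugars = c(5, 1, 13),
  protein = c(4, 6, 2),
  fat = c(1, 2, 2),
  fiber = c(9, 2, 1)
)

# Every column except newbrand becomes a (variable, nutrients) pair
nutrition_long <- pivot_longer(nutrition, cols = -newbrand,
                               names_to = "variable", values_to = "nutrients")
```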

Table (with Embedded Charts)

Use the formattable package to create formatted tables, including bars embedded in the cells (https://renkun-ken.github.io/formattable/)

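library(formattable)
# Adjust the path to wherever cereal.csv is saved
cereal <- read.csv("~/Downloads/cereal.csv")

# Keep only the columns to display
df <- data.frame(
  name = cereal$name,
  calories = cereal$calories,
  protein = cereal$protein,
  fat = cereal$fat,
  fiber = cereal$fiber,
  rating = cereal$rating
)

# Draw a pink bar in each fiber cell; bar lengths are rescaled to 20-100% of the cell
formattable(df, list(area(col = c(fiber)) ~ normalize_bar("pink", 0.2)))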
| name | calories | protein | fat | fiber | rating |
|---|---:|---:|---:|---:|---:|
| 100% Bran | 70 | 4 | 1 | 10.0 | 68.40297 |
| 100% Natural Bran | 120 | 3 | 5 | 2.0 | 33.98368 |
| All-Bran | 70 | 4 | 1 | 9.0 | 59.42551 |
| All-Bran with Extra Fiber | 50 | 4 | 0 | 14.0 | 93.70491 |
| Almond Delight | 110 | 2 | 2 | 1.0 | 34.38484 |
| Apple Cinnamon Cheerios | 110 | 2 | 2 | 1.5 | 29.50954 |
| Apple Jacks | 110 | 2 | 0 | 1.0 | 33.17409 |
| Basic 4 | 130 | 3 | 2 | 2.0 | 37.03856 |
| Bran Chex | 90 | 2 | 1 | 4.0 | 49.12025 |
| Bran Flakes | 90 | 3 | 0 | 5.0 | 53.31381 |
| Cap’n’Crunch | 120 | 1 | 2 | 0.0 | 18.04285 |
| Cheerios | 110 | 6 | 2 | 2.0 | 50.76500 |
| Cinnamon Toast Crunch | 120 | 1 | 3 | 0.0 | 19.82357 |
| Clusters | 110 | 3 | 2 | 2.0 | 40.40021 |
| Cocoa Puffs | 110 | 1 | 1 | 0.0 | 22.73645 |
| Corn Chex | 110 | 2 | 0 | 0.0 | 41.44502 |
| Corn Flakes | 100 | 2 | 0 | 1.0 | 45.86332 |
| Corn Pops | 110 | 1 | 0 | 1.0 | 35.78279 |
| Count Chocula | 110 | 1 | 1 | 0.0 | 22.39651 |
| Cracklin’ Oat Bran | 110 | 3 | 3 | 4.0 | 40.44877 |
| Cream of Wheat (Quick) | 100 | 3 | 0 | 1.0 | 64.53382 |
| Crispix | 110 | 2 | 0 | 1.0 | 46.89564 |
| Crispy Wheat & Raisins | 100 | 2 | 1 | 2.0 | 36.17620 |
| Double Chex | 100 | 2 | 0 | 1.0 | 44.33086 |
| Froot Loops | 110 | 2 | 1 | 1.0 | 32.20758 |
| Frosted Flakes | 110 | 1 | 0 | 1.0 | 31.43597 |
| Frosted Mini-Wheats | 100 | 3 | 0 | 3.0 | 58.34514 |
| Fruit & Fibre Dates; Walnuts; and Oats | 120 | 3 | 2 | 5.0 | 40.91705 |
| Fruitful Bran | 120 | 3 | 0 | 5.0 | 41.01549 |
| Fruity Pebbles | 110 | 1 | 1 | 0.0 | 28.02576 |
| Golden Crisp | 100 | 2 | 0 | 0.0 | 35.25244 |
| Golden Grahams | 110 | 1 | 1 | 0.0 | 23.80404 |
| Grape Nuts Flakes | 100 | 3 | 1 | 3.0 | 52.07690 |
| Grape-Nuts | 110 | 3 | 0 | 3.0 | 53.37101 |
| Great Grains Pecan | 120 | 3 | 3 | 3.0 | 45.81172 |
| Honey Graham Ohs | 120 | 1 | 2 | 1.0 | 21.87129 |
| Honey Nut Cheerios | 110 | 3 | 1 | 1.5 | 31.07222 |
| Honey-comb | 110 | 1 | 0 | 0.0 | 28.74241 |
| Just Right Crunchy Nuggets | 110 | 2 | 1 | 1.0 | 36.52368 |
| Just Right Fruit & Nut | 140 | 3 | 1 | 2.0 | 36.47151 |
| Kix | 110 | 2 | 1 | 0.0 | 39.24111 |
| Life | 100 | 4 | 2 | 2.0 | 45.32807 |
| Lucky Charms | 110 | 2 | 1 | 0.0 | 26.73451 |
| Maypo | 100 | 4 | 1 | 0.0 | 54.85092 |
| Muesli Raisins; Dates; & Almonds | 150 | 4 | 3 | 3.0 | 37.13686 |
| Muesli Raisins; Peaches; & Pecans | 150 | 4 | 3 | 3.0 | 34.13976 |
| Mueslix Crispy Blend | 160 | 3 | 2 | 3.0 | 30.31335 |
| Multi-Grain Cheerios | 100 | 2 | 1 | 2.0 | 40.10596 |
| Nut&Honey Crunch | 120 | 2 | 1 | 0.0 | 29.92429 |
| Nutri-Grain Almond-Raisin | 140 | 3 | 2 | 3.0 | 40.69232 |
| Nutri-grain Wheat | 90 | 3 | 0 | 3.0 | 59.64284 |
| Oatmeal Raisin Crisp | 130 | 3 | 2 | 1.5 | 30.45084 |
| Post Nat. Raisin Bran | 120 | 3 | 1 | 6.0 | 37.84059 |
| Product 19 | 100 | 3 | 0 | 1.0 | 41.50354 |
| Puffed Rice | 50 | 1 | 0 | 0.0 | 60.75611 |
| Puffed Wheat | 50 | 2 | 0 | 1.0 | 63.00565 |
| Quaker Oat Squares | 100 | 4 | 1 | 2.0 | 49.51187 |
| Quaker Oatmeal | 100 | 5 | 2 | 2.7 | 50.82839 |
| Raisin Bran | 120 | 3 | 1 | 5.0 | 39.25920 |
| Raisin Nut Bran | 100 | 3 | 2 | 2.5 | 39.70340 |
| Raisin Squares | 90 | 2 | 0 | 2.0 | 55.33314 |
| Rice Chex | 110 | 1 | 0 | 0.0 | 41.99893 |
| Rice Krispies | 110 | 2 | 0 | 0.0 | 40.56016 |
| Shredded Wheat | 80 | 2 | 0 | 3.0 | 68.23588 |
| Shredded Wheat ’n’Bran | 90 | 3 | 0 | 4.0 | 74.47295 |
| Shredded Wheat spoon size | 90 | 3 | 0 | 3.0 | 72.80179 |
| Smacks | 110 | 2 | 1 | 1.0 | 31.23005 |
| Special K | 110 | 6 | 0 | 1.0 | 53.13132 |
| Strawberry Fruit Wheats | 90 | 2 | 0 | 3.0 | 59.36399 |
| Total Corn Flakes | 110 | 2 | 1 | 0.0 | 38.83975 |
| Total Raisin Bran | 140 | 3 | 1 | 4.0 | 28.59278 |
| Total Whole Grain | 100 | 3 | 1 | 3.0 | 46.65884 |
| Triples | 110 | 2 | 1 | 0.0 | 39.10617 |
| Trix | 110 | 1 | 1 | 0.0 | 27.75330 |
| Wheat Chex | 100 | 3 | 1 | 3.0 | 49.78744 |
| Wheaties | 100 | 3 | 1 | 3.0 | 51.59219 |
| Wheaties Honey Gold | 110 | 2 | 1 | 1.0 | 36.18756 |
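formattable() takes a named list that maps columns to formatters, so several embedded charts can be combined in one table. A minimal sketch on a few rows copied from the table above (df_small is an illustrative subset); the color_tile() shading on the rating column is an added illustration, not part of the original chart.

```r
library(formattable)

# A few rows copied from the cereal table above (illustrative subset)
df_small <- data.frame(
  name = c("100% Bran", "All-Bran", "All-Bran with Extra Fiber", "Cheerios"),
  fiber = c(10, 9, 14, 2),
  rating = c(68.40297, 59.42551, 93.70491, 50.76500)
)

tbl <- formattable(df_small, list(
  # Same formatter as above: pink bars rescaled to 20-100% of the cell width
  fiber = normalize_bar("pink", 0.2),
  # Added illustration: shade cells from white (lowest) to light green (highest)
  rating = color_tile("white", "lightgreen")
))

# Print tbl in the console, the RStudio viewer, or an R Markdown chunk to render it
```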

Circular Area/Radar Chart

Note: this chart type has been criticized for readability problems, so a lollipop chart may be a better choice (see https://www.data-to-viz.com/caveat/spider.html).

The data format is very important:
  - each row is an entity
  - each column is a quantitative variable
  - the first two rows give the maximum and minimum value for each variable (for the nutrient chart below: 5 variables, each on a 0-10 range)

The rgb() function takes red, green, and blue values plus an alpha (transparency) value, each between 0 and 1 (alpha = 1 is fully opaque). See https://www.rdocumentation.org/packages/fmsb/versions/0.7.0/topics/radarchart for more information.

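library(fmsb)

# check.names = FALSE keeps the column labels as "2000", "2001", ...;
# by default data.frame() would rename them "X2000", "X2001", ...
data <- data.frame(
  "2000" = 28702,
  "2001" = 28726,
  "2002" = 28977,
  "2003" = 29459,
  "2004" = 30200,
  "2005" = 30842,
  "2006" = 31358,
  "2007" = 31655,
  "2008" = 31251,
  "2009" = 29899,
  "2010" = 30491,
  check.names = FALSE
)
# Row 1 = maximum and row 2 = minimum for each of the 11 years, then the data row
data <- rbind(rep(32000, 11), rep(28500, 11), data)
radarchart(data,
           pcol = rgb(0.2, 0.9, 0.5, 1.0),   # line color (opaque)
           pfcol = rgb(0.2, 0.9, 0.5, 0.5),  # fill color (50% transparent)
           title = "United States GDP 2000-2010",
           seg = 5                            # number of axis segments
           )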
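The max/min rows can also be added by a small helper, which avoids typing rep() calls of the right length by hand. add_radar_limits() is a hypothetical helper written for this sketch, not an fmsb function; it assumes radarchart()'s default layout in which row 1 holds the maxima and row 2 the minima.

```r
library(fmsb)

# Hypothetical helper (not part of fmsb): prepend a max row and a min row
# with the same column names as the data
add_radar_limits <- function(df, max_val, min_val) {
  limits <- data.frame(rbind(rep(max_val, ncol(df)), rep(min_val, ncol(df))))
  names(limits) <- names(df)
  rbind(limits, df)
}

gdp <- data.frame(t(c(28702, 28726, 28977, 29459, 30200, 30842,
                      31358, 31655, 31251, 29899, 30491)))
names(gdp) <- 2000:2010  # year labels for the radar axes

gdp_radar <- add_radar_limits(gdp, 32000, 28500)
radarchart(gdp_radar, seg = 5, title = "United States GDP 2000-2010")
```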

data <- data.frame(
  protein = 4,
  fat = 1,
  fiber = 10,
  carbs = 5,
  sugars = 6
)
data <- rbind(rep(10, 5) , rep(0, 5) , data)
radarchart(data , 
           pcol="red", 
           title = "Nutrients in 100% Bran Cereal"
           )
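Following the note's suggestion, a minimal lollipop-chart sketch of the same 100% Bran nutrient values in ggplot2: geom_segment() draws each stick from 0 and geom_point() draws the head. This is one common way to build a lollipop chart, not the only one; the nutrients data frame is reshaped here for illustration.

```r
library(ggplot2)

nutrients <- data.frame(
  nutrient = c("protein", "fat", "fiber", "carbs", "sugars"),
  grams = c(4, 1, 10, 5, 6)
)

p <- ggplot(nutrients, aes(x = nutrient, y = grams)) +
  # Stick: a segment from 0 up to each value
  geom_segment(aes(xend = nutrient, y = 0, yend = grams), color = "grey50") +
  # Head: a point at each value
  geom_point(color = "red", size = 3) +
  coord_flip() +
  labs(x = "Nutrient", y = "Grams", title = "Nutrients in 100% Bran Cereal") +
  theme_bw()
print(p)
```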

Simple Line Chart

library(tidyverse)
library(extrafont)
data <- data.frame(
  year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010),
  GDP = c(28702, 28726, 28977, 29459, 30200, 30842, 31358, 31655, 31251, 29899, 30491)
)

p <- ggplot(data, mapping = aes(x = year, y = GDP))
p + geom_line(color = "purple") +
  labs(x = "Year", y = "GDP", title = "United States GDP 2000-2010") +
  scale_y_log10(labels = scales::dollar) +
  # Add the complete theme first: a theme_bw() added afterwards would reset the font
  theme_bw() +
  theme(text = element_text(family = "Impact"))

Column Histogram

Changing axis labels: https://www.datanovia.com/en/blog/ggplot-axis-ticks-set-and-rotate-text-labels/

To center the title, create it with ggtitle() and then center it with theme(plot.title = element_text(hjust = 0.5)).

library(tidyverse)
library(extrafont)

data <- data.frame(
  cerealname = c('All_Bran', 'Corn_Flakes', 'Cheerios', 'Fruit_Loops', 'Crispix'),
  sugars = c(5, 2, 1, 13, 3)
)

p <- ggplot(data, mapping = aes(x = cerealname, y = sugars, fill = cerealname))
# geom_col() plots the supplied values directly; geom_histogram(stat = 'identity')
# drew the same bars but raised an "Ignoring unknown parameters" warning
p + geom_col() +
  theme_bw() +
  # Customizations come after theme_bw(), which would otherwise reset them
  theme(text = element_text(family = "Times New Roman"),
        axis.text.x = element_text(angle = 90),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5)) +
  labs(x = "Cereal Name", y = "Grams of Sugar") +
  ggtitle("Sugar Content of Cereal Brands")

data <- data.frame(
  year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010),
  GDP = c(28702, 28726, 28977, 29459, 30200, 30842, 31358, 31655, 31251, 29899, 30491)
)
p <- ggplot(data, mapping = aes(x = year, y = GDP))
p + geom_col(fill = "green") +
  ggtitle("United States GDP 2000-2010") +
  labs(x = "Year", y = "GDP") +
  scale_y_continuous(labels = scales::dollar) +
  theme_bw() +
  # Centering and font set after theme_bw() so they are not reset
  theme(text = element_text(family = "Times New Roman"),
        plot.title = element_text(hjust = 0.5))

Line Histogram

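Useful for comparing distributions, for example across the levels of a categorical variable. rnorm(n, mean, sd) generates n random values from a normal distribution (mean = 0 and sd = 1 by default), so the data below are simulated rather than taken from the cereal dataset.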

library(tidyverse)

# Simulated (not real) sugar values: 40 draws centered at 0 and 40 centered at 6
set.seed(42)  # makes the random draws reproducible
data <- data.frame(
  sugars = c(rnorm(40), rnorm(40, mean = 6))
)

p <- ggplot(data, mapping = aes(x = sugars))
p + geom_density() +
  theme_bw() +
  # Customizations come after theme_bw(), which would otherwise reset them
  theme(text = element_text(family = "Times New Roman"),
        axis.text.x = element_text(angle = 90),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5)) +
  labs(x = "Grams of Sugar", y = "Density") +
  ggtitle("Most Common Sugar Contents Among Cereal Brands")
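The density chart above shows a single distribution. To compare distributions across levels of a categorical variable, map the category to fill so each level gets its own curve. A minimal sketch with simulated data; the group labels "Group A" and "Group B" and their means are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Simulated data (an assumption, not the cereal dataset): two hypothetical groups
sugar_groups <- data.frame(
  group = rep(c("Group A", "Group B"), each = 40),
  sugars = c(rnorm(40, mean = 2), rnorm(40, mean = 9))
)

# One semi-transparent density curve per level of the categorical variable
p <- ggplot(sugar_groups, aes(x = sugars, fill = group)) +
  geom_density(alpha = 0.4) +
  labs(x = "Grams of Sugar", y = "Density", fill = "Group") +
  theme_bw()
print(p)
```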

Scatter Chart

library(tidyverse)

data <- data.frame(
  sugar = cereal$sugars,
  fiber = cereal$fiber
)

p <- ggplot(data, mapping = aes(x = fiber, y = sugar))
p + geom_point(color = "purple") +
  theme_bw() +
  labs(x = "Grams of Fiber", y = "Grams of Sugar", title = "Sugar and Fiber Content of Breakfast Cereals") +
  theme(text = element_text(family = "Times New Roman"))

Stacked Column Chart with Subcomponents

Scaling each stack to sum to one (100%): https://stackoverflow.com/questions/9563368/create-stacked-barplot-where-each-stack-is-scaled-to-sum-to-100

General chart, including data formatting: https://www.datanovia.com/en/blog/how-to-create-a-ggplot-stacked-bar-chart-2/

Use the fill argument in labs() to change the legend title

library(tidyverse)

data <- data.frame(
  nutrient = rep(c("protein", "fat", "carbohydrates"), each = 7),
  # The 7 cereal names are recycled once for each of the 3 nutrients
  cereal = c("Apple Cinnamon Cheerios", "Apple Jacks", "Basic 4", "Bran Chex", "Bran Flakes", "Cap'n'Crunch", "Cheerios"),
  cals = c(8, 8, 12, 8, 12, 4, 24, 18, 0, 18, 9, 0, 18, 18, 44, 44, 72, 60, 52, 48, 68)
)
p <- ggplot(data, mapping = aes(x = cereal, y = cals, fill = nutrient))
# position = "fill" rescales each column to sum to 1; geom_col() needs no stat argument
p + geom_col(position = "fill") +
  labs(x = "Brand Name", y = "Percent of Calories", title = "Caloric Breakdown of Cereal Brands", fill = "Nutrient") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(text = element_text(family = "Times New Roman"))
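position = "fill" rescales each stack so its parts add up to 1. A minimal dplyr sketch of that same rescaling, computed explicitly so the shares can be inspected or used as labels; calories_df and share are names chosen here for illustration, and ggplot2 performs its own version of this calculation internally.

```r
library(dplyr)

calories_df <- data.frame(
  nutrient = rep(c("protein", "fat", "carbohydrates"), each = 7),
  cereal = c("Apple Cinnamon Cheerios", "Apple Jacks", "Basic 4", "Bran Chex",
             "Bran Flakes", "Cap'n'Crunch", "Cheerios"),
  cals = c(8, 8, 12, 8, 12, 4, 24, 18, 0, 18, 9, 0, 18, 18,
           44, 44, 72, 60, 52, 48, 68)
)

# Divide each nutrient's calories by its cereal's total calories
shares <- calories_df %>%
  group_by(cereal) %>%
  mutate(share = cals / sum(cals)) %>%
  ungroup()
```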

3D Area Chart