** Please click all the tabs (in sequence) to get the entire set of information in these pages. **

** To download code, see the instructions in Session 2: https://rpubs.com/hkb/DAX-Session2 **

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
options(scipen=10000000)
options(digits=3)
# install.packages("knitr")
library(knitr)

library(dplyr)
library(tidyverse)
library(ggplot2)
library(gridExtra)
library(ggrepel)
library(boxoffice) # because the package is already installed

Session 5

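Objectives

* Teams and Planning for Your Hands-on Data Analysis Project

* Visualizing Multiple Dimensions Through Line Charts, Bubble Charts, Position Plots

* Interpreting Alternative Visualizations and Picking the (or a) Right One

* Storytelling with Data - from FindHotel.com, https://blog.findhotel.net/2020/06/data-story-of-the-travel-market-recovery/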

Some Example Visualizations

Let’s start with this very simple chart. What is it telling us? How can you improve it?

In the previous chart, each object of interest was a point: for instance, the number of movies for each year in the dataset. Now let us add one more dimension, so that each object of interest is a line (i.e., a number of points that are connected to each other). For instance, an object could be the total revenue of the movie at each rank in a specific year. That is, each single object of interest is a curve (or a series of line segments). And of course we want to display and compare multiple objects.
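As a rough sketch of the "one line per object" idea, the code below draws one line per year, with rank on the x-axis and revenue on the y-axis. The data frame `toy` and its numbers are invented purely for illustration (they are not real box-office figures), and the column names simply mirror the ones used later in this session; the chart shown in class may have been built differently.

```r
library(ggplot2)

# Toy data, invented for illustration only (not real box-office figures):
# revenue (in $ millions) for the top 5 ranks in two hypothetical years.
toy <- data.frame(
  Year        = rep(c(2018, 2019), each = 5),
  rank        = rep(1:5, times = 2),
  total_gross = c(700, 680, 610, 420, 330,
                  860, 540, 480, 430, 390)
)

# One line per year: x = rank within year, y = revenue.
# factor(Year) makes ggplot treat Year as discrete, so each year
# gets its own colour; group = Year connects points of the same year.
p_lines <- ggplot(toy, aes(x = rank, y = total_gross,
                           colour = factor(Year), group = Year)) +
  geom_line() +
  geom_point() +
  labs(x = "Rank within year",
       y = "Total gross ($ millions)",
       colour = "Year")

p_lines
```

Each year is now a single object (a curve), and comparing objects means comparing curves.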

Because we have multiple dimensions, we have multiple ways in which we could organize them. Here is one additional alternative. What do you see here? How do you decide which visualization is better?

Here’s a totally different way to display this kind of data. First, what is the visualization about (what is the information it is giving you)?

Is it an effective way of conveying the information in this particular instance?

Here is another example using the same type of visualization.
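For readers who want to experiment with this type of display, here is a minimal bubble chart sketch. It assumes the three dimensions are year (x), rank (y), and revenue (bubble size); this mapping is a guess for illustration and is not necessarily the one used in the figures above. The data frame `toy` is invented and does not contain real box-office figures.

```r
library(ggplot2)

# Toy data, invented for illustration only (not real box-office figures).
toy <- data.frame(
  Year        = rep(c(2017, 2018, 2019), each = 3),
  rank        = rep(1:3, times = 3),
  total_gross = c(620, 530, 500,
                  700, 680, 610,
                  860, 540, 480)
)

# Bubble chart: position encodes year and rank, bubble area encodes revenue.
p_bubble <- ggplot(toy, aes(x = Year, y = rank, size = total_gross)) +
  geom_point(alpha = 0.6) +
  scale_x_continuous(breaks = unique(toy$Year)) +
  scale_y_reverse(breaks = 1:3) +        # put rank 1 at the top
  scale_size_area(max_size = 12) +       # area (not radius) proportional to value
  labs(x = "Year", y = "Rank within year",
       size = "Total gross ($ millions)")

p_bubble
```

`scale_size_area()` is used so that a value of zero would map to a bubble of zero area, which keeps the visual comparison of sizes honest.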

Movies Data Set

Let’s load the data by making a call to boxofficemojo.com through the boxoffice() function in the boxoffice package. If, for some reason, you have not yet installed the package, look through the Session 2 notes and install it.

date.seq <- paste(2000:2019,"-12-31",sep="") 
# Fetch the data 
movies <- boxoffice(date = as.Date(date.seq), top_n = 50)
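To see what is being passed to boxoffice(), it can help to build the date vector on its own, without any network call. The sketch below does this for three years only; the names `dates_chr` and `dates` are illustrative and are not part of the session code.

```r
# Build year-end dates as character strings, one per year
dates_chr <- paste(2017:2019, "-12-31", sep = "")
dates_chr

# Convert the strings to Date objects, as done in the call above
dates <- as.Date(dates_chr)
class(dates)

# The year can be recovered from each date with format()
format(dates, "%Y")
```

`paste()` recycles the single string "-12-31" across the vector of years, so the result has one element per year.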

We’ll extend the data frame by adding - for each movie in the database - Year, and Rank within Year based on gross revenues.

# Drop rows with NA values, then add a Year column extracted from each row's date
movies <- movies %>%
  na.omit() %>%
  mutate(Year = as.numeric(format(as.Date(date), "%Y")))

# Rank movies within each Year by total gross (rank 1 = highest-grossing)
movies <- movies %>%
  group_by(Year) %>%
  arrange(desc(total_gross)) %>%
  mutate(rank = row_number())
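To check that the ranking step behaves as intended, here is the same group-then-rank pattern on a tiny data frame. The movies "A" to "E" and their revenues are made up for illustration. This sketch differs slightly from the code above: it adds `.by_group = TRUE` so that the rows are sorted within each year, and it calls `ungroup()` at the end.

```r
library(dplyr)

# Toy data, invented for illustration only
toy <- data.frame(
  movie       = c("A", "B", "C", "D", "E"),
  Year        = c(2018, 2018, 2019, 2019, 2019),
  total_gross = c(300, 700, 150, 900, 400)
)

ranked <- toy %>%
  group_by(Year) %>%
  # sort by revenue (highest first) within each year
  arrange(desc(total_gross), .by_group = TRUE) %>%
  # row_number() counts rows within each group, so it restarts at 1 per year
  mutate(rank = row_number()) %>%
  ungroup()

ranked
```

If two movies in the same year could have exactly the same gross, `min_rank(desc(total_gross))` is an alternative that gives tied movies the same rank.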