Question 1: Read the first chapter (page 11-46) from Visual Revelations by Howard Wainer. Summarize the rules in your own words for creating bad graphics (Use bullet points).

Rule 1: Try your best to show the least amount of data possible. Wainer introduces the concept of data density and even gives us a metric to measure it with (a scale of 0-50, with 50 being graphics that contain the highest density of data). So if you want really, really bad graphics, you should shoot for a data density of 0.x, like Pravda.
Rule 2: If you absolutely must show some data, then make sure you hide it behind some chartjunk. Again, Wainer gives us a metric: data/ink ratio. If you want a bad graph, minimize this ratio.
Rule 3: Find a way to display data in the least accurate way possible by messing with the visual metaphor. You can do this in a number of ways:
- Improper Ordering
- Changing the meaning mid-plot
- Hiding data in the scale
Rule 4: Ignore everything else but the order of elements in the plot, so that the data no longer makes sense.
Rule 5: Ignore or sinisterly manipulate context when plotting a graph in order to suit the message or agenda that you’d like to convey to your audience.
Rule 6: Change scales in the plot somewhere for no reason.
Rule 7: Zoom in on or highlight really irrelevant data, while conveniently overlooking really critical insights.
Rule 8: Make poor comparisons by using different bases, or make your audience compare values along a curve (which humans can’t do). I know this is supposed to be in our own words, but I have to say it: “Jiggle the baseline.”
Rule 9: Order the data alphabetically (or in some other way that isn’t dependent on some aspect of the dataset).
Rule 10: Just do an absolutely awful job of labeling. Examples include:
- Illegible Labels
- Incomplete Labels
- Ambiguous Labels
- Labels that are just straight up wrong
- Bonus: All of the above!
Rule 11: Confuse the audience by adding a lot more silly, non-impactful aspects such as decimal places, dimensions and effects.
Rule 12: If it ain’t broke, spend a lot of time and effort figuring out how to break it…and then publish the result.
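
To make the data density idea from Rule 1 concrete, here is a minimal R sketch. It assumes the index is computed as the number of numbers plotted divided by the area of the graphic in square inches. The function name `data_density` and the example dimensions are my own placeholders, not figures from the book.

```r
# Data density sketch: numbers plotted per square inch of graphic.
# `data_density` is a hypothetical helper name; the dimensions below are made up.
data_density <- function(n_numbers, width_in, height_in) {
  stopifnot(width_in > 0, height_in > 0)
  n_numbers / (width_in * height_in)
}

# A chart showing 5 values on an assumed 6 x 4 inch canvas
sparse <- data_density(5, 6, 4)
# The same canvas carrying 240 values
dense <- data_density(240, 6, 4)

print(c(sparse = sparse, dense = dense))
```

Shrinking the canvas or plotting more values both raise the index, which is why a nearly empty chart on a large canvas scores so poorly.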

Question 2: The following graphic shows the annual cost per Capita for care of Insane in Pittsburgh City Homes and Pennsylvania State Hospitals.

a. Explain which rule(s) of showing bad graphics has been used to create the above graph.

Rule 1: The only real, discrete data points that this chart depicts are the city name and the magnitude of annual cost per capita for care of the insane. That’s it. Honestly, this data isn’t that interesting, and could be shown in a table sorted in descending order. Adding some contextual data such as the size of the city, region of the city or demographic data could make this chart much more powerful.
Rule 2: There’s a ton of chartjunk in this graph. Most of the ink in the data area is consumed by the buildings, and the title and citation. The rest is just empty space that is actually quite distracting. Very low data/ink ratio.
Rule 3: This chart hides data in the scale. Specifically, the magnitude of difference between the cities is completely obscured by the inconsistent (and very large) houses. If the data labels weren’t there, you’d have no idea what the scale actually is, so why include it at all?
Rule 5: As mentioned previously, there is simply not enough context to make sense of this chart. At first glance, it seems like the cities of Warren and Norristown, PA are really expensive places to care for insane people relative to their peers. However, a cursory Google search of all 5 will tell you that they are virtually incomparable in terms of population, demographics and average income. Also, South Mountain appears to be a region, not even a county or city. So we are comparing apples to rambutan.
Rule 6: I think that the scale changes every time a new value is plotted.
Rule 9: The data is ordered in ascending order, which is technically according to an aspect of the data. But since there isn’t enough data, I take issue with it. There should probably be a more sophisticated grouping of data points, based on contextually relevant information.
Rule 10: The labels are just awful. They don’t give you enough information, and what they give you is poorly displayed: I had to bring the page up to my face to read the city names, and the $ values hover at inconsistent heights to make room for the title.

b. Suggest another way of showing the data in this graph and create it using what we learned in class so far (ggplot2).

So I’d like to do the following in my new graph:
- Increase the amount of data displayed (Rule 1)
- Reduce the chartjunk (Rule 2)
- Fix the scale, and keep it consistent (Rule 3, Rule 6)
- Add context (Rule 5)
- Order the data for better audience understanding (Rule 9)
- Add appropriate labels (Rule 10)

Since we’re not working with time-series data, a souped-up bar chart should do the trick.

Some setup is needed! (Note: dataset is loaded into RStudio from a custom .csv that I created)

library(ggplot2)
library(gcookbook)
library(gridExtra)

## Loading required package: grid
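
For reference, here is a rough sketch of what the loading step might look like. The rows below are made-up placeholders, not the real dataset, and inline text stands in for the .csv file; only the column names are taken from the plotting code further down. It also shows why one column ends up with the odd name `Population..2013.`: by default, `read.csv` rewrites headers into syntactically valid R names.

```r
# Placeholder rows only: the values are invented for illustration.
csv_text <- 'City,Population (2013),CPC,Type
Place A,1000,300,"Small City (<10,000)"
Place B,60000,200,"Large City (50,000+)"'

# check.names = TRUE (the default) runs the headers through make.names(),
# so "Population (2013)" becomes "Population..2013."
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(names(data))
str(data)
```

Passing `check.names = FALSE` would keep the original header, but the column would then need backticks whenever it is referenced inside `aes()`.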

I’m going to need to label currency, but still treat it as a number so ggplot2 plots the scale properly. So I’ll use this function:

printCurrency <- function(value, currency.sym="$", digits=0, sep=",", decimal=".") {
  paste(
        currency.sym,
        formatC(value, format = "f", big.mark = sep, digits=digits, decimal.mark=decimal),
        sep=""
  )
}
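
A quick check of the helper, with the definition repeated so the snippet runs on its own. The inputs are arbitrary test numbers, not values from the dataset. Because `formatC` and `paste` are both vectorized, the function can be handed a whole column at once, which is how it gets used for the bar labels.

```r
printCurrency <- function(value, currency.sym = "$", digits = 0, sep = ",", decimal = ".") {
  paste(
    currency.sym,
    formatC(value, format = "f", big.mark = sep, digits = digits, decimal.mark = decimal),
    sep = ""
  )
}

# Arbitrary test values, formatted with the default of zero decimal places
labels <- printCurrency(c(348.2, 1234567))
print(labels)

# Two decimal places instead of the default
two_digits <- printCurrency(1234.5, digits = 2)
print(two_digits)
```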

Okay! Finally we’re ready. The code chunk below creates an ordered bar chart that groups the dataset by population size and sorts the locations from smallest to largest. After that, I add color to make the grouping and sorting more distinct. Finally, labels and a title! Something interesting emerges: as the population size increases, it generally gets cheaper to care for the insane. However, I would need many more observations in each category to prove this trend visually.

ggplot(data, aes(x = reorder(City, Population..2013.), y = CPC, fill = Type)) +
  geom_bar(stat = "identity") +
  xlab("\nLocation (Smallest to Largest Population)") +
  ylab("CPC") +
  ggtitle("Cost per Capita for Care of the Insane in PA\n") +
  theme(plot.title = element_text(lineheight = .8, face = "bold")) +
  scale_fill_manual(
    values = c("#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
    name = "Population Size",
    breaks = c("Region (N/A)", "Small City (<10,000)", "Medium City (10,000-50,000)", "Large City (50,000+)")
  ) +
  geom_text(aes(label = printCurrency(CPC)), vjust = 0)
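
The sorting in the chart comes from `reorder`, which turns the city names into a factor whose levels follow a second variable. A minimal sketch with invented names and populations (not the homework data):

```r
# Invented example values
city <- c("Place A", "Place B", "Place C")
population <- c(30000, 1000, 12000)

# Levels are ordered by population, smallest first
city_sorted <- reorder(city, population)
print(levels(city_sorted))
```

ggplot2 draws a discrete axis in level order, so the bars appear from the smallest to the largest population. Putting a minus sign in front of the second argument would reverse the order.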

Homework 2
