Rule 1: Try your best to show as little data as possible. Wainer introduces the concept of data density and even gives us a metric to measure it with (on a scale of 0 to 50, with 50 being the graphics that contain the highest density of data). So if you want really, really bad graphics, you should shoot for a data density of 0.x, like Pravda.
Rule 2: If you absolutely must show some data, then make sure you hide it behind some chartjunk. Again, Wainer gives us a metric: data/ink ratio. If you want a bad graph, minimize this ratio.
Rule 4: Ignore everything but the order of elements in the plot, so that the data no longer makes sense.
Rule 5: Ignore or sinisterly manipulate context when plotting a graph in order to suit the message or agenda that you’d like to convey to your audience.
Rule 6: Change scales in the plot somewhere for no reason.
Rule 7: Zoom in on or highlight really irrelevant data, while conveniently overlooking really critical insights.
Rule 8: Make poor comparisons through the use of different bases or try to have your audience make comparisons from a curve (which humans can’t do). I know this is supposed to be our own words but I have to say it: “Jiggle the baseline”
Rule 9: Order the data alphabetically (or in some other way that isn’t dependent on some aspect of the dataset).
Rule 11: Confuse the audience by adding a lot more silly, non-impactful aspects such as decimal places, dimensions and effects.
Rule 12: If it ain’t broke, spend a lot of time and effort figuring out how to break it…and then publish the result.
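To make the Rule 1 metric concrete: as I understand it, Wainer's data density index is the number of numbers plotted divided by the area of the graphic in square inches. Here is a minimal sketch of that arithmetic. The function name `data_density` and the chart dimensions are my own assumptions for illustration, not something taken from the reading.

```r
# Data density index: numbers shown, divided by the area of the graphic.
# `data_density` is a hypothetical helper; the 6 x 4 inch size is made up.
data_density <- function(n_values, width_in, height_in) {
  stopifnot(width_in > 0, height_in > 0)
  n_values / (width_in * height_in)
}

# A chart that shows 5 numbers on an assumed 6 x 4 inch canvas
ddi <- data_density(5, width_in = 6, height_in = 4)
print(ddi)  # 5 / 24
```

The fewer numbers you show per square inch, the closer this gets to zero, which is exactly what Rule 1 is after.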
a. Explain which rule(s) of showing bad graphics has been used to create the above graph.
Rule 1: The only real, discrete datapoints that this chart depicts are the city name and the magnitude of annual cost per capita for care of the insane. That’s it. Honestly, this data isn’t that interesting, and could be depicted in a descending-order table. Adding some contextual data such as the size of the city, region of the city or demographic data could make this chart much more powerful.
Rule 2: There’s a ton of chartjunk in this graph. Most of the ink in the data area is consumed by the buildings, and the title and citation. The rest is just empty space that is actually quite distracting. Very low data/ink ratio.
Rule 3: This chart hides data in the scale. Specifically, the magnitude of difference between the cities is completely obscured by the inconsistent (and very large) houses. If the data labels weren’t there, you’d have no idea what the scale actually is, so why include it at all?
Rule 5: As mentioned previously, there is simply not enough context to make sense of this chart. At first glance, it seems like the cities of Warren and Norristown, PA are really expensive places to care for insane people relative to their peers. However, a cursory Google search of all 5 will tell you that they are virtually incomparable in terms of population, demographics and average income. Also, South Mountain appears to be a region, not even a county or city. So we are comparing apples to rambutan.
Rule 6: I think that the scale changes every time a new value is plotted.
Rule 9: The data is ordered in ascending order, which is technically according to an aspect of the data…But since there isn’t enough data, I take issue with it. There should probably be a more sophisticated grouping of datapoints, based on contextually relevant information.
Rule 10: The labels are just awful. They don’t give you enough information, and what they give you is poorly displayed: I had to bring the page up to my face to read the city names, and the $ values hover at inconsistent heights to make room for the title.
b. Suggest another way of showing the data in this graph and create it using what we learned in class so far (ggplot2).
Since we’re not working with time-series data, a souped-up bar chart should do the trick.
library(ggplot2)
library(gcookbook)
library(gridExtra)
## Loading required package: grid
printCurrency <- function(value, currency.sym = "$", digits = 0, sep = ",", decimal = ".") {
  paste(
    currency.sym,
    formatC(value, format = "f", big.mark = sep, digits = digits, decimal.mark = decimal),
    sep = ""
  )
}
ggplot(data, aes(x = reorder(City, Population..2013.), y = CPC, fill = Type)) +
  geom_bar(stat = "identity") +
  xlab("\nLocation (Smallest to Largest Population)") +
  ylab("CPC") +
  ggtitle("Cost per Capita for Care of the Insane in PA\n") +
  theme(plot.title = element_text(lineheight = .8, face = "bold")) +
  scale_fill_manual(
    values = c("#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
    name = "Population Size",
    breaks = c("Region (N/A)", "Small City (<10,000)", "Medium City (10,000-50,000)", "Large City (50,000+)")
  ) +
  geom_text(aes(label = printCurrency(CPC)), vjust = 0)
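The plotting call above expects a data frame named `data` that is never shown being built. Below is a rough sketch of the shape it would need, assuming only the column names the call uses (`City`, `Population..2013.`, `CPC`, `Type`). The place names and every number here are placeholders I made up, not values from the original chart, and the region's population of 1 is just a stand-in so that the row sorts first.

```r
# Hypothetical stand-in for `data`; every value below is a placeholder.
data <- data.frame(
  City = c("City A", "City B", "City C", "Region D"),
  Population..2013. = c(120000, 8000, 30000, 1),  # made-up populations
  CPC = c(4, 2, 3, 1),                             # made-up cost per capita
  Type = c("Large City (50,000+)", "Small City (<10,000)",
           "Medium City (10,000-50,000)", "Region (N/A)"),
  stringsAsFactors = FALSE
)

# reorder() sorts the City levels by population, smallest first,
# which is what drives the left-to-right bar order in the plot.
city_order <- levels(reorder(data$City, data$Population..2013.))
print(city_order)
```

The dotted column name looks like what `read.csv` produces from a header such as "Population (2013)", but that is a guess on my part.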