In this chapter we discussed why well-designed data graphics are important and we described a taxonomy for understanding their composition.
The objective of this assignment is for you to understand what characteristics you can use to develop a great data graphic.
Each question is worth 5 points.
To submit this homework, create the document in RStudio using the knitr package (button included in RStudio), and then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.
Question #1
Answer the following questions for this graphic: Relationship between ages and psychosocial maturity
knitr::include_graphics("http://ars.els-cdn.com/content/image/1-s2.0-S1043276005002602-gr2.jpg")
The visual cues of this graph are position, color, length, shape, and direction. The coordinate system is Cartesian: each point is a pair (x, y), where x is the historical age and y is the age of psychosocial maturation. The x-axis has an ordinal scale, running from 20,000 years ago to the present day. The y-axis has a ratio scale, where the age of psychosocial maturation is measured in years, ranging from 0 to more than 20 years.
There are two variables depicted on this graph: psychosocial maturation and menarche. Color distinguishes the two variables from each other. The length of each bar appears to represent the span of the variable within the given period. The direction cue describes how these variables change over time.
Using different colors is a good approach, but the graph is hard to read because of how the bars are positioned. The visualization also does not allow us to compare the lengths of the bars accurately. The title would be better placed at the top of the graph, centered. The y-axis should have more scale values so that the bars can be compared more precisely.
Question #2
Answer the following questions for this graphic: World’s top 10 best selling cigarette brands 2004-2007
knitr::include_graphics("https://farm3.static.flickr.com/2695/4149541331_482fbb0aaf_o.png")
The visual cues include color and length. The coordinate system is Cartesian. The y-axis is categorical, showing cigarette brand names. The x-axis is numerical, measuring sales in billions of dollars.
There are two variables: cigarette brand and sales. Each horizontal colored bar represents a particular brand (color distinguishes the brands), and the length of a bar represents the volume of sales for this particular brand.
Question #3
Find two data graphics published in a newspaper or on the internet in the last two years.
Identify a graphical display that you find compelling. What aspects of the display work well, and how do these relate to the principles that we have just gone over in lecture? Include a screenshot of the display along with your solution. (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic").)
Identify a graphical display that you find less compelling. What aspects of the display don’t work well? Are there ways that the display might be improved? Include a screenshot of the display along with your solution. (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic").)
Question #4
Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.
Answer:
The visual cues used in the first donut chart are color and angle. With these cues, it is hard to see which percentage is the highest; we can tell only by reading the labels. In my opinion, a horizontal bar chart would be a better option, with the data sorted in descending order by the percentage of demand for data scientists. All of these charts are designed in blue and gray, and although different graphics visualize different variables, the similar colors can confuse the reader. The fourth and fifth charts use similar cues to differentiate the two professionals. For the fifth chart, a bar chart would work better. The only visual cue in the last graph is color; the boxes are all the same size, which makes the differences hard to detect without reading the labels.
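As a rough sketch of the sorted horizontal bar chart suggested above, the following uses base R graphics only. The category names and percentages are made up for illustration and are not taken from the original donut chart; `demand` and `demand_sorted` are hypothetical names.

```r
# Hypothetical percentages, invented purely for illustration;
# they are NOT the values shown in the original donut chart.
demand <- c("Category A" = 12, "Category B" = 35,
            "Category C" = 8,  "Category D" = 45)

# With horiz = TRUE, barplot() draws the first element at the bottom,
# so sorting in ascending order puts the largest bar at the top
# (i.e. the chart reads in descending order from top to bottom).
demand_sorted <- sort(demand)

# Widen the left margin so the category labels fit, then restore it.
op <- par(mar = c(4, 8, 2, 1))
mids <- barplot(demand_sorted,
                horiz = TRUE,
                las   = 1,                      # horizontal axis labels
                col   = "steelblue",            # one color: length is the cue
                xlim  = c(0, max(demand_sorted) * 1.15),
                xlab  = "Share of demand (%)")

# Label each bar with its value, just to the right of the bar end.
text(x = demand_sorted, y = mids,
     labels = paste0(demand_sorted, "%"), pos = 4)
par(op)
```

Using a single fill color here is deliberate: the ranking is carried by bar length and position, so color is not needed to tell the categories apart.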
Question #5
Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.
Charts that explain food in America
Answer:
A geographic chart is an appropriate way to show readers how the agriculture industry operates across the various geographic regions of the country. Nevertheless, the colors chosen to distinguish the categories look quite random, and the map is hard to read. It would be better to use one color in different shades. If we look at the pie chart representing cash receipts of crops, we find some confusing data labels: the numbers in parentheses are not explained, so their meaning is unclear. In the Obesity and Energy Intake graph, it is quite hard for readers to understand which vertical axis corresponds to which line. It would be better to use a different diagram type or to create two separate diagrams. Finally, in the soft drink production stacked bar chart, it is hard to see where the diet soda segment begins and ends. This could be improved with a side-by-side bar chart in which every bar starts at the axis origin (zero).
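A minimal sketch of the side-by-side alternative to the stacked bar chart, again in base R. The production figures, year labels, and the `production` matrix are assumptions made up for illustration; they are not the values from the soft drink chart.

```r
# Hypothetical production figures in arbitrary units;
# NOT taken from the soft drink production chart.
production <- rbind(
  Regular = c(10, 9, 8),
  Diet    = c(3, 4, 4)
)
colnames(production) <- c("Year 1", "Year 2", "Year 3")

# beside = TRUE draws the rows as adjacent bars within each column
# instead of stacking them, so every bar starts at zero and the
# diet soda values can be read directly from the y-axis.
mids <- barplot(production,
                beside      = TRUE,
                col         = c("grey40", "steelblue"),
                ylim        = c(0, 12),
                ylab        = "Production (arbitrary units)",
                legend.text = rownames(production),
                args.legend = list(x = "topright", bty = "n"))
```

Because both series share the zero baseline, a reader can compare regular and diet soda within a year and follow either series across years without estimating segment boundaries.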