Directions

In this chapter we discussed why well-designed data graphics are important and we described a taxonomy for understanding their composition.

The objective of this assignment is for you to understand what characteristics you can use to develop a great data graphic.

Each question is worth 5 points.

To submit this homework you will create the document in RStudio, using the knitr package (button included in RStudio), and then publish the document to your RPubs account. Once it is uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.

Question #1

Answer the following questions for this graphic: Relationship between ages and psychosocial maturity

  1. Identify the visual cues, coordinate system, and scale(s).

     In this graph, the x-axis represents a historical timeline with four points: 20,000 years ago, 2,000 years ago, 200 years ago, and today. The y-axis represents age in years. The graph shows two bars per historical era, distinguished by color: green represents menarche and pink represents psychosocial maturity. Moving forward in time, both bars get taller, indicating that both menarche and psychosocial maturity occurred at later ages than in earlier eras. This holds until we reach the present. In the last set of bars there is a distinct mismatch between the two: psychosocial maturity is happening much later, while menarche is happening very early.

  2. How many variables are depicted in the graph? Explicitly link each variable to a visual cue that you listed above.

     Two variables: menarche and psychosocial maturity. The visual cue for these variables is color. Menarche is green, while psychosocial maturity is pink.

  3. Critique this data graphic using the taxonomy described in the lecture.

     I think the graphic is effective at driving home its point, which is that, unlike in the past, there is now a significant gap between the ages of menarche and psychosocial maturity. The colors make it easy to see what each bar represents. My only complaint is the scale on the y-axis. Only 10 and 20 are marked, and it's hard to read off any value in between. I would instead draw a line every 2 years (e.g., 10, 12, 14). The graph is also missing a title.
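
The finer y-axis suggested in this critique could be sketched in base R roughly as follows. The bar heights, era labels, and color names are placeholders chosen for illustration; they are assumptions and are not read off the original graphic.

```r
# Placeholder values for illustration only; NOT taken from the original graphic.
ages <- matrix(
  c(11, 12, 15, 16),
  nrow = 2,
  dimnames = list(
    c("Menarche", "Psychosocial maturity"),
    c("Era A", "Era B")
  )
)

# Tick marks every 2 years, extending to cover the tallest bar.
y_max <- 2 * ceiling(max(ages) / 2)
y_breaks <- seq(0, y_max, by = 2)

# Draw grouped bars without the default axes, then add the finer y-axis.
barplot(ages, beside = TRUE, col = c("darkgreen", "pink"),
        ylim = c(0, y_max), axes = FALSE,
        ylab = "Age in years", legend.text = rownames(ages),
        main = "Age at menarche and psychosocial maturity (placeholder data)")
axis(side = 2, at = y_breaks, las = 1)

# Light horizontal guides at each tick make in-between values easier to read.
abline(h = y_breaks, col = "grey85", lty = "dotted")
```

Computing `y_breaks` from the data rather than hard-coding it keeps the 2-year spacing intact if the bar heights change.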

Question #2

Answer the following questions for this graphic: World’s top 10 best selling cigarette brands 2004-2007

  1. Identify the visual cues, coordinate system, and scale(s).

     This is a horizontal bar chart depicting the popularity of cigarette brands. The x-axis represents sales in billions, with a scale from 0 to 500 billion, while the y-axis lists the cigarette brands. A separate color depicts each cigarette brand.

  2. How many variables are depicted in the graph? Explicitly link each variable to a visual cue that you listed above.

     Two variables: cigarette brands and sales. A separate color is used for each cigarette brand.

  3. Critique this data graphic using the taxonomy described in the lecture.

     I think this is a very simple, yet very effective graph. It clearly shows the dominance of Marlboro in the cigarette industry, and the audience can see the magnitude of this dominance by looking at how much longer the bar is for Marlboro compared to the other brands. The use of color is also a nice touch that allows the viewer to distinguish the brands. The scale is appropriate, and there are axis titles and a main title showing the dates included in the study.

Question #3

Find two data graphics published in a newspaper or on the internet in the last two years.

  1. Identify a graphical display that you find compelling. What aspects of the display work well, and how do these relate to the principles that we have just gone over in lecture? Include a screenshot of the display along with your solution. (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic").)
knitr::include_graphics("Drug Arrests.webp")

This graph shows the difference between drug arrests for possession and drug arrests for sale/manufacturing. I like this graph because it conveys one clear idea simply: drug arrests mainly target possession rather than shutting down the industry at its source, which is sale and manufacturing. The x-axis shows years (from 1985 to 2020) and the y-axis shows the number of arrests. A separate color is used for arrests due to possession and arrests due to sale/manufacturing. My only complaint about this graph is that there are no axis titles, even though what the axes show is fairly obvious.

  2. Identify a graphical display that you find less compelling. What aspects of the display don’t work well? Are there ways that the display might be improved? Include a screenshot of the display along with your solution. (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic").)
knitr::include_graphics("Population by Country.png")

The above graph is not a compelling one. It is meant to show population by country by putting the countries into a pie chart. However, the pie chart has far too many subsections, making the graph look crammed and very difficult to read. Once we get to the smaller countries, we can barely see their slices and we learn nothing from them. Perhaps a better way would be to show only the top 9 countries and make the 10th subsection “Other”. That way there are fewer subsections and the graph will be easier to read.
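
One way to build the "top 9 plus Other" grouping described above is sketched here in base R. The helper name `lump_top_n`, the country labels, and the population values are all hypothetical placeholders, not data from the original graphic.

```r
# Hypothetical helper: keep the n largest categories and sum the rest into "Other".
lump_top_n <- function(values, n = 9, other_label = "Other") {
  sorted <- sort(values, decreasing = TRUE)
  if (length(sorted) <= n) {
    return(sorted)  # nothing to lump
  }
  c(sorted[seq_len(n)],
    setNames(sum(sorted[-seq_len(n)]), other_label))
}

# Placeholder populations in arbitrary units; NOT real figures.
population <- setNames(
  c(50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3),
  paste("Country", LETTERS[1:12])
)

lumped <- lump_top_n(population, n = 9)

# Ten slices instead of twelve; with real data the reduction would be far larger.
pie(lumped, main = "Population by country (placeholder data)")
```

Because the remainder is summed rather than dropped, the slices still add up to the full total, so the proportions of the top 9 stay honest.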

Question #4

Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.

What is a Data Scientist

Answer:

Overall, I think the graphics shown in this link are good. I like the consistent use of color to distinguish business intelligence from data science; it makes it easier for the audience to follow the “story”. My favorite graphic is the one showing the biggest obstacles to data science adoption. It uses lighter colors for the lower percentages and stronger colors for the higher percentages, which I think is effective. The one chart that I did not like was the pie chart at the beginning. It seems to me that there are better ways to show this information. There are basically five levels, going from “significantly outpace demand” to “be significantly less than demand”. A bar chart with the bars in that order, instead of a pie chart, might be easier to understand and analyze.
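
The ordered bar chart proposed above could look something like this minimal base R sketch. Only the two end labels come from the text; the three middle labels and every percentage are placeholders assumed for illustration.

```r
# The two end labels are quoted in the critique; the middle labels and all
# percentages are placeholders, NOT values from the original chart.
answer_levels <- c(
  "Significantly outpace demand",
  "(level 2)",
  "(level 3)",
  "(level 4)",
  "Be significantly less than demand"
)

# Rows are deliberately listed out of order.
responses <- data.frame(
  answer = c("(level 3)", "Be significantly less than demand",
             "Significantly outpace demand", "(level 4)", "(level 2)"),
  share = c(30, 15, 10, 25, 20)
)

# An ordered factor records the scale, so sorting follows it, not the alphabet.
responses$answer <- factor(responses$answer, levels = answer_levels, ordered = TRUE)
responses <- responses[order(responses$answer), ]

# Horizontal bars; reversed so the first level is drawn at the top.
old_par <- par(mar = c(4, 14, 1, 1))
barplot(rev(responses$share),
        names.arg = rev(as.character(responses$answer)),
        horiz = TRUE, las = 1,
        xlab = "Share of respondents (placeholder %)")
par(old_par)
```

Keeping the bars in scale order lets the reader compare neighboring levels by length, which is harder to do with pie slices.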

Question #5

Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.

Charts that explain food in America

Answer:

There are some graphs here that I think are very good, and others that are somewhat confusing. I like the graphs that use color effectively, such as the graph that shows increases and decreases in farms in blue and red, respectively. The color choice makes the graph clear and easy for the audience to interpret. The meat consumption graph is another good one, using different shades of red to show how much meat is consumed in different countries. I am not a big fan of graphs that have so much going on that they look crammed and messy. An example is graph #25, which shows American pizza chains throughout the country. There are too many colors in the map, and some colors are very similar to each other (such as Pizza Hut and Little Caesars, which are red and orange). Graph #28 is similar: it has too many colors, and it can be hard to distinguish them from each other in the map.