Directions

In this chapter we discussed why well-designed data graphics are important, and we described a taxonomy for understanding their composition.

The objective of this assignment is for you to understand what characteristics you can use to develop a great data graphic.

Each question is worth 5 points.

To submit this homework, create the document in RStudio using the knitr package (button included in RStudio), then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.

Question #1

Answer the following questions for this graphic: Relationship between ages and psychosocial maturity

  1. Identify the visual cues, coordinate system, and scale(s).
     1. Visual cue: a bar chart uses color to distinguish the two dimensions, green for menarche and red for psychosocial maturation. The length of each bar also indicates the distribution of ages for that dimension in a given time period.
     2. Visual cue: a line chart shows the trend over time, from 20,000 years ago to the present.
     3. Visual cue: the slope of the trend line indicates the direction and size of the change over time.
     4. Coordinate system: Cartesian (rectangular, with two perpendicular axes).
     5. Scales: the x-axis is a time scale and the y-axis is a numeric scale.
  2. How many variables are depicted in the graph? Explicitly link each variable to a visual cue that you listed above. Three variables are depicted in the graph: time, age, and the dimension being compared (menarche vs. psychosocial maturation). Time is on the x-axis, age is on the y-axis, and the dimension is represented by the two bar colors.
  3. Critique this data graphic using the taxonomy described in the lecture. Overall, the graph tells its story clearly: from 20,000 years ago to 200 years ago, both menarche and psychosocial maturation have been increasing, while at present the gap between them has become very large due to social complexity and nutritional overload. The relationship among the variables is easy to follow thanks to the colored bars, colored labels, annotations, time axis, and trend line. However, it is not clear what the length of each bar represents. If the author wanted to show the distribution of menarche or psychosocial maturation ages within each period, a boxplot would be a better choice than a bar.
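The boxplot suggested in the critique above could look roughly like the sketch below. It uses base R only, and the ages are simulated placeholder values chosen for illustration; they are not read from the graphic, and the means and spreads are assumptions.

```r
# Simulated placeholder ages; NOT values taken from the graphic.
set.seed(1)
ages <- data.frame(
  dimension = rep(c("Menarche", "Psychosocial maturation"), each = 50),
  age = c(rnorm(50, mean = 13, sd = 1), rnorm(50, mean = 20, sd = 2))
)

# One box per dimension: the median, quartiles, and whiskers show the spread
# of ages, which a single bar length cannot convey.
box_stats <- boxplot(
  age ~ dimension,
  data = ages,
  col = c("darkgreen", "red"),  # same color coding as the original bars
  xlab = "Dimension",
  ylab = "Age (years)",
  main = "Simulated ages by dimension"
)
```

To compare time periods as well, one option would be to add a period column to the data and draw one panel (or one group of boxes) per period.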

Question #2

Answer the following questions for this graphic: World’s top 10 best selling cigarette brands 2004-2007

  1. Identify the visual cues, coordinate system, and scale(s).
     1. Visual cues: color and length
     2. Coordinate system: Cartesian coordinate system
     3. Scales: numeric, linear
  2. How many variables are depicted in the graph? Explicitly link each variable to a visual cue that you listed above.
     1. Marlboro: color and length
     2. Mild Seven: color and length
     3. L&M: color and length
     4. Winston: color and length
     5. Camel: color and length
     6. Cleopatra: color and length
     7. Derby: color and length
     8. Pall Mall: color and length
     9. Kent: color and length
     10. Wills Gold flake: color and length
  3. Critique this data graphic using the taxonomy described in the lecture. The visual cues used in this graphic are color and length. There is nice variation in the color choices, and none stands out more than the others. The length of the Marlboro bar clearly shows that it was the best-selling cigarette brand between 2004 and 2007. Length also shows the sales (in billions) of each individual brand. Context is provided by the axis labels and titles.

Question #3

Find two data graphics published in a newspaper or on the internet in the last two years.

  1. Identify a graphical display that you find compelling. What aspects of the display work well, and how do these relate to the principles that we have just gone over in lecture? Include a screenshot of the display along with your solution (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic")). This visual is very simple but informative. The author used water drops in place of dots or lines to build the chart. It conveys average use, efficient use, and the user’s own use very efficiently through a legend, short labels, and strongly contrasting colors. You know what it is trying to say at first glance.

  2. Identify a graphical display that you find less compelling. What aspects of the display don’t work well? Are there ways that the display might be improved? Include a screenshot of the display along with your solution (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic")). The banana background is a poor choice: it is relevant to the topic, but it makes the graph hard to read. The year dimension is also presented poorly, and the 3-D rendering makes it hard to see year-over-year trends and even hides some data points.
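The hint in the two items above can be tried out with a sketch like the following, assuming the knitr package is installed. The file name `your_graphic.png` is a hypothetical stand-in for your own screenshot; a placeholder image is generated here only so the example runs on its own.

```r
# Assumes the knitr package is installed.
library(knitr)

# Hypothetical file name: replace with the path to your own screenshot.
# The placeholder image below exists only to make this example self-contained.
graphic_path <- file.path(tempdir(), "your_graphic.png")
png(graphic_path, width = 400, height = 300)
plot(1:10, main = "Placeholder for a screenshot")
dev.off()

# Inside an R Markdown code chunk, this call embeds the image
# in the knitted document.
img <- include_graphics(graphic_path)
img
```

In the actual homework, only the `include_graphics()` call with the path to the saved screenshot would be needed inside the chunk.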

Question #4

Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.

What is a Data Scientist

Answer: It is easy to see the differences between segments and sizes, such as which part takes up more and which takes up less. The authors also use color to 1) distinguish categories in comparisons and 2) delineate proportions through sequential swatches. In addition, they use size to depict differences, which are obvious at first glance. However, the downside of showing only percentages is that viewers also want to know the denominator in order to understand the scale of the impact. It would be best to mention the cardinality somewhere in the diagram, the description, or the notes.

Question #5

Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.

Charts that explain food in America

Answer: Most of the maps in this collection are heatmaps broken down to the state, county, or store level, which not only helps distinguish geographic boundaries but also highlights an additional piece of information. When a map shows only one variable, the authors tend to use a sequential or diverging color palette to depict its scale. When a map shows two or more variables, they tend to use distinct colors to differentiate a categorical variable and a sequential or diverging palette to depict the numeric variable, which adds another layer of information to the map. Most maps are good enough to tell a story. For an audience that focuses on only one part of the country, I would not show the entire US map; I would zoom in to help the viewer focus on that part. In addition to maps, some authors use trend lines to describe how variables change over time and bar charts to compare a numeric variable across categories.
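As a rough illustration of the sequential and diverging palettes mentioned in the answer above, the sketch below draws one of each with base R. It assumes R 3.6 or later (for `hcl.colors()`); the number of classes and the palette names "Blues 3" and "Blue-Red" are choices made for this example, not the palettes used in the collection.

```r
# Number of color classes: an arbitrary choice for this illustration.
n_classes <- 7

# Sequential: one hue varying in lightness, for values that run from low to high.
sequential <- hcl.colors(n_classes, palette = "Blues 3")

# Diverging: two hues meeting at a neutral midpoint, for values that fall
# on either side of a reference point.
diverging <- hcl.colors(n_classes, palette = "Blue-Red")

# Draw both palettes as rows of swatches for comparison.
op <- par(mfrow = c(2, 1), mar = c(1, 1, 2, 1))
image(matrix(seq_len(n_classes)), col = sequential, axes = FALSE, main = "Sequential")
image(matrix(seq_len(n_classes)), col = diverging, axes = FALSE, main = "Diverging")
par(op)
```

A sequential palette suits a single quantity such as a count per county, while a diverging palette suits a quantity measured against a midpoint, such as change relative to a national average.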