Directions

In this chapter we discussed why well-designed data graphics are important and we described a taxonomy for understanding their composition.

The objective of this assignment is for you to understand what characteristics you can use to develop a great data graphic.

Each question is worth 5 points.

To submit this homework you will create the document in RStudio, using the knitr package (button included in RStudio), and then publish the document to your RPubs account. Once it is uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.

Question #1

Answer the following questions for this graphic: Relationship between ages and psychosocial maturity

  1. Identify the visual cues, coordinate system, and scale(s)

  2. How many variables are depicted in the graph? Explicitly link each variable to a visual cue that you listed above.

  3. Critique this data graphic using the taxonomy described in the lecture.

  a. Visual cues -

    1. Color is used to separate the two phenomena that were observed and are being depicted.
    2. Position on the vertical scale indicates the range of ages observed for each phenomenon in each time period.
    3. Length of the bar denotes the variance in the observations of a phenomenon in a given time period.
    4. Text is used to denote the significance of the time periods.
    5. The line depicts the relationship between observations across time periods.
    6. The direction of the boxes along the x-axis denotes the change in observed age over the time periods.

Coordinate system - Cartesian system is being used

Scales - X-axis -> Time period (nonlinear scale); Y-axis -> Age in years (linear scale)

  b. Three variables are depicted in the graph: the age at which a phenomenon was observed, the time period the observation belongs to, and the type of phenomenon that was measured.
    1. Color encodes the third variable (type of phenomenon), with pink for psychosocial maturation and green for menarche.
    2. The position of the bars on the vertical scale encodes the age at which the phenomenon was observed.
    3. The length of the bar encodes the variance of the ages observed for a specific phenomenon in a given time period.
    4. The text identifies the time period each observation belongs to and denotes the significance of that period (for example, 2000 years ago, agricultural settlement had begun, which is explained in the graphic).
    5. The line links the observations of a specific phenomenon across time periods, so it encodes both the time period and the type of phenomenon.
    6. The direction shows how the distribution of ages observed for the onset of a phenomenon has changed over the time periods. It encodes the change in the first variable across the second variable for a category of the third variable.
  c. Critique -

    • Uses position on the vertical and horizontal scales to depict the value of the observation in a given time period.
    • Makes use of the Cartesian coordinate system.
    • Color is used as a visual cue to denote the type of phenomenon being depicted.
    • Direction indicates how the values observed for a phenomenon change over the time periods.
    • Uses a linear numeric scale and a nonlinear time scale.
    • Context for the time periods depicted is provided with textual cues.
    • Context for the overarching phenomena is provided with visual and textual cues (for example, the curly braces that mark the separation of the two groups in the present time period, labeled with the text ‘Mismatch’).
    • The graphic is descriptive and conveys its information explicitly, and its message is strongly reinforced.

Question #2

Answer the following questions for this graphic: World’s top 10 best selling cigarette brands 2004-2007

  1. Identify the visual cues, coordinate system, and scale(s)

  2. How many variables are depicted in the graph? Explicitly link each variable to a visual cue that you listed above.

  3. Critique this data graphic using the taxonomy described in the lecture.

  a. Visual cues -

    1. Color is used to distinguish between the sales of different companies.
    2. Length of the bar indicates the volume of sales made by each company.
    3. Text is used to name each company next to its sales.

Coordinate system - Cartesian system is being used

Scales - X-axis -> Sales in billions (numeric linear scale); Y-axis -> Company achieving the sales (categorical scale)

  b. Two variables are depicted in this graph: the name of the company whose sales were observed and the volume of sales made by that company.
    1. Color encodes the first variable, since it shows which company’s sales are being depicted.
    2. Length of the bar encodes the second variable, since it shows the volume of sales made by the company.
    3. Text also encodes the first variable, since it gives the name of the company next to the bar for that company.

  c. Critique -

    • The visual cue of length is used to depict the volume of sales made.
    • The visual cue of color is used to distinguish the different companies.
    • Uses a linear scale to denote sales.
    • Additional context for the data is provided by the labels and the title.
    • The graphic is simple and leads to a clear conclusion. The linear scale is also effective at displaying the gap between position 1 and the rest.

Question #3

Find two data graphics published in a newspaper or on the internet in the last two years.

  1. Identify a graphical display that you find compelling. What aspects of the display work well, and how do these relate to the principles that we have just gone over in lecture? Include a screenshot of the display along with your solution. (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic").)

  2. Identify a graphical display that you find less compelling. What aspects of the display don’t work well? Are there ways that the display might be improved? Include a screenshot of the display along with your solution. (Hint: use the following in a code chunk: knitr::include_graphics("your_graphic").)

  a. I particularly enjoyed the visualizations used by the New York Times during the 2022 midterm elections. I enjoyed the interactive nature of the data visualization: you could view results at the national level, and simply hovering over a state presented additional information in a dialog box. Clicking a state took you to a new page for that state, and even to specific districts within it, with a live count of the votes reported so far. I also enjoyed the use of color to show which party captured each seat and the use of textual cues to provide context for the graphic. [Link to New York Times Election tracker](https://www.nytimes.com/interactive/2022/11/10/us/elections/results-house-seats-elections-congress.html)
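The hint in the prompt could be applied roughly as follows inside an R Markdown code chunk. This is only a sketch: the file name `your_graphic.png` is a hypothetical placeholder for a saved screenshot, and the `file.exists()` guard is an added assumption so that the document still knits while the image is missing.

```r
# Hypothetical file name: replace with the path to your saved screenshot.
graphic_path <- "your_graphic.png"

# include_graphics() embeds an existing image file in the knitted output.
# The file.exists() guard (an assumption, not part of the hint) avoids a
# knit error while the screenshot has not been saved yet.
if (file.exists(graphic_path)) {
  knitr::include_graphics(graphic_path)
} else {
  message("Screenshot not found: ", graphic_path)
}
```

Saving the screenshot in the same folder as the .Rmd file keeps the relative path simple.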

Question #4

Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.

What is a Data Scientist

Answer:

This collection of graphics is thematically consistent. The designer has used a variety of visual cues and coordinate systems while sticking to percentage-based scales throughout. The use of color is consistent, with the groups within each graphic sharing the same color scheme to denote ranking: light blue denotes the highest-ranking response by percentage of respondents and black denotes the smallest group. The colors along this scale contrast with one another, but they do not follow a theme. I would probably have chosen two contrasting colors for the ends of the scale and filled in the colors that lie between them, to ensure a logical gradient that further drives home the ordering of these groups. The designer makes extensive use of textual cues to share information with the reader, including a key takeaway placed before each graphic to prime readers. The collection does a great job of explaining what it sets out to do by underlining the significant takeaways from its survey of data scientists and data science within a company.
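The gradient suggested above can be sketched in base R. This is an illustration under stated assumptions, not the designer's actual palette: the endpoint colors follow the light blue and black described in the critique, and the number of groups (`n_groups`) is hypothetical.

```r
# Illustrative only: the endpoint colors and the number of groups are
# assumptions, not values taken from the original graphics.
low_color  <- "black"      # smallest group of respondents
high_color <- "lightblue"  # highest-ranking response
n_groups   <- 5            # hypothetical number of ranked groups

# colorRampPalette() returns a function that interpolates between the
# endpoint colors; calling it with n_groups yields that many hex codes.
make_palette   <- colorRampPalette(c(low_color, high_color))
ranked_palette <- make_palette(n_groups)

print(ranked_palette)
```

Because every intermediate color lies between the two endpoints, the ordering of the groups can be read directly from the shade.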

Question #5

Briefly (one paragraph) critique the designer’s choices. Would you have made different choices? Why or why not? Note: Link contains a collection of many data graphics, and I don’t expect (or want) you to write a full report on each individual graphic. But each collection shares some common stylistic elements. You should comment on a few things that you notice about the design of the collection.

Charts that explain food in America

Answer: The designer appears to have curated a collection of data graphics, all of which center on the agriculture industry and its different aspects. The collection includes a wide variety of data graphics: Cartesian plots, pie charts, several kinds of map plots, and even visualizations that depict more than three dimensions (a GIF of a plot changing over time). Although the graphics are curated from different sources, there is a consistent language: each graphic is preceded by a key takeaway and followed by a caption that provides extensive detail on the graphic and its source. I would not have used graphic 14, since it is hard to make sense of and the designer has not explained how to interpret the warped world map.