Evaluate a Visualization

A blog post shows this visualization of donations for research on several diseases and the number of deaths attributed to these diseases:

  • The image is originally from an article at Vox. The chart now on the Vox site is different; comment on the key change.

The size differences between the bubbles were originally far more dramatic: the largest bubble dwarfed the smallest, which was barely visible on the chart. This was likely because the radii of the bubbles, rather than their areas, were scaled to the numerical values. Since viewers judge bubble size by area, and area grows with the square of the radius, scaling the radius exaggerates the differences and makes the bubbles misleading.
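The effect of the two scalings can be checked with a little arithmetic. The sketch below uses made-up values (not the figures from the chart) and compares the ratio of the largest to the smallest bubble area when the radius is proportional to the value versus when the radius is proportional to its square root.

```r
# Hypothetical values, chosen only to illustrate the two scalings
# (these are NOT the numbers shown in the chart).
values <- c(1, 4, 100)

# Radius proportional to the value: area grows with the square of the value.
radius_linear <- values
area_linear <- pi * radius_linear^2

# Radius proportional to sqrt(value): area is proportional to the value.
radius_sqrt <- sqrt(values)
area_sqrt <- pi * radius_sqrt^2

# Ratio of the largest to the smallest bubble area under each scaling.
ratio_linear <- max(area_linear) / min(area_linear)  # (100 / 1)^2
ratio_sqrt   <- max(area_sqrt) / min(area_sqrt)      # 100 / 1

print(c(radius_scaled = ratio_linear, area_scaled = ratio_sqrt))
```

With radius scaling, a value 100 times larger is drawn with roughly 10,000 times the area; with square-root scaling, the area ratio matches the data ratio.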

  • Describe the items, attributes, marks, channels, and mappings used in the visualization. Comment on how well the strength of the channels used matches the importance of the features displayed.

Items:

      Relationship between the amount spent and the number of deaths from disease

Attributes:

      Disease
      Money raised in dollars for each disease
      Number of deaths in US for each disease

Marks:

      Circles
      Text

Channels:

      Area of circles mapped to money raised
      Area of circles mapped to number of deaths
      Color mapped to disease
      Text mapped to number of deaths and money raised

None of the channels are particularly strong. Area is not a great channel for judging magnitude. In addition, color is not the strongest identity channel for perceptual grouping. The relationship between deaths and money spent is not immediately apparent and requires searching through the two columns.

  • Can you suggest alternatives that might improve on this visualization in some respects?

A better visualization might use horizontal and vertical position to represent the two numerical variables, with color and text to identify the disease. This would show the relationship between the variables more readily, along with the diseases that don't fit the pattern (which was the author's intent). Such a plot would also scale more easily to additional data points.
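A minimal sketch of such a plot in base R follows. The data frame holds placeholder values with generic disease names, purely to show the layout; they are not the figures from the chart. The log scales on both axes are an extra assumption, useful when the values span several orders of magnitude.

```r
# Placeholder values for illustration only; NOT the figures from the chart.
donations <- data.frame(
  disease      = c("Disease A", "Disease B", "Disease C", "Disease D"),
  money_raised = c(250e6, 50e6, 20e6, 5e6),
  deaths       = c(40000, 600000, 7000, 140000)
)

# Position encodes both quantitative attributes; color and text identify the disease.
plot(donations$money_raised, donations$deaths,
     log = "xy", pch = 19, col = seq_len(nrow(donations)),
     xlab = "Money raised (USD, log scale)",
     ylab = "Deaths in the US (log scale)")
text(donations$money_raised, donations$deaths,
     labels = donations$disease, pos = 3, xpd = NA)
```

Diseases that fall far from the overall trend stand out as points in the upper-left or lower-right corners of the plot.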

Air Pollution Data

The SemiPar package contains a data set calif.air.poll with measurements of ozone level and several covariates for 345 days. After loading the package, use data(calif.air.poll) to load the data set. The help page contains more information.

  • Create a scatterplot matrix for the variables in the data set and comment on any interesting features you see.
##   ozone.level daggett.pressure.gradient inversion.base.height
## 1           3                       -15                  5000
## 2           3                       -25                  2693
## 3           5                       -24                   590
## 4           5                        25                  1450
## 5           6                        15                  1568
## 6           4                       -33                  2631
##   inversion.base.temp
## 1               30.56
## 2               47.66
## 3               55.04
## 4               57.02
## 5               53.78
## 6               54.14

  • Base height might be capped at 5,000 feet. There are many data points with this value.
  • As base temperature increases, ozone level tends to increase, and base height tends to decrease. There does not seem to be an association between pressure gradient and temperature.
  • As base height increases, there is less variability in ozone level.
  • As ozone level increases, there is less variability in pressure gradient, which becomes centered around zero.
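The report does not show the code behind the scatterplot matrix, so the following is only one way it could be produced, using base R's pairs(). The point size, transparency, and the count of days at 5,000 feet are illustrative choices, not taken from the original.

```r
library(SemiPar)       # provides the calif.air.poll data set
data(calif.air.poll)

head(calif.air.poll)   # first six rows of the four variables

# Scatterplot matrix of all variables; small semi-transparent points reduce overplotting.
pairs(calif.air.poll, pch = 19, cex = 0.5, col = rgb(0, 0, 0, 0.3))

# Number of days recorded at exactly 5,000 feet, to check the suspected cap.
n_at_cap <- sum(calif.air.poll$inversion.base.height == 5000)
print(n_at_cap)
```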

  • Use conditioning plots to explore how the relationship between ozone.level and inversion.base.temp changes as inversion.base.height increases.

When base height is low, there is a positive linear relationship between base temperature and ozone level. As base height approaches 5,000 ft, the relationship between base temperature and ozone level flattens, and ozone level is approximately zero across the range of temperatures.
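A conditioning plot of this kind can be drawn with coplot() from base R. The sketch below is one possible call; the number of intervals, the overlap, and the smoother in each panel are assumptions rather than the settings used for the original figure.

```r
library(SemiPar)
data(calif.air.poll)

# Ozone vs. base temperature, one panel per interval of base height.
coplot(ozone.level ~ inversion.base.temp | inversion.base.height,
       data = calif.air.poll,
       number = 6,            # number of conditioning intervals (arbitrary choice)
       overlap = 0.25,        # fraction of points shared by adjacent intervals
       panel = panel.smooth)  # adds a lowess curve to each panel
```

Panels are read from the bottom left to the top right as base height increases, so the change in slope described above can be followed across panels.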

  • Examine the distribution of ozone.level, inversion.base.temp and inversion.base.height using points3d from the rgl package. The scale function may be useful for rescaling the data. Comment on any interesting features you see. Include an interactive WebGL plot in your report.
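
A minimal sketch of how this could be approached with rgl follows. The point size and axis labels are arbitrary choices, and the final rglwidget() call assumes the report is knitted to HTML. On a machine without a display, setting options(rgl.useNULL = TRUE) before loading rgl may be needed.

```r
library(SemiPar)
library(rgl)
data(calif.air.poll)

vars <- c("inversion.base.temp", "inversion.base.height", "ozone.level")

# scale() centers each column and divides it by its standard deviation,
# so the three axes are comparable.
z <- scale(calif.air.poll[, vars])

open3d()
points3d(z[, 1], z[, 2], z[, 3], size = 4)
axes3d()
title3d(xlab = "base temp", ylab = "base height", zlab = "ozone")

# In an R Markdown HTML document, this embeds the interactive WebGL scene.
rglwidget()
```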