INTRODUCTION


Humans think in images and stories, and we are remarkably good at finding patterns in images. That is why I am drawn to data visualization: it is a powerful means of distilling large data sets and communicating the stories they contain.

The following are examples of visualizations I created. My tool of choice is R, but in the course of any data analysis and visualization I may use other tools such as Excel, QGIS, SQL, HTML, or CSS.

This portfolio is a living document. Please check back from time to time and see what I’ve added.





INTERACTIVE MAP


Maps of all kinds are a long-standing passion of mine. I get deep satisfaction from creating interactive maps, where spreadsheets of numbers are transformed into beautiful, understandable visual patterns. I use the JavaScript library Leaflet, via R, to create them. In the following map, you can zoom in and out, pan in any direction, select from several background maps, and choose the layers you wish to view. The controls are hidden in the “stacked layers” icon at the top right of the map; hover over the icon to see the options. (To bring a layer to the top, click it off and then on.) Clicking a data point produces a popup with additional information.

The following map shows all of the forest fires in Oregon from 2013 to 2015. You can explore the geographic distribution of major causes of forest fires by showing individual layers. For example, it is easy to see that lightning-caused fires are more prevalent in the southern and eastern parts of Oregon. I’ve included a variety of base maps to use depending on the question you are exploring. Using the “Terrain” map shows how difficult some fires must have been to fight because of steep, rugged slopes.
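For readers curious how a layered map like this is assembled, here is a minimal sketch in R. It is not the code behind the map above: the four fire records, the column names, and the choice of "Esri.WorldTopoMap" as the “Terrain” base map are assumptions made for illustration. The map is only built if the leaflet package is installed.

```r
# Hypothetical fire records; the real data set has many more rows and columns.
fires <- data.frame(
  lat   = c(42.3, 44.1, 45.5, 43.2),
  lng   = c(-122.9, -118.5, -123.1, -120.4),
  cause = c("Lightning", "Lightning", "Debris Burning", "Equipment Use"),
  year  = c(2013, 2014, 2014, 2015),
  stringsAsFactors = FALSE
)
causes <- unique(fires$cause)

# Build the map only if the leaflet package is available.
if (requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)

  # Two selectable base maps (the provider choice is an assumption).
  fire_map <- leaflet() %>%
    addTiles(group = "Streets") %>%
    addProviderTiles("Esri.WorldTopoMap", group = "Terrain")

  # One overlay layer per cause; each marker carries a popup.
  for (cause in causes) {
    rows <- fires[fires$cause == cause, ]
    fire_map <- addCircleMarkers(
      fire_map,
      lng   = rows$lng,
      lat   = rows$lat,
      popup = paste0(rows$cause, " (", rows$year, ")"),
      group = cause
    )
  }

  # The layers control is the collapsed "stacked layers" icon.
  fire_map <- addLayersControl(
    fire_map,
    baseGroups    = c("Streets", "Terrain"),
    overlayGroups = causes,
    options       = layersControlOptions(collapsed = TRUE)
  )
}
```

Printing `fire_map` at the R console renders the widget. Assigning each cause to its own group is what lets the viewer switch individual layers on and off.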







GRAPHS


Interactive Time-series Graph

Time-series graphs can show cyclical trends or changes after significant events. The following graph shows the daily number of births in Quebec from 1977 to 1990. The graph includes a range selector beneath it which allows you to zoom in on the time range of interest. Simply move the sliders on the left and right.
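A graph with a range selector can be sketched in a few lines, assuming the dygraphs and xts packages. The counts below are simulated stand-ins rather than the actual Quebec births, and the Poisson mean of 250 is an arbitrary assumption.

```r
# Simulated daily counts standing in for the real births series.
set.seed(1)
dates  <- seq(as.Date("1977-01-01"), as.Date("1990-12-31"), by = "day")
births <- rpois(length(dates), lambda = 250)  # lambda is an arbitrary choice

# Build the widget only if both packages are available.
if (requireNamespace("dygraphs", quietly = TRUE) &&
    requireNamespace("xts", quietly = TRUE)) {
  # dygraph() expects a time-series object, so index the counts by date.
  series <- xts::xts(births, order.by = dates)

  # dyRangeSelector() adds the slider strip beneath the graph.
  births_graph <- dygraphs::dyRangeSelector(
    dygraphs::dygraph(series, main = "Daily births (simulated)")
  )
}
```

Printing `births_graph` at the console renders the graph with its sliders.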







Moveable 3-D Graph

Adding a third variable to a graph is often helpful, and while static 3-D graphs are possible, being able to rotate the graph to view it from any angle is superior. Below is a 3-D plot of Fisher’s classic iris data set (collected by Edgar Anderson), which includes sepal and petal measurements of three species. I plotted sepal length, sepal width, and petal length on the X, Y, and Z axes, respectively, and color-coded the species (Setosa = light blue, Versicolor = blue, and Virginica = light green). To rotate, click anywhere on the graph and drag in any direction. Zoom using your mouse scroll wheel.







Network Graph

Many systems, whether biological, physical, or social, are best represented as a network, where features such as connectivity, complexity, and distance can illuminate many questions. The following graph shows frequent associations between 62 dolphins off the coast of New Zealand (Lusseau et al. 2003). Hover over a node to see the name (or designation) of the dolphin, and drag nodes around to explore the network. Male dolphins are dark blue, females are light blue, and those of unknown sex are orange.







Scatter Plots

Static graphs are the bread and butter of data visualization. A simple scatter plot of two variables, say tree height vs. tree diameter, makes the relationship concrete and lets us easily detect interesting outliers (e.g., the rather stout baobab tree).

When I worked for the Pacific Northwest National Laboratory, our research focused on determining the number of young salmon that survived going through dams on the Columbia River. We implanted tiny acoustic transmitters in a sample (50,000!) of migrating fish, and listened for their unique codes as they approached the dams. The sensors we used (hydrophones) were submerged in the water on the upstream face of the dams. The problem was that dams make a lot of noise, so hearing an acoustic signal sent at very low power was difficult at useful distances. We constructed shields (baffles) that blocked noise coming from or reflected off the dams. To test the baffles, two hydrophones were lowered into the water, one with a baffle and one without, and a transmitter mounted to a boat was moved toward and away from the dam in a series of transects. The signal-to-noise ratio (SNR) was recorded each time a code was heard. SNR is a good proxy for distance: higher SNRs imply the fish is close by.


The following scatter plots show that baffles are effective at blocking noise, allowing detection of distant signals. The top plot shows all detections and their SNR levels for the baffled and non-baffled hydrophones compared with the distance the “fish” was from the hydrophone. The bottom plot shows all detections and their distance from each hydrophone.













Correlation Matrix

Multicollinearity (two or more highly correlated variables) can cause problems when building multiple regression models. An early step in data exploration is creating a correlation matrix. The following is one way to visualize such a matrix and quickly spot variables that are highly correlated. The more highly correlated two variables are, the more oblong the shape where the two intersect. A blue shape means the variables are positively correlated; a red shape means they are negatively correlated. Two variables with no correlation would show up as a grey circle.

The data set used comes from a paper by J. W. Longley (1967) which demonstrates highly correlated macroeconomic indicators.
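The Longley data ship with base R as `longley`, so the matrix itself is easy to reproduce. The sketch below lists the strongly correlated pairs and then, assuming the corrplot package is installed, draws an ellipse-style plot. The 0.95 cutoff is an arbitrary choice for illustration, and corrplot is only one of several packages that can draw this kind of figure.

```r
# longley is a built-in data set: 16 yearly observations of 7 indicators.
corr_matrix <- cor(longley)

# Locate pairs with |r| > 0.95, using the upper triangle so each pair
# is listed once (the 0.95 threshold is an arbitrary illustration).
high <- which(abs(corr_matrix) > 0.95 & upper.tri(corr_matrix), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(corr_matrix)[high[, "row"]],
  var2 = colnames(corr_matrix)[high[, "col"]],
  r    = round(corr_matrix[high], 3)
)
print(strong_pairs)

# Draw the matrix as ellipses if corrplot is available:
# narrower ellipses indicate stronger correlation.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(corr_matrix, method = "ellipse")
}
```

Listing the pairs numerically alongside the plot makes it easy to decide which predictors to drop or combine before fitting a regression.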







Heat Map

Heat maps are a useful way to represent matrix data, facilitating the discovery of clusters or outliers. The following heat map shows daily weather-related measurements taken in Portland, Oregon, in November 2011. Hover over a cell to get a popup showing the row, column, and value.







TABLES


While visualization techniques usually make finding trends easier, sometimes it is necessary to present a table. The following table is fairly large (2460 rows, 14 columns), and presenting it in a web page requires some kind of “folding” technique. Here you can select the number of rows you want to see, sort by clicking the arrows at the top of a column, or search for a particular subset. Try entering one of the “General.Causes”, e.g. “Debris Burning”, and the table will be filtered to just those instances. The total number of rows remaining after filtering is shown at the bottom of the table.
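A paged, searchable table of this kind can be sketched with the DT package, which is an assumption here rather than a statement of what this page uses. Only the “General.Causes” column name comes from the real table; the other columns and all five rows are hypothetical.

```r
# Hypothetical stand-in for the 2460-row fire table.
fire_table <- data.frame(
  Fire.Year      = c(2013, 2013, 2014, 2015, 2015),
  General.Causes = c("Debris Burning", "Lightning", "Debris Burning",
                     "Smoking", "Lightning"),
  Acres          = c(0.5, 310, 1.2, 0.1, 45),
  stringsAsFactors = FALSE
)

# The equivalent filter in plain R: keep only debris-burning fires.
debris <- fire_table[fire_table$General.Causes == "Debris Burning", ]
print(nrow(debris))

# Build the interactive table only if DT is available.
# filter = "top" adds a filter box above each column;
# pageLength sets how many rows are shown per page.
if (requireNamespace("DT", quietly = TRUE)) {
  fire_widget <- DT::datatable(
    fire_table,
    filter  = "top",
    options = list(pageLength = 10)
  )
}
```

Printing `fire_widget` renders the table with its search box, sortable column headers, and row counter.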







INTERACTIVE APPLICATIONS


Shiny App

Sometimes we want more interactivity than a single interactive graph can offer. That’s where full-blown applications come in. I build applications with Shiny, which brings to bear all the tools and packages available in R. To see an example of one of my Shiny apps, go here: Shiny App

Slides

Presentations can be built in R with “R Presentations” or “Slidify”, which allow you to author HTML5 presentations. One advantage of these packages is that R code is embedded in the presentation, so any changes in code are easily re-rendered. For one of my examples, go here: Data Science Capstone Presentation





BIBLIOGRAPHY


Anderson, E. The irises of the Gaspe Peninsula. Bulletin of the American Iris Society 59, 2–5 (1935).

Longley, J. W. An appraisal of least-squares programs from the point of view of the user. Journal of the American Statistical Association 62, 819–841 (1967).

Lusseau, D., K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology 54, 396–405 (2003).