The content of this document is based on a blog post by Reed College’s newest member of the ITS Department Rich Marjerus who specializes in data visualization. If you have questions about data visualization or are interested in learning more about how to make dynamic maps in R, please stop by his office (ETC 223) or email him at majerus@reed.edu. The data we consider is the geographic distribution of the college“s entering classes from 2007-2013 on the Institutional Research page. Before you run this code, you”ll need to install the googleVis R package.

State of origin for US-originating Reed freshmen in 2013

Table 1 lists states in the first column and each enrollment year from 2007-2013 in the subsequent columns. Table 1 has five pages with ten states on each page. You can sort the data in the table by clicking on the column headers. For example to see which states accounted for the highest number of Reed matriculants in 2013 double-click on the 2013 column header to sort the states in descending order.

Table 1: Detailed Geographic Distribution of Entering Classes (First-year Students)

Historgrams and choropleth maps

Here is a histogram of the distribution of the number of students from each state in the US in 2013.

Using the googleVis package in R we can create dynamic maps of this state level data. Figure 1 shows the raw matriculant data from 2013 mapped by state. The darker a states shading the more matriculants from that state. Mousing over a state will reveal the exact number of students who matriculated from a certain state.

Figure 1: Domestic Geographic Distribution of 2013 Entering Class

I think this is a pretty good start to visualizing the geographic distribution of Reed“s 2013 matriculation data. Figure 1 clearly shows us that California accounts for the most matriculants, but then it quickly becomes difficult to distinguish between the shading of the other states. According to Table 1, Reed enrolled 87 students from California in 2013. The next highest state level enrollment was 28 from Oregon. Given that three-times as many students enrolled from California than from any other state, the color gradient in the map is correctly placing substantial emphasis on California.

However, we may be interested in learning more about the variation in matriculants across all states rather than identifying the states that account for the greatest number of matriculants. One way to approach this task is to map the log of matriculants or to take the log transformation of the variable of interest. Log transforming a variable that contains exceptionally large values (i.e. a right skewed variable) pulls those large values closer to the mean and yields a more symmetrically distributed variable. As for the map, log transforming the number of matriculants increases the variation in the color gradient across states and enables us to better visualize the distribution of Reed“s matriculants across the entire country (as you can see in Figure 2 below). We will use log base 10 so that the numbers reflect powers of 10 i.e. \(\log(100)=\log(10^2)=2\).

Figure 2: Domestic Geographic Distribution of 2013 Entering Class (Log Transformed)