Essay
When I first imported the dataset, it was unworkable. I started by renaming the columns so they would be easier to work with in R. The next challenge was coercing the many columns that held the strings “y” and “n” into Boolean values, so that I could summarize them or perform logical operations on them. This was difficult and took a lot of trial and error, but eventually I wrote a loop that checked whether a column contained “y” or “n” values and then replaced every “n” with FALSE and anything else with TRUE. I lost some data along the way, like the names of the metro stations, but I decided that wasn’t relevant to what I wanted to explore. Next, I merged the columns for 47 different nearby bus stops into a single column that counts how many bus stops are nearby. Again I lost some specificity, namely which bus routes those were, but I didn’t need that for what I was trying to do.
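The coercion and bus stop steps described above can be sketched roughly as follows. This is a minimal base R illustration on made-up data, not the actual cleaning script: the data frame `libraries`, its column names, the `bus_` prefix, and the helper `has_flags` are all assumptions made for the example.

```r
# Hypothetical toy data; the real dataset's column names will differ.
libraries <- data.frame(
  branch  = c("Lib A", "Lib B", "Lib C"),
  wifi    = c("y", "n", "y"),
  metro   = c("Station X", "n", "n"),  # a station name, or "n" for none
  bus_10  = c("y", "n", "n"),
  bus_55  = c("y", "y", "n"),
  stringsAsFactors = FALSE
)

# A column is treated as a flag column if it contains any "y" or "n" values.
has_flags <- function(x) is.character(x) && any(x %in% c("y", "n"))
flag_cols <- names(libraries)[vapply(libraries, has_flags, logical(1))]

# "n" becomes FALSE and anything else becomes TRUE.
# This is where details such as metro station names are lost.
libraries[flag_cols] <- lapply(libraries[flag_cols], function(x) x != "n")

# Collapse the per-stop columns into one count of nearby bus stops,
# then drop the individual columns.
bus_cols <- grep("^bus_", names(libraries), value = TRUE)
libraries$bus_stops <- rowSums(libraries[bus_cols])
libraries[bus_cols] <- NULL
```

Comparing against “n” instead of “y” is what turns a station name into TRUE, so the same line both cleans the flags and discards the names.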
To create the visualization, I cleaned the dataset further by selecting the language collection columns and pivoting them into a longer format, with one row per library and collection. I then created a tile map that shows which libraries have collections in different non-English languages. It surprised me that the Germantown library, which had the largest total inventory, had collections in only 2 languages, while many smaller libraries had 4 or more. I’m not surprised that Spanish is among the most popular collections, since there are so many Spanish speakers here, but I am a little surprised to see Chinese up there with it. I wonder how much of that collection is in Mandarin, Cantonese, or other languages.
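A rough sketch of the pivot and tile map, assuming the tidyr and ggplot2 packages and a made-up data frame `collections` with one logical column per language. The column names and values here are invented for illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy data: one logical column per language collection.
collections <- data.frame(
  branch  = c("Lib A", "Lib B", "Lib C"),
  spanish = c(TRUE, TRUE, FALSE),
  chinese = c(TRUE, FALSE, FALSE),
  french  = c(FALSE, TRUE, FALSE)
)

# Pivot to one row per branch-language pair.
long <- pivot_longer(collections, cols = -branch,
                     names_to = "language", values_to = "has_collection")

# Keep only the pairs where the collection exists.
# Note that "Lib C" disappears entirely at this step.
kept <- long[long$has_collection, ]

# One tile per existing collection.
p <- ggplot(kept, aes(x = language, y = branch)) +
  geom_tile(fill = "steelblue", color = "white") +
  labs(x = "Collection language", y = "Library")
print(p)
```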
I’m not satisfied with how the chart turned out. I originally wanted to show black tiles in place of blank tiles for libraries that didn’t have a particular collection, but I haven’t figured out how to do that yet. As it stands, the chart drops the rows for libraries that don’t have any non-English collections, which would have been interesting to see. There are also a lot of variables that I cleaned but didn’t explore deeply and would want to keep looking into, like the many services, library hours, and nearby transportation. Overall, this dataset has a lot of interesting information that I feel I didn’t spend enough time truly dissecting.
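One possible way to get black tiles and keep every library on the chart is to build the full grid of library and language combinations before plotting. The sketch below is untested against the real dataset and uses made-up names (`present`, `all_branches`) and values.

```r
library(ggplot2)

# Hypothetical toy data: only the branch-language pairs that exist.
present <- data.frame(
  branch   = c("Lib A", "Lib A", "Lib B"),
  language = c("Spanish", "Chinese", "Spanish"),
  stringsAsFactors = FALSE
)
all_branches <- c("Lib A", "Lib B", "Lib C")  # "Lib C" has no collections

# Build every branch-language combination, then flag the ones that exist.
grid <- expand.grid(branch = all_branches,
                    language = unique(present$language),
                    stringsAsFactors = FALSE)
grid$has_collection <- paste(grid$branch, grid$language) %in%
  paste(present$branch, present$language)

# Missing collections are drawn as black tiles instead of being left blank.
p <- ggplot(grid, aes(x = language, y = branch, fill = has_collection)) +
  geom_tile(color = "white") +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "black"))
print(p)
```

Because the grid starts from the full list of libraries, a library with no non-English collections still gets a row, just one made entirely of black tiles.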