June, 2019


Introduction

In September 2017, I left my home town to start a new experience in New Zealand. As this is, so far, my second far-flung trip, I decided to keep good records of what happened along the way. In this series of articles, I am going to share some experiments in data visualization based on my travel records, as an alternative method of storytelling.

The Book of Names

Shortly before my departure, my sister-in-law gave me a small and fashionable notebook.

Once I landed in Auckland, I realized what to do with it: I would write down the names of all the people I met throughout my trips, regardless of whether they were friends, colleagues, passing acquaintances or even people I wish I had never come across.

This word cloud shows all the names of people I met. Some names appear larger than others because several different people shared them. A few names in there are just nicknames I made up because I could not get the actual name. Some names, however, may sound like nicknames but actually are not.
To be honest, it is not a perfect account of each and every soul I came across. For many people, I never learned their name at all, even though I may have had some interaction with them, though certainly not an important conversation. Beyond the obvious case of shop clerks, there are people whose names I simply missed, or who were never properly introduced to me.
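For readers who want to try something similar, here is a minimal sketch in base R of how name frequencies for a word cloud could be computed. The names and the `book` data frame are made up for illustration; this is not the code or the data behind the figure above.

```r
# Hypothetical notebook entries: one row per person met (made-up names)
book <- data.frame(
  name = c("Anna", "Ben", "Anna", "Chris", "Ben", "Anna"),
  stringsAsFactors = FALSE
)

# Count how many different people share each name, most common first
name_counts <- sort(table(book$name), decreasing = TRUE)

# One row per name with its frequency: the typical input of a
# word-cloud function, where a higher frequency means a larger word
name_freq <- data.frame(
  word = names(name_counts),
  freq = as.integer(name_counts),
  stringsAsFactors = FALSE
)
print(name_freq)
```

A table like `name_freq` is the usual starting point for word-cloud packages, which scale each word by its frequency.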

Origins

In addition to the names, I recorded the country of origin next to each one. It turns out I met people from many places around the world.
The world map below is coloured according to the number of people I met from each country. The scale runs from green (fewer people) to red (more people).

New Zealand has the brightest colour, while most countries show shades of green.
It was not until mid-2018 that I started writing down nationality. By then, I had forgotten where many of my early acquaintances came from, so I listed them as “unknown nationality”. Also, for some people I was never able to figure out, or ask, where they were from.
The following waffle chart shows the proportional distribution of the nationalities of the people I have met:

Some key observations in this plot:
1. More than a fifth of the people I met are New Zealand nationals.
2. After Kiwis, the second largest group is the “unknown” origin.
3. My country of origin is well represented among my acquaintances.
4. Four other nationalities each account for between 4% and 6%. All others are grouped together in the “Other” category.
Though intuitive and sexy, this visualisation has a shortcoming: it can only plot up to 9 colours. However, there are 48 different countries in my Book of Names (plus the “unknown” label), and for some of them there are just a handful of people.
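The colour limit described above means that smaller categories have to be collapsed before plotting. The following is a minimal base R sketch of that step, under the assumption that the largest groups are kept and the rest are summed into “Other”. The counts and the helper `lump_other()` are hypothetical, not taken from the Book of Names.

```r
# Made-up counts of people per country, for illustration only
counts <- c(A = 10, B = 7, C = 5, D = 3, E = 2, F = 2, G = 1)

# Hypothetical helper: keep the largest (max_groups - 1) categories
# and sum everything else into a single "Other" category
lump_other <- function(counts, max_groups) {
  counts <- sort(counts, decreasing = TRUE)
  if (length(counts) <= max_groups) return(counts)
  rank <- seq_along(counts)
  keep <- counts[rank < max_groups]
  rest <- counts[rank >= max_groups]
  c(keep, Other = sum(rest))
}

lumped <- lump_other(counts, max_groups = 4)

# Percentage share of each remaining category
shares <- round(100 * lumped / sum(lumped))
print(shares)
```

The trade-off is the one noted above: the chart stays readable, but every country folded into “Other” loses its individual identity.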

Regions

A somewhat better approach is to group countries by geographic region. Note that the definition of “region” is debatable, but I aimed for a balance between aggregation and detail. The resulting waffle chart shows that the distribution by region is much more balanced, with the notable exceptions of Anglo-America and Africa (the latter at less than 1%). Despite the crowd of my countrymen and women who stood out in the previous waffle chart, Latin Americans rank fourth as a region. Also, the unknown origin segment becomes almost marginal from this perspective.
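Grouping by region amounts to a lookup from country to region followed by a recount. Here is a minimal base R sketch of that idea. The people, the `regions` lookup and the region names are all assumptions made for illustration, and the actual grouping used for the chart may differ.

```r
# Made-up list of people with their country of origin (NA = not recorded)
people <- data.frame(
  country = c("New Zealand", "Brazil", "Japan", "Mexico", "France", NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table from country to region
regions <- c(
  "New Zealand" = "Oceania",
  "Brazil"      = "Latin America",
  "Mexico"      = "Latin America",
  "Japan"       = "Asia",
  "France"      = "Europe"
)

# Map each person to a region; anything without a match becomes "Unknown"
people$region <- unname(regions[people$country])
people$region[is.na(people$region)] <- "Unknown"

# Percentage of people per region
region_counts <- table(people$region)
region_share <- round(100 * prop.table(region_counts), 1)
print(region_share)
```

Keeping the lookup in one place makes it easy to try a different definition of “region” and see how the balance shifts.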

Communities

As I would still like to know more about how countries of origin are distributed in the Book of Names, I will try a different approach with a density estimation plot.

The frequency (how many people per country) is shown on the horizontal axis, and the height of the curve represents how many countries fall around each frequency. Notice the 92 New Zealanders at the far right. In contrast, for the vast majority of countries, I met fewer than 10 people. The dashed red line marks the median, which is four people.
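A plot of this kind can be sketched in base R with `density()` and `median()`. The counts below are made up and are not the real Book of Names data. The published figure was likely drawn with other tools, so treat this only as an illustration of the technique.

```r
# Made-up number of people met per country (one value per country)
people_per_country <- c(40, 25, 12, 8, 6, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1)

# Kernel density estimate of the counts, evaluated from zero upwards
dens <- density(people_per_country, from = 0)

# The median splits the countries into two halves of equal size
med <- median(people_per_country)

# Draw the curve and a dashed red line at the median
# (only in an interactive session, so the script can also run in batch mode)
if (interactive()) {
  plot(dens, main = "People met per country", xlab = "People per country")
  abline(v = med, col = "red", lty = 2)
}
```

With a long right tail like this one, the median is a more telling summary than the mean, which the largest country would pull upwards.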
Finally, to complement this graph, I can show the word cloud in the background panel:

Citations

This is a personal project to share my data analysis and visualization skills with R and combine them in a storytelling format. For those interested in the code I used in this publication, feel free to contact me. Below are the citations for the tools I used in this article.

R Core Team (2018). _R: A Language and Environment for Statistical Computing_. R Foundation for Statistical Computing, Vienna, Austria. <URL: https://www.R-project.org/>.

Urbanek S (2013). _png: Read and write PNG images_. R package version 0.1-7, <URL: https://CRAN.R-project.org/package=png>.

Becker RA, Wilks AR (original S code), Brownrigg R (R version), Minka TP, Deckmyn A (enhancements) (2018). _maps: Draw Geographical Maps_. R package version 3.3.0, <URL: https://CRAN.R-project.org/package=maps>.

Hijmans R (2019). _geosphere: Spherical Trigonometry_. R package version 1.5-10, <URL: https://CRAN.R-project.org/package=geosphere>.

Vaidyanathan R, Xie Y, Allaire J, Cheng J, Russell K (2018). _htmlwidgets: HTML Widgets for R_. R package version 1.3, <URL: https://CRAN.R-project.org/package=htmlwidgets>.

Wickham H (2017). _httr: Tools for Working with URLs and HTTP_. R package version 1.3.1, <URL: https://CRAN.R-project.org/package=httr>.

Wickham H (2019). _rvest: Easily Harvest (Scrape) Web Pages_. R package version 0.3.4, <URL: https://CRAN.R-project.org/package=rvest>.

Wickham H, Hester J, Ooms J (2018). _xml2: Parse XML_. R package version 1.2.0, <URL: https://CRAN.R-project.org/package=xml2>.

Wickham H (2018). _stringr: Simple, Consistent Wrappers for Common String Operations_. R package version 1.3.1, <URL: https://CRAN.R-project.org/package=stringr>.

Wickham H, Henry L (2018). _tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions_. R package version 0.8.2, <URL: https://CRAN.R-project.org/package=tidyr>.

Garnier S (2018). _viridis: Default Color Maps from 'matplotlib'_. R package version 0.5.1, <URL: https://CRAN.R-project.org/package=viridis>.

Garnier S (2018). _viridisLite: Default Color Maps from 'matplotlib' (Lite Version)_. R package version 0.3.0, <URL: https://CRAN.R-project.org/package=viridisLite>.

Rudis B, Gandy D (2017). _waffle: Create Waffle Chart Visualizations in R_. R package version 0.7.0, <URL: https://CRAN.R-project.org/package=waffle>.

Wickham H, François R, Henry L, Müller K (2019). _dplyr: A Grammar of Data Manipulation_. R package version 0.8.3, <URL: https://CRAN.R-project.org/package=dplyr>.

Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York. ISBN 978-3-319-24277-4, <URL: http://ggplot2.org>.