Proportional symbol maps (also known as graduated symbol maps) are a class of maps that use the visual variable of size to represent differences in the magnitude of a discrete, abruptly changing phenomenon, e.g. counts of people. Like choropleth maps, these maps come in classed and unclassed versions. The classed ones are known as range-graded or graduated symbols, and the unclassed ones are called proportional symbols, where the area of each symbol is proportional to the value of the attribute being mapped.
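To make the "area proportional to value" idea concrete, here is a minimal base R sketch. The values and the maximum radius are made up for illustration and are not taken from the dataset. Because the area of a circle grows with the square of its radius, the radius has to scale with the square root of the value.

```r
# Toy attribute values (hypothetical, not from the conflict dataset)
values <- c(1, 4, 9, 100)
max_radius <- 10  # radius given to the largest value, in arbitrary units

# Area-proportional scaling: area ~ value, so radius ~ sqrt(value)
radius <- max_radius * sqrt(values / max(values))
area <- pi * radius^2

# The ratio of symbol areas now matches the ratio of the values
area_ratio <- area / max(area)
value_ratio <- values / max(values)
print(data.frame(values, radius, area_ratio, value_ratio))
```

Scaling the radius directly by the value would instead exaggerate large values, since the area would then grow with the square of the value.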
The dataset contains records of the armed conflicts that occurred in Southern Asia over a period of 4 years, from 2016 to 2020. The core purpose is to create a spatial visualization that helps us observe and analyse the fatalities and types of events that have taken place in countries like India, Pakistan, Nepal, Sri Lanka and Bangladesh. The spatial visualization that I will focus on is a proportional symbol map with relevant interactivity and animation.
3.1 tmap is based on the idea of a ‘grammar of graphics’. This involves a separation between the input data and the aesthetics (how data are visualised): each input dataset can be ‘mapped’ in a range of different ways including location on the map (defined by data’s geometry), color, and other visual variables. The basic building block is tm_shape() (which defines input data, raster and vector objects), followed by one or more layer elements such as tm_fill() and tm_dots()
3.2 The object passed to tm_shape() in this case is an sf object representing the regions. Layers are added to represent them visually, with tm_fill() and tm_borders() creating shaded areas and border outlines.
This is an intuitive approach to map making: the common task of adding new layers is undertaken by the addition operator +, followed by tm_*(). The asterisk (*) refers to a wide range of layer types which have self-explanatory names including fill, borders, bubbles, text and raster.
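The layering idea can be sketched in a few lines. This example assumes the World dataset that ships with tmap rather than the conflict data used in this post, so it only illustrates the pattern of tm_shape() followed by tm_*() layers.

```r
library(tmap)

# World is an sf object bundled with tmap (used here only for illustration)
data("World")

# tm_shape() defines the input data; each tm_*() call adds one layer
m <- tm_shape(World) +
  tm_fill() +     # shaded areas
  tm_borders()    # border outlines

# Printing the object draws the map in the current tmap mode
print(m)
```

Nothing is drawn until the object is printed, so layers can be added step by step and stored in a variable first.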
3.3 Another component would be the coloured legends showing different sizes of the variables given.
3.4 Animation: This is provided by ImageMagick, which is used to write the gif that we will create further down in the code chunks.
3.5 Interactivity: This is provided using the view mode of tmap which we will use down in the code chunks. We will create a synchronised hover circle which will be used in the faceted maps for real-time comparison of different factors in our dataframe.
Proportional Symbol Map
Before we get started, we need to ensure that the tmap package and the other R packages we rely on, sf and tidyverse, have been installed. The code chunk below installs any that are missing and then loads them all using the library() function.
packages <- c('sf', 'tmap', 'tidyverse')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: sf
## Warning: package 'sf' was built under R version 3.6.3
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
## Loading required package: tmap
## Warning: package 'tmap' was built under R version 3.6.3
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages -------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.5
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tidyr' was built under R version 3.6.3
## Warning: package 'readr' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts ----------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
The data set used for this data viz assignment is called 2016-02-01-2020-02-01-Southern_Asia. The data is in csv file format. The map we’ll be using for the spatial visual is in geojson format and was downloaded from a website. The name of the map file is southasia.
The code chunk below uses the read_csv() function of the readr package to import 2016-02-01-2020-02-01-Southern_Asia.csv into R as a tibble data frame called armed_conflicts. The map is then read using the st_read() function of the sf package into a simple feature data frame called map.
Additionally, we will filter the data based on the fatalities column. We will consider only those events which had one or more fatalities, as these are the ones that give cause for grave concern.
armed_conflicts <- read_csv("2016-02-01-2020-02-01-Southern_Asia.csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## data_id = col_double(),
## iso = col_double(),
## event_id_no_cnty = col_double(),
## year = col_double(),
## time_precision = col_double(),
## inter1 = col_double(),
## inter2 = col_double(),
## interaction = col_double(),
## latitude = col_double(),
## longitude = col_double(),
## geo_precision = col_double(),
## fatalities = col_double(),
## timestamp = col_double()
## )
## See spec(...) for full column specifications.
map <- st_read("southasia.json")
## Reading layer `southasia' from data source `C:\Users\Jaideep Ballani\Desktop\Visual Analytics\Lesson 9\Data Viz 9\southasia.json' using driver `GeoJSON'
## Simple feature collection with 7 features and 64 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 60.52843 ymin: 5.96837 xmax: 97.40256 ymax: 38.48628
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
armed_filter0 <- filter_at(armed_conflicts,vars(starts_with("fatalities")), all_vars((.) != 0))
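The filter_at() call above keeps rows in which every column whose name starts with "fatalities" is non-zero. Assuming there is only one such column and that its counts are never negative, a plain filter() should give the same result. The sketch below shows this on a small made-up tibble; the column event_id and the values are hypothetical.

```r
library(dplyr)

# Toy data standing in for armed_conflicts (values are invented)
toy <- tibble(
  event_id   = c("A", "B", "C", "D"),
  fatalities = c(0, 1, 3, 0)
)

# Keep only events with at least one fatality
toy_fatal <- filter(toy, fatalities >= 1)
print(toy_fatal)
```

On the real data the equivalent call would be filter(armed_conflicts, fatalities >= 1).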
After importing the data file into R, it is important for us to examine if the data file has been imported correctly.
The code chunk below uses list() to do the job. Here, we can see a sample of the dataset to get an initial idea of the different variables and factors of armed conflicts.
list(armed_filter0)
## [[1]]
## # A tibble: 7,195 x 31
## data_id iso event_id_cnty event_id_no_cnty event_date year time_precision
## <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
## 1 6771745 50 BGD17277 17277 01 Februa~ 2020 1
## 2 6771497 356 IND70622 70622 01 Februa~ 2020 2
## 3 6863760 356 IND71150 71150 01 Februa~ 2020 1
## 4 6771682 50 BGD17274 17274 01 Februa~ 2020 1
## 5 6771500 356 IND70652 70652 31 Januar~ 2020 1
## 6 6771522 356 IND70827 70827 31 Januar~ 2020 1
## 7 6772068 356 IND70611 70611 31 Januar~ 2020 1
## 8 6771909 586 PAK55748 55748 31 Januar~ 2020 1
## 9 6771864 586 PAK55824 55824 30 Januar~ 2020 1
## 10 6771796 356 IND70582 70582 29 Januar~ 2020 1
## # ... with 7,185 more rows, and 24 more variables: event_type <chr>,
## # sub_event_type <chr>, actor1 <chr>, assoc_actor_1 <chr>, inter1 <dbl>,
## # actor2 <chr>, assoc_actor_2 <chr>, inter2 <dbl>, interaction <dbl>,
## # region <chr>, country <chr>, admin1 <chr>, admin2 <chr>, admin3 <chr>,
## # location <chr>, latitude <dbl>, longitude <dbl>, geo_precision <dbl>,
## # source <chr>, source_scale <chr>, notes <chr>, fatalities <dbl>,
## # timestamp <dbl>, iso3 <chr>
The code chunk below converts the armed_filter0 data frame into a simple feature data frame by using st_as_sf() of the sf package.
armed_sf <- st_as_sf(armed_filter0,
coords = c("longitude", "latitude"),
crs= 4326)
Things to learn from the arguments above:
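The coords argument requires you to provide the column name of the x-coordinates (longitude) first, followed by the column name of the y-coordinates (latitude).
The crs argument requires you to provide the coordinate reference system as an EPSG code. EPSG:4326 is WGS 84, the geographic coordinate system that stores positions as longitude and latitude in degrees; it is not specific to South Asia and it is not a projected system. You can search for a projected coordinate system suited to a particular country by referring to epsg.io.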
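The two arguments can be checked on a tiny example. The sketch below builds an sf object from two made-up points and confirms that EPSG:4326 is a longitude/latitude system. The reprojection to EPSG:3857 is chosen only to show what a projected system looks like; it is not part of the original workflow.

```r
library(sf)

# Two hypothetical points, given as longitude (x) and latitude (y)
pts <- data.frame(
  name      = c("a", "b"),
  longitude = c(77.2, 67.0),
  latitude  = c(28.6, 24.9)
)

# x column first, then y column; 4326 declares the coordinates as WGS 84
pts_sf <- st_as_sf(pts, coords = c("longitude", "latitude"), crs = 4326)

# Inspect the coordinate reference system
print(st_crs(pts_sf)$epsg)
print(st_is_longlat(pts_sf))

# Reproject to a projected CRS (EPSG:3857, an assumption for illustration)
pts_proj <- st_transform(pts_sf, 3857)
print(st_is_longlat(pts_proj))
```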
You can display the basic information of the newly created armed_sf by using the code chunk below.
list(armed_sf)
## [[1]]
## Simple feature collection with 7195 features and 29 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 61.7461 ymin: 5.9831 xmax: 96.1788 ymax: 35.9186
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 7,195 x 30
## data_id iso event_id_cnty event_id_no_cnty event_date year time_precision
## <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
## 1 6771745 50 BGD17277 17277 01 Februa~ 2020 1
## 2 6771497 356 IND70622 70622 01 Februa~ 2020 2
## 3 6863760 356 IND71150 71150 01 Februa~ 2020 1
## 4 6771682 50 BGD17274 17274 01 Februa~ 2020 1
## 5 6771500 356 IND70652 70652 31 Januar~ 2020 1
## 6 6771522 356 IND70827 70827 31 Januar~ 2020 1
## 7 6772068 356 IND70611 70611 31 Januar~ 2020 1
## 8 6771909 586 PAK55748 55748 31 Januar~ 2020 1
## 9 6771864 586 PAK55824 55824 30 Januar~ 2020 1
## 10 6771796 356 IND70582 70582 29 Januar~ 2020 1
## # ... with 7,185 more rows, and 23 more variables: event_type <chr>,
## # sub_event_type <chr>, actor1 <chr>, assoc_actor_1 <chr>, inter1 <dbl>,
## # actor2 <chr>, assoc_actor_2 <chr>, inter2 <dbl>, interaction <dbl>,
## # region <chr>, country <chr>, admin1 <chr>, admin2 <chr>, admin3 <chr>,
## # location <chr>, geo_precision <dbl>, source <chr>, source_scale <chr>,
## # notes <chr>, fatalities <dbl>, timestamp <dbl>, iso3 <chr>, geometry <POINT
## # [°]>
To create a proportional symbol map in R, the plot mode of tmap will be used.
The code chunk below will turn on the plotting mode of tmap.
tmap_mode("plot")
## tmap mode set to plotting
The code chunk below plots the conflict locations as fixed-size symbols over the base map, as a first step towards a proportional symbol map.
tm_shape(map)+tm_polygons()+
tm_shape(armed_sf)+
tm_bubbles(col = "red",
size = 0.1,
border.col = "black",
border.lwd = 1)
To draw a proportional symbol map, we need to assign a numerical variable to the size visual attribute. The code chunks below show that the variable fatalities is assigned to size visual attribute. The code chunks below will create a map with the size of the symbols given by the number of fatalities caused by these armed conflicts.
tm_shape(map)+tm_polygons()+
tm_shape(armed_sf)+
tm_bubbles(col = "red",
size = "fatalities",
border.col = "black",
border.lwd = 1)
The proportional symbol map can be further improved by using the colour visual attribute. In the code chunks below, event_type variable is used as the colour attribute variable. The code chunk below will create a map showing the number of fatalities caused by different types of events.
tm_shape(map)+tm_polygons()+
tm_shape(armed_sf)+
tm_bubbles(col = "event_type",
size = "fatalities",
border.col = "black",
border.lwd = 1)
Now, we will create faceted maps, with one map for each type of event. This map will contain 6 facets for the 6 event types.
tmap_options(limits = c(facets.view = 6))
tm_shape(map)+tm_polygons()+
tm_shape(armed_sf) +
tm_bubbles(col = "event_type",
size = "fatalities",
border.col = "black",
border.lwd = 1) +
tm_facets(by= "event_type",
sync = TRUE)
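The facet limit of 6 used above assumes that the data contains exactly six event types. A quick way to check this, rather than hard-coding the number, is to count the distinct values first. The sketch below uses a made-up vector of event types; on the real data the same idea would apply to armed_sf$event_type.

```r
# Toy vector standing in for armed_sf$event_type (values are invented)
event_type <- c("Battles", "Riots", "Protests", "Battles", "Riots")

# Number of facets that tm_facets(by = "event_type") would need
n_facets <- length(unique(event_type))
print(n_facets)

# On the real data, the limit could then be set from the count, e.g.
# tmap_options(limits = c(facets.view = n_facets))
```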
The new magick package is an ambitious effort to modernize and simplify high-quality image processing in R. It wraps the ImageMagick STL which is perhaps the most comprehensive open-source image processing library available today.
The ImageMagick library has an overwhelming amount of functionality. The current version of Magick exposes a decent chunk of it, but being a first release, documentation is still sparse. This post briefly introduces the most important concepts to get started.
First, we will install the ImageMagick package using the code chunk given below.
#install.packages("installr")
library(installr)
## Warning: package 'installr' was built under R version 3.6.3
##
## Welcome to installr version 0.22.0
##
## More information is available on the installr project website:
## https://github.com/talgalili/installr/
##
## Contact: <tal.galili@gmail.com>
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/installr/issues
##
## To suppress this message use:
## suppressPackageStartupMessages(library(installr))
#install.imagemagick("https://www.imagemagick.org/script/download.php")
Second, we will set the environment to have the ImageMagick package in our system PATH using the code chunk below.
Sys.setenv(PATH = paste("C:/Program Files/ImageMagick/bin",
Sys.getenv("PATH"), sep = ";"))
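Before creating the animation, it can help to confirm that R can actually find ImageMagick after the PATH change. The sketch below is one possible check in base R. It assumes the same install directory as the chunk above, which will differ between machines, and it looks for the magick executable that ImageMagick 7 provides.

```r
# Install directory taken from the chunk above; adjust for your machine
magick_dir <- "C:/Program Files/ImageMagick/bin"

# .Platform$path.sep is ";" on Windows and ":" on Unix-like systems
new_path <- paste(magick_dir, Sys.getenv("PATH"), sep = .Platform$path.sep)
Sys.setenv(PATH = new_path)

# Sys.which() returns "" when the executable is not found on the PATH
# (older ImageMagick 6 installs expose `convert` instead of `magick`)
has_magick <- nzchar(Sys.which("magick"))
print(has_magick)
```

If has_magick is FALSE, the directory is probably wrong or ImageMagick is not installed.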
Third, we will create a gif file for this animated map. This is then used to create an animated proportional symbol map using the code chunk below.
t<-tm_shape(map)+tm_polygons()+tm_shape(armed_sf)+
tm_bubbles(col = "event_type",
size = "fatalities",
alpha=0.7,
border.col = "black",
border.lwd = 1)+
tm_facets(along = "year", free.coords = FALSE)+
tm_layout(main.title.size = 1)
tmap_animation(t, filename = "t.gif", delay = 200, restart.delay = 200, height=9,width=12,dpi=72)
## Animation saved to C:\Users\Jaideep Ballani\Desktop\Visual Analytics\Lesson 9\Data Viz 9\t.gif
Finally, we will use the view mode of tmap to create an interactive faceted map with synchronisation among all 6 facets. This will be done using the code chunk below.
tmap_mode("view")
## tmap mode set to interactive viewing
tmap_options(limits = c(facets.view = 6))
tm_shape(armed_sf)+
tm_bubbles(col = "event_type",
size = 1,
alpha=0.7,
border.col = "black",
border.lwd = 1)+
tm_facets(by="event_type",
sync=TRUE)
As we can see in the proportional symbol map, the country with conflicts causing the highest number of fatalities is Pakistan. There have been armed conflicts near and on the Pakistan-Afghanistan border as well. This highlights the recent dire state of relations between these two countries, both marred by internal turmoil. Additionally, there have been a few conflicts with a high number of fatalities on the India-Pakistan border. This reinforces the picture of unrest between these two nations over the past 4 years.
As we can see in the proportional symbol map showing the different types of events, Pakistan has had the largest number of explosions and/or remote violence events causing a large number of fatalities. This can be observed in Sri Lanka as well. This points to the ongoing prominence of terrorist groups such as Lashkar-e-Taiba, Al-Qaeda, the LTTE and the Taliban, among others.
As we can see in our faceted map, India has the largest number of riots and incidents of violence against civilians. Some riots lead to violence against civilians, so the two categories may overlap. Nonetheless, with the BJP in power since 2014, we can observe a growing right-wing nationalist movement that threatens secularism, a value at the core of the Indian Constitution.
As we can see in our animated proportional symbol map, the number of armed conflicts and the resulting fatalities have been increasing since 2016. This is a worrying sign for the Southern Asian nations: Asia is the most populous continent in the world, and these conflicts will have a devastating effect on people's lives, economies and development.
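A trend read off an animation can be cross-checked with a simple yearly summary. The sketch below shows the idea on a small made-up tibble; the numbers are invented and do not reflect the real data. Note that the file name indicates the data runs from 2016-02-01 to 2020-02-01, so 2016 and 2020 would be partial years in such a table.

```r
library(dplyr)

# Toy data standing in for the filtered conflict events (values are invented)
toy <- tibble(
  year       = c(2016, 2016, 2017, 2017, 2017),
  fatalities = c(2, 1, 5, 1, 4)
)

# Number of events and total fatalities per year
yearly <- toy %>%
  group_by(year) %>%
  summarise(events = n(), fatalities = sum(fatalities))
print(yearly)

# On the real data the geometry column would be dropped first, e.g.
# armed_sf %>% sf::st_drop_geometry() %>% group_by(year) %>% ...
```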
The great benefit of animation is that it allows for the expansion of the number of variables you can visualize. The motion of an animated plot is ‘driven’ by a variable in the data set. It is invaluable when visualising changes over time, which is tedious with static visuals.
With animated representations, users more often correctly identify whether a particular type of pattern is present than they do with static representations. We are also able to reach a conclusion about what we saw more quickly when viewing an animated representation.
Data Story – How complex is the story? Isolating on a specific data story is required for static visualizations. With an abundance of data, having just one static representation may not be enough to really visualize the whole story. An interactive visualization provides the solution for larger data sets with complex stories and can provide relevant information for extended periods of time.
Multiple questions: Interactivity allows us to pose multiple questions per visualization, for example by switching axes or adding confounding factors. This could be very useful for survey data, where viewers can break down responses to a specific question by gender, age or perhaps occupation.
Focus on detail: Interactivity allows users to zoom into a visualization by selecting an area of interest and enlarging that area of the chart. Hover information is also incredibly useful: if you’ve built a choropleth that visualizes the relative populations of different countries, allowing users to hover over a country to get the exact population value is much more useful than simply providing a color gradient legend.
User experience: Interactivity is particularly powerful when users can select points or series in a chart and have a summary of the relevant data appear. This allows us to build clean-looking visualizations that are actually very rich, and provides a tool for viewers to explore the data as little or as much as they’re interested.