The report below details the preliminary exploratory analysis for a project undertaken for the course ISSS608 Visual Analytics, offered in SMU MITB. As one of the project deliverables, a Shiny application was built to allow users to explore the dataset and perform spatial point pattern analysis on it at a deeper level. Please click on the link here to explore the Shiny application!
In this age of growing socio-political and cultural dissimilarities, the occurrence of armed conflicts has risen. Non-profit organisations like ACLED have been collating data on such conflicts in tabular form, analysing it and mapping these crises. It is of research interest to study conflict environments and how the nature of conflicts and their intensity vary across space and time.
In this dataviz exercise, the analysis centres on Pakistan. The focus is to visualise how different types of armed conflict are distributed across Pakistan, and simultaneously study how the intensity (frequency of occurrence) of each conflict type varies over time. It aims to answer the following questions:
- Are regions divided into sub-regions where one type of event dominates?
- Do these spatial patterns persist over time?
The types of data visualisation used, and the corresponding rationales, are listed below:
Point Symbol Map: Each point symbol on the map represents one occurrence of armed conflict. When all events are plotted on the map, it becomes visually evident whether the ‘intensity’ (i.e. the average density of points per unit area) varies from location to location, and how the events are spatially segregated. One advantage is that information about each point event can be supplemented through labels or tooltips, which cannot be achieved with a 2D density plot.
Kernel density (raster) map: A kernel density plot is another way to visualise spatial variation in the density of point events. Using the spatstat package, the kernel density estimate is computed with a Gaussian smoothing function, so intensity values across the study area are smoothed into a continuous surface. It can therefore be visualised like a heat map and is useful for showing which sub-regions are hot spots for armed conflicts. Another benefit of density plots is that variation in density is visually clearer than with points (as it is backed by a quantitative value), especially where there are so many point events in a region that the map becomes too cluttered for effective visualisation.
Stacked Area Charts: Stacked area charts are useful for showing how the frequency, and simultaneously the proportion, of each conflict type, aggregated across the country, evolves over time. Dygraphs are used here because of their highly interactive features, including zoom/pan and series/point highlighting.
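To make the Gaussian smoothing behind the kernel density map more concrete, the idea can be sketched in base R. This is only an illustration under stated assumptions: the coordinates are made-up points on a unit square, `kernel_intensity()` is a hypothetical helper written for this sketch, and the bandwidth `sigma` is chosen arbitrarily. It is not spatstat's implementation, which also takes the observation window and edge effects into account.

```r
# Toy event coordinates on a unit square (made-up data, not ACLED records):
# a tight cluster around (0.3, 0.7) plus some uniformly scattered points
set.seed(1)
x <- c(rnorm(50, mean = 0.3, sd = 0.05), runif(20))
y <- c(rnorm(50, mean = 0.7, sd = 0.05), runif(20))

# Hypothetical helper: Gaussian kernel intensity estimate on a regular grid.
# sigma is the bandwidth (standard deviation of the kernel).
kernel_intensity <- function(x, y, sigma, grid_n = 50) {
  gx <- seq(0, 1, length.out = grid_n)
  gy <- seq(0, 1, length.out = grid_n)
  # Grid-by-point differences passed through the 1D Gaussian density
  kx <- dnorm(outer(gx, x, "-"), sd = sigma)  # grid_n x n
  ky <- dnorm(outer(gy, y, "-"), sd = sigma)  # grid_n x n
  # An isotropic 2D Gaussian factorises into two 1D Gaussians, so summing
  # the kernels over all points is a matrix product:
  # z[i, j] = estimated intensity at (gx[i], gy[j])
  z <- kx %*% t(ky)
  list(x = gx, y = gy, z = z)
}

dens <- kernel_intensity(x, y, sigma = 0.05)

# Locate the grid cell with the highest estimated intensity (the "hot spot")
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)
c(x = dens$x[peak[1, 1]], y = dens$y[peak[1, 2]])
```

Calling `image(dens)` on the result draws the surface as a heat map. A larger `sigma` gives a smoother surface with fewer, broader hot spots, while a smaller one follows individual points more closely.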
The following interactive features are proposed for the data visualisations to enhance usability and hence the user experience:
- View manipulation: Allows users to filter the data by conflict type and year, for a high-level or isolated view, to better examine distribution and patterns across time and space. This feature will be incorporated in the time-series analysis using point symbol maps.
- Use of tooltips: Supplements information about a point event on the point symbol maps when the user hovers over or clicks on the datapoint. This feature is also incorporated in the stacked area chart (dygraph), where frequency values are displayed on the graph upon series/point highlighting.
- Pan and zoom: Allows users to navigate through large information spaces on maps (made interactive with leaflet). Through zooming and panning, users can view the overall distribution across the country, or zoom in on a particular state. Zooming and panning are also incorporated into the dygraph, where users can zoom in on a shorter range of dates using the range selector at the bottom of the chart.
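The stacked area chart and its range selector can be sketched as follows. This is a minimal illustration, not the report's actual chart code: the event log is made up, and only the column names `monyear` and `event_type` mirror the fields used in this analysis. The counting step uses base R; the dygraph itself is built only if the dygraphs and xts packages are available.

```r
# Toy event log (made-up values; only the column names mirror the ACLED fields)
events <- data.frame(
  monyear    = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01",
                         "2019-02-01", "2019-02-01", "2019-03-01")),
  event_type = c("Protests", "Protests", "Battles",
                 "Protests", "Riots", "Battles")
)

# Cross-tabulate: one row per month, one column per event type, cells are counts
counts <- unclass(table(events$monyear, events$event_type))
counts

# Optional: build the interactive stacked chart when the packages are installed
if (requireNamespace("dygraphs", quietly = TRUE) &&
    requireNamespace("xts", quietly = TRUE)) {
  series <- xts::xts(counts, order.by = as.Date(rownames(counts)))
  chart <- dygraphs::dygraph(series, main = "Events per month (toy data)")
  chart <- dygraphs::dyOptions(chart, stackedGraph = TRUE)  # stack the series
  chart <- dygraphs::dyRangeSelector(chart)                 # date slider below the chart
}
```

The wide month-by-type layout is the key design choice: each column becomes one stacked series, so the total height of the chart at any month is the overall event count.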
A number of packages are required for this makeover exercise. The code chunk below installs any required packages that are missing and loads them into the R session.
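# Packages required for this exercise
# Note: rgdal and maptools have since been retired from CRAN, so
# install.packages() may fail for them on a fresh setup
packages <- c('tidyverse', 'sf', 'RColorBrewer', 'dygraphs', 'viridis', 'ggpubr',
              'GADMTools', 'tmap', 'here', 'rnaturalearthdata', 'gganimate',
              'lubridate', 'plotly', 'ggmap', 'streamgraph', 'crosstalk',
              'htmltools', 'leaflet', 'wordcloud2', 'tm', 'xts', 'raster',
              'maptools', 'rgdal', 'spatstat', 'sp', 'rpanel', 'tkrplot')
# Install each package if it is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}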
The dataset used in this exercise is sourced from the Armed Conflict Location and Event Data Project (ACLED), a project that collects data, provides analysis and performs crisis mapping on reported political violence and protest activities around the world. In this data visualisation exercise, the focus is on conflicts in Pakistan occurring between 2016 and 2019. The data was exported from https://acleddata.com/.
Data import was accomplished using read_csv() from the readr package, which reads delimited files into a tibble.
# Reading the csv file as a tbl_df
ACLED_SA <- read_csv("Data/2016-01-01-2019-12-31-Southern_Asia.csv")
# Inspecting the structure of the dataset
str(ACLED_SA)
## tibble [100,995 x 31] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ data_id : num [1:100995] 6715395 6714375 6714634 6714640 6714644 ...
## $ iso : num [1:100995] 356 356 356 356 356 356 356 356 356 356 ...
## $ event_id_cnty : chr [1:100995] "IND68948" "IND68961" "IND68962" "IND68960" ...
## $ event_id_no_cnty: num [1:100995] 68948 68961 68962 68960 68958 ...
## $ event_date : chr [1:100995] "31 December 2019" "31 December 2019" "31 December 2019" "31 December 2019" ...
## $ year : num [1:100995] 2019 2019 2019 2019 2019 ...
## $ time_precision : num [1:100995] 1 1 1 1 1 1 1 1 1 1 ...
## $ event_type : chr [1:100995] "Protests" "Protests" "Protests" "Protests" ...
## $ sub_event_type : chr [1:100995] "Peaceful protest" "Peaceful protest" "Peaceful protest" "Peaceful protest" ...
## $ actor1 : chr [1:100995] "Protesters (India)" "Protesters (India)" "Protesters (India)" "Protesters (India)" ...
## $ assoc_actor_1 : chr [1:100995] "Students (India); NSUI: National Students Union of India" "Labour Group (India)" "Sikh Group (India)" "Farmers (India)" ...
## $ inter1 : num [1:100995] 6 6 6 6 6 6 6 6 6 4 ...
## $ actor2 : chr [1:100995] NA NA NA NA ...
## $ assoc_actor_2 : chr [1:100995] NA NA NA NA ...
## $ inter2 : num [1:100995] 0 0 0 0 0 0 0 0 0 7 ...
## $ interaction : num [1:100995] 60 60 60 60 60 60 60 60 60 47 ...
## $ region : chr [1:100995] "Southern Asia" "Southern Asia" "Southern Asia" "Southern Asia" ...
## $ country : chr [1:100995] "India" "India" "India" "India" ...
## $ admin1 : chr [1:100995] "Odisha" "Uttar Pradesh" "Punjab" "Punjab" ...
## $ admin2 : chr [1:100995] "Khordha" "Mahoba" "Jalandhar" "Rupnagar" ...
## $ admin3 : chr [1:100995] "Bhubaneswar" "Mahoba" "Jalandhar" "Chamkaur Sahib" ...
## $ location : chr [1:100995] "Bhubaneswar" "Mahoba" "Jalandhar" "Morinda" ...
## $ latitude : num [1:100995] 20.3 25.3 31.3 30.8 31.3 ...
## $ longitude : num [1:100995] 85.8 79.9 75.6 76.5 75.6 ...
## $ geo_precision : num [1:100995] 1 1 1 1 1 1 1 2 1 1 ...
## $ source : chr [1:100995] "Pioneer (India)" "Amar Ujala" "Chandigarh Tribune" "Chandigarh Tribune" ...
## $ source_scale : chr [1:100995] "National" "Subnational" "Subnational" "Subnational" ...
## $ notes : chr [1:100995] "On 31 December, students' wing of INC and Milita Odisha Nishanibaran Abhiyan (MONA) held a protest in Bhubanesw"| __truncated__ "On Dec 31, protests were held by auto drivers in Mahoba (UP) over the beating of the auto union president durin"| __truncated__ "On Dec 31, protests were held by the Sikh community in front of the District Administrative Complex, Jalandhar "| __truncated__ "On Dec 31, protests were held by farmers outside the SDM's office in Morinda (Punjab) over the government's ina"| __truncated__ ...
## $ fatalities : num [1:100995] 0 0 0 0 0 0 0 0 0 1 ...
## $ timestamp : num [1:100995] 1.58e+09 1.58e+09 1.58e+09 1.58e+09 1.58e+09 ...
## $ iso3 : chr [1:100995] "IND" "IND" "IND" "IND" ...
## - attr(*, "spec")=
## .. cols(
## .. data_id = col_double(),
## .. iso = col_double(),
## .. event_id_cnty = col_character(),
## .. event_id_no_cnty = col_double(),
## .. event_date = col_character(),
## .. year = col_double(),
## .. time_precision = col_double(),
## .. event_type = col_character(),
## .. sub_event_type = col_character(),
## .. actor1 = col_character(),
## .. assoc_actor_1 = col_character(),
## .. inter1 = col_double(),
## .. actor2 = col_character(),
## .. assoc_actor_2 = col_character(),
## .. inter2 = col_double(),
## .. interaction = col_double(),
## .. region = col_character(),
## .. country = col_character(),
## .. admin1 = col_character(),
## .. admin2 = col_character(),
## .. admin3 = col_character(),
## .. location = col_character(),
## .. latitude = col_double(),
## .. longitude = col_double(),
## .. geo_precision = col_double(),
## .. source = col_character(),
## .. source_scale = col_character(),
## .. notes = col_character(),
## .. fatalities = col_double(),
## .. timestamp = col_double(),
## .. iso3 = col_character()
## .. )
As the dataframe contains ACLED data for 5 other Southern Asian countries that will not be included in this analysis, filter() is used to subset the data associated with Pakistan. During data inspection, it was also observed that the variable event_date is stored as a string. Therefore parse_date() along with mutate() is used to convert it to a date. Two further columns are created: month, and monyear (the first day of the month in which the event occurred), for subsequent time-series analysis aggregated by month and year.
PAK_df <- ACLED_SA %>%
  filter(country == "Pakistan") %>%
  mutate(event_date = parse_date(event_date, "%d %B %Y")) %>%
  mutate(year = as.factor(year)) %>%
  mutate(month = month(event_date)) %>%
  mutate(monyear = as.Date(paste0(year, "-", month, "-01"), "%Y-%m-%d"))
head(PAK_df)
## # A tibble: 6 x 33
## data_id iso event_id_cnty event_id_no_cnty event_date year time_precision
## <dbl> <dbl> <chr> <dbl> <date> <fct> <dbl>
## 1 6714819 586 PAK55342 55342 2019-12-31 2019 1
## 2 6715352 586 PAK55350 55350 2019-12-31 2019 1
## 3 6715353 586 PAK55346 55346 2019-12-31 2019 1
## 4 6715356 586 PAK55352 55352 2019-12-31 2019 1
## 5 6715361 586 PAK55344 55344 2019-12-31 2019 1
## 6 6715367 586 PAK55349 55349 2019-12-31 2019 1
## # ... with 26 more variables: event_type <chr>, sub_event_type <chr>,
## # actor1 <chr>, assoc_actor_1 <chr>, inter1 <dbl>, actor2 <chr>,
## # assoc_actor_2 <chr>, inter2 <dbl>, interaction <dbl>, region <chr>,
## # country <chr>, admin1 <chr>, admin2 <chr>, admin3 <chr>, location <chr>,
## # latitude <dbl>, longitude <dbl>, geo_precision <dbl>, source <chr>,
## # source_scale <chr>, notes <chr>, fatalities <dbl>, timestamp <dbl>,
## # iso3 <chr>, month <dbl>, monyear <date>
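To see what the format string "%d %B %Y" does, the conversion can be tried on a few hand-written dates. This sketch uses base R's as.Date() rather than readr's parse_date(), with the same format string, and the input strings are made up for illustration. Month names are locale-dependent, so the sketch assumes English month names and forces the C locale for date handling.

```r
# Month names are locale-dependent; the C locale uses English names
Sys.setlocale("LC_TIME", "C")

# Made-up date strings in the same "day month-name year" layout as event_date
raw <- c("31 December 2019", "05 January 2016", "17 March 2018")

# %d = day of month, %B = full month name, %Y = four-digit year
event_date <- as.Date(raw, format = "%d %B %Y")

# Truncate each date to the first day of its month for monthly aggregation
monyear <- as.Date(format(event_date, "%Y-%m-01"))

data.frame(raw, event_date, monyear)
```

Any string that does not match the format becomes NA, so checking `anyNA(event_date)` after the conversion is a quick way to catch malformed dates.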
An sf dataframe was created from the aspatial dataframe using st_as_sf(). The Coordinate Reference System (CRS) is initially specified as EPSG:4326, to match the CRS of the GADM data. st_transform() is then used to change the CRS to EPSG:24313 (Kalianpur 1962). This additional step is necessary so that the event points and the boundary polygons share the same CRS and their bounding boxes align.
More information on Kalianpur 1962 can be found at https://epsg.io/24313.
PAK_sf <- st_as_sf(PAK_df,
                   coords = c("longitude", "latitude"),
                   crs = 4326)
PAK_sf <- st_transform(PAK_sf, 24313)
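The effect of the two-step conversion can be checked on a small toy table. This is a sketch under stated assumptions: the three rows are approximate city coordinates typed in for illustration (not ACLED records), and the object names `toy`, `pts_wgs84` and `pts_proj` are hypothetical.

```r
library(sf)

# Toy table with approximate city coordinates (illustration only)
toy <- data.frame(
  place     = c("Islamabad", "Lahore", "Karachi"),
  longitude = c(73.05, 74.34, 67.01),
  latitude  = c(33.68, 31.55, 24.86)
)

# Step 1: declare the coordinates as longitude/latitude on WGS84 (EPSG:4326)
pts_wgs84 <- st_as_sf(toy, coords = c("longitude", "latitude"), crs = 4326)

# Step 2: reproject to EPSG:24313, the CRS used for the maps in this report
pts_proj <- st_transform(pts_wgs84, 24313)

st_is_longlat(pts_wgs84)  # TRUE: coordinates are in degrees
st_is_longlat(pts_proj)   # FALSE: coordinates are now projected
st_bbox(pts_proj)         # bounding box in the projected CRS
```

Comparing `st_crs()` of the point layer and the boundary layer before plotting is a simple safeguard against layers that silently fail to overlap.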
GeoPackage (with file extension *.gpkg) is a geospatial vector data format that stores geometric locations and associated attribute information. The required geopackage was downloaded from the Database of Global Administrative Areas (GADM) at https://gadm.org/data.html.
The st_read() function of the sf package is used to read the geospatial data in as an sf (Simple Features) object. sf is preferred over its predecessor sp because geospatial and attribute data are stored together in a spatial dataframe, which can be manipulated like a tibble using standard functions from the tidyverse packages.
# Path to the GADM geopackage
gpkg_path <- here::here("Data", "geopackage", "gadm36_PAK.gpkg")
# Check available layers in the geopackage
st_layers(gpkg_path)
# Read in the gadm36_PAK_1 layer
PAK_sh1 <- st_read(dsn = gpkg_path, layer = "gadm36_PAK_1")
# Transform the CRS (the source uses the WGS84 datum)
PAK_sh1 <- st_transform(PAK_sh1, 24313)
## Driver: GPKG
## Available layers:
## layer_name geometry_type features fields
## 1 gadm36_PAK_3 Multi Polygon 141 16
## 2 gadm36_PAK_2 Multi Polygon 32 13
## 3 gadm36_PAK_1 Multi Polygon 8 10
## 4 gadm36_PAK_0 Multi Polygon 1 2
## Reading layer `gadm36_PAK_1' from data source `C:\Users\denis\OneDrive\Desktop\MITB\Term 2\ISSS608- Visual Analytics\Makeover\09\Makeover_09\Data\geopackage\gadm36_PAK.gpkg' using driver `GPKG'
## Simple feature collection with 8 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 60.89944 ymin: 23.70292 xmax: 77.84308 ymax: 37.09701
## geographic CRS: WGS 84
To improve the user's experience, information about each point event is included using the id and popup.vars arguments of tm_dots(). id sets the data variable whose value is displayed when the user hovers over a datapoint. popup.vars lists the data variables displayed when the user clicks on a datapoint.
PAK_tm <- tm_shape(PAK_sh1) +
  tm_text("NAME_1") +
  tm_fill() +
  tm_borders("black", lwd = 1) +
  tm_shape(PAK_sf) +
  tm_dots(col = "event_type", palette = "Spectral", alpha = 0.5,
          id = "data_id",
          popup.vars = c("Country:" = "country",
                         "State/Province:" = "admin1",
                         "Event Type" = "event_type",
                         "Sub-Event Type" = "sub_event_type",
                         "Primary actor" = "actor1"))
tmap_leaflet(PAK_tm)