1. Overview

Suicides are often not reported in the news, which leaves people unaware of how many suicides occur in their country or around the world. Using data published by the World Health Organization, this visualization aims to reveal patterns in the suicide rate over time and by GDP per capita, demographic profile, and continent, including a map view.

2. Proposed Design

Four plots are planned to reveal suicide trends using the available variables.

  • Overall Suicide Trend and Relationship to GDP per capita
    • A line chart against Year and a smoothed plot against GDP_Per_Capita to reveal the trend and the relationship
  • Suicide rate by Demographic Profile
    • Two demographic profiles, Gender and Age, plotted using bar charts
  • Suicide rate by Continent and Year
    • A bar chart and a line chart used to identify the suicide rate in each continent
  • Visualization of suicide rate through map
    • Maps as an alternative visualization of the suicide rate by country, continent and gender

2.1 Design Sketch

The design was sketched based on the ideas above:

3. Data Wrangling

3.1 Install R Packages

To create the proposed plots, the following six packages and a shapefile are required:

  • tidyverse: Package for data manipulation and exploration
  • ggplot2: Create Elegant Data Visualization using the grammar of graphics
  • plotly: Create interactive web graphics
  • reshape2: Flexibly Reshape Data
  • sf: Simple Features for R
  • tmap: Thematic Maps

# Install any missing packages, then load them all
packages = c('sf', 'tmap', 'tidyverse','plotly','ggplot2','reshape2')
for (p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p,character.only = TRUE)
}

3.2 Data Wrangling

Load the data using the read_csv function. Preparation and basic cleaning of the data were done in Excel, so only basic data wrangling is required in R. Use the glimpse and summary functions to view the data.

#Reading and viewing the data into R environment 
suicide_data<-read_csv('Suicide.csv')
glimpse(suicide_data)
## Rows: 10,976
## Columns: 9
## $ Code                <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"...
## $ Continent           <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia",...
## $ Country             <chr> "Afghanistan", "Afghanistan", "Afghanistan", "A...
## $ Gender              <chr> "Female", "Male", "Female", "Male", "Female", "...
## $ Year                <dbl> 1990, 1990, 1991, 1991, 1992, 1992, 1993, 1993,...
## $ Deaths_Per_100K_Pop <dbl> 10.31850, 10.31850, 10.32701, 10.32701, 10.2714...
## $ Deaths_percent      <dbl> 0.3773036, 0.3773036, 0.3809912, 0.3809912, 0.4...
## $ Gender_Death        <dbl> 5.46, 14.75, 5.43, 14.81, 5.38, 14.80, 5.41, 15...
## $ GDP_Per_Capita      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
summary(suicide_data)
##      Code            Continent           Country             Gender         
##  Length:10976       Length:10976       Length:10976       Length:10976      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##       Year      Deaths_Per_100K_Pop Deaths_percent     Gender_Death   
##  Min.   :1990   Min.   : 1.527      Min.   : 0.1094   Min.   :  0.61  
##  1st Qu.:1997   1st Qu.: 6.488      1st Qu.: 0.7136   1st Qu.:  4.77  
##  Median :2004   Median :10.218      Median : 1.1575   Median :  8.20  
##  Mean   :2004   Mean   :11.954      Mean   : 1.4229   Mean   : 12.20  
##  3rd Qu.:2010   3rd Qu.:14.807      3rd Qu.: 1.8183   3rd Qu.: 16.05  
##  Max.   :2017   Max.   :98.832      Max.   :13.8871   Max.   :145.46  
##                                                       NA's   :112     
##  GDP_Per_Capita    
##  Min.   :   285.6  
##  1st Qu.:  2479.3  
##  Median :  6960.9  
##  Mean   : 13483.5  
##  3rd Qu.: 17741.8  
##  Max.   :141635.0  
##  NA's   :1054

The output shows that the data has 10,976 rows and 9 columns. The 9 columns (variables) are: Code, Continent, Country, Gender, Year, Deaths_Per_100K_Pop, Deaths_percent, Gender_Death and GDP_Per_Capita.

Gender_Death is the number of deaths per 100K population for the given gender, and GDP_Per_Capita is gross domestic product (GDP) per capita expressed in current international dollars, converted using the purchasing power parity (PPP) conversion factor.

There are missing values for GDP_Per_Capita and Gender_Death, and the Deaths_Per_100K_Pop values are duplicated because each country and year appears once per gender. Therefore, the dcast function is used to convert the data from long format to wide format, with one row per country and year.

#convert to wide format and verify the data
suicide<-dcast(suicide_data,Continent+Country+Code+Year+Deaths_Per_100K_Pop+Deaths_percent+GDP_Per_Capita~Gender,value.var='Gender_Death')
glimpse(suicide)
## Rows: 5,488
## Columns: 9
## $ Continent           <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia",...
## $ Country             <chr> "Afghanistan", "Afghanistan", "Afghanistan", "A...
## $ Code                <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"...
## $ Year                <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,...
## $ Deaths_Per_100K_Pop <dbl> 10.318504, 10.327010, 10.271411, 10.376123, 10....
## $ Deaths_percent      <dbl> 0.3773036, 0.3809912, 0.4099087, 0.4118492, 0.3...
## $ GDP_Per_Capita      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ Female              <dbl> 5.46, 5.43, 5.38, 5.41, 5.50, 5.53, 5.52, 5.53,...
## $ Male                <dbl> 14.75, 14.81, 14.80, 15.02, 15.34, 15.54, 15.65...
summary(suicide)
##   Continent           Country              Code                Year     
##  Length:5488        Length:5488        Length:5488        Min.   :1990  
##  Class :character   Class :character   Class :character   1st Qu.:1997  
##  Mode  :character   Mode  :character   Mode  :character   Median :2004  
##                                                           Mean   :2004  
##                                                           3rd Qu.:2010  
##                                                           Max.   :2017  
##                                                                         
##  Deaths_Per_100K_Pop Deaths_percent    GDP_Per_Capita         Female      
##  Min.   : 1.527      Min.   : 0.1094   Min.   :   285.6   Min.   : 0.610  
##  1st Qu.: 6.488      1st Qu.: 0.7136   1st Qu.:  2479.3   1st Qu.: 3.000  
##  Median :10.218      Median : 1.1575   Median :  6960.9   Median : 5.070  
##  Mean   :11.954      Mean   : 1.4229   Mean   : 13483.5   Mean   : 5.776  
##  3rd Qu.:14.807      3rd Qu.: 1.8183   3rd Qu.: 17741.8   3rd Qu.: 7.290  
##  Max.   :98.832      Max.   :13.8871   Max.   :141635.0   Max.   :48.710  
##                                        NA's   :527        NA's   :56      
##       Male       
##  Min.   :  2.52  
##  1st Qu.:  9.62  
##  Median : 15.54  
##  Mean   : 18.62  
##  3rd Qu.: 22.73  
##  Max.   :145.46  
##  NA's   :56
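
The long-to-wide step can be illustrated on a tiny made-up data set. This is only a sketch: the country names and values below are invented for illustration, and only the column names mirror the real data.

```r
library(reshape2)

# Toy long-format data: one row per country, year and gender (values are made up)
toy_long <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(1990, 1990, 1990, 1990),
  Deaths_Per_100K_Pop = c(10, 10, 8, 8),   # repeated on both gender rows
  Gender = c("Female", "Male", "Female", "Male"),
  Gender_Death = c(5, 15, 4, 12)
)

# Columns left of ~ identify a row; Gender values become new columns
toy_wide <- dcast(toy_long,
                  Country + Year + Deaths_Per_100K_Pop ~ Gender,
                  value.var = "Gender_Death")
print(toy_wide)
```

Each country and year now appears once, with the gender-specific rates in separate Female and Male columns, which is why the row count of the real data halves from 10,976 to 5,488.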

When viewing the data, it is noted that the Country column includes 'World', which is an aggregate value. A filtered data set is created that excludes 'World' from Country. At this stage, data wrangling is complete.

#filter out 'World' from the data set
suicide_filtered<-suicide%>%
  filter(Country!='World')

4. Proposed Visualization

The plots are created following the proposed sketch.

4.1 Suicide rate trend and relationship with GDP

To visualize the suicide rate trend over the years, ggplot is used with geom_line and geom_point, plotting the suicide rate (Deaths_Per_100K_Pop) against Year. The global average is also calculated and added to the plot. This average is represented by the dashed line and is also used in other plots. The static plots are converted to interactive plots using the ggplotly function, and the two plots are combined using subplot.

Insight from Line Plot:

  • The global average suicide rate is 12.98 (per 100K population)
  • The suicide rate peaked at 16.05 deaths per 100K in 1994
  • The suicide rate then decreased steadily from 16.05 to 9.98 (per 100K population) over the period, falling below the global average

The overall trend shows that the suicide rate has decreased over the years. As countries develop and become more affluent, does the suicide rate decrease? This is examined using geom_smooth to show the relationship between GDP per capita and the suicide rate.

Insight from the Combined Plot:

  • A weak negative relationship between GDP per capita and suicide: as countries become wealthier, the suicide rate drops.

#Calculate the global average suicide rate
global_average <- suicide %>%
  filter(Country=='World') %>%
  summarise(mean=mean(Deaths_Per_100K_Pop))

#creating the line plot 
Line_suicide<-suicide %>%
  filter(Country=='World') %>%
 ggplot(aes(x = Year, y = Deaths_Per_100K_Pop)) + 
  geom_line(col = "deepskyblue3", size = 1) + 
  geom_point(col = "Steelblue", size = 1) +
  geom_hline(yintercept=unlist(global_average$mean),linetype=2,color= "grey35", size = 0.5)+
   labs(title = "1990 to 2017 Global Average Suicides (per 100K Population)",
       x = "Year", 
       y = "Avg Suicides per 100k Pop") + 
  scale_x_continuous(breaks = seq(1990, 2017, 2)) + 
  scale_y_continuous(breaks = seq(10, 20))+
  theme_bw()

Line_Plotly<-ggplotly(Line_suicide)

# Creating scatter plot with GDP Variables 
##data manipulation 
suicide_gdp<-dcast(suicide_data,Continent+Country+Deaths_Per_100K_Pop+GDP_Per_Capita+Year~Gender,value.var='Gender_Death')

#excluding those with missing values
suicide_gdp2<-suicide_gdp %>%
  filter(GDP_Per_Capita>0)%>%
  filter(Country!='World')%>%
  group_by(Country, Continent)%>%
  summarize(GDP_Per_Capita= sum(GDP_Per_Capita),
            Suicides=sum(Deaths_Per_100K_Pop))

# Plot GDP and Suicides rate with geom_smooth to identify the relationship 
GDPvsSuicides<-ggplot(suicide_gdp2,aes(x=GDP_Per_Capita,y=Suicides,col=Continent))+
  geom_point()+
  geom_smooth(method='lm',aes(group=1))+
  theme_classic()+
  labs(title = "Correlation between GDP (per capita) and Suicides per 100k", 
       x = "GDP (per capita)", 
       y = "Suicides per 100k", 
       col = "Continent")

GDP_Plotly<-ggplotly(GDPvsSuicides)

#combine 2 plot together
subplot(Line_Plotly,GDP_Plotly, nrows=2,shareY = TRUE, shareX= FALSE, titleX=TRUE)%>%
  layout(title = 'Suicide Trend and Relation with GDP per Capita',
         legend=list(x=40,y=0.01),
         showlegend=TRUE)
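
The strength of a relationship like this can also be checked numerically rather than read off the smoothed line. The sketch below uses invented values; to apply it to the report's data, the two columns of suicide_gdp2 would be passed instead. It is one possible check and not part of the original analysis.

```r
# Toy values only (made up); replace with suicide_gdp2$GDP_Per_Capita
# and suicide_gdp2$Suicides to check the real data
toy <- data.frame(
  GDP_Per_Capita = c(1000, 5000, 10000, 20000, 40000),
  Suicides = c(14, 9, 15, 8, 11)
)

# Pearson correlation: values near 0 indicate a weak linear relationship
r <- cor(toy$GDP_Per_Capita, toy$Suicides)

# Slope of the same straight line that geom_smooth(method = 'lm') draws
fit <- lm(Suicides ~ GDP_Per_Capita, data = toy)
slope <- unname(coef(fit)["GDP_Per_Capita"])

print(r)
print(slope)
```

The sign of the slope matches the sign of the correlation, and the magnitude of r gives a rough sense of how weak or strong the linear relationship is.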

4.2 Suicide by Demographic Profile

Gender: Before creating the plot of the suicide rate by gender, missing values are filtered out. A bar plot is then used to compare the suicide rate for Female and Male; the suicide rate by gender is shown on a world map in section 4.4.3.

Age: The age profile of suicide is visualized using a bar plot. The age data is stored in a separate file, which is loaded and wrangled before creating the bar plot.

Insight from the Combined Plot:

  • In every continent except Oceania, the suicide rate increases with age
  • In Oceania, the suicide rate is highest in the 15 to 49 age group
  • The male suicide rate is more than twice the female rate

#Bar plot for Gender
Suicide_gender <- suicide_data %>%
  filter(Country!='World')%>%
  filter(Gender_Death>0)
Gender_bar<-ggplot(Suicide_gender,aes(x=Gender,y=Gender_Death,fill=Gender))+
  stat_summary(geom='bar',
               fun='mean')+
  geom_hline(yintercept=unlist(global_average$mean),linetype=2,color= "grey35", size = 0.5)+
  labs(title = "Avg Suicides(per 100K Population), by Gender",
       x = "Gender", 
       y = "Avg Suicides Per 100k Pop")+
  theme(legend.position = "none") + 
  theme(axis.line=element_line(size=1))+
  scale_fill_brewer(palette = 'Dark2') 

Gender_Plotly<-ggplotly(Gender_bar)%>%
   layout(title = 'Suicides(Per 100K Population), by Gender')

# Bar Plot for Age 
suicide_age<-read_csv('SuicideAge.csv')

suicide_age$Age<-factor(suicide_age$Age,levels=c('5 to 14',
                                                '15 to 49',
                                                '50 to 69',
                                                '70+'))

suicide_age2<-suicide_age%>%
  group_by(Continent,Age)%>%
  summarize(n = n(),
            suicides = mean(Death_Per_100K_Pop))

suicideage_plot<-ggplot(suicide_age2,aes(x = reorder(Continent,suicides), y = suicides, fill = Age)) + 
  geom_bar(stat = "identity", position = "dodge")+
  geom_hline(yintercept=unlist(global_average$mean),linetype=2,color= "grey35", size = 0.5)+
  labs(title = "Avg Suicides(per 100K Population), by Age",
       x = "Continent", 
       y = "Avg Suicides per 100k Pop", 
       fill = "Demographic\n Profile") +
  theme_classic()

Age_Plotly<-ggplotly(suicideage_plot)

subplot(Gender_Plotly,Age_Plotly, nrows=2,shareY = TRUE, shareX= FALSE, titleX=FALSE)%>%
  layout(title = 'Suicide Trend by Demographic Profile',
         legend=list(x=40,y=0.03))

4.3 Suicide by Continent

To visualize the suicide rate by continent, two plots were created: a bar plot to compare the suicide rate in each continent, and a line plot to show the suicide rate by continent over time.

Insight from the Combined Plot:

  • Africa and Europe are the two continents with a suicide rate of more than 1500 (per 100K population)
  • Europe has the largest decrease in suicide rate over the years, a decrease of about 28%
  • South America experienced a slight increase in suicide rate (comparing 1990 and 2017)

#Bar Chart
Suicide_continent2<-suicide_filtered %>%
  group_by(Continent) %>%
  summarize(suicides= sum(Deaths_Per_100K_Pop))

continent_plot <- ggplot(Suicide_continent2, aes(x =reorder(Continent,-suicides), y = suicides)) + 
  stat_summary(geom='bar',
               fun='sum',
               fill='Steelblue')+ 
  labs(title = "Global Suicides (per 100k Population), by Continent",
  x = 'Continent', 
  y = 'Suicides per 100k') +
  theme_classic()
 
Continent_plotly<-ggplotly(continent_plot)
 
#Line Chart 
continent_time <- suicide_filtered %>%
  group_by(Year, Continent) %>%
  summarize(suicide_per_100k = sum(Deaths_Per_100K_Pop))

continent_time_plot <- ggplot(continent_time, aes(x = Year, y = suicide_per_100k, col = Continent)) + 
  geom_line() + 
  geom_point() + 
  labs(title = "Continent Trends Over Time", 
       x = "Year", 
       y = "Suicides per 100k", 
       color = "Continent") + 
  theme_classic()+
  scale_x_continuous(breaks = seq(1990, 2017, 2))

time_continent_plotly<-ggplotly(continent_time_plot)

#combine 2 plot together
subplot(Continent_plotly,time_continent_plotly, nrows=2, shareY = TRUE)%>%
  layout(title = 'Suicide By Continent',
         legend=list(x=40,y=0.01),
         showlegend=TRUE)
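
A percentage change such as the roughly 28% decrease noted for Europe can be computed by comparing the first and last year of each series. The sketch below uses made-up numbers and a hypothetical helper, pct_change; with the report's data, continent_time would take the place of the toy data frame.

```r
# Hypothetical helper: percentage change from a first to a last value
pct_change <- function(first, last) {
  (last - first) / first * 100
}

# Toy series (made-up numbers): summed rate per continent in two years
toy <- data.frame(
  Continent = c("X", "X", "Y", "Y"),
  Year = c(1990, 2017, 1990, 2017),
  suicide_per_100k = c(100, 72, 50, 55)
)

# Split by continent, then compare the earliest and latest year
changes <- sapply(split(toy, toy$Continent), function(d) {
  d <- d[order(d$Year), ]
  pct_change(d$suicide_per_100k[1], d$suicide_per_100k[nrow(d)])
})
print(round(changes, 1))
```

A negative value indicates a decrease between the two years and a positive value an increase.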

4.4 Suicide rate - Visualization through World Map

The total suicide rate is visualized on a world map. This requires loading the shapefile and using the tmap R package. The shapefile is loaded and left-joined with the suicide data using the Country column.

Insight from Map:

  • Russia and Greenland have the highest total suicide rate per 100K population

# Load the shapefile
map_world <- st_read(dsn = 'Geo', 
                layer = "World")
## Reading layer `World' from data source `C:\Users\Linya\Desktop\SMU\5. Year 1 Module\7. ISSS608-Visual Analytics and Applications (Term 3)\Assignment\Assignment 5\Geo' using driver `ESRI Shapefile'
## Simple feature collection with 251 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.6236
## geographic CRS: WGS 84
map_world
## Simple feature collection with 251 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.6236
## geographic CRS: WGS 84
## First 10 features:
##    OBJECTID          CNTRY_NAME                       geometry
## 1         1               Aruba MULTIPOLYGON (((-69.88223 1...
## 2         2 Antigua and Barbuda MULTIPOLYGON (((-61.73889 1...
## 3         3         Afghanistan MULTIPOLYGON (((61.27656 35...
## 4         4             Algeria MULTIPOLYGON (((-5.152135 3...
## 5         5          Azerbaijan MULTIPOLYGON (((45.02583 41...
## 6         6             Albania MULTIPOLYGON (((20.79192 40...
## 7         7             Armenia MULTIPOLYGON (((45.52888 40...
## 8         8             Andorra MULTIPOLYGON (((1.445833 42...
## 9         9              Angola MULTIPOLYGON (((13.09139 -4...
## 10       10      American Samoa MULTIPOLYGON (((-170.7439 -...
#create interactive map
tmap_mode("view")
Suicide_country<-suicide_filtered %>%
  group_by(Country) %>%
  summarize(suicides= sum(Deaths_Per_100K_Pop))

map_world_suicide<-left_join(map_world,Suicide_country,
                                 by=c('CNTRY_NAME'='Country'))

tm_shape(map_world_suicide) +
  tm_fill('suicides',
          style = "quantile",
          palette = "Blues") + 
  tm_layout(legend.show = TRUE,
            main.title = '1990 to 2017 Suicide per 100K Population by Country',
            main.title.position = 'center',
            main.title.size = 1.5) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar(width = 0.10) +
  tm_borders(alpha = 0.5)
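
Because the join is on country names, any name spelled differently in the two sources is silently left without a value on the map. A quick way to check for this is to compare the two name sets before joining. The names below are hypothetical examples, not actual mismatches found in this data; in the report the inputs would be map_world$CNTRY_NAME and Suicide_country$Country.

```r
# Hypothetical name vectors standing in for the shapefile and the data
shape_names <- c("Russia", "United States", "Afghanistan")
data_names  <- c("Russian Federation", "United States", "Afghanistan")

# Countries in the data with no polygon of the same name
unmatched_in_data <- setdiff(data_names, shape_names)

# Polygons that would get NA for 'suicides' after the left join
unmatched_in_shape <- setdiff(shape_names, data_names)

print(unmatched_in_data)
print(unmatched_in_shape)
```

Any names listed can then be recoded in one of the two sources so that the join matches them.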

4.4.1 Comparing Deaths (per 100K population) and Deaths percentage

Countries are compared using the suicide rate per 100K population. Adding Deaths_percent to the plot with tm_bubbles helps to visualize the death percentage in each country. A static plot is preferred because of the many labels required for the legend.

tmap_mode("plot") # change to interactive mode by changing to 'view'
Suicide_continent<-suicide_filtered %>%
  group_by(Continent,Country) %>%
  summarize(suicides= mean(Deaths_Per_100K_Pop),
            suicides_percent=mean(Deaths_percent))

map_continent<-left_join(map_world,Suicide_continent,
                                 by=c('CNTRY_NAME'='Country'))

tm_shape(map_continent) +
  tm_fill('suicides',
          style = "quantile",
          palette = "Blues") + 
  tm_bubbles(size='suicides_percent',col='Continent')+ 
  tm_layout(legend.show = TRUE,
            main.title = '1990 to 2017 Average Suicide',
            main.title.position = 'center',
            main.title.size = 1.5) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar(width = 0.10) +
  tm_borders(alpha = 0.5)

4.4.2 Total Suicide by Continent

Besides viewing the suicide rate (deaths per 100K population) by country, it can be viewed by continent using tm_facets, which shows which countries in each continent have a higher suicide rate.

Insight from Map by Continent:

  • Africa: Countries in southern Africa have a higher suicide rate
  • Asia: Higher suicide rates in India, Sri Lanka, Mongolia, Japan and South Korea
  • North America: Greenland has the highest suicide rate

Suicide_continent<-suicide_filtered %>%
  group_by(Continent,Country) %>%
  summarize(suicides= sum(Deaths_Per_100K_Pop))

tmap_options(limits = c(facets.view = 29))
map_continent<-left_join(map_world,Suicide_continent,
                                 by=c('CNTRY_NAME'='Country'))

tm_shape(map_continent) +
  tm_fill('suicides',
          style = "quantile",
          palette = "Blues") + 
  tm_facets(by='Continent', 
            free.coords=TRUE) +
  tm_layout(legend.show = FALSE,
            title.position = c("center", "center"), 
            title.size = 20) +
  tm_borders(alpha = 0.5)

4.4.3 Suicide rate by Gender

A bar chart was used to visualize the suicide rate for Female and Male, providing an overview by gender. The same comparison can be shown on a map by passing more than one variable to tm_fill.

Insight from Map by Gender:

  • Female suicide rate is higher in Asia (mainly in India and Sri Lanka) as compared to Male
  • Male suicide rate is higher in Russia as compared to Female

Suicide_gender2<-suicide %>%
  filter(Female>0)%>%
  filter(Male>0)%>%
  group_by(Country) %>%
  summarize(suicides_Female= sum(Female),
            suicides_Male=sum(Male))

map_gender_suicide<-left_join(map_world,Suicide_gender2,
                                 by=c('CNTRY_NAME'='Country'))
tmap_mode('plot')
tm_shape(map_gender_suicide) +
  tm_fill(c('suicides_Female','suicides_Male'),
          style = "equal",
          palette = "Blues") + 
  tm_layout(legend.show = TRUE,
            legend.position = c("left", "bottom"),
            main.title = '1990 to 2017 Suicide(per 100K Population) by Gender',
            main.title.position = 'center',
            main.title.size = 1) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar(width = 0.10) +
  tm_borders(alpha = 0.5)
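
The maps above use two classification styles, "quantile" and "equal", and the choice changes how skewed values are grouped. The base R sketch below only approximates the idea behind each style on made-up values; the use of five classes is an assumption, and this is not the code tmap runs internally.

```r
# Skewed toy values (made up): many small values and one large outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
n_classes <- 5  # assumed number of classes

# "quantile" idea: roughly the same number of observations per class
quantile_breaks <- quantile(x, probs = seq(0, 1, length.out = n_classes + 1))

# "equal" idea: classes of equal width between the minimum and maximum
equal_breaks <- seq(min(x), max(x), length.out = n_classes + 1)

# Count how many values fall in each class under both schemes
q_counts <- table(cut(x, unique(quantile_breaks), include.lowest = TRUE))
e_counts <- table(cut(x, equal_breaks, include.lowest = TRUE))
print(q_counts)
print(e_counts)
```

With skewed data, equal-width classes place most values in the lowest class, while quantile classes spread them evenly, so the same map can look quite different under the two styles.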

5. Insight

Visualizations were created for the suicide rate by overall trend, relationship with GDP per capita, demographic profile, and continent using bar and line plots. Similar views can be achieved using maps.

The overall suicide rate decreased over the years and fell below the global average rate. One possible reason is that as countries progress and become wealthier, living conditions improve, and the suicide rate therefore decreases. The negative relationship between suicide and GDP per capita suggests that an increase in GDP per capita is associated with a decrease in suicide. However, it is a weak relationship.

Visualization by demographic profile shows that the male suicide rate is more than twice the female rate, and that the 70+ age group has the highest suicide rate in every continent except Oceania. The average suicide rates for the different age groups are above the global average rate, except for ages 5 to 14.

Suicide by continent shows that Africa and Europe have the two highest totals among the continents, 19022 and 18026 per 100K population respectively. At the country level, Russia, in Europe, has a higher suicide rate, which could contribute to Europe having a higher overall suicide rate. For a better result, Europe could be split into Eastern Europe and Western Europe. Another trend observed is that the suicide rate in Europe decreased more rapidly over the years than in other continents.

6. Challenges Faced

Generating the shapefile: There were issues generating the shapefile using ArcGIS Pro. This was resolved by downloading the shapefile from the ArcGIS website.

Data issues: The data was downloaded as multiple CSV files, which were difficult to join in R, so other software was needed to combine the files. Even so, it was not possible to produce a single CSV for all the data, because checks showed that the values were wrong after combining.
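
For reference, a join of this kind can be sketched in base R. The two data frames below are made-up stand-ins for separate CSV files, and the key columns (Code and Year) are an assumption about how the files relate. A common cause of wrong values after combining is joining on too few keys, which duplicates rows, although that is not necessarily what happened here.

```r
# Toy stand-ins for two downloaded files (column names and values are assumptions)
rates <- data.frame(Code = c("AAA", "AAA", "BBB"),
                    Year = c(1990, 1991, 1990),
                    Deaths_Per_100K_Pop = c(10.3, 10.2, 7.5))
gdp <- data.frame(Code = c("AAA", "BBB"),
                  Year = c(1990, 1990),
                  GDP_Per_Capita = c(1500, 9000))

# Left join on both keys; all.x = TRUE keeps rows without a GDP value
combined <- merge(rates, gdp, by = c("Code", "Year"), all.x = TRUE)

# A join on too few keys would add rows, so compare row counts afterwards
stopifnot(nrow(combined) == nrow(rates))
print(combined)
```

Comparing the row count before and after the join is a cheap check that no rows were duplicated or dropped.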

Aesthetics and coding issues: The first attempt used plotly directly to create interactive charts. Being less familiar with plotly than with ggplot, and after some searching, it was noted that the ‘ggplotly’ function can convert a ggplot directly into an interactive plot. This saves time and is an easier way of creating interactive plots. On the first try, the output did not satisfy the aesthetic requirements, and multiple attempts were made to achieve the desired output.

Unable to publish to RPubs when tmap mode is set to view: After multiple attempts, the document knitted but could not be published to RPubs. The issue was resolved after the tmap mode was set to plot for the map visualization of suicide by gender.