Suicides are often not reported in the news, so many people are unaware of how many suicides occur in their own country or around the world. Using data published by the World Health Organization, this visualization aims to reveal trends in the suicide rate over time, by GDP, by demographic profile and by continent, including on a map.
Four plots are planned to reveal suicide trends using the available variables.
Sketch the design based on the idea in mind:
To create the proposed plots, the following six packages and a shapefile are required:
packages = c('sf', 'tmap', 'tidyverse','plotly','ggplot2','reshape2')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Load the data using the read_csv function. Preparation and basic cleaning of the data were done in Excel, but some basic data wrangling is still required in R. Use the glimpse and summary functions to view the data.
#Reading and viewing the data into R environment
suicide_data<-read_csv('Suicide.csv')
glimpse(suicide_data)
## Rows: 10,976
## Columns: 9
## $ Code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"...
## $ Continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia",...
## $ Country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "A...
## $ Gender <chr> "Female", "Male", "Female", "Male", "Female", "...
## $ Year <dbl> 1990, 1990, 1991, 1991, 1992, 1992, 1993, 1993,...
## $ Deaths_Per_100K_Pop <dbl> 10.31850, 10.31850, 10.32701, 10.32701, 10.2714...
## $ Deaths_percent <dbl> 0.3773036, 0.3773036, 0.3809912, 0.3809912, 0.4...
## $ Gender_Death <dbl> 5.46, 14.75, 5.43, 14.81, 5.38, 14.80, 5.41, 15...
## $ GDP_Per_Capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
summary(suicide_data)
## Code Continent Country Gender
## Length:10976 Length:10976 Length:10976 Length:10976
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Year Deaths_Per_100K_Pop Deaths_percent Gender_Death
## Min. :1990 Min. : 1.527 Min. : 0.1094 Min. : 0.61
## 1st Qu.:1997 1st Qu.: 6.488 1st Qu.: 0.7136 1st Qu.: 4.77
## Median :2004 Median :10.218 Median : 1.1575 Median : 8.20
## Mean :2004 Mean :11.954 Mean : 1.4229 Mean : 12.20
## 3rd Qu.:2010 3rd Qu.:14.807 3rd Qu.: 1.8183 3rd Qu.: 16.05
## Max. :2017 Max. :98.832 Max. :13.8871 Max. :145.46
## NA's :112
## GDP_Per_Capita
## Min. : 285.6
## 1st Qu.: 2479.3
## Median : 6960.9
## Mean : 13483.5
## 3rd Qu.: 17741.8
## Max. :141635.0
## NA's :1054
From the output, the data has 10,976 rows and 9 columns. The 9 columns (variables) are: Code, Continent, Country, Gender, Year, Deaths_Per_100K_Pop, Deaths_percent, Gender_Death and GDP_Per_Capita.
Gender_Death is the number of deaths per 100K population for the given gender, and GDP_Per_Capita is gross domestic product (GDP) per capita, expressed in current international dollars converted using the purchasing power parity (PPP) conversion factor.
There are missing values for GDP_Per_Capita and Gender_Death, and Deaths_Per_100K_Pop is duplicated because each country and year appears twice, once per gender. Therefore, the dcast function is used to convert the data from long format to wide format.
#convert to wide format and verify the data
suicide<-dcast(suicide_data,Continent+Country+Code+Year+Deaths_Per_100K_Pop+Deaths_percent+GDP_Per_Capita~Gender,value.var='Gender_Death')
glimpse(suicide)
## Rows: 5,488
## Columns: 9
## $ Continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia",...
## $ Country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "A...
## $ Code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"...
## $ Year <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,...
## $ Deaths_Per_100K_Pop <dbl> 10.318504, 10.327010, 10.271411, 10.376123, 10....
## $ Deaths_percent <dbl> 0.3773036, 0.3809912, 0.4099087, 0.4118492, 0.3...
## $ GDP_Per_Capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ Female <dbl> 5.46, 5.43, 5.38, 5.41, 5.50, 5.53, 5.52, 5.53,...
## $ Male <dbl> 14.75, 14.81, 14.80, 15.02, 15.34, 15.54, 15.65...
summary(suicide)
## Continent Country Code Year
## Length:5488 Length:5488 Length:5488 Min. :1990
## Class :character Class :character Class :character 1st Qu.:1997
## Mode :character Mode :character Mode :character Median :2004
## Mean :2004
## 3rd Qu.:2010
## Max. :2017
##
## Deaths_Per_100K_Pop Deaths_percent GDP_Per_Capita Female
## Min. : 1.527 Min. : 0.1094 Min. : 285.6 Min. : 0.610
## 1st Qu.: 6.488 1st Qu.: 0.7136 1st Qu.: 2479.3 1st Qu.: 3.000
## Median :10.218 Median : 1.1575 Median : 6960.9 Median : 5.070
## Mean :11.954 Mean : 1.4229 Mean : 13483.5 Mean : 5.776
## 3rd Qu.:14.807 3rd Qu.: 1.8183 3rd Qu.: 17741.8 3rd Qu.: 7.290
## Max. :98.832 Max. :13.8871 Max. :141635.0 Max. :48.710
## NA's :527 NA's :56
## Male
## Min. : 2.52
## 1st Qu.: 9.62
## Median : 15.54
## Mean : 18.62
## 3rd Qu.: 22.73
## Max. :145.46
## NA's :56
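The long-to-wide step can be illustrated on a toy data set. The sketch below is not the code used for the analysis: it uses base R's reshape() instead of dcast so that it runs without extra packages, and the data frame toy_long and all of its values are made up for illustration.

```r
# Toy long-format data: one row per country, year and gender.
# All values are invented; only the column names mirror the real data.
toy_long <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(1990, 1990, 1990, 1990),
  Deaths_Per_100K_Pop = c(10, 10, 20, 20),
  Gender = c("Female", "Male", "Female", "Male"),
  Gender_Death = c(5, 15, 8, 32)
)

# Spread Gender into columns; the id columns are kept as they are.
toy_wide <- reshape(
  toy_long,
  idvar = c("Country", "Year", "Deaths_Per_100K_Pop"),
  timevar = "Gender",
  direction = "wide"
)

# reshape() names the new columns Gender_Death.Female and Gender_Death.Male;
# strip the prefix so they match the Female and Male columns shown above.
names(toy_wide) <- sub("^Gender_Death\\.", "", names(toy_wide))
print(toy_wide)
```

As in the real data, where the row count drops from 10,976 to 5,488, the toy data goes from four rows to two because each pair of gender rows collapses into one.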
When viewing the data, it is noted that Country includes 'World', which is an aggregate value. A filtered data set is therefore created that excludes 'World' from Country. At this stage, data wrangling is complete.
#filter out 'World' from the data set
suicide_filtered<-suicide%>%
filter(Country!='World')
Create the plots as proposed in the sketch
To visualize the suicide rate trend over the years, ggplot is used with geom_line and geom_point, plotting the suicide rate (Deaths_Per_100K_Pop) against Year. The global average is also calculated and added to the plot. This average is represented by the dashed line and is used in the other plots as well. The static plot is converted to an interactive one using the ggplotly function, and the two plots are combined using subplot.
Insight from Line Plot:
The overall trend shows that the suicide rate has decreased over the years. As countries develop and become more affluent, does the suicide rate decrease? To explore this, geom_smooth is used to show the relationship between GDP per capita and the suicide rate.
Insight from the Combined Plot:
#Calculate the global average suicide rate
global_average <- suicide %>%
filter(Country=='World') %>%
summarise(mean=mean(Deaths_Per_100K_Pop))
#creating the line plot
Line_suicide<-suicide %>%
filter(Country=='World') %>%
ggplot(aes(x = Year, y = Deaths_Per_100K_Pop)) +
geom_line(col = "deepskyblue3", size = 1) +
geom_point(col = "Steelblue", size = 1) +
geom_hline(yintercept=unlist(global_average$mean),linetype=2,color= "grey35", size = 0.5)+
labs(title = "1990 to 2017 Global Average Suicides (per 100K Population)",
x = "Year",
y = "Avg Suicides per 100k Pop") +
scale_x_continuous(breaks = seq(1990, 2017, 2)) +
scale_y_continuous(breaks = seq(10, 20))+
theme_bw()
Line_Plotly<-ggplotly(Line_suicide)
# Creating scatter plot with GDP Variables
##data manipulation
suicide_gdp<-dcast(suicide_data,Continent+Country+Deaths_Per_100K_Pop+GDP_Per_Capita+Year~Gender,value.var='Gender_Death')
#excluding those with missing values
suicide_gdp2<-suicide_gdp %>%
filter(GDP_Per_Capita>0)%>%
filter(Country!='World')%>%
group_by(Country, Continent)%>%
summarize(GDP_Per_Capita= sum(GDP_Per_Capita),
Suicides=sum(Deaths_Per_100K_Pop))
# Plot GDP and Suicides rate with geom_smooth to identify the relationship
GDPvsSuicides<-ggplot(suicide_gdp2,aes(x=GDP_Per_Capita,y=Suicides,col=Continent))+
geom_point()+
geom_smooth(method='lm',aes(group=1))+
theme_classic()+
labs(title = "Correlation between GDP (per capita) and Suicides per 100k",
x = "GDP (per capita)",
y = "Suicides per 100k",
col = "Continent")
GDP_Plotly<-ggplotly(GDPvsSuicides)
#combine 2 plot together
subplot(Line_Plotly,GDP_Plotly, nrows=2,shareY = TRUE, shareX= FALSE, titleX=TRUE)%>%
layout(title = 'Suicide Trend and Relation with GDP per Capita',
legend=list(x=40,y=0.01),
showlegend=TRUE)
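The fitted line from geom_smooth(method='lm') shows the direction of the relationship, but not its strength. A numeric check could complement it. The sketch below is only an illustration under stated assumptions: the data frame toy and its values are made up, and in the analysis the columns would be GDP_Per_Capita and Suicides from suicide_gdp2.

```r
# Made-up GDP per capita and suicide rates for five hypothetical countries.
toy <- data.frame(
  gdp  = c(1000, 5000, 10000, 20000, 40000),
  rate = c(14, 12, 13, 10, 11)
)

# Pearson correlation: the sign gives the direction, the magnitude the strength.
r <- cor(toy$gdp, toy$rate)

# Simple linear regression of rate on GDP, the kind of model that
# geom_smooth(method = 'lm') draws.
fit <- lm(rate ~ gdp, data = toy)
slope <- unname(coef(fit)["gdp"])
r_squared <- summary(fit)$r.squared

print(c(correlation = r, slope = slope, r_squared = r_squared))
```

A negative slope with a small R-squared would correspond to a weak negative relationship.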
Gender: Before creating the plot of the suicide rate by gender, missing values are filtered out. A bar plot is then used to compare the suicide rate of females and males, and the suicide rate by gender is later shown on a world map.
Age: The age profile of suicides is visualized using a bar plot. The age data was stored in a separate file, which is loaded and wrangled before creating the bar plot.
Insight from the Combined Plot:
#Bar plot for Gender
Suicide_gender <- suicide_data %>%
filter(Country!='World')%>%
filter(Gender_Death>0)
Gender_bar<-ggplot(Suicide_gender,aes(x=Gender,y=Gender_Death,fill=Gender))+
stat_summary(geom='bar',
fun='mean')+
geom_hline(yintercept=unlist(global_average$mean),linetype=2,color= "grey35", size = 0.5)+
labs(title = "Avg Suicides(per 100K Population), by Gender",
x = "Gender",
y = "Avg Suicides Per 100k Pop")+
theme(legend.position = "none") +
theme(axis.line=element_line(size=1))+
scale_fill_brewer(palette = 'Dark2')
Gender_Plotly<-ggplotly(Gender_bar)%>%
layout(title = 'Suicides(Per 100K Population), by Gender')
# Bar Plot for Age
suicide_age<-read_csv('SuicideAge.csv')
suicide_age$Age<-factor(suicide_age$Age,levels=c('5 to 14',
'15 to 49',
'50 to 69',
'70+'))
suicide_age2<-suicide_age%>%
group_by(Continent,Age)%>%
summarize(n = n(),
suicides = mean(Death_Per_100K_Pop))
suicideage_plot<-ggplot(suicide_age2,aes(x = reorder(Continent,suicides), y = suicides, fill = Age)) +
geom_bar(stat = "identity", position = "dodge")+
geom_hline(yintercept=unlist(global_average$mean),linetype=2,color= "grey35", size = 0.5)+
labs(title = "Avg Suicides(per 100K Population), by Age",
x = "Continent",
y = "Avg Suicides per 100k Pop",
fill = "Demographic\n Profile") +
theme_classic()
Age_Plotly<-ggplotly(suicideage_plot)
subplot(Gender_Plotly,Age_Plotly, nrows=2,shareY = TRUE, shareX= FALSE, titleX=FALSE)%>%
layout(title = 'Suicide Trend by Demographic Profile',
legend=list(x=40,y=0.03))
To visualize the suicide rate by continent, two plots were created: a bar plot to compare the suicide rate of each continent, and a line plot to show the suicide rate by continent over time.
Insight from the Combined Plot:
#Bar Chart
Suicide_continent2<-suicide_filtered %>%
group_by(Continent) %>%
summarize(suicides= sum(Deaths_Per_100K_Pop))
continent_plot <- ggplot(Suicide_continent2, aes(x =reorder(Continent,-suicides), y = suicides)) +
stat_summary(geom='bar',
fun='sum',
fill='Steelblue')+
labs(title = "Global Suicides (per 100k Population), by Continent",
x = 'Continent',
y = 'Suicides per 100k') +
theme_classic()
Continent_plotly<-ggplotly(continent_plot)
#Line Chart
continent_time <- suicide_filtered %>%
group_by(Year, Continent) %>%
summarize(suicide_per_100k = sum(Deaths_Per_100K_Pop))
continent_time_plot <- ggplot(continent_time, aes(x = Year, y = suicide_per_100k, col = Continent)) +
geom_line() +
geom_point() +
labs(title = "Continent Trends Over Time",
x = "Year",
y = "Suicides per 100k",
color = "Continent") +
theme_classic()+
scale_x_continuous(breaks = seq(1990, 2017, 2))
time_continent_plotly<-ggplotly(continent_time_plot)
#combine 2 plot together
subplot(Continent_plotly,time_continent_plotly, nrows=2, shareY = TRUE)%>%
layout(title = 'Suicide By Continent',
legend=list(x=40,y=0.01),
showlegend=TRUE)
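One point to keep in mind when reading these two charts: the code sums Deaths_Per_100K_Pop within each continent, so a continent with more countries (or more years of data) accumulates a larger total even if its per-country rates are lower. The toy example below shows the effect. The continents X and Y, their countries and the rates are all made up, and taking the mean is only one possible alternative, not the method used in this analysis.

```r
# Hypothetical data: continent X has three countries with a low rate,
# continent Y has a single country with a higher rate.
toy <- data.frame(
  Continent = c("X", "X", "X", "Y"),
  Country   = c("x1", "x2", "x3", "y1"),
  Rate      = c(10, 10, 10, 20)
)

# Summing rewards the continent with more rows.
by_sum <- aggregate(Rate ~ Continent, data = toy, FUN = sum)

# Averaging compares the typical country in each continent.
by_mean <- aggregate(Rate ~ Continent, data = toy, FUN = mean)

print(by_sum)   # X ranks first by sum
print(by_mean)  # Y ranks first by mean
```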
Visualize the total suicide rate using a world map. This requires loading the shapefile and using the tmap R package. The shapefile is loaded and left-joined with the suicide data on the country name (CNTRY_NAME in the shapefile, Country in the data).
Insight from Map:
# Loading the shapefile
map_world <- st_read(dsn = 'Geo',
layer = "World")
## Reading layer `World' from data source `C:\Users\Linya\Desktop\SMU\5. Year 1 Module\7. ISSS608-Visual Analytics and Applications (Term 3)\Assignment\Assignment 5\Geo' using driver `ESRI Shapefile'
## Simple feature collection with 251 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.6236
## geographic CRS: WGS 84
map_world
## Simple feature collection with 251 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.6236
## geographic CRS: WGS 84
## First 10 features:
## OBJECTID CNTRY_NAME geometry
## 1 1 Aruba MULTIPOLYGON (((-69.88223 1...
## 2 2 Antigua and Barbuda MULTIPOLYGON (((-61.73889 1...
## 3 3 Afghanistan MULTIPOLYGON (((61.27656 35...
## 4 4 Algeria MULTIPOLYGON (((-5.152135 3...
## 5 5 Azerbaijan MULTIPOLYGON (((45.02583 41...
## 6 6 Albania MULTIPOLYGON (((20.79192 40...
## 7 7 Armenia MULTIPOLYGON (((45.52888 40...
## 8 8 Andorra MULTIPOLYGON (((1.445833 42...
## 9 9 Angola MULTIPOLYGON (((13.09139 -4...
## 10 10 American Samoa MULTIPOLYGON (((-170.7439 -...
#create interactive map
tmap_mode("view")
Suicide_country<-suicide_filtered %>%
group_by(Country) %>%
summarize(suicides= sum(Deaths_Per_100K_Pop))
map_world_suicide<-left_join(map_world,Suicide_country,
by=c('CNTRY_NAME'='Country'))
tm_shape(map_world_suicide) +
tm_fill('suicides',
style = "quantile",
palette = "Blues") +
tm_layout(legend.show = TRUE,
main.title = '1990 to 2017 Suicide per 100K Population by Country',
main.title.position = 'center',
main.title.size = 1.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar(width = 0.10) +
tm_borders(alpha = 0.5)
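Because left_join keeps every row of the shapefile and fills in NA where no matching Country is found, countries whose names are spelled differently in the two sources appear blank on the map. A quick check with setdiff can list such names before plotting. The sketch below uses two short made-up vectors; whether the real data contains any such mismatch is not known here. In the analysis the inputs would be map_world$CNTRY_NAME and Suicide_country$Country.

```r
# Hypothetical country names from the shapefile and from the suicide data.
shape_names <- c("Afghanistan", "Albania", "Russia")
data_names  <- c("Afghanistan", "Albania", "Russian Federation")

# Names in the data with no polygon to attach to (dropped by the left join).
unmatched_in_data <- setdiff(data_names, shape_names)

# Polygons that receive NA for 'suicides' after the join.
unmatched_in_shape <- setdiff(shape_names, data_names)

print(unmatched_in_data)
print(unmatched_in_shape)
```

Any names reported this way could be recoded in one of the two sources before the join.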
Suicide rates are compared across countries on a per 100K population basis. Adding Deaths_percent to the plot with tm_bubbles helps to visualize the death percentage in each country. A static plot is preferred because of the many labels required for the legend.
tmap_mode("plot") # change to interactive mode by changing to 'view'
Suicide_continent<-suicide_filtered %>%
group_by(Continent,Country) %>%
summarize(suicides= mean(Deaths_Per_100K_Pop),
suicides_percent=mean(Deaths_percent))
map_continent<-left_join(map_world,Suicide_continent,
by=c('CNTRY_NAME'='Country'))
tm_shape(map_continent) +
tm_fill('suicides',
style = "quantile",
palette = "Blues") +
tm_bubbles(size='suicides_percent',col='Continent')+
tm_layout(legend.show = TRUE,
main.title = '1990 to 2017 Average Suicide',
main.title.position = 'center',
main.title.size = 1.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar(width = 0.10) +
tm_borders(alpha = 0.5)
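The maps in this report use style = "quantile" in some places and style = "equal" in others, and the choice changes how countries are grouped into colour classes. tmap computes its class breaks internally; the base R sketch below only approximates the idea on made-up values with one outlier, and the vector x and the class count n are assumptions for illustration.

```r
# Made-up rates with one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
n <- 5  # number of classes

# Quantile breaks: each class holds roughly the same number of values.
quantile_breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))

# Equal-interval breaks: each class covers the same width of the value range.
equal_breaks <- seq(min(x), max(x), length.out = n + 1)

# Count how many values fall into each class under both schemes.
quantile_counts <- as.vector(table(cut(x, quantile_breaks, include.lowest = TRUE)))
equal_counts    <- as.vector(table(cut(x, equal_breaks, include.lowest = TRUE)))

print(quantile_counts)
print(equal_counts)
```

With skewed data such as this, equal intervals place almost every value in the lowest class, while quantile breaks spread the values evenly across the classes.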
Besides viewing the overall suicide rate (deaths per 100K population), the map can be split by continent using tm_facets to show which countries in each continent have a higher suicide rate.
Insight from Map by Continent:
Suicide_continent<-suicide_filtered %>%
group_by(Continent,Country) %>%
summarize(suicides= sum(Deaths_Per_100K_Pop))
tmap_options(limits = c(facets.view = 29))
map_continent<-left_join(map_world,Suicide_continent,
by=c('CNTRY_NAME'='Country'))
tm_shape(map_continent) +
tm_fill('suicides',
style = "quantile",
palette = "Blues") +
tm_facets(by='Continent',
free.coords=TRUE) +
tm_layout(legend.show = FALSE,
title.position = c("center", "center"),
title.size = 20) +
tm_borders(alpha = 0.5)
A bar chart was used to visualize the suicide rate of females and males, which gives an overview by gender. The same comparison can be shown on a map by passing more than one variable to tm_fill.
Insight from Map by Gender:
Suicide_gender2<-suicide %>%
filter(Female>0)%>%
filter(Male>0)%>%
group_by(Country) %>%
summarize(suicides_Female= sum(Female),
suicides_Male=sum(Male))
map_gender_suicide<-left_join(map_world,Suicide_gender2,
by=c('CNTRY_NAME'='Country'))
tmap_mode('plot')
tm_shape(map_gender_suicide) +
tm_fill(c('suicides_Female','suicides_Male'),
style = "equal",
palette = "Blues") +
tm_layout(legend.show = TRUE,
legend.position = c("left", "bottom"),
main.title = '1990 to 2017 Suicide(per 100K Population) by Gender',
main.title.position = 'center',
main.title.size = 1) +
tm_compass(type="8star", size = 1) +
tm_scale_bar(width = 0.10) +
tm_borders(alpha = 0.5)
Visualizations were created for the overall suicide rate trend, its relationship with GDP per capita, the demographic profile and the continent using bar and line plots. Similar visualizations can be achieved using maps.
The overall suicide rate has decreased over the years and fallen below the global average rate. One possible reason is that as countries progress and become wealthier, living conditions improve and the suicide rate therefore decreases. The negative relationship between suicide and GDP per capita suggests that a higher GDP per capita is associated with a lower suicide rate. However, the relationship is weak.
Visualization by demographic profile shows that the male suicide rate is more than twice the female rate, and that the 70+ age group has the highest suicide rate in every continent except Oceania. The average suicide rates of the age groups are above the global average rate, except for ages 5 to 14.
Suicide by continent shows that Africa and Europe have the two highest totals, 19022 and 18026 respectively (per 100K population rates summed across countries and years). At country level, Russia, in Europe, has a particularly high suicide rate, which could contribute to Europe's high overall total. For a better result, Europe could be split into Eastern Europe and Western Europe. Another observed trend is that the suicide rate in Europe decreased more rapidly over the years than in the other continents.
Generating the shapefile: There were issues generating the shapefile using ArcGIS Pro. This was resolved by downloading the shapefile from the ArcGIS website.
Data issues: The data was downloaded as multiple CSV files, which were difficult to join in R, so other software was needed to combine them. Even so, a single CSV containing all the data could not be achieved, because the values were found to be wrong when checked after combining.
Aesthetics and coding issues: The first attempt was to create the interactive charts directly with plotly. Being less familiar with plotly than with ggplot, and after some searching, it was noted that the ggplotly function can convert a ggplot directly into an interactive plot. This saves time and is an easier way of creating interactive plots. On the first try the output did not satisfy the aesthetic requirements, so multiple attempts were made to achieve the desired output.
Unable to publish to RPubs when tmap mode is set to view: After multiple attempts, the document knitted but could not be published to RPubs. The issue was resolved by setting the tmap mode to plot for the map of suicide by gender.