Hi! In this kernel we are going to analize some important topics about Barcelona: population, culture, transport, environment, etc. As a citizen and lover of this city, I would like to introduce it to the rest of Kaggle users. We are going to use data from the Open Data BCN platform:
Open Data BCN, a project that was born in 2010, implementing the portal in 2011, has evolved and is now part of the Barcelona Ciutat Digital strategy, fostering a pluralistic digital economy and developing a new model of urban innovation based on the transformation and digital innovation of the public sector and the implication among companies, administrations, the academic world, organizations, communities and people, with a clear public and citizen leadership.
This portal contains more than 400 data sets classified in 5 main topics: Administration, Urban environment, Population, Territory, Economy and Business which are also divided in sub topics. You can consult the data sets catalogue here. You can visit the second part (Discovering Barcelona Part II) here. Let’s begin!
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## Warning: package 'tibble' was built under R version 4.0.4
## Warning: package 'tidyr' was built under R version 4.0.4
## Warning: package 'dplyr' was built under R version 4.0.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(skimr)
setwd('C:/Users/DellPC/Desktop/Corner/R_source_code/Barcelona')
#Load libraries
library(tidyverse)
library(knitr)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(wordcloud)
## Loading required package: RColorBrewer
library(visNetwork)
library(knitr)
# Read the data
population <- read.csv('population.csv',
sep =',', fileEncoding = 'UTF-8')
immigrant_emigrants_by_sex <- read.csv('immigrants_emigrants_by_sex.csv',
sep =',' , fileEncoding = 'UTF-8')
immigrants_emigrants_by_age <- read.csv('immigrants_emigrants_by_age.csv',
sep = ',',fileEncoding = 'UTF-8')
immigrants_emigrants_by_destination <- read.csv('immigrants_emigrants_by_destination.csv',
sep =',', fileEncoding = 'UTF-8')
immigrants_emigrants_by_destination2 <- read.csv('immigrants_emigrants_by_destination2.csv',
sep = ',', fileEncoding = 'UTF-8')
immigrants_by_nationality <- read.csv('immigrants_by_nationality.csv',
sep = ',', fileEncoding = 'UTF-8')
births <- read.csv('births.csv',
sep = ',', fileEncoding = 'UTF-8')
deaths <- read.csv('deaths.csv',
sep = ',', fileEncoding = 'UTF-8')
unemployment <- read.csv('unemployment.csv',
sep = ',', fileEncoding = 'UTF-8')
baby_names <- read.csv('most_frequent_baby_names.csv',
sep = ',', fileEncoding = 'UTF-8')
names <- read.csv('most_frequent_names.csv',
sep = ',', fileEncoding = 'UTF-8')
life_expectancy <- read.csv('life_expectancy.csv',
sep = ',', fileEncoding = 'UTF-8')
colnames(life_expectancy)[2:6] <- c("2006-2010", "2007-2011", "2008-2012", "2009-2013", "2010-2014")
Let’s get an idea of what we’re working with. We have many datasets so we are not going to inspect everyone.
population %>% head(10)
## Year District.Code District.Name Neighborhood.Code
## 1 2017 1 Ciutat Vella 1
## 2 2017 1 Ciutat Vella 2
## 3 2017 1 Ciutat Vella 3
## 4 2017 1 Ciutat Vella 4
## 5 2017 2 Eixample 5
## 6 2017 2 Eixample 6
## 7 2017 2 Eixample 7
## 8 2017 2 Eixample 8
## 9 2017 2 Eixample 9
## 10 2017 2 Eixample 10
## Neighborhood.Name Gender Age Number
## 1 el Raval Male 0-4 224
## 2 el Barri Gòtic Male 0-4 50
## 3 la Barceloneta Male 0-4 43
## 4 Sant Pere, Santa Caterina i la Ribera Male 0-4 95
## 5 el Fort Pienc Male 0-4 124
## 6 la Sagrada Família Male 0-4 191
## 7 la Dreta de l'Eixample Male 0-4 176
## 8 l'Antiga Esquerra de l'Eixample Male 0-4 158
## 9 la Nova Esquerra de l'Eixample Male 0-4 220
## 10 Sant Antoni Male 0-4 143
str(population)
## 'data.frame': 70080 obs. of 8 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Age : chr "0-4" "0-4" "0-4" "0-4" ...
## $ Number : int 224 50 43 95 124 191 176 158 220 143 ...
skim(population)
| Name | population |
| Number of rows | 70080 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| District.Name | 0 | 1 | 6 | 19 | 0 | 10 | 0 |
| Neighborhood.Name | 0 | 1 | 5 | 44 | 0 | 73 | 0 |
| Gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| Age | 0 | 1 | 3 | 5 | 0 | 20 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2015.00 | 1.41 | 2013 | 2014 | 2015 | 2016 | 2017 | ▇▇▇▇▇ |
| District.Code | 0 | 1 | 6.25 | 2.79 | 1 | 4 | 7 | 8 | 10 | ▃▃▃▇▆ |
| Neighborhood.Code | 0 | 1 | 37.00 | 21.07 | 1 | 19 | 37 | 55 | 73 | ▇▇▇▇▇ |
| Number | 0 | 1 | 114.84 | 96.54 | 0 | 43 | 92 | 162 | 777 | ▇▂▁▁▁ |
immigrant_emigrants_by_sex %>% head(10)
## Year District.Code District.Name Neighborhood.Code
## 1 2017 1 Ciutat Vella 1
## 2 2017 1 Ciutat Vella 2
## 3 2017 1 Ciutat Vella 3
## 4 2017 1 Ciutat Vella 4
## 5 2017 2 Eixample 5
## 6 2017 2 Eixample 6
## 7 2017 2 Eixample 7
## 8 2017 2 Eixample 8
## 9 2017 2 Eixample 9
## 10 2017 2 Eixample 10
## Neighborhood.Name Gender Immigrants Emigrants
## 1 el Raval Male 3063 1195
## 2 el Barri Gòtic Male 1499 427
## 3 la Barceloneta Male 910 361
## 4 Sant Pere, Santa Caterina i la Ribera Male 1438 502
## 5 el Fort Pienc Male 1082 495
## 6 la Sagrada Família Male 1693 796
## 7 la Dreta de l'Eixample Male 1543 666
## 8 l'Antiga Esquerra de l'Eixample Male 1693 702
## 9 la Nova Esquerra de l'Eixample Male 1927 925
## 10 Sant Antoni Male 1253 632
str(immigrant_emigrants_by_sex)
## 'data.frame': 730 obs. of 8 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Immigrants : int 3063 1499 910 1438 1082 1693 1543 1693 1927 1253 ...
## $ Emigrants : int 1195 427 361 502 495 796 666 702 925 632 ...
skim(immigrant_emigrants_by_sex)
| Name | immigrant_emigrants_by_se… |
| Number of rows | 730 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| District.Name | 0 | 1 | 6 | 19 | 0 | 10 | 0 |
| Neighborhood.Name | 0 | 1 | 5 | 44 | 0 | 73 | 0 |
| Gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2015.00 | 1.42 | 2013 | 2014.00 | 2015.0 | 2016.00 | 2017 | ▇▇▇▇▇ |
| District.Code | 0 | 1 | 6.25 | 2.79 | 1 | 4.00 | 7.0 | 8.00 | 10 | ▃▃▃▇▆ |
| Neighborhood.Code | 0 | 1 | 37.00 | 21.09 | 1 | 19.00 | 37.0 | 55.00 | 73 | ▇▇▇▇▇ |
| Immigrants | 0 | 1 | 588.79 | 484.94 | 5 | 251.25 | 474.5 | 784.25 | 3219 | ▇▃▁▁▁ |
| Emigrants | 0 | 1 | 362.18 | 255.19 | 3 | 178.00 | 329.0 | 498.75 | 1541 | ▇▆▂▁▁ |
immigrants_emigrants_by_age %>% head(10)
## Year District.Code District.Name Neighborhood.Code
## 1 2017 1 Ciutat Vella 1
## 2 2017 1 Ciutat Vella 2
## 3 2017 1 Ciutat Vella 3
## 4 2017 1 Ciutat Vella 4
## 5 2017 2 Eixample 5
## 6 2017 2 Eixample 6
## 7 2017 2 Eixample 7
## 8 2017 2 Eixample 8
## 9 2017 2 Eixample 9
## 10 2017 2 Eixample 10
## Neighborhood.Name Age Immigrants Emigrants
## 1 el Raval 0-4 154 108
## 2 el Barri Gòtic 0-4 58 33
## 3 la Barceloneta 0-4 38 37
## 4 Sant Pere, Santa Caterina i la Ribera 0-4 56 55
## 5 el Fort Pienc 0-4 79 60
## 6 la Sagrada Família 0-4 111 95
## 7 la Dreta de l'Eixample 0-4 121 78
## 8 l'Antiga Esquerra de l'Eixample 0-4 97 63
## 9 la Nova Esquerra de l'Eixample 0-4 123 130
## 10 Sant Antoni 0-4 57 80
str(immigrants_emigrants_by_age)
## 'data.frame': 4662 obs. of 8 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Age : chr "0-4" "0-4" "0-4" "0-4" ...
## $ Immigrants : int 154 58 38 56 79 111 121 97 123 57 ...
## $ Emigrants : int 108 33 37 55 60 95 78 63 130 80 ...
str(immigrants_emigrants_by_age)
## 'data.frame': 4662 obs. of 8 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Age : chr "0-4" "0-4" "0-4" "0-4" ...
## $ Immigrants : int 154 58 38 56 79 111 121 97 123 57 ...
## $ Emigrants : int 108 33 37 55 60 95 78 63 130 80 ...
skim(immigrants_emigrants_by_age)
| Name | immigrants_emigrants_by_a… |
| Number of rows | 4662 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| District.Name | 0 | 1 | 6 | 19 | 0 | 11 | 0 |
| Neighborhood.Name | 0 | 1 | 5 | 44 | 0 | 74 | 0 |
| Age | 0 | 1 | 3 | 5 | 0 | 21 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2016.00 | 0.82 | 2015 | 2015 | 2016.0 | 2017 | 2017 | ▇▁▇▁▇ |
| District.Code | 0 | 1 | 7.50 | 11.06 | 1 | 4 | 7.0 | 8 | 99 | ▇▁▁▁▁ |
| Neighborhood.Code | 0 | 1 | 37.84 | 22.12 | 1 | 19 | 37.5 | 56 | 99 | ▇▇▇▆▁ |
| Immigrants | 0 | 1 | 58.23 | 108.97 | 0 | 5 | 21.0 | 64 | 1393 | ▇▁▁▁▁ |
| Emigrants | 0 | 1 | 33.68 | 45.93 | 0 | 6 | 18.0 | 42 | 451 | ▇▁▁▁▁ |
immigrants_emigrants_by_destination %>% head(10)
## from to weight
## 1 Abroad Barcelona 53085
## 2 Andalusia Barcelona 2416
## 3 Aragon Barcelona 811
## 4 Asturias Barcelona 323
## 5 Balearic Islands Barcelona 1634
## 6 Barcelona Abroad 5485
## 7 Barcelona Andalusia 1596
## 8 Barcelona Aragon 745
## 9 Barcelona Asturias 204
## 10 Barcelona Balearic Islands 1428
str(immigrants_emigrants_by_destination)
## 'data.frame': 40 obs. of 3 variables:
## $ from : chr "Abroad" "Andalusia" "Aragon" "Asturias" ...
## $ to : chr "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
## $ weight: int 53085 2416 811 323 1634 5485 1596 745 204 1428 ...
skim(immigrants_emigrants_by_destination)
| Name | immigrants_emigrants_by_d… |
| Number of rows | 40 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| from | 0 | 1 | 5 | 19 | 0 | 21 | 0 |
| to | 0 | 1 | 5 | 19 | 0 | 21 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| weight | 0 | 1 | 3749.32 | 10603.95 | 12 | 203.25 | 613.5 | 1605.5 | 53085 | ▇▁▁▁▁ |
immigrants_by_nationality %>% head(10)
## Year District.Code District.Name Neighborhood.Code
## 1 2017 1 Ciutat Vella 1
## 2 2017 1 Ciutat Vella 2
## 3 2017 1 Ciutat Vella 3
## 4 2017 1 Ciutat Vella 4
## 5 2017 2 Eixample 5
## 6 2017 2 Eixample 6
## 7 2017 2 Eixample 7
## 8 2017 2 Eixample 8
## 9 2017 2 Eixample 9
## 10 2017 2 Eixample 10
## Neighborhood.Name Nationality Number
## 1 el Raval Spain 1109
## 2 el Barri Gòtic Spain 482
## 3 la Barceloneta Spain 414
## 4 Sant Pere, Santa Caterina i la Ribera Spain 537
## 5 el Fort Pienc Spain 663
## 6 la Sagrada Família Spain 1181
## 7 la Dreta de l'Eixample Spain 1063
## 8 l'Antiga Esquerra de l'Eixample Spain 1177
## 9 la Nova Esquerra de l'Eixample Spain 1593
## 10 Sant Antoni Spain 883
str(immigrants_by_nationality)
## 'data.frame': 35224 obs. of 7 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Nationality : chr "Spain" "Spain" "Spain" "Spain" ...
## $ Number : int 1109 482 414 537 663 1181 1063 1177 1593 883 ...
skim(immigrants_by_nationality)
| Name | immigrants_by_nationality |
| Number of rows | 35224 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| District.Name | 0 | 1 | 6 | 19 | 0 | 11 | 0 |
| Neighborhood.Name | 0 | 1 | 5 | 44 | 0 | 74 | 0 |
| Nationality | 0 | 1 | 4 | 24 | 0 | 177 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2016.01 | 0.81 | 2015 | 2015 | 2016.0 | 2017 | 2017 | ▇▁▇▁▇ |
| District.Code | 0 | 1 | 7.50 | 11.06 | 1 | 4 | 7.0 | 8 | 99 | ▇▁▁▁▁ |
| Neighborhood.Code | 0 | 1 | 37.84 | 22.12 | 1 | 19 | 37.5 | 56 | 99 | ▇▇▇▆▁ |
| Number | 0 | 1 | 7.71 | 50.42 | 0 | 0 | 0.0 | 2 | 1603 | ▇▁▁▁▁ |
births %>% head(10)
## Year District.Code District.Name Neighborhood.Code
## 1 2017 1 Ciutat Vella 1
## 2 2017 1 Ciutat Vella 2
## 3 2017 1 Ciutat Vella 3
## 4 2017 1 Ciutat Vella 4
## 5 2017 2 Eixample 5
## 6 2017 2 Eixample 6
## 7 2017 2 Eixample 7
## 8 2017 2 Eixample 8
## 9 2017 2 Eixample 9
## 10 2017 2 Eixample 10
## Neighborhood.Name Gender Number
## 1 el Raval Boys 283
## 2 el Barri Gòtic Boys 56
## 3 la Barceloneta Boys 51
## 4 Sant Pere, Santa Caterina i la Ribera Boys 90
## 5 el Fort Pienc Boys 117
## 6 la Sagrada Família Boys 207
## 7 la Dreta de l'Eixample Boys 185
## 8 l'Antiga Esquerra de l'Eixample Boys 176
## 9 la Nova Esquerra de l'Eixample Boys 218
## 10 Sant Antoni Boys 172
str(births)
## 'data.frame': 734 obs. of 7 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Gender : chr "Boys" "Boys" "Boys" "Boys" ...
## $ Number : int 283 56 51 90 117 207 185 176 218 172 ...
skim(births)
| Name | births |
| Number of rows | 734 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| District.Name | 0 | 1 | 6 | 19 | 0 | 11 | 0 |
| Neighborhood.Name | 0 | 1 | 5 | 44 | 0 | 75 | 0 |
| Gender | 0 | 1 | 4 | 5 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2015.00 | 1.42 | 2013 | 2014.00 | 2015.0 | 2016.00 | 2017 | ▇▇▇▇▇ |
| District.Code | 0 | 1 | 6.75 | 7.38 | 1 | 4.00 | 7.0 | 8.00 | 99 | ▇▁▁▁▁ |
| Neighborhood.Code | 0 | 1 | 37.34 | 21.52 | 1 | 19.00 | 37.0 | 55.75 | 99 | ▇▇▇▆▁ |
| Number | 0 | 1 | 91.75 | 61.93 | 0 | 43.25 | 84.5 | 129.25 | 283 | ▇▇▃▂▁ |
deaths %>% head(10)
## Year District.Code District.Name Neighborhood.Code
## 1 2017 1 Ciutat Vella 1
## 2 2017 1 Ciutat Vella 2
## 3 2017 1 Ciutat Vella 3
## 4 2017 1 Ciutat Vella 4
## 5 2017 2 Eixample 5
## 6 2017 2 Eixample 6
## 7 2017 2 Eixample 7
## 8 2017 2 Eixample 8
## 9 2017 2 Eixample 9
## 10 2017 2 Eixample 10
## Neighborhood.Name Age Number
## 1 el Raval 0-4 1
## 2 el Barri Gòtic 0-4 1
## 3 la Barceloneta 0-4 0
## 4 Sant Pere, Santa Caterina i la Ribera 0-4 0
## 5 el Fort Pienc 0-4 0
## 6 la Sagrada Família 0-4 1
## 7 la Dreta de l'Eixample 0-4 0
## 8 l'Antiga Esquerra de l'Eixample 0-4 0
## 9 la Nova Esquerra de l'Eixample 0-4 0
## 10 Sant Antoni 0-4 0
str(deaths)
## 'data.frame': 4599 obs. of 7 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Age : chr "0-4" "0-4" "0-4" "0-4" ...
## $ Number : int 1 1 0 0 0 1 0 0 0 0 ...
skim(deaths)
| Name | deaths |
| Number of rows | 4599 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| District.Name | 0 | 1 | 6 | 19 | 0 | 10 | 0 |
| Neighborhood.Name | 0 | 1 | 5 | 44 | 0 | 73 | 0 |
| Age | 0 | 1 | 3 | 5 | 0 | 21 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2016.00 | 0.82 | 2015 | 2015 | 2016 | 2017 | 2017 | ▇▁▇▁▇ |
| District.Code | 0 | 1 | 6.25 | 2.79 | 1 | 4 | 7 | 8 | 10 | ▃▃▃▇▆ |
| Neighborhood.Code | 0 | 1 | 37.00 | 21.07 | 1 | 19 | 37 | 55 | 73 | ▇▇▇▇▇ |
| Number | 0 | 1 | 10.05 | 18.56 | 0 | 0 | 2 | 11 | 139 | ▇▁▁▁▁ |
unemployment %>% head(10)
## Year Month District.Code District.Name Neighborhood.Code
## 1 2017 January 1 Ciutat Vella 1
## 2 2017 January 1 Ciutat Vella 2
## 3 2017 January 1 Ciutat Vella 3
## 4 2017 January 1 Ciutat Vella 4
## 5 2017 January 2 Eixample 5
## 6 2017 January 2 Eixample 6
## 7 2017 January 2 Eixample 7
## 8 2017 January 2 Eixample 8
## 9 2017 January 2 Eixample 9
## 10 2017 January 2 Eixample 10
## Neighborhood.Name Gender Demand_occupation Number
## 1 el Raval Male Registered unemployed 2107
## 2 el Barri Gòtic Male Registered unemployed 538
## 3 la Barceloneta Male Registered unemployed 537
## 4 Sant Pere, Santa Caterina i la Ribera Male Registered unemployed 741
## 5 el Fort Pienc Male Registered unemployed 630
## 6 la Sagrada Família Male Registered unemployed 1032
## 7 la Dreta de l'Eixample Male Registered unemployed 655
## 8 l'Antiga Esquerra de l'Eixample Male Registered unemployed 751
## 9 la Nova Esquerra de l'Eixample Male Registered unemployed 1128
## 10 Sant Antoni Male Registered unemployed 847
str(unemployment)
## 'data.frame': 14208 obs. of 9 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ Month : chr "January" "January" "January" "January" ...
## $ District.Code : int 1 1 1 1 2 2 2 2 2 2 ...
## $ District.Name : chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Neighborhood.Code: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Neighborhood.Name: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Demand_occupation: chr "Registered unemployed" "Registered unemployed" "Registered unemployed" "Registered unemployed" ...
## $ Number : int 2107 538 537 741 630 1032 655 751 1128 847 ...
skim(unemployment)
| Name | unemployment |
| Number of rows | 14208 |
| Number of columns | 9 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Month | 0 | 1 | 3 | 9 | 0 | 12 | 0 |
| District.Name | 0 | 1 | 6 | 19 | 0 | 11 | 0 |
| Neighborhood.Name | 0 | 1 | 5 | 44 | 0 | 74 | 0 |
| Gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| Demand_occupation | 0 | 1 | 19 | 21 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | 0 | 1 | 2015.38 | 1.32 | 2013 | 2014.75 | 2015.5 | 2016.25 | 2017 | ▃▃▇▇▇ |
| District.Code | 0 | 1 | 7.50 | 11.06 | 1 | 4.00 | 7.0 | 8.00 | 99 | ▇▁▁▁▁ |
| Neighborhood.Code | 0 | 1 | 37.84 | 22.12 | 1 | 19.00 | 37.5 | 56.00 | 99 | ▇▇▇▆▁ |
| Number | 0 | 1 | 446.70 | 425.95 | 0 | 121.00 | 296.0 | 685.00 | 3094 | ▇▂▁▁▁ |
baby_names %>% head(10)
## Order Name Gender Year Frequency
## 1 1 LAURA Female 1996 237
## 2 2 MARIA Female 1996 219
## 3 3 MARTA Female 1996 206
## 4 4 ANDREA Female 1996 199
## 5 5 ANNA Female 1996 195
## 6 6 PAULA Female 1996 186
## 7 7 ALBA Female 1996 142
## 8 8 SARA Female 1996 125
## 9 9 LAIA Female 1996 123
## 10 10 CLAUDIA Female 1996 122
str(baby_names)
## 'data.frame': 1050 obs. of 5 variables:
## $ Order : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "LAURA" "MARIA" "MARTA" "ANDREA" ...
## $ Gender : chr "Female" "Female" "Female" "Female" ...
## $ Year : int 1996 1996 1996 1996 1996 1996 1996 1996 1996 1996 ...
## $ Frequency: int 237 219 206 199 195 186 142 125 123 122 ...
skim(baby_names)
| Name | baby_names |
| Number of rows | 1050 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Name | 0 | 1 | 3 | 9 | 0 | 97 | 0 |
| Gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Order | 0 | 1 | 13.00 | 7.21 | 1 | 7 | 13 | 19 | 25 | ▇▇▇▇▇ |
| Year | 0 | 1 | 2006.00 | 6.06 | 1996 | 2001 | 2006 | 2011 | 2016 | ▇▆▆▆▆ |
| Frequency | 0 | 1 | 101.66 | 44.34 | 44 | 69 | 89 | 122 | 348 | ▇▃▁▁▁ |
names %>% head(10)
## Order Name Gender Decade Frequency
## 1 1 MARIA Female Before 1930 2085
## 2 2 CARMEN Female Before 1930 1463
## 3 3 JOSEFA Female Before 1930 1032
## 4 4 MONTSERRAT Female Before 1930 879
## 5 5 ROSA Female Before 1930 762
## 6 6 TERESA Female Before 1930 734
## 7 7 PILAR Female Before 1930 694
## 8 8 DOLORES Female Before 1930 679
## 9 9 MERCEDES Female Before 1930 603
## 10 10 FRANCISCA Female Before 1930 596
str(names)
## 'data.frame': 1100 obs. of 5 variables:
## $ Order : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "MARIA" "CARMEN" "JOSEFA" "MONTSERRAT" ...
## $ Gender : chr "Female" "Female" "Female" "Female" ...
## $ Decade : chr "Before 1930" "Before 1930" "Before 1930" "Before 1930" ...
## $ Frequency: int 2085 1463 1032 879 762 734 694 679 603 596 ...
skim(names)
| Name | names |
| Number of rows | 1100 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Name | 0 | 1 | 3 | 16 | 0 | 276 | 0 |
| Gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| Decade | 0 | 1 | 4 | 11 | 0 | 11 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Order | 0 | 1 | 25.50 | 14.44 | 1 | 13 | 25.5 | 38 | 50 | ▇▇▇▇▇ |
| Frequency | 0 | 1 | 1263.57 | 1989.23 | 39 | 379 | 621.5 | 1145 | 19173 | ▇▁▁▁▁ |
life_expectancy %>% head(10)
## Neighborhood 2006-2010 2007-2011 2008-2012
## 1 el Raval 87.5 84.9 84.7
## 2 el Barri Gòtic 88.0 84.3 84.4
## 3 la Barceloneta 88.2 85.3 84.4
## 4 Sant Pere, Santa Caterina i la Ribera 88.9 85.9 86.1
## 5 el Fort Pienc 89.7 87.1 87.2
## 6 la Sagrada Família 88.9 86.6 86.8
## 7 la Dreta de l'Eixample 87.0 85.0 85.1
## 8 l'Antiga Esquerra de l'Eixample 88.9 86.4 86.3
## 9 la Nova Esquerra de l'Eixample 90.0 87.4 87.5
## 10 Sant Antoni 89.0 86.3 86.5
## 2009-2013 2010-2014 Gender
## 1 84.9 85.3 Female
## 2 87.5 84.4 Female
## 3 86.3 84.7 Female
## 4 85.5 85.3 Female
## 5 87.7 86.7 Female
## 6 85.1 87.2 Female
## 7 85.7 85.1 Female
## 8 85.8 86.9 Female
## 9 87.0 88.7 Female
## 10 86.0 87.6 Female
str(life_expectancy)
## 'data.frame': 146 obs. of 7 variables:
## $ Neighborhood: chr "el Raval" "el Barri Gòtic" "la Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ 2006-2010 : num 87.5 88 88.2 88.9 89.7 88.9 87 88.9 90 89 ...
## $ 2007-2011 : num 84.9 84.3 85.3 85.9 87.1 86.6 85 86.4 87.4 86.3 ...
## $ 2008-2012 : num 84.7 84.4 84.4 86.1 87.2 86.8 85.1 86.3 87.5 86.5 ...
## $ 2009-2013 : num 84.9 87.5 86.3 85.5 87.7 85.1 85.7 85.8 87 86 ...
## $ 2010-2014 : num 85.3 84.4 84.7 85.3 86.7 87.2 85.1 86.9 88.7 87.6 ...
## $ Gender : chr "Female" "Female" "Female" "Female" ...
skim(life_expectancy)
| Name | life_expectancy |
| Number of rows | 146 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Neighborhood | 0 | 1 | 5 | 44 | 0 | 73 | 0 |
| Gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| 2006-2010 | 8 | 0.95 | 84.93 | 3.87 | 74.2 | 81.70 | 83.9 | 88.50 | 90.9 | ▁▂▇▂▇ |
| 2007-2011 | 6 | 0.96 | 82.47 | 3.96 | 70.7 | 79.50 | 81.7 | 86.12 | 88.2 | ▁▁▇▂▇ |
| 2008-2012 | 8 | 0.95 | 82.78 | 3.83 | 69.5 | 79.90 | 82.0 | 86.30 | 88.2 | ▁▁▆▃▇ |
| 2009-2013 | 6 | 0.96 | 83.10 | 3.85 | 69.0 | 80.10 | 82.7 | 86.53 | 88.5 | ▁▁▆▃▇ |
| 2010-2014 | 4 | 0.97 | 83.45 | 3.91 | 71.3 | 80.53 | 83.4 | 86.97 | 89.8 | ▁▂▇▃▇ |
Let’s begin with the population by year
# Population by year
population %>%
group_by(Year, Gender) %>%
summarise(count = sum(Number)) %>%
mutate(percent = paste0(round((count/sum(count))*100, 2), "%")) %>%
ggplot(aes(x = Year, y = count)) +
geom_bar(stat = 'identity', aes(fill = Gender)) +
geom_text(aes(label = percent, group = Gender),
position = position_stack(vjust = 0.5), color = 'white') +
scale_y_continuous(labels = comma) +
labs(x = 'Year', y = 'Population', title ='Population by year (2013-2017)') +
theme_bw()
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
The city of Barcelona’s population remains relatively stable as it’s already extremely densely populated. The female population is slightly higher than the male. The population per year is very similar, so in the following visualizations we will only take in account the last year (2017). We can continue with the population by age.
#Ordered levels
population$Age <- ordered(population$Age, levels=c("0-4", "5-9", "10-14", "15-19",
"20-24", "25-29", "30-34", "35-39",
"40-44", "45-49", "50-54", "55-59",
"60-64", "65-69", "70-74", "75-79",
"80-84", "85-89", "90-94", ">=95"))
# Population by age (2017)
population2017 <- population %>%
filter(Year == '2017')
ggplot(data = population2017, aes(x = Age, fill = Gender)) +
geom_bar(data = filter(population2017, Gender == 'Female'), aes( y =Number), stat = 'identity') +
geom_bar(data = filter(population2017, Gender == 'Male'), aes(y = Number*(-1)), stat = 'identity') +
scale_y_continuous(breaks = seq(-100000, 100000, 20000),
labels = comma(abs(seq(-100000, 100000, 20000)))) +
labs(x = 'Age', y = 'Population', title = 'Population by age (2017)') +
coord_flip() +
theme_bw()
The population distribution is centered around 25-44 years. It’s interesting to observe how the male population decreases considerably from the 40-44 years old range, while in the female gender the decrease is less pronounced . It seems clear that men live less years in Barcelona!
# Population by district (2017)
population %>% filter(Year == "2017") %>%
group_by(District.Name, Gender) %>%
summarise(count = sum(Number)) %>%
mutate(percent = paste0(round((count/sum(count))*100, 2), "%")) %>%
ggplot(aes(x = reorder(District.Name, count), y = count)) +
geom_bar(stat = 'identity', aes(fill = Gender)) +
geom_text(aes(label = percent, group = Gender), position = position_stack(vjust = 0.5)) +
scale_y_continuous(breaks = seq(0, 300000, 100000), labels = comma) +
labs(x = 'District name', y = 'Population', title = 'Population by district (2017)') +
theme_bw() +
coord_flip()
## `summarise()` has grouped output by 'District.Name'. You can override using the `.groups` argument.
Example is the most populated district of Barcelona, followed by Sant Martí and Sants-Montjuïc. Conversely, Les Corts is the least populated district. We can see how the female population is higher than the male population in all neighborhoods, except in Ciutat Vella. We can go into more detail and analyze the population by neighborhoods. There are more than 70 neighborhoods, so we’re just going to visualize the most populated.
# Population by neighbordhood - Top 15 neighbordhoods (2017)
population %>%
filter(Year=="2017") %>%
filter(Neighborhood.Name %in% c("la Nova Esquerra de l'Eixample", "Sant Andreu",
"la Sagrada Família", "la Vila de Gràcia",
"Sant Gervasi - Galvany", "el Raval", "les Corts",
"la Dreta de l'Eixample", "l'Antiga Esquerra de l'Eixample",
"Sants", "el Poble Sec", "Sant Antoni",
"el Camp de l'Arpa del Clot", "el Guinardó",
"el Camp d'en Grassot i Gràcia Nova")) %>%
group_by(Neighborhood.Name, Gender) %>%
summarise(count=sum(Number)) %>%
mutate(percent=paste0(round((count/sum(count))*100, 2), "%")) %>%
ggplot(aes(x=reorder(Neighborhood.Name, count), y=count)) +
geom_bar(stat="identity", aes(fill=Gender)) +
geom_text(aes(label=percent, group=Gender), position=position_stack(vjust=0.5)) +
scale_y_continuous(breaks=seq(0, 60000, 20000), labels=comma) +
labs(x="Neighbordhood name", y="Population", title="Population by neighbordhood (2017)") +
theme_bw() +
coord_flip()
## `summarise()` has grouped output by 'Neighborhood.Name'. You can override using the `.groups` argument.
In the Top 15 neighborhoods the female population is higher than the male population, except in el Raval.
In this section we are going to show some visualizations to understand the complex phenomenon of immigration and emigration in the city of Barcelona. Let’s begin with the analysis by year.
immigrant_emigrants_by_sex <- gather(immigrant_emigrants_by_sex,
`Immigrants/Emigrants`, value, Immigrants:Emigrants,
na.rm = TRUE)
immigrant_emigrants_by_sex %>%
group_by(Gender, `Immigrants/Emigrants`, Year) %>%
summarise( count = sum(value)) %>%
ggplot(aes(x = `Immigrants/Emigrants`, y = count, fill = Gender)) +
geom_bar(stat = 'identity', position = 'stack') +
facet_grid(~ Year) +
theme_bw() +
labs(y = 'Population', title = 'Immigration and emigration by year (2013-2017)') +
scale_y_continuous(labels = comma) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank())
## `summarise()` has grouped output by 'Gender', 'Immigrants/Emigrants'. You can override using the `.groups` argument.
Whereas emigration has remained more or less constant over recent years, immigration has increased considerably since 2013. Let’s study the immigration and emigration distribution
immigrants_emigrants_by_age <- gather(immigrants_emigrants_by_age, `Immigrants/Emigrants`,
value, Immigrants:Emigrants, na.rm=TRUE)
# Ordered levels
immigrants_emigrants_by_age$Age <- ordered(immigrants_emigrants_by_age$Age,
levels=c("0-4", "5-9", "10-14", "15-19",
"20-24", "25-29", "30-34", "35-39",
"40-44", "45-49", "50-54", "55-59",
"60-64", "65-69", "70-74", "75-79",
"80-84", "85-89", "90-94", "95-99",
">=100"))
# Immigration and emigration by age (2017)
immigrants_emigrants_by_age_2017 <- immigrants_emigrants_by_age %>%
filter(Year=="2017")
ggplot(data=immigrants_emigrants_by_age_2017, aes(x=Age, fill=`Immigrants/Emigrants`)) +
geom_bar(data=filter(immigrants_emigrants_by_age_2017, `Immigrants/Emigrants`=="Immigrants"),
aes(y=value), stat="identity") +
geom_bar(data=filter(immigrants_emigrants_by_age_2017, `Immigrants/Emigrants`=="Emigrants"),
aes(y=value*(-1)), stat="identity") +
scale_y_continuous(breaks=seq(-30000, 30000, 5000), labels=comma(abs(seq(-30000, 30000, 5000)))) +
labs(x="Age", y="Population", title="Immigration and emigration by age (2017)") +
coord_flip() +
theme_bw() +
guides(fill=guide_legend(title=NULL))
As we can see, there were more than 20.000 immigrants between 20-24 years old in 2017. With respect to emigration, the distribution is centered around 25-39 years old.
In the next visualization we are going to analyze the emigration by place of destination (2017) using a Sankey diagram. A Sankey Diagram displays flows and their quantities, in proportion to one another using the width of arrows or lines to show their magnitudes. Let’s see!
library(networkD3)
## Warning: package 'networkD3' was built under R version 4.0.4
# Nodes
nodes <- as.data.frame(unique(immigrants_emigrants_by_destination$from))
nodes$id <- 1:nrow(nodes)
nodes <- nodes[, c(2, 1)]
names(nodes) <- c('id', 'label')
# Emigrants
emigrants_by_destination <- immigrants_emigrants_by_destination %>%
filter(from == "Barcelona")
# Edges
edges <- emigrants_by_destination %>%
left_join(nodes, by = c("from" = "label")) %>%
select(-from) %>%
dplyr::rename(from = id)
edges <- emigrants_by_destination %>%
left_join(nodes, by = c("from" = "label")) %>%
select(-from) %>%
dplyr::rename(from = id)
edges <- edges %>% left_join(nodes, by = c('to' = 'label')) %>%
select(-to) %>%
dplyr::rename(to = id)
nodes_d3 <- mutate(nodes, id = id - 1)
edges_d3 <- mutate(edges, from = from - 1, to = to -1)
# sankeyNetwork - Emigrants destination
sankeyNetwork(Links = edges_d3, Nodes = nodes_d3, Source= 'from',
Target = 'to', NodeID = 'label', Value = 'weight', fontSize = 16, unit = 'Letter(s)')
Firstly it’s important to emphasize that the data set used for this visualization doesn’t specify destinations outside of Spain (category “Abroad”). Some comments about the graph:
As we can see, vast majority of inhabitants migrate to other areas of Catalonia (due to the high rental prices in Barcelona, perhaps?).
A high percentage of inhabitants emigrate outside of Spain (looking for better job opportunities, maybe?).
Within Spain, the autonomous communities with more prominence are: Community of Madrid, Andalusia, Balearic Islands and Valencian Community.
We can further detail the visualization and also study the emigration by place of destination and by neighbourhoods of the city of Barcelona (2017).
# Nodes
nodes <- as.data.frame(unique(immigrants_emigrants_by_destination2$from))
nodes$id <- 1:nrow(nodes)
nodes <- nodes[, c(2,1)]
names(nodes) <- c("id", "label")
# Emigrants
emigrants_by_destination2 <- immigrants_emigrants_by_destination2 %>%
filter(from %in% c("Ciutat Vella", "Eixample", "Gràcia", "Horta-Guinardó",
"Les Corts", "Nou Barris", "Sant Andreu", "Sant Martí",
"Sants-Montjuïc", "Sarrià-Sant Gervasi"))
# Edges
edges <- emigrants_by_destination2 %>%
left_join(nodes, by=c("from"="label")) %>%
select(-from) %>%
dplyr::rename(from=id)
edges <- edges %>%
left_join(nodes, by=c("to"="label")) %>%
select(-to) %>%
dplyr::rename(to=id)
nodes_d3 <- mutate(nodes, id=id-1)
edges_d3 <- mutate(edges, from=from-1, to=to-1)
# sankeyNetwork - Emigrants destination
sankeyNetwork(Links=edges_d3, Nodes=nodes_d3, Source="from", Target="to",
NodeID="label", Value="weight", fontSize=16, unit="Letter(s)")