The goal of this assignment is to become familiar with data representation. The data comes from the French open data website (http://data.gouv.fr). I focused on the evolution of the French population from 1950 to 2018, broken down by age group. These figures were provided by INSEE, the French National Institute of Statistics and Economic Studies.
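```r
# knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)  # attaches dplyr and tidyr, among others
library(plotly)     # plot_ly(); also re-exports the %>% pipe

# 3 lines precede the header row; ';' separates fields and ',' is the decimal mark
FrancePopDF <- read.csv('fm_dod_struct_pop.csv', skip = 3, sep = ';', dec = ',')

# Short English column names; assumes the file has exactly these 13 columns in this order
DF_column_names <- c('year',
  'Pop 0 to 19 years', 'Pop 0 to 14 years', 'Pop 20 to 59 years',
  'Pop 60 to 64 years', 'Pop over 65 years', 'Pop over 75 years',
  '% 0 to 19 years', '% 0 to 14 years', '% 20 to 59 years',
  '% 60 to 64 years', '% over 65 years', '% over 75 years')
stopifnot(ncol(FrancePopDF) == length(DF_column_names))
colnames(FrancePopDF) <- DF_column_names
```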
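The `read.csv()` arguments used above are easy to get wrong with French-formatted files, where `;` separates fields and `,` is the decimal mark. The sketch below runs the same kind of call on a small made-up sample written inline (not INSEE data). It assumes, as `skip = 3` implies, that three lines precede the header row. `sample_csv`, `sample_df` and `new_names` are illustrative names.

```r
# Made-up sample in the layout assumed for the INSEE file (not real figures):
# three lines before the header, ';' as separator, ',' as decimal mark.
sample_csv <- "Title line
Source line
Unit line
Year;Pop 0-19;Share 0-19
1990;15000000;26,5
1991;14900000;26,1"

sample_df <- read.csv(text = sample_csv, skip = 3, sep = ";", dec = ",")

# Rename only after checking that the column count matches.
new_names <- c("year", "Pop 0 to 19 years", "% 0 to 19 years")
stopifnot(ncol(sample_df) == length(new_names))
colnames(sample_df) <- new_names

str(sample_df)  # the share column is numeric because of dec = ","
```

Without `dec = ","`, the share column would be read as text instead of numbers, so the check with `str()` is a quick way to confirm that the separators are right.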
The plot was built with plotly. The x-axis shows time (in years) and the y-axis shows the number of people in each age group, in millions. I represented the data as a line chart with one line per age group. The main categories (0 to 19, 20 to 59, 60 to 64 and over 65 years) are contiguous and are drawn with solid lines. Each subcategory is drawn with a dotted line of the same colour as its parent category.
For example, the age category [0-19 years] is represented with a black solid line, whereas its subcategory [0-14 years] is shown with a black dotted line.
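This colour and dash convention can be captured in a small lookup table. Each subcategory's style is then derived from its parent instead of being repeated by hand. The sketch below is a hypothetical refactoring, not the code used for the plot. `trace_styles` and `line_style()` are illustrative names, and the colours are the ones used in the plotly code of this section.

```r
# Hypothetical lookup table: one row per plotted column.
# Subcategories list their parent; main categories have parent = NA.
trace_styles <- data.frame(
  column = c("Pop 0 to 19 years", "Pop 0 to 14 years", "Pop 20 to 59 years",
             "Pop 60 to 64 years", "Pop over 65 years", "Pop over 75 years"),
  parent = c(NA, "Pop 0 to 19 years", NA, NA, NA, "Pop over 65 years"),
  color  = c("rgb(0, 0, 0)", NA, "rgb(22, 96, 167)", "rgb(205, 12, 24)",
             "rgb(0, 176, 80)", NA),
  stringsAsFactors = FALSE
)

# Subcategories inherit their parent's colour and are drawn dotted.
is_sub <- !is.na(trace_styles$parent)
trace_styles$color[is_sub] <-
  trace_styles$color[match(trace_styles$parent[is_sub], trace_styles$column)]
trace_styles$dash <- ifelse(is_sub, "dot", "solid")

# Return the line specification for one column.
line_style <- function(column) {
  row <- trace_styles[trace_styles$column == column, ]
  stopifnot(nrow(row) == 1)
  list(color = row$color, width = 2, dash = row$dash)
}
```

`line_style("Pop 0 to 14 years")` returns a list with the same shape as the `line` argument in the plotly code. It could therefore be passed as `add_trace(..., line = line_style("Pop 0 to 14 years"))`. To add a new subcategory, you would then only need to add one row to the table.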
```r
# One line per age group, in millions; dotted lines are subcategories
# drawn in the same colour as their parent category
p <- plot_ly(FrancePopDF, x = ~year, y = ~`Pop 0 to 14 years` / 1e6, name = '0 to 14 years',
             type = 'scatter', mode = 'lines',
             line = list(color = 'rgb(0, 0, 0)', width = 2, dash = 'dot')) %>%
  add_trace(y = ~`Pop 0 to 19 years` / 1e6, name = '0 to 19 years', line = list(color = 'rgb(0, 0, 0)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop 20 to 59 years` / 1e6, name = '20 to 59 years', line = list(color = 'rgb(22, 96, 167)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop 60 to 64 years` / 1e6, name = '60 to 64 years', line = list(color = 'rgb(205, 12, 24)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop over 65 years` / 1e6, name = 'over 65 years', line = list(color = 'rgb(0, 176, 80)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop over 75 years` / 1e6, name = 'over 75 years', line = list(color = 'rgb(0, 176, 80)', width = 2, dash = 'dot')) %>%
  layout(title = "Evolution of population in France from 1946 to 2018",
         xaxis = list(title = "Year", dtick = 10),
         yaxis = list(title = "Population (in millions)"))
p
```
Several points stand out in this graph:

- Until 1967, the number of young people (aged 0 to 19) grows steadily, and faster than the other groups. From 1967 onwards, the number of people aged 15 to 19 is stable. The solid and dotted black lines are roughly parallel, so the gap between them stays constant, and that gap is exactly the 15 to 19 group.
- From 1967, the number of people in the other main groups (20 to 59, and over 65) increases significantly, and the solid blue and solid green lines are again roughly parallel. Meanwhile, the young population levels off. This suggests that the number of births tends to decrease, perhaps because living standards are rising and more people work or study longer, so that people have children later. People also live longer, because life expectancy keeps increasing.
- From 2007, the number of people aged 20 to 59 decreases, whereas the number of people over 65 keeps increasing.
- From the mid-1980s, the number of people aged 65 to 74 is fairly stable, since the two green lines are roughly parallel.
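Reading the gap between a solid line and the dotted line of the same colour amounts to a subtraction. The sketch below makes that calculation explicit. It assumes that "over 65" and "over 75" mean "65 and over" and "75 and over". `add_derived_groups()` and `yearly_change()` are hypothetical helpers. The toy data frame contains made-up numbers chosen only to illustrate the behaviour, not INSEE figures.

```r
# Derived groups: the vertical gap between a category and its subcategory.
# Assumes "over 65" / "over 75" mean "65 and over" / "75 and over".
add_derived_groups <- function(df) {
  df[["Pop 15 to 19 years"]] <- df[["Pop 0 to 19 years"]] - df[["Pop 0 to 14 years"]]
  df[["Pop 65 to 74 years"]] <- df[["Pop over 65 years"]] - df[["Pop over 75 years"]]
  df
}

# Year-over-year change of one column (assumes rows are sorted by year);
# values near zero mean the two parent lines run parallel.
yearly_change <- function(df, column) {
  diff(df[[column]])
}

# Made-up numbers for illustration only, not INSEE figures
toy <- data.frame(
  year                = 2000:2002,
  `Pop 0 to 19 years` = c(100, 105, 110),
  `Pop 0 to 14 years` = c(70, 75, 80),
  `Pop over 65 years` = c(50, 52, 54),
  `Pop over 75 years` = c(20, 22, 24),
  check.names = FALSE
)
toy <- add_derived_groups(toy)
yearly_change(toy, "Pop 15 to 19 years")
# [1] 0 0
```

On the real data, `add_derived_groups(FrancePopDF)` would add the two derived columns. Their stability could then be checked numerically instead of by eye, or they could be plotted with `add_trace()` like the other groups.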
In conclusion, this graph shows that the French population is ageing and that the renewal of generations becomes more difficult over time. This may be explained by sociological factors (more and more people are working) or scientific ones (progress in medicine), but other data would be needed to analyze these trends in detail.