Introduction

The goal of this assignment is to get familiar with data representation. The data come from the French open data website (http://data.gouv.fr). I focused on the evolution of the French population from 1950 to 2018, broken down by age group. The figures were provided by INSEE, the French National Institute of Statistics and Economic Studies.

Step 1: Read and transform source data

#knitr::opts_chunk$set(echo = TRUE)

library(dplyr)
library(tidyr)
library(tidyverse)
library(plotly)
# Skip the first 3 lines of the file; fields are separated by ';' and decimals use ','
FrancePopDF <- read.csv('fm_dod_struct_pop.csv', skip = 3, sep = ';', dec = ',')

# English column names: head counts first, then percentages
DF_column_names <- c('year',
                     'Pop 0 to 19 years', 'Pop 0 to 14 years', 'Pop 20 to 59 years',
                     'Pop 60 to 64 years', 'Pop over 65 years', 'Pop over 75 years',
                     '% 0 to 19 years', '% 0 to 14 years', '% 20 to 59 years',
                     '% 60 to 64 years', '% over 65 years', '% over 75 years')

colnames(FrancePopDF) <- DF_column_names
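
To make the role of the skip, sep and dec arguments easier to see, here is a minimal sketch that reads a tiny inline sample instead of the real file. The sample text, its three leading lines and its values are invented for illustration and are not taken from the INSEE file; only the read.csv arguments mirror the call above. The stopifnot() guard is an optional addition that catches a mismatch between the file layout and the list of new names before renaming.

```r
# Invented sample (NOT INSEE data): 3 leading lines, ';' separator, ',' decimal mark
sample_csv <- paste(
  "Title line",
  "Source line",
  "Unit line",
  "Annee;Pop 0-19;Part 0-19",
  "2000;15000000;25,6",
  "2001;15100000;25,4",
  sep = "\n"
)

# Same arguments as in the real import, applied to the inline text
demo <- read.csv(text = sample_csv, skip = 3, sep = ";", dec = ",")

# Guard: the number of new names must match the number of columns read
new_names <- c("year", "Pop 0 to 19 years", "% 0 to 19 years")
stopifnot(ncol(demo) == length(new_names))
colnames(demo) <- new_names

# Inspect the column types: the percentage column should be numeric
str(demo)
```

If dec = ',' were left out, the percentage column would be read as text rather than as numbers, which is why the argument matters for a French-formatted file.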

Step 2: Plot data

The chart was built with plotly. The x-axis gives the time (in years) and the y-axis shows the number of people in each age group, in millions. I chose to represent the data as several line graphs, one per age group. Main categories are drawn as solid lines. When a subcategory exists, it is drawn as a dotted line of the same colour as its parent category.

For example, the age category [0-19 years] is represented with a black solid line, whereas its subcategory [0-14 years] is shown with a black dotted line.

library(plotly)
# Head counts are divided by 1e6 so that the y-axis reads in millions.
# Subcategories share their parent's colour and use a dotted line.
p <- plot_ly(FrancePopDF, x = ~year, y = ~`Pop 0 to 14 years` / 1e6, name = '0 to 14 years',
             type = 'scatter', mode = 'lines',
             line = list(color = 'rgb(0, 0, 0)', width = 2, dash = 'dot')) %>%
  add_trace(y = ~`Pop 0 to 19 years` / 1e6, name = '0 to 19 years',
            line = list(color = 'rgb(0, 0, 0)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop 20 to 59 years` / 1e6, name = '20 to 59 years',
            line = list(color = 'rgb(22, 96, 167)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop 60 to 64 years` / 1e6, name = '60 to 64 years',
            line = list(color = 'rgb(205, 12, 24)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop over 65 years` / 1e6, name = 'over 65 years',
            line = list(color = 'rgb(0, 176, 80)', width = 2, dash = 'solid')) %>%
  add_trace(y = ~`Pop over 75 years` / 1e6, name = 'over 75 years',
            line = list(color = 'rgb(0, 176, 80)', width = 2, dash = 'dot')) %>%
  layout(title = "Evolution of population in France from 1946 to 2018",
         xaxis = list(title = "Year", dtick = 10),
         yaxis = list(title = "Population (in millions)"))

p
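
The colour and dash convention described in this step can also be written down once as a small lookup table and applied in a loop. The sketch below is one possible way to do that, not the method used for the figure above. The demo data frame holds invented placeholder values, and the names demo, styles and p_demo are hypothetical. Only two age groups are included to keep it short.

```r
library(plotly)

# Invented placeholder values (NOT INSEE figures), only to make the sketch runnable
demo <- data.frame(
  year = c(2000, 2010, 2018),
  `Pop 0 to 19 years` = c(15.0e6, 15.5e6, 16.0e6),
  `Pop 0 to 14 years` = c(11.0e6, 11.5e6, 12.0e6),
  check.names = FALSE
)

# One row per line on the chart: parent and subcategory share a colour,
# the parent is solid and the subcategory is dotted
styles <- data.frame(
  column = c("Pop 0 to 19 years", "Pop 0 to 14 years"),
  label  = c("0 to 19 years", "0 to 14 years"),
  color  = c("rgb(0, 0, 0)", "rgb(0, 0, 0)"),
  dash   = c("solid", "dot"),
  stringsAsFactors = FALSE
)

# No type is given in plot_ly() so that it only sets up the shared x-axis;
# each add_trace() call then contributes exactly one line
p_demo <- plot_ly(demo, x = ~year)
for (i in seq_len(nrow(styles))) {
  p_demo <- add_trace(
    p_demo,
    y = demo[[styles$column[i]]] / 1e6,  # convert people to millions
    name = styles$label[i],
    type = "scatter", mode = "lines",
    line = list(color = styles$color[i], width = 2, dash = styles$dash[i])
  )
}

p_demo
```

With this layout, adding an age group or changing a colour means editing one row of styles instead of one add_trace() call.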

Interpreting results

Several points stand out in these graphs:

In conclusion, these graphs show that the population of France is getting older and that generational renewal is becoming harder over time. This can be explained by sociological factors (more and more people are working) or scientific ones (medicine has made a lot of progress), but other figures are needed to analyse these graphs in detail.
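
One simple way to put a number on this ageing trend would be a ratio of the oldest group to the youngest one, computed from the same columns. The sketch below assumes such an indicator (people over 65 per 100 people under 20), which is my own choice and is not part of the original analysis. The values in demo are invented placeholders, not INSEE figures, so the resulting numbers carry no meaning beyond showing the calculation.

```r
# Invented placeholder values (NOT INSEE figures), in number of people
demo <- data.frame(
  year = c(1950, 1985, 2018),
  `Pop 0 to 19 years` = c(12e6, 16e6, 16e6),
  `Pop over 65 years` = c(5e6, 7e6, 13e6),
  check.names = FALSE
)

# Hypothetical indicator: people over 65 per 100 people aged 0 to 19
demo$ageing_index <- 100 * demo$`Pop over 65 years` / demo$`Pop 0 to 19 years`

# A rising index over time would support the "getting older" reading
print(demo[, c("year", "ageing_index")])
```

Applied to FrancePopDF, the same two lines would give one value per year that could be plotted alongside the population curves.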