Introduction

We consider the average monthly production of primary (crude) and secondary (processed) oil products for each year, across as many as 109 countries.

## 'data.frame':    3684200 obs. of  8 variables:
##  $ country  : Factor w/ 109 levels "ALGERIA","ANGOLALL",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ product  : Factor w/ 4 levels "CRUDEOIL","NGL",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ flow     : Factor w/ 10 levels "CSNATTER","DIRECTUSE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ unit     : Factor w/ 5 levels "CONVBBL","KBBL",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ date     : Factor w/ 169 levels "APR2002","APR2003",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ quantity : num  8130 8130 8130 8130 8130 8130 8130 8130 8130 8130 ...
##  $ code     : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Qualifier: Factor w/ 3 levels "              ",..: 1 1 1 1 1 1 1 1 1 1 ...
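
The `date` column stores month labels such as `APR2002` as a factor, which sorts alphabetically rather than chronologically. Below is a minimal sketch for turning these labels into numeric year and month columns. It assumes every label is a three-letter English month abbreviation followed by a four-digit year, and the helper name `parse_month_year` is hypothetical.

```r
# Convert date labels such as "APR2002" into numeric year and month columns.
# Assumes each label is a three-letter English month abbreviation followed by a
# four-digit year; matching against month.abb avoids locale-dependent %b parsing.
parse_month_year <- function(x) {
  x   <- toupper(as.character(x))                    # factor -> character
  mon <- match(substr(x, 1, 3), toupper(month.abb))  # "APR" -> 4
  yr  <- as.integer(substr(x, 4, 7))                 # "2002" -> 2002
  if (anyNA(mon) || anyNA(yr)) stop("unrecognised date label(s)")
  data.frame(year = yr, month = mon)
}

# Usage on a toy factor shaped like the `date` column
parse_month_year(factor(c("APR2002", "JAN2016", "DEC2010")))
```

With integer year and month columns, `order(year, month)` gives a chronological ordering that the alphabetical factor levels do not.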

Data cleaning

In this step, 30 MB of raw data is extracted and cleaned into a simple data frame of average monthly production for each year from 2002 to 2016. The graphs show the trends for a few major players: Canada, the USA, Australia, India and China.
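
Below is a minimal base-R sketch of the averaging step. The function name `average_monthly` and its arguments `flow_code` and `unit_code` are hypothetical. The actual production flow and unit labels would have to be read from `levels(df$flow)` and `levels(df$unit)`, and the toy rows are made up for illustration.

```r
# Average monthly quantity per country, product and year.
# flow_code / unit_code are hypothetical placeholders for the real labels.
average_monthly <- function(df, flow_code, unit_code, years = 2002:2016) {
  df <- df[df$flow == flow_code & df$unit == unit_code, ]  # one flow, one unit
  df$year <- as.integer(substr(as.character(df$date), 4, 7))  # "JAN2002" -> 2002
  df <- df[df$year %in% years, ]
  # mean of the monthly values within each country/product/year group
  aggregate(quantity ~ country + product + year, data = df, FUN = mean)
}

# Toy input with made-up values and a hypothetical flow label
toy <- data.frame(
  country  = factor(c("COUNTRY_A", "COUNTRY_A", "COUNTRY_A", "COUNTRY_B")),
  product  = factor(rep("CRUDEOIL", 4)),
  flow     = factor(rep("PRODFLOW", 4)),
  unit     = factor(rep("KBBL", 4)),
  date     = factor(c("JAN2002", "FEB2002", "JAN2003", "JAN2002")),
  quantity = c(100, 110, 120, 50)
)
average_monthly(toy, flow_code = "PRODFLOW", unit_code = "KBBL")
```

Filtering to a single unit before averaging matters here because the `unit` column has five levels, so the same quantity may appear several times in different units.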

Primary products

Primary products consist of CRUDEOIL (crude oil), NGL (natural gas liquids), Other Crude and TotCrude (total crude).
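
If yearly totals for several countries are combined by calling `merge(..., by = "year")` repeatedly on data frames that all carry a column named `total`, base R adds `.x`/`.y` suffixes and eventually warns that `total.x` and `total.y` are duplicated. One way around this is to rename `total` to the country label before folding the tables together with `Reduce()`. The sketch below assumes each per-country table has the columns `year` and `total`. The helper name `merge_country_totals` and the toy numbers are hypothetical.

```r
# Merge per-country yearly totals into one wide table keyed by year.
# Renaming each `total` column first prevents clashing total.x / total.y names.
merge_country_totals <- function(totals) {
  # totals: named list of data frames, each with columns `year` and `total`
  renamed <- Map(function(df, name) {
    names(df)[names(df) == "total"] <- name
    df
  }, totals, names(totals))
  # all = TRUE keeps years missing for some countries (filled with NA)
  Reduce(function(a, b) merge(a, b, by = "year", all = TRUE), renamed)
}

# Toy tables with made-up numbers
a <- data.frame(year = 2002:2004, total = c(2.1, 2.2, 2.3))
b <- data.frame(year = 2002:2004, total = c(0.7, 0.7, 0.8))
c3 <- data.frame(year = 2003:2004, total = c(3.4, 3.5))
wide <- merge_country_totals(list(A = a, B = b, C = c3))
wide
```

For plotting with ggplot2, a long layout (one row per country and year) is often more convenient than this wide table. The wide form is mainly handy for side-by-side comparison.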


Secondary products

Secondary products consist of GASDIES, GASOLINE, JETKERO, KEROSENE, LPG, NAPHTHA, ONONSPEC, RESFUEL and TOTPRODS.
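
TOTPRODS reads like an aggregate rather than a separate product, so summing it together with the other codes would likely double count. A quick consistency check is to compare a reported total with the sum of the codes believed to make it up. Which codes are genuine components is an assumption to verify against the data provider's definitions. For example, JETKERO may already be counted inside KEROSENE. The function name `total_gap` and the toy numbers are hypothetical.

```r
# Gap between a reported total (e.g. TOTPRODS) and the sum of chosen component
# codes, per country and year. Which codes are true components is an assumption.
total_gap <- function(df, components, total_code = "TOTPRODS") {
  comp <- df[df$product %in% components, ]
  tot  <- df[df$product == total_code, c("country", "year", "quantity")]
  comp_sum <- aggregate(quantity ~ country + year, data = comp, FUN = sum)
  out <- merge(comp_sum, tot, by = c("country", "year"),
               suffixes = c(".components", ".total"))
  out$gap <- out$quantity.total - out$quantity.components  # unexplained remainder
  out
}

# Toy input: one country-year with three product rows (made-up numbers)
toy <- data.frame(
  country  = "COUNTRY_A",
  year     = 2010L,
  product  = c("GASOLINE", "LPG", "TOTPRODS"),
  quantity = c(60, 30, 100)
)
total_gap(toy, components = c("GASOLINE", "LPG"))
```

A gap close to zero across countries and years would support treating TOTPRODS as the sum of the chosen codes. A large or negative gap suggests that a component is missing or counted twice.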