Introduction
We consider the average monthly production of primary (crude) and secondary (processed) products, year by year, for as many as 109 countries.
## 'data.frame': 3684200 obs. of 8 variables:
## $ country : Factor w/ 109 levels "ALGERIA","ANGOLALL",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ product : Factor w/ 4 levels "CRUDEOIL","NGL",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ flow : Factor w/ 10 levels "CSNATTER","DIRECTUSE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ unit : Factor w/ 5 levels "CONVBBL","KBBL",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ date : Factor w/ 169 levels "APR2002","APR2003",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ quantity : num 8130 8130 8130 8130 8130 8130 8130 8130 8130 8130 ...
## $ code : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Qualifier: Factor w/ 3 levels " ",..: 1 1 1 1 1 1 1 1 1 1 ...
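The `date` column above is a factor with labels such as "APR2002", so a month and a year have to be pulled out of each label before anything can be averaged by year. A minimal sketch of one way to do that in base R follows. The sample vector is made up, and the original analysis may have parsed the labels differently.

```r
# Hypothetical sample in the same "MONYYYY" format as the `date` column
date <- factor(c("APR2002", "APR2003", "JAN2016"))

# Work on the labels, not on the factor's integer codes
lab <- as.character(date)

# First three characters are the month abbreviation; look it up in the
# upper-cased built-in month.abb constant ("JAN", "FEB", ...)
month <- match(substr(lab, 1, 3), toupper(month.abb))

# Characters four to seven are the year
year <- as.integer(substr(lab, 4, 7))

parsed <- data.frame(date = lab, month = month, year = year,
                     stringsAsFactors = FALSE)
print(parsed)
```

Matching against `month.abb` avoids the locale-dependent `%b` format of `as.Date`.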
Data cleaning
In this step, 30 MB of data is extracted and cleaned into a simple data frame of average monthly production between 2002 and 2016. The graphs depict the trends for a few major players: Canada, USA, Australia, India and China.
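As an illustration of the cleaning step, the sketch below averages monthly quantities per country and year with dplyr. It uses a toy data frame rather than the real 30 MB extract. The country labels, the `year` column and the output column name `total` are assumptions for the example, not the original code.

```r
library(dplyr)

# Toy data: column names follow the str() output, values are made up
oil <- data.frame(
  country  = c("CANADA", "CANADA", "CANADA", "INDIA", "INDIA", "INDIA"),
  product  = "CRUDEOIL",
  year     = c(2002, 2002, 2003, 2002, 2002, 2003),
  quantity = c(100, 200, 300, 10, NA, 30),
  stringsAsFactors = FALSE
)

avg_monthly <- oil %>%
  filter(!is.na(quantity)) %>%          # drop months with no reported value
  group_by(country, year) %>%           # one group per country and year
  summarise(total = mean(quantity)) %>% # average of the monthly quantities
  ungroup()

print(avg_monthly)
```

Filtering on a single country afterwards (for example `filter(avg_monthly, country == "INDIA")`) would give the kind of per-country table that can be plotted against `year`.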
Primary products
Primary products consist of CRUDE, NGL, Other Crude and TotCrude.
## Warning in merge.data.frame(total2, ind, by = "year"): column names
## 'total.x', 'total.y' are duplicated in the result
## Warning in merge.data.frame(total3, chi, by = "year"): column names
## 'total.x', 'total.y' are duplicated in the result

Secondary products
Secondary products consist of GASDIES, GASOLINE, JETKERO, KEROSENE, LPG, NAPHTHA, ONONSPEC, RESFUEL, TOTPRODS.
## Warning in merge.data.frame(total2, ind, by = "year"): column names
## 'total.x', 'total.y' are duplicated in the result
## Warning in merge.data.frame(total3, chi, by = "year"): column names
## 'total.x', 'total.y' are duplicated in the result
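The `merge.data.frame` warnings above typically appear when several tables that all carry a column called `total` are merged one after another on `year`: the default `.x` and `.y` suffixes get reused and the result ends up with repeated column names. One possible way to avoid this, sketched here with made-up per-country tables, is to rename `total` to the country name before merging. The object names `can`, `usa`, `aus` and the values are hypothetical; only `ind`, `year` and `total` appear in the warnings.

```r
# Toy per-country yearly averages; names and numbers are made up
can <- data.frame(year = 2002:2003, total = c(1, 2))
usa <- data.frame(year = 2002:2003, total = c(3, 4))
aus <- data.frame(year = 2002:2003, total = c(5, 6))
ind <- data.frame(year = 2002:2003, total = c(7, 8))

tables <- list(canada = can, usa = usa, australia = aus, india = ind)

# Rename `total` to the country name so no two inputs share a non-key column
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "total"] <- nm
  df
}, tables, names(tables))

# Merge all tables on `year`, one after another
wide <- Reduce(function(x, y) merge(x, y, by = "year"), renamed)
print(wide)
```

With one uniquely named column per country, the merged table has no duplicated names and can be passed to a plotting function without ambiguity.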
