Last update: Aug 07 12:51:42 PM 2015 CEST

At the prevalidation step we decide whether to accept data from a specific country for further processing. A country may provide good-quality data for some commodities and data of inadequate quality for others, so we want to estimate quality differences between commodities within a country.

Data quality is estimated by the following indicators:

  • Missing quantity or value
  • Unit value outliers

Procedure:

  1. Mark trade flows with problems.
  2. Group the data set by year, reporter and a chosen commodity level (for example hs2, hs4 or hs6).

Required functions

Trade flow testing

Missing quantity or value

Input:

  • Vector with data
  • Vector with values we treat as missing (NA, 0, etc)

Output:

  • Logical vector: TRUE where the value is treated as missing, FALSE otherwise

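# indicator: vector with data (e.g. quantity or value)
# missingCases: values treated as missing, e.g. c(NA, 0)
missingIndicator <- function(indicator, missingCases) {
  indicator %in% missingCases
}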
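A short usage sketch may help here. The quantities below are made up for illustration; the function definition is repeated only so that the snippet runs on its own.

```r
# Definition repeated from the text so the snippet is self-contained.
missingIndicator <- function(indicator, missingCases) {
  indicator %in% missingCases
}

# Hypothetical quantities for five trade flows.
qty <- c(17390, NA, 0, 1271, 329)

# Treat both NA and 0 as missing; %in% matches NA as well.
noQty <- missingIndicator(qty, c(NA, 0))
noQty
```

Because `%in%` is built on `match()`, an `NA` in the data is matched by an `NA` in `missingCases`, so no separate `is.na()` check is needed.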

Unit value outlier

We suggest using the reporter median unit value.

Input:

  • Vector with unit values
  • Function to use
  • Additional parameters for function
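The text lists only the inputs, so the following is a minimal sketch under assumptions: the name `uvDeviation` is hypothetical, and the output is assumed to be the relative deviation of each unit value from the central value, in line with the formula in the Outliers section.

```r
# uv:  numeric vector of unit values
# fun: function returning a central value (median is suggested in the text)
# ...: additional parameters passed on to fun
# Assumption: the result is the relative deviation from the central value.
uvDeviation <- function(uv, fun = median, ...) {
  center <- fun(uv, ...)
  (uv - center) / uv
}

# Hypothetical unit values, one of them missing.
uv  <- c(1.0, 1.2, 0.9, 10.0, NA)
dev <- uvDeviation(uv, median, na.rm = TRUE)
dev
```

Passing `na.rm = TRUE` through `...` keeps a missing unit value from turning the central value into `NA`; the missing element itself stays `NA` in the result.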

Calculations

##   year reporter partner     hs flow weight    qty qunit       value hs2  hs4    hs6
## 1 2009      100     804 081090    2  17390  17390     8   19964.937  08 0810 081090
## 2 2009      100     430 210320    2    329    329     8    1179.756  21 2103 210320
## 3 2009      100     840 130219    2   1271   1271     8   74006.067  13 1302 130219
## 4 2009      100     616 210390    1 326857 326857     8 1372406.708  21 2103 210390
## 5 2009      100     688 151620    2 239300 239300     8  236708.599  15 1516 151620
## 6 2009      100     380 110510    1   1000   1000     8    2603.026  11 1105 110510
##   hs2 total_rows
## 1  01      20949
## 2  02      78765
## 3  03       5500
## 4  04      88846
## 5  05      18617
## 6  06      35992

Trade flows with missing quantity

We identify which reporters provide data of insufficient quality. First, the proportion of trade flows with missing quantity is calculated for every reporter.

Four countries (Bermuda, Lesotho, Palau and Palestine) did not provide quantities at all in 2011. Countries with no more than 2% of trade flows with missing quantity are omitted from the graph.
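The per-reporter proportion and the 2% cut-off could be computed roughly as follows. This is a sketch in base R on made-up data for a single year; the reporter codes and quantities are illustrative, and `NA` and `0` are assumed to be the missing cases.

```r
# Hypothetical trade flows for one year.
flows <- data.frame(
  reporter = c(8, 8, 60, 60, 100, 100, 100, 100, 276, 276, 276),
  qty      = c(10, 20, NA, NA, 17390, 329, NA, 1271, 500, 0, 120)
)

# Mark flows with missing quantity (assumption: NA and 0 count as missing).
flows$noQty <- flows$qty %in% c(NA, 0)

# Proportion of flagged flows per reporter: the mean of a logical vector.
propMissing <- tapply(flows$noQty, flows$reporter, mean)

# Keep only reporters with more than 2% of flows lacking quantity.
toPlot <- propMissing[propMissing > 0.02]
toPlot
```

With several years in the data, the same call can take `list(flows$year, flows$reporter)` as the grouping argument.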

The following graph shows the proportion of trade flows with missing quantities for every HS heading separately. Some countries have nearly the same proportion of missing quantities across all HS headings. For example, Germany reports quantities under almost all HS headings; the exceptions are headings 50, 52 and 53, and even there the share of missing quantities is close to zero, no more than 2.44%. For the United States, by contrast, some headings have up to 30% of quantities missing.
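A breakdown by reporter and heading can be sketched as a two-way table. The data are invented, and the column name `hs2` follows the sample output shown earlier; `NA` and `0` are again assumed to be the missing cases.

```r
# Hypothetical trade flows for two reporters and two HS codes.
flows <- data.frame(
  reporter = c(276, 276, 276, 276, 842, 842, 842, 842),
  hs2      = c("50", "50", "52", "52", "50", "50", "52", "52"),
  qty      = c(10, 20, NA, 5, NA, NA, 7, 0)
)
flows$noQty <- flows$qty %in% c(NA, 0)

# Rows are reporters, columns are hs2 codes, cells are proportions missing.
propByHeading <- tapply(flows$noQty, list(flows$reporter, flows$hs2), mean)

# Largest proportion of missing quantities per reporter across headings.
worstHeading <- apply(propByHeading, 1, max)
propByHeading
```

A reporter whose row is nearly constant has a uniform level of missing quantities, while a wide spread within a row points to problems in specific headings.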

An interactive version of this plot is published in the “Tariffline no quantity” tab of the Comtrade prevalidation Shiny application.

Missing quantities across HS headings

Outliers

It was shown earlier that the median is a suitable measure of central tendency for the unit value distribution. Of the global and the reporter median unit value, the reporter median is the better choice. To compare reporters by the amount of outliers, we calculate for each reporter the median of the relative unit value differences.

Calculations

\[ \mathrm{Me}_{\text{reporter}} \left[ \frac{x_{\text{trade flow}} - \mathrm{Me}_{\text{commodity},\,\text{reporter}}}{x_{\text{trade flow}}} \right] \]

  1. Prepare data.
    1. Calculate unit value for every trade flow.
    2. Calculate median reporter unit value for every commodity.
    3. Calculate the difference between the unit value of a trade flow and the reporter median unit value.
    4. Divide the difference by the unit value of the trade flow.
  2. Assess reporters.
    1. Calculate median of proportions for each reporter.
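The steps above can be sketched in base R as follows. The data are made up, unit value is assumed to be value divided by quantity, and the commodity level is assumed to be `hs6`. The last line, which takes absolute values, is an addition that is not part of the procedure in the text; it is shown only because signed proportions are centred on zero within each commodity and tend to cancel out.

```r
# Hypothetical trade flows for two reporters and two commodities.
flows <- data.frame(
  reporter = c(100, 100, 100, 100, 276, 276, 276, 276),
  hs6      = c("081090", "081090", "081090", "210320",
               "081090", "081090", "210320", "210320"),
  qty      = c(100, 200, 50, 10, 100, 100, 20, 40),
  value    = c(100, 220, 500, 30, 120, 130, 50, 100)
)

# 1.1 Unit value for every trade flow (assumption: value / quantity).
flows$uv <- flows$value / flows$qty
# 1.2 Reporter median unit value for every commodity.
flows$uvMedian <- ave(flows$uv, flows$reporter, flows$hs6, FUN = median)
# 1.3 Difference between the flow's unit value and the reporter median.
flows$uvDiff <- flows$uv - flows$uvMedian
# 1.4 Difference as a proportion of the flow's unit value.
flows$uvProp <- flows$uvDiff / flows$uv

# 2.1 Median of proportions for each reporter, as in the formula.
reporterScore <- tapply(flows$uvProp, flows$reporter, median)

# Not in the text: the same score on absolute proportions.
reporterScoreAbs <- tapply(abs(flows$uvProp), flows$reporter, median)
```

In this toy data the signed score is zero for both reporters, while the absolute variant is larger for reporter 100, whose third flow has a unit value far above its commodity median.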