Summary

After converting the HS codes of the 2013 Eurostat trade data to FCL, 30% of records have an HS code with no FCL match, 26% of the unique FCL codes in the mapping table do not appear in the resulting trade data set, and 3% of records have multiple matches. We checked the HS ranges in the original MDB files. None of these checks explains the low rate of successful mapping: the HS codes in the new trade data sets are simply absent from the MDB mapping tables.

Disclaimer: the numbers above are based on a sample of the Eurostat data (100K records).
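
As a rough illustration of how the three shares above can be computed, the sketch below looks up each HS code in a list of [fromcode, tocode] ranges. The mapping rows, the trade codes and the function names are made up for the example, and the lookup ignores the area and flow columns that the MDB tables also carry.

```python
# Hypothetical mapping rows (fromcode, tocode, fcl); all values are invented.
MAPPING = [
    (10110000, 10119999, 1096),
    (10190000, 10199999, 1107),
    (10190000, 10190500, 1108),  # overlaps the previous range on purpose
    (20110000, 20119999, 867),
]


def match_fcl(hs, mapping):
    """Return every FCL code whose [fromcode, tocode] range contains hs."""
    return sorted({fcl for lo, hi, fcl in mapping if lo <= hs <= hi})


def mapping_diagnostics(hs_codes, mapping):
    """Shares of unmapped records, multi-match records and unused FCL codes."""
    matches = [match_fcl(hs, mapping) for hs in hs_codes]
    n = len(hs_codes)
    used_fcl = {fcl for m in matches for fcl in m}
    all_fcl = {fcl for _, _, fcl in mapping}
    return {
        "unmapped_share": sum(1 for m in matches if not m) / n,
        "multi_match_share": sum(1 for m in matches if len(m) > 1) / n,
        "unused_fcl_share": len(all_fcl - used_fcl) / len(all_fcl),
    }


# Invented trade records: one clean match, one double match, one clean match,
# and one code that falls outside every range.
trade_hs = [10110010, 10190100, 10195000, 99999999]
stats = mapping_diagnostics(trade_hs, MAPPING)
print(stats)
```

A linear scan per record is enough for a 100K sample. On the full data set an interval join or sorted ranges would likely be needed.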

We plan to recheck the numbers on the full data set, including the Tariffline data. If they are confirmed, we recommend developing approaches that generate FCL/CPC trade data with less dependency on the mapping tables from the MDB files, for example by using HS6 Comtrade data and applying split ratios to one-to-many HS6->FCL links.
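
A minimal sketch of the split-ratio idea follows. It assumes that each HS6 code has a set of FCL shares that sum to one. The codes, shares and the `split_hs6_value` helper are hypothetical, and how the ratios would be estimated is left open.

```python
# Hypothetical split ratios: HS6 code -> {FCL code: share of the HS6 value}.
# "123456" is a one-to-many link, "654321" a one-to-one link.
SPLIT_RATIOS = {
    "123456": {"A": 0.75, "B": 0.25},
    "654321": {"C": 1.0},
}


def split_hs6_value(hs6, value, ratios):
    """Distribute a trade value reported at HS6 level across its FCL codes."""
    shares = ratios.get(hs6)
    if shares is None:
        return {}  # no link known for this HS6 code
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError(f"split ratios for {hs6} do not sum to 1")
    return {fcl: value * share for fcl, share in shares.items()}


print(split_hs6_value("123456", 1000.0, SPLIT_RATIOS))
```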

Validation of HS ranges from MDB

Requirements for HS ranges

A valid HS range from the MDB map complies with the following:

  • Both codes contain only digits (4 of 885,013 records failed);
  • fromcode <= tocode (fewer than a hundred records failed);
  • HS codes conform to the two-digit heading structure (X.XX.XX… for HS chapters 1-9 once the leading zero is dropped, XX.XX.XX… for HS chapters 10+) and belong to the HS chapters of interest (32,826 of 885,013 records failed, about 3.7%).

Hands-on

Non-numeric HS codes

##   area flow                                 fromcode     tocode  fcl
## 1   59    1 15149010                              00 1514901000  271
## 2  154    1                                  26873ex    26873ex 1010
## 3  154    2                                  26873ex    26873ex 1010
## 4  173    1                                  26873ex    26873ex 1010
##   nodigs_total from_records_total
## 1            4             885013
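
The digits-only requirement can be sketched as below. This is an illustration, not the code that produced the output above. The rows are taken from the tables in this report, with the embedded whitespace of the first fromcode shortened to one space, and `has_only_digits` is a hypothetical helper.

```python
def has_only_digits(code: str) -> bool:
    """True if code is a non-empty string of ASCII digits 0-9."""
    # isascii() rules out Unicode digits that str.isdigit() would accept.
    return code.isascii() and code.isdigit()


# (area, flow, fromcode, tocode, fcl)
rows = [
    (59, 1, "15149010 00", "1514901000", 271),  # embedded whitespace
    (154, 1, "26873ex", "26873ex", 1010),       # trailing letters
    (8, 1, "4051011", "405900", 886),           # digits only
]

non_digit_rows = [
    r for r in rows
    if not (has_only_digits(r[2]) and has_only_digits(r[3]))
]
print(non_digit_rows)
```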

fromcode is greater than tocode

##    area flow  fromcode   tocode  fcl startyear
## 1     8    1   4051011   405900  886        NA
## 2   255    2 210609030 21069030 1232        NA
## 3    33    1 708900000  7089000  420        NA
## 4    85    1  40510000  4058999  886        NA
## 5    97    2   1031000  1023999  946        NA
## 6    97    2   1040000  1029099  866        NA
## 7   101    1  40510000  4058999  886        NA
## 8   256    1  10213900  1023999  946        NA
## 9   256    1  10214000  1029079  866        NA
## 10  256    2  10213900  1023999  946        NA
## 11  256    2  10214000  1029079  866        NA
## 12  174    1  19053320 19053299  110        NA
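
The range-order check can be sketched as follows, assuming the codes are compared as integers. The pairs come from the table above, plus one valid range from elsewhere in this report. The helper names are hypothetical. The second helper flags pairs whose digit counts differ, which holds for 9 of the 12 rows listed and may point to a dropped or extra digit rather than a truly inverted range.

```python
def is_reversed(fromcode: str, tocode: str) -> bool:
    """True if the range start is numerically greater than the range end."""
    return int(fromcode) > int(tocode)


def length_differs(fromcode: str, tocode: str) -> bool:
    """True if the two codes have a different number of digits."""
    return len(fromcode) != len(tocode)


pairs = [
    ("4051011", "405900"),      # reversed, 7 vs 6 digits
    ("1031000", "1023999"),     # reversed, same length
    ("19053320", "19053299"),   # reversed, same length
    ("84331100", "84331199"),   # valid range
]

reversed_flags = [is_reversed(f, t) for f, t in pairs]
length_flags = [length_differs(f, t) for f, t in pairs]
print(reversed_flags, length_flags)
```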

Two-digit structure of HS codes and chapters of interest

  1. Remove leading zeros.
  2. If the length is odd, then the chapter is in 1:9.
  3. If the length is even, then the chapter is >= 10.
  4. Check that the chapter is in the list of chapters of interest:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 33 35 38 40 41 42 43 50 51 52 53
##   notvalid_chap total_rows notvalid_chap_prop
## 1         32826     885013         0.03709098
##       area flow fromcode   tocode  fcl startyear
## 819     11    2 84331100 84331199 1306        NA
## 6411    98    2 87011000 87011999 1302        NA
## 11958  109    2 84369900 84369999 1300        NA
## 1611    17    2   310200   310249 1360        NA
## 13342  114    1 26541000 26541000  789        NA
## 13499  114    2 29198000 29198999 1293        NA
## 6165    41    2 87011000 87011999 1302        NA
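
The four steps above can be sketched as follows. The chapter list is copied from the output above. The function names are hypothetical. The parity rule assumes that a code has an even number of digits once its leading zero is restored, so codes of irregular length would be assigned to the wrong chapter.

```python
# Chapters of interest, as printed in the output above.
CHAPTERS_OF_INTEREST = set(range(1, 25)) | {
    33, 35, 38, 40, 41, 42, 43, 50, 51, 52, 53,
}


def hs_chapter(code: str):
    """Infer the HS chapter from the parity of the code length."""
    stripped = code.lstrip("0")          # step 1: remove leading zeros
    if not stripped:
        return None                      # empty or all-zero code
    width = 1 if len(stripped) % 2 else 2  # steps 2-3: odd -> 1-9, even -> 10+
    return int(stripped[:width])


def in_chapters_of_interest(code: str) -> bool:
    """Step 4: check the inferred chapter against the list."""
    return hs_chapter(code) in CHAPTERS_OF_INTEREST


for code in ["4051011", "10213900", "84331100", "310200", "26541000"]:
    print(code, hs_chapter(code), in_chapters_of_interest(code))
```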

Possible additional directions to investigate