What share of HS ranges in the map are wider than a single HS6 code?
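library(dplyr)
library(stringr)
data("hsfclmap2", package = "hsfclmap")
# Keep the leading digits (up to six) of each range bound as its HS6 code
hsfcl <- hsfclmap2 %>%
  mutate(hs6from = str_extract(fromcode, "^\\d{1,6}"),
         hs6to = str_extract(tocode, "^\\d{1,6}"))
# Share of ranges whose two bounds fall in different HS6 codes
hsfcl %>%
  summarize(sum(hs6from != hs6to)/n())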
## sum(hs6from != hs6to)/n()
## 1 0.171674
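To see what this comparison does, here is a minimal sketch on a toy table. The codes in `toy` are made up for illustration and are not taken from hsfclmap2; `mean(wider)` is used as a shorthand that should give the same value as `sum(...) / n()`.

```r
library(dplyr)
library(stringr)

# Hypothetical ranges: only the second one crosses an HS6 boundary
toy <- data.frame(
  fromcode = c("010110000", "020130000", "100190100"),
  tocode   = c("010110999", "020220999", "100190999"),
  stringsAsFactors = FALSE
)

toy_hs6 <- toy %>%
  mutate(hs6from = str_extract(fromcode, "^\\d{1,6}"),  # first six digits
         hs6to   = str_extract(tocode, "^\\d{1,6}"),
         wider   = hs6from != hs6to)                    # TRUE if bounds differ at HS6

# Share of ranges that span more than one HS6 code
toy_share <- toy_hs6 %>%
  summarize(share = mean(wider)) %>%
  pull(share)
```

Here only the range from 020130 to 020220 is flagged, so the share is one third.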
What share of HS-to-FCL mapping records have a one-to-many link? Only the left-hand side of the HS range (hs6from) is taken into account.
fclinhs <- hsfcl %>%
  select(mdbyear, area, validyear, flow, hs6from, fcl) %>%
  group_by(mdbyear, area, validyear, flow, hs6from) %>%
  mutate(fclinhs = length(unique(fcl))) %>%
  ungroup()
# Total share
fclinhs %>%
  summarise(sum(fclinhs > 1) / n())
## # A tibble: 1 × 1
## `sum(fclinhs > 1)/n()`
## <dbl>
## 1 0.2619757
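A minimal sketch of the same counting on a toy table. The values in `toy_links` are hypothetical, and `n_distinct()` stands in for `length(unique())`.

```r
library(dplyr)

# Hypothetical links: HS6 code 010110 maps to two FCL codes in area 1
toy_links <- data.frame(
  area    = c(1, 1, 1, 2, 2),
  flow    = c(1, 1, 1, 1, 1),
  hs6from = c("010110", "010110", "020130", "010110", "010110"),
  fcl     = c(1096, 1097, 867, 1096, 1096),
  stringsAsFactors = FALSE
)

# Attach to every record the number of distinct FCL codes in its group
toy_counts <- toy_links %>%
  group_by(area, flow, hs6from) %>%
  mutate(fclinhs = n_distinct(fcl)) %>%
  ungroup()

# Share of records that belong to a one-to-many group
toy_one2many <- mean(toy_counts$fclinhs > 1)
```

Two of the five toy records sit in a group with more than one FCL code, so the share is 0.4. The repeated link in area 2 does not count as one-to-many because both rows point to the same FCL code.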
# By year
library(ggplot2)
fclinhs %>%
  group_by(mdbyear) %>%
  summarise(one2manyprop = sum(fclinhs > 1) / n()) %>%
  ggplot(aes(mdbyear, one2manyprop)) +
  geom_path(stat = "identity")
What share of links in the map are duplicates if we ignore the year of map production (mdbyear)?
hsfclmap2 %>%
  select(validyear, area, flow, fromcode, tocode, fcl) %>%
  distinct() %>%
  summarize(dupl_prop = 1 - n() / nrow(hsfclmap2))
## dupl_prop
## 1 0.8505714
About 85% of the records are duplicates, so we can drop them.
hsfcluniq <- hsfcl %>%
  select(-starts_with("mdb")) %>%
  distinct()
dim(hsfcluniq)
## [1] 885013 8
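The deduplication step can be checked on a small example. This is only a sketch: `toy_map` and its values are hypothetical, and it has fewer columns than hsfclmap2.

```r
library(dplyr)

# Hypothetical map: the same link is repeated across three production years
toy_map <- data.frame(
  mdbyear  = c(2010, 2011, 2012, 2010),
  area     = c(1, 1, 1, 2),
  fromcode = c("010110000", "010110000", "010110000", "020130000"),
  fcl      = c(1096, 1096, 1096, 867),
  stringsAsFactors = FALSE
)

# Drop the production-year column, then keep one copy of each remaining row
toy_uniq <- toy_map %>%
  select(-starts_with("mdb")) %>%
  distinct()

# Share of rows that were duplicates once mdbyear is ignored
toy_dupl_prop <- 1 - nrow(toy_uniq) / nrow(toy_map)
```

Four toy rows collapse to two, so the duplicate share is 0.5.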
What share of HS6-to-FCL links are one-to-many now?
hsfcluniq %>%
  select(validyear, area, flow, hs6from, fcl) %>%
  group_by(validyear, area, flow, hs6from) %>%
  summarize(fclnumb = length(unique(fcl))) %>%
  ungroup() %>%
  summarize(sum(fclnumb > 1) / n())
## # A tibble: 1 × 1
## `sum(fclnumb > 1)/n()`
## <dbl>
## 1 0.1020934
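Note that this share is computed per key, since summarize leaves one row per combination of validyear, area, flow and hs6from. The earlier share was computed per record, since mutate keeps every row, so the two figures are not measured on the same unit. A minimal sketch of the difference, on hypothetical data with a single grouping column:

```r
library(dplyr)

# Hypothetical links: one HS6 code with three FCL codes, two with one each
toy_links <- data.frame(
  hs6from = c("010110", "010110", "010110", "020130", "030211"),
  fcl     = c(1096, 1097, 1098, 867, 1501),
  stringsAsFactors = FALSE
)

# Per-record share: mutate keeps one row per record
record_share <- toy_links %>%
  group_by(hs6from) %>%
  mutate(fclnumb = n_distinct(fcl)) %>%
  ungroup() %>%
  summarize(share = mean(fclnumb > 1)) %>%
  pull(share)

# Per-key share: summarize collapses each HS6 code to one row
key_share <- toy_links %>%
  group_by(hs6from) %>%
  summarize(fclnumb = n_distinct(fcl)) %>%
  ungroup() %>%
  summarize(share = mean(fclnumb > 1)) %>%
  pull(share)
```

In the toy data three of five records are in a one-to-many group (0.6), but only one of three HS6 codes is one-to-many (about 0.33).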