This project is part of a course in data science and business analytics. We chose to investigate Agora, a darkweb marketplace that was very popular before it closed in 2015. We hope our analysis will contribute intelligence about this specific platform and help in understanding how platforms supporting illegal trade are organized.
For our project, we used a dataset found on Kaggle called “Dark net marketplace drug data”.
It was built from an HTML rip made by a Reddit user called “usheep”, who was blackmailing the vendors on the website by threatening to expose them to the police if they didn’t meet his demands. We don’t know how this ended: he posted the HTML rip, and there is no further information about what happened to him or to his demands.
The dataset is ripped from a dark/deep web marketplace called Agora and covers the years 2014 and 2015. It contains offers for drugs, weapons, books, services and other goods. About 100’000 items are listed for sale. The marketplace was shut down a few months after this data release; we don’t know whether this is related to the Reddit user.
It is organized by:
Vendor: The seller
Category: Where in the marketplace the item falls under
Item: The title of the listing
Description: The description of the listing
Price: Cost of the item (averaged across any duplicate listings between 2014 and 2015)
Origin: Where the item is listed to have shipped from
Destination: Where the item is listed to be shipped to (blank means no information was provided, most likely worldwide; the dataset author did not fill in “worldwide” for blanks so as not to make assumptions)
Rating: The rating of the seller (a rating of [0 deals] or anything else with “deals” in it means there is no concrete rating, as the number of deals is too small for a rating to be displayed)
Remarks: Only remark options are blank, or “Average price may be skewed outliar > .5 BTC found” which is pretty self explanatory.
As we both took classes on criminality, web criminality and cybersecurity before this one, we thought it would be interesting to link those classes together by working on a criminality-related project.
Furthermore, we often read or hear about the dark/deep web and the various things that can be found there; finding this dataset gave us the opportunity to explore the reality of a marketplace selling mostly illegal items.
While the darkweb is a set of unreferenced websites that can provide anonymity and freedom of speech, it is also heavily used by criminals to meet, make deals, exchange information and offer illegal products and services.
As transactions are anonymous, it has been well documented that those places share common features in the development of specific governance modes, such as reputation systems, in order to ensure a certain level of confidence among users. We can see such a system in the ratings present in the Agora dataset.
Although identifying the individuals behind the transactions is very difficult, analyzing the dynamics at work in these marketplaces has a lot to offer in terms of information and intelligence about criminal activity in general, and could be used as an indicator of international trafficking flows.
Our project aims to give insights into the composition of Agora as a marketplace: which products are most available on the platform, where they come from and where they are shipped. It also investigates how criminal activity is distributed: are the majority of offers the work of a few prolific vendors? Does this organization depend on the categories and types of products offered? Are the most prolific vendors specialized actors in a sub-market, or do they diversify their scope of activity?
Thus, the analysis is organized around these 3 dimensions:
Investigating the types of products available on Agora
Investigating the geographical distribution of the products offered on Agora
Investigating the distribution of activity among Agora’s vendors
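The third dimension (whether a few prolific vendors account for most offers) can be approached with a cumulative share of offers per vendor. The following is only a minimal sketch in base R on made-up data: the `offers` data frame and its vendor names are hypothetical, and only the `Vendor` column name comes from the dataset.

```r
# Hypothetical toy data: one row per offer, as in the main dataset
offers <- data.frame(
  Vendor = c("a", "a", "a", "a", "a", "a", "b", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Number of offers per vendor, most prolific first
offers_per_vendor <- sort(table(offers$Vendor), decreasing = TRUE)

# Cumulative share of all offers covered by the top-k vendors
cum_share <- cumsum(as.vector(offers_per_vendor)) / nrow(offers)
names(cum_share) <- names(offers_per_vendor)

cum_share
```

A curve that rises quickly (here the first vendor alone covers 60% of the toy offers) would indicate that activity is concentrated among a few vendors.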
To pursue our research questions, a substantial amount of work was required to clean the dataset. In particular, the declared product categories had to be decomposed into levels of increasing precision (branches) describing the content of the offer, which we named, from coarsest to finest: “Category”, “Type1” and “Type2”.
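To illustrate the idea of decomposing a category path into levels, here is a minimal base R sketch. It is not the code used in this report (which relies on regular expressions): it simply splits on "/" and, as an assumption of this sketch, leaves absent levels as NA. The three example strings are illustrative.

```r
# Example category strings in the "Category/Type1/Type2" form
category <- c("Drugs/Cannabis/Weed", "Drugs/Benzos", "Jewelry")

# Split each path on "/" and pad missing levels with NA (at most 3 levels kept)
parts <- strsplit(category, "/", fixed = TRUE)
levels_mat <- t(vapply(
  parts,
  function(p) c(p, rep(NA_character_, 3))[1:3],
  character(3)
))

split_categories <- data.frame(
  Category = levels_mat[, 1],
  Type1 = levels_mat[, 2],
  Type2 = levels_mat[, 3],
  stringsAsFactors = FALSE
)
split_categories
```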
What took most of our time was recoding the declared destinations and origins of offers, because they were free-text fields filled in by the vendors. They contained many spelling mistakes and non-standard ways of declaring locations. Many indicated areas of different scopes (cities, countries, continents, etc.) in varying forms (country code, full English name, French name, etc.). To perform this cleaning step, we mainly used regular expressions and the “unnest_tokens” function from the tidytext package. An automatic way of making these corrections would have been nice, but the data was so messy that we had to proceed iteratively and by exploration. It was also a good opportunity to get familiar with the dataset. After this first cleaning step, we checked for matches against the fixed, standardized lists of countries and regions provided by the “countrycode” and “worldmap” packages.
We decided to set aside the more detailed variables “Item” and “Item description”, because they were entered by the vendors in a totally unstandardized manner: they contain very messy text data and would have required a lot of preprocessing and cleaning to become useful for our analysis.
We also chose not to focus on prices and ratings. Analyzing prices across offers would have required mining the item descriptions to find the quantities contained in each deal and obtain commensurable data; such a process would have taken more time than we could allow for the data cleaning phase. As for ratings, all were very high, so we figured this would not be the most interesting dimension for a first exploration of this dataset.
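To give an idea of what mining quantities would involve, here is a rough base R sketch that extracts a weight from a listing title and converts it to grams. The titles, the regular expression and the unit table are all assumptions made for illustration; real Agora titles are far messier, and this step was not carried out in our project.

```r
# Hypothetical listing titles
titles <- c("5g Amnesia Haze", "MDMA crystals 0.5 g", "100 x 25mg pills", "Counterfeit watch")

# Capture the first "<number><unit>" occurrence, with units mg, kg or g
pattern <- "([0-9]+\\.?[0-9]*)\\s*(mg|kg|g)\\b"
matches <- regmatches(titles, regexec(pattern, titles, perl = TRUE))

# Each match holds: full match, number, unit (empty when nothing matched)
quantity <- vapply(matches, function(m) if (length(m) == 3) as.numeric(m[2]) else NA_real_, numeric(1))
unit <- vapply(matches, function(m) if (length(m) == 3) m[3] else NA_character_, character(1))

# Convert everything to grams so that prices per gram become comparable
to_grams <- c(mg = 0.001, g = 1, kg = 1000)
grams <- unname(quantity * to_grams[unit])

data.frame(title = titles, grams = grams)
```

Note that the multiplier in a title such as "100 x 25mg" is ignored by this sketch, which is one example of why a reliable extraction would take considerable effort.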
Our first thoughts were:
-109 categories is too many
-We can remove categories with a frequency of 1
-We can regroup the less frequent categories
-There are many categories we can regroup
In this part, we regrouped the 109 categories at different levels so as to keep only a few of them and use them more easily in the analysis.
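As an illustration of the "regroup less frequent categories" idea, rare values can be relabelled under a common label. This is only a sketch in base R on made-up values; the threshold `min_n` and the label "Other" are assumptions, and the code in this report removes single-occurrence categories instead of relabelling them.

```r
# Hypothetical category column with a long tail of rare values
category <- c(rep("Drugs", 6), rep("Services", 3), "Jewelry", "Chemicals")

# Count occurrences and relabel every category seen fewer than `min_n` times
min_n <- 2
counts <- table(category)
rare <- names(counts)[counts < min_n]
category_lumped <- ifelse(category %in% rare, "Other", category)

table(category_lumped)
```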
## using regex for the first group
dark_market_data_cat <- dark_market_data %>% mutate(Group=gsub('^(Drugs|Services|Data|Info|Forgeries|Electronics|Weapons|Counterfeits|Tobacco|Chemicals|Drug paraphernalia){1}(/.*)','\\1',Category)) %>% mutate(Spec1=gsub('^(Drugs|Services|Data|Info|Forgeries|Electronics|Weapons|Counterfeits|Tobacco|Chemicals|Drug paraphernalia){1}(/.*)','\\2',Category))
dark_market_data_cat %>% count(Spec1, sort=TRUE)
#We still have 106 different categories
#using regexes to split the following levels (Spec1 and Spec2)
#Spec2 keeps the first level of Spec1; Spec3 starts as a copy of Spec1,
#then Spec4 keeps what follows the first level and Spec3 keeps the first level itself
dark_market_data_cat <- dark_market_data_cat %>%
mutate(Spec2=gsub('(/[^/]*)(.*)','\\1',Spec1)) %>%
mutate(Spec3=Spec1) %>%
mutate(Spec4=gsub('(/[^/]*)(.*)','\\2',Spec3)) %>%
mutate(Spec3=gsub('(/[^/]*)(.*)','\\1',Spec3))
#checking results
dark_market_data_cat %>% count(Spec1, sort = TRUE)
dark_market_data_cat %>% count(Spec2, sort = TRUE)
dark_market_data_cat %>% count(Spec3, sort = TRUE)
dark_market_data_cat %>% count(Spec4,sort=TRUE)
#dropping everything that is useless
dark_market_data_cat <- dark_market_data_cat %>% mutate(Spec1=gsub('/','',Spec1),Spec2=gsub('/','',Spec2),Spec3=gsub('/','',Spec3),Spec4=gsub('/','',Spec4)) %>% mutate(Spec1=Spec3,Spec2=Spec4)
drop.cols <- c('Spec3','Spec4')
dark_market_data_cat <- dark_market_data_cat %>% select(-one_of(drop.cols))
dark_market_data_cat %>% transmute(Group,Spec1,Spec2)
dark_market_data <- dark_market_data_cat
### Some more Categories adjustments
#display per value freq:
dark_market_data %>% count(Category,sort=TRUE) %>% tail()
#remove categories with only one occurrence (ungroup afterwards so later summaries are not computed per category)
dark_market_data <- dark_market_data %>% group_by(Category) %>% filter(n() > 1) %>% ungroup()
#check
dark_market_data %>% count(Category,sort=TRUE) %>% tail()
Many origin and destination values listed a few countries and then added an exclusion, for example “no australia”. About 2848 values contained such exceptions, so we tried to filter them out.
We could see that there are more destinations than origins, so we knew the cleaning workload would probably be heavier for destinations.
#getting all the values with exceptions
pattern <- '(exc|exept)'
dark_market_data <- dark_market_data %>% mutate(excepts_dest = grepl(pattern,tolower(Destination)))
dark_market_data <- dark_market_data %>% mutate(excepts_orig = grepl(pattern,tolower(Origin)))
#find remaining "no .... country"
dark_market_data %>% transmute(no=grepl('no .*',Destination),tolower(Destination)) %>% filter(no==TRUE)
dark_market_data <- dark_market_data %>% mutate(no_dest = grepl('no .*',tolower(Destination)))
dark_market_data %>% transmute(no=grepl('no .*',Origin),tolower(Origin)) %>% filter(no==TRUE)
dark_market_data <- dark_market_data %>% mutate(no_orig = grepl('no .*',tolower(Origin)))
## Then filter every TRUE on both cols with excepts
dark_market_data_old <- dark_market_data
dark_market_data_old %>% filter(excepts_orig == TRUE | excepts_dest == TRUE | no_orig == TRUE | no_dest == TRUE) %>% transmute(Destination,Origin,ID,excepts_dest,excepts_orig)
dark_market_data <- dark_market_data %>% filter(excepts_orig == FALSE & excepts_dest == FALSE & no_orig == FALSE & no_dest == FALSE)
#save what has been set aside in case we want to investigate or clean it further
dark_market_data_filtered_out <- dark_market_data_old %>% filter(excepts_orig == TRUE | excepts_dest == TRUE | no_orig == TRUE | no_dest == TRUE)
## *** SAVING THE SET-ASIDE ROWS ***
write_csv(dark_market_data_filtered_out, path = "../dark_market_filtered_out.csv")
# number of observations filtered out:
nrow(dark_market_data_old)-nrow(dark_market_data)
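One caveat of the 'no .*' pattern is that it matches any string containing the letters "no" followed by a space, including words that merely end in "no". A stricter variant, shown here only as a sketch on made-up strings (it is not what was run for this report), anchors "no" on a word boundary.

```r
# Hypothetical destination strings
destination <- c("worldwide no australia", "torino italy", "usa", "NO usa")

# Pattern used in the report: also flags "torino italy"
loose <- grepl("no .*", tolower(destination))

# Stricter sketch: "no" must be a whole word followed by another word
strict <- grepl("\\bno\\s+\\w+", tolower(destination), perl = TRUE)

data.frame(destination, loose, strict)
```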
### SOME FIRST INSIGHTS
dark_market_data %>% summarise(n_obs=length(unique(ID)),n_cat=length(unique(Category)),n_vendors=length(unique(Vendor)),destinations=length(unique(Destination)),origins = length(unique(Origin))) %>% kable()
Duplicates within the same line must be spotted before unnesting the words, because unnest_tokens separates words on whitespace. Otherwise a record such as “switzerland switzerland” would be counted twice at the end of the process; likewise, multi-word names must be joined into a single word so that they are not split.
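The double-counting problem can be reproduced on a toy example. The sketch below uses base R `strsplit()` as a stand-in for whitespace tokenisation (an assumption made to keep the example dependency-free; it is only similar in spirit to `unnest_tokens`), on two made-up origin strings.

```r
# Two hypothetical origin strings; the first repeats the same country
origin <- c("switzerland switzerland", "hong kong")

# Naive whitespace tokenisation: "switzerland" is counted twice, "hong kong" is split
naive_tokens <- unlist(strsplit(origin, " ", fixed = TRUE))
table(naive_tokens)

# Collapse known duplicates and multi-word names first, then tokenise
fixed_origin <- gsub("switzerland switzerland", "switzerland", origin, fixed = TRUE)
fixed_origin <- gsub("hong kong", "hongkong", fixed_origin, fixed = TRUE)
tokens <- unlist(strsplit(fixed_origin, " ", fixed = TRUE))
table(tokens)
```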
dark_market_data %>% count(Origin,sort=TRUE) %>% kable()
dark_market_data %>% count(Destination,sort=TRUE) %>% kable()
### list of countries composed by two words /duplicates to join, that will lose meaning after unnest
duplicates <- list(c('middle east','middleeast'),c('hong kong','hongkong'), c('netherlands netherlands','netherlands'),c('germany germany','germany'),c('uk uk','uk'),c('canada canada','canada'),c('eu schengen','eu'),c('united states','unitedstates'),c('s. america','southamerica'),c('usa usa','usa'),c('switzerland switzerland','switzerland'),c('world wide','worldwide'),c('worldwide international','worldwide'),c('international worldwide','worldwide'),c('new zealand','newzealand'),c('e.u. countries','eu'),c('everywhere worldwide','worldwide'),c('all','worldwide'),c('every where','worldwide'),c('worldwide any destination','worldwide'),c('* w o r l d w i d e *','worldwide'),c('planet earth','worldwide'),c('rest of the world','worldwide'),c('united kingdom','uk'))
## convert all destinations and origins to lower
dark_market_data <- dark_market_data %>% mutate(Origin = tolower(Origin),Destination = tolower(Destination))
dark_market_data %>% count(Origin,sort=TRUE) %>% kable()
dark_market_data %>% count(Destination,sort=TRUE) %>% kable()
## apply conversion to list of duplicates
for (pair in duplicates) {
#pair[1] is the pattern to look for, pair[2] its replacement
dark_market_data$Origin <- gsub(pair[1], pair[2], dark_market_data$Origin)
dark_market_data$Destination <- gsub(pair[1], pair[2], dark_market_data$Destination)
}
dark_market_data %>% count(Origin,sort=TRUE) %>% kable()
dark_market_data %>% count(Destination,sort=TRUE) %>% kable()
### unnest the words contained in destination and origin, add them to new dataset
df_origin <- dark_market_data %>% unnest_tokens(origin,Origin,drop=FALSE)
df_destination <- dark_market_data %>% unnest_tokens(destination,Destination,drop=FALSE)
df_origin %>% tail() %>% kable()
df_destination %>% tail() %>% kable()
### anti_join with a custom dataframe of stop_words
stop_words %>% kable()
custom_stop_words <- stop_words %>% filter(word != 'us' & word != 'state' & word != 'states')
custom_stop_words %>% kable()
stop_words_origin <- custom_stop_words %>% mutate(origin=word)
stop_words_destination <- custom_stop_words %>% mutate(destination=word)
df_origin <- df_origin %>% anti_join(stop_words_origin,by='origin')
df_destination <- df_destination %>% anti_join(stop_words_destination,by='destination')
### check and comparison of what has been done for now
df_origin %>% count(Origin,sort=TRUE) %>% kable()
df_origin %>% count(origin, sort = TRUE) %>% kable()
df_destination %>% count(Destination,sort=TRUE) %>% kable()
df_destination %>% count(destination, sort = TRUE) %>% kable()
We then had to normalize the data in the origin and destination columns in order to produce graphs and maps in the analysis part of the report. We don’t provide that source code in this report because it is quite long and not that interesting: it mostly removes typos and words that aren’t countries, and finally formats everything the same way for later use.
The whole process can be found in the source code linked with this report.
Now that the cleaning part is done, we saved our results in different dataframes, which are presented just below.
This part took us a lot of time because users were not very careful when entering information on Agora, but the cleaned data is now really practical to use.
After applying the necessary modifications, we were ready to pursue our project’s goals and begin the exploratory analysis of the cleaned dataset. The cleaned data is composed of a main dataset, where each observation corresponds to a unique offer posted on the Agora marketplace, and 2 other datasets composed of the origins and destinations mentioned in the offers, each record corresponding to a unique declared location (origin/destination) for a specific offer.
The main dataset, which we’ll call the “dark market dataset”, is composed of 106337 observations. The dataset of declared origins is composed of 99988 mentions of places and the dataset of declared destinations is composed of 61707 mentions, which matches the totals of the origin and destination tables shown further down.
The table below shows the list of remaining variables and the number of distinct values they’re composed of.
| ID | Vendor | Category | Item | Item Description | Price | Rating | Type1 | Type2 | Offer_ID | dest_list | orig_list |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 106337 | 3166 | 15 | 104348 | 67997 | 99320 | 478 | 46 | 62 | 106337 | 130 | 153 |
As we can see, the numbers of distinct items and Offer_IDs are very similar, partly because the provider of the dataset mentioned having removed duplicates; it could also mean that each offer describes its item differently or concerns an item of a different nature every time. The descriptions of those items also vary a lot: there are roughly 1.5 distinct items per distinct description (104348 items for 67997 descriptions).
The following tables show the different places declared as origins and destinations and their respective counts:
| clean_destination | n |
|---|---|
| africa | 24 |
| asia | 101 |
| australia | 4743 |
| austria | 10 |
| belgium | 41 |
| brazil | 3 |
| canada | 1316 |
| china | 7 |
| denmark | 34 |
| europe | 6486 |
| finland | 63 |
| france | 231 |
| germany | 1122 |
| Grenada | 2 |
| Hong Kong SAR China | 26 |
| hungary | 11 |
| india | 9 |
| internet | 15 |
| iraq | 2 |
| ireland | 110 |
| israel | 19 |
| italy | 4 |
| Japan | 7 |
| luxembourg | 4 |
| mexico | 9 |
| mississippi | 2 |
| netherlands | 76 |
| New Zealand | 230 |
| norway | 198 |
| oceania | 61 |
| other | 428 |
| Philippines | 21 |
| poland | 8 |
| scandinavia | 244 |
| Singapore | 7 |
| spain | 10 |
| sweden | 379 |
| switzerland | 313 |
| thailand | 19 |
| uk | 3607 |
| usa | 18363 |
| worldwide | 23342 |
| clean_origin | n |
|---|---|
| africa | 22 |
| argentina | 85 |
| asia | 110 |
| australia | 8860 |
| austria | 252 |
| belarus | 9 |
| belgium | 1171 |
| belize | 2 |
| bolivia | 24 |
| brazil | 20 |
| cambodia | 2 |
| canada | 5518 |
| Cayman Islands | 2 |
| china | 4186 |
| croatia | 4 |
| czech republic | 5 |
| czech republicrepublic | 273 |
| denmark | 383 |
| Dominican Republic | 9 |
| estonia | 4 |
| europe | 5082 |
| fiji | 11 |
| finland | 67 |
| france | 844 |
| germany | 7685 |
| guatemala | 5 |
| Hong Kong SAR China | 417 |
| hungary | 8 |
| india | 1122 |
| internet | 4010 |
| ireland | 269 |
| israel | 19 |
| italy | 279 |
| japan | 4 |
| Japan | 7 |
| latvia | 9 |
| lithuania | 6 |
| luxembourg | 2 |
| mexico | 105 |
| morocco | 3 |
| netherlands | 6701 |
| New Zealand | 180 |
| North America | 15 |
| northamerica | 2 |
| norway | 336 |
| oceania | 61 |
| other | 137 |
| pakistan | 55 |
| panama | 8 |
| peru | 7 |
| Philippines | 278 |
| poland | 111 |
| romania | 2 |
| scandinavia | 55 |
| serbia | 2 |
| seychelles | 4 |
| singapore | 14 |
| Singapore | 7 |
| slovakia | 9 |
| South Africa | 177 |
| spain | 442 |
| St. Vincent & Grenadines | 2 |
| swaziland | 2 |
| sweden | 1059 |
| switzerland | 451 |
| thailand | 66 |
| Thailand | 3 |
| uk | 11346 |
| ukraine | 127 |
| usa | 34962 |
| worldwide | 2472 |
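The origin table still contains a few labels that differ only by letter case (for example `japan` and `Japan`). A possible way to merge them, sketched here in base R with the counts copied from the table above, is to lower-case the labels and sum the counts; this extra step is a suggestion and was not part of our cleaning.

```r
# Counts copied from the clean_origin table for labels that differ only by case
origin_counts <- data.frame(
  clean_origin = c("japan", "Japan", "singapore", "Singapore", "thailand", "Thailand"),
  n = c(4, 7, 14, 7, 66, 3),
  stringsAsFactors = FALSE
)

# Lower-case the labels and sum the counts that now share a label
origin_counts$clean_origin <- tolower(origin_counts$clean_origin)
merged <- aggregate(n ~ clean_origin, data = origin_counts, FUN = sum)
merged
```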
Here are the different categories present in our cleaned dataset.
| Category | Type1_list | Type2_list |
|---|---|---|
| Chemicals | Chemicals | Chemicals |
| Counterfeits | Accessories, Clothing, Electronics, Money, Watches | |
| Data | Accounts, Pirated, Software | |
| Drug paraphernalia | Containers, Grinders, Injecting equipment, Paper, Pipes, Scales, Stashes | Filters, Needles, Other, Syringes |
| Drugs | Barbiturates, Benzos, Cannabis, Dissociatives, Ecstasy, Opioids, Other, Prescription, Psychedelics, RCs, Steroids, Stimulants, Weight loss | 2C, 5-MeO, Buprenorphine, Cocaine, Codeine, Concentrates, Dihydrocodeine, DMT, Edibles, Fentanyl, GBL, GHB, Hash, Heroin, Hydrocodone, Ketamine, LSD, MDA, MDMA, Mephedrone, Mescaline, Meth, Morphine, Mushrooms, MXE, NB, Opium, Other, Others, Oxycodone, PCP, Pills, Prescription, Salvia, Seeds, Shaketrim, Speed, Spores, Synthetics, Weed |
| Electronics | Electronics | Electronics |
| Forgeries | Forgeries, Other, Physical documents, Scans | Forgeries, Photos |
| Info | eBooks | AliensUFOs, Anonymity, Doomsday, Drugs, Economy, IT, Making money, Other, Philosophy, Politics, Psychology, RelationshipsSex, Science |
| Information/eBooks | InformationeBooks | Information |
| Information/Guides | InformationGuides | Information |
| Jewelry | Jewelry | Jewelry |
| Other | Other | Other |
| Services | Advertising, Hacking, Money, Other, Travel | |
| Tobacco | Paraphernalia, Smoked | |
| Weapons | Ammunition, Fireworks, Lethal firearms, Melee, Non-lethal firearms | |
| Category | Type1 | Type2_list |
|---|---|---|
| Chemicals | Chemicals | Chemicals |
| Counterfeits | Accessories | |
| Counterfeits | Clothing | |
| Counterfeits | Electronics | |
| Counterfeits | Money | |
| Counterfeits | Watches | |
| Data | Accounts | |
| Data | Pirated | |
| Data | Software | |
| Drug paraphernalia | Containers | |
| Drug paraphernalia | Grinders | |
| Drug paraphernalia | Injecting equipment | Filters, Needles, Other, Syringes |
| Drug paraphernalia | Paper | |
| Drug paraphernalia | Pipes | |
| Drug paraphernalia | Scales | |
| Drug paraphernalia | Stashes | |
| Drugs | Barbiturates | |
| Drugs | Benzos | |
| Drugs | Cannabis | Concentrates, Edibles, Hash, Seeds, Shaketrim, Synthetics, Weed |
| Drugs | Dissociatives | GBL, GHB, Ketamine, MXE, Other, PCP |
| Drugs | Ecstasy | MDA, MDMA, Other, Pills |
| Drugs | Opioids | Buprenorphine, Codeine, Dihydrocodeine, Fentanyl, Heroin, Hydrocodone, Morphine, Opium, Other, Oxycodone |
| Drugs | Other | |
| Drugs | Prescription | |
| Drugs | Psychedelics | 2C, 5-MeO, DMT, LSD, Mescaline, Mushrooms, NB, Other, Others, Salvia, Spores |
| Drugs | RCs | |
| Drugs | Steroids | |
| Drugs | Stimulants | Cocaine, Mephedrone, Meth, Prescription, Speed |
| Drugs | Weight loss | |
| Electronics | Electronics | Electronics |
| Forgeries | Forgeries | Forgeries |
| Forgeries | Other | |
| Forgeries | Physical documents | |
| Forgeries | Scans | Photos |
| Info | eBooks | AliensUFOs, Anonymity, Doomsday, Drugs, Economy, IT, Making money, Other, Philosophy, Politics, Psychology, RelationshipsSex, Science |
| Information/eBooks | InformationeBooks | Information |
| Information/Guides | InformationGuides | Information |
| Jewelry | Jewelry | Jewelry |
| Other | Other | Other |
| Services | Advertising | |
| Services | Hacking | |
| Services | Money | |
| Services | Other | |
| Services | Travel | |
| Tobacco | Paraphernalia | |
| Tobacco | Smoked | |
| Weapons | Ammunition | |
| Weapons | Fireworks | |
| Weapons | Lethal firearms | |
| Weapons | Melee | |
| Weapons | Non-lethal firearms | |
And here is the number of distinct values of each variable within each group:
| Category | Type1 | Type2 | Vendor | dest_list | orig_list | Item | Offer_ID |
|---|---|---|---|---|---|---|---|
| Drugs | 13 | 41 | 2923 | 121 | 143 | 88573 | 89697 |
| Services | 5 | 1 | 322 | 10 | 30 | 2557 | 2642 |
| Counterfeits | 5 | 1 | 102 | 8 | 25 | 2316 | 2367 |
| Info | 1 | 13 | 88 | 6 | 11 | 2023 | 2169 |
| Data | 3 | 1 | 145 | 9 | 21 | 1910 | 2118 |
| Other | 1 | 1 | 347 | 15 | 23 | 1402 | 1425 |
| Forgeries | 4 | 3 | 108 | 6 | 20 | 1018 | 1051 |
| Information/Guides | 1 | 1 | 76 | 5 | 10 | 908 | 927 |
| Information/eBooks | 1 | 1 | 67 | 5 | 10 | 895 | 918 |
| Drug paraphernalia | 7 | 5 | 80 | 9 | 13 | 838 | 840 |
| Weapons | 5 | 1 | 81 | 16 | 24 | 655 | 656 |
| Electronics | 1 | 1 | 123 | 12 | 16 | 594 | 599 |
| Tobacco | 2 | 1 | 40 | 6 | 13 | 383 | 420 |
| Jewelry | 1 | 1 | 24 | 3 | 12 | 415 | 418 |
| Chemicals | 1 | 1 | 18 | 6 | 10 | 90 | 90 |
The categories with the highest numbers of offers are Drugs (by far), followed by Services, Counterfeits, Info and Data in much more moderate proportions.
It would be interesting to normalize the numbers of distinct vendors, destinations and origins by the number of offers in each category (e.g. express them as percentages) to see whether trends differ between categories.
For all categories, we can observe that more distinct origins than distinct destinations are declared. It would be interesting to investigate why.
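The normalization suggested above can be sketched as follows, using the values of three categories copied from the per-category table. The derived column names (`vendors_per_100`, etc.) are ours and purely illustrative.

```r
# Values copied from the per-category table for three categories
by_category <- data.frame(
  Category = c("Drugs", "Services", "Chemicals"),
  Vendor = c(2923, 322, 18),
  dest_list = c(121, 10, 6),
  orig_list = c(143, 30, 10),
  Offer_ID = c(89697, 2642, 90),
  stringsAsFactors = FALSE
)

# Distinct vendors, destinations and origins per 100 offers
by_category$vendors_per_100 <- 100 * by_category$Vendor / by_category$Offer_ID
by_category$dest_per_100 <- 100 * by_category$dest_list / by_category$Offer_ID
by_category$orig_per_100 <- 100 * by_category$orig_list / by_category$Offer_ID

by_category[, c("Category", "vendors_per_100", "dest_per_100", "orig_per_100")]
```

On these three rows, Drugs has far fewer distinct vendors per 100 offers than Services or Chemicals, which would be consistent with individual drug vendors posting many offers each.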
| Category | Type1 | Type2 | Vendor | dest_list | orig_list | Item | Offer_ID |
|---|---|---|---|---|---|---|---|
| Drugs | Cannabis | 7 | 1365 | 69 | 79 | 30052 | 30280 |
| Drugs | Ecstasy | 4 | 968 | 53 | 55 | 13672 | 13867 |
| Drugs | Stimulants | 5 | 1223 | 65 | 69 | 12013 | 12196 |
| Drugs | Psychedelics | 11 | 634 | 31 | 42 | 8010 | 8084 |
| Drugs | Opioids | 11 | 721 | 37 | 51 | 6609 | 6675 |
| Drugs | Prescription | 1 | 570 | 32 | 47 | 5489 | 5556 |
| Drugs | Benzos | 1 | 491 | 32 | 46 | 5322 | 5384 |
| Drugs | Steroids | 1 | 139 | 18 | 28 | 2716 | 2761 |
| Info | eBooks | 13 | 88 | 6 | 11 | 2023 | 2169 |
| Drugs | RCs | 1 | 138 | 17 | 23 | 2064 | 2092 |
| Drugs | Dissociatives | 6 | 262 | 18 | 28 | 1601 | 1659 |
| Services | Money | 1 | 224 | 9 | 24 | 1445 | 1481 |
| Other | Other | 1 | 347 | 15 | 23 | 1402 | 1425 |
| Counterfeits | Watches | 1 | 18 | 5 | 10 | 1264 | 1309 |
| Data | Accounts | 1 | 95 | 8 | 19 | 1032 | 1233 |
| Information/Guides | InformationGuides | 1 | 76 | 5 | 10 | 908 | 927 |
| Information/eBooks | InformationeBooks | 1 | 67 | 5 | 10 | 895 | 918 |
| Drugs | Other | 1 | 230 | 17 | 34 | 861 | 864 |
| Forgeries | Physical documents | 1 | 77 | 6 | 15 | 607 | 616 |
| Electronics | Electronics | 1 | 123 | 12 | 16 | 594 | 599 |
| Data | Pirated | 1 | 34 | 4 | 6 | 526 | 529 |
| Services | Other | 1 | 126 | 8 | 16 | 477 | 487 |
| Services | Hacking | 1 | 63 | 5 | 10 | 428 | 453 |
| Jewelry | Jewelry | 1 | 24 | 3 | 12 | 415 | 418 |
| Tobacco | Smoked | 1 | 32 | 6 | 11 | 356 | 393 |
| Counterfeits | Money | 1 | 62 | 7 | 17 | 384 | 385 |
| Counterfeits | Clothing | 1 | 14 | 2 | 7 | 359 | 364 |
| Data | Software | 1 | 68 | 7 | 9 | 353 | 356 |
| Weapons | Lethal firearms | 1 | 51 | 13 | 16 | 343 | 344 |
| Forgeries | Scans | 1 | 48 | 4 | 10 | 319 | 327 |
| Counterfeits | Accessories | 1 | 25 | 5 | 14 | 250 | 250 |
| Drugs | Weight loss | 1 | 67 | 11 | 21 | 246 | 249 |
| Drug paraphernalia | Pipes | 1 | 42 | 8 | 10 | 195 | 195 |
| Drug paraphernalia | Containers | 1 | 18 | 5 | 7 | 185 | 186 |
| Drug paraphernalia | Stashes | 1 | 10 | 5 | 5 | 149 | 149 |
| Weapons | Ammunition | 1 | 23 | 8 | 11 | 138 | 138 |
| Services | Advertising | 1 | 15 | 3 | 5 | 131 | 131 |
| Drug paraphernalia | Grinders | 1 | 4 | 6 | 7 | 106 | 106 |
| Weapons | Melee | 1 | 16 | 6 | 12 | 103 | 103 |
| Forgeries | Other | 1 | 25 | 3 | 8 | 100 | 100 |
| Drug paraphernalia | Injecting equipment | 4 | 25 | 5 | 6 | 96 | 96 |
| Chemicals | Chemicals | 1 | 18 | 6 | 10 | 90 | 90 |
| Services | Travel | 1 | 6 | 2 | 4 | 90 | 90 |
| Drug paraphernalia | Paper | 1 | 3 | 3 | 4 | 61 | 61 |
| Counterfeits | Electronics | 1 | 16 | 3 | 11 | 59 | 59 |
| Weapons | Non-lethal firearms | 1 | 11 | 6 | 9 | 57 | 57 |
| Drug paraphernalia | Scales | 1 | 9 | 5 | 4 | 46 | 47 |
| Drugs | Barbiturates | 1 | 13 | 4 | 7 | 30 | 30 |
| Tobacco | Paraphernalia | 1 | 10 | 3 | 5 | 27 | 27 |
| Weapons | Fireworks | 1 | 9 | 3 | 6 | 14 | 14 |
| Forgeries | Forgeries | 1 | 4 | 1 | 4 | 8 | 8 |
| Category | Type1 | Type2 | Vendor | dest_list | orig_list | Item | Offer_ID |
|---|---|---|---|---|---|---|---|
| Drugs | Cannabis | Weed | 1124 | 58 | 69 | 20584 | 20747 |
| Drugs | Ecstasy | Pills | 448 | 34 | 36 | 6759 | 6798 |
| Drugs | Ecstasy | MDMA | 753 | 47 | 45 | 5651 | 5782 |
| Drugs | Stimulants | Cocaine | 696 | 45 | 51 | 5502 | 5603 |
| Drugs | Prescription | NA | 570 | 32 | 47 | 5489 | 5556 |
| Drugs | Benzos | NA | 491 | 32 | 46 | 5322 | 5384 |
| Drugs | Cannabis | Concentrates | 331 | 22 | 26 | 4221 | 4247 |
| Drugs | Psychedelics | LSD | 327 | 25 | 33 | 3539 | 3564 |
| Drugs | Cannabis | Hash | 383 | 37 | 38 | 2948 | 2969 |
| Drugs | Steroids | NA | 139 | 18 | 28 | 2716 | 2761 |
| Drugs | Stimulants | Meth | 320 | 23 | 31 | 2392 | 2427 |
| Drugs | Stimulants | Speed | 310 | 33 | 30 | 2164 | 2178 |
| Drugs | RCs | NA | 138 | 17 | 23 | 2064 | 2092 |
| Drugs | Stimulants | Prescription | 335 | 22 | 30 | 1929 | 1955 |
| Drugs | Opioids | Heroin | 240 | 19 | 19 | 1688 | 1693 |
| Services | Money | NA | 224 | 9 | 24 | 1445 | 1481 |
| Other | Other | Other | 347 | 15 | 23 | 1402 | 1425 |
| Drugs | Opioids | Oxycodone | 241 | 18 | 27 | 1334 | 1343 |
| Counterfeits | Watches | NA | 18 | 5 | 10 | 1264 | 1309 |
| Data | Accounts | NA | 95 | 8 | 19 | 1032 | 1233 |
| Drugs | Opioids | NA | 235 | 20 | 27 | 1204 | 1207 |
| Drugs | Psychedelics | Mushrooms | 183 | 12 | 18 | 1122 | 1127 |
| Drugs | Cannabis | Edibles | 134 | 13 | 15 | 1097 | 1101 |
| Drugs | Ecstasy | Other | 114 | 13 | 17 | 957 | 966 |
| Drugs | Psychedelics | NB | 81 | 12 | 20 | 958 | 959 |
| Information/Guides | InformationGuides | Information | 76 | 5 | 10 | 908 | 927 |
| Information/eBooks | InformationeBooks | Information | 67 | 5 | 10 | 895 | 918 |
| Drugs | Psychedelics | 2C | 119 | 9 | 18 | 905 | 917 |
| Drugs | Dissociatives | Ketamine | 149 | 15 | 24 | 873 | 906 |
| Drugs | Other | NA | 230 | 17 | 34 | 861 | 864 |
| Drugs | Opioids | Fentanyl | 99 | 11 | 13 | 833 | 848 |
| Drugs | Psychedelics | DMT | 126 | 16 | 18 | 694 | 723 |
| Info | eBooks | Other | 45 | 4 | 7 | 673 | 691 |
| Drugs | Cannabis | Synthetics | 55 | 10 | 19 | 635 | 637 |
| Drugs | Opioids | Other | 165 | 16 | 23 | 628 | 631 |
| Forgeries | Physical documents | NA | 77 | 6 | 15 | 607 | 616 |
| Electronics | Electronics | Electronics | 123 | 12 | 16 | 594 | 599 |
| Data | Pirated | NA | 34 | 4 | 6 | 526 | 529 |
| Services | Other | NA | 126 | 8 | 16 | 477 | 487 |
| Drugs | Cannabis | Seeds | 60 | 10 | 11 | 458 | 458 |
| Services | Hacking | NA | 63 | 5 | 10 | 428 | 453 |
| Jewelry | Jewelry | Jewelry | 24 | 3 | 12 | 415 | 418 |
| Drugs | Dissociatives | MXE | 77 | 9 | 15 | 380 | 404 |
| Tobacco | Smoked | NA | 32 | 6 | 11 | 356 | 393 |
| Counterfeits | Money | NA | 62 | 7 | 17 | 384 | 385 |
| Counterfeits | Clothing | NA | 14 | 2 | 7 | 359 | 364 |
| Data | Software | NA | 68 | 7 | 9 | 353 | 356 |
| Weapons | Lethal firearms | NA | 51 | 13 | 16 | 343 | 344 |
| Forgeries | Scans | Photos | 48 | 4 | 10 | 319 | 327 |
| Drugs | Ecstasy | MDA | 62 | 10 | 13 | 310 | 321 |
| Info | eBooks | Making money | 46 | 4 | 6 | 307 | 313 |
| Info | eBooks | Drugs | 26 | 3 | 6 | 278 | 289 |
| Drugs | Opioids | Buprenorphine | 72 | 12 | 16 | 278 | 282 |
| Drugs | Psychedelics | Other | 52 | 9 | 15 | 272 | 272 |
| Counterfeits | Accessories | NA | 25 | 5 | 14 | 250 | 250 |
| Drugs | Weight loss | NA | 67 | 11 | 21 | 246 | 249 |
| Drugs | Opioids | Morphine | 68 | 11 | 19 | 246 | 248 |
| Drugs | Psychedelics | 5-MeO | 35 | 8 | 11 | 212 | 213 |
| Drugs | Dissociatives | GHB | 42 | 9 | 12 | 205 | 206 |
| Info | eBooks | Anonymity | 23 | 3 | 4 | 199 | 204 |
| Drug paraphernalia | Pipes | NA | 42 | 8 | 10 | 195 | 195 |
| Drugs | Opioids | Hydrocodone | 61 | 7 | 6 | 190 | 190 |
| Drug paraphernalia | Containers | NA | 18 | 5 | 7 | 185 | 186 |
| Info | eBooks | Science | 12 | 3 | 4 | 155 | 163 |
| Drug paraphernalia | Stashes | NA | 10 | 5 | 5 | 149 | 149 |
| Info | eBooks | RelationshipsSex | 16 | 3 | 4 | 141 | 145 |
| Info | eBooks | IT | 23 | 3 | 3 | 142 | 144 |
| Weapons | Ammunition | NA | 23 | 8 | 11 | 138 | 138 |
| Services | Advertising | NA | 15 | 3 | 5 | 131 | 131 |
| Drugs | Cannabis | Shaketrim | 36 | 6 | 8 | 121 | 121 |
| Drug paraphernalia | Grinders | NA | 4 | 6 | 7 | 106 | 106 |
| Drugs | Psychedelics | Others | 26 | 7 | 11 | 106 | 106 |
| Weapons | Melee | NA | 16 | 6 | 12 | 103 | 103 |
| Forgeries | Other | NA | 25 | 3 | 8 | 100 | 100 |
| Drugs | Opioids | Codeine | 37 | 7 | 14 | 92 | 92 |
| Chemicals | Chemicals | Chemicals | 18 | 6 | 10 | 90 | 90 |
| Services | Travel | NA | 6 | 2 | 4 | 90 | 90 |
| Drugs | Opioids | Opium | 34 | 11 | 14 | 85 | 87 |
| Drugs | Psychedelics | Mescaline | 28 | 7 | 12 | 86 | 86 |
| Drugs | Psychedelics | Spores | 14 | 5 | 6 | 79 | 80 |
| Drugs | Dissociatives | GBL | 20 | 6 | 11 | 76 | 76 |
| Info | eBooks | Economy | 13 | 4 | 3 | 75 | 76 |
| Drugs | Dissociatives | Other | 11 | 5 | 6 | 63 | 63 |
| Drug paraphernalia | Paper | NA | 3 | 3 | 4 | 61 | 61 |
| Counterfeits | Electronics | NA | 16 | 3 | 11 | 59 | 59 |
| Weapons | Non-lethal firearms | NA | 11 | 6 | 9 | 57 | 57 |
| Drugs | Opioids | Dihydrocodeine | 8 | 3 | 4 | 54 | 54 |
| Drug paraphernalia | Scales | NA | 9 | 5 | 4 | 46 | 47 |
| Drug paraphernalia | Injecting equipment | Syringes | 15 | 5 | 3 | 45 | 45 |
| Info | eBooks | Doomsday | 9 | 2 | 2 | 42 | 43 |
| Info | eBooks | Psychology | 11 | 3 | 2 | 40 | 40 |
| Drugs | Psychedelics | Salvia | 15 | 3 | 8 | 37 | 37 |
| Drugs | Stimulants | Mephedrone | 11 | 8 | 6 | 33 | 33 |
| Drug paraphernalia | Injecting equipment | Other | 7 | 3 | 5 | 30 | 30 |
| Drugs | Barbiturates | NA | 13 | 4 | 7 | 30 | 30 |
| Tobacco | Paraphernalia | NA | 10 | 3 | 5 | 27 | 27 |
| Info | eBooks | Politics | 7 | 3 | 2 | 26 | 26 |
| Info | eBooks | Philosophy | 5 | 3 | 3 | 25 | 25 |
| Drug paraphernalia | Injecting equipment | Needles | 7 | 4 | 3 | 15 | 15 |
| Weapons | Fireworks | NA | 9 | 3 | 6 | 14 | 14 |
| Info | eBooks | AliensUFOs | 5 | 3 | 3 | 10 | 10 |
| Forgeries | Forgeries | Forgeries | 4 | 1 | 4 | 8 | 8 |
| Drug paraphernalia | Injecting equipment | Filters | 4 | 2 | 2 | 6 | 6 |
| Drugs | Dissociatives | PCP | 2 | 2 | 2 | 4 | 4 |
The products with the highest numbers of offers are almost exclusively drugs, with weed, ecstasy and cocaine in the top 4, which are the most commonly used drugs according to the literature. One explanation could be that demand for these products is high, which creates a strong incentive to offer them online. Interestingly, prescription drugs are also widely available, but this could be because the label encompasses many different products (further analysis of the item description variable could clarify this).
Outside of drugs, the most frequently offered products and services are “money” (the category name alone does not make clear what it covers, and it deserves further investigation), counterfeit watches, stolen data from hacked accounts, and guides and ebooks. The latter interestingly highlight that darknet markets are not only places of illegal trade, but also places to access information, learn skills and share tips for committing crimes. Forgeries, fake official documents and counterfeit goods also account for a substantial share of the offers.
| Category | n | freq |
|---|---|---|
| Drugs | 89697 | 0.8435164 |
| Services | 2642 | 0.0248455 |
| Counterfeits | 2367 | 0.0222594 |
| Info | 2169 | 0.0203974 |
| Data | 2118 | 0.0199178 |
| Other | 1425 | 0.0134008 |
| Forgeries | 1051 | 0.0098837 |
| Information/Guides | 927 | 0.0087176 |
| Information/eBooks | 918 | 0.0086329 |
| Drug paraphernalia | 840 | 0.0078994 |
| Weapons | 656 | 0.0061691 |
| Electronics | 599 | 0.0056330 |
| Tobacco | 420 | 0.0039497 |
| Jewelry | 418 | 0.0039309 |
| Chemicals | 90 | 0.0008464 |

| Type1 | n | freq |
|---|---|---|
| Cannabis | 30280 | 0.2847551 |
| Ecstasy | 13867 | 0.1304062 |
| Stimulants | 12196 | 0.1146920 |
| Psychedelics | 8084 | 0.0760225 |
| Opioids | 6675 | 0.0627721 |
| Prescription | 5556 | 0.0522490 |
| Benzos | 5384 | 0.0506315 |
| Other | 2876 | 0.0270461 |
| Steroids | 2761 | 0.0259646 |
| eBooks | 2169 | 0.0203974 |
| RCs | 2092 | 0.0196733 |
| Money | 1866 | 0.0175480 |
| Dissociatives | 1659 | 0.0156013 |
| Watches | 1309 | 0.0123099 |
| Accounts | 1233 | 0.0115952 |
| InformationGuides | 927 | 0.0087176 |
| InformationeBooks | 918 | 0.0086329 |
| Electronics | 658 | 0.0061879 |
| Physical documents | 616 | 0.0057929 |
| Pirated | 529 | 0.0049748 |
| Hacking | 453 | 0.0042600 |
| Jewelry | 418 | 0.0039309 |
| Smoked | 393 | 0.0036958 |
| Clothing | 364 | 0.0034231 |
| Software | 356 | 0.0033478 |
| Lethal firearms | 344 | 0.0032350 |
| Scans | 327 | 0.0030751 |
| Accessories | 250 | 0.0023510 |
| Weight loss | 249 | 0.0023416 |
| Pipes | 195 | 0.0018338 |
| Containers | 186 | 0.0017492 |
| Stashes | 149 | 0.0014012 |
| Ammunition | 138 | 0.0012978 |
| Advertising | 131 | 0.0012319 |
| Grinders | 106 | 0.0009968 |
| Melee | 103 | 0.0009686 |
| Injecting equipment | 96 | 0.0009028 |
| Chemicals | 90 | 0.0008464 |
| Travel | 90 | 0.0008464 |
| Paper | 61 | 0.0005736 |
| Non-lethal firearms | 57 | 0.0005360 |
| Scales | 47 | 0.0004420 |
| Barbiturates | 30 | 0.0002821 |
| Paraphernalia | 27 | 0.0002539 |
| Fireworks | 14 | 0.0001317 |
| Forgeries | 8 | 0.0000752 |

| Category | Type1 | n | freq |
|---|---|---|---|
| Drugs | Cannabis | 30280 | 0.3375810 |
| Drugs | Ecstasy | 13867 | 0.1545983 |
| Drugs | Stimulants | 12196 | 0.1359689 |
| Drugs | Psychedelics | 8084 | 0.0901256 |
| Drugs | Opioids | 6675 | 0.0744172 |
| Drugs | Prescription | 5556 | 0.0619419 |
| Drugs | Benzos | 5384 | 0.0600243 |
| Drugs | Steroids | 2761 | 0.0307814 |
| Drugs | RCs | 2092 | 0.0233230 |
| Drugs | Dissociatives | 1659 | 0.0184956 |
| Drugs | Other | 864 | 0.0096324 |
| Drugs | Weight loss | 249 | 0.0027760 |
| Drugs | Barbiturates | 30 | 0.0003345 |

| Type1 | Type2 | n | freq |
|---|---|---|---|
| Accessories | NA | 250 | 1.0000000 |
| Accounts | NA | 1233 | 1.0000000 |
| Advertising | NA | 131 | 1.0000000 |
| Ammunition | NA | 138 | 1.0000000 |
| Chemicals | Chemicals | 90 | 1.0000000 |
| Clothing | NA | 364 | 1.0000000 |
| Containers | NA | 186 | 1.0000000 |
| eBooks | Other | 691 | 0.3185800 |
| eBooks | Making money | 313 | 0.1443061 |
| eBooks | Drugs | 289 | 0.1332411 |
| eBooks | Anonymity | 204 | 0.0940526 |
| eBooks | Science | 163 | 0.0751498 |
| eBooks | RelationshipsSex | 145 | 0.0668511 |
| eBooks | IT | 144 | 0.0663900 |
| eBooks | Economy | 76 | 0.0350392 |
| eBooks | Doomsday | 43 | 0.0198248 |
| eBooks | Psychology | 40 | 0.0184417 |
| eBooks | Politics | 26 | 0.0119871 |
| eBooks | Philosophy | 25 | 0.0115260 |
| eBooks | AliensUFOs | 10 | 0.0046104 |
| Electronics | Electronics | 599 | 0.9103343 |
| Electronics | NA | 59 | 0.0896657 |
| Fireworks | NA | 14 | 1.0000000 |
| Forgeries | Forgeries | 8 | 1.0000000 |
| Grinders | NA | 106 | 1.0000000 |
| Hacking | NA | 453 | 1.0000000 |
| InformationeBooks | Information | 918 | 1.0000000 |
| InformationGuides | Information | 927 | 1.0000000 |
| Injecting equipment | Syringes | 45 | 0.4687500 |
| Injecting equipment | Other | 30 | 0.3125000 |
| Injecting equipment | Needles | 15 | 0.1562500 |
| Injecting equipment | Filters | 6 | 0.0625000 |
| Jewelry | Jewelry | 418 | 1.0000000 |
| Lethal firearms | NA | 344 | 1.0000000 |
| Melee | NA | 103 | 1.0000000 |
| Money | NA | 1866 | 1.0000000 |
| Non-lethal firearms | NA | 57 | 1.0000000 |
| Other | Other | 1425 | 0.7082505 |
| Other | NA | 587 | 0.2917495 |
| Paper | NA | 61 | 1.0000000 |
| Paraphernalia | NA | 27 | 1.0000000 |
| Physical documents | NA | 616 | 1.0000000 |
| Pipes | NA | 195 | 1.0000000 |
| Pirated | NA | 529 | 1.0000000 |
| Scales | NA | 47 | 1.0000000 |
| Scans | Photos | 327 | 1.0000000 |
| Smoked | NA | 393 | 1.0000000 |
| Software | NA | 356 | 1.0000000 |
| Stashes | NA | 149 | 1.0000000 |
| Travel | NA | 90 | 1.0000000 |
| Watches | NA | 1309 | 1.0000000 |
##### First visualization of the destinations and origins
destinations <- df_destinations %>% count(clean_destination , sort = TRUE) %>% filter(n>1)
destinations %>% kable()
| clean_destination | n |
|---|---|
| worldwide | 23342 |
| usa | 18363 |
| europe | 6486 |
| australia | 4743 |
| uk | 3607 |
| canada | 1316 |
| germany | 1122 |
| other | 428 |
| sweden | 379 |
| switzerland | 313 |
| scandinavia | 244 |
| france | 231 |
| New Zealand | 230 |
| norway | 198 |
| ireland | 110 |
| asia | 101 |
| netherlands | 76 |
| finland | 63 |
| oceania | 61 |
| belgium | 41 |
| denmark | 34 |
| Hong Kong SAR China | 26 |
| africa | 24 |
| Philippines | 21 |
| israel | 19 |
| thailand | 19 |
| internet | 15 |
| hungary | 11 |
| austria | 10 |
| spain | 10 |
| india | 9 |
| mexico | 9 |
| poland | 8 |
| china | 7 |
| Japan | 7 |
| Singapore | 7 |
| italy | 4 |
| luxembourg | 4 |
| brazil | 3 |
| Grenada | 2 |
| iraq | 2 |
| mississippi | 2 |
origins <- df_origins %>% count(clean_origin , sort = TRUE) %>% filter(n>1)
origins %>% kable()
| clean_origin | n |
|---|---|
| usa | 34962 |
| uk | 11346 |
| australia | 8860 |
| germany | 7685 |
| netherlands | 6701 |
| canada | 5518 |
| europe | 5082 |
| china | 4186 |
| internet | 4010 |
| worldwide | 2472 |
| belgium | 1171 |
| india | 1122 |
| sweden | 1059 |
| france | 844 |
| switzerland | 451 |
| spain | 442 |
| Hong Kong SAR China | 417 |
| denmark | 383 |
| norway | 336 |
| italy | 279 |
| Philippines | 278 |
| czech republicrepublic | 273 |
| ireland | 269 |
| austria | 252 |
| New Zealand | 180 |
| South Africa | 177 |
| other | 137 |
| ukraine | 127 |
| poland | 111 |
| asia | 110 |
| mexico | 105 |
| argentina | 85 |
| finland | 67 |
| thailand | 66 |
| oceania | 61 |
| pakistan | 55 |
| scandinavia | 55 |
| bolivia | 24 |
| africa | 22 |
| brazil | 20 |
| israel | 19 |
| North America | 15 |
| singapore | 14 |
| fiji | 11 |
| belarus | 9 |
| Dominican Republic | 9 |
| latvia | 9 |
| slovakia | 9 |
| hungary | 8 |
| panama | 8 |
| Japan | 7 |
| peru | 7 |
| Singapore | 7 |
| lithuania | 6 |
| czech republic | 5 |
| guatemala | 5 |
| croatia | 4 |
| estonia | 4 |
| japan | 4 |
| seychelles | 4 |
| morocco | 3 |
| Thailand | 3 |
| belize | 2 |
| cambodia | 2 |
| Cayman Islands | 2 |
| luxembourg | 2 |
| northamerica | 2 |
| romania | 2 |
| serbia | 2 |
| St. Vincent & Grenadines | 2 |
| swaziland | 2 |
##frequencies
destinations <- destinations %>% mutate(freq=n/sum(n)) %>% arrange(-freq) %>% mutate(place=clean_destination)
kable(destinations)
| clean_destination | n | freq | place |
|---|---|---|---|
| worldwide | 23342 | 0.3782715 | worldwide |
| usa | 18363 | 0.2975837 | usa |
| europe | 6486 | 0.1051096 | europe |
| australia | 4743 | 0.0768632 | australia |
| uk | 3607 | 0.0584537 | uk |
| canada | 1316 | 0.0213266 | canada |
| germany | 1122 | 0.0181827 | germany |
| other | 428 | 0.0069360 | other |
| sweden | 379 | 0.0061419 | sweden |
| switzerland | 313 | 0.0050724 | switzerland |
| scandinavia | 244 | 0.0039542 | scandinavia |
| france | 231 | 0.0037435 | france |
| New Zealand | 230 | 0.0037273 | New Zealand |
| norway | 198 | 0.0032087 | norway |
| ireland | 110 | 0.0017826 | ireland |
| asia | 101 | 0.0016368 | asia |
| netherlands | 76 | 0.0012316 | netherlands |
| finland | 63 | 0.0010210 | finland |
| oceania | 61 | 0.0009885 | oceania |
| belgium | 41 | 0.0006644 | belgium |
| denmark | 34 | 0.0005510 | denmark |
| Hong Kong SAR China | 26 | 0.0004213 | Hong Kong SAR China |
| africa | 24 | 0.0003889 | africa |
| Philippines | 21 | 0.0003403 | Philippines |
| israel | 19 | 0.0003079 | israel |
| thailand | 19 | 0.0003079 | thailand |
| internet | 15 | 0.0002431 | internet |
| hungary | 11 | 0.0001783 | hungary |
| austria | 10 | 0.0001621 | austria |
| spain | 10 | 0.0001621 | spain |
| india | 9 | 0.0001459 | india |
| mexico | 9 | 0.0001459 | mexico |
| poland | 8 | 0.0001296 | poland |
| china | 7 | 0.0001134 | china |
| Japan | 7 | 0.0001134 | Japan |
| Singapore | 7 | 0.0001134 | Singapore |
| italy | 4 | 0.0000648 | italy |
| luxembourg | 4 | 0.0000648 | luxembourg |
| brazil | 3 | 0.0000486 | brazil |
| Grenada | 2 | 0.0000324 | Grenada |
| iraq | 2 | 0.0000324 | iraq |
| mississippi | 2 | 0.0000324 | mississippi |
counts <- df_destinations %>% group_by(clean_destination) %>% count()
sum(counts$n)
## [1] 61707
mean(counts$n)
## [1] 1469.214
max(counts$n)
## [1] 23342
min(counts$n)
## [1] 2
df_destinations %>% group_by(clean_destination) %>% summarise(n=n(),freq=n()/sum(counts$n), total = sum(counts$n)) %>% arrange(clean_destination)
#origins <- origins %>% mutate(freq=n/sum(n)) %>% arrange(-freq) %>% mutate(place=clean_origin)
#kable(origins)
#origins
#destinations
We can see that the NAs are not at all in similar proportions: authors tend to report the origin more often than the destination. This could mean that destination information is more often communicated later, whereas the origin is more often stated even though it may expose the vendor to a risk of being located (although it remains at the level of very broad areas and is therefore not very discriminating information). Hypothesis: stating the origin may be a selling point and a signal of product quality.
Some places appear more often as an origin than as a destination, notably the USA, the UK, Germany, the Netherlands, China, Canada and Australia, and many others. The NAs should nevertheless be filtered out to check whether these figures hold.
Proportions as a destination are higher for the Nordic countries, New Zealand and Israel. The same remark applies as above: should the NAs be filtered out?
Unsurprisingly, “worldwide” and “europe” are cited more often as destinations, because such a label is not really meaningful as an indication of origin.
Now we can observe real patterns, with the Netherlands ranking first on the destination/origin balance, which seems more in line with reality and with the literature. It is followed by Germany, the UK, the USA, China, Canada, Belgium…
It is interesting to see that the Nordic countries, New Zealand and Switzerland remain among the places cited more often as a destination than as a country of origin.
We also see that, in line with the first intuition, the broader areas are mainly given as destinations.
The differences should be tested with a chi-squared test to see whether they remain significant, and/or a way should be found to normalize these figures.
An analysis by sector or product category should also be carried out.
The analyses could be split between regions and countries, since they do not follow the same logic. Care is needed with regions because the cleaning was not optimal (many losses for South America, North America, etc.).
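The chi-squared test suggested in the notes above could be sketched as follows. This is a minimal illustration rather than the analysis itself: it uses only six countries, with counts copied from the origin and destination tables, and the object names `tab` and `res` are chosen for the example. It tests whether the split between origin and destination mentions is the same across these countries.

```r
# Origin and destination counts for six countries, copied from the tables above.
# Restricting the test to these six countries is an assumption made for illustration.
tab <- matrix(
  c(34962, 18363,   # usa
    11346,  3607,   # uk
     8860,  4743,   # australia
     7685,  1122,   # germany
     6701,    76,   # netherlands
     5518,  1316),  # canada
  ncol = 2, byrow = TRUE,
  dimnames = list(
    country = c("usa", "uk", "australia", "germany", "netherlands", "canada"),
    role    = c("origin", "destination")
  )
)

# Pearson chi-squared test of independence between country and role.
res <- chisq.test(tab)
print(res)

# Standardized residuals indicate which cells deviate most from independence.
round(res$stdres, 1)
```

A small p-value here would only indicate that the origin/destination split differs between these countries. Listings with missing values are not part of these counts, so the result should be read with that caveat in mind.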
We wanted to provide world maps showing which countries export or import the most items in our dataset compared to the rest of the world. To do so, we cleaned the destinations and origins in the original dataset to match the country names used in the maps and mapdata packages. We removed entries with continent names and “worldwide” values, which made up most of the data, in order to focus on specific countries and see whether we could identify trends.
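The tables above still contain a few labels that differ only by letter case (for example `Japan` and `japan`), which would be counted as separate places when matched against map names. One possible way to merge such variants in base R is sketched below. It is not the cleaning procedure actually used in the project, and the vector name `raw` is hypothetical; the counts are copied from the origins table.

```r
# Origin counts for labels that appear twice with different letter case.
raw <- c(Japan = 7, japan = 4, Thailand = 3, thailand = 66,
         Singapore = 7, singapore = 14)

# Normalize the labels: trim surrounding whitespace and lowercase everything.
key <- tolower(trimws(names(raw)))

# Sum the counts that now share the same normalized label.
merged <- vapply(split(unname(raw), key), sum, numeric(1))
merged
```

Variants such as `czech republicrepublic` or `northamerica` would not be caught by this normalization and would need an explicit recoding step.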
The first map shows that the USA is the country to which the most items can be shipped, with more than 15’000 appearances as a destination. Australia and the UK come second and third (4743 for Australia and 3607 for the UK), followed by European countries, Canada and other countries such as Mexico, Brazil, China and India. It is worth noting that Canada and Germany are the two countries in the “blue” range, with more than 1000 appearances each. This is consistent with the idea that drugs are mostly shipped to Western countries, where they are hard to produce but easier to buy anonymously on the internet.
Looking at the second map, we can see that the USA is also the number one provider of items. This could be due to the recent legalization of cannabis products in some states, which creates a market on the darkweb for selling those products in states where such legislation is not yet in place. Many products also come from Australia, Poland, the UK, China and Canada.
The last map represents the import-export balance of every country; we can see that almost every one of them imports more items than it exports. The value behind this map was computed as (origins of a country / total origins) - (destinations of a country / total destinations).
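The balance described above can be illustrated with a short sketch. It assumes only the six countries below, with counts copied from the origin and destination tables, so the totals (and therefore the values) differ from those behind the actual map. The names `origin`, `destination` and `balance` are chosen for the example.

```r
# Counts copied from the origin and destination tables (subset of six countries).
origin <- c(usa = 34962, uk = 11346, australia = 8860,
            germany = 7685, netherlands = 6701, canada = 5518)
destination <- c(usa = 18363, uk = 3607, australia = 4743,
                 germany = 1122, netherlands = 76, canada = 1316)

# Share of origin mentions minus share of destination mentions, per country.
# The totals are taken over this subset only, which is an assumption of the sketch.
balance <- origin / sum(origin) - destination[names(origin)] / sum(destination)

# Positive values: the country is relatively more present as an origin.
sort(balance, decreasing = TRUE)
```

Because both sets of shares sum to one, the balances sum to zero: a country can only appear as a net origin if others appear as net destinations.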
Individual European countries are largely absent from both destinations and origins because, most of the time, sellers on Agora did not take the time to list every country they could ship to; the value was usually simply “europe” (about 5’000 appearances in origins and 6’400 in destinations). Adding these entries to what is already shown on the maps would make Europe a strong contender in the drug market, even though this is not visible on the maps.
sum_dest <- df_destinations %>% group_by(clean_destination) %>% summarise(n=n()) %>% arrange(-n) %>% kable(format='html',caption='Destinations of products sorted by decreasing count',col.names=c('Destination','Count')) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
sum_orig <- df_origins %>% group_by(clean_origin) %>% summarise(n=n()) %>% arrange(-n) %>% kable(format='html',caption='Origins of products sorted by decreasing count',col.names=c('Origin','Count')) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
destinations <- transmute(df_destinations,Place=clean_destination)
origins <- transmute(df_origins,Place=clean_origin)
#destinations
#origins
#places <- merge(destinations,origins)
#places %>% head() %>% kable()
[Figures: one plot per category (Chemicals, Counterfeits, Data, Drug paraphernalia, Drugs, Electronics, Forgeries, Info, Information/eBooks, Information/Guides, Jewelry, Other, Services, Tobacco, Weapons)]
The empirical cumulative distribution functions show strong inequalities in the distribution of activity: a few vendors account for a high proportion of the offers. We can also see different trends depending on the category: activity for drugs, forgeries and info seems to be more evenly distributed than in other categories.
The following boxplots show that vendors’ activity varies more in some categories than in others; this is particularly the case for counterfeits, drugs and info.
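As an illustration of how such a concentration can be read from an empirical cumulative distribution function, here is a minimal sketch using base R's `ecdf()`. The vector `offers_per_vendor` is a made-up toy example, not data from Agora.

```r
# Hypothetical number of offers for eight vendors (toy data, not from the dataset).
offers_per_vendor <- c(1, 1, 2, 3, 5, 8, 40, 120)

# Empirical cumulative distribution function of vendor activity.
F_hat <- ecdf(offers_per_vendor)

# Proportion of vendors with at most 5 offers.
F_hat(5)

# Share of all offers posted by the two most active vendors.
top2_share <- sum(sort(offers_per_vendor, decreasing = TRUE)[1:2]) / sum(offers_per_vendor)
top2_share
```

In this toy example most vendors post only a handful of offers while two vendors account for most of the listings, which is the kind of pattern a steeply rising ECDF reflects.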
| Vendor | n | perc | nb_cat | nb_type1 | nb_type2 | nb_dest | nb_orig | mostcat | mosttype1 |
|---|---|---|---|---|---|---|---|---|---|
| optiman | 881 | 0.0082850 | 13 | 22 | 15 | 12 | 8 | Drug paraphernalia | Containers |
| sexyhomer | 860 | 0.0080875 | 4 | 6 | 2 | 3 | 2 | Counterfeits | Watches |
| mssource | 823 | 0.0077395 | 2 | 10 | 15 | 2 | 2 | Drugs | RCs |
| profesorhouse | 804 | 0.0075609 | 11 | 12 | 12 | 2 | 3 | Services | Other |
| RXChemist | 729 | 0.0068556 | 4 | 7 | 5 | 4 | 6 | Drugs | Prescription |
| rc4me | 648 | 0.0060938 | 1 | 8 | 9 | 3 | 3 | Drugs | RCs |
| fake | 608 | 0.0057177 | 13 | 23 | 13 | 2 | 7 | Information/Guides | InformationGuides |
| medibuds | 604 | 0.0056801 | 2 | 2 | 4 | 3 | 1 | Drugs | Cannabis |
| Gotmilk | 479 | 0.0045045 | 2 | 7 | 9 | 6 | 12 | Drugs | Prescription |
| Bigdeal100 | 451 | 0.0042412 | 5 | 6 | 4 | 1 | 3 | Jewelry | Jewelry |
| captainkirk | 447 | 0.0042036 | 5 | 6 | 13 | 2 | 1 | Information/eBooks | InformationeBooks |
| TheDigital | 435 | 0.0040908 | 8 | 12 | 12 | 1 | 3 | Services | Other |
| OnePiece | 430 | 0.0040437 | 5 | 6 | 5 | 2 | 2 | Info | eBooks |
| HollandDutch | 416 | 0.0039121 | 2 | 3 | 4 | 2 | 1 | Drugs | Ecstasy |
| Optumis | 407 | 0.0038275 | 9 | 15 | 21 | 2 | 2 | Info | eBooks |
```