The clustering process is done in three steps:
- The document collection is transformed into an n x m matrix, where n is the number of documents and m is the dimension of the vector representing each document.
- A SOM of a chosen size (32x32 nodes in the subsequent experiments) is created and trained on the above matrix.
- A hierarchical clustering algorithm is run on the resulting map, and the documents belonging to each cluster are dumped into sub-collections.
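
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the experiments. Several details are assumptions that the text does not specify: the normalised term-frequency weighting, the linear decay of the learning rate and neighbourhood radius, the average-linkage criterion, and the choice to assign each document to the cluster of its best-matching node. The function names (`build_matrix`, `train_som`, `cluster_documents`) are hypothetical, and a 4x4 map is used instead of 32x32 to keep the example fast.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def build_matrix(docs):
    """Step 1: turn the collection into an n x m matrix (one row per document)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            X[i, index[w]] += 1.0
    # Assumption: L2-normalised term frequencies; the text does not name a weighting.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def train_som(X, rows, cols, epochs=50, seed=0):
    """Step 2: train a rows x cols SOM; returns a (rows*cols, m) codebook."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, X.shape[1]))
    # Grid position of every node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = epochs * len(X)
    sigma0 = max(rows, cols) / 2.0
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac)                  # assumed linear decay
            sigma = sigma0 + (0.5 - sigma0) * frac   # radius shrinks towards 0.5
            # Best-matching unit: the node whose weight vector is closest to the input.
            bmu = np.argmin(np.linalg.norm(weights - X[i], axis=1))
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood
            weights += lr * h[:, None] * (X[i] - weights)
            step += 1
    return weights


def cluster_documents(X, weights, n_clusters):
    """Step 3: cluster the map nodes hierarchically, then group documents."""
    # Assumption: average linkage on the node weight vectors.
    Z = linkage(weights, method="average")
    node_labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Each document goes to the cluster of its best-matching node.
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    bmus = np.argmin(dists, axis=1)
    subcollections = {}
    for doc_id, label in enumerate(node_labels[bmus]):
        subcollections.setdefault(int(label), []).append(doc_id)
    return subcollections


# Toy collection, purely illustrative.
docs = [
    "som maps cluster documents",
    "documents cluster on maps",
    "hierarchical clustering merges nodes",
    "nodes merge in hierarchical clustering",
]
X = build_matrix(docs)
weights = train_som(X, rows=4, cols=4)
subcollections = cluster_documents(X, weights, n_clusters=2)
```

Here `subcollections` maps each cluster label to the list of document indices assigned to it, which corresponds to the sub-collections produced in the last step. Clustering the node weight vectors rather than the documents themselves keeps the cost of the hierarchical step tied to the map size, not to the size of the collection.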