Information Governance, Now with Concept Clustering

Mon, Dec 5, 2016

Albany, NY – Rational Enterprise, a leader in Information Governance (IG) technology, announced today that it has released a beta version of Concept Clustering. The implementation of unsupervised clustering in Rational Governance (RG) will complement the product’s already broad analytical toolset, which includes Boolean Search, Content Navigator, and supervised machine learning based on Support Vector Machine (SVM) modeling.

RG’s clustering technology uncovers similarities between documents based on word frequency, giving greater weight to co-occurrences of words that constitute topics. A weighting algorithm evaluates the degree of similarity between documents and displays it spatially: highly similar documents are clustered together, while more loosely associated ones appear further apart.
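For readers who want a concrete picture of similarity based on word frequency, the following is a minimal sketch, not RG’s actual algorithm. It assumes plain word counts and cosine similarity as the weighting scheme; the sample documents and the function names `term_counts` and `cosine_similarity` are invented for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, invented for illustration only.
docs = {
    "contract_a": "supplier contract renewal terms and supplier pricing",
    "contract_b": "contract pricing terms for the supplier renewal",
    "hr_memo": "employee vacation policy and holiday schedule",
}
vectors = {name: term_counts(text) for name, text in docs.items()}

sim_contracts = cosine_similarity(vectors["contract_a"], vectors["contract_b"])
sim_unrelated = cosine_similarity(vectors["contract_a"], vectors["hr_memo"])
print(f"contract_a vs contract_b: {sim_contracts:.2f}")
print(f"contract_a vs hr_memo:    {sim_unrelated:.2f}")
```

In this sketch the two contract documents score much higher than the contract and the HR memo, so a spatial display driven by these scores would place the contracts close together and the memo further away.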

While clustering is a tool typically employed during litigation review and is integrated into various e-discovery platforms, RG’s clustering toolset is unique in its application at the start of the Electronic Discovery Reference Model (EDRM) during Information Governance.

Clustering will be most useful when working with unfamiliar document populations. RG’s unsupervised clustering requires no user input to direct the analysis of a data set. Once the clustering model has run, each cluster is labeled with the topics most prevalent in its documents to guide users in analyzing and classifying content. This type of insight is particularly powerful for tackling dark data, legacy data, and defensible disposition.
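As a rough illustration of cluster labeling, the sketch below names a cluster after its most frequent terms. It is an assumption-laden simplification: RG labels clusters by topic, whereas this example simply counts words, and the `label_cluster` function, the stop-word list, and the sample texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "and", "for", "of", "to", "a", "in"}


def label_cluster(texts, top_n=3):
    """Return the top_n most frequent non-stop-words across a cluster."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical documents that a clustering run has grouped together.
cluster = [
    "supplier contract renewal and supplier pricing",
    "contract pricing for the supplier",
]
label = label_cluster(cluster, top_n=2)
print(label)
```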

Clustering in Rational Governance can be applied to any documents in an organization’s managed data stores; document sets can be predefined and filtered by keyword, date range, custodian, document type, and/or a number of other parameters available within the RG user interface.
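The kind of filtering described above can be pictured with a short sketch. This is not the RG interface or API: the `Document` record, its field names, and the `select_documents` function are assumptions made for illustration, covering only the four parameters named in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical document record; the field names are assumptions."""
    text: str
    created: date
    custodian: str
    doc_type: str


def select_documents(documents, keyword=None, start=None, end=None,
                     custodian=None, doc_type=None):
    """Keep documents matching every filter that was supplied."""
    selected = []
    for doc in documents:
        if keyword is not None and keyword.lower() not in doc.text.lower():
            continue
        if start is not None and doc.created < start:
            continue
        if end is not None and doc.created > end:
            continue
        if custodian is not None and doc.custodian != custodian:
            continue
        if doc_type is not None and doc.doc_type != doc_type:
            continue
        selected.append(doc)
    return selected


# Hypothetical sample data, invented for illustration only.
store = [
    Document("Supplier contract renewal", date(2015, 3, 1), "jsmith", "docx"),
    Document("Holiday schedule", date(2016, 6, 15), "jsmith", "pdf"),
    Document("Contract pricing terms", date(2016, 9, 30), "mlee", "docx"),
]
subset = select_documents(store, keyword="contract", start=date(2016, 1, 1))
print([doc.text for doc in subset])
```

The resulting subset, rather than the whole store, would then be passed to the clustering step.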

The BigARTM implementation of clustering develops cluster models based on the Additive Regularization of Topic Models technique, which simplifies the creation of topic models and makes them easier to explain. To learn more, visit