Rational Review’s (“RR”) predictive coding technology is built on one of the most advanced machine learning techniques available today: Convolutional Neural Networks (“CNN”). A CNN is a type of deep learning model that companies like Google use to automatically recognize and categorize images, behavior, emotions, and more – all things that a human brain is wired to do instantly, yet which require incredibly complicated and intricate analyses of relationships and patterns. Rational has brought the power of this technology to eDiscovery review.
Foundations of Predictive Coding
Predictive coding is known by many names and covers a wide range of technologies, techniques, and workflows. The use of predictive coding in eDiscovery has been widely approved in case law, but decisions on methodology, which technologies to use, and which workflows to follow are still left to individual attorneys. On top of that, over 36 states have adopted Duty of Technology Competence rules, which should incentivize lawyers to deepen their knowledge of how best to implement the technology.
All predictive coding implementations face the basic problem of making human language machine readable. Various methods can accomplish this task:
- Simple character/word to number exchanges.
- Exchanges overlaid with predetermined weightings for certain words, to account for semantic relationships inherent in language.
- Exchanges where weightings are determined through applications of machine learning that discover specific semantic relationships in the targeted character set.
- Exchanges that use a pre-trained machine learning algorithm with semantic and syntactic relationships factored into its weighting.
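The contrast between the simplest and the richest of these approaches can be sketched in a few lines. The vocabulary and vectors below are hypothetical, purely for illustration, and are not Rational Review’s actual implementation:

```python
# 1. Simple word-to-number exchange: each word gets an arbitrary index,
#    so no semantic relationships are captured.
vocab = {"contract": 0, "breach": 1, "agreement": 2, "invoice": 3}

def encode_simple(words):
    """Map words to bare indices."""
    return [vocab[w] for w in words if w in vocab]

# 2. Pre-trained embedding lookup: each word maps to a dense vector whose
#    geometry encodes semantic similarity (toy 3-dimensional examples).
embeddings = {
    "contract":  [0.8, 0.1, 0.3],
    "agreement": [0.7, 0.2, 0.3],   # deliberately close to "contract"
    "invoice":   [0.1, 0.9, 0.5],   # deliberately far from both
}

def encode_embedded(words):
    """Map words to dense vectors; similar words land near each other."""
    return [embeddings[w] for w in words if w in embeddings]

print(encode_simple(["contract", "breach"]))      # bare indices
print(encode_embedded(["contract", "agreement"]))  # dense vectors
```

In the simple scheme, “contract” and “agreement” are just as unrelated as “contract” and “invoice”; in the embedding scheme, the numbers themselves carry the relationship.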
Rational Review uses a pre-trained implementation of the Global Vectors for Word Representation (“GloVe”) algorithm, which outperforms other algorithms at capturing complex linguistic functions – such as analogies, word similarity, and named entity recognition – in its weightings (a process known as word embedding).
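Word similarity under such embeddings is typically measured by cosine similarity. A minimal sketch, using toy three-dimensional stand-ins for GloVe vectors (real GloVe vectors have 50–300 dimensions, and the values below are invented):

```python
import math

# Hypothetical word vectors standing in for GloVe embeddings.
vectors = {
    "lawsuit":    [0.9, 0.2, 0.1],
    "litigation": [0.8, 0.3, 0.1],
    "sandwich":   [0.1, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Semantically related words score higher than unrelated ones.
print(cosine_similarity(vectors["lawsuit"], vectors["litigation"]))
print(cosine_similarity(vectors["lawsuit"], vectors["sandwich"]))
```

It is this geometry – related words pointing in similar directions – that lets downstream machine learning pick up on meaning rather than mere spelling.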
Once a document is converted to numerical data, machine learning technology is applied to uncover patterns in the number sets. By itself, the algorithm cannot distinguish between important and unimportant patterns; however, once a human trainer provides example documents to the technology, it begins to understand (or learn) which patterns are important (i.e., most indicative of relevance). Using this training set, the technology develops a model to uncover similar patterns in new documents. This modeling is the essential function of predictive coding: to learn the patterns that lawyers detect within a set of documents and identify those same patterns in documents it has never seen before.
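That train-then-predict workflow can be sketched with a deliberately simple stand-in model – a nearest-centroid classifier rather than a CNN – but the shape is the same: labeled examples in, a model that scores new documents out. All vectors here are hypothetical:

```python
def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# The human trainer supplies example documents, already converted to
# numerical vectors (values are illustrative only).
relevant_examples     = [[0.9, 0.1], [0.8, 0.2]]
not_relevant_examples = [[0.1, 0.9], [0.2, 0.8]]

# "Training": summarize each category by the center of its examples.
centroids = {
    "relevant":     average(relevant_examples),
    "not_relevant": average(not_relevant_examples),
}

def predict(doc_vector):
    """Assign a new document to the nearest learned pattern."""
    return min(centroids, key=lambda label: distance(doc_vector, centroids[label]))

print(predict([0.85, 0.15]))  # lands near the relevant examples
```

A real CNN learns vastly richer patterns than a pair of centroids, but the workflow the text describes – human examples first, model predictions second – is exactly this loop.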
In a traditional applied knowledge approach, an attorney deeply familiar with the factual and legal issues of a matter will convey that knowledge to a review team with instructions to manually and laboriously identify the type of documents relevant to a matter. The knowledgeable attorney might only list a handful of features for the review team to evaluate when reviewing documents.
With machine learning, the knowledgeable attorney instead provides the machine examples of relevant documents, so every feature – even those that cannot be articulated – is available for the machine to consider in the formulation of its model. Essentially, the machine reviewer is able to absorb orders of magnitude more information in constructing its definition of relevancy than a team of inconsistent reviewers.
The Difference: Why Neural Networks?
First invented in 1963 and standardized in 1995, Support Vector Machine (“SVM”) technology is one of the oldest and most popular machine learning applications. Indeed, many eDiscovery companies are still using SVM technology today. SVMs are capable of evaluating many different features of a document and provide useful confidence levels for their prediction outputs. Most studies attempting to compare algorithms for predictive coding in eDiscovery point to SVM as the most capable; however, these studies have not compared SVM to the powerful combination of word embeddings and CNNs.
A neural network uncovers patterns in the numerical data; then, unlike previous forms of machine learning, a secondary layer of neurons looks for patterns in those primary patterns, a third layer looks for patterns in those patterns, and so on – some networks consist of dozens or even hundreds of layers. This layered structure is what makes neural networks so rich in their analysis of data. SVMs, by contrast, take only a single pass at analyzing the patterns in the data and are limited to binary classification.
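The layered idea can be sketched with a toy one-dimensional convolution, where a second layer scans the first layer’s outputs for higher-order patterns. The kernel values below are hypothetical, not trained weights:

```python
def conv1d(sequence, kernel):
    """Slide a small pattern detector (kernel) across a sequence;
    a high output means the local pattern matched."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]

def relu(xs):
    """Keep only positive evidence, as CNN layers typically do."""
    return [max(0.0, x) for x in xs]

# A document reduced to a sequence of (1-dimensional) word scores.
doc = [0.1, 0.9, 0.8, 0.1, 0.2, 0.9, 0.7, 0.1]

# Layer 1 detects a local pattern ("two highs followed by a low");
# layer 2 then looks for patterns among layer 1's detections.
layer1 = relu(conv1d(doc, [1.0, 1.0, -1.0]))
layer2 = relu(conv1d(layer1, [0.5, 0.5]))

print(layer1)
print(layer2)
```

Each added layer sees the previous layer’s detections rather than the raw input, which is what the “patterns in patterns” description above refers to.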
SVM is akin to a junior reviewer who arranges every document in a matter on a large conference table. The hundreds of things he observes about a document prescribe where on the table he places it. Then, when the senior attorney asks for a type of document she wants to see more of (e.g., a relevant document), the junior reviewer identifies where that type of document may be and hands the senior attorney a pile of nearby documents from the table, in the hope that those documents are also relevant.
Now imagine that CNN is a junior reviewer. At first, he observes all the different document features but doesn’t make any decisions about the documents or place them into piles yet. Instead, he makes a post-it note for each feature he notices and arranges those notes on the conference room wall. He then notices various patterns in the post-it notes, recording these observations on a set of cue cards, which he arranges on the conference room table. Next, he recognizes patterns in the cue cards and records them on scrap pieces of paper, laid out on the floor. He is then able to analyze all the relationships within the post-it notes, cue cards, and scraps of paper, and without knowing anything about the case, makes a guess as to whether a certain document is relevant.
At this point, he checks with the senior attorney to see if his guess is right. If he is correct, he knows that the patterns he identified were useful, and thus those same patterns should be used in judging the next document; if his guess was wrong, he will place less importance on those patterns when looking at the next document. The more times he sees the same pattern lead to a confirmed guess, the more confidence he places in his predictions. The junior reviewer makes sure to track his performance, so if the senior attorney finds that he was correct for the past 100 documents, she can be confident he is doing well.
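The feedback loop in this analogy resembles, in greatly simplified form, how a learning algorithm adjusts the weight it places on patterns. Below is a perceptron-style update, not the backpropagation a real CNN uses, and all feature values and labels are illustrative:

```python
def predict(weights, features):
    """Weighted vote over the patterns seen: 1 = relevant, 0 = not relevant."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0 else 0

def update(weights, features, label, lr=0.1):
    """On a wrong guess, shift weight toward (or away from) the patterns
    that led to it; on a correct guess, leave the weights alone."""
    guess = predict(weights, features)
    if guess != label:
        weights = [w + lr * (label - guess) * f
                   for w, f in zip(weights, features)]
    return weights

# Confirmed guesses from the "senior attorney" (hypothetical data):
# each item is (pattern features, correct label).
training = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 0.0], 0), ([1.0, 1.0, 1.0], 1)]

weights = [0.0, 0.0, 0.0]
for _ in range(10):              # repeated feedback builds up confidence
    for features, label in training:
        weights = update(weights, features, label)

print(weights)
```

After enough confirmed guesses, the weights settle so that the learned patterns reproduce the senior attorney’s calls on the training examples.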
If you are thinking that the CNN reviewer is going to need a lot more space (and office supplies) to do his work than the SVM reviewer, you’re right; this kind of analysis takes much more computing power. However, he still does it just as fast, and we think he does a much better job. Furthermore, you can give him as many classification options as you want. His process still works if you want him to decide between Hot, Warm, Somewhat Relevant, Not Relevant, etc. instead of just Relevant/Not Relevant.
CNNs in Rational Review
The most significant impact of using this technology is a marked improvement in understanding complex ideas, but there are several additional benefits that manifest in Rational’s tool:
- Precision – The deep learning technology does not force users into a binary categorization. In other words, the classification does not need to be either relevant or non-relevant, but can instead be hot, warm, relevant, non-relevant, etc. The model also provides a confidence score from 0-100 for each categorization. There is no limit to the number of categories the machine can learn; it just needs 25 examples from each category to get started.
- Context – RR’s implementation can incorporate the metadata of the document into the initial input of the neural network and the pattern analysis. Users can specify through the graphic user interface what metadata fields should be considered for a particular model.
- Auditability – RR’s implementation allows users to track a “Global” machine learning model in addition to each individual, reviewer-specific model. Dashboards allow users to compare the predictions from these different models side by side to understand how they differ. These audit features allow review managers not only to flag irregularities and pinpoint exactly where accuracy fluctuations are occurring, but also to hand-select a subset of reviewers to create a custom model and eliminate subpar work product. If there is ever a question of whether the model predictions influenced the decision making of individual reviewers or case managers, administrators can view a full audit log of exactly who saw what prediction and when, maximizing defensibility.
- Accuracy – Rational built its machine learning application in-house based on best-of-breed tools, and as a result, understands which metrics are most useful to display in its visual, intuitive dashboard. Tracking accuracy and performance has never been easier, with the platform providing calculations for the user on the per-document impact of the current model state, such as the expected rate of false positives rendered, or average prediction accuracy per confidence level category. A continuous learning implementation ensures this information is always up-to-date and gives review managers all the information they need to decide how to use the model and when.
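Multi-category confidence scores like those described under Precision are commonly produced by applying a softmax over the model’s raw scores. A sketch under that assumption – the categories and raw scores below are hypothetical, not Rational Review’s actual output:

```python
import math

def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

categories = ["hot", "warm", "relevant", "non-relevant"]
raw_scores = [2.0, 1.0, 0.5, -1.0]   # hypothetical model outputs

# Scale probabilities to the 0-100 confidence range.
confidences = [round(p * 100) for p in softmax(raw_scores)]
for category, confidence in zip(categories, confidences):
    print(f"{category}: {confidence}")
```

Because the probabilities sum to one, the scores are directly comparable across any number of categories, which is how a non-binary classifier can still report a single confidence per prediction.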
As exciting as this technology is, it represents another tool in a legal professional’s eDiscovery toolbox, albeit a powerful one. Rational Review is fully equipped with email threading, near duplicate analysis, and all other analytic tools that complement machine learning in eDiscovery review. Perhaps RR’s greatest differentiator is a simple, approachable design that is focused on helping lawyers do their work, where analytics and machine learning are integrated intuitively and do not require the assistance of expert data scientists. We look forward to giving you a closer look soon.