Automation in eDiscovery processing

In a lawsuit or investigation, lawyers for both parties are required to exchange all documents that may be used as evidence in the case. Traditionally this involved hard copies of documents. With the coming of the electronic age, the definition of relevant documents has expanded to include any communication created, sent, received, or stored using electronic media. These documents, known as Electronically Stored Information (ESI), include email, files created with various applications (word processors, spreadsheets, databases, presentations, collaboration tools, etc.), phone messages, photos, audio and video recordings, and text posted in messaging apps or on social media sites - in short, any form of electronic media. Studies have shown that as much as 90% of all ESI consists of email. ESI may reside locally or in the cloud, on computers, laptops, and tablets, or on other devices such as cameras and smart TVs. The discovery of documents relevant to the matter at hand from ESI is referred to as eDiscovery.

As may be imagined, eDiscovery involves multiple sources of data stored in different file formats and can run into terabytes of data; collecting, reviewing, analysing, and using the data presents an enormous challenge in terms of manpower, cost, and time. To the rescue comes specialised software that can automate most aspects of the process. The most commonly used framework for mapping the eDiscovery workflow is the Electronic Discovery Reference Model (EDRM). The model specifies the following stages:

Information Management: The first step is to accumulate the relevant data. This process of extracting data from different sources can be automated; not only can the automation software extract data from multiple sources quickly, but it can also be programmed to retrieve only relevant data by applying filtering criteria such as keywords or date ranges. This process of locating and cataloguing data is referred to as Data Mapping.
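As a minimal sketch of such targeted retrieval, the snippet below culls a small document inventory by date range and keywords. The field names and sample documents are illustrative assumptions, not the API of any particular eDiscovery product.

```python
from datetime import date

def matches_criteria(doc, start, end, keywords):
    """Return True if the document falls within the date range and
    mentions at least one of the keywords (case-insensitive)."""
    in_range = start <= doc["created"] <= end
    has_keyword = any(k.lower() in doc["text"].lower() for k in keywords)
    return in_range and has_keyword

# Hypothetical inventory produced by an earlier data-mapping pass.
inventory = [
    {"path": "mail/001.eml", "created": date(2021, 3, 5),
     "text": "Re: Project Falcon budget approval"},
    {"path": "docs/minutes.docx", "created": date(2019, 1, 10),
     "text": "Quarterly board minutes"},
    {"path": "mail/002.eml", "created": date(2021, 6, 2),
     "text": "Falcon contract draft attached"},
]

# Keep only 2021 documents mentioning the matter's keywords.
relevant = [d for d in inventory
            if matches_criteria(d, date(2021, 1, 1), date(2021, 12, 31),
                                ["falcon", "contract"])]
```

In practice the same predicate would be pushed down into the collection tool so that irrelevant data is never copied at all.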

Identification: Next comes the step of identifying the data and cataloguing it by source and type. During this phase begins the process of ‘culling’ the data, that is, separating the relevant documents from the irrelevant ones. AI, known in this context as ‘predictive coding’, can be used to train the software to perform this identification. As human reviewers review and tag an initial sample of data, the software learns the criteria the reviewers apply and builds a model that automates the process. Over time, with greater exposure, its predictions become more accurate.
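Predictive coding can be sketched, in greatly simplified form, as a text classifier trained on a reviewer-tagged seed set. The tiny Naive Bayes model below is an illustrative assumption; production platforms use far more sophisticated models, sampling strategies, and validation workflows.

```python
from collections import Counter
import math

def train(tagged_docs):
    """tagged_docs: list of (text, label) pairs, label 'relevant'/'irrelevant'.
    Returns per-label word counts and document counts."""
    counts = {"relevant": Counter(), "irrelevant": Counter()}
    totals = Counter()
    for text, label in tagged_docs:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def predict(text, model):
    """Score both labels with add-one smoothing and return the better one."""
    counts, totals = model
    vocab = set(counts["relevant"]) | set(counts["irrelevant"])
    best_label, best_score = None, float("-inf")
    for label in ("relevant", "irrelevant"):
        score = math.log(totals[label] / sum(totals.values()))  # log prior
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical seed set tagged by human reviewers.
seed = [
    ("merger agreement draft", "relevant"),
    ("merger due diligence memo", "relevant"),
    ("office party invitation", "irrelevant"),
    ("cafeteria menu update", "irrelevant"),
]
model = train(seed)
```

As reviewers tag more documents, the seed set grows and the model's scores shift accordingly, which mirrors the "greater exposure, better accuracy" dynamic described above.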

Preservation and Collection: Once the relevant data has been identified, it is essential to protect it from deletion or modification – in legal parlance, from spoliation. At this stage, the interested party can issue ‘legal holds’ to prevent spoliation. Collection refers to the process of storing the relevant data in a manner that correctly identifies it and makes it easy to retrieve. This collection must include not only the content but also the ‘metadata’ associated with each file: author, date/time of creation and subsequent modification, recipients, the application used, etc.

Processing, Review, and Analysis: Processing converts data from its many native formats into a single format optimised for fast searching, so that content can be queried with one tool rather than through each native application. This is the phase where humans and AI systems can work together to vastly reduce the amount of data to be analysed further, by effectively cataloguing and indexing it. Using software, duplicates can be easily identified and removed, search parameters can be refined, and irrelevant documents can be found and set aside.
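De-duplication, for example, can be sketched by hashing document content and keeping only the first copy of each digest. The sample corpus below is illustrative.

```python
import hashlib

def deduplicate(docs):
    """docs: list of (doc_id, content_bytes).
    Returns the documents with exact-content duplicates removed,
    preserving the order of first occurrence."""
    seen = set()
    unique = []
    for doc_id, content in docs:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((doc_id, content))
    return unique

# Hypothetical mini-corpus: A-003 is a byte-for-byte copy of A-001.
corpus = [
    ("A-001", b"Q3 revenue forecast"),
    ("A-002", b"Meeting notes, March 5"),
    ("A-003", b"Q3 revenue forecast"),
]
unique_docs = deduplicate(corpus)
```

Commercial tools extend this idea to "near-duplicate" detection (e.g. the same email with different signatures), which requires fuzzier similarity measures than an exact hash.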

With this reduced set the documents can be reviewed further for the degree of relevance and for admissibility as evidence if required. This process is greatly facilitated by the previous processing of the data to make it easily searchable.
The software can provide tools such as dashboards, concept clustering, and email-thread identification to facilitate effective, informed decision making during the Analysis phase.
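Email-thread identification can be sketched by stripping reply/forward prefixes from subject lines and grouping messages by the normalized result. This is a simplified assumption; real tools also consult In-Reply-To/References headers and message content.

```python
from collections import defaultdict
import re

def normalize_subject(subject):
    """Strip leading Re:/Fw:/Fwd: prefixes (possibly stacked) and
    lowercase the remainder so replies map to the same thread key."""
    return re.sub(r"^(?:(re|fwd|fw)\s*:\s*)+", "", subject.strip(),
                  flags=re.IGNORECASE).lower()

def group_threads(messages):
    """messages: list of dicts with 'id' and 'subject'.
    Returns a mapping from normalized subject to message ids."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["subject"])].append(msg["id"])
    return dict(threads)

# Hypothetical mailbox slice.
mailbox = [
    {"id": 1, "subject": "Contract review"},
    {"id": 2, "subject": "RE: Contract review"},
    {"id": 3, "subject": "Fwd: Re: Contract review"},
    {"id": 4, "subject": "Lunch plans"},
]
threads = group_threads(mailbox)
```

Grouping a mailbox this way lets a reviewer read one thread end-to-end instead of encountering its messages scattered through the corpus.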

Production and Presentation: With this phase, we come to the output of the whole process—the production of documents, visual aids, audio and video clips, etc. that can be presented at a deposition, hearing, or trial. The legal team can take full advantage of all the computer-aided audio-visual tools available these days for dynamic, impactful presentations.

Outsourcing of eDiscovery, until recently the only option available, is very expensive; however, today there are software solutions that can be purchased and used by trained in-house staff, with all the collected data stored securely behind the company firewall. There are also SaaS solutions that store the data in a highly secure, encrypted form in the cloud, which facilitates use by lawyers and paralegals in multiple locations.

