
Predictive Coding Legaltech

Date post: 14-Apr-2017
Page 1: Predictive Coding Legaltech

Predictive Coding 2.0 Making E-Discovery More Efficient and Cost Effective

John Tredennick Jeremy Pickens Jim Eidelman

Page 2: Predictive Coding Legaltech

How Many Do I Have to Check?

1. You have a bag with 1 million M&Ms
2. It contains mostly brown M&Ms
3. You cannot see into the bag
4. You have a scoop that will pull out 100 M&Ms at a time
5. Your hope is that there are no red M&Ms in the bag
6. You pull out a scoop and they are all brown

How many scoops do you need to review to be confident there are no red M&Ms?

Page 3: Predictive Coding Legaltech

Let’s Take a Poll

How many scoops?

1, 2, 3, 5, 10, 20

100? 500? 1,000?

Page 4: Predictive Coding Legaltech

How Confident Do You Need to Be?

How many errors can you tolerate?
§ Five out of a hundred?
§ One out of a hundred?
§ One percent = 10,000

Does 95% work?
At a 95% confidence level and 5% margin of error: 384 M&Ms

How about 99%?
At a 99% confidence level and 1% margin of error: 459 M&Ms

At a 100% confidence level and 0% margin of error: 1,000,000 M&Ms
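
The sample sizes on this slide can be checked with a short calculation. The sketch below is not from the presentation. It assumes two textbook formulas: Cochran's sample-size formula with a finite population correction (worst-case proportion p = 0.5), and the zero-defect bound n ≥ ln(1 − confidence) / ln(1 − rate) for a sample that comes back all brown. The function names are illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error_sample(confidence, margin, population, p=0.5):
    """Cochran's formula with finite population correction (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    n0 = z * z * p * (1 - p) / (margin * margin)    # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def zero_defect_sample(confidence, max_rate):
    """Smallest all-brown sample that rules out a red rate >= max_rate."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_rate))


bag = 1_000_000
print(margin_of_error_sample(0.95, 0.05, bag))  # 384
print(zero_defect_sample(0.99, 0.01))           # 459
print(zero_defect_sample(0.95, 0.05))           # 59
```

Under these assumptions, the first call reproduces the 384 quoted for 95% and 5%, and the zero-defect bound reproduces the 459 quoted for 99% and 1%. The margin-of-error formula at 99% and 1% would call for roughly 16,000 M&Ms, so the 459 figure appears to rest on the zero-defect reading. Either way, the answer is a handful of scoops, not thousands.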

Page 5: Predictive Coding Legaltech

Predictive Coding

Page 6: Predictive Coding Legaltech

Does it Work?

Page 7: Predictive Coding Legaltech

What Have the Courts Said?

Page 8: Predictive Coding Legaltech

What Have the Courts Said?

“Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval. In my opinion, computer-assisted coding should be used in those cases where it will help ‘secure the just, speedy, and inexpensive’ (Fed. R. Civ. P. 1) determination of cases in our e-discovery world.”

Magistrate Judge Andrew Peck

Page 9: Predictive Coding Legaltech

Predictive Coding 1.0

1. Assemble your corpus.
2. Assemble a seed set of documents.
3. Review the seed set.
4. Apply machine learning and automatically tag the remainder of the corpus.
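
The four steps above can be sketched in a few lines. This is a minimal illustration, assuming a naive Bayes classifier with add-one smoothing as the "machine learning" step; the slides do not name an algorithm, and the documents, labels, and function names here are made up.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(seed_set):
    """seed_set: list of (text, label) pairs coded by reviewers."""
    word_counts = {}          # label -> Counter of term frequencies
    doc_counts = Counter()    # label -> number of seed documents
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the most likely label under naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # terms unseen in the seed set are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Steps 1-3: corpus, seed set, and reviewer coding (toy data).
seed = [
    ("discuss pricing of our components", "responsive"),
    ("pricing agreement with competitor", "responsive"),
    ("company picnic hamburger buns", "junk"),
    ("broncos game at the sports bar", "junk"),
]
remainder = ["component pricing discussion", "picnic at the sports bar"]

# Step 4: learn from the seed set and tag the rest of the corpus.
model = train(seed)
tags = {doc: predict(model, doc) for doc in remainder}
print(tags)
```

Note that the model is frozen once the seed set is reviewed: any term that first appears in a later upload is simply ignored by `predict`.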

Page 10: Predictive Coding Legaltech

Predictive Coding 1.0

§ Tremendous gains in review effectiveness
§ Substantial cost savings
§ It works. Often quite well

...when the corpus is complete.

Page 11: Predictive Coding Legaltech

533 matters, nearly 36,000 uploads across the matters.

67.5 uploads per case

Page 12: Predictive Coding Legaltech

This is driven by collection, not by loading limits.

166.3 days of loading per case

Page 13: Predictive Coding Legaltech

67 uploads

166 days

In which upload and on which day do your responsive documents show up?

Terms that do not appear early begin appearing later.

Page 14: Predictive Coding Legaltech

Machine-Assisted Decision Making

Upload timeline of 6 TB case. When should machine-assisted decision making (e.g. early case assessment) begin?

Is it here?

Or here?

Page 15: Predictive Coding Legaltech

Example: Responsive Early, Junk Later

To: [email protected], [email protected]

From: [email protected]

Subject: Company Picnic

Bob, would you coordinate with Alice and make sure we have enough hamburger buns for the company picnic? Please try and find them at a reasonable price.

Coding: Responsive (early), Junk (later)

Page 16: Predictive Coding Legaltech

Example: Junk Early, Responsive Later

To: [email protected], [email protected]

From: [email protected]

Subject: Get Together

Let’s get together at 7pm at the Sports Bar to discuss pricing of our components. The Broncos are playing and I really want to watch Tebow.

Coding: Junk (early), Responsive (later)

Page 17: Predictive Coding Legaltech

Problems With Predictive Coding 1.0

The corpus is almost never complete
§ Continuous collection and rolling uploads
§ When does “Early Case Assessment” begin?

Changing Issues
§ Responsiveness is “bursty”

Shifting Concept Relationships
§ Due both to increasing corpus and changing issues
§ Exploration is extremely limited

Page 18: Predictive Coding Legaltech

Our Approach

Predictive Coding 2.0 must be able to deal with dynamic change and flux. We have developed a flexible analytics framework based on bipartite graphs. It is aware of changes in the corpus and in coding, which enables smart review and adaptive related-concept suggestion as information pours in.

Page 19: Predictive Coding Legaltech

Goal: Continuous Case Assessment

Our Approach

Avoid the lock-in that results from poor decisions made early in the matter, when the corpus (collection) and coding information are incomplete.

Page 20: Predictive Coding Legaltech

What Is Underneath?

A full bipartite graph of the documents and features (e.g. words, phrases, dates) that comprise those documents
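
As a rough illustration of the data structure, the sketch below stores each document-term edge in both directions so that either side of the graph can be traversed. It is an assumption about what such a graph might look like, not the presenters' implementation; the class and method names are hypothetical, and only words are used as features.

```python
from collections import defaultdict


class BipartiteIndex:
    """Documents on one side, features (here: words) on the other."""

    def __init__(self):
        self.doc_to_terms = defaultdict(set)
        self.term_to_docs = defaultdict(set)

    def add_document(self, doc_id, text):
        for term in text.lower().split():
            self.doc_to_terms[doc_id].add(term)   # edge doc -> term
            self.term_to_docs[term].add(doc_id)   # same edge, term -> doc

    def documents_with(self, term):
        return set(self.term_to_docs.get(term, set()))

    def terms_in(self, doc_id):
        return set(self.doc_to_terms.get(doc_id, set()))


index = BipartiteIndex()
index.add_document("d1", "fuel piping and ventilation")
index.add_document("d2", "fuel price benzene")
print(sorted(index.documents_with("fuel")))  # ['d1', 'd2']
print(sorted(index.terms_in("d2")))          # ['benzene', 'fuel', 'price']
```

Because documents are added one at a time, the same structure keeps working as new uploads arrive.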

Page 21: Predictive Coding Legaltech

[Figure: bipartite graph of Documents and Terms]

Page 22: Predictive Coding Legaltech

Feedback: Immediate and Continuous

Continuous feedback aids better decision making and predictive coding. Adapts to both:

§ New arrival of coding information
§ New arrival of documents and terms
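
A toy sketch of both kinds of adaptation follows. It is hypothetical and far simpler than any production system: each coding call casts a +1 or -1 vote on the document's terms, unreviewed documents are ranked by the sum of their term votes, and an earlier coding call can be revised. All names are illustrative.

```python
from collections import Counter


class ContinuousRanker:
    """Toy ranker: hypothetical, not the presenters' algorithm."""

    def __init__(self):
        self.docs = {}                 # doc_id -> list of terms
        self.coding = {}               # doc_id -> True (responsive) / False
        self.term_weight = Counter()   # term -> responsive minus junk votes

    def add_document(self, doc_id, text):
        # New documents (and any new terms) can arrive at any time.
        self.docs[doc_id] = text.lower().split()

    def code(self, doc_id, responsive):
        # Undo an earlier call on this document so coding can be revised.
        if doc_id in self.coding:
            self._vote(doc_id, -1 if self.coding[doc_id] else 1)
        self.coding[doc_id] = responsive
        self._vote(doc_id, 1 if responsive else -1)

    def _vote(self, doc_id, delta):
        for term in set(self.docs[doc_id]):
            self.term_weight[term] += delta

    def ranked_unreviewed(self):
        def score(doc_id):
            return sum(self.term_weight[t] for t in set(self.docs[doc_id]))

        pending = [d for d in self.docs if d not in self.coding]
        return sorted(pending, key=lambda d: (-score(d), d))


ranker = ContinuousRanker()
ranker.add_document("a", "company picnic hamburger buns")
ranker.add_document("b", "pricing of our components")
ranker.add_document("c", "picnic pricing discussion")
ranker.code("a", False)
first = ranker.ranked_unreviewed()    # ['b', 'c']

ranker.add_document("d", "sports bar pricing talk")   # a later upload
ranker.code("b", True)                                # new coding
second = ranker.ranked_unreviewed()   # ['d', 'c']

ranker.code("a", True)                # picnic emails turn out to matter
third = ranker.ranked_unreviewed()    # ['c', 'd']
print(first, second, third)
```

The ranking shifts after each upload and each coding decision, including the reversal on document "a".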

Page 23: Predictive Coding Legaltech

[Figure: bipartite graph of Documents and Terms]

Page 24: Predictive Coding Legaltech

Predictive Coding 2.0

Feedback – and improvement – is iterative, continuous, amplified.

[Chart axis: % of Docs Examined Manually]

The more you review, the less you have to review

Page 25: Predictive Coding Legaltech

Better Decisions As Understanding Improves

Term relationships change over time.

Using continuous improvement, decisions can be revised and refined as the matter proceeds.

Page 26: Predictive Coding Legaltech

Time uncovers new relationships

[Figure: bipartite graph of Documents and Terms]

Page 27: Predictive Coding Legaltech

Looking at Concepts Over Time

Start with the key term “fuel”

At 20% these are the related terms (left column). And at 65% (right column).

20%            65%
lube           fuels
piping         fob
battery        purityethane
mounted        petrochemicals
redundant      fin
batteries      paraxylene
compartments   cif
mixture        phy
airflow        fwd
ansi           swopt
ventilation    brentpartials
chargers       brg
stainless      locswap
rotor          benzene
bleed          diff
accessory      spd
plenum         liquids
detector       opt

Page 28: Predictive Coding Legaltech

Related Terms Through Coding Filters

Page 29: Predictive Coding Legaltech

[Figure: bipartite graph of Documents and Terms, with documents coded Responsive or NonResponsive]

Page 30: Predictive Coding Legaltech

Putting Related Concepts to Work

TREC collection with many topics identified

The whole corpus

Topic 203 …whether the Company had met, or could, would, or might meet its financial forecasts, models, projections, or plans…

Topic 205 …analyses, evaluations, projections, plans, and reports on the volume(s) or geographic location(s) of energy loads.

Page 31: Predictive Coding Legaltech

Model In the Whole Collection

Look at the keyword “model”

Scope is the whole collection

Term          Score
modeling      1000
equation      864
stochastic    706
variables     677
parameters    518
probability   365
simulation    337
assumption    325
returns       251
curves        211

Page 32: Predictive Coding Legaltech

Model In Topic 203

Look at the keyword “model”

Scope: Topic 203 (meeting financial forecasts)

Term           Score
flows          1000
assumptions    913
gains          872
shares         864
liquidity      486
fluctuations   374
analysts       285
cents          254
whitewing      237
handles        166

Page 33: Predictive Coding Legaltech

Model In Topic 205

Look at the keyword “model”

Scope: Topic 205 (analyzing energy volumes)

Term          Score
bids          1000
congestion    611
loads         455
constraints   354
clearing      292
zonal         194
signals       192
procure       190
dispatch      152
csc           120

Page 34: Predictive Coding Legaltech

Model In Comparison

Whole Corpus   Topic 203      Topic 205
modeling       flows          bids
equation       assumptions    congestion
stochastic     gains          loads
variables      shares         constraints
parameters     liquidity      clearing
probability    fluctuations   zonal
simulation     analysis       signal
assumption     cents          procure
returns        whitewing      dispatch
curves         handles        csc

Now, imagine this with batches and coding changes over time!

Note: Our system can accept any combination of coding and metadata filters to dynamically assess your data
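
One simple way to picture scoped related terms is co-occurrence counting behind a coding filter. The sketch below is an assumption for illustration only: the slides do not say how scores are computed, so raw co-occurrence counts are scaled so the strongest term scores 1000, mirroring the top score in the tables above. The function name, toy documents, and topic codes are made up.

```python
from collections import Counter


def related_terms(docs, key_term, scope=None, top=3):
    """docs: list of (set_of_terms, set_of_codes). scope: a code or None."""
    counts = Counter()
    for terms, codes in docs:
        if scope is not None and scope not in codes:
            continue                      # coding filter
        if key_term in terms:
            counts.update(terms - {key_term})
    if not counts:
        return []
    best = max(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Scale so the strongest term scores 1000 (assumed convention).
    return [(term, round(1000 * n / best)) for term, n in ranked[:top]]


docs = [
    ({"model", "equation", "stochastic"}, set()),
    ({"model", "equation", "flows"}, {"203"}),
    ({"model", "flows", "assumptions"}, {"203"}),
    ({"model", "bids", "congestion"}, {"205"}),
    ({"model", "bids", "loads"}, {"205"}),
    ({"model", "equation"}, set()),
]

print(related_terms(docs, "model"))               # whole corpus
print(related_terms(docs, "model", scope="203"))  # Topic 203 only
print(related_terms(docs, "model", scope="205"))  # Topic 205 only
```

The same key term surfaces different neighbors in each scope, and rerunning the function after new documents or coding arrive would shift the lists again.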

Page 35: Predictive Coding Legaltech

Summary

Incomplete Collections

Changing Coding Calls

Havoc for Machine Coding

Page 36: Predictive Coding Legaltech

Predictive Coding 2.0

Problem: The corpus is almost never complete
Answer: Review algorithms that are iterative and continuous

Problem: Changing Issues
Answer: Review algorithms that are adaptive and continuous

Problem: Shifting Concept Relationships
Answer: Concept relationships that are calculated dynamically, on-the-fly, and coding-aware

Continuous Case Assessment

Page 37: Predictive Coding Legaltech

Analytics Consulting

§ Analytics consulting and predictive ranking for nearly 4 years
§ How it started, before “Predictive Coding” became popular:
  “Can’t you predict what documents are probably relevant based on your review so far?” – Judge, SDNY
§ Predictive Ranking: Iterative search techniques + algorithms
§ Then off-the-shelf Predictive Coding 1.0 technologies
§ Catalyst’s research is exciting! We apply the research to real-world scenarios. Applying Bipartite Analytics…

Page 38: Predictive Coding Legaltech

Smart Review with the Bipartite Analytics Technology

Advantages:
§ Accurate
§ Dynamic
§ Flexible
§ “Just in Time” suggestions

Page 39: Predictive Coding Legaltech

Smart Review Scenarios

1. “What happened” – examples: FCPA investigation, conspiracy ECA
2. Typical large scale litigation with lots of ESI – e.g., class action lawsuit
3. Highly complex litigation with multiple issues – e.g., patent and unfair competition claims

Page 44: Predictive Coding Legaltech

Scenario 1 – What happened?

Goal: Rapidly determine facts and resolve matter if possible

Applying the Technology: A small number of knowledgeable attorneys drill into documents using the fusion of advanced search features and flexible predictive coding.

§ Faster location of valuable “veins” of information due to search filters
§ Rapid learning and application of that learning through flexible, “just in time” Predictive Coding 2.0
§ “Choose your own adventure”

Page 45: Predictive Coding Legaltech

Scenario 2 – Large Scale Litigation

Goal: Minimize cost by learning across a large document set, increase quality with focused review, and maximize protection of privilege and trade secrets

Applying the Technology:
§ Prioritized review based on rapid, continuous learning
§ Large scale defensible culling
§ More accurate ranking of “potentially privileged” documents

Page 46: Predictive Coding Legaltech

Scenario 3 – Highly Complex Litigation

Goal: Review and produce with multiple and changing issues

Applying the Technology:
§ Rapid learning across multiple topics
§ Leverage ability to adjust for change in topics
§ Review quality improves because of focus
§ Explore otherwise hidden subjects with Concept Explorer
§ Leverage learning across narrow, focused lines of inquiry (e.g., emails between two people in a narrow time window)
§ Protect privileged documents

Page 47: Predictive Coding Legaltech

Predictive Coding 2.0 Making E-Discovery More Efficient and Cost Effective

John Tredennick Jeremy Pickens Jim Eidelman

