The Volume Problem in M&A
A mid-market M&A transaction typically generates 200 to 800 documents in its data room. Corporate and real estate lawyers and paralegals read through shareholder agreements, board minutes, employment contracts, property leases, IP licences, regulatory filings, pension scheme documents, and supplier agreements. Each document needs to be reviewed for key terms, risk flags, and anything that falls outside market standard.
In most corporate practices today, this work is still largely manual. A data room with 600 documents might require 400 to 700 hours of fee earner time at the first-pass review stage alone — before any legal analysis is written up or negotiation positions are formed. At mid-level associate rates of £80 to £150 per hour, that represents £32,000 to £105,000 in review cost on a single deal.
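The quoted range pairs the lowest hour estimate with the lowest rate and the highest hour estimate with the highest rate. A quick Python check of that arithmetic, using only the figures above:

```python
def review_cost_range(hours_low, hours_high, rate_low, rate_high):
    """Return the (minimum, maximum) first-pass review cost in pounds."""
    return hours_low * rate_low, hours_high * rate_high


low, high = review_cost_range(400, 700, 80, 150)
print(f"£{low:,} to £{high:,}")  # → £32,000 to £105,000
```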
What AI Can Now Automate
The technology has reached a point where the first-pass review of standard document types can be largely automated. The key word is standard: AI extraction works best on documents that follow recognisable structures — commercial leases, employment contracts, share purchase agreements, NDAs, loan facility agreements. The more structurally consistent the document class, the higher the extraction accuracy.
For M&A due diligence, the typical automation workflow covers three phases:
- Ingestion and classification: Documents are pulled from the data room, converted from Word files or scanned PDFs (via OCR) into machine-readable text, and automatically classified by document type. A 600-document data room is typically classified in under 30 minutes.
- Extraction: Each document is passed through a large language model with structured extraction prompts tailored to its document type. For a share purchase agreement, the system might extract the parties, completion conditions, price mechanism (locked box or completion accounts), warranty schedule, indemnities, consideration structure, and escrow provisions. For a commercial lease, it extracts different fields: term, rent, break clauses, service charge structure, and alienation restrictions.
- Report generation: Extracted data is consolidated into a structured due diligence summary, typically a spreadsheet or Word document in the firm's house style, with flags highlighting anything outside standard parameters or requiring a solicitor's attention.
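The classification and extraction phases can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a production design: the schema field names, the keyword classifier (a crude stand-in for a trained classifier or an LLM classification call) and the prompt wording are all hypothetical. The field lists mirror the share purchase agreement and lease examples above.

```python
import json

# Hypothetical extraction schemas; field lists follow the examples in the text.
EXTRACTION_SCHEMAS = {
    "share_purchase_agreement": [
        "parties",
        "completion_conditions",
        "price_mechanism",  # locked box or completion accounts
        "warranty_schedule",
        "indemnities",
        "consideration_structure",
        "escrow_provisions",
    ],
    "commercial_lease": [
        "term",
        "rent",
        "break_clauses",
        "service_charge_structure",
        "alienation_restrictions",
    ],
}

# Hypothetical keyword rules: a crude stand-in for a real classifier.
CLASSIFIER_KEYWORDS = {
    "share_purchase_agreement": ["share purchase agreement", "the sellers", "completion"],
    "commercial_lease": ["landlord", "tenant", "break clause"],
}


def classify(text: str) -> str:
    """Return the document type with the most keyword hits, or 'unclassified'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in CLASSIFIER_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


def build_prompt(doc_type: str, text: str) -> str:
    """Build a type-specific extraction prompt that asks for JSON output."""
    fields = EXTRACTION_SCHEMAS[doc_type]
    return (
        f"You are reviewing a {doc_type.replace('_', ' ')} for due diligence.\n"
        f"Return only a JSON object with exactly these keys: {', '.join(fields)}.\n"
        "Use null for any field the document does not address.\n\n"
        f"Document:\n{text}"
    )


def parse_response(doc_type: str, raw: str) -> dict:
    """Check the model's JSON reply has every expected field; return them in schema order."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = set(EXTRACTION_SCHEMAS[doc_type]) - set(data)
    if missing:
        raise ValueError(f"model omitted fields: {sorted(missing)}")
    return {field: data[field] for field in EXTRACTION_SCHEMAS[doc_type]}
```

The string returned by `build_prompt` would be sent to whichever LLM provider the firm uses, and the reply passed to `parse_response`; that call is left out because it depends on the provider's API. Rejecting replies with missing fields keeps gaps visible instead of silently dropping them from the report.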
Accuracy and Validation
A question every sensible lawyer asks: how accurate is it? The honest answer is that accuracy depends heavily on document quality and the specificity of the extraction task. On clean, typed commercial documents, well-engineered extraction systems can exceed 95% accuracy on straightforward factual fields such as dates, party names, monetary amounts, and defined terms. Accuracy is somewhat lower on complex interpretive matters (assessing whether a particular warranty is standard, or whether an indemnity is unusually broad), which is why human review of the output remains an essential step.
A well-built system addresses this through confidence scoring: the model flags items it is uncertain about, and the review workflow directs solicitor attention to those specific points rather than requiring a full re-read of every document. The goal is not to eliminate legal review but to focus it — so that a senior associate's time goes on the genuinely complex and uncertain items, not on reading standard boilerplate for the fourteenth time.
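One way to implement the routing described above, as a sketch: each extracted item carries a confidence score between 0 and 1, and anything below a threshold is flagged for a solicitor, least confident first. How the score is produced is an assumption here (for example, a score the model is asked to report, or agreement between repeated extraction runs), and the 0.85 threshold and the `ExtractedItem` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedItem:
    document: str
    field: str
    value: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def route_for_review(items, threshold=0.85):
    """Split items into (routine, flagged) lists.

    Flagged items fall below the (hypothetical) threshold and are sorted
    least confident first, so solicitors see the most uncertain points first.
    """
    routine = [item for item in items if item.confidence >= threshold]
    flagged = sorted(
        (item for item in items if item.confidence < threshold),
        key=lambda item: item.confidence,
    )
    return routine, flagged


items = [
    ExtractedItem("Lease 14", "rent", "£120,000 per annum", 0.97),
    ExtractedItem("Lease 14", "break_clauses", "Tenant break at year 5", 0.62),
    ExtractedItem("SPA", "indemnities", "Specific tax indemnity", 0.71),
]
routine, flagged = route_for_review(items)
print([item.field for item in flagged])  # → ['break_clauses', 'indemnities']
```

Routine items still appear in the summary; the split only decides where a reviewer looks first.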
Integration with Existing Workflows
The output of an AI due diligence system is designed to feed into the firm's existing workflow, not to replace it. The typical integration pattern is:
- Extracted data flows into the firm's due diligence template or report format
- Associates review the AI-generated summary and annotate with legal analysis
- Flagged items requiring attention are tracked in the firm's matter management system
- Final report is produced in the same format the client has always received
The partners and associates see the same deliverable — the difference is how much of the underlying data gathering happened automatically rather than manually.
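The first two integration steps, populating the template and leaving room for associate annotation, might look like this if the firm's template is a spreadsheet. The sketch assumes a CSV export and a hypothetical column layout; writing directly into an Excel or Word template would need a library for that format.

```python
import csv
import io

# Hypothetical column layout for a firm's due diligence schedule.
TEMPLATE_COLUMNS = ["Document", "Field", "Extracted value", "Flag", "Associate comment"]


def to_template_csv(rows):
    """Write extracted rows into the template layout and return CSV text.

    Each row is a dict with 'document', 'field', 'value' and 'flagged' keys.
    The comment column is left blank for the associate's legal analysis.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TEMPLATE_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {
                "Document": row["document"],
                "Field": row["field"],
                "Extracted value": row["value"],
                "Flag": "REVIEW" if row["flagged"] else "",
                "Associate comment": "",
            }
        )
    return buffer.getvalue()
```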
Which Transaction Types Benefit Most
The ROI case for automation is clearest where document volume is high and document types are repetitive. This points to several transaction categories:
- Property acquisitions — Portfolio deals involving multiple leases, title documents, and planning consents are ideal. The documents are structurally consistent and the data points to extract are well-defined.
- Business acquisitions with large employee populations — Employment contracts, TUPE schedules, and pension documentation can be processed in bulk.
- Financial services transactions — Regulatory filings, FCA permissions, and compliance documentation are often numerous and structurally consistent.
- Mid-market M&A generally — Even transactions with lower total document counts see meaningful time savings on the extraction of key commercial terms from the principal agreements.
Cost and Payback
Building a due diligence automation system for a specific practice area typically costs £8,000 to £20,000 depending on document complexity and the number of document types covered. Ongoing API costs (for the LLM processing) run at roughly £50 to £200 per transaction depending on data room size.
Against manual review costs of £30,000+ per transaction on a mid-market deal, the payback period is typically one to three transactions, even though the system reduces first-pass review time rather than eliminating it. For a firm doing ten or more M&A transactions per year, the annual saving is substantial, and faster, cheaper due diligence is a meaningful differentiator in a price-sensitive market.
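The payback claim can be checked with a simple calculation using the section's own figures (build cost of £8,000 to £20,000, API cost of £50 to £200 per deal, manual review cost of £30,000). The share of review cost the system actually saves is not stated in the text, so the 30% and 50% values below are illustrative assumptions; human review of the output still takes time.

```python
import math


def payback_transactions(build_cost, manual_review_cost, saving_fraction, api_cost_per_deal):
    """Number of transactions before cumulative net savings cover the build cost.

    saving_fraction is the assumed share of manual review cost the system removes.
    """
    net_saving_per_deal = manual_review_cost * saving_fraction - api_cost_per_deal
    if net_saving_per_deal <= 0:
        raise ValueError("no net saving per deal, so the system never pays back")
    return math.ceil(build_cost / net_saving_per_deal)


print(payback_transactions(8_000, 30_000, 0.5, 50))    # → 1
print(payback_transactions(20_000, 30_000, 0.5, 200))  # → 2
print(payback_transactions(20_000, 30_000, 0.3, 200))  # → 3
```

Payback is sensitive to the saving fraction, so it is worth measuring on the first few transactions rather than assuming it.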
Getting Started
The right starting point is to pick one document type that appears in every transaction your practice handles — leases, employment contracts, NDAs — and build an extraction system for that document class. This produces a working system quickly, generates measurable time savings from day one, and builds the firm's confidence in the technology before expanding scope.
If you are considering AI automation for due diligence in your practice, get in touch and we can walk through what a system would look like for your specific transaction types.