Estate Document Extraction

PostedApril 21, 2026

UpdatedApril 22, 2026

ByPeter Westerman

name: estate-document-extraction

description: AI-powered extraction of structured data from estate documents (wills, trusts, deeds, financial statements) using Claude Vision API. Document classification, key field extraction, entity resolution, and confidence scoring. Use when building document intake pipelines, extracting entities from scanned legal documents, or classifying estate paperwork.

Estate Document Extraction

Instructions

Extract structured data from estate documents using AI vision and text analysis. Handle scanned PDFs, photographed documents, and digital text with appropriate extraction strategies.

Document Classification

Before extraction, classify the document into one of these categories:

Category	Document Types	Key Indicators
Testamentary	Last Will and Testament, Codicils, Holographic Wills	“Last Will”, “I bequeath”, “hereby revoke”, attestation clause
Trust	Revocable Living Trust, Irrevocable Trust, SNT	“Trust Agreement”, “Settlor”, “Trustee”, “Trust Estate”
Real Property	Deeds, Title Insurance, Property Tax Statements	“Grantor/Grantee”, “Legal Description”, parcel numbers
Financial	Bank Statements, Brokerage Statements, Insurance Policies	Account numbers, balances, CUSIP numbers
Court	Letters Testamentary, Court Orders, Petitions	Case numbers, court stamps, judge signatures
Identity	Death Certificates, Birth Certificates, Marriage Certificates	Vital records formatting, certificate numbers
Tax	Estate Tax Returns (706), Income Tax Returns, Gift Tax Returns	IRS form numbers, EIN, SSN references

Claude Vision API Integration

Submit document images via the Claude Messages API with type: "image" content blocks
For multi-page PDFs, convert each page to a PNG/JPEG and submit as a sequence of images
Use a structured extraction prompt that requests JSON output matching the target schema
Set temperature to 0 for deterministic extraction
For documents >20 pages, extract in batches of 5 pages with overlap context from the previous page’s extracted data

Key Field Extraction by Document Type

Wills:

Testator name, date of execution, jurisdiction
Executor nominations (primary, successor)
Beneficiary names and bequests (specific, residuary)
Trust creation provisions
Guardian nominations for minors
Signature attestation (number of witnesses, notarization)

Trust Agreements:

Settlor, Trustee, Successor Trustee names
Trust type (revocable, irrevocable, testamentary, SNT)
Beneficiary names and distribution provisions
Trust assets schedule
Amendment and revocation provisions
Governing law jurisdiction

Deeds:

Grantor and Grantee names
Legal description (metes and bounds, lot/block, or section/township/range)
Recording information (book, page, instrument number)
Consideration amount
Deed type (warranty, quitclaim, trust transfer)

Financial Statements:

Institution name and account number (last 4 digits only in extracted data)
Account type and ownership
Balance as of statement date
Beneficiary designations if shown

Entity Resolution

Normalize person names: “John A. Smith”, “John Smith”, “J. Arthur Smith” should resolve to the same entity
Track name variants with confidence: exact match (1.0), partial match (0.8), inferred match (0.5)
Cross-reference entities across documents: the “John Smith” in the will should link to the “John Smith” on the deed
Flag ambiguous matches for human review rather than auto-resolving

Confidence Scoring

Every extracted field gets a confidence score:

Score	Meaning	Action
0.95–1.0	High confidence — clearly legible, unambiguous	Auto-accept
0.80–0.94	Medium confidence — legible but could be misread	Flag for review
0.50–0.79	Low confidence — partially legible or ambiguous	Require human verification
<0.50	Very low — illegible or contradictory	Mark as unextracted, request better scan

Data Security

Never store full SSNs, account numbers, or EINs in extracted data — mask to last 4 digits
Process documents in memory; do not cache raw images on disk after extraction
Log extraction events (document type, page count, field count) but never log extracted content
All extracted data inherits the CONFIDENTIAL classification from the source document

Inputs Required

Document image(s): PNG, JPEG, or PDF pages as base64-encoded images
Document type hint (optional): if the user pre-classifies the document, skip classification step
Extraction schema: which fields to extract (default: all fields for the document type)
Existing entity list: previously resolved entities for cross-reference matching

Output Format

Document classification with confidence score
Structured JSON object with extracted fields, each annotated with confidence score and source page number
Entity resolution results: new entities created, existing entities matched, ambiguous matches flagged
Extraction summary: total fields extracted, fields requiring review, fields unextracted
Processing metadata: pages processed, API calls made, total processing time

Anti-Patterns

Storing full PII in extraction output: Always mask SSNs, account numbers, and EINs — only store last 4 digits
Auto-resolving ambiguous entities: When “John Smith” appears in two documents, do not assume they are the same person without corroborating evidence — flag for human review
Ignoring document quality: A blurry scan produces unreliable extraction — detect low image quality and request a rescan before wasting API calls
Extracting without classification: Different document types require different extraction schemas — always classify first
Treating OCR output as ground truth: AI extraction can hallucinate fields that do not exist in the source document — always include source page references so humans can verify
Processing sensitive documents in bulk without audit trail: Every document extraction must be logged for fiduciary compliance

AI Skill

Product Showcase

ITI Knowledge System

AI Agent

User Guide

Requirements

ScubaGPT

Grateful Dead Chatbot

Farmers Bounty

Technical Document

Answer Engine Optimizer

SEO Optimizer

Travel Planner

Fact Checker

Estate Manager

ITI Operations

ITI Marketing

Patriot University

Personal Assistant

Estate Document Extraction

Estate Document Extraction

Instructions

Document Classification

Claude Vision API Integration

Key Field Extraction by Document Type

Entity Resolution

Confidence Scoring

Data Security

Inputs Required

Output Format

Anti-Patterns