A property management company oversees 2,400 rental units. Each unit has an inspection report — a 15–30 page PDF with room-by-room condition assessments, photos of deficiencies, and repair recommendations. Some reports are from move-in inspections, some from annual checkups, some from move-out assessments. They span five years and three different inspection companies, each with a different report format.

The company also has insurance claim documents for 380 of those units — adjuster reports, repair estimates, photos of damage, and settlement letters. They need to match inspection findings to insurance claims to identify units where the inspection flagged a problem that later became a claim, or where a claim was filed for damage the inspection missed.

The data exists in those PDFs. But it’s trapped in 2,400 separate files that no one has time to read, cross-reference, and tabulate by hand.

What inspection reports contain

A standard home inspection report (InterNACHI or ASHI format) is surprisingly structured — just not in a database-friendly way. Each section covers a building system or area, with a condition rating and narrative findings.

Typical inspection report structure — one property
| Section | Condition | Findings | Photo? |
|---|---|---|---|
| Roof | Fair | Asphalt shingles, ~15 yrs old, minor granule loss, no active leaks | Yes — 2 photos |
| Exterior | Good | Vinyl siding, good condition, minor caulking needed at windows | Yes — 1 photo |
| Foundation | Good | Poured concrete, no visible cracks, proper grading | No |
| Plumbing | Fair | Copper supply, PVC drain, slow drain in hall bath, water heater 2018 | Yes — 1 photo |
| Electrical | Poor | Federal Pacific panel — recommend replacement, GFCI missing in kitchen | Yes — 2 photos |
| HVAC | Good | Carrier forced air, installed 2020, filter clean, operational | No |
| Kitchen | Good | Functional, granite counters, appliances operational | Yes — 1 photo |
| Bathrooms | Fair | Grout deterioration in master shower, caulking needed at tub | Yes — 2 photos |

Each section has a condition rating (Good/Fair/Poor/Deficient), narrative findings, and often photos of deficiencies.

That single report contains 8 condition ratings, 8 narrative assessments, and 9 embedded photos. Manually entering this into a spreadsheet takes 15–20 minutes per report. At 2,400 reports, that’s 600–800 hours of data entry.

AI extraction reads the PDF — text and photos together — and produces structured output in seconds.
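To make "structured output" concrete, here is a minimal sketch of what one extracted section looks like as a record. The schema and field names are illustrative assumptions, not a fixed standard; each report yields one such record per section.

```python
# Illustrative record for one extracted inspection section (field names are
# assumptions, not a fixed standard). One report yields one record per section.
from dataclasses import dataclass

@dataclass
class SectionFinding:
    section: str          # e.g. "Roof", "Plumbing"
    condition: str        # Good / Fair / Poor / Deficient
    findings: str         # inspector's narrative, verbatim
    photo_count: int = 0  # embedded deficiency photos

# The "Roof" row from the sample report above, as structured data:
roof = SectionFinding(
    section="Roof",
    condition="Fair",
    findings="Asphalt shingles, ~15 yrs old, minor granule loss, no active leaks",
    photo_count=2,
)
```

Eight of these records per report, across 2,400 reports, is the spreadsheet no one had time to build by hand.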

What insurance documents contain

Insurance claim files are a different beast. A single claim might include:

  • Adjuster report: Narrative assessment of damage, cause determination, affected areas
  • Repair estimate: Line-item costs for labor and materials, organized by trade (roofing, plumbing, electrical)
  • Photos: Damage documentation, often annotated with arrows and labels
  • Settlement letter: Approved amount, deductible, depreciation, net payment
  • Policy excerpt: Coverage limits, endorsements, exclusion applicability

Each document type requires different extraction. The adjuster report yields cause and severity. The estimate yields costs. The settlement letter yields financial outcomes.

AI-extracted data from an insurance claim file
| Document | Extracted fields | Example values |
|---|---|---|
| Adjuster report | Cause, affected areas, severity, date of loss | Water damage, kitchen + hall bath, moderate, 2025-08-12 |
| Repair estimate | Total cost, line items, trades involved | $14,200 — plumbing $3,800, drywall $4,100, flooring $6,300 |
| Damage photos | Damage type, location, severity from visual evidence | Water staining on ceiling, active drip visible, mold growth at baseboard |
| Settlement letter | Approved amount, deductible, depreciation, net payment | Approved $14,200, deductible $1,000, depreciation $2,100, net $11,100 |

Four document types from a single claim, each producing different structured data. AI reads text and photos in one pass.

Matching inspections to claims

The core matching problem: connect an inspection finding (e.g., “slow drain in hall bath, recommend repair”) to a later insurance claim (e.g., “water damage from burst pipe in hall bath”).

The text descriptions don’t overlap neatly. “Slow drain” and “burst pipe” use different words. “Hall bath” and “hallway bathroom” are the same room with different names. “Recommend repair” and “water damage claim” describe different stages of the same problem.

But the semantic connection is clear: a plumbing issue in the hall bathroom was flagged by the inspector, not repaired, and eventually caused a water damage claim. AI embedding similarity captures this connection — the descriptions are semantically related even though they share few words.

Inspection-to-claim match rate by building system
| Building system | Match rate | Typical pattern |
|---|---|---|
| Roofing | 72% | Shingle damage → water intrusion claims |
| Plumbing | 68% | Slow drains/old pipes → water damage |
| Electrical | 45% | Panel issues → rarely become claims |
| HVAC | 38% | Equipment failures covered differently |
| Foundation | 52% | Cracks → structural claims, often excluded |
| Exterior | 61% | Siding/window gaps → storm damage |

Percentage of insurance claims that had a related finding in a prior inspection report. Based on matching 380 claims against 2,400 inspection reports.

Roofing and plumbing show the strongest inspection-to-claim correlation. When an inspector notes “asphalt shingles with granule loss” and a claim appears 18 months later for “water intrusion through roof,” the extracted attributes connect them: same property, same building system, same area, progressing severity.

Electrical findings show a weaker link because electrical fires are catastrophic (total loss, handled through a different claim process), while electrical deficiencies that don't cause fires are simply repaired at the owner's expense. The 45% match rate reflects this — fewer than half of electrical claims trace back to a prior inspection finding, and most electrical findings never become claims at all.

Extraction prompts for inspection reports

Inspection reports have enough structure that a well-designed enrichment prompt captures the essential data from each section.

Property-level extraction:

“Extract the following from this home inspection report: property address, inspection date, inspector name/company, overall condition summary, and total number of deficiencies found.”

Section-level extraction (run once per major section):

“For the ROOFING section of this inspection report, extract: condition rating, roof type and material, estimated age, deficiencies found (list each separately), recommended repairs, and urgency level (routine maintenance, should repair within 1 year, safety concern — immediate action needed).”

Photo-aware extraction:

“This inspection report page contains photos of deficiencies. For each photo: describe the visible issue, assess severity from the visual evidence (cosmetic, moderate, severe), and note whether the damage appears active or historical.”

The photo-aware prompt is where multimodal AI shines. An inspector’s written note might say “water stain on ceiling.” The embedded photo shows the stain is large, dark-ringed (indicating repeated wetting), and located directly below a bathroom — details the written note didn’t capture. AI extraction from the photo adds severity and probable cause information that the text alone doesn’t provide.

Extraction prompts for insurance documents

Insurance documents are more varied in format but follow predictable patterns.

Adjuster report:

“Extract from this adjuster report: date of loss, cause of damage (e.g., water, fire, wind, hail), affected rooms/areas (list each), damage severity per area, and whether the cause is a single event or gradual deterioration.”

Repair estimate:

“Extract from this repair estimate: total estimated cost, and for each line item: trade category (plumbing, electrical, roofing, drywall, flooring, etc.), description of work, and cost. Group line items by trade.”

Settlement letter:

“Extract: claim number, date, approved repair amount, policyholder deductible, depreciation amount, net payment to policyholder, and any denied items with denial reason.”

The matching pipeline

Once both inspection reports and insurance claims are reduced to structured data, the matching pipeline connects them:

Pre-filter on address. Inspection reports include property addresses. Insurance claims include insured property addresses. String matching on address (with fuzzy tolerance for formatting differences) restricts comparisons to the same property.
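The address pre-filter can be sketched with the standard library alone: normalize common formatting differences, then accept near-identical strings. The abbreviation list and similarity threshold are assumptions to tune against your own data.

```python
# Hedged sketch of the address pre-filter. Normalization handles common
# formatting differences ("Street" vs "St", punctuation); the abbreviation
# map and 0.9 threshold are assumptions to tune on real data.
import re
from difflib import SequenceMatcher

_ABBREV = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr",
           "apartment": "apt", "suite": "ste"}

def normalize_address(addr: str) -> str:
    addr = re.sub(r"[^\w\s]", " ", addr.lower())
    return " ".join(_ABBREV.get(w, w) for w in addr.split())

def same_property(a: str, b: str, threshold: float = 0.9) -> bool:
    return SequenceMatcher(None, normalize_address(a),
                           normalize_address(b)).ratio() >= threshold
```

Under this scheme, "142 Oak Street, Apt 3B" and "142 Oak St Apartment 3B" normalize to the same string and match despite the formatting differences.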

Pre-filter on building system. Inspection findings are categorized by section (roof, plumbing, electrical). Insurance claims are categorized by damage type (water intrusion, fire, wind). A mapping layer connects these: “water intrusion” maps to roofing + plumbing inspection sections. This prevents comparing electrical findings against water damage claims.
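The mapping layer itself is just a lookup table. The entries below follow the example in the text ("water intrusion" → roofing + plumbing); the rest of the mapping is illustrative and would be extended from your own claim taxonomy.

```python
# Sketch of the damage-type → inspection-section mapping layer.
# "water intrusion" → roofing + plumbing comes from the text; the other
# entries are illustrative assumptions to extend per claim taxonomy.
SYSTEM_MAP = {
    "water intrusion": {"Roof", "Plumbing"},
    "water damage":    {"Plumbing", "Roof"},
    "wind":            {"Roof", "Exterior"},
    "structural":      {"Foundation"},
}

def candidate_sections(damage_type: str) -> set[str]:
    """Inspection sections worth comparing for a given claim damage type."""
    return SYSTEM_MAP.get(damage_type.lower(), set())
```

A water-intrusion claim is compared only against roofing and plumbing findings; electrical findings are never candidates for it.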

Embedding similarity on findings. The inspector’s narrative (“Asphalt shingles showing granule loss on south-facing slope, approximately 15 years old, recommend evaluation by roofing contractor within next 2 years”) and the claim narrative (“Water intrusion through roof during heavy rain event, damage to second floor ceiling and walls, asphalt shingle failure at ridge line”) embed into semantically similar vectors — both describe the same roof deteriorating.
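The comparison step reduces to cosine similarity between narrative vectors. The real vectors come from an embedding model; the 3-dimensional vectors below are toy values purely to show the computation and the relative ordering it produces.

```python
# Cosine similarity between narrative embeddings. Real vectors come from an
# embedding model and have hundreds of dimensions; these 3-d vectors are toy
# values chosen only to illustrate the computation.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

inspection_vec = [0.8, 0.1, 0.2]  # "granule loss ... recommend evaluation"
claim_vec      = [0.7, 0.2, 0.1]  # "water intrusion through roof ..."
unrelated_vec  = [0.0, 0.9, 0.1]  # e.g. a kitchen appliance note
```

The roof finding scores much closer to the roof claim than to the unrelated note, even though the two narratives share few words — that ordering, not the absolute score, is what the pipeline thresholds on.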

Temporal filtering. The claim must post-date the inspection. An inspection from 2025 can’t predict a claim from 2023. Numeric pre-filtering on dates ensures chronological ordering.

LLM confirmation. For candidate matches, the LLM sees both documents: “Does this inspection finding relate to this insurance claim? Consider the building system, location within the property, type of deficiency, and whether the inspection finding could reasonably have progressed to the claimed damage.” The LLM provides reasoning: “Yes — inspector noted granule loss and recommended roof evaluation. Claim filed 18 months later for roof leak in same area. Consistent with shingle deterioration progressing to active leak.”
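The last two steps are cheap to express in code: a date comparison for the temporal filter, and a prompt builder for the confirmation call. The prompt wording follows the question quoted above; the function names are illustrative, and the actual LLM call is omitted.

```python
# Sketch of the temporal filter and the confirmation prompt builder.
# Function names are illustrative; the LLM call itself is omitted.
from datetime import date

def chronological(inspection_date: date, claim_date: date) -> bool:
    """The claim must post-date the inspection it would confirm."""
    return inspection_date < claim_date

def confirmation_prompt(finding: str, claim: str) -> str:
    return (
        "Does this inspection finding relate to this insurance claim? "
        "Consider the building system, location within the property, type of "
        "deficiency, and whether the inspection finding could reasonably have "
        "progressed to the claimed damage.\n\n"
        f"Inspection finding: {finding}\n"
        f"Claim description: {claim}\n"
        "Answer yes or no, with reasoning."
    )
```

Only candidate pairs that survive the address, system, date, and similarity filters reach the LLM, which keeps the confirmation step affordable at 380 claims × 2,400 reports.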

What this analysis reveals

Matching inspection findings to insurance claims produces actionable insights:

Preventable claims. When 72% of roofing claims had a prior inspection flag, that suggests the inspection process is identifying real risks — but the follow-up repair isn’t happening. A property manager who acts on inspection findings could prevent a significant portion of claims.

Inspector accuracy. By tracking which inspection findings actually became claims, you can assess which inspectors are good at identifying real risks versus noting cosmetic issues that never progress. An inspector whose “Poor” ratings correlate strongly with later claims is providing genuinely useful risk assessment.

Claim validation. A claim for “sudden water damage” on a property where the inspection noted “active slow leak in hall bath” six months ago isn’t sudden — it’s progressive damage from a known issue. This changes the coverage analysis.

Portfolio risk scoring. Properties with multiple inspection deficiencies in the same system (e.g., three plumbing findings over three inspections) have a quantifiably higher claim probability. Extraction and matching across the full inspection history produces a risk score per property per building system.

Portfolio risk analysis from matched inspection + claim data
| Risk level | Properties | Inspection flags (avg) | Claim rate | Avg claim cost |
|---|---|---|---|---|
| Low | 1,420 | 0.8 | 4% | $3,200 |
| Medium | 680 | 2.1 | 12% | $8,400 |
| High | 240 | 3.7 | 28% | $16,100 |
| Critical | 60 | 5.2 | 52% | $24,800 |

Risk level derived from number and severity of unresolved inspection findings. Critical properties have 52% claim probability — 13x the rate of low-risk properties.

The 60 critical-risk properties — those with 5+ unresolved inspection findings — have a 52% claim probability and an average claim cost of $24,800. Proactively addressing those inspection findings would cost a fraction of the expected claim payouts.
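Per-property tiering can be sketched as a threshold function over unresolved findings. The 5+ cutoff for "Critical" comes from the text; the other cutoffs are assumptions loosely mirroring the table's averages and should be calibrated on your own portfolio.

```python
# Sketch of per-property risk tiering from unresolved inspection findings.
# The >= 5 cutoff for Critical comes from the text; the other thresholds are
# assumptions to calibrate against portfolio claim history.
def risk_level(unresolved_findings: int) -> str:
    if unresolved_findings >= 5:
        return "Critical"   # ~52% claim rate in the sample portfolio
    if unresolved_findings >= 3:
        return "High"
    if unresolved_findings >= 2:
        return "Medium"
    return "Low"
```

In practice you would score per building system as well as per property, since three plumbing findings concentrate risk differently than three findings spread across unrelated systems.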

Beyond residential: commercial and specialty inspections

The same extraction approach works for:

Commercial property inspections. Larger buildings, more systems (elevators, fire suppression, commercial HVAC), higher-value claims. The extraction prompts need to cover additional building systems but the workflow is identical.

Environmental assessments. Phase I and Phase II environmental site assessments are standardized PDF reports. AI extraction pulls recognized environmental conditions, recommended actions, and risk classifications — matchable against property transaction records and regulatory databases.

Appraisal reports. Property appraisals contain comparable sales, condition assessments, and valuation conclusions. Extracting the comps and matching them against MLS data validates the appraiser’s comparable selection.

Building code violation reports. Municipality inspection reports follow jurisdiction-specific formats but contain consistent data: address, violation type, severity, compliance deadline. Extracting and matching against property portfolios identifies compliance exposure.


Your inspection reports and insurance documents are sitting in folders, full of structured data that no one has time to read. AI extraction turns each PDF into a data row. Matching connects the dots across thousands of documents.

