A property data company maintains two datasets. Dataset A has 15,000 listings scraped from one platform — addresses, prices, square footage, and 3–5 listing photos per property. Dataset B has 22,000 listings from a different platform — same types of fields, different formatting, different photos of the same properties.

They need to match the overlapping properties. The obvious approach is address matching. But address formats diverge: “123 Main St, Apt 4B” vs “123 Main Street #4B” vs “123 Main, Unit 4-B.” Standardization helps, but doesn’t solve everything — typos, abbreviation inconsistencies, and missing unit numbers leave gaps.
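
To make the limits of standardization concrete, here is a minimal sketch of an address normalizer run on the three variants above. The suffix map and unit markers are illustrative assumptions, not a complete rule set.

```python
import re

# Hypothetical suffix map; a production list would be much longer.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr", "boulevard": "blvd"}

# Unit markers seen in the examples above ("Apt", "#", "Unit"), anchored at the end of the string.
UNIT_RE = re.compile(
    r"(?:\b(?:apt|apartment|unit|suite|ste)\b\.?|#)\s*([0-9a-z]+(?:-[0-9a-z]+)?)\s*$"
)


def normalize_address(raw: str) -> str:
    """Lowercase, pull out the unit, strip punctuation, abbreviate street suffixes."""
    text = raw.lower().strip()
    unit = ""
    match = UNIT_RE.search(text)
    if match:
        unit = match.group(1).replace("-", "")  # "4-b" -> "4b"
        text = text[: match.start()]
    text = re.sub(r"[^\w\s]", " ", text)  # drop commas and other punctuation
    street = " ".join(SUFFIXES.get(word, word) for word in text.split())
    return f"{street} unit {unit}" if unit else street


for raw in ("123 Main St, Apt 4B", "123 Main Street #4B", "123 Main, Unit 4-B"):
    print(normalize_address(raw))
```

The first two variants collapse to "123 main st unit 4b". The third becomes "123 main unit 4b" because the street suffix was never written, which is exactly the kind of gap that standardization alone leaves open.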

Price matching narrows things down, but prices change between platforms and listing dates. Square footage sometimes disagrees by 50–200 sqft depending on measurement methodology. Bedroom and bathroom counts are reliable when present — but one platform uses “3 bed / 2 bath” and the other uses separate columns that are sometimes empty.
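
A rough sketch of how those structured fields might be reconciled: split the combined "3 bed / 2 bath" string, treat empty columns as unknown rather than as a mismatch, and allow a tolerance on square footage. The function names and the 200 sqft default are assumptions for illustration.

```python
import re
from typing import Optional, Tuple


def parse_bed_bath(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Split a combined string such as '3 bed / 2 bath' into (beds, baths)."""
    beds = re.search(r"(\d+(?:\.\d+)?)\s*(?:bed|br|bd)", text, re.IGNORECASE)
    baths = re.search(r"(\d+(?:\.\d+)?)\s*(?:bath|ba)", text, re.IGNORECASE)
    return (
        float(beds.group(1)) if beds else None,
        float(baths.group(1)) if baths else None,
    )


def counts_compatible(a: Optional[float], b: Optional[float]) -> bool:
    """A missing value is unknown, so it should not veto a match."""
    return a is None or b is None or a == b


def sqft_compatible(a: Optional[float], b: Optional[float], tolerance: float = 200.0) -> bool:
    """Accept square footage that differs by no more than the tolerance."""
    if a is None or b is None:
        return True
    return abs(a - b) <= tolerance


beds, baths = parse_bed_bath("3 bed / 2 bath")
print(beds, baths)
print(counts_compatible(beds, None), sqft_compatible(1850, 2000))
```

Treating an empty column as compatible keeps sparse records in play. The trade-off is that these fields can only rule candidates out, never confirm them on their own.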

Meanwhile, both datasets include listing photos. And those photos show the same granite countertops, the same bay window, the same brick exterior with the green front door. The visual evidence is unambiguous — if only the matching system could see it.

What listing photos reveal

A single listing photo contains more matchable information than most people realize. A kitchen photo doesn’t just say “kitchen.” It says:

Attributes extractable from a single kitchen listing photo
Attribute | Example extraction | Matching value
Room type | Kitchen, open concept to dining area | Room-level matching between listings
Cabinet style | White shaker cabinets, brushed nickel hardware | Distinctive; narrows to specific renovation era
Countertop material | Quartz, white with grey veining (Calacatta pattern) | Highly distinctive across properties
Backsplash | White subway tile, herringbone pattern | Common but helps confirm when combined
Appliances | Stainless steel: Samsung French door refrigerator, gas range | Brand + model visible = strong signal
Flooring | Engineered hardwood, medium oak tone | Moderate signal; common but consistent
Lighting | Three pendant lights over island, brushed gold finish | Distinctive fixture choices narrow matches
Layout | L-shaped with center island, seating for 3 | Floor plan signal; stable across photos

A single kitchen photo can yield 8+ matchable attributes. Most text listings mention none of these details.

The text listing for this kitchen might say “Updated kitchen with granite counters and stainless appliances.” That’s 6 words of matchable content. The photo yields 8 structured attributes with specific details — white shaker cabinets (not just “updated”), quartz with Calacatta veining (not just “granite”), Samsung French door fridge (not just “stainless appliances”).

When both platforms have photos of the same kitchen, the extracted attributes agree because they describe the same physical space. Different photographer, different angle, different lighting: same cabinets, same countertops, same appliances.
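
One simple way to check that agreement is token overlap per attribute. The sketch below uses Jaccard similarity on word sets; the attribute keys and the second listing's wording are made-up examples, and a real pipeline would more likely compare embeddings.

```python
import re
from typing import Dict, Set


def tokens(value: str) -> Set[str]:
    """Lowercased word set for a free-text attribute value."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def attribute_overlap(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, float]:
    """Jaccard similarity for every attribute present in both extractions."""
    scores = {}
    for key in sorted(a.keys() & b.keys()):
        ta, tb = tokens(a[key]), tokens(b[key])
        union = ta | tb
        scores[key] = len(ta & tb) / len(union) if union else 0.0
    return scores


# Hypothetical extractions of the same kitchen from two platforms.
listing_a = {
    "cabinets": "White shaker cabinets, brushed nickel hardware",
    "countertop": "Quartz, white with grey veining",
}
listing_b = {
    "cabinets": "Shaker cabinets in white with brushed nickel pulls",
    "countertop": "White quartz with grey veining",
}

scores = attribute_overlap(listing_a, listing_b)
print(scores)
```

Word order and punctuation drop out, so the countertop descriptions score a full overlap while the cabinets score lower only because "hardware" and "pulls" are different words for the same thing.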

Room-type categorization at scale

Before extracting detailed attributes, the first step is categorizing what each photo shows. Listing photo sets are unstructured — a mix of exterior shots, room interiors, aerial views, neighborhood photos, and occasionally floor plans. Without categorization, you don’t know which photo to compare against which.

AI categorization classifies each photo:

AI-categorized listing photos for a single property
Photo | Room type | Sub-type | Key features noted
photo_01.jpg | Exterior | Front elevation | Two-story colonial, brick, black shutters, red door
photo_02.jpg | Interior | Living room | Hardwood floors, fireplace with white mantel, crown molding
photo_03.jpg | Interior | Kitchen | White cabinets, quartz counters, island with pendant lights
photo_04.jpg | Interior | Primary bedroom | Carpet, two windows, tray ceiling, ceiling fan
photo_05.jpg | Interior | Primary bathroom | Double vanity, frameless glass shower, tile floor
photo_06.jpg | Exterior | Rear/yard | Deck, fenced yard, mature trees, detached garage
photo_07.jpg | Aerial | Property overview | Corner lot, ~0.25 acre, cul-de-sac location

Each photo categorized and described in a single AI pass. The descriptions become matchable text fields.

With photos categorized by room type, you can compare kitchen-to-kitchen, exterior-to-exterior, bathroom-to-bathroom. This prevents false negatives from comparing a kitchen photo in Dataset A against a bedroom photo in Dataset B and concluding the properties don’t match.
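
A small sketch of that pairing step, assuming each categorized photo is a dict with hypothetical "file" and "room" keys. Only room types present in both listings produce comparisons.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_room(photos: List[dict]) -> Dict[str, List[str]]:
    """Map each room label to the photo files categorized under it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for photo in photos:
        groups[photo["room"]].append(photo["file"])
    return groups


def comparable_pairs(photos_a: List[dict], photos_b: List[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Pair photos across two listings, like with like."""
    a, b = group_by_room(photos_a), group_by_room(photos_b)
    return {
        room: [(fa, fb) for fa in a[room] for fb in b[room]]
        for room in sorted(a.keys() & b.keys())
    }


# File names for listing B are invented for the example.
listing_a = [
    {"file": "photo_03.jpg", "room": "kitchen"},
    {"file": "photo_04.jpg", "room": "primary bedroom"},
]
listing_b = [
    {"file": "b_01.jpg", "room": "front exterior"},
    {"file": "b_02.jpg", "room": "kitchen"},
]

pairs = comparable_pairs(listing_a, listing_b)
print(pairs)
```

The bedroom and the exterior are never compared with each other, so neither can drag the match score down.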

Exterior matching: the most reliable visual signal

Exterior photos are the strongest single signal for property matching. A property’s exterior — its architectural style, materials, color, roof line, landscaping, and distinctive features — is unique in a way that interiors often aren’t (many kitchens look similar after a renovation with the same trending finishes).

Match contribution by photo type
Photo type | Match contribution | Why
Front exterior | 92% | Highly distinctive: architecture + color + features
Kitchen | 74% | Strong when finishes are distinctive
Bathroom | 68% | Good signal from fixtures + tile choices
Living room | 55% | Moderate; less distinctive finishes
Bedroom | 35% | Low; many bedrooms look similar
Aerial/drone | 88% | Roof + lot shape + surroundings

Match contribution: how often this photo type contributes to a correct property match. A single match can draw on several photo types, so the figures do not sum to 100%. Based on matching 5,000 properties across two listing platforms.

Front exterior photos contribute to 92% of correct matches. The combination of architectural style (colonial, ranch, craftsman, modern), exterior material (brick, siding, stone, stucco), color, roof shape, and distinctive features (porch style, window placement, garage configuration) creates a near-unique visual fingerprint for each property.

Aerial and drone photos rank second at 88%. Roof shape, lot boundaries, pool presence, landscaping patterns, and the relationship to neighboring structures are all visible and highly distinctive.

Bedrooms score lowest at 35%. Most bedrooms are rectangular rooms with neutral walls and carpet or hardwood floors — not distinctive enough to identify a specific property. They help confirm a match but rarely create one.

The extraction-then-match workflow

The practical workflow combines photo categorization, attribute extraction, and multi-signal matching:

Phase 1: Categorize and extract. Each listing photo gets categorized by room type and has visual attributes extracted. The extraction prompt is tuned for real estate: “Categorize this listing photo by room type. Extract architectural style, materials, colors, fixtures, finishes, and any distinctive features. Note the condition: new construction, recently renovated, dated/original, or needs work.”
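
If the model is asked to answer in JSON (an assumption; the prompt above does not require it), the response can be validated before it enters the pipeline. The field names here are hypothetical, and the room types follow the list used in the setup steps later in this article.

```python
import json
from dataclasses import dataclass, field
from typing import List

ROOM_TYPES = {"exterior", "kitchen", "bathroom", "bedroom", "living", "aerial", "other"}
CONDITIONS = {"new construction", "recently renovated", "dated/original", "needs work"}


@dataclass
class PhotoExtraction:
    room_type: str
    condition: str
    features: List[str] = field(default_factory=list)


def parse_extraction(raw_json: str) -> PhotoExtraction:
    """Parse one model response, falling back to safe defaults for unknown labels."""
    data = json.loads(raw_json)
    room = str(data.get("room_type", "")).lower()
    condition = str(data.get("condition", "")).lower()
    features = [str(f).strip() for f in data.get("features", []) if str(f).strip()]
    return PhotoExtraction(
        room_type=room if room in ROOM_TYPES else "other",
        condition=condition if condition in CONDITIONS else "unknown",
        features=features,
    )


sample = (
    '{"room_type": "Kitchen", "condition": "Recently renovated", '
    '"features": ["white shaker cabinets", " quartz counters "]}'
)
extraction = parse_extraction(sample)
print(extraction)
```

Coercing unexpected labels to "other" and "unknown" keeps one odd response from breaking the room-type grouping downstream.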

Phase 2: Create property-level descriptions. The per-photo extractions are concatenated into a property-level description: “Two-story colonial, brick exterior with black shutters and red front door. Interior: hardwood floors throughout main level, white shaker kitchen with quartz counters and stainless Samsung appliances, primary bath with frameless glass shower and double vanity.” This description captures the property’s visual identity in matchable text.
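
A minimal sketch of the concatenation step. It orders extractions by room type so that both datasets produce descriptions in the same sequence; that ordering, and the (room, text) tuple shape, are assumptions added for the example.

```python
from typing import List, Tuple

# Hypothetical fixed ordering so descriptions from both datasets line up.
ROOM_ORDER = ["exterior", "living", "kitchen", "bedroom", "bathroom", "aerial", "other"]


def build_property_description(extractions: List[Tuple[str, str]]) -> str:
    """Join per-photo (room_type, description) pairs into one property-level text."""
    rank = {room: i for i, room in enumerate(ROOM_ORDER)}
    ordered = sorted(extractions, key=lambda item: rank.get(item[0], len(ROOM_ORDER)))
    return " ".join(f"{room.capitalize()}: {text.rstrip('.')}." for room, text in ordered)


description = build_property_description(
    [
        ("kitchen", "White shaker cabinets, quartz counters"),
        ("exterior", "Two-story colonial, brick, black shutters, red door."),
    ]
)
print(description)
```

Here the exterior sentence comes first even though the kitchen photo was processed first, so photo upload order does not leak into the text that gets embedded.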

Phase 3: Embed and compare. The property descriptions are embedded as vectors. When two descriptions refer to the same property (same brick colonial, same white kitchen, same glass shower), their embeddings land close together, even though the descriptions were generated from different photos taken by different photographers.
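
Closeness between embeddings is typically measured with cosine similarity. The sketch below implements it in plain Python on toy three-dimensional vectors; real embeddings come from whichever embedding model the pipeline uses and have hundreds of dimensions or more.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for description embeddings.
colonial_on_platform_a = [0.9, 0.1, 0.4]
colonial_on_platform_b = [0.8, 0.2, 0.5]
unrelated_ranch = [0.1, 0.9, 0.2]

same = cosine_similarity(colonial_on_platform_a, colonial_on_platform_b)
different = cosine_similarity(colonial_on_platform_a, unrelated_ranch)
print(round(same, 3), round(different, 3))
```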

Phase 4: Confirm with visual comparison. For borderline embedding matches (similarity 0.75–0.85), the LLM sees photos from both listings side by side: “Compare these two sets of listing photos. Are they the same property? Note similarities and differences in exterior, kitchen, and bathrooms.”
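
The routing around that borderline band could look like the sketch below. Only the 0.75 to 0.85 review band comes from the workflow above; auto-accepting above it and rejecting below it are assumptions made to complete the example.

```python
REVIEW_FROM = 0.75   # lower edge of the borderline band
ACCEPT_ABOVE = 0.85  # upper edge of the borderline band


def route_candidate(similarity: float) -> str:
    """Decide what happens to a candidate pair based on embedding similarity."""
    if similarity > ACCEPT_ABOVE:
        return "accept"      # assumed: confident enough to skip the LLM
    if similarity >= REVIEW_FROM:
        return "llm_review"  # borderline: side-by-side photo comparison
    return "reject"          # assumed: too dissimilar to be worth the LLM call


for s in (0.91, 0.80, 0.62):
    print(s, route_candidate(s))
```

Keeping the expensive visual comparison for the middle band is what keeps the cost of this phase proportional to the number of uncertain pairs rather than to all candidates.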

Where photo matching outperforms text

The improvement isn’t marginal. Photo-based matching finds properties that text matching systematically misses.

Match rate: text-only vs text + photos
Method | Match rate | Notes
Address matching only | 71% | Formatting gaps + typos + missing units
Address + price + beds/baths | 79% | Price drift + missing fields limit gains
Text fields + photo attributes | 89% | Visual attributes fill text gaps
Full pipeline with visual LLM | 94% | Side-by-side photo confirmation

Match rate against 5,000 properties known to appear in both datasets. Full pipeline catches 94% while maintaining 97% precision.

The 15-point jump from 79% to 94% represents 750 additional correct matches per 5,000 properties. At scale, with 50,000 overlapping properties, that’s 7,500 matches that text-only matching would miss.
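
The arithmetic behind those figures, as a quick check. It assumes the match rates apply to properties that genuinely appear in both datasets, and it uses whole percentage points to avoid floating-point rounding.

```python
def additional_matches(overlapping_properties: int, baseline_pct: int, pipeline_pct: int) -> int:
    """Extra correct matches gained by moving from the baseline to the full pipeline."""
    return overlapping_properties * (pipeline_pct - baseline_pct) // 100


print(additional_matches(5_000, 79, 94))
print(additional_matches(50_000, 79, 94))
```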

The properties that photo matching rescues share a pattern: address formatting prevents the text match, price has drifted between platforms, and one or both datasets have missing fields. The photos, however, are unambiguous.

Handling the edge cases

Same property, different renovation state

A property listed in 2024 with original oak cabinets might appear in a 2025 dataset with a white kitchen renovation. The exterior still matches. The bathroom might still match. The kitchen photos are different — but the AI doesn’t just say “different kitchen.” It can note: “Kitchen has been renovated between these listings. Cabinet style changed from raised-panel oak to flat-panel white. Countertops changed from laminate to quartz. Appliances upgraded. Same floor plan and window placement.”

This is actually valuable data. You’ve identified the property match and documented the renovation scope — information neither text dataset contains.
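
Once both listings have structured attributes, the renovation scope can be derived by diffing them. This is a sketch under the assumption that attributes are stored as plain key/value text; the keys below are hypothetical and the values follow the example above.

```python
from typing import Dict, List, Tuple


def renovation_diff(
    earlier: Dict[str, str], later: Dict[str, str]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Split shared attributes into those that changed and those that stayed the same."""
    changed: Dict[str, Tuple[str, str]] = {}
    unchanged: List[str] = []
    for key in sorted(earlier.keys() & later.keys()):
        if earlier[key].strip().lower() == later[key].strip().lower():
            unchanged.append(key)
        else:
            changed[key] = (earlier[key], later[key])
    return changed, unchanged


kitchen_2024 = {
    "cabinets": "raised-panel oak",
    "countertops": "laminate",
    "window placement": "double window over sink",
}
kitchen_2025 = {
    "cabinets": "flat-panel white",
    "countertops": "quartz",
    "window placement": "double window over sink",
}

changed, unchanged = renovation_diff(kitchen_2024, kitchen_2025)
print(changed)
print(unchanged)
```

The unchanged attributes support the property match, and the changed ones become the renovation record.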

Different units in the same condo/apartment building

Interior photos of units in the same building can be nearly identical — same floor plan, same standard finishes. Here, photo matching needs support from text fields: unit number, floor level, square footage, and price. The photos confirm “this is the same building and same floor plan” while text fields disambiguate “but this is unit 4B, not 4A.”
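
A sketch of that division of labor: the visual result says whether the building and floor plan agree, and the unit number decides the rest. The labels and the unit-cleaning rules are illustrative assumptions.

```python
import re
from typing import Optional


def normalize_unit(unit: Optional[str]) -> Optional[str]:
    """Reduce 'Apt 4B', '#4-B' and 'Unit 4b' to the same token, '4b'."""
    if not unit:
        return None
    cleaned = re.sub(r"[^0-9a-z]", "", unit.lower())
    cleaned = re.sub(r"^(?:apt|unit|suite|ste)", "", cleaned)
    return cleaned or None


def classify_condo_pair(visual_match: bool, unit_a: Optional[str], unit_b: Optional[str]) -> str:
    """Combine the photo verdict with unit numbers from the text fields."""
    if not visual_match:
        return "different"
    a, b = normalize_unit(unit_a), normalize_unit(unit_b)
    if a is None or b is None:
        return "same building, unit unknown"
    return "same unit" if a == b else "same building, different unit"


print(classify_condo_pair(True, "Apt 4B", "#4-B"))
print(classify_condo_pair(True, "4B", "4A"))
print(classify_condo_pair(True, None, "4A"))
```

When a unit number is missing, the pair is best held for a check against floor level, square footage, and price rather than accepted on photos alone.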

Stock photos and virtual staging

Some listings use stock photos or virtual staging instead of actual property photos. AI can often flag this: “This appears to be a virtually staged image. Furniture has inconsistent shadows and the perspective suggests digital placement.” Stock and virtually staged photos should be flagged rather than used for matching, since the imagery is generic and not property-specific.

Seasonal differences

The same exterior photographed in summer (green trees, flowers) and winter (bare branches, snow) looks dramatically different in a pixel-level comparison. Attribute extraction is far less sensitive to this: it describes the architectural features, materials, and structural elements that don’t change with the seasons rather than relying on vegetation or lighting.

Practical setup

The pipeline configuration for real estate photo matching:

  1. CSV columns. Your listing CSV needs at minimum: address, price, and a column with photo URLs or filenames. If you have multiple photos per listing, you can use the primary photo or create separate rows per photo and aggregate after matching.

  2. Photo column type. Set the photo column to “file” type. If your photos are public URLs (common for scraped listing data), the system fetches them directly — no upload needed. For local files, upload them through Project Files.

  3. AI extraction. Create an enrichment rule: “This is a real estate listing photo. Categorize by room type (exterior/kitchen/bathroom/bedroom/living/aerial/other). Extract: architectural style, primary materials, colors, notable fixtures and finishes, condition assessment, and any distinctive features. Be specific — ‘white shaker cabinets with brushed nickel pulls’ not just ‘white cabinets.’”

  4. Embeddings. Add the photo column to embeddings. The auto-generated descriptions capture visual content for semantic comparison.

  5. LLM confirmation. Include photos in the LLM check. Prompt: “Compare these listing photos. Same property? Consider exterior architecture, interior finishes, floor plan, and distinctive features. If renovated between listings, note that.”

  6. Pre-filters. Add string pre-filters on address (contains match on street name) and numeric pre-filters on price (within 20% — accounts for price changes between platforms). These reduce the comparison space before the expensive AI steps run.
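
To show what the pre-filters in step 6 do, here is a standalone sketch of the two checks. The dict keys are hypothetical, and measuring the 20% against the higher of the two prices is an assumption; the tool's own definition may differ.

```python
def street_name_matches(street_name: str, address: str) -> bool:
    """Contains match: the street name must appear somewhere in the other address."""
    return street_name.strip().lower() in address.lower()


def price_within(a: float, b: float, tolerance: float = 0.20) -> bool:
    """True when the prices differ by at most `tolerance`, relative to the higher one."""
    if a <= 0 or b <= 0:
        return False
    return abs(a - b) / max(a, b) <= tolerance


def passes_prefilters(listing_a: dict, listing_b: dict) -> bool:
    """Cheap checks that run before any embedding or LLM comparison."""
    return street_name_matches(listing_a["street_name"], listing_b["address"]) and price_within(
        listing_a["price"], listing_b["price"]
    )


a = {"street_name": "Main", "address": "123 Main St, Apt 4B", "price": 500_000}
b = {"address": "123 Main Street #4B", "price": 420_000}
c = {"address": "88 Oak Avenue", "price": 495_000}

print(passes_prefilters(a, b), passes_prefilters(a, c))
```

A contains match on the street name is deliberately loose: it tolerates "St" versus "Street" and missing unit numbers, leaving the fine distinctions to the later steps.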


Your listing photos are your most reliable matching signal. Addresses have typos. Prices drift. Square footage disagrees. But the brick colonial with the red front door looks the same on every platform.
