AI extraction vs AI enrichment: how structured data gets pulled from files
Extraction produces one column from a file. Enrichment produces many. Understanding the difference — and when to use each — determines whether your matching pipeline gets the right signals.
You upload a CSV with 10,000 rows. One column contains product image URLs. Another contains PDF spec sheet filenames. The text columns have product names and prices — useful but incomplete. The real information is locked inside those files.
The matching pipeline needs structured data to compare records. It can’t compare two JPEGs pixel by pixel and tell you they’re the same product. It can’t diff two PDFs and decide they describe the same component. What it can do is extract structured attributes from those files and then match on those attributes.
Two pipeline stages handle this: AI extraction and AI enrichment. They sound similar. They both use AI to pull information from files. But they serve different purposes, and choosing the right one for each task determines whether your matching pipeline gets useful signals or noisy ones.
AI extraction: one question, one answer
AI extraction asks a single, focused question about a file and gets back a single value. One input column, one output column.
Examples:
- “What brand is shown in this product image?” → "Nike"
- “What is the document type of this PDF?” → "Invoice"
- “What color is the primary item in this photo?” → "Navy blue"
- “Is this property photo an interior or exterior shot?” → "Interior — kitchen"
Each extraction rule creates exactly one new column in your dataset. The AI sees the file, answers the specific question, and moves on.
| Image | → brand | → primary_color | → condition |
|---|---|---|---|
| shoe_001.jpg | Nike | White/Black | New with tags |
| shoe_002.jpg | Adidas | Grey/White | New without box |
| shoe_003.jpg | New Balance | Navy/Red | Used — light wear |
| shoe_004.jpg | Nike | Black/Volt | New with box |
| shoe_005.jpg | Puma | White/Gold | New with tags |
Three separate extraction rules, each producing one column. Each rule = one AI call per row.
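The rule-per-column mechanics can be sketched as a simple loop: one focused question per rule, one model call per rule per row. Here `ask_model` is a hypothetical stand-in for a real vision-model API, stubbed with canned answers so the sketch runs offline — the names and prompts are illustrative, not an actual API.

```python
# Sketch only: `ask_model` stands in for a real vision-model API call,
# stubbed with canned answers so the example runs without a network.
def ask_model(image, prompt):
    canned = {
        "What brand is shown in this product image?": "Nike",
        "What color is the primary item in this photo?": "White/Black",
        "What is the condition of this item?": "New with tags",
    }
    return canned.get(prompt, "Unknown")

# Each extraction rule = one focused question = one new column.
EXTRACTION_RULES = {
    "brand": "What brand is shown in this product image?",
    "primary_color": "What color is the primary item in this photo?",
    "condition": "What is the condition of this item?",
}

def run_extraction_rules(row):
    # Three rules -> three separate AI calls for this single row.
    for column, prompt in EXTRACTION_RULES.items():
        row[column] = ask_model(row["image"], prompt)
    return row

row = run_extraction_rules({"image": "shoe_001.jpg"})
```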
The advantage of extraction is precision. When you ask “What brand?” the AI focuses entirely on brand identification. It looks for logos, brand text, design language — nothing else. The answer is short, consistent, and directly usable as a matching field.
The disadvantage is cost at scale. If you need five attributes from each image, you need five extraction rules, which means five AI calls per row. At 10,000 rows, that’s 50,000 API calls for five attributes.
AI enrichment: one call, many answers
AI enrichment asks a complex question and gets back multiple structured fields in a single AI call. One input file, multiple output columns.
Example enrichment prompt: “Analyze this product image. Extract: brand, product type, primary color, material, and condition.”
That single call returns:
- brand: "Nike"
- product_type: "Running shoe — low top"
- primary_color: "White with black swoosh and accents"
- material: "Mesh upper, rubber sole"
- condition: "New with original box and tags"
Five fields from one API call instead of five.
| Image | → brand | → product_type | → primary_color | → material | → condition |
|---|---|---|---|---|---|
| shoe_001.jpg | Nike | Running shoe | White/Black | Mesh/Rubber | New with tags |
| shoe_002.jpg | Adidas | Basketball shoe | Grey/White | Leather/Rubber | New without box |
| shoe_003.jpg | New Balance | Trail runner | Navy/Red | Synthetic/Rubber | Used — light wear |
One enrichment rule, one AI call per row, five output columns. 5x more efficient than five separate extractions.
Enrichment is efficient. But it trades some precision for breadth. When the AI is asked to extract five attributes simultaneously, it distributes its attention across all of them. For most use cases this works fine — multimodal models are good at multi-task extraction. But for a particularly nuanced field (e.g., identifying a specific sub-model of a product from subtle visual differences), a dedicated extraction rule with a focused prompt will outperform the enrichment approach.
When to use which
The decision is straightforward:
| Scenario | Use extraction | Use enrichment |
|---|---|---|
| Need 1-2 attributes from a file | ✓ | |
| Need 3+ attributes from a file | | ✓ |
| Need a field that requires focused analysis | ✓ | |
| Need a quick categorization + description | | ✓ |
| Working with a very large dataset (cost matters) | | ✓ |
| Need maximum precision on a single field | ✓ | |
| Building a matching pipeline from scratch | | ✓ (start here, then add extractions for weak fields) |
The practical pattern is: start with enrichment to get broad coverage, then add targeted extraction rules for fields where the enrichment output isn’t precise enough.
For example, you run enrichment to extract brand, color, material, and product type from product images. The brand extraction is 95% accurate — good enough. But the material extraction is only 78% accurate because the enrichment prompt doesn’t give the AI enough room to analyze textures carefully. So you add a dedicated extraction rule: “Examine this product image closely. What is the primary material? Look at surface texture, sheen, visible grain patterns, and construction details. Be specific — ‘full-grain leather’ not just ‘leather.’” That focused prompt brings material accuracy to 90%.
How the prompts differ
The prompt is everything. Same AI model, same image — but a well-crafted prompt extracts precise, matchable data while a vague prompt returns useless descriptions.
Extraction prompt anatomy
An extraction prompt should be:
- Focused on one attribute. Don’t ask about color in a brand-extraction prompt.
- Specific about format. “Return the brand name only, no other text” prevents the AI from adding qualifiers.
- Clear about edge cases. “If no brand is visible, return ‘Unknown’” prevents hallucinated brand names.
Good extraction prompt:
“Identify the brand of this product from any visible logos, labels, brand text, or distinctive brand design elements. Return only the brand name. If the brand is not identifiable, return ‘Unknown’.”
Bad extraction prompt:
“What brand is this?”
The bad prompt might return “This appears to be a Nike product based on the swoosh logo visible on the side” — a sentence, not a matchable value. The good prompt returns “Nike” — clean, consistent, directly comparable across rows.
Enrichment prompt anatomy
An enrichment prompt should be:
- Structured with clear field names. Tell the AI exactly what fields you want and what to call them.
- Specific about each field’s requirements. A one-line description per field prevents ambiguity.
- Explicit about the output format. The system parses the response into columns, so consistency matters.
Good enrichment prompt:
“Analyze this product image and extract the following fields:
- brand: The brand name from logos or labels. ‘Unknown’ if not identifiable.
- product_type: Specific product category (e.g., ‘running shoe’ not just ‘shoe’).
- primary_color: The dominant color(s), be specific (e.g., ‘navy blue’ not ‘blue’).
- material: Primary material of the main body (e.g., ‘mesh upper’ or ‘full-grain leather’).
- condition: New/Used/Refurbished with visible evidence.”
Bad enrichment prompt:
“Describe this product.”
The bad prompt returns a narrative paragraph. The good prompt returns five discrete, matchable fields.
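Since the system parses the enrichment response into columns, a predictable `field: value` layout is what makes the output usable. A minimal parser sketch, assuming the model follows the line-per-field format requested above (the function and field names are illustrative):

```python
# Minimal sketch: parse a line-per-field enrichment response into
# columns. Assumes the model followed the requested "field: value" format.
def parse_enrichment(response, expected_fields):
    columns = {}
    for line in response.splitlines():
        line = line.strip().lstrip("- ")
        if ":" not in line:
            continue  # skip narrative lines the model sneaked in
        field, _, value = line.partition(":")
        field = field.strip()
        if field in expected_fields:
            columns[field] = value.strip()
    # Missing fields become explicit gaps rather than silent omissions.
    for field in expected_fields:
        columns.setdefault(field, "Unknown")
    return columns

response = """brand: Nike
product_type: Running shoe
primary_color: White with black accents
material: Mesh upper, rubber sole
condition: New with tags"""

cols = parse_enrichment(
    response,
    ["brand", "product_type", "primary_color", "material", "condition"],
)
```

Defaulting missing fields to "Unknown" keeps the column count stable across rows, which downstream matching depends on.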
File types and what they yield
Different file types contain different kinds of extractable information. The prompt strategy should match the file type.
Images (JPG, PNG, WebP)
Images are the most information-dense files for visual attributes. A single product photo can yield 8–12 structured fields.
- Strong extractions: Brand, color, product type, design features, condition, packaging.
- Weak extractions: Exact dimensions (no reference object), weight (not visual), technical specifications (not in the image).
Prompt strategy: Focus on visually observable attributes. Don’t ask images for information that requires text data (model numbers, specifications, pricing).
PDFs
PDFs contain structured text, tables, images, and layout — the most varied file type. What you can extract depends on the document type.
- Strong extractions: Entity names, dates, financial figures, technical specifications, compliance data, tabular content.
- Weak extractions: Implicit relationships (“this clause modifies that clause”), sentiment, intent.
Prompt strategy: Be specific about which sections to extract from. “Extract the manufacturer and part number from the header block” outperforms “What manufacturer made this?” because it directs the AI’s attention.
URLs (public web content)
When a column contains URLs instead of filenames, the AI fetches the page content server-side. This works for public product pages, listing URLs, and documentation.
- Strong extractions: Structured data embedded in pages (prices, specifications, descriptions), visible content in images on the page.
- Weak extractions: Dynamic content (JavaScript-rendered data), content behind authentication, Cloudflare-protected pages.
Prompt strategy: Specify what you expect the URL to contain. “This URL points to a product listing page. Extract the product name, price, and specifications” gives the AI context about what kind of page it’s reading.
The pipeline flow
Extraction and enrichment run during Stage 2 of the matching pipeline. Here’s how they fit into the full flow:
| Stage | What happens | Uses files? |
|---|---|---|
| Stage 1: Prepare | Normalize columns, generate candidate pairs, apply string/numeric pre-filters | No — text and numbers only |
| Stage 2: Enrich | AI extraction → AI enrichment → Embeddings | Yes — files processed here |
| Stage 3: Match | Cosine similarity, thresholds, LLM confirmation | Yes — LLM can view files for confirmation |
| Stage 4: Output | Generate results CSV | No — uses extracted text from Stage 2 |
Files are processed in Stage 2. The extracted text flows through Stages 3 and 4 as regular columns.
The critical insight: files are read once in Stage 2, then the extracted text replaces them for all downstream operations. The embedding model doesn’t see your images — it sees the text descriptions generated from your images. The cosine similarity calculation doesn’t compare PDFs — it compares the vectors of the text extracted from those PDFs.
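The Stage 3 comparison therefore operates purely on text vectors. A toy sketch of that similarity step, with invented 3-dimensional vectors standing in for a real embedding model's output (real embeddings have hundreds of dimensions):

```python
import math

# Toy sketch: cosine similarity over embedding vectors. The 3-d vectors
# below are invented stand-ins for embeddings of the *extracted text* --
# the calculation never sees the original images or PDFs.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

emb_record_a = [0.9, 0.1, 0.2]    # "Nike running shoe, white/black, new"
emb_record_b = [0.88, 0.12, 0.21] # near-identical extracted description
emb_record_c = [0.1, 0.9, 0.3]    # unrelated product description

sim_match = cosine_similarity(emb_record_a, emb_record_b)
sim_nonmatch = cosine_similarity(emb_record_a, emb_record_c)
```

Specific, consistent extracted text pushes true matches toward 1.0 and unrelated records well below any sensible threshold; vague descriptions compress that gap.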
This design means extraction and enrichment quality directly determines matching quality. A sloppy enrichment prompt that returns vague descriptions produces vague embeddings that match imprecisely. A sharp enrichment prompt that returns specific, structured attributes produces specific embeddings that match accurately.
Cost and performance tradeoffs
At scale, the choice between extraction and enrichment has real cost implications.
The hybrid approach — one enrichment rule for broad coverage plus one or two targeted extraction rules for fields that need extra precision — typically delivers the best cost/quality balance. You get 80% of the value from the enrichment call and use targeted extraction to push the remaining fields to the accuracy you need.
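The call-count arithmetic behind that tradeoff, for a hypothetical 10,000-row dataset needing five attributes per row (the numbers are illustrative):

```python
# Call-count arithmetic for the three strategies, assuming 10,000 rows
# and five attributes per row (illustrative numbers, not benchmarks).
rows = 10_000

# Pure extraction: one call per attribute per row.
extraction_calls = 5 * rows   # 50,000 calls

# Pure enrichment: one call per row covers all five attributes.
enrichment_calls = 1 * rows   # 10,000 calls

# Hybrid: one enrichment call plus two targeted extraction rules
# for the fields that need extra precision.
hybrid_calls = (1 + 2) * rows # 30,000 calls
```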
Extraction for categorization
A common pattern is using extraction purely for categorization — sorting files into types before applying different enrichment strategies.
For example, a dataset with a mixed document column (some rows have product photos, some have PDF spec sheets, some have certificates):
- First pass: extraction to categorize. “What type of file is this? Return one of: product_photo, spec_sheet, certificate, invoice, other.”
- Second pass: enrichment tuned by category. Different enrichment prompts for each file type. Product photos get “Extract brand, color, material, condition.” Spec sheets get “Extract manufacturer, part number, material grade, pressure rating.” Certificates get “Extract issuing authority, certification type, expiry date.”
This two-pass approach is more accurate than a single generic enrichment prompt because the AI knows what kind of document it’s analyzing before trying to extract specific fields.
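The two-pass flow amounts to a categorize-then-dispatch table. In this sketch the prompts mirror the examples above, and `categorize` stands in for the first-pass extraction call (replaced here with a crude filename heuristic so the example runs offline):

```python
# Sketch of the two-pass pattern: a first-pass extraction categorizes
# the file, then a second-pass enrichment prompt tuned to that category
# is selected. Prompts and names are illustrative.
ENRICHMENT_PROMPTS = {
    "product_photo": "Extract brand, color, material, condition.",
    "spec_sheet": "Extract manufacturer, part number, material grade, pressure rating.",
    "certificate": "Extract issuing authority, certification type, expiry date.",
}
FALLBACK_PROMPT = "Describe the key identifying attributes of this file."

def categorize(filename):
    # Stand-in for the first-pass extraction call; a crude filename
    # heuristic replaces the real model here.
    if filename.endswith((".jpg", ".png")):
        return "product_photo"
    if "spec" in filename:
        return "spec_sheet"
    if "cert" in filename:
        return "certificate"
    return "other"

def choose_prompt(filename):
    return ENRICHMENT_PROMPTS.get(categorize(filename), FALLBACK_PROMPT)

prompt = choose_prompt("valve_spec_sheet.pdf")
```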
Common mistakes
Asking for too many fields in one enrichment call. Enrichment works well for 3–8 fields. Beyond that, extraction quality degrades — the AI has too many objectives and doesn’t focus enough on any single one. If you need 15 attributes, split into two enrichment rules of 7–8 fields each.
Not specifying output format. Without format guidance, the AI might return “The brand appears to be Nike, based on the swoosh logo” for one row and just “Adidas” for another. The inconsistency breaks downstream matching. Specify: “Return only the brand name, nothing else.”
Asking images for non-visual information. “What is the price of this product?” from a product photo is almost always wrong — the price isn’t in the image. Extract price from text columns, extract visual attributes from images. Use each data source for what it’s good at.
Ignoring extraction quality before running a full match. Run a sample (5 rows from each dataset) and check the extracted values. If the brand column says “Unknown” for 40% of rows, your enrichment prompt needs work before you scale to 10,000 rows. Debugging at 10 rows costs nothing. Debugging at 10,000 rows costs time and credits.
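A sample check like this can be automated before scaling up. A sketch, assuming extracted rows are plain dicts; the thresholds and sentence heuristic are illustrative choices, not part of any pipeline's actual API:

```python
# Sketch of a pre-scale quality check on a small extraction sample:
# flag any column where too many values are "Unknown" or where answers
# read like sentences instead of clean, matchable values.
def audit_sample(rows, columns, max_unknown_rate=0.2, max_words=6):
    problems = []
    for col in columns:
        values = [row.get(col, "Unknown") for row in rows]
        unknown_rate = values.count("Unknown") / len(values)
        if unknown_rate > max_unknown_rate:
            problems.append(f"{col}: {unknown_rate:.0%} Unknown")
        if any(len(v.split()) > max_words for v in values):
            problems.append(f"{col}: narrative answer detected")
    return problems

sample = [
    {"brand": "Nike", "material": "Unknown"},
    {"brand": "Adidas", "material": "Unknown"},
    {"brand": "This appears to be a Puma product based on the logo",
     "material": "Mesh"},
]
issues = audit_sample(sample, ["brand", "material"])
```

Any flagged column means the prompt needs work before the full run, when the same defect would cost real time and credits.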
Extraction gives you precision on one field. Enrichment gives you breadth across many. The best pipelines use both — enrichment for the bulk of the work, extraction for the fields that need extra care.
Keep reading
- Matching with images and attributes — how extracted attributes feed into the full matching pipeline
- Extracting matchable attributes from product images — deep dive on product image extraction specifically
- PDF categorization and data extraction — extraction techniques for document files
- Matching real estate listings with photos — extraction applied to property listing photos