A retailer acquires a competitor and needs to merge the product catalogs. They export both as CSVs: 12,000 products from their system, 8,400 from the acquisition. They run a matching job on product name and price.

The results are a mess. “Blue T-Shirt Size M” from catalog A matches six different products in catalog B because half their apparel has equally generic names. “Wireless Headphones” matches eleven times. Meanwhile, products with genuinely different names but identical items — “Men’s Crew Neck Tee (Navy, M)” vs “Blue T-Shirt Size M” — don’t match at all.

The problem isn’t the matching algorithm. The problem is the data. When your product records have three fields — name, price, category — the algorithm has three signals to work with. That’s not enough to distinguish thousands of products in the same category and price range.

What product data annotation means

Product data annotation is the process of adding structured attributes to product records. Every attribute you add gives the matching algorithm another signal to compare on.

A minimal product record looks like this:

| Field | Value |
|---|---|
| Name | Blue T-Shirt Size M |
| Price | $24.99 |
| Category | Apparel |

A well-annotated product record looks like this:

| Field | Value |
|---|---|
| Name | Blue T-Shirt Size M |
| Brand | Hanes |
| Material | 100% cotton, jersey knit |
| Color | Navy blue |
| Size | M |
| Fit | Regular |
| Neckline | Crew neck |
| Sleeve | Short sleeve |
| UPC | 038257364118 |
| Price | $24.99 |
| Category | Men's > Tops > T-Shirts |

The first record gives a matching algorithm three signals. The second gives it eleven. When the algorithm can compare brand, material, color, size, fit, neckline, and sleeve length, it can confidently distinguish "Blue T-Shirt Size M" (Hanes navy cotton crew neck) from "Blue T-Shirt Size M" (Fruit of the Loom royal blue polyester V-neck).
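A toy score makes the difference concrete. The `match_score` function below is purely illustrative, not the matching algorithm itself: it counts the fraction of shared fields whose values agree, which is enough to show how extra attributes turn a false positive into a clear non-match.

```python
def match_score(a: dict, b: dict) -> float:
    """Fraction of shared attribute fields whose values agree (case-insensitive).
    Illustrative only: real matchers use weighting, fuzzy comparison, and embeddings."""
    shared = [k for k in a if k in b]
    if not shared:
        return 0.0
    agree = sum(a[k].strip().lower() == b[k].strip().lower() for k in shared)
    return agree / len(shared)

# Two genuinely different shirts that share a generic listing
minimal_a = {"name": "Blue T-Shirt Size M", "price": "$24.99", "category": "Apparel"}
minimal_b = {"name": "Blue T-Shirt Size M", "price": "$24.99", "category": "Apparel"}

rich_a = {**minimal_a, "brand": "Hanes", "color": "Navy blue", "neckline": "Crew neck"}
rich_b = {**minimal_b, "brand": "Fruit of the Loom", "color": "Royal blue", "neckline": "V-neck"}

print(match_score(minimal_a, minimal_b))  # 1.0 -- false positive: records look identical
print(match_score(rich_a, rich_b))        # 0.5 -- added attributes expose the difference
```

With only three generic fields the two records are indistinguishable; three added attributes drop the score by half.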

The annotation quality spectrum

Same product at three annotation levels

| Attribute | Minimal | Moderate | Rich |
|---|---|---|---|
| Name | Wireless Headphones | Sony WH-1000XM5 Wireless | Sony WH-1000XM5 Wireless NC Headphones |
| Brand | | Sony | Sony |
| Model | | | WH-1000XM5 |
| Category | Electronics | Audio > Headphones | Audio > Headphones > Over-ear > Noise-canceling |
| Color | | Black | Black (matte finish) |
| Connectivity | | Bluetooth | Bluetooth 5.2, 3.5mm aux |
| Features | | | ANC, 30hr battery, multipoint, LDAC |
| Weight | | | 250g |
| Price | $298 | $298 | $298 |
| Matching confidence | 62% | 81% | 96% |

Matching confidence represents how reliably this record can be distinguished from similar products in the same category.

At the minimal level, “Wireless Headphones” at $298 could be any of a dozen products. At the moderate level, brand and category narrow it to a handful. At the rich level, the model number alone is nearly unique — and the supporting attributes confirm it.

The relationship between annotation depth and matching quality isn’t linear. The first few attributes (brand, model, category) provide the largest jump. Additional attributes (weight, connectivity, features) provide diminishing but still meaningful improvements, especially for distinguishing variants and similar models.

Manual annotation doesn’t scale

The obvious approach is to hire someone to fill in the missing attributes. And for small catalogs, it works.

Manual annotation effort by catalog size

| Catalog size | Estimated effort |
|---|---|
| 500 products | ~1 week, 1 person |
| 5,000 products | ~5 weeks, 2 people |
| 25,000 products | ~3 months, 3 people |
| 100,000 products | ~10 months, 5+ people |

Estimated annotation time assuming 50-100 products per person per day with quality checks.

At 50-100 products per annotator per day — a realistic rate for thorough annotation with quality checks — a catalog of 25,000 products takes three people three months. By the time they finish, hundreds of products have changed, been discontinued, or been added. The annotation is never truly “done.”
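The arithmetic behind these estimates is easy to check. The helper below uses a mid-range throughput of 75 products per person per day (an assumption within the 50-100 range stated above):

```python
def annotation_days(catalog_size: int, annotators: int, rate_per_day: int = 75) -> float:
    """Working days to annotate a catalog, assuming 75 products/person/day (mid-range)."""
    return catalog_size / (annotators * rate_per_day)

for size, people in [(500, 1), (5_000, 2), (25_000, 3), (100_000, 5)]:
    print(f"{size:>7} products, {people} annotators: "
          f"{annotation_days(size, people):.0f} working days")
```

At the mid-range rate, 25,000 products with three annotators comes out to roughly 111 working days, about five calendar months once quality checks and turnover are factored in.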

Manual annotation also introduces consistency problems. Annotator A writes “Navy blue.” Annotator B writes “Dark navy.” Annotator C writes “Blue (navy).” These are the same color described three different ways, and they won’t match unless you add another normalization step.
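That normalization step can be as simple as a canonical-value lookup applied before matching. The variant table below is a tiny illustrative sample, not a complete color vocabulary:

```python
# Map annotator variants onto one canonical color token before matching.
COLOR_CANON = {
    "navy blue": "navy",
    "dark navy": "navy",
    "blue (navy)": "navy",
    "royal blue": "royal-blue",
}

def normalize_color(value: str) -> str:
    """Return the canonical color token; unknown values pass through lowercased."""
    key = value.strip().lower()
    return COLOR_CANON.get(key, key)

# All three annotators' spellings now collapse to the same token
assert normalize_color("Navy blue") == normalize_color("Dark navy") == normalize_color("Blue (navy)")
```

In practice this table grows with the catalog, which is itself a maintenance burden; it is one more reason to prefer extraction that emits canonical values directly.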

AI-powered annotation from text

Language models can read a product name and description and extract structured attributes automatically. Given the input “Sony WH-1000XM5 Wireless Noise Canceling Bluetooth Over-Ear Headphones, Black,” a model can reliably extract:

  • Brand: Sony
  • Model: WH-1000XM5
  • Type: Over-ear headphones
  • Features: Wireless, noise canceling, Bluetooth
  • Color: Black

This works well when the product name or description is detailed. The AI is parsing structured information that’s already present in the text — just not in separate fields.
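The same parsing can be sketched with simple rules. A production pipeline would use an LLM for generality; the brand list and model-number pattern below are illustrative assumptions, not real extraction rules:

```python
import re

# Illustrative only: a real pipeline uses an LLM rather than a fixed brand list.
KNOWN_BRANDS = {"Sony", "Bose", "Hanes"}
MODEL_PATTERN = re.compile(r"\b([A-Z]{2,}-?\d{3,}[A-Z0-9]*)\b")  # e.g. WH-1000XM5

def extract_from_text(name: str) -> dict:
    """Pull brand, model, and feature keywords out of a product name."""
    attrs = {}
    for brand in KNOWN_BRANDS:
        if brand.lower() in name.lower():
            attrs["brand"] = brand
    if m := MODEL_PATTERN.search(name):
        attrs["model"] = m.group(1)
    for feature in ("wireless", "noise canceling", "bluetooth"):
        if feature in name.lower():
            attrs.setdefault("features", []).append(feature)
    return attrs

print(extract_from_text(
    "Sony WH-1000XM5 Wireless Noise Canceling Bluetooth Over-Ear Headphones, Black"
))
```

Hand-written rules like these break on the long tail of naming conventions, which is exactly the gap the language model closes.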

But here’s the limitation: AI can only extract what the text contains. If the product listing says “Blue T-Shirt Size M” and nothing else, the AI can extract color (blue), size (M), and type (t-shirt). It cannot extract brand, material, neckline, sleeve length, or fit — because that information simply isn’t in the text.

When the product image tells you what the text doesn’t

This is where product images change the equation.

A product photo contains information that text listings routinely omit:

  • Brand — logos on the product, labels, packaging
  • Material — visual texture reveals cotton vs. polyester, matte vs. glossy, wood vs. laminate
  • Color accuracy — “blue” in text could be navy, royal, baby blue, or teal; the photo shows the exact shade
  • Design details — button count, zipper style, stitching pattern, hardware finish
  • Condition — new in packaging, used, refurbished, damaged
  • Size and proportions — relative to other objects in the image

Attributes available from text vs. text + image

| Attribute | From text only | From text + image |
|---|---|---|
| Brand | Sometimes (if in name) | Almost always (logo visible) |
| Exact color | Approximate ("blue") | Precise (navy matte) |
| Material | Rarely | Usually (visual texture) |
| Condition | If explicitly stated | Visible (packaging, wear) |
| Design details | Almost never | Visible (buttons, stitching) |
| Neckline/fit | Sometimes | Always visible |
| Accessories included | If listed | Visible in photo |
| Packaging type | Rarely | Visible (box, bag, loose) |

Image-based extraction is most valuable for attributes that sellers don't bother typing into text fields.

A product listing that says “Blue T-Shirt Size M” and includes a product photo can be annotated by AI as: brand Hanes (logo on collar tag), material cotton jersey (visible knit texture), color navy (precise shade from image), neckline crew (visible), sleeve short (visible), fit regular (proportions), condition new with tags (tag visible).

That’s seven additional attributes extracted from a single image — attributes that the text listing didn’t provide and a text-only AI couldn’t infer.

Building an annotation pipeline

The practical approach combines text and image annotation in stages:

Stage 1: Extract what the text already contains. Use AI completions to parse brand, model, category, and basic attributes from product names and descriptions. This is fast and high-confidence.

Stage 2: Fill gaps from product images. For attributes the text doesn’t provide — material, exact color, condition, design details — use file-based AI extraction. Upload the product photos, mark the image column as a “file” type, and create extraction rules that reference the images.
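The gap-filling in Stage 2 amounts to a precedence merge. The sketch below is one reasonable policy, not Match Data Studio's implementation: image-derived values fill fields the text left empty, and also override the text for visual fields where the photo is more precise. The field names and `VISUAL_FIELDS` set are illustrative assumptions.

```python
VISUAL_FIELDS = {"color", "material", "condition"}  # assumption: image is more precise here

def merge_attributes(from_text: dict, from_image: dict) -> dict:
    """Merge text- and image-derived attributes: text wins by default,
    image wins for visual fields and fills any gaps."""
    merged = dict(from_text)
    for key, value in from_image.items():
        if key not in merged or key in VISUAL_FIELDS:
            merged[key] = value
    return merged

text_attrs = {"name": "Blue T-Shirt Size M", "color": "blue", "size": "M"}
image_attrs = {"brand": "Hanes", "color": "navy", "material": "cotton jersey",
               "neckline": "crew", "condition": "new with tags"}

print(merge_attributes(text_attrs, image_attrs))
```

Note how the approximate text color "blue" is refined to the image-derived "navy" while the text-only fields (name, size) pass through untouched.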

Stage 3: Embed the enriched records. With 10-12 attributes per product instead of 3, embedding similarity becomes much more discriminating. “Navy cotton crew-neck short-sleeve t-shirt by Hanes” embeds very differently from “Royal blue polyester V-neck tank top by Fruit of the Loom” — even though both started as “Blue T-Shirt Size M” in the original data.
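Before embedding, the enriched record is typically flattened into a single descriptor string. The field order and template below are illustrative, not a prescribed format:

```python
FIELD_ORDER = ["color", "material", "neckline", "sleeve", "type", "brand"]

def to_embedding_text(attrs: dict) -> str:
    """Flatten an attribute dict into one descriptor string for the embedding model."""
    return " ".join(attrs[f] for f in FIELD_ORDER if attrs.get(f))

rich = {"color": "navy", "material": "cotton", "neckline": "crew-neck",
        "sleeve": "short-sleeve", "type": "t-shirt", "brand": "by Hanes"}
print(to_embedding_text(rich))  # "navy cotton crew-neck short-sleeve t-shirt by Hanes"
```

The same template applied to the Fruit of the Loom record produces a string with almost no tokens in common, which is what makes the embeddings diverge.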

Stage 4: Match with confidence. The enriched, embedded records produce matches that are accurate enough to act on without manual review for the clear cases, and focused enough for efficient human review of the borderline cases.
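Splitting clear cases from borderline ones is usually just two thresholds on the match score. The cutoffs below are illustrative and should be tuned against a labeled sample from your own catalogs:

```python
def triage(score: float, auto_accept: float = 0.9, review_floor: float = 0.6) -> str:
    """Route a match by similarity score: act on clear cases, queue borderline ones.
    Thresholds are illustrative assumptions, not recommended defaults."""
    if score >= auto_accept:
        return "auto-accept"
    if score >= review_floor:
        return "human-review"
    return "no-match"

print(triage(0.96))  # auto-accept
print(triage(0.81))  # human-review
print(triage(0.42))  # no-match
```

Raising `review_floor` shrinks the human-review queue at the cost of silently dropping some true matches; the right trade-off depends on how costly a missed merge is for your catalog.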

In Match Data Studio, this pipeline runs in a single project. Upload your CSVs, upload your product images, configure the extraction rules with the AI assistant, and let the pipeline handle the rest. The images never leave your project — they’re processed by Gemini for attribute extraction and then the extracted text attributes flow through the standard matching pipeline.

For a deeper look at how image categorization works at scale, see our guide on image categorization for product matching. And for the full technical walkthrough of extracting specific attributes from product photos, see extracting matchable attributes from product images.


Your product data is richer than your spreadsheet suggests. Add images to the matching pipeline and let AI extract the attributes your text fields are missing.

Start annotating your catalog →

