Features

A ten-stage pipeline built for accuracy

Match Data Studio runs your data through a configurable sequence of transformations, AI enrichment, and comparisons — each stage tuned to your dataset by the AI assistant.

01

Data preparation

Normalize text, strip whitespace, and apply custom transformations. The AI assistant suggests transformations based on your column names and sample data.

  • Lowercase, trim, remove punctuation
  • Custom regex transforms
  • Column renaming and aliasing
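
To make the preparation step concrete, here is a minimal sketch of the kind of normalization described above. The `normalize` helper and its `custom_patterns` parameter are hypothetical names chosen for illustration, not part of Match Data Studio's interface.

```python
import re
import string

# Hypothetical helper; the name and parameters are assumptions for illustration.
def normalize(value, custom_patterns=()):
    """Lowercase, trim, strip punctuation, then apply custom regex transforms."""
    text = value.lower().strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    for pattern, replacement in custom_patterns:
        text = re.sub(pattern, replacement, text)
    # Collapse runs of whitespace left behind by removed characters.
    return re.sub(r"\s+", " ", text).strip()

cleaned = normalize("  ACME, Inc.  ", [(r"\binc\b", "incorporated")])
print(cleaned)
```

Applying the same function to both datasets before comparison keeps superficial differences such as casing or trailing punctuation from lowering match scores.
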
02

AI extraction

Pull structured attributes out of unstructured text or files. Extract product names from descriptions, parse addresses into components, or read fields from PDFs and images.

  • Structured output from free text
  • Multimodal — images, PDFs, URLs
  • Custom extraction prompts
03

AI enrichment

Generate new fields that don't exist in the raw data. Classify records, standardize categories, translate text, or summarize content before matching.

  • LLM-generated columns
  • Classification and tagging
  • Standardization and translation
04

Vector embeddings

Convert text fields into dense vector representations for semantic similarity matching that goes beyond keyword overlap.

  • Multiple embedding models
  • Multi-field embedding support
  • Cached for performance
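
One way to picture multi-field embedding with caching is the sketch below. It assumes the selected fields are joined into a single string before embedding, which may differ from how the product combines fields. `EmbeddingCache` and `toy_embed` are hypothetical names, and `toy_embed` is a placeholder for a real embedding model.

```python
import hashlib

# Hypothetical sketch: `embed_fn` stands in for whichever embedding model is configured.
class EmbeddingCache:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self._store = {}
        self.misses = 0  # number of times the model was actually called

    def get(self, record, fields):
        # Multi-field support: join the chosen fields into one labeled string.
        text = " | ".join(f"{f}: {record.get(f, '')}" for f in fields)
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self.embed_fn(text)
        return self._store[key]

def toy_embed(text):
    # Placeholder "model": a letter-frequency vector, NOT a real embedding.
    return [text.count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]

cache = EmbeddingCache(toy_embed)
row = {"name": "acme anvil", "brand": "acme"}
first = cache.get(row, ["name", "brand"])
second = cache.get(row, ["name", "brand"])  # served from the cache
```

Keying the cache on a hash of the input text means identical values are embedded only once, even when they appear in many rows.
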
05

Cosine similarity

Compute a similarity score for every candidate pair. Configurable thresholds filter out weak matches before further processing.

  • Fast vectorized computation
  • Per-field weight control
  • Configurable min threshold
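
The scoring step can be sketched as follows, assuming that per-field weights are combined as a weighted average of per-field cosine similarities (an assumption; the product may combine them differently). The function names are hypothetical, and plain Python is used for readability where a production system would likely use vectorized array operations.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weighted_score(left, right, weights):
    """Weighted average of per-field cosine similarities."""
    total = sum(weights.values())
    return sum(w * cosine(left[f], right[f]) for f, w in weights.items()) / total

def filter_pairs(pairs, weights, min_score):
    """Score each (left_id, left, right_id, right) pair and drop weak ones."""
    scored = [(l_id, r_id, weighted_score(l, r, weights)) for l_id, l, r_id, r in pairs]
    return [p for p in scored if p[2] >= min_score]

weights = {"name": 2.0, "desc": 1.0}  # illustrative weights
pairs = [
    ("a1", {"name": [1, 0], "desc": [1, 1]}, "b1", {"name": [1, 0], "desc": [1, 0]}),
    ("a2", {"name": [0, 1], "desc": [1, 1]}, "b2", {"name": [1, 0], "desc": [1, 0]}),
]
kept = filter_pairs(pairs, weights, min_score=0.5)
```

Here only the first pair survives the threshold, because the second pair's `name` vectors are orthogonal and `name` carries twice the weight of `desc`.
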
06

Numeric matching

Match on numeric fields like prices, dates, or IDs with tolerance bands and custom comparison logic.

  • Exact, range, and tolerance modes
  • Date and number parsing
  • Multi-field conditions
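
As an illustration of exact, range, and tolerance modes, here is a minimal sketch. The function names, the `relative` flag, and the example values are assumptions made for this example rather than the product's actual configuration options.

```python
from datetime import date

def numeric_match(a, b, mode="exact", tolerance=0.0, relative=False):
    """Compare two numbers exactly or within an absolute/relative tolerance band."""
    if mode == "exact":
        return a == b
    if mode == "tolerance":
        limit = tolerance * max(abs(a), abs(b)) if relative else tolerance
        return abs(a - b) <= limit
    raise ValueError(f"unknown mode: {mode}")

def in_range(value, low, high):
    """Range mode: the value must fall inside [low, high]."""
    return low <= value <= high

def dates_match(a, b, max_days=0):
    """Dates match when they are at most `max_days` apart."""
    return abs((a - b).days) <= max_days

# Multi-field condition: price within 5% AND dates within two days.
price_ok = numeric_match(19.99, 20.49, mode="tolerance", tolerance=0.05, relative=True)
date_ok = dates_match(date(2024, 1, 1), date(2024, 1, 3), max_days=2)
is_match = price_ok and date_ok
```

A relative tolerance scales with the magnitude of the values, which tends to suit prices better than a fixed absolute band.
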
07

String matching

Fuzzy string algorithms cover the cases where embeddings alone aren't enough, such as names, codes, and identifiers.

  • Levenshtein, Jaro-Winkler
  • Phonetic matching
  • Token set ratio
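
For readers unfamiliar with edit distance, the sketch below implements Levenshtein distance and a normalized similarity derived from it. This is a generic textbook version written for illustration, not the product's implementation; Jaro-Winkler, phonetic matching, and token set ratio are not shown.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Scale edit distance to a 0..1 score, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

distance = levenshtein("kitten", "sitting")
score = similarity("jon smith", "john smith")
```

Normalizing by the longer string's length makes scores comparable across short codes and long names.
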
08

LLM confirmation

For borderline matches, send candidate pairs to an LLM with a custom prompt for a final yes/no judgment.

  • Custom confirmation prompts
  • Threshold-gated (only uncertain pairs)
  • Reasoning logged per pair
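
Threshold gating can be pictured as a three-way split: confident matches pass through, clear non-matches are dropped, and only the middle band goes to the LLM. The sketch below assumes two cutoffs with illustrative values; `route_pairs`, `build_prompt`, and the prompt wording are hypothetical, and no model is actually called.

```python
def route_pairs(scored_pairs, reject_below=0.6, accept_above=0.9):
    """Split (pair_id, score) tuples into accepted, uncertain, and rejected lists."""
    accepted, uncertain, rejected = [], [], []
    for pair_id, score in scored_pairs:
        if score >= accept_above:
            accepted.append(pair_id)
        elif score < reject_below:
            rejected.append(pair_id)
        else:
            uncertain.append(pair_id)  # only these would be sent for confirmation
    return accepted, uncertain, rejected

def build_prompt(left, right, instructions):
    """Assemble a confirmation prompt for one borderline pair (wording is illustrative)."""
    return (
        f"{instructions}\n\n"
        f"Record A: {left}\n"
        f"Record B: {right}\n\n"
        "Answer yes or no, then give a one-sentence reason."
    )

scored = [("p1", 0.95), ("p2", 0.75), ("p3", 0.40)]
accepted, uncertain, rejected = route_pairs(scored)
prompt = build_prompt("Acme Inc, NY", "ACME Incorporated, New York",
                      "Do these records describe the same company?")
```

Restricting confirmation to the uncertain band keeps LLM calls proportional to the number of borderline pairs rather than to the full candidate set.
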
09

Export

Download matched results as a clean CSV with all matched fields, scores, and the original rows from both datasets.

  • Custom column selection
  • Confidence scores included
  • One-click download
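
Custom column selection on export can be sketched with Python's standard `csv` module. The column names and the `export_matches` helper are assumptions for illustration; the actual export schema may differ.

```python
import csv
import io

def export_matches(matches, columns):
    """Write only the selected columns of each match to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip keys that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(matches)
    return buffer.getvalue()

matches = [
    {"left_name": "Acme Inc", "right_name": "ACME Incorporated",
     "score": 0.93, "internal_id": 7},
]
csv_text = export_matches(matches, ["left_name", "right_name", "score"])
print(csv_text)
```

The `internal_id` key is left out of the output because it is not in the selected column list.
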

Start matching your data today

Start with 100 free credits. No credit card required.
