A ten-stage pipeline built for accuracy
Match Data Studio runs your data through a configurable sequence of transformations, AI enrichment, and comparisons. The AI assistant tunes each stage to your dataset.
Data preparation
Normalize text, strip whitespace, apply custom transformations. The AI suggests transformations based on your column names and sample data.
- Lowercase, trim, remove punctuation
- Custom regex transforms
- Column renaming and aliasing
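To make the preparation step concrete, here is a minimal sketch of lowercasing, trimming, punctuation removal, and a custom regex transform. The function names `normalize` and `apply_regex` are hypothetical and used only for illustration; they are not Match Data Studio's API.

```python
import re
import string

# Translation table that deletes ASCII punctuation (assumption: ASCII is enough here).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(value: str, strip_punctuation: bool = True) -> str:
    """Lowercase, trim, optionally drop punctuation, and collapse inner whitespace."""
    text = value.strip().lower()
    if strip_punctuation:
        text = text.translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()


def apply_regex(value: str, pattern: str, replacement: str) -> str:
    """Custom regex transform: replace every match of `pattern` in `value`."""
    return re.sub(pattern, replacement, value)


print(normalize("  ACME, Inc.  "))               # acme inc
print(apply_regex("SKU-00123", r"^SKU-0*", ""))  # 123
```

Normalizing both datasets the same way before comparison keeps cosmetic differences such as casing or stray spaces from lowering match scores.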
AI extraction
Pull structured attributes out of unstructured text or files. Extract product names from descriptions, parse addresses into components, or read fields from PDFs and images.
- Structured output from free text
- Multimodal inputs: images, PDFs, URLs
- Custom extraction prompts
AI enrichment
Generate new fields that don't exist in the raw data. Classify records, standardize categories, translate text, or summarize content before matching.
- LLM-generated columns
- Classification and tagging
- Standardization and translation
Vector embeddings
Convert text fields into dense vector representations for semantic similarity matching that goes beyond keyword overlap.
- Multiple embedding models
- Multi-field embedding support
- Cached for performance
Cosine similarity
Compute pairwise similarity scores across all candidate pairs. Configurable thresholds filter out weak matches before further processing.
- Fast vectorized computation
- Per-field weight control
- Configurable min threshold
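As an illustration of how per-field weights and a minimum threshold can work together, the sketch below scores one candidate pair. It assumes the per-field cosine scores are combined as a weighted average, which is one common choice and not necessarily what the product does. The names `weighted_score` and `passes` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_score(left, right, weights):
    """Weighted average of per-field cosine similarities.

    `left` and `right` map field name -> embedding vector;
    `weights` maps field name -> non-negative weight (assumed combination rule).
    """
    total = sum(weights.values())
    return sum(
        w * cosine_similarity(left[field], right[field])
        for field, w in weights.items()
    ) / total


def passes(score, min_threshold=0.8):
    """Keep only pairs whose score reaches the minimum threshold."""
    return score >= min_threshold


left = {"name": [1.0, 0.0], "description": [1.0, 1.0]}
right = {"name": [1.0, 0.0], "description": [1.0, 0.0]}
score = weighted_score(left, right, {"name": 2.0, "description": 1.0})
print(round(score, 3), passes(score))  # 0.902 True
```

Real pipelines usually compute these scores in bulk with a vectorized library; the pure-Python loop here is only meant to show the arithmetic.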
Numeric matching
Match on numeric and date fields such as prices, dates, or IDs, using tolerance bands and custom comparison logic.
- Exact, range, and tolerance modes
- Date and number parsing
- Multi-field conditions
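A tolerance band can be sketched with the standard library alone. The helpers `price_matches`, `dates_within`, and `pair_matches` below are hypothetical names, and the tolerances are arbitrary example values.

```python
import math
from datetime import date


def price_matches(a: float, b: float, abs_tol: float = 1.0) -> bool:
    """Tolerance mode: the two amounts differ by at most `abs_tol`."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)


def dates_within(a: date, b: date, max_days: int) -> bool:
    """Tolerance mode for dates: at most `max_days` apart in either direction."""
    return abs((a - b).days) <= max_days


def pair_matches(left: dict, right: dict) -> bool:
    """Multi-field condition: every numeric check must hold."""
    return (
        left["id"] == right["id"]                                 # exact mode
        and price_matches(left["price"], right["price"])          # tolerance mode
        and dates_within(left["ordered"], right["ordered"], 2)    # date tolerance
    )


a = {"id": 42, "price": 19.99, "ordered": date(2024, 3, 1)}
b = {"id": 42, "price": 20.49, "ordered": date(2024, 3, 3)}
print(pair_matches(a, b))  # True
```

Setting `rel_tol=0.0` makes `math.isclose` behave as a pure absolute tolerance; a relative tolerance is often a better fit when values span several orders of magnitude.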
String matching
Fuzzy string algorithms for fields where embeddings alone aren't enough, such as names, codes, and identifiers.
- Levenshtein, Jaro-Winkler
- Phonetic matching
- Token set ratio
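For readers unfamiliar with edit distance, here is a small pure-Python sketch of Levenshtein distance and a normalized similarity derived from it. The `similarity` normalization (one minus distance over the longer length) is an illustrative choice, not a statement about how the product scores strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance into [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("jon smith", "john smith") > 0.85)  # True
```

Edit distance works well for typos in short identifiers, while phonetic and token-based measures handle spelling variants and reordered words better.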
LLM confirmation
For borderline matches, send candidate pairs to an LLM with a custom prompt for a final yes/no judgment.
- Custom confirmation prompts
- Threshold-gated (only uncertain pairs)
- Reasoning logged per pair
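Threshold gating can be pictured as a three-way split on the similarity score: confident matches and confident non-matches are decided directly, and only the band in between is sent to the LLM. In this sketch the cut-offs `low` and `high`, the `Decision` type, and the `ask_llm` callable are all hypothetical; `ask_llm` stands in for whatever model call a real setup would make.

```python
from typing import Callable, NamedTuple, Tuple


class Decision(NamedTuple):
    is_match: bool
    source: str      # "score" or "llm"
    reasoning: str   # kept per pair so the verdict can be audited later


def decide(
    score: float,
    low: float,
    high: float,
    ask_llm: Callable[[], Tuple[bool, str]],
) -> Decision:
    """Call the LLM only when the score falls in the uncertain band [low, high)."""
    if score >= high:
        return Decision(True, "score", f"score {score:.2f} >= {high:.2f}")
    if score < low:
        return Decision(False, "score", f"score {score:.2f} < {low:.2f}")
    verdict, reasoning = ask_llm()
    return Decision(verdict, "llm", reasoning)


def fake_llm() -> Tuple[bool, str]:
    """Stub standing in for a real model call."""
    return True, "Same street address; one side abbreviates 'Street'."


print(decide(0.95, 0.60, 0.90, fake_llm).source)  # score
print(decide(0.75, 0.60, 0.90, fake_llm).source)  # llm
```

Because the callable is invoked only inside the uncertain band, the number of LLM calls scales with the number of borderline pairs and not with the total number of candidates.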
Export
Download matched results as a clean CSV with all matched fields, scores, and the original rows from both datasets.
- Custom column selection
- Confidence scores included
- One-click download
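Custom column selection amounts to writing only the chosen fields of each matched row. The sketch below uses Python's `csv` module; the function name `export_matches` and the column names are made up for the example and do not reflect the product's actual export schema.

```python
import csv
import io


def export_matches(rows, columns):
    """Serialize matched rows to CSV text, keeping only the selected columns."""
    buffer = io.StringIO()
    # extrasaction="ignore" silently drops keys that were not selected.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "left_name": "Acme Inc",
        "right_name": "ACME, Inc.",
        "confidence": 0.93,
        "internal_note": "not exported",
    },
]
print(export_matches(rows, ["left_name", "right_name", "confidence"]))
```

Values that contain commas, such as `ACME, Inc.`, are quoted automatically by the writer, so the file stays parseable by spreadsheet tools.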
Start matching your data today
Start with 100 free credits. No credit card required.
Get started free →