Scaling your AI recruiting practice beyond spreadsheets and scripts
Most AI recruiting consultants match candidates with a mix of spreadsheets, Python scripts, and API calls. Here's how to move from fragile one-off workflows to repeatable matching operations.
You started your AI recruiting practice the way most people do. A client asks you to find candidates for a handful of roles. You export their candidate database as a CSV, open it alongside the job descriptions, and start working.
Maybe you paste resumes into ChatGPT one at a time: “Rate this candidate’s fit for this role on a scale of 1-10.” Maybe you write a Python script that calls an embedding API, computes cosine similarity between resume vectors and job description vectors, and sorts the results. Maybe you use a combination — embeddings for initial ranking, then LLM evaluation for the top 50.
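That embedding-and-sort script is usually only a few dozen lines. A minimal sketch of the approach — with the embedding call stubbed out so it runs offline (a real version would call an embedding API; here `embed` just hashes words into buckets, which is a stand-in, not a recommendation):

```python
# Sketch of the one-off approach: embed, score with cosine similarity, sort.
# embed() stands in for a real embedding API call; it hashes words into a
# fixed-size vector so this example runs without network access or keys.
import hashlib
import math

DIM = 64

def embed(text: str) -> list[float]:
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

job = embed("senior python engineer machine learning experience")
resumes = {
    "alice": "python engineer five years machine learning work",
    "bob": "retail store manager inventory experience",
}
ranked = sorted(resumes, key=lambda n: cosine(embed(resumes[n]), job), reverse=True)
print(ranked)  # alice outranks bob on shared vocabulary
```

The sort at the end is the entire "pipeline" at this stage — which is exactly why it feels fast to build and, later, slow to maintain.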
It works. The client is happy. You take on another client.
Then the cracks appear.
The pattern that doesn’t scale
Client two has different data. Their ATS export uses different column names. Their job descriptions are in a separate document, not inline. Some candidates have resume PDFs instead of text. Your script from client one does not work without modification.
Client three wants matching with different priorities — they care about industry experience more than skills overlap. Your scoring weights are hardcoded. You fork the script again.
By client five, you have five separate codebases, each with slightly different data cleaning logic, different embedding calls, different scoring functions, and different output formats. Debugging is painful because changes to one pipeline might break another. Re-running a previous client’s matching job requires remembering which version of the code they used.
| # Clients | Time per engagement | Maintenance burden | Failure mode |
|---|---|---|---|
| 1-2 | 8-12 hours | None — scripts are fresh | None |
| 3-5 | 10-15 hours | Moderate — forked codebases diverge | Wrong script version used for client |
| 6-10 | 15-20 hours | Heavy — bugs in one pipeline affect others | Client data format breaks assumptions |
| 10+ | 20+ hours | Unsustainable — more time on code than matching | Cannot reproduce previous results |
Time includes data preparation, pipeline adaptation, matching, and output formatting. Does not include client communication.
The root cause is that every engagement is treated as a custom development project. The matching logic is entangled with data cleaning, API orchestration, and output formatting. There is no separation between the matching configuration (what to match, how to weight it, what thresholds to use) and the matching infrastructure (data ingestion, embedding computation, similarity scoring, output generation).
What breaks first: the common failure points
Understanding where custom matching workflows break helps identify what needs to change.
Data format surprises. A new client’s CSV has merged first/last name columns, dates in a non-standard format, or skills listed in a single comma-separated string instead of separate rows. Every format variation requires new parsing code.
API cost blowups. Without pre-filtering, every candidate-role pair gets an embedding computation and possibly an LLM evaluation. A client with 5,000 candidates and 50 roles generates 250,000 pairs. At typical API pricing, running embeddings on all pairs costs $50-200, and adding LLM evaluation pushes it into the thousands. Most of those pairs are obviously wrong matches that a simple filter would have eliminated.
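The arithmetic is worth sanity-checking in code before launching a run. A small sketch, using the numbers from the example above (the pre-filter pass rate is a hypothetical figure, not a platform guarantee):

```python
# Back-of-envelope pair count before committing to API spend.
def estimate_pairs(n_candidates: int, n_roles: int, prefilter_pass_rate: float = 1.0) -> int:
    """Number of candidate-role pairs the AI stages will actually see."""
    return int(n_candidates * n_roles * prefilter_pass_rate)

full = estimate_pairs(5_000, 50)             # 250,000 pairs, as in the text
filtered = estimate_pairs(5_000, 50, 0.10)   # a hypothetical 90% pre-filter cut
print(full, filtered)  # 250000 25000
```

Multiplying the pair count by your provider's per-call pricing turns a vague worry into a line item you can show the client.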
Threshold confusion. You set a similarity threshold of 0.7 for one client and it works well. You use the same threshold for another client with different data and get terrible results — either too many false positives or too few matches. Thresholds are not portable across datasets because the underlying similarity distributions differ.
Unreproducible results. A client asks you to re-run their matching with a small tweak. You cannot find the exact version of the script you used. Or the embedding API has been updated and produces slightly different vectors. Or the data cleaning step changed between runs. The new results differ from the originals and you cannot explain why.
The shift: from coding to configuring
The fix is not better scripts. It is separating the matching configuration from the matching infrastructure.
Matching infrastructure is the plumbing: data ingestion, column mapping, pre-filter execution, embedding computation, similarity scoring, LLM orchestration, and output generation. This is the same across every engagement. It should be built once and reused.
Matching configuration is the domain knowledge: which columns to match on, what pre-filters to apply, how to weight different signals, what similarity threshold to use, whether to run LLM confirmation, and what to include in the output. This changes per client and per engagement.
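To make the separation concrete, a matching configuration can be nothing more than data. The sketch below is illustrative — the field names are hypothetical, not a real platform schema:

```python
# Hypothetical matching configuration kept as plain data, separate from
# the pipeline code that executes it. A new client means a new dict,
# not a new codebase. Field names are illustrative only.
client_config = {
    "prefilters": {"location": ["Chicago", "Remote"], "min_seniority": "Senior"},
    "embedding_fields": {"skills": 0.5, "industry": 0.3, "summary": 0.2},
    "similarity_threshold": 0.70,
    "llm_confirmation": {"enabled": True, "top_k": 50},
    "output_columns": ["candidate_id", "role_id", "score", "reasoning"],
}

# The infrastructure can validate any such config before running it:
weight_total = sum(client_config["embedding_fields"].values())
assert abs(weight_total - 1.0) < 1e-9, "embedding weights must sum to 1"
```

Because the configuration is data, it can be versioned, diffed, and saved alongside the client's project — which is what makes re-runs reproducible.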
When these are separated, scaling looks different:
| Task | Custom script approach | Configurable platform |
|---|---|---|
| Data ingestion | Write parsing code per format | Upload CSV, map columns in UI |
| Pre-filtering | Code filters per engagement | Toggle location/seniority/function filters |
| Embedding similarity | API calls, vector storage, scoring code | Select fields, set weights |
| LLM evaluation | Prompt engineering, API orchestration | Enable/disable, customize prompt |
| Threshold tuning | Edit code, re-run, check results | Adjust slider, preview results |
| Output | Custom formatting script | Download ranked CSV with scores |
| Re-run with changes | Find old code version, modify, pray | Load project, adjust config, run |
A configuration-based approach eliminates per-engagement development work.
The consultant’s time shifts from writing code to making decisions: which signals matter for this client, what thresholds produce the right precision-recall balance, whether LLM confirmation adds enough value to justify the runtime.
The pre-filter economics that scripts miss
One of the most impactful differences between a custom script and a proper matching pipeline is pre-filtering. Most scripts skip this step entirely because implementing it feels like premature optimization. It is not.
Consider a talent matching engagement with 8,000 candidates and 60 roles. The full cross product is 480,000 pairs.
With pre-filters, the AI evaluation runs on 43,000 pairs instead of 480,000. That is an 11x reduction in API costs and runtime. The match quality actually improves because the AI is not wasting capacity evaluating obviously wrong pairs — a data scientist in Mumbai against a marketing manager role in Chicago.
A proper matching pipeline runs these pre-filters before any AI operation. String pre-filters (location, function, required certifications) are nearly free computationally. Numeric pre-filters (experience range, compensation range) are similarly cheap. They run in seconds even on large datasets.
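A pre-filter stage is also cheap to write. A minimal sketch of string and numeric pre-filters applied before any embedding or LLM call (field names are illustrative):

```python
# Cheap pre-filters run before any AI operation. Each check is a simple
# string or numeric comparison, so the whole pass runs in seconds even
# on large datasets. Field names are illustrative, not a fixed schema.
def passes_prefilters(candidate: dict, role: dict) -> bool:
    if role.get("location") and candidate.get("location") != role["location"]:
        return False
    if candidate.get("years_experience", 0) < role.get("min_years", 0):
        return False
    return True

candidates = [
    {"id": 1, "location": "Chicago", "years_experience": 6},
    {"id": 2, "location": "Mumbai", "years_experience": 3},
]
role = {"location": "Chicago", "min_years": 5}
survivors = [c for c in candidates if passes_prefilters(c, role)]
print([c["id"] for c in survivors])  # [1]
```

Only the survivors go on to embedding and LLM stages, which is where the 11x cost reduction in the example above comes from.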
Building repeatable configurations by vertical
Once you are working with configurable matching instead of custom code, you can build and refine matching configurations by vertical. A tech recruiting configuration looks different from a healthcare recruiting configuration, which looks different from an executive search configuration.
| Vertical | Pre-filters | Key embedding fields | LLM confirmation | Typical threshold |
|---|---|---|---|---|
| Tech (IC roles) | Location, seniority | Skills (0.5), tech stack (0.3), domain (0.2) | On — evaluate project relevance | 0.65-0.75 |
| Healthcare clinical | License type, state | Specialty (0.4), experience (0.3), certifications (0.3) | On — verify credential specifics | 0.70-0.80 |
| Executive search | Industry, seniority ≥ Director | Industry (0.3), leadership scope (0.4), domain (0.3) | On — detailed fit narrative | 0.75-0.85 |
| Contract / gig | Availability, location | Skills (0.6), rate range (0.2), recency (0.2) | Off — speed over precision | 0.55-0.65 |
| Sales / GTM | Industry, territory | Industry (0.4), deal size (0.3), product type (0.3) | On — evaluate market knowledge | 0.60-0.70 |
These are starting configurations. Refine thresholds based on client feedback on result quality.
These configurations become your intellectual property. A new client in healthcare recruiting starts with your healthcare template. You adjust based on their specific needs — maybe they care more about research publications than certifications — and run the match. The configuration is saved with the project, reproducible, and reusable.
Over time, you accumulate a library of proven configurations across verticals. New engagements start from a template rather than a blank script. Onboarding a new client drops from days to hours.
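In practice, a template library can be as simple as a set of named configurations that each engagement copies and overrides. A hypothetical sketch, using the healthcare row from the table above:

```python
# Illustrative vertical templates stored as data; a new engagement
# copies a template and overrides a few fields instead of starting
# from a blank script. Values mirror the healthcare row above.
import copy

TEMPLATES = {
    "healthcare": {
        "prefilters": ["license_type", "state"],
        "weights": {"specialty": 0.4, "experience": 0.3, "certifications": 0.3},
        "threshold": 0.75,
    },
}

# Client cares more about research publications than certifications:
client = copy.deepcopy(TEMPLATES["healthcare"])
client["weights"] = {"specialty": 0.4, "experience": 0.3, "publications": 0.3}
```

The deep copy matters: overriding a client's configuration must never mutate the shared template.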
What your clients actually see
From the client’s perspective, the shift from script-based matching to configured matching shows up in three ways.
Faster turnaround. When the first deliverable does not require days of pipeline coding, clients get initial results within hours of providing their data. Faster iteration means faster feedback, which means better final results.
Explainable results. A ranked CSV with match scores is better than a gut-feel shortlist, but a ranked CSV with match scores, matched signals, and LLM reasoning is actionable. Hiring managers can see why candidate A ranks above candidate B and make informed decisions about who to advance.
Adjustable precision. When a client says “these results are too broad — I’m getting candidates who are adjacent but not quite right,” you can tighten the threshold, add a pre-filter, or adjust field weights and re-run in minutes. With custom scripts, this feedback loop takes days.
The net effect is that your service becomes more valuable — faster, more transparent, more responsive — while requiring less effort per engagement.
Getting off the script treadmill
If you recognize the pattern — forked scripts, format-specific parsing, hardcoded thresholds, unreproducible results — the path forward is adopting matching infrastructure that separates configuration from plumbing.
Match Data Studio provides the full matching pipeline as a configurable platform. Upload candidate and job data as CSVs (including resume PDFs as file columns), configure pre-filters, set embedding weights and similarity thresholds, enable LLM confirmation with custom prompts, and get ranked output with match scores and reasoning. Each client gets their own project with saved configurations. Re-running with tweaks takes minutes, not days.
Start matching with your client data →
Keep reading
- AI talent matching as a service: the infrastructure gap holding consultants back — the business case for configurable matching infrastructure
- Building a candidate-to-job matching workflow that actually scales — the step-by-step matching pipeline for talent matching
- Five matching mistakes that silently ruin your results — configuration errors that produce bad matching output
- Data cleaning before matching: the steps most people skip — preparing candidate data for reliable matching results