How to deduplicate your vendor list and stop paying the same supplier twice
Duplicate vendor records cause overpayments, missed discounts, and audit findings. Learn how to match vendor records across ERPs and maintain a clean vendor master.
Your accounts payable team processes an invoice from “Grainger Industrial Supply.” They match it to a vendor record, approve the payment, and move on. Two weeks later, another invoice arrives from “W.W. Grainger, Inc.” Different vendor number. Same supplier. The invoice gets matched to the second record, approved, and paid.
One of those invoices was a duplicate. The same goods, the same PO, the same dollar amount — paid twice because the ERP had two records for the same company.
This happens more often than most finance teams realize. Duplicate vendor records are one of the most common root causes of duplicate payments, and they exist in most organizations’ ERPs. The AP team isn’t negligent; they’re working with bad data. They matched each invoice to a valid vendor record. The problem is that two valid vendor records shouldn’t both exist.
The cost of duplicate vendors: duplicate payments, missed discounts, audit findings
The financial impact of vendor duplicates operates on three levels, each progressively harder to detect.
Duplicate payments are the most visible cost. When the same supplier exists under two or more vendor codes, invoices for the same purchase order can be entered and approved against different records, bypassing the duplicate-invoice detection that most ERP systems provide. Duplicate detection typically matches on vendor number + invoice number + amount. If the vendor number is different, the check passes even though the underlying payment is a duplicate.
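To see why a different vendor number defeats the check, here is a minimal Python sketch of a duplicate-invoice check keyed on vendor + invoice number + amount. The `Invoice` type, the invoice number, the amount, and the `ein_by_vendor` lookup are hypothetical; real ERPs run this check internally and differ in exactly which fields they compare.

```python
from decimal import Decimal
from typing import NamedTuple


class Invoice(NamedTuple):
    vendor_id: str
    invoice_number: str
    amount: Decimal


def find_duplicate_invoices(invoices, vendor_key=lambda vendor_id: vendor_id):
    """Return invoices whose (vendor key, invoice number, amount) was already seen.

    The default vendor_key is the raw vendor ID, mimicking a typical ERP check.
    """
    seen = set()
    duplicates = []
    for inv in invoices:
        key = (vendor_key(inv.vendor_id), inv.invoice_number, inv.amount)
        if key in seen:
            duplicates.append(inv)
        else:
            seen.add(key)
    return duplicates


# Hypothetical example: one Grainger invoice entered against two vendor records.
invoices = [
    Invoice("V-10042", "INV-5531", Decimal("4218.75")),   # ERP - East record
    Invoice("VND-8819", "INV-5531", Decimal("4218.75")),  # ERP - West record
]

# Hypothetical lookup from vendor ID to EIN (digits only).
ein_by_vendor = {"V-10042": "361150280", "VND-8819": "361150280"}


def resolve_entity(vendor_id):
    """Map a vendor ID to its EIN when known, otherwise keep the raw ID."""
    return ein_by_vendor.get(vendor_id, vendor_id)


print(find_duplicate_invoices(invoices))  # []
print(find_duplicate_invoices(invoices, vendor_key=resolve_entity))
```

With raw vendor IDs the check returns an empty list; resolving both IDs to the same legal entity surfaces the second invoice as a duplicate.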
Industry benchmarks from the Institute of Finance and Management and AP automation providers consistently estimate that 1-3% of total AP spend goes to duplicate payments. For mid-market companies with $50 million in annual supplier spend, that’s $500,000 to $1.5 million in overpayments per year. Most of it is recoverable — but only if you detect it, and the recovery process costs time and goodwill.
Missed volume discounts are the hidden cost. If you’re buying $800,000 annually from a supplier but the spend is split across three vendor codes showing $300K, $350K, and $150K, you never trigger the volume tier that would give you a 4% discount. Your procurement team doesn’t even know to negotiate because their spend analytics show three mid-tier suppliers, not one strategic partner. That’s $32,000 per year on a single supplier relationship — and most organizations have dozens of fragmented vendor relationships.
Audit findings are the compliance cost. SOX compliance reviews, internal audit, and external auditors commonly flag duplicate vendor records as a control deficiency. Each finding requires remediation: documenting the duplicates, merging or deactivating records, proving that duplicate payments have been identified and recovered, and demonstrating that controls are in place to prevent recurrence. This isn’t optional: the finding goes into the audit report and may escalate to a material weakness if the dollar amounts are significant.
Why vendor records drift: acquisitions, name changes, multi-system ERP
Vendor duplicates don’t appear because someone is careless. They accumulate through entirely rational processes that just happen to produce bad data.
Supplier acquisitions and name changes. A supplier you’ve worked with for ten years gets acquired. Their legal name changes from “Pacific Components LLC” to “Integrated Micro Solutions — Pacific Division.” Your accounts payable team creates a new vendor record because the supplier’s remittance paperwork now shows the new name. The old record stays active because open POs still reference it. Six months later, both records are receiving invoices.
Company mergers and ERP consolidation. Your company acquires a competitor. Both companies used SAP, but with different vendor master configurations. The combined entity now has two vendor records for every supplier that served both companies. If, say, 40% of suppliers overlap, every one of those overlapping suppliers now carries a duplicate record that no one systematically identifies until someone asks why the combined spend reports don’t add up.
Decentralized purchasing. The marketing team needs a print vendor. They submit a vendor creation request. Procurement creates the record. Meanwhile, the events team already works with the same print vendor under a different name variation — “FastPrint Solutions” vs. “Fast Print Solutions, Inc.” Both records are technically correct. Both are active. Nobody realizes they’re the same company until an invoice from one shows up on the other team’s cost center.
| Source system | Vendor name | Vendor ID | Tax ID (EIN) | Address | Status |
|---|---|---|---|---|---|
| ERP - East | Grainger Industrial Supply | V-10042 | 36-1150280 | 100 Grainger Pkwy, Lake Forest IL | Active |
| ERP - West | W.W. Grainger, Inc. | VND-8819 | 36-1150280 | PO Box 8428, Chicago IL 60680 | Active |
| Procurement portal | Grainger | SUP-00312 | — | 100 Grainger Parkway, Lake Forest, IL 60045 | Active |
Same company (same EIN), three different records with different names, IDs, and addresses. Only two of three records even have a tax ID.
Abbreviation and formatting inconsistency. This is the most mundane cause and the most pervasive. “Johnson & Johnson” vs. “Johnson and Johnson.” “3M Company” vs. “3M Co.” vs. “Minnesota Mining & Manufacturing.” “IBM” vs. “International Business Machines Corporation.” Without enforced naming conventions at the point of vendor creation, every person who creates a vendor record makes their own formatting choices. The ERP doesn’t know that “J&J” and “Johnson & Johnson” are the same entity — it just sees two different strings.
The matching fields: company name, tax ID, address, bank account, contact
Effective vendor deduplication requires matching on multiple fields because no single field is universally reliable.
Tax ID (EIN/TIN) is the strongest single identifier. The IRS issues each US business entity that needs one a unique Employer Identification Number (some sole proprietors use a Social Security number as their TIN instead). If two vendor records share the same correctly entered EIN, they are the same legal entity: not “probably,” but definitely. The problem is that the EIN is frequently missing. Many vendor records are created without collecting a tax ID, especially for smaller or international suppliers. In a typical vendor master, 15-30% of records lack a tax ID.
Company name is available on every record but is noisy. The same company can appear under its legal name, its trade name (DBA), its parent company name, or an informal abbreviation. Fuzzy matching on company names catches many duplicates, but it also generates false positives — “Pacific Electric Supply” and “Pacific Electrical Supply” might be the same company, or they might be two different companies in the same industry.
Address is a strong supporting signal. Two vendor records with similar names and the same street address are very likely the same entity. But vendors have multiple addresses — headquarters, billing, warehouse, branch offices — so an address mismatch doesn’t rule out a match.
Bank account number is an underutilized matching field. If two vendor records share the same bank routing and account number, they are receiving payments at the same financial destination. This is a near-definitive match signal, and it catches cases where the company name and address differ (e.g., a DBA or subsidiary that uses the parent’s bank account).
Contact email domain provides a lightweight check. If two vendor records share the same email domain (e.g., both have contacts at @grainger.com), it’s a signal worth investigating even if the names don’t match perfectly.
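Extracting and comparing the domain takes only a few lines. This is a minimal sketch assuming contacts are stored as plain email strings; `FREE_MAIL_DOMAINS` is a hypothetical and deliberately incomplete list, included because two unrelated small vendors can both use webmail, which makes a shared gmail.com domain meaningless as a signal.

```python
# Hypothetical, incomplete list of free webmail domains to ignore as a signal.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "aol.com"}


def email_domain(email):
    """Return the lowercased domain of an email address, or None if absent."""
    if not email or "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1].strip().lower()
    return domain or None


def shares_business_domain(email_a, email_b):
    """True when both contacts use the same domain and it is not free webmail."""
    a, b = email_domain(email_a), email_domain(email_b)
    return a is not None and a == b and a not in FREE_MAIL_DOMAINS
```

A `True` result is a prompt to investigate, not a match on its own: subsidiaries and divisions often share a parent’s domain.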
Handling company name variations: Inc vs LLC, abbreviations, DBAs
Company names are the most visible matching field but also the most unreliable. A practical normalization pipeline handles the most common variation patterns before matching.
Legal suffix removal. Strip or standardize legal entity suffixes: Inc., Incorporated, Corp., Corporation, LLC, L.L.C., Ltd., Limited, Co., Company, LP, LLP. These suffixes have legal meaning but no discriminating value for matching. “Acme Supply, Inc.” and “Acme Supply LLC” are likely the same operational entity, possibly after a restructuring.
Punctuation and whitespace normalization. Remove periods, commas, and extra whitespace. Standardize ampersands: & becomes and. Hyphens in compound names should be normalized: “Hewlett-Packard” and “Hewlett Packard” should compare as equivalent.
Common abbreviation expansion. Expand industry-standard abbreviations: “Intl” to “International,” “Mfg” to “Manufacturing,” “Svcs” to “Services,” “Natl” to “National,” “Assoc” to “Associates.” This reduces surface-level variation without changing meaning.
The/A prefix removal. “The Home Depot” and “Home Depot” are the same entity. Leading articles should be stripped before comparison.
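The four rules above can be chained into one comparison-key function. This is a minimal Python sketch: `normalize_vendor_name` is a hypothetical helper, and the suffix and abbreviation sets contain only the examples named in this section, so a production list would be longer.

```python
import re

# Legal suffixes stripped from the end of a name (after punctuation removal).
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation", "llc", "ltd", "limited",
    "co", "company", "lp", "llp",
}

# Industry abbreviations expanded before comparison.
ABBREVIATIONS = {
    "intl": "international",
    "mfg": "manufacturing",
    "svcs": "services",
    "natl": "national",
    "assoc": "associates",
}

LEADING_ARTICLES = {"the", "a"}


def normalize_vendor_name(raw):
    """Build a comparison key from a vendor name; the source record is unchanged."""
    name = raw.lower()
    name = name.replace("&", " and ")   # "J&J" -> "j and j"
    name = name.replace("-", " ")       # "Hewlett-Packard" -> "hewlett packard"
    name = re.sub(r"[.,]", "", name)    # "L.L.C." -> "llc", "W.W." -> "ww"
    tokens = name.split()               # split() also collapses extra whitespace
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Strip leading articles and trailing legal suffixes, never emptying the name.
    while len(tokens) > 1 and tokens[0] in LEADING_ARTICLES:
        tokens.pop(0)
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

Applied to the raw names in the table below, this function produces the normalized column shown there. Stripping a leading “a” will also shorten names that legitimately start with that word, so it is worth spot-checking that rule against your own data.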
| Raw vendor name | After normalization | Match group |
|---|---|---|
| W.W. Grainger, Inc. | ww grainger | A |
| Grainger Industrial Supply | grainger industrial supply | A |
| Grainger | grainger | A |
| Johnson & Johnson | johnson and johnson | B |
| Johnson and Johnson Consumer Inc. | johnson and johnson consumer | B |
| J&J Consumer Health | j and j consumer health | B |
| The Sherwin-Williams Company | sherwin williams | C |
| Sherwin Williams Co. | sherwin williams | C |
| 3M Company | 3m | D |
| Minnesota Mining & Manufacturing | minnesota mining and manufacturing | D |
Normalization resolves many variations, but “3M” and “Minnesota Mining and Manufacturing” still require a known-alias lookup or AI embeddings to connect.
After normalization, basic fuzzy matching (Jaro-Winkler or Levenshtein distance) catches most remaining variation. But certain cases — like “3M” and “Minnesota Mining & Manufacturing” or “IBM” and “International Business Machines” — require either a known-alias lookup table or AI embedding similarity, which recognizes that these names refer to the same entity based on semantic context rather than string similarity.
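A minimal sketch of string similarity plus an alias table, assuming names have already been normalized. It uses Levenshtein distance (one of the two measures named above) because it is short to implement correctly; `KNOWN_ALIASES` is a hypothetical table you would maintain yourself, and embedding similarity is not shown.

```python
# Hypothetical alias table: normalized former or long names -> canonical name.
KNOWN_ALIASES = {
    "minnesota mining and manufacturing": "3m",
    "international business machines": "ibm",
}


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions from a to b."""
    if len(a) < len(b):
        a, b = b, a                      # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Score two normalized names in [0, 1]; known aliases score 1.0."""
    a, b = KNOWN_ALIASES.get(a, a), KNOWN_ALIASES.get(b, b)
    if a == b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```

“pacific electric supply” and “pacific electrical supply” score 0.92 here, which illustrates the false-positive risk described earlier: string similarity alone can’t tell whether they are one company or two, so a high score should be confirmed with address, tax ID, or bank data.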
A practical deduplication workflow for accounts payable
Here’s a step-by-step workflow that accounts payable and procurement teams can run quarterly — or as a one-time cleanup before an ERP migration.
Step 1: Export the vendor master. Pull all active vendor records with these fields: vendor ID, company name, tax ID (EIN/TIN), primary address, bank routing number, bank account number, primary contact email, and creation date. Don’t include inactive or archived records unless you suspect they were deactivated as part of a prior half-completed dedup effort.
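If the export lands as a CSV file, loading it and filtering to active records might look like the sketch below. The column names are assumptions standing in for whatever your ERP export uses; the function fails loudly if any of the fields listed in Step 1 are missing.

```python
import csv
import io

# Assumed column names; map these to your ERP's actual export headers.
REQUIRED_COLUMNS = [
    "vendor_id", "company_name", "tax_id", "address", "bank_routing",
    "bank_account", "contact_email", "created_on", "status",
]


def load_active_vendors(csv_text):
    """Parse a vendor-master CSV export and keep only active records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"export is missing columns: {sorted(missing)}")
    return [row for row in reader if row["status"].strip().lower() == "active"]
```

Values are kept as strings on purpose, so tax IDs and bank account numbers with leading zeros survive intact.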
Step 2: Normalize company names. Apply the suffix, punctuation, abbreviation, and prefix rules described above. This is preprocessing — it doesn’t change your source data, it just creates a clean comparison version.
Step 3: Block on tax ID. Any records sharing the same EIN are definitive matches. Flag these immediately. This typically resolves 20-30% of all duplicates with near-perfect precision; the main source of error is a mistyped or placeholder tax ID.
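Grouping by tax ID is a single pass over the records once EINs are reduced to digits. In this minimal sketch, records are assumed to be dicts with `vendor_id` and `tax_id` keys, and an all-zero EIN is treated as a placeholder rather than a real value (an assumption worth extending with any dummy values your team uses).

```python
import re
from collections import defaultdict


def normalize_ein(raw):
    """Return a 9-digit EIN string, or None if missing, malformed, or all zeros."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) != 9 or digits == "0" * 9:
        return None
    return digits


def group_by_tax_id(vendors):
    """Map each shared EIN to the vendor IDs using it; singletons are dropped."""
    groups = defaultdict(list)
    for vendor in vendors:
        ein = normalize_ein(vendor.get("tax_id"))
        if ein is not None:
            groups[ein].append(vendor["vendor_id"])
    return {ein: ids for ein, ids in groups.items() if len(ids) > 1}
```

Run against the three Grainger records from the earlier table, this returns one group, `{"361150280": ["V-10042", "VND-8819"]}`; the procurement-portal record has no tax ID and falls through to Step 4.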
Step 4: Match remaining records on name + address. For records without tax IDs (or with unique tax IDs), run fuzzy matching on normalized company name. For pairs scoring above 0.85 similarity, compare addresses as a confirmation signal. This catches the abbreviation and formatting variants.
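A minimal sketch of Step 4, assuming each record already carries a `norm_name` produced in Step 2. It swaps in the standard library’s `difflib.SequenceMatcher.ratio()` as the similarity score in place of Jaro-Winkler, and uses a crude address key (lowercase, punctuation stripped) as a stand-in for real address standardization.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def simple_address_key(address):
    """Crude address key: lowercase, punctuation to spaces, whitespace collapsed."""
    cleaned = re.sub(r"[^\w\s]", " ", (address or "").lower())
    return " ".join(cleaned.split())


def name_address_candidates(vendors, threshold=0.85):
    """Yield (id_a, id_b, name_score, same_address) for pairs above threshold.

    Expects each vendor dict to have "vendor_id", "norm_name", and "address".
    """
    for a, b in combinations(vendors, 2):
        score = SequenceMatcher(None, a["norm_name"], b["norm_name"]).ratio()
        if score > threshold:
            same_address = simple_address_key(a["address"]) == simple_address_key(b["address"])
            yield a["vendor_id"], b["vendor_id"], round(score, 3), same_address
```

Comparing every pair is quadratic, which is fine for a few thousand records; for larger masters you would first split records into blocks (for example, by the first token of the normalized name) and compare only within each block.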
Step 5: Check bank account matches. Compare bank routing + account numbers across all remaining records. Any matches here are strong duplicate signals regardless of what the name or address says.
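Bank matching works the same way as the tax ID block, keyed on the routing and account pair. In this sketch the field names are assumptions, and both parts are kept as digit strings (never cast to integers) so leading zeros in account numbers are preserved.

```python
import re
from collections import defaultdict


def bank_key(routing, account):
    """Digits-only (routing, account) key, or None if either part is missing."""
    r = re.sub(r"\D", "", routing or "")
    a = re.sub(r"\D", "", account or "")
    return (r, a) if r and a else None


def group_by_bank_account(vendors):
    """Map each shared (routing, account) pair to the vendor IDs paid there."""
    groups = defaultdict(list)
    for vendor in vendors:
        key = bank_key(vendor.get("bank_routing"), vendor.get("bank_account"))
        if key is not None:
            groups[key].append(vendor["vendor_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```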
Step 6: Review and merge. The output is a list of candidate duplicate groups, each with a confidence score. High-confidence groups (tax ID match, or name + address + bank account match) can be auto-merged or auto-flagged. Lower-confidence groups (name-only match with different addresses) go to a review queue for a human decision.
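The routing rule in Step 6 can be written as a small function. This sketch assumes each candidate pair already carries its signals from Steps 3 through 5; the `PairSignals` type and the choice to send bank-only matches to review (rather than auto-merge) are assumptions, not a prescribed policy.

```python
from dataclasses import dataclass


@dataclass
class PairSignals:
    tax_id_match: bool
    name_score: float      # 0-1 similarity of normalized names
    address_match: bool
    bank_match: bool


def route_pair(s, name_threshold=0.85):
    """Return "auto", "review", or "no_match" for a candidate duplicate pair."""
    if s.tax_id_match:
        return "auto"                          # shared EIN: high confidence
    name_match = s.name_score > name_threshold
    if name_match and s.address_match and s.bank_match:
        return "auto"                          # name + address + bank all agree
    if name_match or s.bank_match:
        return "review"                        # partial evidence: human decision
    return "no_match"
```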
Step 7: Deactivate duplicates. For each confirmed duplicate group, designate one record as the primary and deactivate the others. Remap all open POs, pending invoices, and payment history to the surviving record. This is the step most teams skip — and it’s why duplicates keep accumulating.
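The remapping logic behind Step 7 is simple once a survivor is chosen. The survivor rule in this sketch (prefer a record with a tax ID, then the oldest creation date) is an assumption, the example dates are hypothetical, and in a real ERP the remap and deactivation would go through the system’s own merge or block functions; this only shows the mapping.

```python
from datetime import date


def choose_survivor(group):
    """Pick the surviving vendor ID: records with a tax ID first, then the oldest."""
    return min(group, key=lambda v: (not v["tax_id"], v["created_on"]))["vendor_id"]


def build_remap(group):
    """Map every non-surviving vendor ID in a confirmed group to the survivor."""
    survivor = choose_survivor(group)
    return {v["vendor_id"]: survivor for v in group if v["vendor_id"] != survivor}


def remap_documents(documents, remap):
    """Return copies of open POs / pending invoices pointed at surviving IDs."""
    return [{**doc, "vendor_id": remap.get(doc["vendor_id"], doc["vendor_id"])}
            for doc in documents]


# Hypothetical confirmed group (the Grainger records, with made-up creation dates).
group = [
    {"vendor_id": "V-10042", "tax_id": "36-1150280", "created_on": date(2012, 5, 1)},
    {"vendor_id": "VND-8819", "tax_id": "36-1150280", "created_on": date(2016, 9, 12)},
    {"vendor_id": "SUP-00312", "tax_id": "", "created_on": date(2010, 1, 15)},
]
```

For this group, `V-10042` survives even though `SUP-00312` is older, because the rule ranks having a tax ID ahead of age.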
Maintaining a clean vendor master going forward
Deduplication is a point-in-time cleanup. Without process changes, duplicates will reappear. The recurrence rate for organizations that clean up without changing their onboarding process is typically 60-70% within 18 months — meaning most of the duplicates you removed will be recreated.
Real-time duplicate detection at onboarding. The highest-leverage control is checking for duplicates before creating a new vendor record. When a requestor submits a new vendor, the system should automatically search for matching names, tax IDs, and addresses in the existing master. If potential matches are found, the requestor reviews them before proceeding. This catches 80-90% of would-be duplicates at the point of creation.
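A minimal sketch of the onboarding check, comparing a new-vendor request against the existing master by tax ID and name (address comparison is omitted for brevity). `possible_duplicates` and the light `_key` normalization are hypothetical; in practice you would reuse the full normalization from Step 2 and your chosen similarity measure, here swapped for `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher


def _key(name):
    """Light name key: lowercase, '&' -> 'and', punctuation removed."""
    name = name.lower().replace("&", " and ")
    return " ".join(re.sub(r"[^\w\s]", " ", name).split())


def _digits(value):
    """Keep only digits, so '36-1150280' and '361150280' compare equal."""
    return re.sub(r"\D", "", value or "")


def possible_duplicates(request, vendor_master, name_threshold=0.85):
    """Return existing vendor IDs that resemble a new-vendor request."""
    hits = []
    req_ein = _digits(request.get("tax_id"))
    req_name = _key(request["company_name"])
    for vendor in vendor_master:
        same_ein = bool(req_ein) and req_ein == _digits(vendor.get("tax_id"))
        score = SequenceMatcher(None, req_name, _key(vendor["company_name"])).ratio()
        if same_ein or score > name_threshold:
            hits.append(vendor["vendor_id"])
    return hits
```

A non-empty result would be shown to the requestor before the record is created, as described above.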
Require tax ID for all new vendors. Make EIN/TIN a mandatory field for domestic vendors. This single policy change gives you a deterministic matching key for every future domestic vendor and removes the ambiguity behind most name-based duplicates.
Standardize naming conventions. Publish and enforce rules: always use the legal entity name as it appears on the W-9, always include the legal suffix, and never abbreviate unless the company’s official name uses an abbreviation. “3M Company” is fine because that’s the legal name. “Intl” instead of “International” is not.
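Conventions like these can be checked automatically when a vendor request is submitted. This sketch encodes two of the rules; the banned-abbreviation and suffix sets are hypothetical house rules, and flags are best treated as warnings for a data steward, since some legitimate legal names carry no suffix.

```python
import re

# Hypothetical house rules.
BANNED_ABBREVIATIONS = {"intl", "mfg", "svcs", "natl", "assoc"}
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd",
                  "limited", "co", "company", "lp", "llp"}


def naming_issues(name):
    """List naming-convention violations for a proposed vendor name."""
    tokens = re.sub(r"[.,]", "", name).lower().split()
    issues = [f"spell out '{t}'" for t in tokens if t in BANNED_ABBREVIATIONS]
    if not tokens or tokens[-1] not in LEGAL_SUFFIXES:
        issues.append("missing legal suffix")
    return issues
```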
Run quarterly batch deduplication. Even with onboarding controls, duplicates will slip through — vendor name changes, system migrations, manual overrides. A quarterly dedup run catches what the real-time check missed. Treat it like a financial close process: scheduled, documented, and reviewed by a data steward.
Monitor spend concentration. Track the number of unique vendor records per category and flag when the count increases faster than the actual number of new supplier relationships. If your office supplies category goes from 12 vendors to 18 in a quarter but you didn’t onboard 6 new suppliers, you probably created 6 duplicates.
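The office-supplies check can be expressed per category as “record growth minus genuinely new suppliers.” In this sketch the input dicts are assumptions about how you would tally vendor records and onboarded relationships; any positive remainder is the number of suspected duplicates.

```python
def suspected_duplicates(before, after, new_suppliers):
    """Per category, growth in vendor records not explained by new suppliers.

    before, after: {category: vendor record count} at two points in time.
    new_suppliers: {category: new supplier relationships onboarded in between}.
    """
    flags = {}
    for category, count_after in after.items():
        growth = count_after - before.get(category, 0)
        unexplained = growth - new_suppliers.get(category, 0)
        if unexplained > 0:
            flags[category] = unexplained
    return flags
```

For the example above, `suspected_duplicates({"office supplies": 12}, {"office supplies": 18}, {})` returns `{"office supplies": 6}`.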
Vendor deduplication pays for itself almost immediately — recovered overpayments, consolidated discounts, and clean audit reports. Upload your vendor master to Match Data Studio and identify duplicates across name variations, missing tax IDs, and multi-system formatting differences in a single matching run.