The 150,000 Company Problem

A BD professional opens the CRM to identify targets for the week ahead.

Filters by sector, sorts by employee count. Nothing stands out. Filters again by companies with no recent activity. Still searching. Tries revenue estimate plus last meeting date. Scans results looking for something worth pursuing.

After running multiple filter combinations, the list shows 20 companies to research. Half will be wrong size, wrong end-market, or already contacted by someone who left the firm.

The database holds 150,000 companies. Identifying which ones matter should not require this process.

Most PE firms pursuing buy-and-build strategies face this challenge: substantial CRM coverage without effective prioritization.

How Databases Lose Structure

Database growth follows a predictable path. A firm defines a thesis. Initial research generates a few thousand targets. The BD team adds contacts and firmographics. Coverage expands.

Then accumulation creates noise.

Portfolio companies submit add-on candidates, relevant and tangential mixed together. Conference lists add company names without context. Database vendors provide comprehensive coverage with no priority indicators. Old investment themes persist after the firm moves on.

High-priority companies sit in the same pipeline stages as businesses reviewed once years ago. Subsector tagging remains incomplete. Revenue estimates carry too much error for decisions. Accumulated data dilutes the original thesis specificity.

This is preventable. Only qualified companies should enter the CRM. If a business does not meet minimum criteria, it does not go in. Rigor at point of entry eliminates the need for cleanup later.
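The entry-gate discipline described above can be sketched in a few lines. Everything here is an illustrative assumption: the field names, the thresholds, and the sector list are hypothetical stand-ins for whatever minimum criteria a firm actually sets.

```python
from dataclasses import dataclass

# Hypothetical minimum criteria -- every threshold here is an
# assumption for illustration, not any firm's actual bar.
MIN_EMPLOYEES = 20
MAX_EMPLOYEES = 500
ALLOWED_SECTORS = {"saas", "tech-enabled services"}

@dataclass
class Candidate:
    name: str
    sector: str
    employees: int
    sponsor_backed: bool  # already PE/VC-owned, so unavailable

def qualifies_for_entry(c: Candidate) -> bool:
    """Gate at point of entry: if it misses any criterion, it never enters the CRM."""
    if c.sector not in ALLOWED_SECTORS:
        return False
    if not MIN_EMPLOYEES <= c.employees <= MAX_EMPLOYEES:
        return False
    if c.sponsor_backed:
        return False
    return True

print(qualifies_for_entry(Candidate("Acme SaaS", "saas", 80, False)))    # True
print(qualifies_for_entry(Candidate("BigCo", "saas", 5000, False)))      # False
```

The point is not the specific checks but where they run: rejection happens before a record is created, so the database never needs cleanup later.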

Most firms do not maintain this discipline. Two to three years in, databases often reach 100,000+ companies. At this scale, identifying priority targets requires running filter combinations and hoping something surfaces.

Three questions reveal whether prioritization has become the constraint:

  • Can your BD team identify the top 50 targets in under 10 minutes?

  • Do different team members produce similar target lists?

  • Has conversion improved as the database has grown?

If the answer to any is no, infrastructure is the problem, not coverage.

Why Sorting Fails

The problem starts before the CRM. Third-party databases—ZoomInfo, Grata, SourceScrub—provide what they can capture at scale: employee counts, estimated revenue, standard industry tags, web presence signals, location data. These platforms were built to make private companies searchable, not to enable investment prioritization.

That limited data gets imported into your CRM. Your CRM becomes a graveyard of companies characterized by metrics that were never designed to answer the prioritization question.

Employee count tells you how many people work there. It does not tell you what those people do, who they serve, or how the business generates revenue. An HVAC contractor might have a workforce concentrated in field technicians serving commercial property managers or split between sales, dispatch, and residential service calls. Same headcount range. Completely different business model. Headcount alone reveals nothing about institutional fit.

Revenue estimates are almost always wrong. Even when accurate, they reveal only directional scale, not the composition of that revenue. Customer mix, service mix, margin structure—all invisible. Two businesses with identical revenue estimates might operate in entirely different end-markets with fundamentally different competitive dynamics and PE fit. Estimates cannot distinguish between them.

Activity dates show when someone last touched the record. They do not explain why contact stopped or whether renewed outreach makes sense. A company untouched for six months might be a deprioritized poor fit or a strong prospect that fell through gaps during team transition. Timestamps provide no context.

Pipeline stages reflect past workflow decisions. They do not adapt when investment criteria evolve or team composition changes. A business marked "contacted" 18 months ago by someone no longer at the firm sits in that stage indefinitely unless manually updated.

Web presence signals market visibility. It does not predict alignment with investment thesis. High online activity might indicate wrong customer segment, wrong geography, or wrong business model despite appearing active and growing.

Standard fields answer "how many," "how big," and "how visible." They do not answer "what do they do," "who do they serve," and "does this fit our thesis." You are asking the database questions it cannot answer because the data was never built to answer them.

What We Learned From 150,000 Companies

We recently addressed this for a PE firm with 150,000 companies accumulated over the firm's history. All fell within their broad SaaS investment criteria. None were effectively prioritized.

Our work began with analysis. Over three weeks, we examined what each company actually does operationally, not just industry classification.

Roughly 55,000 companies did not belong. Many were not pure SaaS businesses but had significant services components that changed their economics. A substantial portion were already PE- or VC-backed, making them unavailable. Others were the wrong size: too small to meet criteria or too large to pursue. Some had gone out of business. End-market conflicts surfaced frequently: businesses classified as healthcare SaaS but serving consumer markets rather than institutional buyers.

Removing obvious misfits was straightforward. Deeper insight came from analyzing companies that remained.

Businesses that looked identical in the CRM (same employee count, similar revenue estimates, same industry codes) operated in completely different realities.

One healthcare SaaS business served hospitals with enterprise contracts and multi-year commitments. Another served individual practitioners with monthly subscriptions and high churn. Same data profile. Entirely different business quality, customer economics, and investment fit. This pattern repeated across subsectors.

Databases cannot distinguish between these companies because they lack the operational data to see the difference. No amount of improved sorting solves this. The information simply is not there.

Building What Actually Works

For the remaining companies, we used our proprietary data system to layer operational intelligence that standard CRM fields cannot capture.

Building this infrastructure required solving technical challenges most firms underestimate. Each data source has unique quality issues, false positive patterns, and reliability constraints. We learned what works through trial and error across dozens of sectors, understanding which signals actually predict conversion and which create false confidence.

For SaaS businesses, institutional scale and fit appear through signals like customer acquisition velocity, retention cohorts, end-market concentration, and technical architecture complexity. These predict quality but do not exist in employee count or revenue estimate fields.

We built three scoring dimensions:

  • Growth score based on company expansion.

  • Size score for operational scale.

  • Relevancy score for thesis alignment.

Each company received subsector classification, customer sentiment signals, customer profile analysis, verified employee data, sector-specific operational proxies, and a searchable description. BD professionals can now search by keyword and surface every relevant company regardless of how it was originally tagged.

They select theme, choose the relevant score, and see ranked targets immediately. Sort by growth for high-velocity pursuit, by size for EBITDA targeting, by relevancy for optimal alignment.
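The select-theme, choose-score, see-ranked-targets workflow reduces to a simple filter-and-sort over the enriched records. This is a minimal sketch under stated assumptions: the score names come from the three dimensions above, but the 0–100 scale, the field layout, and the example companies are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ScoredCompany:
    name: str
    subsector: str      # from the subsector classification step
    growth: float       # 0-100: expansion velocity (assumed scale)
    size: float         # 0-100: operational scale (assumed scale)
    relevancy: float    # 0-100: thesis alignment (assumed scale)

def top_targets(companies, score="relevancy", theme=None, n=20):
    """Filter to a theme, then rank on the chosen score dimension."""
    pool = [c for c in companies if theme is None or c.subsector == theme]
    return sorted(pool, key=lambda c: getattr(c, score), reverse=True)[:n]

# Hypothetical enriched records, not real data.
universe = [
    ScoredCompany("HospitalSoft", "healthcare saas", growth=62, size=78, relevancy=91),
    ScoredCompany("PracticeApp", "healthcare saas", growth=88, size=35, relevancy=44),
    ScoredCompany("FleetTrack", "logistics saas", growth=70, size=60, relevancy=81),
]

# Sort by relevancy for thesis alignment; swap score="growth"
# for high-velocity pursuit or score="size" for EBITDA targeting.
for c in top_targets(universe, score="relevancy", theme="healthcare saas"):
    print(c.name, c.relevancy)
```

The ranking key is shared operational data rather than an individual analyst's reading of incomplete fields, which is why different team members get the same list.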

Reports now answer "who should we call Monday morning" with confidence. Different team members produce consistent target lists because the system ranks on shared operational signals rather than individual interpretation of incomplete data.

This extends beyond SaaS. Service businesses reveal institutional scale through vehicle density, facility footprint, end-market concentration, and proprietary methods we have developed. Our approach adapts to sector economics while solving the same challenge.

Why This Requires Different Thinking

Most firms attempt to solve prioritization through better CRM discipline. Improved tagging protocols. Stricter stage management. Additional mandatory fields. These create cleaner records without enabling prioritization because they do not change what data exists in the system.

Data quality is not the constraint. Data type is. Employee counts and revenue estimates cannot support prioritization decisions regardless of how well-maintained those fields become. Effective prioritization requires operational data that reveals business model, customer composition, and market position.

Some firms build this capability internally. Others partner with specialized providers. Both paths require recognizing this as infrastructure development rather than process improvement. Work involves identifying what predicts quality in each sector, sourcing data that reveals those characteristics, and building scoring methodologies that translate operational signals into actionable rankings.

This is not purchased software or hired headcount. It is a learned capability developed through applying data science to origination, understanding which sources work and which mislead, and building systems that compound effectiveness over time.

The Competitive Reality

AI makes adding companies to databases easier while making verification harder. Without qualification discipline at point of entry, databases will accumulate faster than teams can action them. This challenge intensifies.

Firms solving this now establish competitive separation. Over the next two to three years, more investors will recognize data infrastructure as a requirement rather than an advantage. The baseline keeps rising.

But cleanup is not the real game. Prevention is. Firms that win long-term are not the ones that get better at sorting 150,000 companies. They are the ones that never accumulate 150,000 companies in the first place. They establish qualification standards at entry. They build infrastructure that maintains signal clarity as coverage grows. They recognize that a focused, well-characterized database outperforms a comprehensive, poorly understood one.

Speed becomes decisive in competitive markets. When multiple firms pursue identical sectors, the firm that identifies and engages the right targets fastest captures disproportionate outcomes. This requires infrastructure that answers "who should we call Monday morning" without running filter combinations.

Coverage and actionability will continue diverging. Firms addressing this through infrastructure rather than process refinement establish advantages that compound over time. Those still attempting to sort their way out of the problem will keep running reports while their competitors identify what matters.

The Path Forward

A BD professional now opens the CRM Monday morning. Selects theme. Chooses growth score. Twenty ranked targets appear, each one operationally qualified and thesis-aligned. Research begins on companies that matter. No filter combinations. No manual scanning. No hoping something surfaces.

Start with the diagnostic. If your team cannot consistently identify top targets in under 10 minutes, infrastructure is the problem, not effort.
