You know the moment. The LOI is signed. The diligence team sends over their first request list. And somewhere in your organization, someone opens a spreadsheet and starts sweating.
Data diligence is where deals get confirmed or killed. Not in the pitch deck. Not in the management presentation. In the room where someone asks “can you show me monthly revenue by customer for the last 36 months” and your team either pulls it up in ten minutes or disappears for a week.
I have spent twenty years watching this play out. At Fortune 100 scale and at mid-market companies where the data team is one person who also manages IT. The pattern is always the same. The companies that prepare their data before diligence close faster, at higher multiples, with fewer surprises. The ones that don’t prepare leave money on the table.
This guide covers what buyers actually test, what good answers look like, and what you can do about it before the clock starts.
What data diligence actually is
Data diligence is not an IT audit. It is not someone checking whether your servers are patched or your passwords are strong enough.
It is a buyer testing whether your numbers are real, whether they are repeatable, and whether the business can actually produce the reporting it claims to produce. They want to know if the story in the CIM holds up when someone pulls the thread.
Think of it this way. Financial diligence asks “are these the right numbers?” Data diligence asks “can this company reliably produce numbers at all?”
That distinction matters. A company can have clean financials that were assembled through heroic manual effort. That works once. It does not work when the buyer needs to report to their investors every quarter after close.
Why it matters more now than five years ago
Three things changed.
Buyers got smarter. After enough deals where data problems showed up post-close, PE firms started asking harder questions earlier. The diligence request lists from 2020 and 2026 are different documents.
AI raised the stakes. Every buyer has an AI thesis now. They want to know whether your data can support it. If your customer data lives in three different systems with no common key, that AI thesis dies on contact with reality.
Multiples compressed. When money was cheap, buyers could afford to fix data problems after close. When every turn matters, they price the risk in or walk away.
The 15 questions buyers actually ask
These are not hypothetical. These are questions I have seen on actual diligence request lists. For each one, I will tell you what a good answer looks like and what raises a red flag.
1. Can you provide monthly revenue by customer for the last 36 months?
Good answer: “Yes. Here is the export from our ERP. It reconciles to our GL within 0.5%.”
Red flag: “We can get you annual. Monthly will take some work.” This tells the buyer your revenue data is aggregated manually and likely contains allocation assumptions they cannot verify.
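What “reconciles to the GL within 0.5%” means in practice is a simple relative-tolerance check. Here is a minimal sketch in Python; the function name and the dollar figures are illustrative, not from any particular ERP.

```python
def reconciles(erp_total: float, gl_total: float, tolerance: float = 0.005) -> bool:
    """Check whether an ERP revenue total agrees with the general ledger
    within a relative tolerance (0.5% here)."""
    if gl_total == 0:
        return erp_total == 0
    return abs(erp_total - gl_total) / abs(gl_total) <= tolerance

# Illustrative monthly totals: ERP export vs. general ledger.
print(reconciles(1_000_000, 1_003_000))  # 0.3% apart -> True
print(reconciles(1_000_000, 1_010_000))  # ~1% apart -> False
```

The point is not the arithmetic. It is that the tolerance and the comparison are written down somewhere a buyer can inspect, rather than living in someone’s head.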
2. How many active customers do you have? How do you define “active”?
Good answer: “1,247 customers with a transaction in the last 12 months. Here is our definition, and here is the query that produces it.”
Red flag: “It depends on how you count it.” If your sales team, finance team, and operations team each give a different number, the buyer knows there is no single source of truth.
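“Here is the query that produces it” is the whole answer. A defensible definition of “active” is just a trailing window applied consistently. A minimal sketch, with made-up customer names and a 12-month window as the assumed definition:

```python
from datetime import date, timedelta

def active_customers(transactions: list[tuple[str, date]],
                     as_of: date, window_days: int = 365) -> set[str]:
    """Customers with at least one transaction in the trailing window
    (here, the last 12 months)."""
    cutoff = as_of - timedelta(days=window_days)
    return {customer for customer, txn_date in transactions if txn_date > cutoff}

# Illustrative transaction log: (customer, transaction date).
txns = [
    ("acme", date(2025, 11, 3)),
    ("globex", date(2024, 2, 14)),   # more than 12 months back -> not active
    ("initech", date(2025, 6, 30)),
]
print(len(active_customers(txns, as_of=date(2025, 12, 31))))  # 2
```

Whether your window is 12 months or something else matters less than everyone using the same one. Sales, finance, and operations should all be running this definition, not three private variants.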
3. What is your customer retention rate and how is it calculated?
Good answer: “Logo retention is 91%. Revenue retention is 108%. Here is the cohort analysis by year.”
Red flag: “We track it in a spreadsheet that marketing maintains.” Retention is one of the most important value drivers in a deal. If it is not in a system with a clear methodology, the buyer will question every growth assumption in the model.
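Both numbers in the good answer come from the same cohort math. A minimal sketch of logo retention and net revenue retention for one cohort over one period; the customer names and revenue figures are invented for illustration:

```python
def retention(start_revenue: dict[str, float],
              end_revenue: dict[str, float]) -> tuple[float, float]:
    """Logo and net revenue retention for one cohort over one period.

    start_revenue: revenue by customer at period start (the cohort).
    end_revenue:   revenue by customer at period end (may include new logos).
    """
    cohort = set(start_revenue)
    retained = cohort & set(end_revenue)
    logo = len(retained) / len(cohort)
    # Net revenue retention counts expansion and churn within the
    # starting cohort only; new logos are excluded by construction.
    nrr = sum(end_revenue.get(c, 0.0) for c in cohort) / sum(start_revenue.values())
    return logo, nrr

start = {"acme": 100.0, "globex": 50.0, "initech": 50.0}
end = {"acme": 130.0, "initech": 40.0, "newco": 25.0}  # globex churned; newco is new
logo, nrr = retention(start, end)
print(round(logo, 2), round(nrr, 2))  # 0.67 0.85
```

A buyer-ready cohort analysis is just this calculation repeated per starting year, with the period boundaries and the revenue basis (booked, recognized, ARR) documented.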
4. Can you reconcile your CRM pipeline to closed revenue?
Good answer: “Yes. Here are the last four quarters showing pipeline-to-close conversion by stage, with the methodology documented.”
Red flag: “The sales team doesn’t always update the CRM.” This is more common than you think. And it tells the buyer the revenue forecast is based on gut feel, not data.
5. How do you track and report EBITDA adjustments?
Good answer: “Here is our adjustment schedule with supporting documentation for each line item. Our controller maintains it monthly.”
Red flag: “Our accountant handles that at year end.” Year-end adjustments without monthly tracking means the buyer cannot see the trend. They will assume the worst.
6. Can you show unit economics by product line or service?
Good answer: “Here is contribution margin by product line for the last eight quarters, including allocated costs and the allocation methodology.”
Red flag: “We price based on market rate and overall margins are good.” This tells the buyer you do not know which products make money and which ones are subsidized. That is a pricing risk they will price into the deal.
7. Where does your data live? How many systems?
Good answer: “Primary systems are NetSuite for finance, HubSpot for CRM, and our proprietary platform for operations. Here is the data flow diagram showing how they connect.”
Red flag: “We have a lot of spreadsheets that tie things together.” Spreadsheets as integration layers mean manual processes, key person risk, and no audit trail.
8. Who builds your reports? What happens if they leave?
Good answer: “Our FP&A team uses documented templates in our BI tool. Three people can produce the board deck independently.”
Red flag: “Dave does that.” Key person risk in reporting is one of the fastest ways to make a buyer nervous. If Dave gets hit by a bus, can the company still tell its story?
9. How quickly can you produce a flash report after month end?
Good answer: “Five business days for a full P&L with KPIs. Preliminary flash in two days.”
Red flag: “Usually three to four weeks, sometimes longer.” Slow close processes signal manual work, reconciliation problems, or both. Buyers want to see operational discipline.
10. Can you segment revenue by new vs. existing customers?
Good answer: “Yes. Here is the breakdown by quarter for the last three years, showing new logo revenue, expansion revenue, and churn.”
Red flag: “We would need to build that analysis.” If you cannot segment revenue by customer type, the buyer cannot validate the growth story. New logo acquisition and expansion are valued very differently.
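The segmentation buyers want is a bridge from one period’s revenue to the next, split into new logo, expansion, and churn. A minimal sketch under the assumption that you have per-customer revenue for consecutive periods; the quarterly figures are illustrative:

```python
def segment_revenue(prior: dict[str, float],
                    current: dict[str, float]) -> dict[str, float]:
    """Split one period's revenue movement into new-logo, expansion, and churn."""
    existing = set(prior)
    new_logo = sum(v for c, v in current.items() if c not in existing)
    expansion = sum(max(current.get(c, 0.0) - prior[c], 0.0) for c in existing)
    churn = sum(max(prior[c] - current.get(c, 0.0), 0.0) for c in existing)
    return {"new_logo": new_logo, "expansion": expansion, "churn": churn}

q1 = {"acme": 100.0, "globex": 50.0}
q2 = {"acme": 120.0, "newco": 30.0}  # globex churned, acme expanded, newco is new
print(segment_revenue(q1, q2))
# {'new_logo': 30.0, 'expansion': 20.0, 'churn': 50.0}
```

If producing this table requires manual matching of customer names across systems, that is the common-key problem from question 7 showing up again.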
11. What does your data quality look like? Do you measure it?
Good answer: “We run automated quality checks on key fields monthly. Here are the results for the last six months. Customer address completeness is 94%, email validity is 97%.”
Red flag: “We clean things up when we notice issues.” Reactive data quality means the buyer has no idea what they are inheriting. They will assume it is worse than you think.
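The quality metrics in the good answer do not require a platform. Completeness and validity checks are a few lines each. A minimal sketch; the field names and records are invented, and the email pattern is a loose proxy, not full address validation:

```python
import re

def field_completeness(records: list[dict], field: str) -> float:
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field))
    return filled / len(records)

def email_validity(records: list[dict]) -> float:
    """Share of records whose email matches a loose something@domain.tld
    pattern. A proxy for validity, not a deliverability check."""
    pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    return sum(1 for r in records if pattern.match(r.get("email", ""))) / len(records)

customers = [
    {"name": "Acme", "address": "1 Main St", "email": "ap@acme.com"},
    {"name": "Globex", "address": "", "email": "not-an-email"},
]
print(field_completeness(customers, "address"))  # 0.5
print(email_validity(customers))                 # 0.5
```

Run checks like these on a schedule, keep the monthly results, and you have exactly the six-month track record the good answer describes.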
12. How do you handle data from acquisitions?
Good answer: “We have a documented integration playbook. Acquired company data is mapped to our schema within 90 days. Here is the status of each acquisition.”
Red flag: “We are still working on integrating the last one.” If you are a buy-and-build and your acquisitions are not integrated, the buyer is buying multiple companies that happen to share a holding company. They will price that accordingly.
13. Can you show customer acquisition cost by channel?
Good answer: “Here is CAC by channel for the last eight quarters, with the methodology for cost allocation documented.”
Red flag: “Marketing handles that, I would need to ask.” If CAC is not something the leadership team monitors regularly, the buyer questions whether growth is intentional or accidental.
14. What is your data architecture? Is it documented?
Good answer: “Here is our data flow diagram showing source systems, transformations, and reporting layers. It was last updated in Q4.”
Red flag: “It’s in people’s heads.” Undocumented architecture means undocumented risk. The buyer will hire someone to reverse-engineer it, and that cost gets priced into the deal.
15. Have you had any data security incidents? How do you handle PII?
Good answer: “No material incidents. Here is our data classification policy and our PII handling procedures. We completed a SOC 2 Type II last year.”
Red flag: “We follow best practices.” Without specifics, this answer means nothing. And if a data incident surfaces during diligence that you did not disclose, the trust damage goes well beyond the incident itself.
How to prepare before diligence starts
The time to prepare is not when the LOI arrives. It is months before. Here is what that preparation looks like in practice.
Start with the questions, not the systems
Print out this list. Sit down with your CFO and your head of data or IT. Try to answer each question right now. Time yourselves.
If you can answer all fifteen in under 48 hours with supporting data, you are in good shape. If several of them require “let me get back to you,” those are your priorities. For a hands-on version of this exercise, see The 48-Hour Test.
Fix the data, not the dashboard
The instinct is to build a pretty dashboard. Resist it. Buyers do not care about your dashboard. They care about whether the data behind it is real. Focus on data reconciliation, consistent definitions, and documented methodologies.
Document everything
The documentation does not need to be beautiful. It needs to exist. Data dictionaries, system diagrams, calculation methodologies, manual process steps. If it is in someone’s head, it is a risk. Get it on paper.
Run a mock diligence
Have someone outside your core team request data as if they were a buyer. See how long it takes. See what breaks. Fix those things. This is the single highest-ROI activity you can do in the six months before an exit. If you want a structured approach, the 7 Data Red Flags That Kill Deals is a good starting point.
Build the timeline backwards
If your target exit is 12 months out, work backwards from there. Months 1 through 3 are assessment. Months 4 through 8 are remediation. Months 9 through 12 are testing and documentation. For a more detailed breakdown, read How Long Does It Take to Fix Data Before Diligence?
The three levels of readiness
Not every company needs to be perfect. But every company needs to be honest about where they stand.
Level 1. Defensible
You can answer the 15 questions. The data exists. It might take a few days to pull together, but it reconciles and the methodology is documented. This is the minimum bar for a clean process.
Level 2. Efficient
You can answer the questions in hours, not days. Reports are automated. Definitions are consistent across systems. The data team can handle diligence requests without pulling everyone off their day jobs.
Level 3. Investor-grade
Your data is a selling point, not a risk. You have real-time dashboards, automated quality checks, and a documented data strategy that shows the buyer what is possible post-close. This level drives premium multiples.
Most mid-market companies are somewhere between Level 1 and Level 2. The goal is not perfection. The goal is to remove data as a reason for the buyer to discount the price or slow down the process.
Common mistakes
Waiting too long. Data cleanup is not a weekend project. Even simple reconciliation work takes months when you factor in discovery, fixes, testing, and documentation. Start early.
Fixing symptoms instead of causes. If your revenue numbers do not match across systems, the fix is not making them match in a spreadsheet. The fix is understanding why they diverge and addressing the root cause.
Hiding problems. Buyers will find them. Every time. It is better to know about an issue, have a plan to fix it, and disclose it proactively than to have it surface during diligence. Proactive disclosure builds trust. Discovery destroys it.
Over-investing in perfection. You do not need a data warehouse. You do not need a data lake. You need clean, reconciled, documented data that answers the questions buyers ask. Do not let a technology project delay your exit readiness.
Ignoring key person risk. If only one person knows how to produce your reporting, that is a diligence finding. Cross-train. Document. Build redundancy before someone asks about it.
What happens when you get it right
Companies that prepare their data for diligence close faster. The diligence phase takes weeks instead of months. There are fewer re-trades. The buyer has confidence in the numbers, which means they have confidence in the price.
I have seen prepared companies close 30 to 60 days faster than unprepared ones. At mid-market deal sizes, the carrying cost of those extra months (legal fees, management distraction, deal fatigue) adds up fast.
More importantly, clean data changes the negotiation dynamic. When a buyer trusts your numbers, they negotiate on strategy and growth. When they do not trust your numbers, they negotiate on risk. You want to be in the first conversation.
The bottom line
Data diligence is not going away. It is getting harder. Buyers are asking more questions, looking deeper, and pricing data risk more aggressively than they did five years ago.
The companies that treat data readiness as a strategic priority, not an IT project, will exit at better multiples with cleaner processes. The ones that wait until the LOI to start worrying about it will leave money on the table.
For a structured checklist approach to exit readiness, read PE Exit Readiness: The Data Checklist Most Teams Miss.
If you want a weekly brief on data diligence, exit readiness, and what actually works at mid-market PE-backed companies, subscribe to Inside the Data Room. One constraint, one framework, one free tool. Every week.