Businesses generate unprecedented volumes of data. They exploit less than 20%. The rest (logs, files, IoT data, transaction histories, customer interactions) sits on servers, costs thousands of euros in storage and represents a growing compliance risk. This massive waste is not inevitable. It is the symptom of an architecture designed to accumulate, not to understand.
The data era paradox: the more we collect, the less we understand
The world now generates 200 zettabytes of data per year. The volume is hard to grasp, and businesses are the primary contributors. Every ERP, every CRM, every IoT sensor, every collaboration tool produces a continuous flow of information. But this abundance creates a paradox: the more we collect, the less we are able to exploit.
According to IBM, around 80% of enterprise data is "dark data": data that is collected and stored, but never used for any analysis or decision (source: IBM). This figure is corroborated by a Splunk survey of over 1,300 IT leaders, in which 60% state that more than half of their data goes unused (source: FirstEigen / Splunk). In a third of organizations, the proportion of unidentified data exceeds 75% (source: Cogent). And the phenomenon is accelerating: the volume of unused data grows by 20% per year, driven by IoT, generative AI and the multiplication of sources (source: DataStackHub).
80% of enterprise data is never exploited (IBM)
60 zettabytes of global storage occupied by dormant data (DataStackHub)
52% of the average enterprise storage budget is spent on unused data (FirstEigen / Veritas)
Why so much data remains in the shadows
If businesses fail to exploit their data, it is not for lack of willingness. It is a structural problem, rooted in the way data architectures have been designed over the past twenty years.
First obstacle: source fragmentation. A typical mid-sized company uses between 10 and 50 different systems: ERP, CRM, business tools, spreadsheets, databases, APIs, IoT feeds. Each system speaks its own language, in its own format. According to DataStackHub, 70% of organizations suffer from fragmentation that prevents any unified source of truth (source: DataStackHub). Without connections between systems, data remains trapped in its silos.
Second obstacle: the technical barrier. Traditional analytics platforms require specialized skills: SQL, Python, data warehouse administration. Only data teams can access them. Business users, who best understand the value of information, are left out. The result: business questions go unanswered, or take weeks to process.
Third obstacle: the ETL model itself. To analyze data in a traditional architecture, you must first extract it, transform it, then load it into a warehouse. This process is slow, expensive and rigid. Every new analytical need requires a new pipeline. Data teams spend most of their time preparing data, not analyzing it.
Fourth obstacle: the cost of exploration. Exploring unknown data is expensive when every query involves moving it, duplicating it and mobilizing computing power. Businesses therefore focus on the 20% of data already identified and let the remaining 80% accumulate in the shadows.
What unused data actually costs
Dark data is not neutral. Even dormant, it costs money, consumes energy and creates risk.
A direct financial cost. Unused data occupies approximately 60 zettabytes of global storage. At the enterprise level, every terabyte stored without being used generates cloud storage fees, backup costs, maintenance and security expenses. Veritas estimates that 52% of the average storage budget is spent on dark data (source: FirstEigen / Veritas). Globally, this represents hundreds of billions of dollars wasted every year.
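To make the order of magnitude concrete, the sketch below applies the 52% share attributed to Veritas to a storage budget. The budget figure and the function name are hypothetical, chosen only for illustration; a real organization's share may be quite different.

```python
def dark_data_spend(annual_storage_budget: float, dark_share: float = 0.52) -> float:
    """Estimate the part of a storage budget spent on unused data.

    dark_share defaults to the 52% average cited in the text;
    it is an assumption, not a measurement of any given company.
    """
    if not 0.0 <= dark_share <= 1.0:
        raise ValueError("dark_share must be between 0 and 1")
    return annual_storage_budget * dark_share


# Hypothetical example: a company spending 200,000 euros a year on storage.
budget = 200_000
wasted = dark_data_spend(budget)
print(f"Estimated spend on unused data: {wasted:,.0f} euros per year")
```

Replacing the default share with a figure measured on your own storage gives a first, rough estimate of what dormant data costs each year.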
A growing regulatory risk. Dark data frequently contains unidentified personal information, which is therefore not protected as GDPR, DORA or NIS2 require. According to DataStackHub, 26% of data breaches in 2025 originated from forgotten or unprotected storage (source: DataStackHub). Ignored data is not data without consequences.
A massive environmental cost. Data centers consume between 2 and 3% of global electricity, with projections reaching 8% by 2030. Storing useless data ties up servers, cooling and energy for nothing. According to Wikipedia, citing the New York Times, data centers can waste 90% or more of the electricity they draw (source: Wikipedia / Dark Data). At a time when digital sobriety is becoming a governance issue, maintaining mountains of dark data is hard to justify.
A security risk. What you cannot see, you cannot protect. And what you do not protect ends up being exploited by others. IBM points out that dark data creates blind spots in cybersecurity, as it often escapes the protection protocols applied to critical data (source: IBM).
Dark data is not waste: it is an untapped resource
The paradox of dark data is that it is not inherently devoid of value. It is simply inaccessible with current tools. Most of it is heterogeneous, multi-format and unstructured: exactly the type of data that traditional architectures cannot process.
Yet this data holds decisive insights. Server logs reveal usage patterns. Support tickets hide weak signals about customer satisfaction. Archived IoT data makes it possible to detect the trends that predictive maintenance relies on. Transaction histories that have never been cross-referenced contain cross-selling opportunities.
McKinsey has documented it: the most advanced organizations in data exploitation report a contribution of over 20% to their operating income (EBIT) (source: McKinsey). Data-driven companies are also 23 times more likely to acquire new customers and 19 times more likely to be profitable (source: McKinsey Global Institute). The question is not whether this data has value, but how to access it without reproducing the same pattern that made it invisible.
And this is where the architecture problem becomes central again. As long as exploiting data requires moving it to a warehouse, transforming it via an ETL pipeline and mobilizing a technical team, the cost of exploring dark data will remain prohibitive. The model must change.
Changing the model: exploit without centralizing
The solution to the dark data problem is not a better data lake or yet another metadata catalog. These tools are useful, but they do not address the root cause. The real lever is architectural: you need to be able to query data where it resides, without moving it.
Processing at the source eliminates the main barrier that keeps data in the shadows. Instead of requiring a pipeline for every analytical question, intelligence goes directly to where the data lives. No copies, no intermediate storage, no heavy pre-transformation. The marginal cost of exploring a new source drops to near zero.
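The idea of querying data where it resides can be illustrated with a minimal sketch. It is not iD4Connect's implementation: two in-memory SQLite databases stand in for separate systems (the `crm` and `erp` names, tables and values are hypothetical), each source computes its own aggregate, and only the small results are combined.

```python
import sqlite3

# Two independent in-memory databases stand in for separate systems
# (hypothetical "crm" and "erp" sources); nothing is copied between them.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE tickets (customer_id INTEGER, status TEXT)")
crm.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, "open"), (1, "closed"), (2, "open"), (3, "closed")],
)

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (2, 40.0), (3, 300.0)],
)


def query_at_source(conn, sql):
    """Run a two-column aggregate inside the source; return it as a dict."""
    return dict(conn.execute(sql).fetchall())


# Each source computes its own aggregate; only per-customer totals leave it.
open_tickets = query_at_source(
    crm,
    "SELECT customer_id, COUNT(*) FROM tickets "
    "WHERE status = 'open' GROUP BY customer_id",
)
revenue = query_at_source(
    erp, "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
)

# Combine the aggregated results in memory, without a warehouse or pipeline.
report = {
    customer: {"revenue": total, "open_tickets": open_tickets.get(customer, 0)}
    for customer, total in revenue.items()
}
print(report)
```

The raw rows never leave their system of origin. What travels is the answer to the question, which is the property the paragraph above describes.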
This is precisely the approach that iD4Connect delivers. The decision-making middleware connects directly to over 70 types of sources (databases, APIs, files, IoT sensors, real-time feeds) and produces analyses without ever duplicating data.
DataCells are autonomous processing units that execute analytical operations directly at the source. Each DataCell operates independently, without requiring duplication.
The DataGraph maps the logical and business relationships between different data sources. It visualizes links between heterogeneous systems without centralizing them, providing a 360-degree view that reveals previously invisible data.
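As a rough illustration of mapping relationships between sources, and not a description of the DataGraph's actual internals, the sketch below models sources as nodes and shared business keys as edges, then searches for the chain of links connecting two systems. All source names and keys are hypothetical.

```python
from collections import deque

# Hypothetical sources and the business keys that link them.
links = [
    ("crm.customers", "erp.orders", "customer_id"),
    ("erp.orders", "iot.sensors", "equipment_id"),
    ("crm.customers", "support.tickets", "customer_id"),
]

# Build an undirected adjacency list: source -> [(neighbor, join key), ...]
graph = {}
for left, right, key in links:
    graph.setdefault(left, []).append((right, key))
    graph.setdefault(right, []).append((left, key))


def find_path(start, goal):
    """Breadth-first search returning the chain of (source, join key) hops."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, key in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(neighbor, key)]))
    return None  # the two sources are not connected


# Which links connect support tickets to archived sensor data?
print(find_path("support.tickets", "iot.sensors"))
```

Only metadata about the links is stored here. The data itself stays in each system, which is what allows the relationships to be explored without centralizing anything.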
No-code, business-oriented modeling puts this exploration capability in the hands of functional teams, not just data engineers. This is a fundamental shift: when business users can query data directly, without a Jira ticket or a 6-week pipeline, dark data has no reason to remain in the shadows.
The value lies in understanding, not accumulation
The world does not lack data. It lacks the ability to understand it. The businesses that will come out ahead in 2026 will not be those that store the most, but those that exploit it best and fastest.
When 80% of your information assets lie dormant in the shadows, the question is not whether to act. It is how long you can afford not to.
The value is no longer in the data. It is in the ability to understand and exploit it where it resides.