Businesses invested $40 billion in AI in 2025. The result: 95% of them have no measurable return on investment. The culprit is not the models, the GPUs, or the vendors. It is the data itself: its quality, its governance, its freshness. As long as businesses feed their AI engines with fragmented, duplicated and ungoverned data, artificial intelligence will remain an expensive promise.
The $40 billion paradox
The figure is stark. According to data compiled by HPCWire and confirmed by multiple analysts, 95% of organizations see no measurable return on their AI pilot projects (source: MIT, The GenAI Divide: State of AI in Business 2025). Not “insufficient.” Not “being measured.” Not measurable.
Meanwhile, McKinsey estimates the cost of global AI infrastructure at $7 trillion by 2030 (source: McKinsey). Seven trillion dollars. For projects that almost universally generate no demonstrable value today.
The question is no longer “should we invest in AI?” It is now: why does AI fail when the data feeding it is broken?
The real bottleneck: data, not the model
Analyses converge on a precise diagnosis. The main blocker for AI projects is not technological. It is a deficit in data quality and governance.
Three figures illustrate this reality better than any argument:
89% of executives say they only trust AI if the underlying data is verified and reliable (source: BigDATAwire / Harris Poll). Trust in data has become the number one criterion for AI adoption in the enterprise.
79% of security teams struggle to classify sensitive data for AI projects (source: Bedrock Security, 2025 Enterprise Data Security Confidence Index). In other words, roughly four out of five security teams cannot reliably say where critical data resides before it is fed into a model.
55% of teams are forced to manually correct AI outputs (source: BigDATAwire / Harris Poll). In more than half of organizations, then, what the AI produces has to be checked and fixed by hand, which cancels out much of the promised productivity gain.
The pattern is the same across all industries: businesses buy a powerful AI model, connect it to fragmented, outdated or ungoverned data, then wonder why the results are unusable. AI is not broken. It is starving.
Why traditional architectures fail
To understand the systemic failure, you need to look at how data reaches AI models today in most organizations.
The classic pipeline follows a well-known sequence: extract data from source systems, load it into a warehouse or data lake, transform it in batches, then expose it to analytics tools or AI models. Each step introduces latency, duplication and quality degradation risks.
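To make the cost of that sequence concrete, here is a minimal Python sketch of a classic batch pipeline. The stage names, the nightly schedule and the two-hour job duration are illustrative assumptions, not measurements: the point is that each stage writes its own snapshot, and a record that arrives just after a run starts must wait for the next one.

```python
from datetime import timedelta

# Hypothetical stages of a classic extract/load/transform/expose pipeline.
# Each stage writes its own snapshot of the data.
BATCH_STAGES = ("source_extract", "staging_zone", "warehouse", "serving_layer")


def run_batch(records, stages=BATCH_STAGES):
    """Simulate one batch run: every stage materializes an independent copy."""
    data = list(records)
    copies = {}
    for stage in stages:
        # Independent copy per stage (enough for flat records): changing one
        # stage's data does not change the others, so copies can drift apart.
        copies[stage] = [dict(r) for r in data]
    return copies


def worst_case_staleness(batch_interval, job_duration):
    """A record created just after extraction waits a full interval for the
    next run, then the duration of that run before it reaches consumers."""
    return batch_interval + job_duration


copies = run_batch([{"customer_id": "c1", "balance": 100}])
print(len(copies))  # 4 materialized copies, in addition to the source system
print(worst_case_staleness(timedelta(hours=24), timedelta(hours=2)))  # 1 day, 2:00:00
```

Running the batch more often reduces staleness, but it multiplies runs and leaves every copy in place.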
This model creates four structural problems that AI technology cannot compensate for.
Latency. Batch-processed data is hours or even days behind. A fraud detection model fed with yesterday’s data is structurally blind to ongoing attacks.
Duplication. Every data copy creates potential drift. When the same customer exists in three different versions across CRM, ERP and data lake, the AI model works on a fictional reality.
Post-hoc governance. Quality and compliance rules are applied after storage, not during transit. Incorrect data is already in the pipeline when the problem is discovered.
Exposure surface. Every staging zone and every intermediate copy multiplies the access points open to an attacker and the findings waiting for a regulatory audit. In a GDPR/DORA/NIS2 context, every copy is a legal risk.
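The duplication problem can be checked mechanically. The sketch below compares the versions of one customer held by several systems and reports every field on which they disagree. The system names and the customer record are invented for illustration, and the code assumes flat records with hashable values.

```python
def find_drift(versions):
    """Given {system: record} for one entity, return {field: {system: value}}
    for every field whose value is not identical across all systems."""
    all_fields = set()
    for record in versions.values():
        all_fields.update(record)

    drift = {}
    for field in sorted(all_fields):
        # A missing field is reported as None, so gaps also count as drift.
        values = {system: record.get(field) for system, record in versions.items()}
        if len(set(values.values())) > 1:
            drift[field] = values
    return drift


# Hypothetical copies of the same customer in three systems.
customer_42 = {
    "crm": {"email": "a.martin@example.com", "city": "Lyon"},
    "erp": {"email": "a.martin@example.com", "city": "Paris"},
    "data_lake": {"email": "amartin@old-domain.example", "city": "Paris"},
}

for field, values in find_drift(customer_42).items():
    print(field, values)
```

A model fed from any one of these copies inherits that copy’s version of the truth; reconciling them is exactly the work that duplication creates.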
The SYNAPS-I system from the U.S. Department of Energy recently demonstrated that another approach is possible (source: Argonne National Laboratory / DOE): AI integrated in a closed loop with measurement instruments, making decisions during the experiment itself, not in post-processing. This paradigm shift, processing data in real time without intermediate storage, validates an intuition that industry is beginning to adopt.
Governance by design: solving the problem at the source
If the problem is data quality upstream of models, the solution is not to add a validation layer after the fact. It is to govern data during transit, before it reaches the model.
This is precisely what a stateless data orchestration architecture enables. The principle: data is never stored in an intermediate zone. It is processed, transformed, validated and enriched during transit, then delivered directly to consumers: BI tools, AI models, dashboards, alerts. This is the approach that iD4Connect has championed from the start.
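The principle can be illustrated with chained Python generators. This is a minimal sketch of the idea, not iD4Connect’s implementation (the text does not describe how DataCells work internally): each stage receives one record, validates, transforms or enriches it, and passes it on, so no stage keeps a table or staging file. The quality rule, field names and consumers are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict


def validate(records: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical quality rule: an amount must be present and non-negative.
    for r in records:
        if isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0:
            yield r


def transform(records: Iterable[Record]) -> Iterator[Record]:
    # Convert to euros using the rate carried by the record (default 1.0).
    for r in records:
        yield {**r, "amount_eur": round(r["amount"] * r.get("fx_rate", 1.0), 2)}


def enrich(records: Iterable[Record], segments: dict) -> Iterator[Record]:
    # Add context from a reference lookup without copying the source table.
    for r in records:
        yield {**r, "segment": segments.get(r["customer_id"], "unknown")}


def deliver(records: Iterable[Record], consumers: list[Callable[[Record], None]]) -> int:
    # Push each record to every consumer as soon as it is ready.
    delivered = 0
    for r in records:
        for consume in consumers:
            consume(r)
        delivered += 1
    return delivered


source = iter([
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": -5.0},  # rejected by validate()
    {"customer_id": "c3", "amount": 10.0, "fx_rate": 0.9},
])
dashboard: list[Record] = []  # stands in for a BI tool or model endpoint
alerts: list[str] = []


def alert_on_large(r: Record) -> None:
    if r["amount_eur"] > 100:
        alerts.append(r["customer_id"])


count = deliver(enrich(transform(validate(source)), {"c1": "premium"}),
                [dashboard.append, alert_on_large])
print(count, alerts)  # 2 ['c1']
```

Because generators are lazy, each stage holds only the record it is currently processing. The `dashboard` list exists only for the demo; in this design, any persistence happens at the consumer, not inside the pipeline.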
This approach structurally resolves the four obstacles identified above.
Latency eliminated. Processing happens in real time, during data movement, not after it lands in a warehouse.
Zero duplication. No intermediate copies, no staging zones. Data stays at its source, insights are produced in transit by DataCells.
Native governance. Quality rules, GDPR/DORA/NIS2 compliance, anonymization and traceability are applied during transit, not after.
Exposure surface reduced to zero. No intermediate storage means no staging copies to steal, no copies to audit, no legal risk tied to non-compliant materialization.
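Native governance can be sketched as one more stage in such a chain. The example below pseudonymizes personal fields with a keyed HMAC-SHA256 hash and writes an audit entry for every record it processes. Note the technique: this is pseudonymization, which under GDPR still counts as personal data, not full anonymization. The field list, the policy identifier and the hard-coded key are assumptions for illustration; a real key would come from a secrets manager.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Iterable, Iterator

# Assumptions for illustration only.
PSEUDONYM_KEY = b"demo-key-load-from-a-secrets-manager"
PERSONAL_FIELDS = ("email", "phone")


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Keyed hash: the same input always gives the same token, so joins still
    work, but without the key a token cannot be recomputed from a guess."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def govern(records: Iterable[dict], audit_log: list,
           policy_id: str = "gdpr-minimization-v1") -> Iterator[dict]:
    """Mask personal fields in transit and record what was done to each record.
    The audit entry holds metadata only, never the original values."""
    for r in records:
        masked = dict(r)
        touched = []
        for field in PERSONAL_FIELDS:
            if masked.get(field):
                masked[field] = pseudonymize(str(masked[field]))
                touched.append(field)
        audit_log.append({
            "record_id": r.get("id"),
            "policy": policy_id,
            "fields_masked": touched,
            "processed_at": datetime.now(timezone.utc).isoformat(),
        })
        yield masked


audit: list = []
out = list(govern([{"id": 7, "email": "a.martin@example.com", "amount": 42}], audit))
print(out[0]["email"] != "a.martin@example.com", audit[0]["fields_masked"])  # True ['email']
```

Because `govern` is a generator, it slots between `validate` and `deliver` in a chain like the one sketched earlier, so masking happens before any consumer sees the record.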
AI then receives fresh, contextualized and compliant data. Exactly what it needs to produce reliable results. Governance is not a brake on AI. It is its operating condition.
What the major players are doing, and what they cannot do
Snowflake and Databricks have clearly understood the stakes. Both platforms are converging rapidly toward integrated data + AI + governance solutions.
Snowflake has just launched Project SnowWork (source: Snowflake), an agentic control plane designed to coordinate intelligence, data and actions across SaaS applications. Databricks has acquired two startups specializing in data security and launched Lakewatch, an agentic SIEM on lakehouse (source: Constellation Research). Both players have simultaneously made Apache Iceberg v3 generally available, confirming the market’s convergence toward open formats.
But this race to « all-in-one » hits a fundamental architectural limit: these platforms rely on centralized data storage. The data lake, the data warehouse, the lakehouse: whatever the name, the principle is the same. Data is moved, copied and stored before being processed (see iD4Connect’s positioning).
This model creates three problems that software sophistication cannot solve.
Data residency. Data leaves its source to be centralized in a cloud, often American and therefore subject to the US CLOUD Act. Even in European regions, the operator’s jurisdiction prevails.
Structural latency. Critical real-time use cases (fraud detection, industrial control, security alerts) require response times below 100 ms. The round trip to a centralized cloud does not allow this.
Infrastructure cost. Storing, indexing and governing data copies is expensive. With a global investment projected at $7 trillion, decision-makers will favor solutions that are less infrastructure-intensive.
A revealing fact: Snowflake has just launched a feature called “Resharing,” the on-the-fly transformation of shared data without local materialization, presenting it as an innovation. This is precisely what a stateless architecture has done natively since its inception.
The regulatory context accelerates urgency
European regulatory pressure only reinforces this observation. As of April 2026, the CNIL had already received 739 reports related to the municipal elections and initiated 4 investigations with sanction procedures (source: CNIL). Priority controls for 2026 target recruitment, electoral data, sports federations and cybersecurity.
Meanwhile, the AI Act is entering its operational support phase, Gaia-X is moving to its operational phase with sovereign data spaces, and the DGFiP (France’s public finance directorate) generated 2.8 billion euros through data exploration and AI in 2025 (source: economie.gouv.fr), illustrating the rise of data use in the public sector.
The message for businesses is clear: compliance is no longer a cost, it is a technology selection criterion. Solutions whose compliance is native, because they never store data outside its source, have a structural advantage that centralized cloud platforms cannot replicate, even by adding governance layers after the fact (see iD4Connect architecture).
Four questions to ask about your next AI project
Before launching or continuing an AI investment, four questions are worth asking.
1. How fresh is the data feeding your model?
If your data is more than a few minutes behind, your model is working on an outdated reality. Real time is not a luxury. It is the minimum condition for AI to produce usable results.
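A first answer can be measured directly: compare the timestamp of each record reaching the model with the current time. In this sketch the five-minute threshold is an assumption standing in for “a few minutes”, and event times are assumed to be timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)  # assumption: "a few minutes" read as five


def freshness_report(event_times, now=None, max_age=MAX_AGE):
    """Summarize how old the data feeding a model is at decision time."""
    if not event_times:
        raise ValueError("no events to evaluate")
    if now is None:
        now = datetime.now(timezone.utc)
    ages = [now - t for t in event_times]
    stale = sum(1 for age in ages if age > max_age)
    return {"oldest": max(ages), "stale_share": stale / len(ages)}


now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
events = [now - timedelta(seconds=30), now - timedelta(hours=20)]  # hypothetical
report = freshness_report(events, now=now)
print(report["oldest"], report["stale_share"])  # 20:00:00 0.5
```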
2. How many copies of your critical data exist in your pipeline?
Every copy is a source of divergence, an infrastructure cost and a GDPR risk. If the answer is “more than one,” your architecture generates inconsistency by construction.
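Counting copies only requires an inventory of where each dataset is physically stored, for example an export from a data catalog. The inventory below is hypothetical; a view computed on read is listed but not counted, because it does not store the data.

```python
# Hypothetical inventory: where each dataset lives in the pipeline.
INVENTORY = [
    {"dataset": "customers", "location": "crm", "materialized": True},  # source
    {"dataset": "customers", "location": "s3_staging", "materialized": True},
    {"dataset": "customers", "location": "warehouse.dim_customer", "materialized": True},
    {"dataset": "customers", "location": "feature_store", "materialized": True},
    {"dataset": "customers", "location": "bi_view", "materialized": False},  # computed on read
    {"dataset": "orders", "location": "erp", "materialized": True},
]


def stored_copies(inventory, dataset):
    """Locations that physically store the dataset, source system included."""
    return [e["location"] for e in inventory
            if e["dataset"] == dataset and e["materialized"]]


copies = stored_copies(INVENTORY, "customers")
print(len(copies), copies)
if len(copies) > 1:
    print("More than one stored copy: every extra one can drift from the source.")
```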
3. Is your governance native or bolted on?
Compliance implemented after deployment depends on the vendor’s goodwill. Governance by design is a property of the architecture itself. It cannot be removed, bypassed or forgotten.
4. Does your team manually correct AI outputs?
If so, the problem is not the model. It is the input data. Investing in a more powerful model will solve nothing. Garbage in, garbage out, regardless of GPU size.
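One way to check where corrections come from is to log, for each AI output, whether its input passed data-quality checks and whether a person had to fix the result, then compare correction rates between the two groups. The field names and the sample log are hypothetical. A much higher rate on invalid inputs is consistent with an input-data problem, though it does not by itself prove causation.

```python
def correction_rates(reviews):
    """Share of AI outputs corrected by hand, split by input quality.

    Each review is a dict with boolean keys 'input_valid' and 'corrected'
    (hypothetical schema). Returns None for a group with no reviews.
    """
    counts = {True: [0, 0], False: [0, 0]}  # input_valid -> [corrected, total]
    for review in reviews:
        group = counts[bool(review["input_valid"])]
        group[0] += bool(review["corrected"])
        group[1] += 1

    def rate(corrected, total):
        return corrected / total if total else None

    return {
        "valid_input": rate(*counts[True]),
        "invalid_input": rate(*counts[False]),
    }


log = [
    {"input_valid": True, "corrected": False},
    {"input_valid": True, "corrected": False},
    {"input_valid": True, "corrected": True},
    {"input_valid": True, "corrected": False},
    {"input_valid": False, "corrected": True},
    {"input_valid": False, "corrected": True},
]
print(correction_rates(log))  # {'valid_input': 0.25, 'invalid_input': 1.0}
```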
The return on investment of AI does not depend on the model chosen. It depends on the ability to feed that model with fresh, governed and compliant data, in real time, without moving or duplicating it. The 95% failure rate is not inevitable. It is the symptom of an obsolete architecture. The solution exists. It is called stateless orchestration.