The world has become more complex, more dynamic and more interconnected than ever before. The challenges we face – from health to climate, from democratic resilience to economic transformation – are deeply intertwined. And we need new ideas to meet these challenges.
Europe has never lacked intellectual ambition, but ideas alone aren’t enough. To make real progress, we need breakthrough discoveries. We need evidence of what works. And we need the institutional capacity to test, validate and scale solutions across borders and disciplines.
That’s where science comes in. Yet good science depends on data. And if we want AI to supercharge discovery and transform science, then data becomes even more important.
The ‘datafication’ of society
Digitalisation has led to an unprecedented datafication of society. When citizens engage with government services, visit a doctor, use a mobility platform, shop online or measure their steps or sleep through wearable devices, data is generated.
But this datafication doesn’t stop with individual behaviour. It extends deep into the productive fabric of our economies. Manufacturing systems, industrial supply chains, logistics networks, energy grids and robotic production lines are now embedded with sensors, connected devices and intelligent control systems. The implication is profound – data is no longer a by-product of digital services alone. It’s a structural feature of both our digital and physical infrastructures.
The remarkable feature of digital data isn’t merely its volume. It’s its reusability. When done responsibly, data created for one purpose can often be reused for entirely different objectives – including scientific research.
But there’s a fundamental constraint: access. Much of today’s most valuable data remains locked away in institutional stovepipes – within government agencies, universities and private companies. Despite its public value potential, it often remains inaccessible to scientists and public interest actors.
Europe has taken important steps to address this data asymmetry. Open data policies have expanded transparency. The Data Governance Act and the Data Act seek to facilitate data sharing and rebalance power in data markets. Article 40 of the Digital Services Act creates pathways for vetted researchers to access platform data. The European Open Science Cloud seeks to enable the sharing of scientific data. Sectoral data spaces – including those envisioned under the European Health Data Space – and Data Labs aim to provide structured, interoperable infrastructures for data access and use.
Yet instead of a steady expansion of access, we’re now witnessing a ‘data winter’. Access to private sector data for research has declined in several domains. Open government data initiatives have slowed or been rolled back. Scientific datasets have become restricted or have disappeared. Open science has struggled to scale beyond pilot projects. And broader political retrenchment risks weakening some of the very infrastructures designed to enable responsible reuse.
Generative AI’s rapid expansion has also triggered backlash. Large-scale data scraping for AI training has blurred the line between openness and extraction. Consequently, institutions and content creators have become more protective, sometimes closing access altogether. And without reliable access to diverse, high-quality data, scientific progress risks stagnation.
What should Europe do? Three priorities stand out.
Access shouldn’t be only supply-driven
For too long, data policy has focused on releasing datasets without clearly articulating the questions they’re meant to answer. But the value of data – and increasingly the value of AI – depends directly on the value of the question.
In short, better questions define better discovery.
If we want to unlock meaningful access, we must invest in what might be called ‘question science’ – the systematic identification of high-priority societal questions; the structuring of those questions so they are researchable and actionable; the mapping of those questions to existing or potential data sources; and the embedding of those questions into funding frameworks, governance mandates and institutional strategies.
When demand is vague, access debates remain abstract. When questions are clear, access becomes purposeful. Researchers, policymakers and data holders can align around concrete objectives. This requires structured, participatory processes that bring scientists, communities, funders and regulators together to define and prioritise the questions that matter most.
Strengthening legitimacy by going beyond consent
Traditional consent models are increasingly insufficient in an AI-driven world. Consent is typically individual, transactional and static. Yet data reuse for research and AI is collective and dynamic. The societal impacts of AI models trained on large-scale datasets cannot be reduced to a series of isolated, one-off agreements.
We must complement consent with social license – the trust and legitimacy built through ongoing engagement with communities and stakeholders over how data is reused. Social license involves understanding societal expectations and preferences; translating those expectations into governance agreements; embedding oversight, enforcement and redress mechanisms; and creating continuous feedback loops with affected communities.
If communities perceive data reuse as extractive – particularly with AI model training – trust erodes and access closes. Legitimacy must be actively maintained and cannot be assumed.
Investing in human infrastructure
Legal frameworks alone don’t unlock data. Regulations can create rights and obligations, but they cannot by themselves broker trust, negotiate agreements or operationalise complex sharing arrangements. For that, we need institutional capacity.
We also need professional data stewards – individuals and organisations capable of brokering access between data holders and data users; translating governance principles into operational practice; managing risk while enabling innovation; and aligning data supply with societally defined demand.
Without dedicated stewardship roles, even the most carefully designed frameworks risk remaining aspirational rather than actionable. Data spaces may exist on paper. Open data portals may host thousands of datasets. But without intermediaries who connect questions to data, data to researchers, and researchers to communities, the ecosystem will underperform.
AI may accelerate science, but unless we invest equally in the ecosystems around it, we risk entering a troubling paradox: more computational power, less accessible data and diminished public value.
Europe has often defined itself through its commitment to science as a public good. In the AI age, that must extend beyond funding laboratories and building supercomputers. It must encompass the design of data ecosystems that are purposeful, legitimate and stewarded with care.
The future of science won’t be determined by algorithms alone. It will be shaped by how intentionally we define our questions, how responsibly we govern and provide access to data, and how effectively we build human institutions capable of connecting the two.
In that sense, the most important scientific infrastructure of the coming decade may not be computational but institutional – the governance frameworks, stewardship capacities and intermediaries that can unlock fragmented data across sectors, ensure and validate its quality and legitimacy, and make it responsibly reusable for scientific discovery and public value.
Stefaan Verhulst is Co-Founder of The GovLab and The DataTank (Brussels). This CEPS commentary draws upon his keynote delivered at the closing plenary of the CEPS Ideas Lab 2026.