AI lab TL;DR | Stefaan G. Verhulst - Are we entering a Data Winter?

🔍 In this TL;DR episode, Dr. Stefaan G. Verhulst (The GovLab & The Data Tank) talks with the AI lab about his Frontiers Policy Labs contribution on the urgent need to preserve data access in the public interest.


📌 TL;DR Highlights

⏲️[00:00] Intro

⏲️[01:13] Q1-‘Data Winter’:

Can you provide a brief overview of your concept of 'Data Winter' and why you believe we are on the brink of entering one?

⏲️[05:05] Q2-Generative AI-nxiety:

What are some of the most significant challenges currently hindering public access to social media and climate data, and what are the effects of Generative AI-nxiety?

⏲️[07:49] Q3-‘Decade for Data’:

Could you outline what the “Decade for Data” initiative entails and how it could transform data stewardship and collaboration?

⏲️[12:25] Wrap-up & Outro


💭 Q1-‘Data Winter’


🗣️ At the time of an AI summer, when everyone suddenly is excited about the potential of generative AI (...) for public interest purposes, (...) we are actually entering a data winter.

🗣️ What I’ve witnessed the last few months, and that’s mainly as a result of advances in artificial intelligence, is that we actually see a backtracking of the progress that we’ve made in society as it relates to opening up data for public interest purposes.

🗣️ Social media platforms such as X, but also Facebook, have closed down access to some of their data for research and for data journalism purposes as well.

🗣️ Science data, such as climate science data, which was typically open science, has now become commercialised and is becoming proprietary data enclosed for many in society.

🗣️ The data that was initially available as training data has now also become much harder to access, as a result of concerns that some of that data has been extracted without a return to the data holder.


💭 Q2-Generative AI-nxiety


🗣️ Some of the data that typically was available through APIs has now been closed off, and so some are calling this the post-API environment that we're currently in, where data that was easily available through an API is now actually much harder to access unless one pays for it.

🗣️ New licensing is being used to actually shield off the data from public interest purposes as well. So there is a whole range of vehicles that exist to enclose data, which actually make it much harder to access it for reuse.

🗣️ We see a decline in people accessing Wikipedia and a decline in people contributing to Wikipedia, mainly because they fear that whatever they contribute will be used as training fodder for generative AI purposes.

🗣️ Initiatives like Wikipedia, which are to a large extent the main source of a lot of the training data of generative AI services, are currently also suffering from AI extraction because they are dependent on voluntary contributions by the audience and the participants.

🗣️ As a result, we are entering a data winter, which if we are not careful (...) may actually affect the AI summer that we currently have as well.


💭 Q3-‘Decade for Data’


🗣️ I’ve been calling for, together with others, such as the United Nations University, a Decade for Data, which is a typical way the United Nations often operates, to feature a problem and then have a well-defined strategy to address that problem.

🗣️ A Decade for Data would have multiple components, one being advancing data collaboration, where you actually have new models of data being shared, including data commons, which can be updated in the current AI environment.

🗣️ We need a new, reimagined profession of data stewards: individuals or teams who have the sophistication and competencies to provide access to data in a systematic, sustainable, and responsible manner.

🗣️ A Decade for Data would also involve rethinking data governance and embedding digital self-determination in data governance to go beyond the current paradox of consent, facilitating access in a way that aligns with perceptions, expectations, and preferences of communities.

🗣️ Establishing a social license for reuse is key, where you understand the preferences and expectations of communities and individuals, translating that into a social license so that data can be reused in a way that is trusted and aligned with community expectations.


📌 About Our Guest

🎙️ Dr. Stefaan G. Verhulst | Co-Founder, The GovLab & The Data Tank

🌐 Frontiers Policy Labs | Are We Entering a Data Winter?

https://policylabs.frontiersin.org/content/commentary-are-we-entering-a-data-winter

🌐 The Data Tank

https://datatank.org

🌐 GovLab

https://thegovlab.org

🌐 Dr. Stefaan G. Verhulst

https://www.linkedin.com/in/stefaan-verhulst


Dr. Stefaan G. Verhulst co-founded several research organisations, including the GovLab (New York) and The Data Tank (Brussels). He focuses on using advances in science and technology, including data and AI, to improve decision-making and problem-solving, and has been recognised as one of the 10 Most Influential Academics in Digital Government globally.

Episodes (37)

AI lab TL;DR | Joan Barata - Transparency Obligations for All AI Systems

🔍 In this TL;DR episode, Joan explains how Article 50 of the EU AI Act sets out high-level transparency obligations for AI developers and deployers—requiring users to be informed when they interact w...

10 Dec 2025 | 17 min

AI lab TL;DR | Aline Larroyed - The Fallacy Of The File

🔍 In this episode, Caroline and Aline unravel why the popular idea of “AI memorisation” leads policymakers down the wrong path—and how this metaphor obscures what actually happens inside large langua...

27 Nov 2025 | 7 min

AI lab TL;DR | Anna Mills and Nate Angell - The Mirage of Machine Intelligence

🔍 In this TL;DR episode, Anna and Nate unpack why calling AI outputs “hallucinations” misses the mark—and introduce “AI Mirage” as a sharper, more accurate metaphor. From scoring alternative terms to...

26 May 2025 | 20 min

AI lab TL;DR | Emmie Hine - Can Europe Lead the Open-Source AI Race?

🔍 In this TL;DR episode, Emmie Hine (Yale Digital Ethics Center) makes the case for Europe’s leadership in open-source AI—thanks to strong infrastructure, multilingual data, and regulatory clarity. W...

12 May 2025 | 11 min

AI lab TL;DR | Milton Mueller - Why Regulating AI Misses the Point

🔍 In this TL;DR episode, Milton Mueller (the Georgia Institute of Technology School of Public Policy) argues that what we call “AI” is really just part of a broader digital ecosystem. Instead of vagu...

21 Apr 2025 | 18 min

AI lab TL;DR | Kevin Frazier - How Smarter Copyright Law Can Unlock Fairer AI

🔍 In this TL;DR episode, Kevin Frazier (University of Texas at Austin School of Law) outlines a proposal to realign U.S. copyright law with its original goal of spreading knowledge. The discussion in...

7 Apr 2025 | 16 min

AI lab TL;DR | Paul Keller - A Vocabulary for Opting Out of AI Training and TDM

🔍 In this TL;DR episode, Paul Keller (The Open Future Foundation) outlines a proposal for a common opt-out vocabulary to improve how EU copyright rules apply to AI training. The discussion introduces...

24 Mar 2025 | 15 min

AI lab TL;DR | João Pedro Quintais - Untangling AI Copyright and Data Mining in EU Compliance

🔍 In this TL;DR episode, João Quintais (Institute for Information Law) explains the interaction between the AI Act and EU copyright law, focusing on text and data mining (TDM). He unpacks key issues ...

3 Mar 2025 | 25 min
