S8 Ep34: How good are LLMs at doing our jobs?

In the second of a special series recorded live at the PSE-CEPR Policy Forum 2025, we ask: how good is AI at doing real-world job tasks? And how can we measure its capabilities without resorting to technical benchmarks that may not mean much in the workplace?

Ever since we all became aware of large language models (LLMs), scientists have been attempting to evaluate how good they are at performing expert tasks. The results of those tests can show us whether LLMs can be useful complements to our work, or even replacements for us, as many fear. But setting and grading a test to decide whether an LLM can do a problem-solving job task, rather than solve an abstract problem, isn't easy. Maria del Rio-Chanona, a computer scientist at UCL, tells Tim Phillips about her innovative work in progress, in which she asks one LLM to set a tricky workplace exam, then tells another LLM to take the test, which a third LLM evaluates.

Episodes (462)

S9 Ep35: The success of the embedded state

Who kept the courts sitting and the streetlights lit when the state had almost no money to pay anyone? Two hundred years ago, British local government ran on unpaid labour. In a parliamentary survey of...

26 Jun 20min

S9 Ep34: Making defence spending pay

Defence spending is rising whether voters like it or not. The UK has committed to 2.5% of national income and aims for nearer 3.5% over the next decade, £30bn a year for each percentage point. What do...

19 Jun 26min

S9 Ep33: Did the sewing machine liberate women?

In January 1860 the New York Times gave its blessing to a new machine: the sewing machine. These “iron needle-women”, it wrote, were the only invention that could be claimed “chiefly for women's benef...

12 Jun 19min

S9 Ep32: The digital money supply

Every day, billions of transactions settle between strangers who have no idea which bank the other uses. That lack of friction is not automatic. Nine-tenths of the money in daily circulation has been ...

5 Jun 27min

S9 Ep31: How well does patent screening work?

Someone once held a patent on the swing. A piece of wood. Two ropes. The US Patent Office granted it. How often does that actually happen, and what does it cost when the system gets it wrong? Or, how ...

29 May 32min

S9 Ep30: Redefining the monetary standard

The fiat money system has survived the Great Inflation, the global financial crisis, and a pandemic. But can it survive digital currencies? Bitcoin and the blockchain solved a genuine problem in comput...

22 May 26min

S9 Ep29: Guns and Butter

Europe's NATO members have pledged 3.5% of GDP to rearmament. The political argument is already about which social programmes will be sacrificed to pay for this, when the government chooses guns inste...

15 May 21min

S9 Ep28: Immigration and integration in Europe

More than one in eight people living in the EU today was born in another country. In fourteen of the bloc's largest economies, it is closer to one in six. For ten years, the same team of researchers h...

8 May 25min
