We aren't running out of training data, we are running out of open training data
Interconnects - A podcast by Nathan Lambert
![](https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/68/94/99/68949976-4175-6665-5753-307a7a6dfcff/mza_8943019308595837552.jpg/300x300bb-75.jpg)
Data licensing deals, scaling, human inputs, and repeating trends in open vs. closed.

This is AI generated audio with Python and 11Labs.

Source code: https://github.com/natolambert/interconnects-tools

Original post: https://www.interconnects.ai/p/the-data-wall

0:00 We aren't running out of training data, we are running out of open training data

2:51 Synthetic data: 1 trillion new tokens per day

4:18 Data licensing deals: High costs per token

6:33 Better tokens: Search and new frontiers

Get full access to Interconnects at www.interconnects.ai/subscribe