We aren't running out of training data, we are running out of open training data

Interconnects - A podcast by Nathan Lambert

Data licensing deals, scaling, human inputs, and repeating trends in open vs. closed.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/the-data-wall

0:00 We aren't running out of training data, we are running out of open training data
2:51 Synthetic data: 1 trillion new tokens per day
4:18 Data licensing deals: High costs per token
6:33 Better tokens: Search and new frontiers

Get full access to Interconnects at www.interconnects.ai/subscribe
