Topic: training data

14 chunks · 13 episodes

Topic summary

A short read on the topic's time range, peak episode, and strongest associations. Use it for quick orientation before drilling into examples.
  • training data appears in 14 chunks across 13 episodes, from 2023-11-13 to 2025-11-24.
  • Its densest episode is Bits and Bobs 6/24/24 (2024-06-24), with 2 observations on this topic.
  • Semantically it travels with llms, consistent bias, and ChatGPT, while by chunk count it sits between social network and web page; its yearly rank moved from #69 in 2023 to #128 in 2025.

Over time

Raw mentions over time. Use this to see absolute attention, not relative rank among all topics.
Mean 1.1 mentions per episode across the full range.
  • 2023-11-13: 1 mention
  • 2024-03-11: 1 mention
  • 2024-06-24: 2 mentions
  • 2024-07-22: 1 mention
  • 2024-08-12: 1 mention
  • 2024-11-25: 1 mention
  • 2025-01-27: 1 mention
  • 2025-06-30: 1 mention
  • 2025-07-21: 1 mention
  • 2025-08-25: 1 mention
  • 2025-10-13: 1 mention
  • 2025-11-10: 1 mention
  • 2025-11-24: 1 mention

Observations

The primary evidence view for this topic. Sort it chronologically when you want concrete examples behind the larger pattern.

Ben Mathes distilling Babak Nivi:

from Bits and Bobs 10/13/25

"The meaning and soul went into the training data, and it's in us as we read the text. It's not in the LLM anywhere. But we can get it as a result of reading the output."

LLMs don't do a good job with negative space.

from Bits and Bobs 6/24/24

...who has wrestled with generative image models to remove some detail. All of the training data has descriptions of images as they actually are. In that case, why would you describe what's not in the image? You can just describe what is in the i...

AI is a confusing catch-all term.

from Bits and Bobs 3/11/24

...sts, etc. AI 2.0: Deep learning. Supervised learning with bespoke, high-quality training data. AI 3.0: Unsupervised learning. LLMs. Messy, kitchen-sink, highly scaled training data. In any given situation, it's still possible to make a better-...