Another entrant is staking a claim on the training-data bottleneck that's kept token prices climbing for frontier labs. The framing as a 'multi-billion dollar' problem signals the pitch will be industry-wide infrastructure rather than a single licensing deal—worth watching whether it's a nonprofit standard-setter or a thinly-veiled data broker.
Either way, the more players position themselves as bottleneck-solvers, the more it validates that scarce, well-annotated data is the actual constraint on model progress, not compute.
The DATA Foundation Launches to Tackle AI's Multi-Billion Dollar Training Data Bottleneck