When a body like the World Economic Forum starts writing about synthetic data, it's a signal that the scarcity of clean human-generated text is now a boardroom-level concern, not just a problem for research labs. If synthetic data becomes the default filler for gaps in web-scraped corpora, the real question for the data economy is who sets its price: model labs generating their own, or a new tier of synthetic-data vendors selling at a discount to human-annotated sets.
Expect this framing to accelerate deals between frontier labs and synthetic-data specialists as a hedge against licensing costs for scarce human data.
Artificial intelligence and the growth of synthetic data