What makes content valuable in an AI world?
Kevin Indig
JUNE 19, 2023
As I wrote in "AI copyright could lead to new Marketing opportunities": Of the 45 terabytes of text GPT-3 was trained on, 60% came from Common Crawl*, 22% from WebText 2 (which is built from outgoing links on Reddit), 8% from books and 3% from Wikipedia. On top of that, their data is used to train LLMs.