Your LLM Gets Its Data From Where??
Salesforce Marketing Cloud
MARCH 20, 2024
Corpus data

Corpus data includes written or spoken data from books, newspapers, articles, websites (including blogs), academic papers, and more. Wikipedia, where anyone can write and edit an entry, is a major data source. It’s estimated that Wikipedia makes up between 3% and 5% of the scraped data used to train off-the-shelf LLMs.