
ChatGPT's capabilities come from training on an enormous corpus of text collected from across the internet and from curated sources.
OpenAI has not fully disclosed the exact composition of the training data. Lawsuits from publishers and authors allege that copyrighted works were used without authorization.