Welcome to Galactic

Galactic provides cleaning and curation tools for massive unstructured text datasets. It helps you curate fine-tuning datasets, build document collections for retrieval-augmented generation (RAG), deduplicate web-scale datasets for LLM pre-training, and more.

Galactic is made available under the Apache 2.0 license.
