Welcome to Galactic
Galactic provides cleaning and curation tools for massive unstructured text datasets. Galactic helps you curate fine-tuning datasets, create document collections for retrieval-augmented generation (RAG), deduplicate web-scale datasets for LLM pre-training, and more.
Galactic is made available under the Apache 2.0 license.