Welcome to DGT
High-quality data is the backbone of modern AI development, but acquiring diverse, domain-specific, and scalable datasets remains a major bottleneck. Synthetic data generation addresses this challenge by enabling the creation of tailored datasets that are:
- Cost-effective and privacy-preserving
- Customizable for specific tasks and domains
- Scalable to meet evolving model needs
DGT (Data Generation and Transformation, pronounced "digit") is a horizontal framework that streamlines and scales expert, domain-specific synthetic data generation by simplifying and standardizing its essential components.
Features
- Standardized interface for 5+ LM engines (WatsonX, OpenAI, Azure OpenAI, vLLM, Ollama, Anthropic, etc.) with retry/fallback logic
- Support for several domain-specific pipelines, including tool calling, time series, and question answering
- Growing list of syntactic validators, deduplicators, and LLMaJs (LLM-as-a-Judge)
- Local execution capabilities for sensitive data and air-gapped environments
- Plug-and-play [integrations][integrations], including Docling
- Simple and convenient CLI
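
To make the retry/fallback idea from the feature list concrete, here is a minimal, generic sketch in plain Python. It is not DGT's actual API: the function `generate_with_fallback`, its parameters, and the two stand-in engines are all hypothetical names chosen for illustration, and each "engine" is assumed to be a simple callable that takes a prompt and returns text.

```python
import time
from typing import Callable, Optional, Sequence


def generate_with_fallback(
    prompt: str,
    engines: Sequence[Callable[[str], str]],
    max_retries: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try each engine in order; retry a failing engine before falling back.

    Hypothetical helper for illustration only, not part of DGT.
    """
    last_error: Optional[Exception] = None
    for engine in engines:
        for attempt in range(max_retries + 1):
            try:
                return engine(prompt)
            except Exception as err:  # a real client would catch narrower errors
                last_error = err
                # Exponential backoff between attempts (no delay when set to 0).
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all engines failed") from last_error


# Stand-in engines: the first always fails, the second always succeeds.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary engine unavailable")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = generate_with_fallback("hello", [flaky_primary, stable_secondary])
print(result)
```

In this sketch the caller sees a single function regardless of which backend answered, which is the general benefit of putting retries and fallbacks behind one standardized interface.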