Installation

Before you start, please make sure you have Python 3.10+ available.
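To confirm that your interpreter meets this requirement, a small check like the one below can help. It is a minimal sketch and not part of DGT; the helper name `meets_minimum` is purely illustrative.

```python
import sys

MIN_VERSION = (3, 10)  # minimum version stated in this guide


def meets_minimum(version=None, minimum=MIN_VERSION):
    """Return True if a (major, minor, ...) version is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor so patch releases do not matter.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {found}: {status}")
```

Run it with the same interpreter you intend to use for the virtual environment.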

Building DGT from source lets you make changes to the code base. To install from source, first clone the repository and change into its directory:

git clone git@github.com:IBM/fms-dgt.git
cd fms-dgt

Now let's set up your virtual environment, using either Python venv or uv.

Python venv

python3.10 -m venv ssdg_venv
source ssdg_venv/bin/activate

To install packages, we recommend starting off with the following:

pip install -e ".[all]"

If you plan on contributing, install the pre-commit hooks to keep code formatting clean:

pip install pre-commit
pre-commit install

uv

uv sync --extra all

If you plan on contributing, install the pre-commit hooks to keep code formatting clean:

uv pip install pre-commit
uv run pre-commit install

Large Language Models (LLMs) Dependencies

DGT uses Large Language Models (LLMs) to generate synthetic data. The following LLM inference engines are supported:

Engine	Additional Installation	Environment Variables	Supported APIs
Ollama	-	-	`completion`, `chat_completion`
WatsonX	-	`WATSONX_API_KEY=""`, `WATSONX_PROJECT_ID=""`	`completion`, `chat_completion`
OpenAI	-	`OPENAI_API_KEY=""`	`completion`, `chat_completion`
Azure OpenAI	-	`AZURE_OPENAI_API_KEY=""`	`completion`, `chat_completion`
Anthropic Claude	-	`ANTHROPIC_API_KEY=""`	`chat_completion`
vLLM	`pip install -e ".[vllm]"`	-	`completion`, `chat_completion`

Most of these LLM inference engines read their configuration from environment variables. You can either export those variables before every run or save them in a .env file at the root of the fms-dgt repository directory.
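Before a run, it can be useful to confirm that the variables for your chosen engine are actually set. The sketch below is an assumption-laden illustration, not DGT's own loading logic: it parses simple `KEY=VALUE` lines with the standard library and reports which variables from the table above are still empty. The names `REQUIRED_VARS`, `parse_env_file`, and `missing_vars`, as well as the engine labels used as dictionary keys, are hypothetical.

```python
import os
from pathlib import Path

# Variable names copied from the engine table above.
# The dictionary keys are illustrative labels, not DGT identifiers.
REQUIRED_VARS = {
    "watsonx": ["WATSONX_API_KEY", "WATSONX_PROJECT_ID"],
    "openai": ["OPENAI_API_KEY"],
    "azure-openai": ["AZURE_OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(engine, env_file=".env", environ=None):
    """Return required variables that are empty in both the environment and the file."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_file(env_file)
    return [
        name
        for name in REQUIRED_VARS.get(engine, [])
        if not (environ.get(name) or file_values.get(name))
    ]
```

For example, calling `missing_vars("watsonx")` from the repository root returns the WatsonX variables that still need a value, or an empty list when everything is set.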

Warning

vLLM dependencies require Linux and CUDA.
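A rough pre-flight check for these prerequisites might look like the sketch below. It rests on an assumption: the presence of the `nvidia-smi` binary on `PATH` is used as a proxy for an NVIDIA driver, which does not guarantee a CUDA version that vLLM accepts. The function name `vllm_prerequisites` is illustrative.

```python
import platform
import shutil


def vllm_prerequisites(system=None, has_nvidia_smi=None):
    """Heuristic check: Linux plus an `nvidia-smi` binary on PATH.

    Returns a list of human-readable problems; an empty list means
    no obvious blocker was found.
    """
    if system is None:
        system = platform.system()
    if has_nvidia_smi is None:
        has_nvidia_smi = shutil.which("nvidia-smi") is not None
    problems = []
    if system != "Linux":
        problems.append(f"vLLM extras need Linux, found {system}")
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found on PATH; the NVIDIA driver may be missing")
    return problems


if __name__ == "__main__":
    issues = vllm_prerequisites()
    print("No obvious blockers found" if not issues else "\n".join(issues))
```

An empty result only means the basic checks passed; the actual `pip install -e ".[vllm]"` step remains the real test.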