Installation

Before you start, please make sure you have Python 3.10+ available.
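
As a quick sanity check for the requirement above, the following sketch tests the interpreter version from the shell. It assumes the interpreter is reachable as `python3` on your PATH, and `check_python` is a hypothetical helper name used only for this illustration.

```shell
# Succeeds (exit code 0) only if python3 reports version 3.10 or newer.
check_python() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'
}

if check_python; then
  echo "Python version OK"
else
  echo "Python 3.10+ required"
fi
```

If your system exposes the interpreter under a different name (for example `python3.10`), substitute that name in the helper.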

Building DGT from source lets you make changes to the code base. To install from source, clone the repository and install with the following commands:

git clone git@github.com:IBM/fms-dgt.git
cd fms-dgt

Now let's set up your virtual environment.

python3.10 -m venv ssdg_venv
source ssdg_venv/bin/activate

To install the packages with pip, we recommend starting with the following:

pip install -e ".[all]"

If you plan on contributing, install the pre-commit hooks to keep code formatting clean:

pip install pre-commit
pre-commit install

Alternatively, if you use uv, install the packages with:

uv sync --extra all

If you plan on contributing and use uv, install the pre-commit hooks as follows:

uv pip install pre-commit
uv run pre-commit install

Large Language Models (LLMs) Dependencies

DGT uses Large Language Models (LLMs) to generate synthetic data. The following LLM inference engines are supported:

| Engine | Additional Installation | Environment Variables | Supported APIs |
| --- | --- | --- | --- |
| Ollama | - | - | completion, chat_completion |
| WatsonX | - | WATSONX_API_KEY="", WATSONX_PROJECT_ID="" | completion, chat_completion |
| OpenAI | - | OPENAI_API_KEY="" | completion, chat_completion |
| Azure OpenAI | - | AZURE_OPENAI_API_KEY="" | completion, chat_completion |
| Anthropic Claude | - | ANTHROPIC_API_KEY="" | chat_completion |
| vLLM | pip install -e ".[vllm]" | - | completion, chat_completion |

Most of the LLM inference engines listed above are configured through environment variables. You can either export those variables before every run or save them in a .env file at the base of the fms-dgt repository directory.
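
The two options can be sketched as follows. The variable names come from the table above; the values in angle brackets are placeholders to replace with your own credentials, and the second command assumes you run it from the base of the fms-dgt repository.

```shell
# Option 1: export a variable for the current shell session only.
export OPENAI_API_KEY="<your-openai-api-key>"

# Option 2: append the variables to a .env file in the current directory.
cat >> .env <<'EOF'
WATSONX_API_KEY="<your-watsonx-api-key>"
WATSONX_PROJECT_ID="<your-watsonx-project-id>"
EOF
```

Because the .env file holds secrets, it is usually best kept out of version control.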

Warning

The vLLM dependencies require Linux and CUDA.
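
Before installing the vLLM extras, a rough check like the one below can tell you whether the machine looks suitable. It is only a heuristic sketch: it assumes that `nvidia-smi` (shipped with the NVIDIA driver) listing a GPU is a reasonable proxy for a usable CUDA setup, which may not hold on every system.

```shell
# uname -s prints the kernel name, e.g. "Linux" or "Darwin".
os_name=$(uname -s)
if [ "$os_name" = "Linux" ]; then
  echo "OS check passed: Linux"
else
  echo "OS check failed: found $os_name, but vLLM needs Linux"
fi

# nvidia-smi -L lists the GPUs visible to the NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L; then
  echo "NVIDIA driver and GPU detected"
else
  echo "nvidia-smi not found or no GPU visible; CUDA is likely unavailable"
fi
```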