Skip to content

Tools

Generating tool-calling training data requires answering three questions for every scenario:

  • What tools exist? Tool definitions must come from somewhere, be organized, and be available at generation time.
  • Which tools should appear in this scenario? Random selection produces coverage, but targeted selection produces richer training signal.
  • What happens when the model calls a tool? The framework needs to simulate or execute the call and return a result the assistant stage can use.

These three concerns map to three component families with a clear dependency structure:

ToolLoader ──► ToolRegistry ◄── ToolEnrichment
              ┌─────┴──────┐
              ▼            ▼
         ToolSampler   ToolEngine

ToolRegistry is the central store. Loaders populate it; enrichments augment it; samplers and engines consume it. Neither sampler nor engine modifies the registry after construction.

Components at a glance

Component Role When you need it
ToolLoader Reads tool definitions from a source (file, MCP, REST) Always
ToolRegistry Stores, validates, and exposes tool definitions Always
ToolEnrichment Augments tools with output schemas, embeddings, or dataflow edges When using topology-aware or embedding-based samplers
ToolSampler Selects a subset of tools per scenario In every generation stage that builds tool-calling scenarios
ToolEngine Executes or simulates tool calls at runtime In every generation stage that produces tool call/result pairs

YAML integration

The tools: block is a first-class field on any task, at the same level as datastore: and formatter:. Its three sub-keys map directly to the components above:

tools:
  registry:                         # required — one or more loader entries
    - type: file
      path: ${DGT_DATA_DIR}/weather_tools.yaml
      namespace: weather_api
  enrichments:                      # optional — omit if not needed
    - type: output_parameters
      lm_config:
        type: ollama
        model_id_or_path: granite3.3:8b
    - type: dataflow
      model: sentence-transformers/all-mpnet-base-v2
  engines:                          # optional — omit for registry-only tasks
    lm_sim:
      type: lm
      lm_config:
        type: ollama
        model_id_or_path: granite3.3:8b
        temperature: 0.0
        max_new_tokens: 512

At runtime, Task.__init__ builds the registry from the loader entries, runs enrichments in dependency order, and constructs the engine. The resulting task.tool_registry and task.tool_engine are ready for stages to consume.

Reading path

I want to... Go to
Define tools and load them from files or external servers Registry and Loaders
Understand enrichments and when to enable them Enrichments
Choose a sampling strategy for my recipe Samplers
Understand how tool calls are executed or simulated Engines