Observability

DiGiT writes structured telemetry alongside generated data so you can understand what happened during a run without adding any instrumentation to your own code. Every run produces two files in the telemetry/ directory: events.jsonl for lifecycle events and traces.jsonl for performance spans.

Output files

| File | Contains |
| --- | --- |
| telemetry/events.jsonl | Structured lifecycle events: run start/finish/error, task start/finish, epoch boundaries, postprocessing, rejected data points |
| telemetry/traces.jsonl | Performance spans: one record per LLM call or pipeline phase, with latency, token usage, model info, and an optional payload |

Both files are append-only JSONL: one JSON object per line. Files rotate at 100 MB, and rotated files older than 14 days are deleted automatically.
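Because each line is a standalone JSON object, the files can be consumed with nothing beyond the standard library. A minimal sketch (the helper name is illustrative, and two inline records stand in for a real events.jsonl):

```python
import json

def read_jsonl_lines(lines):
    """Yield one parsed record per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# In practice you would iterate open("telemetry/events.jsonl");
# these sample records stand in for the file here.
sample = [
    '{"event": "run_started", "run_id": "def456"}',
    '{"event": "run_finished", "run_id": "def456"}',
]
records = list(read_jsonl_lines(sample))
print(sorted({r["event"] for r in records}))  # ['run_finished', 'run_started']
```

The same helper works for traces.jsonl, since both files share the one-object-per-line format.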

Lifecycle events

Every record in events.jsonl carries build_id, run_id, event, timestamp, and a human-readable message. The build_id identifies the experiment; the run_id identifies one attempt (preserved across resume restarts so a resumed run shares the same run_id as the original).

| Event | When it fires |
| --- | --- |
| run_started | When execute_tasks() begins. Carries builder_name, task_names, and a resumed flag |
| run_finished | When all tasks complete successfully |
| run_errored | When an unhandled exception stops the run. Carries the exception message |
| task_started | When a task enters the active set |
| task_finished | When a task completes. Carries reason: complete, stalled_generation, or stalled_postprocessing |
| epoch_started | At the start of each generation epoch. Carries epoch, active_task_names, active_task_count |
| epoch_finished | At the end of each epoch, after postprocessing. Carries epoch, generation_attempts (how many generator batches ran this epoch), task_counts, and finish_reasons |
| postprocessing_finished | After postprocessing completes for an epoch. Carries epoch and task_counts with before/after counts per task, showing how many records were accepted vs. discarded |
| data_point_rejected | When a ValidatorBlock filters a record. Carries block_name, task_name, reason |

Example record:

{
  "event": "task_finished",
  "message": "Task 'public/examples/misconceptions' finished.",
  "build_id": "abc123",
  "run_id": "def456",
  "task_name": "public/examples/misconceptions",
  "reason": "complete",
  "timestamp": "2026-01-15T10:23:14.001Z"
}
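Since task_finished events carry a reason field, a quick pass over events.jsonl can summarize how a run's tasks ended. A sketch (helper name and inline sample records are illustrative):

```python
import json
from collections import Counter

def finish_reasons(lines):
    """Count why each task finished, from events.jsonl lines."""
    reasons = Counter()
    for line in lines:
        rec = json.loads(line)
        if rec.get("event") == "task_finished":
            reasons[rec.get("reason", "unknown")] += 1
    return reasons

sample = [
    '{"event": "task_finished", "reason": "complete"}',
    '{"event": "task_finished", "reason": "stalled_generation"}',
    '{"event": "epoch_finished", "epoch": 3}',
]
print(finish_reasons(sample))  # complete: 1, stalled_generation: 1
```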

Spans

DiGiT writes spans for both pipeline phases and individual LLM calls. All spans carry build_id and run_id to link them back to the originating run.

Pipeline spans

| span_name | What it covers | Key fields |
| --- | --- | --- |
| dgt.run | The entire run, from execute_tasks() start to finish | builder_name, task_names |
| dgt.epoch | One generation epoch, including the generator batch and the block chain | epoch, task_count |
| dgt.block | One invocation of a single block within an epoch | block_name, block_type |
| dgt.postprocessing | The postprocessing phase that runs at the end of each epoch | epoch, task_count |
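Assuming pipeline spans carry the same duration_ms field documented below for LLM-call spans (the docs state all spans record latency, but only name the field for LLM calls), per-phase time can be totaled with a small helper. The helper name and inline samples are illustrative:

```python
import json
from collections import defaultdict

def duration_by_span(lines):
    """Total duration_ms per span_name across traces.jsonl lines."""
    totals = defaultdict(int)
    for line in lines:
        rec = json.loads(line)
        if "duration_ms" in rec:
            totals[rec["span_name"]] += rec["duration_ms"]
    return dict(totals)

sample = [
    '{"span_name": "dgt.epoch", "duration_ms": 9000}',
    '{"span_name": "dgt.block", "duration_ms": 1200}',
    '{"span_name": "dgt.block", "duration_ms": 800}',
]
print(duration_by_span(sample))  # {'dgt.epoch': 9000, 'dgt.block': 2000}
```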

LLM call spans

Every record in traces.jsonl with span_name: dgt.llm_call corresponds to one batched LLM call. Spans are written by the LMProvider block after each call completes.

| Field | Description |
| --- | --- |
| span_name | Always dgt.llm_call |
| provider | Block type: ollama, openai, anthropic, etc. |
| model_id | The model_id_or_path value used |
| method | completion or chat_completion |
| batch_size | Number of requests in this call |
| duration_ms | Pure API latency (excludes semaphore wait time) |
| semaphore_wait_ms | Time spent waiting for a concurrency slot |
| prompt_tokens | Token count from the provider response |
| completion_tokens | Token count from the provider response |
| task_names | Deduplicated list of task names whose data was in this batch. Present when input dicts include task_name |
| build_id / run_id | Link the span to its run |

Example record:

{
  "span_name": "dgt.llm_call",
  "provider": "ollama",
  "model_id": "granite4:3b",
  "method": "chat_completion",
  "batch_size": 10,
  "duration_ms": 4821,
  "semaphore_wait_ms": 12,
  "prompt_tokens": 1840,
  "completion_tokens": 312,
  "task_names": ["public/examples/geography_qa"],
  "build_id": "abc123",
  "run_id": "def456"
}
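Filtering traces.jsonl to dgt.llm_call spans makes run-level cost and latency summaries straightforward. A sketch (helper name and inline sample spans are illustrative):

```python
import json

def llm_call_stats(lines):
    """Aggregate token usage and latency over dgt.llm_call spans."""
    calls = [json.loads(line) for line in lines]
    calls = [c for c in calls if c.get("span_name") == "dgt.llm_call"]
    return {
        "calls": len(calls),
        "prompt_tokens": sum(c["prompt_tokens"] for c in calls),
        "completion_tokens": sum(c["completion_tokens"] for c in calls),
        "mean_duration_ms": sum(c["duration_ms"] for c in calls) / len(calls),
    }

sample = [
    '{"span_name": "dgt.llm_call", "prompt_tokens": 1840,'
    ' "completion_tokens": 312, "duration_ms": 4821}',
    '{"span_name": "dgt.llm_call", "prompt_tokens": 900,'
    ' "completion_tokens": 100, "duration_ms": 1179}',
]
print(llm_call_stats(sample))
# {'calls': 2, 'prompt_tokens': 2740, 'completion_tokens': 412,
#  'mean_duration_ms': 3000.0}
```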

Token attribution with multi-task completion batches

When using completion mode with a multi-task run, a single API call may contain requests from several tasks. In that case task_names lists all of them, but prompt_tokens and completion_tokens are for the entire batch, not per task. For per-task token attribution, prefer single-task runs or chat_completion mode (which always sends one request per task item).
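One way to respect this caveat in analysis code is to attribute tokens only when a span's task_names is unambiguous, lumping multi-task batches into a catch-all bucket. A sketch (the helper name, the "(mixed)" label, and the sample task names are all illustrative):

```python
import json
from collections import Counter

def tokens_per_task(lines):
    """Attribute completion tokens to tasks where attribution is unambiguous.

    A span whose task_names has exactly one entry is attributed
    directly; multi-task batches are lumped under '(mixed)'.
    """
    totals = Counter()
    for line in lines:
        rec = json.loads(line)
        if rec.get("span_name") != "dgt.llm_call":
            continue
        names = rec.get("task_names", [])
        key = names[0] if len(names) == 1 else "(mixed)"
        totals[key] += rec.get("completion_tokens", 0)
    return dict(totals)

sample = [
    '{"span_name": "dgt.llm_call", "task_names": ["qa"],'
    ' "completion_tokens": 300}',
    '{"span_name": "dgt.llm_call", "task_names": ["qa", "summarize"],'
    ' "completion_tokens": 500}',
]
print(tokens_per_task(sample))  # {'qa': 300, '(mixed)': 500}
```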

Payload recording

By default, prompts and completions are not written to telemetry files. To include them for debugging, set DGT_TELEMETRY_RECORD_PAYLOADS=1. Each span then carries a prompt (or messages) field and a completion field, truncated to DGT_TELEMETRY_PAYLOAD_MAX_CHARS characters (default 4096); a payload_truncated: true flag is set when truncation occurs.
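Enabling payload recording for a single debugging run might look like the following (the task path mirrors the example elsewhere in these docs and is illustrative; the lower character cap is an arbitrary choice):

```shell
DGT_TELEMETRY_RECORD_PAYLOADS=1 \
DGT_TELEMETRY_PAYLOAD_MAX_CHARS=2048 \
python -m fms_dgt.public \
  --task-paths ./tasks/public/examples/qa/task.yaml \
  --restart
```

Setting the variables inline like this scopes payload recording to one invocation rather than the whole shell session.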

Payload recording and sensitive data

Prompts and completions may contain seed examples, generated outputs, or document content. Do not enable payload recording in environments where that data is sensitive, and ensure the telemetry/ directory is excluded from version control (it is in .gitignore by default).

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| DGT_TELEMETRY_DIR | telemetry/ | Directory for telemetry output files |
| DGT_TELEMETRY_DISABLE | (unset) | Set to any non-empty value to disable all telemetry file writing |
| DGT_TELEMETRY_RECORD_PAYLOADS | (unset) | Set to 1 to include prompts and completions in spans |
| DGT_TELEMETRY_PAYLOAD_MAX_CHARS | 4096 | Maximum characters per payload field |

Disabling telemetry

DGT_TELEMETRY_DISABLE=1 python -m fms_dgt.public \
  --task-paths ./tasks/public/examples/qa/task.yaml \
  --restart

When disabled, no files are written and no overhead is incurred. All other run behavior is unchanged.