Observability
DiGiT writes structured telemetry alongside generated data so you can understand what happened during a run without adding any instrumentation to your own code. Every run produces two files in the telemetry/ directory: events.jsonl for lifecycle events and traces.jsonl for performance spans.
Output files
| File | Contains |
|---|---|
| telemetry/events.jsonl | Structured lifecycle events: run start/finish/error, task start/finish, epoch boundaries, postprocessing, rejected data points |
| telemetry/traces.jsonl | Performance spans: one record per LLM call or pipeline phase with latency, token usage, model info, and optional payload |
Both files are append-only JSONL. Each record is one JSON object per line. Files rotate at 100 MB and rotated files older than 14 days are deleted automatically.
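Because each line is a standalone JSON object, the files can be loaded with the standard library alone. The following is a minimal sketch, not part of DiGiT: the helper names read_jsonl and load_telemetry are hypothetical, and the sketch reads only the two active files (the naming of rotated files is not described here, so they are not handled).

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def load_telemetry(directory: Path = Path("telemetry")) -> tuple[list[dict], list[dict]]:
    """Return (events, spans); a missing file yields an empty list.

    Hypothetical helper: only the active events.jsonl and traces.jsonl
    are read, rotated files are ignored.
    """
    events_path = directory / "events.jsonl"
    traces_path = directory / "traces.jsonl"
    events = list(read_jsonl(events_path)) if events_path.exists() else []
    spans = list(read_jsonl(traces_path)) if traces_path.exists() else []
    return events, spans
```

Calling load_telemetry() from the directory where the run was launched should pick up the default telemetry/ location.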
Lifecycle events
Every record in events.jsonl carries build_id, run_id, event, timestamp, and a human-readable message. The build_id identifies the experiment; the run_id identifies one attempt and is preserved across resumes, so a resumed run shares the run_id of the original.
| Event | When it fires |
|---|---|
| run_started | When execute_tasks() begins. Carries builder_name, task_names, resumed flag |
| run_finished | When all tasks complete successfully |
| run_errored | When an unhandled exception stops the run. Carries the exception message |
| task_started | When a task enters the active set |
| task_finished | When a task completes. Carries reason: complete, stalled_generation, or stalled_postprocessing |
| epoch_started | At the start of each generation epoch. Carries epoch, active_task_names, active_task_count |
| epoch_finished | At the end of each epoch, after postprocessing. Carries epoch, generation_attempts (how many generator batches ran this epoch), task_counts, and finish_reasons |
| postprocessing_finished | After postprocessing completes for an epoch. Carries epoch and task_counts with before/after counts per task, showing how many records were accepted vs. discarded |
| data_point_rejected | When a ValidatorBlock filters a record. Carries block_name, task_name, reason |
Example record:
{
  "event": "task_finished",
  "message": "Task 'public/examples/misconceptions' finished.",
  "build_id": "abc123",
  "run_id": "def456",
  "task_name": "public/examples/misconceptions",
  "reason": "complete",
  "timestamp": "2026-01-15T10:23:14.001Z"
}
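The events above are enough to reconstruct the outcome of a run after the fact. The sketch below is one possible way to do that, assuming the records have already been parsed into dicts; summarize_events and the status labels it returns are hypothetical names, not part of DiGiT. Because a resumed run keeps its run_id, the same run can log run_errored and later run_finished, so the sketch lets the most recent run-level event decide the status.

```python
from collections import Counter


def summarize_events(events: list[dict], run_id: str) -> dict:
    """Summarize lifecycle events for one run_id (hypothetical helper)."""
    rejected = Counter()   # (task_name, block_name) -> rejected data points
    finish_reasons = {}    # task_name -> reason from task_finished
    status = "not_seen"    # label used only by this sketch
    for record in events:
        if record.get("run_id") != run_id:
            continue
        event = record.get("event")
        if event == "run_started":
            status = "in_progress"
        elif event == "run_finished":
            status = "finished"
        elif event == "run_errored":
            status = "errored"
        elif event == "task_finished":
            finish_reasons[record.get("task_name")] = record.get("reason")
        elif event == "data_point_rejected":
            rejected[(record.get("task_name"), record.get("block_name"))] += 1
    return {
        "status": status,
        "finish_reasons": finish_reasons,
        "rejected": dict(rejected),
    }
```

A task whose reason is stalled_generation or stalled_postprocessing, combined with a high rejection count for one block, is a reasonable first place to look when a run produces less data than expected.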
Spans
DiGiT writes spans for both pipeline phases and individual LLM calls. All spans carry build_id and run_id to link them back to the originating run.
Pipeline spans
| span_name | What it covers | Key fields |
|---|---|---|
| dgt.run | The entire run from execute_tasks() start to finish | builder_name, task_names |
| dgt.epoch | One generation epoch, including the generator batch and the block chain | epoch, task_count |
| dgt.block | One invocation of a single block within an epoch | block_name, block_type |
| dgt.postprocessing | The postprocessing phase that runs at the end of each epoch | epoch, task_count |
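As a small illustration of working with pipeline spans, the sketch below counts dgt.block invocations per block for one run. It assumes spans have been parsed into dicts and relies only on the span_name, run_id, and block_name fields listed above; count_block_invocations is a hypothetical helper, and timing fields for pipeline spans are deliberately not used because they are not described here.

```python
from collections import Counter


def count_block_invocations(spans: list[dict], run_id: str) -> dict:
    """Count dgt.block spans per block_name for one run (hypothetical helper)."""
    counts = Counter()
    for span in spans:
        if span.get("run_id") != run_id:
            continue
        if span.get("span_name") == "dgt.block":
            # Fall back to a placeholder if a span lacks block_name.
            counts[span.get("block_name", "<unknown>")] += 1
    return dict(counts)
```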
LLM call spans
Every record in traces.jsonl with span_name: dgt.llm_call corresponds to one batched LLM call. Spans are written by the LMProvider block after each call completes.
| Field | Description |
|---|---|
| span_name | Always dgt.llm_call |
| provider | Block type: ollama, openai, anthropic, etc. |
| model_id | The model_id_or_path value used |
| method | completion or chat_completion |
| batch_size | Number of requests in this call |
| duration_ms | Pure API latency (excludes semaphore wait time) |
| semaphore_wait_ms | Time spent waiting for a concurrency slot |
| prompt_tokens | Token count from the provider response |
| completion_tokens | Token count from the provider response |
| task_names | Deduplicated list of task names whose data was in this batch. Present when input dicts include task_name |
| build_id / run_id | Links the span to its run |
Example record:
{
  "span_name": "dgt.llm_call",
  "provider": "ollama",
  "model_id": "granite4:3b",
  "method": "chat_completion",
  "batch_size": 10,
  "duration_ms": 4821,
  "semaphore_wait_ms": 12,
  "prompt_tokens": 1840,
  "completion_tokens": 312,
  "task_names": ["public/examples/geography_qa"],
  "build_id": "abc123",
  "run_id": "def456"
}
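Records like the one above can be rolled up into per-model totals for a quick cost and latency overview. This is a minimal sketch under the assumption that spans are already parsed into dicts; summarize_llm_calls is a hypothetical helper, and missing or null numeric fields are treated as zero, which is a choice of this sketch.

```python
from collections import defaultdict

NUMERIC_FIELDS = ("prompt_tokens", "completion_tokens", "duration_ms", "semaphore_wait_ms")


def summarize_llm_calls(spans: list[dict]) -> dict:
    """Aggregate dgt.llm_call spans per (provider, model_id) (hypothetical helper)."""
    totals = defaultdict(
        lambda: {"calls": 0, "requests": 0, **{name: 0 for name in NUMERIC_FIELDS}}
    )
    for span in spans:
        if span.get("span_name") != "dgt.llm_call":
            continue  # skip pipeline spans
        row = totals[(span.get("provider"), span.get("model_id"))]
        row["calls"] += 1
        row["requests"] += span.get("batch_size") or 0
        for name in NUMERIC_FIELDS:
            row[name] += span.get(name) or 0  # missing or null counts as 0
    return dict(totals)
```

Comparing the summed semaphore_wait_ms against duration_ms gives a rough sense of whether calls spend more time queued for a concurrency slot than waiting on the provider.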
Token attribution with multi-task completion batches
When using completion mode with a multi-task run, a single API call may contain requests from several tasks. In that case task_names lists all of them, but prompt_tokens and completion_tokens are for the entire batch, not per task. For per-task token attribution, prefer single-task runs or chat_completion mode (which always sends one request per task item).
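One conservative way to respect this caveat when analysing spans is to credit tokens to a task only when the span names exactly one task, and to keep everything else in a separate bucket. The sketch below illustrates that policy; attribute_tokens and the "unattributed" bucket are hypothetical, and the sketch makes no attempt to split a mixed batch between tasks.

```python
def attribute_tokens(spans: list[dict]) -> tuple[dict, dict]:
    """Split token counts into per-task totals and an unattributed bucket.

    Hypothetical helper: a span counts toward a task only when its
    task_names list holds exactly one name. Multi-task batches and spans
    without task_names go to the unattributed bucket.
    """
    per_task: dict = {}
    unattributed = {"prompt_tokens": 0, "completion_tokens": 0}
    for span in spans:
        if span.get("span_name") != "dgt.llm_call":
            continue
        names = span.get("task_names") or []
        if len(names) == 1:
            target = per_task.setdefault(
                names[0], {"prompt_tokens": 0, "completion_tokens": 0}
            )
        else:
            target = unattributed
        target["prompt_tokens"] += span.get("prompt_tokens") or 0
        target["completion_tokens"] += span.get("completion_tokens") or 0
    return per_task, unattributed
```

A large unattributed bucket is a signal that per-task numbers from this run should not be trusted for cost comparisons.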
Payload recording
By default, prompts and completions are not written to telemetry files. To include them for debugging, set DGT_TELEMETRY_RECORD_PAYLOADS=1. Each span will then carry prompt or messages and completion fields, truncated to DGT_TELEMETRY_PAYLOAD_MAX_CHARS characters (default 4096). A payload_truncated: true flag is set when truncation occurs.
Payload recording and sensitive data
Prompts and completions may contain seed examples, generated outputs, or document content. Do not enable payload recording in environments where that data is sensitive, and ensure the telemetry/ directory is excluded from version control (it is in .gitignore by default).
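Before sharing a traces.jsonl file, it can be worth checking whether any spans actually carry payloads. The following sketch counts spans that contain a prompt, messages, or completion field and spans flagged payload_truncated; payload_report is a hypothetical helper, and it assumes the flag is the JSON boolean true when present.

```python
PAYLOAD_FIELDS = ("prompt", "messages", "completion")


def payload_report(spans: list[dict]) -> dict:
    """Count spans that carry payload fields or a truncation flag (hypothetical helper)."""
    with_payload = 0
    truncated = 0
    for span in spans:
        if any(name in span for name in PAYLOAD_FIELDS):
            with_payload += 1
        if span.get("payload_truncated") is True:
            truncated += 1
    return {"spans": len(spans), "with_payload": with_payload, "truncated": truncated}
```

A with_payload count of zero suggests the file was produced with payload recording off, although it does not by itself guarantee that no sensitive data appears in other fields.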
Environment variables
| Variable | Default | Description |
|---|---|---|
| DGT_TELEMETRY_DIR | telemetry/ | Directory for telemetry output files |
| DGT_TELEMETRY_DISABLE | (unset) | Set to any non-empty value to disable all telemetry file writing |
| DGT_TELEMETRY_RECORD_PAYLOADS | (unset) | Set to 1 to include prompts and completions in spans |
| DGT_TELEMETRY_PAYLOAD_MAX_CHARS | 4096 | Maximum characters per payload field |
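The semantics in the table can be sketched as a small settings reader. This is an illustration of the documented defaults only, not DiGiT's own configuration code: TelemetrySettings and read_telemetry_settings are hypothetical names, and how DiGiT treats values of DGT_TELEMETRY_RECORD_PAYLOADS other than 1 is not stated, so the sketch simply treats them as off.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class TelemetrySettings:
    directory: Path
    disabled: bool
    record_payloads: bool
    payload_max_chars: int


def read_telemetry_settings(env: Optional[Mapping[str, str]] = None) -> TelemetrySettings:
    """Resolve the documented variables and defaults (hypothetical helper)."""
    env = os.environ if env is None else env
    return TelemetrySettings(
        directory=Path(env.get("DGT_TELEMETRY_DIR", "telemetry/")),
        # Any non-empty value disables telemetry, including "0" or "false".
        disabled=bool(env.get("DGT_TELEMETRY_DISABLE", "")),
        # Assumption of this sketch: only the literal "1" enables payloads.
        record_payloads=env.get("DGT_TELEMETRY_RECORD_PAYLOADS") == "1",
        payload_max_chars=int(env.get("DGT_TELEMETRY_PAYLOAD_MAX_CHARS", "4096")),
    )
```

One consequence of the "any non-empty value" rule is that DGT_TELEMETRY_DISABLE=0 still disables telemetry; unset the variable to turn telemetry back on.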
Disabling telemetry
DGT_TELEMETRY_DISABLE=1 python -m fms_dgt.public \
--task-paths ./tasks/public/examples/qa/task.yaml \
--restart
When disabled, no files are written and no overhead is incurred. All other run behavior is unchanged.