Use this page if your team is new to model canaries.

Start here (10-minute setup)

  1. Keep your current model as control.
  2. Set your fine-tuned model as the canary.
  3. Start with 5% of traffic on a single repo.
  4. Watch monitoring metrics for 2-7 days.
  5. Promote only if quality and stability stay healthy.

Environment settings

Set these on the apps/www runtime host:
# Ollama connection
GITHUB_LABELER_USE_OLLAMA=true
GITHUB_LABELER_OLLAMA_BASE_URL=http://127.0.0.1:11434

# Control lane (current stable model)
GITHUB_LABELER_OLLAMA_MODEL=koji-local

# Canary lane (fine-tuned model)
GITHUB_LABELER_CANARY_MODEL=koji-local-ft-v1
GITHUB_LABELER_CANARY_PERCENT=5
GITHUB_LABELER_CANARY_ALLOW_REPOS=ClarkPhan/dojo

# Ollama request timeout, in milliseconds
GITHUB_LABELER_OLLAMA_TIMEOUT_MS=2500

# JSONL event log (input to pnpm labeler:report)
GITHUB_LABELER_LOG_FILE=/home/clark/.openclaw/workspace/logs/github-label-flywheel.jsonl
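The sketch below shows one way a host process might read the settings above into a typed config. It is hypothetical: the `LabelerConfig` shape and the `loadLabelerConfig` and `parsePercent` helpers are not Dojo code. These rules are also assumptions:

- the allowlist is comma-separated;
- the percentage is clamped to 0 to 100;
- a missing canary model forces the canary off.

```typescript
// Hypothetical loader for the GITHUB_LABELER_* settings. The comma-separated
// allowlist, clamping, and "no canary model means canary off" rules are assumptions.
interface LabelerConfig {
  useOllama: boolean;
  ollamaBaseUrl: string | undefined;
  controlModel: string | undefined;
  canaryModel: string | undefined;
  canaryPercent: number; // 0..100; 0 keeps all traffic on control
  canaryAllowRepos: string[];
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Parse a percentage, clamp it to 0..100, and fall back to 0 on bad input.
function parsePercent(raw: string | undefined): number {
  const n = Number(raw ?? "0");
  if (!Number.isFinite(n)) return 0;
  return Math.min(100, Math.max(0, n));
}

function loadLabelerConfig(env: Env): LabelerConfig {
  const canaryModel = env.GITHUB_LABELER_CANARY_MODEL || undefined;
  return {
    useOllama: env.GITHUB_LABELER_USE_OLLAMA === "true",
    ollamaBaseUrl: env.GITHUB_LABELER_OLLAMA_BASE_URL,
    controlModel: env.GITHUB_LABELER_OLLAMA_MODEL,
    canaryModel,
    // Without a canary model there is nothing to route to, so force 0%.
    canaryPercent: canaryModel ? parsePercent(env.GITHUB_LABELER_CANARY_PERCENT) : 0,
    canaryAllowRepos: (env.GITHUB_LABELER_CANARY_ALLOW_REPOS ?? "")
      .split(",")
      .map((repo) => repo.trim())
      .filter((repo) => repo.length > 0),
    // Assumed fallback that mirrors the example value above.
    timeoutMs: Number(env.GITHUB_LABELER_OLLAMA_TIMEOUT_MS ?? "2500"),
  };
}
```

In a Node process this would be called as `loadLabelerConfig(process.env)`. Failing closed to 0% means a typo in the percentage cannot accidentally send traffic to the canary.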

How routing works

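For each PR event, Dojo computes a stable bucket from the repo name and PR number (repo + prNumber).
  • If the bucket falls within the GITHUB_LABELER_CANARY_PERCENT share of buckets, the event routes to the canary.
  • Otherwise, it routes to control.
  • If the canary is disabled (GITHUB_LABELER_CANARY_PERCENT=0), all traffic stays on control.
Because the bucket depends only on repo and PR number, every event for the same PR lands in the same lane at a given percentage. Assignment therefore stays stable and does not flap randomly between models.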
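Here is a minimal sketch of deterministic bucketing, assuming a 0 to 99 bucket range. The hash input is SHA-256 over `repo#prNumber`, taken from Node's `node:crypto`. The hash, the `#` separator, and the `canaryBucket` and `routeLane` helpers are illustrative assumptions, not Dojo's actual implementation.

```typescript
import { createHash } from "node:crypto";

type ModelLane = "control" | "canary";

// Hypothetical: map repo + PR number to a stable bucket in [0, 100).
// The same inputs always hash to the same bucket, so a PR never changes lanes
// while the canary percentage stays fixed.
function canaryBucket(repoFullName: string, prNumber: number): number {
  const digest = createHash("sha256").update(`${repoFullName}#${prNumber}`).digest();
  // Read the first 4 bytes as an unsigned 32-bit integer, then reduce to 0..99.
  return digest.readUInt32BE(0) % 100;
}

// Buckets below canaryPercent go to the canary; everything else stays on control.
function routeLane(repoFullName: string, prNumber: number, canaryPercent: number): ModelLane {
  if (canaryPercent <= 0) return "control"; // canary disabled
  return canaryBucket(repoFullName, prNumber) < canaryPercent ? "canary" : "control";
}
```

Comparing against a threshold has a useful property when you climb the rollout ladder. Raising the percentage from 5 to 15 only moves PRs from control to canary, so PRs already on the canary stay there.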

What gets logged

Each JSONL event stores:
  • where the decision came from (source, repoFullName, prNumber)
  • which model lane handled it (modelLane, modelName, canaryBucket)
  • both the deterministic and the model label outputs
  • the final selected labels and the label changes applied
  • a disagreement signal (modelOverrodeDeterministic)
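A hypothetical TypeScript view of one logged event, plus a tolerant JSONL parser, is sketched below. These field names come from the list above:

- `source`, `repoFullName`, `prNumber`
- `modelLane`, `modelName`, `canaryBucket`
- `modelOverrodeDeterministic`

The label-array field names (`deterministicLabels`, `modelLabels`, `selectedLabels`) and the `parseEvents` helper are assumptions.

```typescript
// Hypothetical event shape; label-array field names are assumed, not documented.
interface LabelFlywheelEvent {
  source: string;
  repoFullName: string;
  prNumber: number;
  modelLane: "control" | "canary";
  modelName: string;
  canaryBucket: number;
  deterministicLabels: string[]; // assumed name
  modelLabels: string[]; // assumed name
  selectedLabels: string[]; // assumed name
  modelOverrodeDeterministic: boolean;
}

// Parse JSONL text one line at a time, skipping blank or malformed lines
// instead of failing the whole file.
function parseEvents(jsonl: string): LabelFlywheelEvent[] {
  const events: LabelFlywheelEvent[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const obj = JSON.parse(trimmed);
      // Minimal sanity check: keep only records that identify a PR.
      if (typeof obj.repoFullName === "string" && typeof obj.prNumber === "number") {
        events.push(obj as LabelFlywheelEvent);
      }
    } catch {
      // Malformed line (for example, a partially written last record): skip it.
    }
  }
  return events;
}
```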

Analyze data

Generate JSON and Markdown reports from the event log:

pnpm labeler:report \
  --file /home/clark/.openclaw/workspace/logs/github-label-flywheel.jsonl \
  --out-json /home/clark/.openclaw/workspace/logs/github-label-flywheel-report.json \
  --out-md /home/clark/.openclaw/workspace/logs/github-label-flywheel-report.md
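pnpm labeler:report is the supported tool. The sketch below is a hypothetical illustration of one control-versus-canary comparison worth looking for: event counts and how often the model overrode the deterministic labels in each lane. It does not describe what labeler:report actually computes, and `summarizeByLane` is an illustrative name.

```typescript
// Hypothetical per-lane summary; not the labeler:report implementation.
type Lane = "control" | "canary";

interface LaneEvent {
  modelLane: Lane;
  modelOverrodeDeterministic: boolean;
}

interface LaneSummary {
  events: number;
  overrides: number;
  overrideRate: number; // overrides / events, or 0 when a lane has no events
}

function summarizeByLane(events: LaneEvent[]): Record<Lane, LaneSummary> {
  const out: Record<Lane, LaneSummary> = {
    control: { events: 0, overrides: 0, overrideRate: 0 },
    canary: { events: 0, overrides: 0, overrideRate: 0 },
  };
  for (const e of events) {
    const s = out[e.modelLane];
    if (!s) continue; // ignore records with an unknown lane
    s.events += 1;
    if (e.modelOverrodeDeterministic) s.overrides += 1;
  }
  for (const s of Object.values(out)) {
    s.overrideRate = s.events === 0 ? 0 : s.overrides / s.events;
  }
  return out;
}
```

A higher override rate on the canary is not automatically good or bad. It only says the fine-tuned model disagrees with the deterministic rules more often, so read it alongside human corrections before promoting.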

Promotion and rollback

Default rollout ladder:
  1. 5%
  2. 15%
  3. 30%
  4. 50%
  5. 100% only after stable outcomes and human review
Instant rollback (all traffic returns to control):
GITHUB_LABELER_CANARY_PERCENT=0
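The ladder and rollback rule can be sketched as a small step function. This is a hypothetical helper, not part of Dojo: `nextCanaryPercent` is an illustrative name, and the `healthy` and `humanApproved` inputs stand in for whatever monitoring and review process your team uses.

```typescript
// Rollout ladder from this page; the health and approval inputs are hypothetical.
const ROLLOUT_LADDER = [5, 15, 30, 50, 100] as const;

// Returns the next GITHUB_LABELER_CANARY_PERCENT value.
// - Unhealthy: roll back to 0 (all traffic on control).
// - The final step to 100% also requires human approval.
// - Otherwise, move to the next rung above the current value.
function nextCanaryPercent(current: number, healthy: boolean, humanApproved = false): number {
  if (!healthy) return 0;
  const next = ROLLOUT_LADDER.find((step) => step > current);
  if (next === undefined) return current; // already at 100%
  if (next === 100 && !humanApproved) return current; // hold at 50% pending review
  return next;
}
```

Rollback deliberately ignores the current rung: any unhealthy signal drops straight to 0 rather than stepping down the ladder.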

Data lake note

For training, ship the JSONL logs to object storage (S3, R2, or GCS), partitioned by date and repo. The most valuable training signal is the history of human corrections.

Related:
  • Dojo → Label flywheel monitoring
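Here is a sketch of an object-key layout that partitions by date and repo. The prefix, the Hive-style `date=`/`repo=` segments, UTC dates, and the `__` replacement for the slash in `owner/repo` are all assumptions. `partitionKey` is a hypothetical helper.

```typescript
// Hypothetical object-key builder; the layout is an assumption, not a Dojo convention.
function partitionKey(prefix: string, repoFullName: string, eventTime: Date, fileName: string): string {
  // UTC calendar date as YYYY-MM-DD, so partitions do not depend on host time zone.
  const date = eventTime.toISOString().slice(0, 10);
  // "owner/repo" contains a slash; replace it so the repo stays one key segment.
  const repo = repoFullName.split("/").join("__");
  return `${prefix}/date=${date}/repo=${repo}/${fileName}`;
}
```

For example, an event at 23:30 UTC on 1 May 2024 for ClarkPhan/dojo would land under `label-flywheel/date=2024-05-01/repo=ClarkPhan__dojo/`. Many query engines can prune `key=value` path segments like these as Hive-style partitions.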