Daily checklist (5 minutes)
- Is canary traffic close to target percent?
- Are error and timeout rates within limits?
- Is disagreement stable (not spiking)?
- Any obvious quality regressions from reviewers?
Weekly checklist (15 minutes)
- Compare canary vs control on quality metrics
- Review top disagreement examples manually
- Decide: promote one step, hold, or roll back
Core metrics and targets
| Metric | Target |
|---|---|
| Canary traffic share | near configured percent |
| Error rate | < 1% |
| Timeout rate | < 2% |
| Override rate | stable vs control |
| Type disagreement | stable vs control |
| Risk disagreement | stable vs control |
Alert policy
Create alerts for:
- error rate > 1% for 15m
- timeout rate > 2% for 15m
- disagreement rate > 2x 7-day baseline
- data ingestion completeness < 99.5% daily
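The baseline-relative disagreement alert above can be expressed as a saved question that an alerting tool evaluates daily. This is a sketch only: the column names (`event_ts`, `type_disagrees`, `risk_disagrees`) are assumptions about the telemetry schema, not confirmed fields.

```sql
-- Sketch: daily disagreement rate vs trailing 7-day baseline.
-- Assumed columns: event_ts, type_disagrees, risk_disagrees (adjust to your schema).
-- Alert when ratio > 2.
with daily as (
  select
    date_trunc('day', event_ts) as day,
    avg(case when type_disagrees or risk_disagrees then 1.0 else 0.0 end)
      as disagreement_rate
  from analytics.label_flywheel_events
  group by 1
)
select
  d.day,
  d.disagreement_rate,
  avg(b.disagreement_rate) as baseline_7d,
  d.disagreement_rate / nullif(avg(b.disagreement_rate), 0) as ratio
from daily d
join daily b
  on b.day >= d.day - interval '7 days'
 and b.day <  d.day
group by d.day, d.disagreement_rate
order by d.day desc;
```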
Promotion ladder
- 5% for 2-7 days
- 15% for 2-7 days
- 30% for 2-7 days
- 50% for 2-7 days
- 100% only after stable metrics + human review
Sample Metabase SQL (starter set)
Assume table: analytics.label_flywheel_events
1) Canary traffic share by hour
2) Error + timeout rate by lane
3) Override + disagreement by lane
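The three starter queries above can be sketched as follows. Column names (`event_ts`, `lane` with values 'canary'/'control', `is_error`, `is_timeout`, `was_overridden`, `type_disagrees`, `risk_disagrees`) are assumptions; rename them to match the actual event schema before saving these as Metabase questions.

```sql
-- 1) Canary traffic share by hour (target: near configured percent)
select
  date_trunc('hour', event_ts) as hour,
  avg(case when lane = 'canary' then 1.0 else 0.0 end) as canary_share
from analytics.label_flywheel_events
where event_ts >= now() - interval '7 days'
group by 1
order by 1;

-- 2) Error + timeout rate by lane (targets: < 1% and < 2%)
select
  lane,
  avg(case when is_error   then 1.0 else 0.0 end) as error_rate,
  avg(case when is_timeout then 1.0 else 0.0 end) as timeout_rate
from analytics.label_flywheel_events
where event_ts >= now() - interval '24 hours'
group by 1;

-- 3) Override + disagreement by lane (target: stable vs control)
select
  lane,
  avg(case when was_overridden then 1.0 else 0.0 end) as override_rate,
  avg(case when type_disagrees then 1.0 else 0.0 end) as type_disagreement_rate,
  avg(case when risk_disagrees then 1.0 else 0.0 end) as risk_disagreement_rate
from analytics.label_flywheel_events
where event_ts >= now() - interval '7 days'
group by 1;
```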
Keep this maintainable
- Keep one canonical telemetry table/view for dashboard queries.
- Version your schema before adding/removing fields.
- Keep promotion thresholds in this page only (single source of truth).
- Save one weekly decision note: promote / hold / rollback + reason.
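One way to enforce the canonical-view rule above: point every dashboard query at a single view over the raw events table, so schema changes happen in one place. The view name and column list below are hypothetical, assuming the same columns as the starter queries.

```sql
-- Sketch: one canonical view for all dashboard queries (name is hypothetical).
-- Version this definition before adding or removing fields.
create or replace view analytics.v_label_flywheel_dashboard as
select
  event_ts,
  lane,
  is_error,
  is_timeout,
  was_overridden,
  type_disagrees,
  risk_disagrees
from analytics.label_flywheel_events;
```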