I’m currently working on a production ML project, so I can’t share specific details about the domain or data.
We have a deployed model whose performance looks stable in offline evaluation, but in production we suspect gradual drift. The challenge is that reliable ground truth only arrives weeks or months later, which makes continuous validation difficult.
I’m trying to understand practical approaches teams use in this situation:
- How do you monitor model health before labels arrive?
- What signals have you found most useful as early indicators of drift? (There's a toy sketch below of the kind of label-free check I mean.)
- How do you balance reacting early vs. avoiding false alarms?
Looking for general patterns, tooling approaches, or lessons learned rather than domain-specific solutions.
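
To make the "signals" and "false alarm" bullets concrete, here's roughly the kind of label-free check I have in mind: compare the distribution of recent model scores (or individual input features) against a reference window captured around deployment, and only alert on sustained shifts. This is just a toy sketch on synthetic data using numpy/scipy; the beta distributions, PSI threshold, and window counts are placeholders, not anything we actually run.

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(reference, current, bins=10):
    """Population Stability Index between two 1-D samples.

    Bin edges come from the reference window's quantiles, so each
    reference bin holds roughly equal mass.
    """
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Stretch the outer edges so out-of-range current values still get counted.
    edges[0] = min(edges[0], current.min()) - 1e-9
    edges[-1] = max(edges[-1], current.max()) + 1e-9
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid log(0) / division by zero for empty bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Synthetic stand-ins for what would actually be logged in production.
rng = np.random.default_rng(0)
reference_scores = rng.beta(2, 5, size=5_000)   # scores captured at deploy time
current_scores = rng.beta(3, 5, size=1_000)     # recent scores, shifted upward

print("PSI on model scores:", round(psi(reference_scores, current_scores), 4))
print("KS test on scores:  ", ks_2samp(reference_scores, current_scores))

# Naive alarm suppression: only page when several consecutive windows breach.
PSI_THRESHOLD = 0.2        # placeholder; 0.1 / 0.25 are the usual rules of thumb
CONSECUTIVE_WINDOWS = 3    # placeholder

breaches = 0
for window in np.array_split(current_scores, 5):   # pretend these are daily windows
    breaches = breaches + 1 if psi(reference_scores, window) > PSI_THRESHOLD else 0
    if breaches >= CONSECUTIVE_WINDOWS:
        print("drift alarm: sustained shift in score distribution")
```

In practice I imagine running the same comparison per input feature as well as on the score distribution, which is partly why I'm asking which of these signals people have found to actually correlate with the label-based metrics once ground truth finally lands.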
