How do teams handle model drift in production when ground truth arrives late?

Maitrik
Updated 7 days ago

I’m currently working on a production ML project, so I can’t share specific details about the domain or data.

We have a deployed model where performance looks stable in offline evaluation, but in real usage we suspect gradual drift. The challenge is that reliable ground truth only becomes available weeks or months later, which makes continuous validation difficult.

I’m trying to understand practical approaches teams use in this situation:

  • How do you monitor model health before labels arrive?
  • What signals have you found most useful as early indicators of drift?
  • How do you balance reacting early vs avoiding false alarms?

Looking for general patterns, tooling approaches, or lessons learned rather than domain-specific solutions.

5 days ago

This kind of drop is common when moving from random splits to time-based splits and often indicates that the original setup benefited from leakage or unrealistically easy correlations. Random splits allow the model to see future patterns indirectly, which inflates performance.
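As a rough sketch of what I mean, comparing a shuffled k-fold score against a strictly past-to-future holdout on the same data usually exposes the gap. Assumptions below (not from this thread): a pandas DataFrame `df` with a `timestamp` column, feature columns, and a binary `target` column; the estimator and the 80/20 cutoff are placeholders.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, cross_val_score

def compare_split_strategies(df: pd.DataFrame, feature_cols: list[str]) -> None:
    """Contrast a shuffled k-fold score with a strictly time-ordered holdout."""
    X, y = df[feature_cols], df["target"]
    model = GradientBoostingClassifier()

    # Shuffled k-fold: "future" rows end up in the training folds,
    # which tends to inflate the apparent performance.
    random_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    random_auc = cross_val_score(model, X, y, cv=random_cv, scoring="roc_auc").mean()

    # Time-based holdout: train only on the oldest 80%, score on the newest 20%.
    ordered = df.sort_values("timestamp")
    cutoff = int(len(ordered) * 0.8)
    train, test = ordered.iloc[:cutoff], ordered.iloc[cutoff:]
    model.fit(train[feature_cols], train["target"])
    temporal_auc = roc_auc_score(
        test["target"], model.predict_proba(test[feature_cols])[:, 1]
    )

    print(f"shuffled k-fold AUC:    {random_auc:.3f}")
    print(f"time-based holdout AUC: {temporal_auc:.3f}")
```

If the shuffled number is noticeably higher than the time-ordered one, that gap is the inflation I'm describing.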

Tree-based models can struggle when feature distributions shift over time, so it’s worth checking feature drift and target stability. Monitoring feature importance changes and score distributions can help confirm this.
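For the label-free part, something along these lines is usually enough to start. This is a minimal sketch, assuming `reference` and `current` are 1-D arrays of model scores (or one feature's values) from an older window and a recent window; the bin count and the PSI "investigate above 0.2" rule of thumb are conventional defaults, not anything specific to this thread.

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(reference: np.ndarray, current: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between a reference window and a current window."""
    # Bin edges come from the reference window; current values outside that
    # range are clipped into the outer bins.
    _, edges = np.histogram(reference, bins=n_bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)  # avoid log(0) on empty bins
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

def drift_report(reference: np.ndarray, current: np.ndarray) -> dict:
    """Label-free drift signals: PSI plus a two-sample KS test."""
    ks_stat, ks_p = ks_2samp(reference, current)
    return {
        "psi": psi(reference, current),  # >0.2 is a common "investigate" threshold
        "ks_stat": ks_stat,
        "ks_pvalue": ks_p,
    }
```

Run it per feature and on the score distribution itself; a handful of features crossing the threshold at once is a much stronger signal than one noisy feature.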

In most cases, the time-based result is the more honest signal. From there, techniques like rolling validation, feature decay, or retraining schedules usually matter more than model choice.
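For rolling validation specifically, a walk-forward loop with scikit-learn's TimeSeriesSplit is the simplest version. Sketch only, assuming `X` and `y` are already sorted by time; the estimator, metric, and number of splits are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

def rolling_validation(X: np.ndarray, y: np.ndarray, n_splits: int = 5) -> list[float]:
    """Train on each expanding past window, score on the next future window.

    A downward trend across folds is itself an early drift signal and can
    feed a retraining schedule.
    """
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = GradientBoostingClassifier()
        model.fit(X[train_idx], y[train_idx])
        scores.append(
            roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1])
        )
    return scores
```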
