A drop in performance after moving to a time-based split is very common and, in many cases, expected.
Random splits often make models look better than they will perform in reality because past and future data get mixed. This can hide subtle leakage or allow the model to rely on patterns that don’t hold once time is respected. When you switch to a time-based split, you’re forcing the model to predict on a genuinely new distribution, which is closer to how it will behave in production.
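A minimal synthetic sketch of why the two splits disagree (all data, drift, and parameters here are invented for illustration, not taken from your setup):

```python
# Hypothetical sketch: the same Random Forest scores higher under a random
# split than under a chronological split when the feature-target
# relationship drifts over time.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
t = np.arange(n)                          # pseudo-timestamp
X = rng.normal(size=(n, 5))
coef = 1.0 - 1.5 * t / n                  # effect of X[:, 0] drifts from +1 to -0.5
y = (coef * X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Random split: past and future rows are mixed in both train and test
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
auc_random = roc_auc_score(
    y_te,
    RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1],
)

# Time-based split: train strictly on the past, evaluate on the future
cut = int(n * 0.75)
auc_time = roc_auc_score(
    y[cut:],
    RandomForestClassifier(random_state=0).fit(X[:cut], y[:cut]).predict_proba(X[cut:])[:, 1],
)
print(f"random-split AUC: {auc_random:.3f}  time-based AUC: {auc_time:.3f}")
```

Because the sign of the relationship has drifted by the end of the series, the chronological split reports a much lower AUC even though the model and data pipeline are identical.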
A few observations from practice:
- The gap often indicates that the random split benefited from information leakage or overly stable correlations.
- Tree-based models like Random Forests are not inherently worse here, but they do tend to pick up short-term patterns that may shift over time.
- Time-based validation exposes concept drift and feature instability that random splits simply don’t surface.
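The leakage point is easy to reproduce. Here is a hypothetical sketch (the `group` and `enc_leaky` names and all data are invented): a target encoding computed over the full dataset, test rows included, gives a random split an optimistic AUC even when there is no real signal at all:

```python
# Hypothetical sketch: a per-group target mean computed over ALL rows
# (test included) leaks label information that a random split rewards.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "group": rng.integers(0, 500, size=n),  # e.g. 500 customers, ~4 rows each
    "y": rng.integers(0, 2, size=n),        # labels independent of group: no true signal
})
# Leaky feature: per-group mean label over the full dataset, test rows included
df["enc_leaky"] = df.groupby("group")["y"].transform("mean")

X_tr, X_te, y_tr, y_te = train_test_split(
    df[["enc_leaky"]], df["y"], test_size=0.25, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC despite zero real signal: {auc:.3f}")  # well above 0.5, purely from leakage
```

The fix is to compute such encodings from the training period only, which a time-based split naturally forces you to confront.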
To diagnose what’s happening, teams usually:
- Compare feature distributions between training and test periods
- Check whether any features implicitly encode future information
- Use rolling or expanding window validation instead of a single split
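The first and last diagnostics above can be sketched with SciPy's `ks_2samp` and scikit-learn's `TimeSeriesSplit` (an expanding-window splitter). The data, drift magnitude, and 0.01 threshold below are made-up assumptions for illustration:

```python
# Hypothetical sketch of two diagnostics: (1) compare train/test feature
# distributions with a KS test, (2) replace the single split with
# expanding-window validation.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 4))
X[:, 0] += np.linspace(0.0, 1.5, n)        # feature 0 drifts upward over time
y = (X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 1) Feature-distribution check between training and test periods
cut = int(n * 0.75)
for j in range(X.shape[1]):
    stat, p = ks_2samp(X[:cut, j], X[cut:, j])
    if p < 0.01:                            # crude flag for drifted features
        print(f"feature {j} drifts: KS={stat:.2f}, p={p:.1e}")

# 2) Expanding-window validation: every fold trains on the past only
aucs = []
for tr_idx, te_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = RandomForestClassifier(random_state=0).fit(X[tr_idx], y[tr_idx])
    aucs.append(roc_auc_score(y[te_idx], model.predict_proba(X[te_idx])[:, 1]))
print("per-fold AUC:", np.round(aucs, 3))
```

Looking at the spread of per-fold AUCs, not just their mean, tells you whether performance is stable over time or degrading as the data drifts.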
In most cases, the lower AUC from the time-based split is the more honest metric. It doesn’t mean the model got worse; it means the evaluation is now aligned with real-world conditions.
