This is pretty common when switching to a time-based split. A random split often hides issues because information from the future can leak into training, even indirectly. Once you respect time, the problem usually becomes harder and performance drops.
In my experience, this often points to either subtle feature leakage (features that wouldn’t exist at prediction time) or genuine concept drift where patterns change over time. Tree-based models aren’t uniquely sensitive, but they can amplify these effects if the data distribution shifts.
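One cheap way to check for drift in a single feature is to compare its distribution between an early and a late period. A minimal sketch using a Population Stability Index (a common drift metric; the thresholds in the docstring are rules of thumb, not from your data):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature.
    Rough rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 large.
    Bin edges come from the 'expected' (earlier) sample, so late values
    outside that range are simply dropped -- fine for a quick check."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) and division by zero in sparse bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Run it per feature with the training period as `expected` and the holdout period as `actual`; features with a large PSI are the first candidates for either drift or leakage.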
A few things that helped me diagnose this: checking feature stability over time, comparing feature importance across periods, and validating with rolling or expanding windows instead of a single split. The lower AUC doesn't necessarily mean the model got worse; it's usually a more honest estimate of real-world performance.
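The expanding-window idea can be sketched in a few lines. This is a hand-rolled version (sklearn's `TimeSeriesSplit` does essentially the same thing); `min_train` and the equal-width folds are my assumptions, not anything from your setup:

```python
import numpy as np

def expanding_window_splits(n_samples, n_splits, min_train=None):
    """Yield (train_idx, test_idx) pairs where each test fold is a
    contiguous future block and the train set is everything before it,
    so no test sample ever precedes a training sample."""
    fold = n_samples // (n_splits + 1)
    if min_train is None:
        min_train = fold
    for i in range(1, n_splits + 1):
        train_end = min_train + (i - 1) * fold
        test_end = min(train_end + fold, n_samples)
        yield np.arange(0, train_end), np.arange(train_end, test_end)

# Usage: score the model on each future block and look at the spread,
# not just the mean -- a big spread across folds is itself a drift signal.
for train_idx, test_idx in expanding_window_splits(100, 4):
    print(train_idx[-1], test_idx[0], test_idx[-1])
```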