RE: Why does model performance drop when using time-based train-test splits?

This is actually a pretty common experience when you switch to time-based splits, and it usually means you're getting a more realistic estimate rather than doing something wrong.

Random splits tend to make models look better than they really are because rows from the past and the future get mixed: the model effectively trains on data from after some of its test rows. That can hide leakage, or let the model exploit patterns that only hold when time isn't respected. Once you move to a time split, those shortcuts disappear and performance drops.
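Here's a minimal sketch of the effect on synthetic data (not your setup, just an illustration): the target drifts with time and a timestamp-like column is available as a feature, so a random split lets the model "peek" into the test period while a chronological split forces it to extrapolate.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
t = np.arange(n)
# Feature 0 is the timestamp; feature 1 carries real signal; the rest are noise.
X = np.column_stack([t, rng.normal(size=(n, 4))])
y = X[:, 1] + 0.01 * t + rng.normal(scale=0.3, size=n)  # target drifts upward

# Random split: training rows surround every test row in time,
# so the model can interpolate the trend via the timestamp feature.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
random_score = rf.fit(X_tr, y_tr).score(X_te, y_te)

# Time-based split: train on the first 80%, test on the last 20%.
# The model has never seen timestamps this large and must extrapolate.
cut = int(0.8 * n)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
time_score = rf.fit(X[:cut], y[:cut]).score(X[cut:], y[cut:])

print(f"random split R^2: {random_score:.2f}")
print(f"time split   R^2: {time_score:.2f}")
```

On data like this the random-split score comes out noticeably higher, which is exactly the optimistic gap described above.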

A few things I’ve seen in similar setups:

  • The gap often points to subtle leakage in the random split, even from features that don’t obviously look “time aware.”

  • Random Forests aren’t especially bad here, but they do pick up short-lived correlations, so they can struggle when the data distribution shifts over time.

  • To understand what’s happening, it helps to compare feature distributions between train and test, and to run rolling or expanding window validation to see how performance changes over time.
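The last two bullets can be sketched together. This uses scikit-learn's `TimeSeriesSplit` for expanding-window validation and SciPy's `ks_2samp` for a quick train-vs-test distribution check; the drifting feature and all the numbers here are invented for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(1)
n = 600
X = rng.normal(size=(n, 4))
X[:, 0] += 0.005 * np.arange(n)  # this feature drifts upward over time
y = X[:, 0] + rng.normal(scale=0.3, size=n)

# Expanding-window validation: each fold trains on everything before
# its test window, so fold-by-fold scores show how drift bites.
scores = []
for fold, (tr, te) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    scores.append(rf.fit(X[tr], y[tr]).score(X[te], y[te]))
    print(f"fold {fold}: train={len(tr)} test={len(te)} R^2={scores[-1]:.2f}")

# Compare the drifting feature's early vs late distribution with a
# two-sample Kolmogorov-Smirnov test; a tiny p-value flags the shift.
stat, p = ks_2samp(X[:120, 0], X[-120:, 0])
print(f"KS test on feature 0: stat={stat:.2f}, p={p:.1e}")
```

A downward trend in the fold scores, combined with a significant KS result on a feature, is a strong hint that the random-vs-time gap is distribution shift rather than a modeling bug.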

For time-dependent problems, I usually treat the time-based score as the real baseline and use random splits only for quick iteration or debugging. The lower number is often closer to what you’ll see in production.

