Using a date parameter to control data volume in Dev, UAT, and Prod: is this a reasonable approach?

Javid Jaffer
Updated 10 hours ago

I’m designing a pipeline where the same dataset needs to flow through different environments: Dev, UAT, and Prod. The challenge is that the production dataset is huge, but in Dev and UAT, I only need a subset of the data to test transformations and run analytics efficiently.

My idea is to use a date parameter (e.g., start_date/end_date) to limit the data volume in non-prod environments, so Dev and UAT only process a smaller, manageable slice of the dataset.
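A minimal sketch of this idea, assuming a Python pipeline where each run knows its environment name and an end date. The window sizes in `WINDOW_DAYS` and the names `DateWindow`, `window_for`, and `in_window` are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical window sizes in days per environment; None means "no limit" (full history).
WINDOW_DAYS = {"dev": 7, "uat": 30, "prod": None}


@dataclass(frozen=True)
class DateWindow:
    start_date: Optional[date]  # None = unbounded on the left
    end_date: date


def window_for(env: str, end_date: date) -> DateWindow:
    """Return the start_date/end_date slice to process in the given environment."""
    try:
        days = WINDOW_DAYS[env.lower()]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
    # An N-day window includes both ends, so it starts N - 1 days before end_date.
    start = None if days is None else end_date - timedelta(days=days - 1)
    return DateWindow(start, end_date)


def in_window(d: date, w: DateWindow) -> bool:
    """True if d falls inside the window (both ends inclusive)."""
    return (w.start_date is None or d >= w.start_date) and d <= w.end_date


if __name__ == "__main__":
    rows = [
        {"id": 1, "event_date": date(2024, 3, 1)},
        {"id": 2, "event_date": date(2024, 3, 28)},
    ]
    w = window_for("dev", date(2024, 3, 31))
    print([r["id"] for r in rows if in_window(r["event_date"], w)])  # [2]
```

In a real pipeline the same start_date/end_date would typically be pushed down into the source query (for example, a filter on a date partition column) rather than applied in Python, so that Dev and UAT also read less data, not just process less.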

I’m wondering:

  • Is using a date parameter a common or recommended practice for this?
  • Are there risks in this approach that I should be aware of, such as skewed test results or missed edge cases?
  • Are there better strategies for controlling data volume across environments while maintaining meaningful test coverage?
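For comparison with a pure date window, one alternative that is sometimes used alongside or instead of it is deterministic sampling on an entity key: keep a fixed fraction of, say, customers, but all of their history. A minimal sketch, assuming a Python pipeline and a `customer_id`-style key; `SAMPLE_RATE`, `keep_key`, `sample_rows`, and the salt string are hypothetical:

```python
import hashlib
from typing import Any, Dict, Iterable, Iterator

# Hypothetical fraction of keys (e.g. customer IDs) to keep in each environment.
SAMPLE_RATE = {"dev": 0.01, "uat": 0.10, "prod": 1.0}


def keep_key(key: str, rate: float, salt: str = "env-sample-v1") -> bool:
    """Deterministically decide whether a key belongs to the sample.

    The same key (with the same salt) always gets the same answer, so every
    row for a kept entity is kept across all dates and all pipeline runs.
    """
    if rate >= 1.0:
        return True
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    # Treat the first 64 bits of the hash as a uniform integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64


def sample_rows(
    rows: Iterable[Dict[str, Any]], env: str, key_field: str = "customer_id"
) -> Iterator[Dict[str, Any]]:
    """Yield only the rows whose key falls in this environment's sample."""
    rate = SAMPLE_RATE[env.lower()]
    for row in rows:
        if keep_key(str(row[key_field]), rate):
            yield row


if __name__ == "__main__":
    rows = [{"customer_id": i, "amount": i * 10} for i in range(10_000)]
    uat = list(sample_rows(rows, "uat"))
    print(len(uat))  # roughly 1,000 (about 10% of customers)
```

Because the decision depends only on the key and the salt, the 1% Dev sample is a subset of the 10% UAT sample, and if every table is filtered on the same key, joins between the sampled tables stay consistent. The two approaches can also be combined, e.g. a sampled set of customers over a longer date range.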

I’d love to hear how others handle large datasets across multiple environments in a practical, maintainable way.
