What tech stacks are teams using for scalable AI agent systems in production?

I’ve been exploring how organizations are structuring production-ready AI workflows beyond just model experimentation, particularly around orchestration, retrieval pipelines, memory handling, monitoring, and multi-agent coordination.

There are now many possible combinations across:
• LLM frameworks
• vector databases
• orchestration layers
• observability tools
• retrieval systems
• agent frameworks
• cloud infrastructure

The challenge is that many stacks work well in prototypes, but reliability, scalability, governance, and operational complexity become much harder problems once systems move into real enterprise environments.

Curious to hear from teams already building or deploying AI agents in production:
What stack combinations are working well for you, and what trade-offs have you encountered so far?