alexanderlevin 2 hours ago
We ended up building a Hybrid Architecture that splits the logic:
1. MARL Agents act as "Fleet Managers" that handle high-level strategy (when to dispatch, which cluster to serve). 2. Linear Programming acts as a "Bin Packer" to enforce strict physical constraints on the final route.
The article details the architecture, the specific reward shaping we used to encourage LTL (Less-Than-Truckload) consolidation, and how we normalized the observation space to achieve zero-shot generalization across different warehouse sizes.
Happy to answer questions about the stack or the specific failure/success cases we ran into.A