Hybrid MARL + Linear Programming Architecture for Logistics Scheduling

Posted by alexanderlevin |2 hours ago |1 comments

alexanderlevin 2 hours ago

We spent two years optimizing vehicle routing for a huge line-haul delivery network. We found that standard OR solvers (such as Google OR-Tools) struggled with the dynamic nature of the requests, while pure Reinforcement Learning agents would not converge.

We ended up building a Hybrid Architecture that splits the logic:

1. MARL Agents act as "Fleet Managers" that handle high-level strategy (when to dispatch, which cluster to serve).
2. Linear Programming acts as a "Bin Packer" to enforce strict physical constraints on the final route.
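To make the split concrete, here is a minimal sketch of the two layers, not the production code. Everything in it is an assumption for illustration: the `Shipment` fields, the `choose_cluster` heuristic standing in for a trained MARL policy, and the single weight-capacity constraint. With only one capacity constraint, the LP relaxation (maximize total value subject to total weight <= capacity, each load fraction between 0 and 1) is solved exactly by a greedy value-per-weight ordering, so no solver library is needed here. A real system with many constraints would presumably hand this step to an LP solver.

```python
from dataclasses import dataclass


@dataclass
class Shipment:
    name: str
    weight: float  # hypothetical unit, e.g. kg; must be > 0
    value: float   # hypothetical priority or revenue score


def choose_cluster(cluster_demand):
    """Stand-in for the learned "Fleet Manager" policy.

    Assumption: picks the cluster with the most pending weight.
    A trained agent would replace this heuristic.
    """
    return max(cluster_demand, key=cluster_demand.get)


def pack_lp_relaxation(shipments, capacity):
    """The "Bin Packer" step for one truck and one capacity constraint.

    Solves: maximize sum(value_i * x_i)
            subject to sum(weight_i * x_i) <= capacity, 0 <= x_i <= 1.
    For a single constraint, loading in descending value/weight order
    is optimal for this LP (the fractional knapsack result).
    Returns a dict mapping shipment name to the loaded fraction.
    """
    remaining = capacity
    load = {}
    ranked = sorted(shipments, key=lambda s: s.value / s.weight, reverse=True)
    for s in ranked:
        if remaining <= 0:
            break
        fraction = min(1.0, remaining / s.weight)
        load[s.name] = fraction
        remaining -= fraction * s.weight
    return load


# Usage: the manager picks a cluster, then the packer enforces capacity.
clusters = {
    "north": [Shipment("a", weight=4.0, value=8.0), Shipment("b", weight=3.0, value=3.0)],
    "south": [Shipment("c", weight=2.0, value=2.0)],
}
demand = {name: sum(s.weight for s in items) for name, items in clusters.items()}
target = choose_cluster(demand)
plan = pack_lp_relaxation(clusters[target], capacity=5.0)
```

The point of the design is that the learned layer never has to discover the physical limits on its own: whatever it proposes, the packing step only returns loads that fit. Fractional values in `plan` mean a partial load; if shipments cannot be split, the problem becomes an integer program rather than a plain LP.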

The article details the architecture, the specific reward shaping we used to encourage LTL (Less-Than-Truckload) consolidation, and how we normalized the observation space to achieve zero-shot generalization across different warehouse sizes.
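For readers who want a feel for the last two ideas before opening the article, here is a rough sketch under stated assumptions. The feature names, the clipping to [0, 1], and the reward coefficients (`fill_bonus`, `late_penalty`) are all hypothetical and are not taken from the article. The idea illustrated is that expressing every observation as a ratio of a site's own capacity makes a small and a large warehouse look identical to the policy when they are equally loaded, and that a reward tied to truck fill rate favors consolidated dispatches.

```python
def normalize_observation(pending_weight, truck_capacity,
                          docks_busy, docks_total,
                          wait_minutes, max_wait_minutes):
    """Map raw site state to scale-free ratios clipped to [0, 1].

    All feature names are illustrative assumptions.
    """
    def ratio(value, limit):
        return max(0.0, min(value / limit, 1.0))

    return [
        ratio(pending_weight, truck_capacity),   # how full a truck could be
        ratio(docks_busy, docks_total),          # dock utilization
        ratio(wait_minutes, max_wait_minutes),   # urgency of oldest shipment
    ]


def consolidation_reward(load_weight, truck_capacity, late_shipments,
                         fill_bonus=1.0, late_penalty=0.5):
    """Shaped reward for one dispatch (coefficients are hypothetical).

    Rewards a high fill rate and penalizes each late shipment, so that
    waiting to consolidate pays off only until lateness starts to cost more.
    """
    fill_rate = min(load_weight / truck_capacity, 1.0)
    return fill_bonus * fill_rate - late_penalty * late_shipments


# Usage: two sites of different size, same relative load.
small_site = normalize_observation(500, 1000, 2, 4, 30, 60)
large_site = normalize_observation(5000, 10000, 20, 40, 30, 60)

full_truck = consolidation_reward(1000, 1000, late_shipments=0)
half_truck = consolidation_reward(500, 1000, late_shipments=0)
```

Here `small_site` and `large_site` come out as the same vector, which is the property that would let a policy trained on one warehouse size be applied to another without retraining.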

Happy to answer questions about the stack or the specific failure/success cases we ran into.