The field of artificial intelligence is moving towards more scalable and efficient systems, with a focus on heterogeneous infrastructure and distributed dataflow. Recent work shows that pipeline parallelism and distributed transfer dock strategies can significantly improve the throughput and hardware utilization of AI systems. There is also a growing trend towards unified scheduling for large language model training and inference, as well as efficient and scalable deployment of AI agents on heterogeneous compute infrastructure.

Noteworthy papers include:
- PPipe, which achieves 41.1%-65.5% higher utilization of low-class GPUs while maintaining high utilization of high-class GPUs.
- MindSpeed RL, which increases throughput by 1.42x to 3.97x over existing state-of-the-art systems.
- TokenSmith, which streamlines data editing, search, and inspection for large-scale language model training and interpretability.
- MegatronApp, which provides efficient and comprehensive management of distributed LLM training.
- LeMix, which improves throughput by up to 3.53x and reduces inference loss by up to 0.61x.
- G-Core, which provides a simple, scalable, and balanced RLHF training framework.
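
Pipeline parallelism raises utilization by splitting a model into stages and streaming micro-batches through them so that stages work concurrently instead of idling. The following is a minimal sketch in plain Python, not drawn from PPipe or any of the systems above, that computes the idealized makespan and bubble overhead of a GPipe-style fill-drain schedule; the stage count, micro-batch count, and per-stage time are illustrative assumptions.

```python
# Sketch of why pipeline parallelism improves utilization (illustrative only).
# Assumes S equal stages and M micro-batches, each taking `t` time units per
# stage, under an idealized GPipe-style fill-drain schedule.

def pipeline_time(num_stages: int, num_microbatches: int, t: float) -> float:
    """Idealized makespan of a fill-drain pipeline: (S + M - 1) * t."""
    return (num_stages + num_microbatches - 1) * t

def sequential_time(num_stages: int, num_microbatches: int, t: float) -> float:
    """Makespan if stages never overlap (one micro-batch at a time end to end)."""
    return num_stages * num_microbatches * t

def utilization(num_stages: int, num_microbatches: int) -> float:
    """Fraction of stage-time slots doing useful work, i.e. 1 - bubble fraction."""
    useful = num_stages * num_microbatches                      # work slots
    total = num_stages * (num_stages + num_microbatches - 1)    # slots until drain
    return useful / total

if __name__ == "__main__":
    S, t = 4, 1.0  # assumed 4-stage pipeline, unit time per stage
    for M in (1, 4, 16, 64):
        speedup = sequential_time(S, M, t) / pipeline_time(S, M, t)
        print(f"micro-batches={M:3d}  utilization={utilization(S, M):.2%}  "
              f"speedup over no-overlap={speedup:.2f}x")
```

As the number of micro-batches grows, the bubble fraction (S - 1)/(S + M - 1) shrinks toward zero, which is the basic reason pipelined schedules can keep heterogeneous pools of GPUs busy at the same time.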