The field of reinforcement learning and optimization is moving toward more efficient and scalable methods. Researchers are exploring techniques to improve sample efficiency, such as hindsight goal-conditioned regularization and reparameterization policy gradients, alongside growing interest in multi-objective reinforcement learning and in generalized planning with graph neural networks.
Noteworthy papers include:

- GCHR: proposes hindsight goal-conditioned regularization for sample-efficient reinforcement learning (see the relabeling sketch after this list).
- Reparameterization Proximal Policy Optimization: establishes a connection between reparameterization policy gradients and proximal policy optimization, enabling stable, sample-efficient training.
- ParBalans: introduces a parallel, multi-armed-bandit-based adaptive large neighborhood search for mixed-integer programming, achieving performance competitive with state-of-the-art commercial solvers.
- GDBA Revisited: proposes a guided local search framework for distributed constraint optimization problems that substantially outperforms state-of-the-art baselines.
- Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning: improves sample efficiency in multi-objective reinforcement learning through variance-reduction techniques (see the baseline-subtraction sketch below).
- Scaling Up without Fading Out: proposes a sparse, goal-aware graph neural network representation for generalized planning that scales to larger grid sizes and improves policy generalization and success rates.
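To make the hindsight idea concrete, here is a minimal sketch of hindsight goal relabeling in the HER style, the mechanism that goal-conditioned hindsight methods such as GCHR build on. All names and the `Transition` structure are illustrative, not the paper's API; this shows the generic "future" relabeling strategy, not GCHR's specific regularizer.

```python
# Minimal sketch of HER-style hindsight goal relabeling. Failed episodes
# become useful training data by pretending a later achieved state was
# the goal all along. Names are illustrative, not GCHR's actual API.
import random
from dataclasses import dataclass

@dataclass
class Transition:
    obs: tuple       # agent state
    action: int
    goal: tuple      # goal the agent was pursuing
    achieved: tuple  # state actually reached
    reward: float

def relabel_with_hindsight(episode, k=4):
    """For each transition, emit up to k copies whose goal is replaced by
    a state achieved later in the same episode (the 'future' strategy).
    Relabeled transitions get reward 1.0 when achieved == goal, else 0.0."""
    augmented = list(episode)
    for i, t in enumerate(episode):
        future = episode[i:]
        for _ in range(min(k, len(future))):
            new_goal = random.choice(future).achieved
            augmented.append(Transition(
                obs=t.obs, action=t.action, goal=new_goal,
                achieved=t.achieved,
                reward=1.0 if t.achieved == new_goal else 0.0))
    return augmented

# Toy usage: a 1-D episode where the agent never reached the original goal,
# so every original reward is 0, yet relabeling yields positive examples.
episode = [Transition(obs=(i,), action=1, goal=(9,), achieved=(i + 1,), reward=0.0)
           for i in range(5)]
print(len(relabel_with_hindsight(episode)))  # original 5 plus relabeled copies
```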
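And for the multi-objective entry, a generic sketch of one common variance-reduction device: scalarize the vector return with preference weights and subtract a per-objective baseline before forming the score-function (REINFORCE) gradient. This illustrates the general idea only; the paper's specific estimator may differ, and every name here is an assumption for illustration.

```python
# Generic variance-reduced multi-objective policy gradient sketch:
# baseline subtraction on a weighted scalarization of vector returns.
# Not the paper's method; a standard construction for illustration.
import numpy as np

rng = np.random.default_rng(0)

def policy_probs(theta, obs):
    """Softmax policy over actions; theta has shape (actions, features)."""
    logits = theta @ obs
    e = np.exp(logits - logits.max())
    return e / e.sum()

def mo_reinforce_grad(theta, trajectories, weights, baseline):
    """Average gradient of the scalarized objective. Each trajectory is a
    list of (obs, action, reward_vec); baseline is a running mean of the
    vector return, subtracted to reduce the estimator's variance."""
    grad = np.zeros_like(theta)
    for traj in trajectories:
        returns = np.sum([r for _, _, r in traj], axis=0)  # vector return
        advantage = weights @ (returns - baseline)         # scalarized advantage
        for obs, a, _ in traj:
            p = policy_probs(theta, obs)
            one_hot = np.eye(len(p))[a]
            grad += advantage * np.outer(one_hot - p, obs)  # score-function term
    return grad / len(trajectories)

# Toy usage with random data: 2 actions, 3 features, 2 objectives.
theta = np.zeros((2, 3))
traj = [(rng.normal(size=3), rng.integers(2), rng.normal(size=2)) for _ in range(5)]
g = mo_reinforce_grad(theta, [traj], weights=np.array([0.7, 0.3]),
                      baseline=np.zeros(2))
theta += 0.01 * g  # one ascent step on the scalarized objective
```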