The field of federated learning is moving to address the challenges of communication efficiency, client drift, and data heterogeneity. Researchers are exploring optimization techniques such as matrix orthogonalization and momentum aggregation to improve the convergence rate and test accuracy of federated models. Theoretical analyses are also being conducted to characterize the limitations of federated optimization and to provide principled explanations for performance degradation in non-iid settings. Noteworthy papers include FedMuon, which introduces an optimizer that combines matrix orthogonalization with momentum aggregation to achieve a linear-speedup convergence rate, and FedAdamW, which proposes a communication-efficient optimizer that aligns local updates with the global update through a local correction mechanism and decoupled weight decay.
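To make the two named ingredients concrete, the sketch below shows how matrix orthogonalization and momentum aggregation can be combined on the server side. It is a minimal illustration only, not the FedMuon reference algorithm: the function names, the averaging-based aggregation, and the server step are assumptions, while the Newton-Schulz iteration and its coefficients follow the publicly available (centralized) Muon implementation.

```python
import numpy as np

def newton_schulz_orthogonalize(m, steps=5, eps=1e-7):
    """Approximately orthogonalize a momentum matrix with a Newton-Schulz
    iteration, as used by the centralized Muon optimizer.
    Illustrative sketch, not FedMuon's reference code."""
    a, b, c = 3.4445, -4.7750, 2.0315      # coefficients from the public Muon implementation
    x = m / (np.linalg.norm(m) + eps)       # scale so the iteration converges
    transposed = x.shape[0] > x.shape[1]    # work with the wider orientation
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x

def server_round(client_momenta, lr=0.02):
    """Hypothetical server step: aggregate client momentum matrices,
    orthogonalize the aggregate, and return the update direction."""
    m_global = np.mean(client_momenta, axis=0)   # momentum aggregation across clients
    return -lr * newton_schulz_orthogonalize(m_global)

# Toy usage: three clients, each reporting momentum for one 64x32 weight matrix.
clients = [np.random.randn(64, 32) for _ in range(3)]
update = server_round(clients)
print(update.shape)  # (64, 32)
```

Aggregating momentum before orthogonalizing keeps the per-round communication at one matrix per layer per client, which is the kind of trade-off the communication-efficiency line of work described above targets.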