The field of long-context modeling is moving toward more efficient and scalable solutions, with a focus on reducing the quadratic complexity of self-attention. Researchers are exploring compression techniques, such as sequence-level compression and soft context compression, to improve the performance of large language models on long-context tasks. There is also growing interest in linear-attention architectures that efficiently transfer the capabilities of pre-trained transformers, typically via distillation. Together, these advances promise real-time memory savings, faster prefill, and better long-range dependency modeling.

Noteworthy papers include UniGist, which introduces a sequence-level long-context compression framework that preserves context information; LAWCAT, which proposes a linearization framework for efficient distillation from quadratic to linear attention; CompLLM, whose soft compression technique is designed for practical deployment; and ExPe, which introduces a position-embedding scheme that extrapolates to sequences longer than those seen during training.
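To make the complexity reduction concrete, the sketch below (a generic illustration, not the method of any paper above) contrasts standard softmax attention, which materializes an n x n score matrix, with kernelized linear attention, which reorders the computation via a positive feature map so cost grows linearly in sequence length. The `elu + 1` feature map follows the common linear-attention recipe; all function names here are illustrative.

```python
# Minimal sketch: quadratic softmax attention vs. kernelized linear attention.
# Assumes single-head, unbatched tensors of shape (n, d) for simplicity.
import torch

def softmax_attention(q, k, v):
    # Materializes the full (n, n) attention matrix: O(n^2) time and memory.
    scores = q @ k.T / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Replace softmax with a positive feature map and use associativity:
    # (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V), so the (d, d) summary
    # phi(K)^T V is built once and cost is O(n * d^2), linear in n.
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                            # (d, d) key-value summary
    z = q @ k.sum(dim=0, keepdim=True).T    # (n, 1) normalizer
    return (q @ kv) / (z + eps)

n, d = 1024, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
out = linear_attention(q, k, v)             # never allocates an (n, n) matrix
print(out.shape)                            # torch.Size([1024, 64])
```

The memory contrast is the key point for long contexts: the quadratic path stores n^2 attention scores, while the linear path only carries a d x d state, which is what enables longer sequences and faster prefill at a fixed budget.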