The field of visual autoregressive generation is moving toward greater efficiency and lower computational cost. Researchers are exploring approaches such as hybrid-grained caching, dynamic activation frameworks, and partial verification skipping to accelerate generation while preserving visual quality, cutting inference latency, compute, and memory usage in the process; a minimal sketch of the cache-pruning idea follows the list below. Noteworthy papers include:
- ActVAR, which achieves up to 21.2% FLOPs reduction with minimal performance degradation.
- VVS, which reduces the number of target-model forward passes by a factor of 2.8 relative to vanilla AR decoding while maintaining competitive generation quality.
- AMS-KV, which reduces KV cache usage by up to 84.83% and self-attention latency by 60.48%.
- VARiant, which reduces memory consumption by 40-65% and, at moderate quality cost, achieves a 3.5x speedup and 80% memory reduction.
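
To make the cache-reduction theme concrete, below is a minimal, hypothetical sketch of pruning a transformer KV cache by keeping only the cached entries that receive the most attention from the current query. The function name `prune_kv_cache`, the top-k scoring rule, and the `keep_ratio` parameter are illustrative assumptions, not the actual AMS-KV algorithm or any paper's exact method.

```python
import torch

def prune_kv_cache(keys, values, query, keep_ratio=0.15):
    """Keep only the cached key/value pairs that receive the highest
    attention mass from the current query.

    Shapes: keys, values -> (seq_len, d); query -> (d,)
    Generic top-k illustration; real methods use more refined criteria."""
    seq_len = keys.shape[0]
    keep = max(1, int(seq_len * keep_ratio))
    # Scaled attention logits of the current query against every cached key.
    scores = keys @ query / keys.shape[-1] ** 0.5
    # Indices of the most-attended cache entries, restored to original order.
    top = torch.topk(scores, keep).indices.sort().values
    return keys[top], values[top]

# Toy usage with random tensors standing in for a real KV cache.
torch.manual_seed(0)
k, v = torch.randn(1024, 64), torch.randn(1024, 64)
q = torch.randn(64)
k_small, v_small = prune_kv_cache(k, v, q, keep_ratio=0.15)
print(k_small.shape)  # torch.Size([153, 64]) -> roughly 85% of the cache dropped
```

Methods in this family generally differ in how the retained set is scored and how often it is refreshed, so that context that becomes relevant later is not permanently discarded.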