The field of video and image generation is moving toward more efficient models, with an emphasis on reducing computational cost and memory footprint. Recent work explores several routes to this goal, including knowledge distillation, post-training quantization, and novel tokenization schemes. These advances deliver substantial model compression and inference acceleration while matching, and in some cases surpassing, the quality of the uncompressed full-precision models. Notable papers in this area include V.I.P., which proposes an effective distillation method for efficient video diffusion models, and LRQ-DiT, which introduces a log-based post-training quantization method for diffusion transformers. In addition, S2Q-VDiT and WeTok contribute, respectively, quantized video diffusion transformers and discrete tokenization for high-fidelity visual reconstruction.
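To make the log-based quantization idea concrete, the sketch below rounds weight magnitudes to the nearest power of two, so multiplications can become bit shifts. This is a minimal toy illustration of the general technique, not LRQ-DiT's actual algorithm; the function name, bit width, and clipping strategy are assumptions for the example.

```python
import numpy as np

def log_quantize(w, bits=4):
    """Toy log-domain post-training quantizer (illustrative only).

    Maps each weight to sign(w) * 2^round(log2(|w|)), with exponents
    clipped to a (2**bits)-level range anchored at the largest exponent.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    # Avoid log2(0); zero weights stay zero because sign(0) == 0.
    mag = np.where(mag == 0, np.finfo(w.dtype).tiny, mag)
    exp = np.round(np.log2(mag))
    emax = np.max(exp)
    exp = np.clip(exp, emax - (2**bits - 1), emax)
    return sign * np.exp2(exp)

w = np.array([0.31, -0.12, 0.05, -0.9], dtype=np.float32)
q = log_quantize(w)
# q is [0.25, -0.125, 0.0625, -1.0]: each magnitude snapped to a power of two
```

Because this is post-training, no gradient updates are needed; the quantizer is applied once to frozen weights, which is what makes such methods attractive for large diffusion transformers.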