The field of vision transformers is moving toward efficient deployment, with a focus on reducing computational complexity and memory demands. Researchers are accelerating inference by exploiting information redundancy in attention maps and by optimizing networks with learnable non-linear activation functions. These advances could enable vision transformers to run at the edge, where energy efficiency and low latency are crucial. Notable papers in this area include:
- One that proposes Entropy Attention Maps (EAM), which exploit information redundancy to retain similar or higher accuracy at low sparsity in attention maps (see the entropy-pruning sketch after this list).
- Another that presents a reconfigurable lookup architecture on edge FPGAs, minimizing energy-intensive arithmetic operations while preserving activation fidelity (see the lookup-table sketch below).
- A third that designs a compression method that reduces the storage and transmission costs of neural network weights stored in low-precision formats (see the weight-compression sketch below).
- A fourth that proposes a low-bit, model-specialized accelerator for deploying vision transformers, achieving high accuracy and power efficiency across benchmarks (see the low-bit matmul sketch below).
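
To make the first idea concrete, here is a minimal NumPy sketch of entropy-guided attention pruning. It is not the EAM method itself: the function names (`attention_entropy`, `prune_rows_by_entropy`), the per-row Shannon entropy criterion, and the choice to treat near-uniform (high-entropy) rows as the redundant ones are all illustrative assumptions.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    """Shannon entropy of each attention row (one row per query token).
    `attn` has shape (heads, queries, keys) and each row sums to 1."""
    return -np.sum(attn * np.log(attn + eps), axis=-1)

def prune_rows_by_entropy(attn, keep_ratio=0.5):
    """Keep the `keep_ratio` fraction of rows with the LOWEST entropy
    (sharply focused attention) and zero out the near-uniform rest,
    standing in for skipping their computation entirely."""
    h = attention_entropy(attn)                    # (heads, queries)
    k = max(1, int(keep_ratio * h.shape[-1]))      # rows kept per head
    thresh = np.partition(h, k - 1, axis=-1)[..., [k - 1]]  # k-th smallest
    mask = (h <= thresh)[..., None]                # broadcast over key axis
    return attn * mask

# Toy demo: 2 heads, 4 query tokens, 4 key tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 4))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(prune_rows_by_entropy(attn).round(2))
```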
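
The lookup idea from the second paper can be approximated in software: precompute a non-linearity over a fixed input range once, then replace every evaluation with an index computation and a table read. The paper's actual contribution is reconfigurable FPGA hardware; this sketch, with its assumed `build_gelu_lut` helper, 256-entry table, and clamped input range, only shows why the per-activation arithmetic disappears.

```python
import numpy as np

def build_gelu_lut(lo=-8.0, hi=8.0, n_entries=256):
    """Precompute tanh-approximated GELU over a fixed input range.
    At inference the non-linearity becomes an index plus a table read,
    the kind of operation a reconfigurable lookup fabric handles cheaply."""
    xs = np.linspace(lo, hi, n_entries)
    table = 0.5 * xs * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (xs + 0.044715 * xs**3)))
    return table, lo, hi

def lut_gelu(x, table, lo, hi):
    """Map x to the nearest table index, clamping out-of-range inputs."""
    idx = ((x - lo) / (hi - lo) * (len(table) - 1)).round().astype(int)
    return table[np.clip(idx, 0, len(table) - 1)]

table, lo, hi = build_gelu_lut()
x = np.linspace(-4.0, 4.0, 9)
exact = 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
print(np.abs(lut_gelu(x, table, lo, hi) - exact).max())  # small LUT error
```

The table size is the knob that trades activation fidelity against lookup cost, which is the balance a reconfigurable fabric can tune per deployment.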
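
For the third paper, a rough software analogue of low-precision weight compression is symmetric int8 quantization followed by a generic entropy coder. The paper's actual codec is not described here; zlib merely stands in to show that quantized weights, which cluster around zero, still contain removable statistical redundancy.

```python
import zlib
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(512, 512)).astype(np.float32)
q, scale = quantize_int8(w)

# Quantized weights cluster around zero, so a generic entropy coder
# squeezes out much of the remaining statistical redundancy.
blob = zlib.compress(q.tobytes(), 9)
print(f"fp32: {w.nbytes} B  int8: {q.nbytes} B  int8+zlib: {len(blob)} B")
```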
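
Finally, the arithmetic a low-bit accelerator specializes in can be simulated with fake quantization: quantize activations and weights to a few bits, perform the matrix multiply in integers, and rescale. The 4-bit width, per-tensor scaling, and `low_bit_matmul` helper below are assumptions for illustration, not the accelerator's actual datapath.

```python
import numpy as np

def quantize(x, bits=4):
    """Symmetric uniform quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def low_bit_matmul(x, w, bits=4):
    """Do the matrix multiply on low-bit integers and rescale to float.
    A model-specialized accelerator keeps the integers in narrow
    datapaths; int32 here only simulates the arithmetic without overflow."""
    qx, sx = quantize(x, bits)
    qw, sw = quantize(w, bits)
    return (qx @ qw) * (sx * sw)

rng = np.random.default_rng(0)
x, w = rng.normal(size=(8, 64)), rng.normal(size=(64, 32))
err = np.abs(low_bit_matmul(x, w) - x @ w).mean()
print(f"mean abs error of the 4-bit matmul: {err:.3f}")
```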