Ge²mS-T: a multi-dimensional grouped spiking Transformer architecture and methodology. It resolves the memory / accuracy / energy trilemma of S-ViTs through grouped computation along three dimensions: time, space, and network structure. It comprises the Grouped-Exponential-Coding IF (ExpG-IF) neuron model and Group-wise Spiking Self-Attention (GW-SSA). Activation: Ge²mS-T, spiking transformer, multidimensional grouping, SNN vision transformer, energy efficient vision.
A novel spiking vision transformer architecture that applies grouped computation across the temporal, spatial, and network-structure dimensions to resolve the three-way trade-off among memory overhead, learning capability, and energy budget in Spiking Vision Transformers (S-ViTs).
The two existing training paradigms for S-ViTs, ANN-SNN conversion and spatial-temporal backpropagation, each carry inherent limitations. Ge²mS-T addresses these limitations through multi-dimensional grouped computation, achieving superior accuracy with ultra-high energy efficiency.
Key Innovation: First work to systematically establish multi-dimensional grouped computation for S-ViTs.
ExpG-IF (Grouped-Exponential-Coding IF):
Function:
Advantages:

GW-SSA (Group-wise Spiking Self-Attention):
Function:
Mechanism:
Time-step Grouping:
Benefits:
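The note does not spell out how time steps are grouped; a minimal sketch, assuming the T simulation steps are partitioned into contiguous, equal-sized groups (the group count and contiguity are assumptions, not the paper's stated scheme):

```python
def group_time_steps(num_steps: int, num_groups: int) -> list[range]:
    """Split num_steps simulation time steps into contiguous equal-sized groups."""
    assert num_steps % num_groups == 0, "num_steps must be divisible by num_groups"
    size = num_steps // num_groups
    return [range(g * size, (g + 1) * size) for g in range(num_groups)]

# e.g. T = 8 steps in 4 groups of 2
print(group_time_steps(8, 4))  # [range(0, 2), range(2, 4), range(4, 6), range(6, 8)]
```

Each group could then be processed as a unit, amortizing membrane-state bookkeeping over the group rather than per step.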
Token Grouping:
Implementation:
# Multi-scale token grouping (fine to coarse)
spatial_group_sizes = [
    4,   # Fine-grained
    8,   # Medium
    16,  # Coarse
]
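The multi-scale group sizes can be applied by partitioning the token sequence; a minimal sketch, where the `(N, D)` token layout, the reshape-based partition, and the function name are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def group_tokens(tokens: np.ndarray, group_size: int) -> np.ndarray:
    """Partition an (N, D) token sequence into (N // group_size, group_size, D) groups."""
    n, d = tokens.shape
    assert n % group_size == 0, "token count must be divisible by the group size"
    return tokens.reshape(n // group_size, group_size, d)

tokens = np.random.rand(64, 32)        # 64 tokens with 32-dim embeddings
spatial_group_sizes = [4, 8, 16]       # fine, medium, coarse
grouped = [group_tokens(tokens, g) for g in spatial_group_sizes]
print([g.shape for g in grouped])      # [(16, 4, 32), (8, 8, 32), (4, 16, 32)]
```

Attention would then be restricted to within each group, so cost scales with group size rather than total token count.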
Channel/Head Grouping:
Benefits:
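The note gives no detail on the channel/head split; a hedged sketch, assuming GW-SSA divides the channel dimension evenly into independent groups (shapes and the function name are assumptions):

```python
import numpy as np

def group_channels(x: np.ndarray, num_groups: int) -> np.ndarray:
    """Split (N, C) features into (num_groups, N, C // num_groups) channel groups."""
    n, c = x.shape
    assert c % num_groups == 0, "channel count must be divisible by num_groups"
    return x.reshape(n, num_groups, c // num_groups).transpose(1, 0, 2)

x = np.ones((64, 32))                  # 64 tokens, 32 channels
groups = group_channels(x, num_groups=4)
print(groups.shape)                    # (4, 64, 8)
```

Each channel group would feed its own attention head (or head group), keeping per-group computation small.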
Neuron Dynamics:
V[t] = V[t-1] + Σ_i w_i · s_i[t] − θ · s_out[t−1]   (soft reset: the θ · s_out[t−1] term subtracts the threshold after a spike)
s_out[t] = 1 if V[t] ≥ θ, else 0
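The update rule can be simulated directly; a minimal sketch of a soft-reset IF neuron covering only the base dynamics above (the grouped exponential coding specific to ExpG-IF is not described in this note, and the threshold value and input format are assumptions):

```python
import numpy as np

def if_neuron(weighted_inputs: np.ndarray, theta: float = 1.0) -> np.ndarray:
    """Soft-reset IF neuron: V[t] = V[t-1] + I[t] - theta * s_out[t-1].

    weighted_inputs holds the precomputed sum Σ_i w_i · s_i[t] per time step.
    """
    v, s_prev = 0.0, 0
    spikes = []
    for i in weighted_inputs:
        v = v + i - theta * s_prev        # subtract theta if a spike fired last step
        s_prev = 1 if v >= theta else 0   # fire when membrane reaches threshold
        spikes.append(s_prev)
    return np.array(spikes)

print(if_neuron(np.array([0.6, 0.6, 0.6, 0.6])))  # [0 1 0 1]
```

With constant sub-threshold input the neuron fires every other step, since the soft reset removes exactly θ of accumulated potential per spike.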