Transformer Self-Attention

[Interactive demo: a small transformer is trained live; a status bar reports epoch, loss, token count (6), and head count (4), with control panels for input, architecture, training, and display. An attention heatmap can be shown per head and layer. The four heads are color-coded (Head 1 cyan, Head 2 magenta, Head 3 gold, Head 4 mint); arc brightness encodes attention weight and token glow encodes embedding magnitude.]
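The per-head heatmap shown in the demo is the attention-weight matrix softmax(QKᵀ/√d_k). A minimal NumPy sketch of how such weights could be computed is below; the dimensions match the demo's 6 tokens and 4 heads, but the embeddings and projection matrices are random placeholders, not the demo's trained parameters:

```python
import numpy as np

def attention_weights(q, k):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d_k))."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (heads, tokens, tokens)
    scores -= scores.max(axis=-1, keepdims=True)      # softmax stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)          # rows sum to 1

rng = np.random.default_rng(0)
n_heads, n_tokens, d_model, d_head = 4, 6, 32, 8     # 4 heads, 6 tokens as in the demo
x = rng.normal(size=(n_tokens, d_model))             # placeholder token embeddings
wq = rng.normal(size=(n_heads, d_model, d_head)) / np.sqrt(d_model)
wk = rng.normal(size=(n_heads, d_model, d_head)) / np.sqrt(d_model)

q = np.einsum('td,hdk->htk', x, wq)                  # per-head queries
k = np.einsum('td,hdk->htk', x, wk)                  # per-head keys
attn = attention_weights(q, k)                       # (4, 6, 6): one heatmap per head
```

Each `attn[h]` is a 6×6 stochastic matrix: row i gives how much token i attends to every other token under head h, which is what the heatmap colors and arc brightness visualize.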