Transformer
Self-Attention
Epoch 0 | Loss 1.000 | Tokens 6 | Heads 4
Input
Sentence
The cat sat on the mat
Attention is all you need
She sells sea shells by the shore
To be or not to be
I think therefore I am
The quick brown fox jumps
Architecture
Attention Heads: 4
Transformer Layers: 2
Embedding Dim: 16
Training
Temperature: 1.00
Learning Rate: 0.010
Training Speed: 1x
Train | Pause | Reset
Display
Active Head: All Heads
Active Layer: All Layers
Arc Opacity: 1.0
Token Scale: 1.0
Q/K/V | Pos Enc | Arcs
Attention Heatmap (Head 1, Layer 1)
Head 1 (cyan)
Head 2 (magenta)
Head 3 (gold)
Head 4 (mint)
Arc brightness = attention weight
Token glow = embedding magnitude
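
The two legend entries above map drawn properties to numbers: an arc's brightness to an attention weight, and a token's glow to the magnitude of its embedding. The following is a minimal sketch of how such quantities could be computed for the default settings shown (6 tokens, 4 heads, embedding dim 16, temperature 1.00). It is not the demo's actual code. The random embeddings and projection matrices, the helper names, and the choice to divide the scores by the temperature before the softmax are all assumptions made for illustration.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Softmax over a list of scores, scaled by an assumed temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def attention_weights(embeddings, w_q, w_k, temperature=1.0):
    """Scaled dot-product attention weights for one head (one row per query token)."""
    queries = [matvec(w_q, e) for e in embeddings]
    keys = [matvec(w_k, e) for e in embeddings]
    head_dim = len(queries[0])
    weights = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim) for k in keys
        ]
        weights.append(softmax(scores, temperature))
    return weights


rng = random.Random(0)
tokens = "The cat sat on the mat".split()  # 6 tokens
embed_dim, num_heads = 16, 4
head_dim = embed_dim // num_heads

# Random embeddings stand in for learned ones (assumption).
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in tokens]

# One tokens-by-tokens weight matrix per head; each row sums to 1.
heads = []
for _ in range(num_heads):
    w_q = random_matrix(head_dim, embed_dim, rng)
    w_k = random_matrix(head_dim, embed_dim, rng)
    heads.append(attention_weights(embeddings, w_q, w_k, temperature=1.0))

# "Token glow": Euclidean norm of each token's embedding.
magnitudes = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
```

Under these assumptions, `heads[h][i][j]` would set the brightness of the arc from token `i` to token `j` for head `h`, and `magnitudes[i]` would set the glow of token `i`. Raising the temperature flattens each row toward a uniform distribution, and lowering it concentrates the weight on the highest-scoring token.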
W / A / S / D: move
Space / Shift: up/down
Right-click + drag to look
Scroll to zoom