Attention & Transformer Layers
The Transformer architecture, based on self-attention, revolutionized NLP and now dominates many ML domains. This example demonstrates Deepbox's two attention layers: MultiheadAttention (computes scaled dot-product attention across multiple heads, allowing the model to attend to different positions simultaneously) and TransformerEncoderLayer (a full encoder block combining self-attention, feedforward network, layer normalization, and residual connections). You create both layers, pass sequence tensors through them, and inspect the output shapes. The example explains the query/key/value paradigm and how multi-head attention enables the model to learn different types of relationships.
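To make the query/key/value mechanics concrete, here is a minimal scaled dot-product attention on plain arrays. This is an illustration only, not Deepbox's implementation: MultiheadAttention additionally applies learned Q/K/V projections and runs this computation once per head.

```typescript
// Scaled dot-product attention: attention(Q, K, V) = softmax(Q K^T / sqrt(dK)) V
type Mat = number[][];

function matmul(a: Mat, b: Mat): Mat {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((s, x, k) => s + x * b[k][j], 0))
  );
}

function transpose(m: Mat): Mat {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Softmax over each row, with max-subtraction for numerical stability.
function softmaxRows(m: Mat): Mat {
  return m.map((row) => {
    const max = Math.max(...row);
    const exps = row.map((x) => Math.exp(x - max));
    const sum = exps.reduce((s, x) => s + x, 0);
    return exps.map((x) => x / sum);
  });
}

function attention(q: Mat, k: Mat, v: Mat): Mat {
  const dK = k[0].length;
  const scores = matmul(q, transpose(k)).map((row) =>
    row.map((x) => x / Math.sqrt(dK)) // scale by sqrt(dK) to keep logits tame
  );
  return matmul(softmaxRows(scores), v);
}

// Tiny example: seq=2, dK=2. Query 0 matches key 0, so it weights V's row 0 more.
const out = attention(
  [[1, 0], [0, 1]],   // Q
  [[1, 0], [0, 1]],   // K
  [[10, 0], [0, 10]], // V
);
console.log(out); // each output row is a convex mix of V's rows (sums to 10)
```

Because the softmax weights sum to 1 per query, each output row is a weighted average of the value rows — this is the sense in which attention "mixes" information across positions.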
Deepbox Modules Used
- deepbox/ndarray
- deepbox/nn

What You Will Learn
- MultiheadAttention splits dModel into nHeads — each head learns different patterns
- Query/Key/Value are projections of the input — self-attention uses the same input for all three
- TransformerEncoderLayer = SelfAttention + FFN + LayerNorm + Residual connections
- Attention output preserves sequence length and model dimension
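The first and last bullets are simple arithmetic worth checking: with dModel=64 and nHeads=8, each head works in a 64/8 = 8-dimensional subspace, and concatenating the head outputs restores dModel, which is why the output shape matches the input shape.

```typescript
// Head-dimension bookkeeping for MultiheadAttention(dModel, nHeads).
const dModel = 64;
const nHeads = 8;

// dModel must divide evenly across heads.
if (dModel % nHeads !== 0) throw new Error("dModel must be divisible by nHeads");

const headDim = dModel / nHeads;      // 8: per-head query/key/value width
const concatDim = nHeads * headDim;   // 64: concatenated heads restore dModel
console.log({ headDim, concatDim });  // { headDim: 8, concatDim: 64 }
```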
Source Code
29-attention-transformer/index.ts
```typescript
import { randn } from "deepbox/ndarray";
import { MultiheadAttention, TransformerEncoderLayer } from "deepbox/nn";

console.log("=== Attention & Transformer ===\n");

// MultiheadAttention: dModel=64, nHeads=8
const mha = new MultiheadAttention(64, 8);
const q = randn([2, 10, 64]); // [batch, seq, dModel]
const k = randn([2, 10, 64]);
const v = randn([2, 10, 64]);
const attnOut = mha.forward(q, k, v);
console.log("MHA input:", q.shape);
console.log("MHA output:", attnOut.shape); // [2, 10, 64]

// TransformerEncoderLayer
const encoder = new TransformerEncoderLayer(64, 8, { dimFeedforward: 256 });
const src = randn([2, 10, 64]);
const encOut = encoder.forward(src);
console.log("\nEncoder input:", src.shape);
console.log("Encoder output:", encOut.shape); // [2, 10, 64]
```

Console Output
```
$ npx tsx 29-attention-transformer/index.ts
=== Attention & Transformer ===

MHA input: [2, 10, 64]
MHA output: [2, 10, 64]

Encoder input: [2, 10, 64]
Encoder output: [2, 10, 64]
```
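The TransformerEncoderLayer composition (SelfAttention + FFN + LayerNorm + residuals) can be sketched on a single token vector. The sublayers below are toy stand-ins with no learned weights, and the post-norm ordering (norm applied after each residual add, as in the original Transformer paper) is an assumption — this example does not show Deepbox's internal ordering.

```typescript
type Vec = number[];

// Normalize to zero mean and unit variance (scale/shift parameters omitted).
function layerNorm(x: Vec, eps = 1e-5): Vec {
  const mean = x.reduce((s, v) => s + v, 0) / x.length;
  const variance = x.reduce((s, v) => s + (v - mean) ** 2, 0) / x.length;
  return x.map((v) => (v - mean) / Math.sqrt(variance + eps));
}

function add(a: Vec, b: Vec): Vec {
  return a.map((v, i) => v + b[i]);
}

// Toy stand-ins for the two sublayers (the real ones have learned weights):
const selfAttention = (x: Vec): Vec => x.map((v) => 0.5 * v);
const feedForward = (x: Vec): Vec => x.map((v) => Math.max(0, v)); // ReLU

// Post-norm encoder block: residual + norm around each sublayer.
function encoderLayer(x: Vec): Vec {
  const a = layerNorm(add(x, selfAttention(x))); // attention sublayer
  return layerNorm(add(a, feedForward(a)));      // feedforward sublayer
}

const y = encoderLayer([1, -2, 3, -4]);
console.log(y.length); // 4: the model dimension is preserved
```

Because every sublayer maps a d-dimensional vector to a d-dimensional vector, residual addition is always well-defined and the layer preserves shape — the same property you saw in the [2, 10, 64] → [2, 10, 64] console output above.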