deepbox/nn
Attention & Transformer
Attention mechanisms and Transformer building blocks for sequence-to-sequence tasks, NLP, and beyond.
MultiheadAttention
extends Module
Multi-head scaled dot-product attention. Splits queries, keys, and values into multiple heads (each of size embedDim / numHeads, e.g. 512 / 8 heads → 64 dims per head), applies scaled dot-product attention to each head in parallel, then concatenates and projects the results. Core building block of Transformers.
TransformerEncoderLayer
extends Module
Single Transformer encoder layer: multi-head self-attention → add & norm → feedforward → add & norm. Stack multiple for a full Transformer encoder.
Scaled Dot-Product Attention
Attention(Q,K,V) = softmax(QKᵀ / √dₖ) · V
Where:
- dₖ = key dimension; scores are divided by √dₖ so that large dot products do not saturate the softmax
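The formula above can be expressed directly in plain TypeScript. The following is a minimal, dependency-free sketch over number[][] matrices for a single head; it illustrates the math only and is not deepbox's internal implementation.

scaled-dot-product-attention.ts
function matmul(a: number[][], b: number[][]): number[][] {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
  );
}

function transpose(m: number[][]): number[][] {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

function softmax(row: number[]): number[] {
  const max = Math.max(...row); // subtract max for numerical stability
  const exps = row.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// Attention(Q,K,V) = softmax(QKᵀ / √dₖ) · V
function scaledDotProductAttention(
  q: number[][], // (seqLen, dₖ)
  k: number[][], // (seqLen, dₖ)
  v: number[][]  // (seqLen, dᵥ)
): number[][] {
  const dk = k[0].length;
  const scores = matmul(q, transpose(k)).map((row) =>
    row.map((s) => s / Math.sqrt(dk)) // scale by √dₖ
  );
  const weights = scores.map(softmax); // row-wise softmax over keys
  return matmul(weights, v);           // weighted sum of value rows
}

// Tiny usage check: with an identity-like Q = K, each output row is a
// softmax-weighted blend of the value rows.
const q = [[1, 0], [0, 1]];
console.log(scaledDotProductAttention(q, q, q));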
attention.ts
import { MultiheadAttention, TransformerEncoderLayer } from "deepbox/nn";
import { tensor } from "deepbox/ndarray";

// Multi-head attention
const mha = new MultiheadAttention(512, 8); // embedDim=512, numHeads=8
// Q, K, V all shape: (batch, seqLen, embedDim)

// Transformer encoder layer
const encoder = new TransformerEncoderLayer({
  dModel: 512,
  nHead: 8,
  dimFeedforward: 2048,
  dropout: 0.1,
});
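Continuing from the snippet above, a hypothetical forward pass. Note the assumptions: a forward(...) call method and tensor() accepting a nested number array are illustrative guesses, not confirmed by this page; check the deepbox API reference for the exact names.

// ASSUMPTION: tensor() taking a nested array and forward() as the call
// method are illustrative guesses, not confirmed deepbox API.
const data = [Array.from({ length: 4 }, () => new Array(512).fill(0))]; // (1, 4, 512)
const x = tensor(data);

// Self-attention: queries, keys, and values are all the same sequence.
const attnOut = mha.forward(x, x, x); // shape preserved: (batch, seqLen, embedDim)

// Each encoder layer keeps the shape, so layers stack directly; a full
// encoder is just several layers applied in sequence.
let h = encoder.forward(x);
for (let i = 0; i < 5; i++) {
  h = new TransformerEncoderLayer({
    dModel: 512,
    nHead: 8,
    dimFeedforward: 2048,
    dropout: 0.1,
  }).forward(h);
}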