Activation Functions
ReLU
Rectified Linear Unit. Zeroes out all negative values. The most common activation in deep learning. Cheap to compute and helps with gradient flow.
relu(x) = max(0, x)
Where:
- x = Input value
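For reference, the element-wise rule can be sketched in plain TypeScript (a sketch of the formula only; reluScalar is an illustrative helper, not part of deepbox):

// Element-wise rule: negative inputs become 0, non-negative inputs pass through
const reluScalar = (x: number): number => Math.max(0, x);
[-2, -1, 0, 1, 2].map(reluScalar); // [0, 0, 0, 1, 2]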
Sigmoid
Squashes values to (0, 1). Used for binary classification output layers and gating mechanisms (LSTM, GRU).
σ(x) = 1 / (1 + e^(-x))
Where:
- σ = Output in (0, 1)
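A minimal scalar sketch of the formula (illustrative only; sigmoidScalar is not a deepbox API):

// 1 / (1 + e^(-x)) maps any real input into (0, 1)
const sigmoidScalar = (x: number): number => 1 / (1 + Math.exp(-x));
sigmoidScalar(0); // 0.5
sigmoidScalar(2); // ≈ 0.881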
Softmax
Converts logits to a probability distribution that sums to 1 along the given axis. Used as the final layer for multi-class classification. Supports arbitrary dimensions.
softmax(x)ᵢ = e^(xᵢ) / Σⱼ e^(xⱼ)
Where:
- xᵢ = Input element along the given axis

Log Softmax
Numerically stable computation of log(softmax(x)). Preferred over applying log and softmax separately to avoid underflow. Used with NLLLoss for classification.
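A numerically stable sketch of both for a 1-D array in plain TypeScript (illustrative only, not deepbox's implementation; subtracting the max logit before exponentiating is what avoids overflow and underflow):

// log-sum-exp with the max subtracted for stability
const logSumExp = (xs: number[]): number => {
  const m = Math.max(...xs);
  return m + Math.log(xs.reduce((s, x) => s + Math.exp(x - m), 0));
};
const softmax1d = (xs: number[]): number[] => {
  const lse = logSumExp(xs);
  return xs.map((x) => Math.exp(x - lse));
};
const logSoftmax1d = (xs: number[]): number[] => {
  const lse = logSumExp(xs);
  return xs.map((x) => x - lse);
};

softmax1d([2.0, 1.0, 0.1]);    // ≈ [0.659, 0.242, 0.099], sums to 1
logSoftmax1d([2.0, 1.0, 0.1]); // ≈ [-0.417, -1.417, -2.317]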
GELU
Gaussian Error Linear Unit. Used in Transformer models (BERT, GPT). Smoother than ReLU, with a non-zero gradient for negative values.
gelu(x) = x · Φ(x)
Where:
- Φ = Standard normal CDF
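JavaScript has no built-in erf, so a common way to sketch GELU is the tanh approximation (illustrative only; whether deepbox uses the exact CDF or this approximation is not stated here):

// 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³)))
const geluApprox = (x: number): number =>
  0.5 * x * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (x + 0.044715 * x ** 3)));
geluApprox(1); // ≈ 0.841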
Mish
Self-regularizing, non-monotonic activation. Smooth, and shown to outperform ReLU and Swish in some benchmarks.
mish(x) = x · tanh(softplus(x))
Where:
- softplus(x) = ln(1 + e^x)
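A scalar sketch of the formula (illustrative only; these helpers are not deepbox APIs):

const softplusScalar = (x: number): number => Math.log(1 + Math.exp(x));
const mishScalar = (x: number): number => x * Math.tanh(softplusScalar(x));
mishScalar(1); // ≈ 0.865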
Swish (SiLU)
Also known as SiLU (Sigmoid Linear Unit). Self-gated activation discovered by neural architecture search. Used in EfficientNet.
swish(x) = x · σ(x)
Where:
- σ = Sigmoid function
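Scalar sketch (illustrative only; swishScalar is not a deepbox API):

// x · σ(x): the input gates itself through the sigmoid
const swishScalar = (x: number): number => x / (1 + Math.exp(-x));
swishScalar(1); // ≈ 0.731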
ELU
Exponential Linear Unit. Like ReLU but with a smooth negative part controlled by α. Helps with the vanishing gradient problem.
elu(x) = x if x > 0, else α · (e^x − 1)
Where:
- α = Scale for negative values (default: 1.0)
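Scalar sketch with the α parameter exposed (illustrative only; eluScalar is not a deepbox API):

// Identity for x > 0, smooth exponential curve scaled by alpha for x ≤ 0
const eluScalar = (x: number, alpha = 1.0): number =>
  x > 0 ? x : alpha * (Math.exp(x) - 1);
eluScalar(-1); // ≈ -0.632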
Leaky ReLU
Allows a small gradient for negative values, preventing the 'dying ReLU' problem.
leakyRelu(x) = x if x > 0, else α · x
Where:
- α = Negative slope (default: 0.01)

Softplus
Smooth approximation to ReLU: softplus(x) = ln(1 + e^x). Always positive. Used where strictly positive output is needed.
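Scalar sketches of both (illustrative only, not deepbox APIs; the log1p form of softplus avoids overflow for large positive inputs):

const leakyReluScalar = (x: number, alpha = 0.01): number => (x > 0 ? x : alpha * x);
// Numerically stable ln(1 + e^x): max(x, 0) + ln(1 + e^(-|x|))
const softplusStable = (x: number): number =>
  Math.max(x, 0) + Math.log1p(Math.exp(-Math.abs(x)));
leakyReluScalar(-1, 0.1); // -0.1
softplusStable(0); // ≈ 0.693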
import { tensor, relu, sigmoid, softmax, gelu, mish, elu, leakyRelu } from "deepbox/ndarray";

const t = tensor([-2, -1, 0, 1, 2]);

relu(t);           // [0, 0, 0, 1, 2]
sigmoid(t);        // [0.119, 0.269, 0.5, 0.731, 0.881]
gelu(t);           // [-0.045, -0.159, 0, 0.841, 1.955]
mish(t);           // [-0.254, -0.303, 0, 0.865, 1.944]
elu(t, 1.0);       // [-0.865, -0.632, 0, 1, 2]
leakyRelu(t, 0.1); // [-0.2, -0.1, 0, 1, 2]

// Softmax converts logits to probabilities
const logits = tensor([[2.0, 1.0, 0.1]]);
const probs = softmax(logits, 1); // sums to 1.0 along axis 1