
Activation Functions

Element-wise non-linear activation functions used in neural networks and signal processing. All operate on regular Tensors and return new Tensors.

ReLU

f(x) = max(0, x)

Where:

  • x = Input value

Sigmoid

σ(x) = 1 / (1 + e^(−x))

Where:

  • σ(x) = Output in (0, 1)

Softmax

softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ)

Where:

  • xᵢ = Input element along axis

GELU

GELU(x) = x · Φ(x)

Where:

  • Φ = Standard normal CDF

Mish

mish(x) = x · tanh(softplus(x))

Where:

  • softplus(x) = ln(1 + e^x)

Swish (SiLU)

swish(x) = x · σ(x)

Where:

  • σ = Sigmoid function

ELU

ELU(x) = x if x > 0, else α(e^x − 1)

Where:

  • α = Scale for negative values (default: 1.0)

Leaky ReLU

f(x) = x if x > 0, else αx

Where:

  • α = Negative slope (default: 0.01)

relu
relu(t: Tensor): Tensor

Rectified Linear Unit. Zeroes out all negative values. The most common activation in deep learning. Cheap to compute and helps with gradient flow.

sigmoid
sigmoid(t: Tensor): Tensor

Squashes values to (0, 1). Used for binary classification output layers and gating mechanisms (LSTM, GRU).

softmax
softmax(t: Tensor, axis?: number): Tensor

Converts logits to a probability distribution that sums to 1 along the given axis. Used as the final layer for multi-class classification. Supports arbitrary dimensions.
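
A minimal per-row sketch, using the tensor and softmax calls documented on this page (output values are rounded to three decimals):

import { tensor, softmax } from "deepbox/ndarray";

const batch = tensor([
  [2.0, 1.0, 0.1],
  [0.0, 0.0, 0.0],
]);

softmax(batch, 1);
// [[0.659, 0.242, 0.099],
//  [0.333, 0.333, 0.333]]
// each row sums to 1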

logSoftmax
logSoftmax(t: Tensor, axis?: number): Tensor

Numerically stable log(softmax(x)). Preferred over composing log and softmax separately, which can underflow when probabilities are near zero. Used with NLLLoss for classification.
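
A short sketch, assuming logSoftmax is exported from "deepbox/ndarray" alongside the other activations; each output equals xᵢ − ln Σⱼ e^(xⱼ) (rounded to three decimals):

import { tensor, logSoftmax } from "deepbox/ndarray";

const logits = tensor([[2.0, 1.0, 0.1]]);
logSoftmax(logits, 1);
// [[-0.417, -1.417, -2.317]]
// same values as taking the log of softmax(logits, 1), but without underflow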

gelu
gelu(t: Tensor): Tensor

Gaussian Error Linear Unit. Used in Transformer models (BERT, GPT). Smoother than ReLU with non-zero gradient for negative values.

mish
mish(t: Tensor): Tensor

Self-regularizing, smooth, non-monotonic activation. Shown to outperform ReLU and Swish in some benchmarks.

swish
swish(t: Tensor): Tensor

Also known as SiLU (Sigmoid Linear Unit). Self-gated activation discovered by neural architecture search. Used in EfficientNet.
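
A minimal sketch, assuming swish is exported from "deepbox/ndarray" alongside the other activations (values rounded to three decimals):

import { tensor, swish } from "deepbox/ndarray";

const t = tensor([-2, -1, 0, 1, 2]);
swish(t); // [-0.238, -0.269, 0, 0.731, 1.762]
// each element is x * sigmoid(x), e.g. 2 * 0.881 ≈ 1.762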

elu
elu(t: Tensor, alpha?: number): Tensor

Exponential Linear Unit. Like ReLU but with a smooth negative part controlled by alpha. Helps with the vanishing gradient problem.

Parameters:
alpha: number - Scale for negative values (default: 1.0)

leakyRelu
leakyRelu(t: Tensor, alpha?: number): Tensor

Leaky ReLU allows a small gradient for negative values, preventing the 'dying ReLU' problem.

Parameters:
alpha: number - Negative slope (default: 0.01)

softplus
softplus(t: Tensor): Tensor

Smooth approximation to ReLU: ln(1 + e^x). Always positive. Used where a strictly positive output is needed.
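
A minimal sketch, assuming softplus is exported from "deepbox/ndarray" alongside the other activations (values rounded to three decimals):

import { tensor, softplus } from "deepbox/ndarray";

const t = tensor([-2, -1, 0, 1, 2]);
softplus(t); // [0.127, 0.313, 0.693, 1.313, 2.127]
// strictly positive; softplus(0) = ln(2) ≈ 0.693, and softplus(x) ≈ x for large x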

activations.ts
import { tensor, relu, sigmoid, softmax, gelu, mish, elu, leakyRelu } from "deepbox/ndarray";

const t = tensor([-2, -1, 0, 1, 2]);

relu(t);           // [0, 0, 0, 1, 2]
sigmoid(t);        // [0.119, 0.269, 0.5, 0.731, 0.881]
gelu(t);           // [-0.045, -0.159, 0, 0.841, 1.955]
mish(t);           // [-0.254, -0.303, 0, 0.865, 1.944]
elu(t, 1.0);       // [-0.865, -0.632, 0, 1, 2]
leakyRelu(t, 0.1); // [-0.2, -0.1, 0, 1, 2]

// Softmax converts logits to probabilities
const logits = tensor([[2.0, 1.0, 0.1]]);
const probs = softmax(logits, 1); // sums to 1.0 along axis 1