sgd

Creates a delegate that can be used to perform a step using the stochastic gradient descent update rule.

This function relies on automatic differentiation, so the objective (the first element of outputs, which must have a volume of 1) must be differentiable with respect to all elements of wrt. The returned delegate performs minimisation.

sgd(
    Operation[] outputs,
    Operation[] wrt,
    Projection[Operation] projs,
    Operation learningRate = float32([], [0.01f]),
    Operation momentumRate = float32([], [0.0f]),
    bool nesterov = false
)

Parameters

outputs Operation[]

an array of Operations that the returned delegate evaluates and returns the values of at each step. The first element is taken to be the objective.

wrt Operation[]

an array of Operations that we want the derivative of the objective with respect to. These are the values that will be updated at each step.

projs Projection[Operation]

a map from elements of wrt to projections that are applied after each update, e.g. to constrain the updated values to a valid range. May be null if no projections are required.

learningRate Operation

the value used to scale the gradient in the update rule

momentumRate Operation

scaling factor for the previous update

nesterov bool

indicates whether Nesterov's accelerated gradient should be used
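For reference, the update rule that momentumRate and nesterov control can be sketched in plain D. This is an illustration of the classical and Nesterov momentum updates for a single scalar parameter, not the library's implementation; the function name step and the module-level velocity variable are hypothetical.

```d
// Hypothetical sketch of the SGD update rule, not dopt internals.
// velocity persists between steps; learningRate, momentumRate and nesterov
// correspond to the parameters of sgd.
float velocity = 0.0f;

float step(float w, float g, float learningRate, float momentumRate, bool nesterov)
{
    // Accumulate a decaying average of past gradients
    velocity = momentumRate * velocity - learningRate * g;

    if(nesterov)
    {
        // Nesterov's accelerated gradient applies a "look ahead" correction
        return w + momentumRate * velocity - learningRate * g;
    }
    else
    {
        // Classical momentum simply adds the velocity to the parameter
        return w + velocity;
    }
}
```

With momentumRate set to 0.0f (the default), both branches reduce to plain gradient descent, w - learningRate * g.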

Return Value

Type: Updater

A delegate that is used to actually perform the update steps. The optimised values are stored in the "default" attributes of the elements of wrt.

Examples

import std.random : uniform;

//Generate some points
auto xdata = new float[100];
auto ydata = new float[100];

foreach(i; 0 .. 100)
{
    xdata[i] = uniform(-10.0f, 10.0f);
    ydata[i] = 3.0f * xdata[i] + 2.0f;
}

//Create the model
auto x = float32([]);
auto m = float32([]);
auto c = float32([]);

auto yhat = m * x + c;
auto y = float32([]);

//Create an SGD updater
auto updater = sgd([(yhat - y) * (yhat - y)], [m, c], null, float32([], [0.001f]), float32([], [0.9f]));

//Iterate for a while
float loss;

for(size_t i = 0; i < 300; i++)
{
    size_t j = i % 100;

    loss = updater([
        x: buffer(xdata[j .. j + 1]),
        y: buffer(ydata[j .. j + 1])
    ])[0].get!float[0];
}

//Print the loss after 300 iterations, along with the fitted parameters
import std.stdio : writeln;
writeln(
    "SGD loss: ", loss, "    ",
    "m=", m.value.get!float[0], ", ",
    "c=", c.value.get!float[0], "    ",
    "(expected m=3, c=2)");
