This repository was archived by the owner on Jul 1, 2023. It is now read-only.
@dominikgrewe pointed out that K-FAC needs to access the model's non-differentiable internal state from `Optimizer.update(_:along:)`. My initial idea is to change the optimizer API to the following:
```swift
public protocol Optimizer {
    associatedtype Model: Layer
    associatedtype Scalar: FloatingPoint
    var learningRate: Scalar { get }
    mutating func update(_ variables: inout Model, along gradient: Model.CotangentVector)
}
```
But this comes at the cost of complicating concrete optimizers' implementations of `update(_:along:)`:

- Too long: `model.allDifferentiableVariables.recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self)`.
- Even longer: `model.allDifferentiableVariables[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / (sqrt(secondMoments[keyPath: kp]) + epsilon)`.
And we obviously can't assign `model.allDifferentiableVariables` to a local variable to shorten these expressions, because we need setter access that writes back into the model. Something like `inout var modelVariables = model.allDifferentiableVariables` is not possible in Swift until the ownership model and related features get fleshed out.
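For concreteness, here is a minimal, self-contained sketch of the key-path update pattern in plain Swift (no TensorFlow dependency). `Params` and its `allKeyPaths` property are hypothetical stand-ins for a model's differentiable variables and `recursivelyAllWritableKeyPaths(to:)`:

```swift
// Hypothetical stand-in for a model's differentiable variables.
struct Params {
    var w: Double = 1.0
    var b: Double = 0.5
    // Stand-in for `recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self)`.
    static let allKeyPaths: [WritableKeyPath<Params, Double>] = [\Params.w, \Params.b]
}

var model = Params()
let gradients = Params(w: 0.2, b: 0.1)
let learningRate = 0.1

// Each update must spell out the full subscript chain on the left-hand side;
// in a real optimizer, `model.allDifferentiableVariables` prefixes it, and
// moment buffers are indexed with the same key path, making each line longer still.
for kp in Params.allKeyPaths {
    model[keyPath: kp] -= learningRate * gradients[keyPath: kp]
}
```

The verbosity in the real Adam implementation comes from repeating this full chain, plus `firstMoments[keyPath: kp]` and `secondMoments[keyPath: kp]`, on every line of the update rule.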
Another issue is that making `model` be `inout` is not semantically accurate: an optimizer should not mutate a model's non-differentiable state.
Maybe what we really need is optimizer-specific protocols that require the model state each optimizer needs, which models would conform to. Each concrete optimizer would then have such a generic constraint and take these states as initializer parameters.
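A rough sketch of this direction, assuming nothing about the final API. `Layer` here is a minimal stand-in for the swift-apis protocol, and all other names (`KFACTrainable`, `kfacStatistics`, `DenseStatistics`, `KFAC`) are illustrative only:

```swift
// Minimal stand-in for the swift-apis `Layer` protocol.
protocol Layer {
    associatedtype Scalar: FloatingPoint
}

// Optimizer-specific protocol: a model trainable by K-FAC exposes the
// internal statistics the optimizer needs, read-only.
protocol KFACTrainable: Layer {
    associatedtype Statistics
    var kfacStatistics: Statistics { get }
}

// Illustrative per-layer state.
struct DenseStatistics {
    var activationCovariance: [[Double]] = []
}

struct Dense: KFACTrainable {
    typealias Scalar = Double
    var statistics = DenseStatistics()
    var kfacStatistics: DenseStatistics { statistics }
}

// The concrete optimizer constrains its Model to the protocol and takes the
// state as an initializer parameter, so the model never needs to be `inout`
// for the sake of non-differentiable state.
struct KFAC<Model: KFACTrainable> {
    var learningRate: Model.Scalar
    var statistics: Model.Statistics

    init(for model: Model, learningRate: Model.Scalar) {
        self.learningRate = learningRate
        self.statistics = model.kfacStatistics
    }
}

let optimizer = KFAC(for: Dense(), learningRate: 0.01)
```

This keeps `update(_:along:)` free to mutate only differentiable variables, while the optimizer observes model-specific state through the protocol requirement.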