AbstractReversibleSolver + ReversibleAdjoint #603
Conversation
* _integrate.py
* Added new test checking gradient of vmapped diffeqsolve
* Import optimistix
* Fixed issue
* added .any()
* diffrax root finder
In python-poetry, `~=3.9` is interpreted as `>=3.9,<3.10`, though it should be `>=3.9,<4.0` (https://python-poetry.org/docs/dependency-specification/).
merge changes from AbstractReversibleSolver
I've also added the …
Okay, gosh, this one took far too long for me to get around to. Thank you for your patience! If I can, I'd like this to be the next big thing I focus on getting into Diffrax.
diffrax/__init__.py
Outdated
@@ -105,6 +106,7 @@
     MultiButcherTableau as MultiButcherTableau,
     QUICSORT as QUICSORT,
     Ralston as Ralston,
+    Reversible as Reversible,
Let's call this something more specific to help distinguish what it is! I think here it's not clear that it's a solver, and even if it was `ReversibleSolver` then that still wouldn't disambiguate amongst the various kinds of reversibility it's possible to cook up.
What do you call this yourself / in your paper? Its structure is Hamiltonian/coupled/etc so maybe a word of that sort is suitable.
The call will look like this: `solver = diffrax.Reversible(diffrax.Tsit5())`, with the idea being that you are "reversifying" Tsit5. So it isn't a solver in itself, but a wrapper. The boring name could be something like `ReversibleWrapper` and the fun name could be something like `Reversify`. Thoughts?
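For context, a minimal sketch of how that wrapper would compose with `diffeqsolve`, using the names in this thread (`ReversibleAdjoint` is the adjoint from this PR; the wrapper name was still under discussion, so treat it as provisional):

```python
import diffrax
import jax.numpy as jnp

term = diffrax.ODETerm(lambda t, y, args: -y)  # simple linear ODE
solver = diffrax.Reversible(diffrax.Tsit5())  # "reversify" an explicit RK solver
sol = diffrax.diffeqsolve(
    term, solver, t0=0.0, t1=1.0, dt0=0.1, y0=jnp.array(1.0),
    adjoint=diffrax.ReversibleAdjoint(),  # reversible backpropagation
)
```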
Sure, but in the future we may cook up some other way of reversifying a solver! We should pick a name for this one that leaves that possibility open.
James has gone for U-Reversible (after the Uno reverse card :). The analogy is that we take a step forward from `z0` to `y1`, then reverse and pull back from `y1` onto `z1`.
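Piecing this together with the diff further down the thread, the scheme looks roughly like the following (my notation, not from the PR: $\lambda$ is `coupling_parameter` and $\Phi$ is a step of the wrapped base solver; the $z_1$ update in particular is my reconstruction):

$$y_1 = \lambda\,(y_0 - z_0) + \Phi_{t_0 \to t_1}(z_0), \qquad z_1 = z_0 + y_1 - \Phi_{t_1 \to t_0}(y_1).$$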
    reversible_save_index + 1, tprev, reversible_ts
)
reversible_save_index = reversible_save_index + jnp.where(keep_step, 1, 0)
A very minor bug here: if it just so happens that we run with `t0 == t1` then we'll end up with `reversible_ts = [t0 inf inf inf ...]`, which will not produce desired results in the backward solve.
We have a special branch to handle the saving in the `t0 == t1` case; we should add a line handling the `state.reversible_ts is not None` case there.
diffrax/_adjoint.py
Outdated
# Pull solver_state gradients back onto y0, args, terms.

_, init_vjp = eqx.filter_vjp(solver.init, terms, ts[0], ts[1], y0, args)
It's not super clear to me that `ts[0]`, `ts[1]` are the correct values here. It looks like the saving routine is storing `tprev`, which is not necessarily the same as `state.tnext`, and the latter is what `solver.init` was originally called with.
In principle the step size controller could return anything at all; in practice it is possible for `tprev` to be 2 ULPs greater than `state.tnext` when passing to the other side of a discontinuity.
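To make the "2 ULPs" concrete, here is a tiny illustration (not Diffrax's actual code) of nudging a time value two ULPs across a discontinuity:

```python
import jax.numpy as jnp

t = jnp.float32(1.0)
# Two nextafter calls move t two ULPs towards +inf:
t_plus = jnp.nextafter(jnp.nextafter(t, jnp.inf), jnp.inf)
print(t_plus - t)  # ~2.4e-07: two float32 ULPs at 1.0
```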
The `ts` here are `reversible_ts`, which follows the same logic as `SaveAt(steps=True)`.
Is it not the case that the `state.tnext` identified in `diffeqsolve` (and used for `solver.init`) has to be the first step that the solver took? I appreciate that they can be different at later points in the solve, but my understanding was that the first step was set in `diffeqsolve`?
So I think in this case you're saving the `tprev` of the second step, not the `tnext` of the first.
Yep, you're right.
We now return the `tprev` and `tnext` passed to `solver.init` as a residual in the reversible loop and use these to get the `vjp`.
diffrax/_integrate.py
Outdated
# Reversible info
if max_steps is None:
    reversible_ts = None
    reversible_save_index = None
else:
    reversible_ts = jnp.full(max_steps + 1, jnp.inf, dtype=time_dtype)
    reversible_save_index = 0
I've thought of an alternative for this extra buffer, btw: `ReversibleAdjoint.loop` could intercept `saveat` and add a `SubSaveAt(steps=True, save_fn=lambda t, y, args: None)` to record the extra times, then peel it off again when returning the final state.
I think that (a) might be doable without making any changes to `_integrate.py`, (b) would allow for also supporting `SaveAt(steps=True)` (as in that case we can just skip adding the extra `SubSaveAt`), and (c) would avoid a few of the subtle issues I've commented on above about exactly which tprev/tnext-like value is actually being saved, because you can trust the rest of the existing `diffeqsolve` to do that for you.
It's not a strong suggestion though.
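A sketch of what that interception could look like, assuming `SaveAt.subs` holds the `SubSaveAt` pytree and that the save-function keyword is `fn` (both assumptions on my part):

```python
import diffrax
import equinox as eqx

def add_steps_sub(saveat: diffrax.SaveAt) -> diffrax.SaveAt:
    # Extra SubSaveAt that records step times while saving no values.
    extra = diffrax.SubSaveAt(steps=True, fn=lambda t, y, args: None)
    # Append it alongside the user's existing sub-saves; ReversibleAdjoint.loop
    # would peel this entry off again before returning the final state.
    return eqx.tree_at(lambda s: s.subs, saveat, (saveat.subs, extra))
```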
Yeah, this was the original idea I tried but I couldn't get around a leaked tracer error! I'm willing to give it another go if you start feeling strongly about it though ;)
Maybe let's nail everything else down and then consider this. Reflecting on this, I do suspect it will make the code much easier to maintain in the long run.
diffrax/_solver/reversible.py
Outdated
y1 = (self.coupling_parameter * (ω(y0) - ω(z0)) + ω(step_z0)).ω

step_y1, y_error, _, _, result2 = self.solver.step(
    terms, t1, t0, y1, args, original_solver_state, True
I've just spotted this has t0 and t1 back-to-front, which I think may in general mess with our solving logic as described previously. Is this intended / is it possible not to?
Yeah, this is intended and is essential to the solver!
0cfd4ec to 3a26ac3
Not sure if you were ready for a review on this yet, but I took a look over anyway 😁 We're making really good progress! In particular, now that we're settled on just `AbstractERK`, I think all our complicated state-reconstruction concerns go away, so the chance of footgunning ourselves has gone way down 😁
# (i.e. the state used on the forward). Otherwise, in `ReversibleAdjoint`,
# we would take a local forward step from an incorrect `solver_state`.
solver_state = jax.lax.cond(
    tm1 > 0, lambda _: (tm1, ym1, dt), lambda _: (t0, y0, dt), None
I think this predicate is assuming we're solving over a time interval of the form `[0, T]`? But in practice the left endpoint might be nonzero.
This aside, it's (a) possible to use `jnp.where`, which is lower-overhead for small stuff like this, but also (b) I think `lax.cond(pred, lambda: foo, lambda: bar)` should work, without arguments.
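A sketch of both suggestions, with a hypothetical `t_left` standing in for the solve's actual left endpoint (the other variable names mirror the diff above; the values are arbitrary stand-ins):

```python
import jax
import jax.numpy as jnp

# Stand-ins for the variables in the diff above:
tm1, ym1, dt, t0, y0 = 0.5, 2.0, 0.1, 0.0, 1.0
t_left = 0.0  # hypothetical: the solve's actual left endpoint, not a hard-coded 0

pred = tm1 > t_left
# (a) jnp.where, mapped over the tuple-valued state:
solver_state = jax.tree_util.tree_map(
    lambda a, b: jnp.where(pred, a, b), (tm1, ym1, dt), (t0, y0, dt)
)
# (b) lax.cond with argument-free branches:
solver_state = jax.lax.cond(pred, lambda: (tm1, ym1, dt), lambda: (t0, y0, dt))
```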
# We pre-compute the step size to avoid numerical instability during the
# backward_step. This is okay (albeit slightly ugly) as `LeapfrogMidpoint` can't
# be used with adaptive step sizes.
dt = t1 - t0
I think this will mean we silently get the wrong results if used with `diffrax.StepTo` with non-constant step size.
I think we might be able to save this by adjusting the backward step logic, so that it computes `y0` in the step that returns it, rather than in the step before:

```python
def backward_step(...):
    t2, y2 = solver_state
    control = terms.contr(t0, t2)
    y0 = (y2**ω - terms.vf_prod(t0, y1, args, control) ** ω).ω
    solver_state = (t1, y1)
    return y0, ...
```

Note that this is also essentially the same idea as what we currently do in the forward step. I think this might require some detail around how to handle the first and last step, still.
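For reference, `LeapfrogMidpoint`'s update (per the Diffrax docs) is

$$y_{n+1} = y_{n-1} + 2h\,f(t_n, y_n),$$

so the reconstruction above is just the exact algebraic inverse, $y_{n-1} = y_{n+1} - 2h\,f(t_n, y_n)$, written with `terms.vf_prod` over the control $(t_0, t_2)$ so that non-constant steps are handled too.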
That's much nicer! This should work, but as you say, requires special handling of the first and last step. I've had a think about how to do this (and the point above about assuming `[0, T]`) and I believe it's not possible without knowing where the start and end point of the solve is (or at least when we're at the first/last step)!
This feels like it's questioning the attempt to use a single-step API to represent multi-step solvers - hinting at this line:
`# TODO: support arbitrary linear multistep methods`
I have a few ideas for how to expand the leapfrog midpoint API to support this (e.g. have `first_step`, `last_step` flags or hold `t0`, `t1` as attributes) but I'm not sure these are elegant enough for diffrax lol... wdyt?
    ```
    """

solver: AbstractERK
Now that we're special-casing to just this, then I am much more comfortable with the logic below!
diffrax/_solver/reversible.py
Outdated
raise ValueError(
    "`UReversible` is only compatible with `AbstractERK` base solvers."
)
original_solver_init = self.solver.init(terms, t0, t1, y0, args)
So IIUC we're always going to be using the non-FSAL version of `AbstractERK` here?
If so then we get to sidestep all the painful `solver_state` difficulties that we've been debating back-and-forth, because this will just always be `None`?
In which case can we have an `assert original_solver_init is None` here, and likewise elide it throughout the rest of the logic below? (If need be we can also add a flag to `AbstractRungeKutta` to force non-FSAL-ness, and set that here.)
Yes, this is correct. We are always using an `AbstractERK` with `made_jump=True`, but we haven't explicitly turned off the `fsal` flag. Before we switched to `made_jump` we were turning off `fsal` by:
`object.__setattr__(self.solver.tableau, "fsal", False)`
But this modifies `solver` in place, which is not ideal. This is to say that `original_solver_init` is not `None` in the current implementation. It would be nice to set `fsal=False` so that we can pass around the `None` throughout.
I think something like this should do the trick then:

```python
def __init__(self, solver: AbstractERK):
    self.solver = eqx.tree_at(lambda s: s.disable_fsal, solver, True)
```

And then add this flag to `AbstractRungeKutta` here (`diffrax/_solver/runge_kutta.py`, line 356 at 482de90):

`scan_kind: None | Literal["lax", "checkpointed", "bounded"] = None`

Which takes effect here (`diffrax/_solver/runge_kutta.py`, line 398 at 482de90):

`fsal = fsal and not vf_expensive`
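Putting those two pieces together, a sketch of how the flag might sit on `AbstractRungeKutta` (`disable_fsal` is the name from the suggestion above; its placement and default are my assumptions):

```python
from typing import Literal
import equinox as eqx

class AbstractRungeKutta(eqx.Module):  # sketch: only the fields relevant here
    scan_kind: None | Literal["lax", "checkpointed", "bounded"] = None
    disable_fsal: bool = False  # hypothetical new flag

    def _decide_fsal(self, fsal: bool, vf_expensive: bool) -> bool:
        # cf. the "takes effect here" line referenced above
        return fsal and not vf_expensive and not self.disable_fsal
```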
Re-opening #593.

Implements `AbstractReversibleSolver` base class and `ReversibleAdjoint` for reversible backpropagation. This updates `SemiImplicitEuler`, `LeapfrogMidpoint` and `ReversibleHeun` to subclass `AbstractReversibleSolver`.

Implementation

`AbstractReversibleSolver` subclasses `AbstractSolver` and adds a `backward_step` method. This method should reconstruct `y0`, `solver_state` at `t0` from `y1`, `solver_state` at `t1`. See the aforementioned solvers for examples.

When backpropagating, `ReversibleAdjoint` uses this `backward_step` to reconstruct state. We then take a `vjp` through a local forward step and accumulate gradients. `ReversibleAdjoint` now also pulls back gradients from any interpolated values, so we can use `SaveAt(ts=...)`!

We allow arbitrary `solver_state` (provided it can be reconstructed reversibly) and calculate gradients w.r.t. `solver_state`. Finally, we pull back these gradients onto `y0`, `args`, `terms` using the `solver.init` method.
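A minimal end-to-end sketch of the API described above (solver and adjoint names are from this PR; the concrete ODE, tolerances and step size are illustrative assumptions):

```python
import diffrax
import jax
import jax.numpy as jnp

def loss(y0):
    term = diffrax.ODETerm(lambda t, y, args: -y)
    sol = diffrax.diffeqsolve(
        term,
        diffrax.ReversibleHeun(),  # an AbstractReversibleSolver per this PR
        t0=0.0, t1=1.0, dt0=0.01, y0=y0,
        saveat=diffrax.SaveAt(ts=jnp.linspace(0.0, 1.0, 5)),  # interpolated saves
        adjoint=diffrax.ReversibleAdjoint(),  # reversible backpropagation
    )
    return jnp.sum(sol.ys**2)

grads = jax.grad(loss)(jnp.array(1.0))  # gradients via the reversible adjoint
```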