This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Commit d5fcd98

larroy authored and apeforest committed
[DOC] refine autograd docs (#15109)
* refine autograd docs
* CR comments
* Fix examples
* CR comments
* Followup CR
* CR
1 parent 42a47b1 commit d5fcd98

File tree

3 files changed, +74 -6 lines changed


docs/api/python/autograd/autograd.md

Lines changed: 69 additions & 6 deletions
@@ -42,16 +42,28 @@ to allocate space for the gradient. Then, start a `with autograd.record()` block
and do some computation. Finally, call `backward()` on the result:

```python
->>> x = mx.nd.array([1,2,3,4])
->>> x.attach_grad()
->>> with mx.autograd.record():
-...     y = x * x + 1
->>> y.backward()
->>> print(x.grad)
+import mxnet as mx
+x = mx.nd.array([1,2,3,4])
+x.attach_grad()
+with mx.autograd.record():
+    y = x * x + 1
+y.backward()
+print(x.grad)
+```
+
+Which outputs:
+
+```
[ 2. 4. 6. 8.]
<NDArray 4 @cpu(0)>
```

+Gradient recording is enabled during the scope of the `with mx.autograd.record():` statement and is
+disabled when we leave that scope.
+
+It can also be set manually by executing `mx.autograd.set_recording(True)`, and turned off with
+`mx.autograd.set_recording(False)` once we no longer want to record operations.
+

## Train mode and Predict Mode
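
As a small illustration of the manual control mentioned in the recording paragraph added above (a minimal sketch, not part of the patch, assuming the same MXNet 1.x NDArray autograd API used in the example in this file):

```python
import mxnet as mx

x = mx.nd.array([1, 2, 3, 4])
x.attach_grad()

mx.autograd.set_recording(True)   # start recording operations manually
y = x * x + 1
mx.autograd.set_recording(False)  # stop recording; later operations are not traced

y.backward()     # backward still works on the graph recorded above
print(x.grad)    # expected: [2. 4. 6. 8.], same as with the `with mx.autograd.record():` block
```

Unlike `set_recording`, the `record()` context manager restores the previous recording state automatically when the block exits, which is why the main example uses it.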

@@ -76,8 +88,59 @@ Detailed tutorials are available in Part 1 of
[the MXNet gluon book](http://gluon.mxnet.io/).


+# Higher order gradient
+
+Some operators support higher order gradients: some can be differentiated multiple times, others
+twice, and most just once.
+
+To calculate higher order gradients, we can use the `mx.autograd.grad` function while recording and
+then call `backward()`, or call `mx.autograd.grad` two times. If we do the latter, it is important
+that the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`; otherwise we will not get the results that we want. If
+we were to recreate the graph in the second call, we would end up with a graph of just the backward
+nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x = nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # Any function which supports higher order gradients
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients, you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient, that is, running backward on the
+backward graph.
+
+Using `mxnet.autograd.grad` multiple times:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=False)[0]
+```
+
+Running backward on the backward graph:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
+x_grad.backward()
+x_grad_grad = x.grad
+```

+Both methods are equivalent, except that in the second case, `retain_graph` on running backward is
+set to False by default. But both calls run a backward pass on the graph as usual to get the
+gradient of the first gradient `x_grad` with respect to `x`, evaluated at the value of `x`.

+For more examples, check the [higher order gradient unit tests](https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_higher_order_grad.py).


<script type="text/javascript" src='../../../_static/js/auto_module_index.js'></script>
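
As a quick numerical check of the higher order gradient pattern added in the diff above (a minimal sketch, not part of the patch; the expected values follow from d/dx log(x) = 1/x and d²/dx² log(x) = -1/x²):

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1, 2, 3])
x.attach_grad()

with ag.record():
    y = nd.log(x)
    # First derivative, kept in the graph (create_graph=True) so it can be differentiated again.
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
x_grad.backward()

print(x_grad)   # 1/x            -> roughly [ 1.    0.5    0.333]
print(x.grad)   # d(1/x)/dx = -1/x**2 -> roughly [-1.   -0.25  -0.111]
```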

python/mxnet/autograd.py

Lines changed: 3 additions & 0 deletions
@@ -197,6 +197,9 @@ def predict_mode():
def mark_variables(variables, gradients, grad_reqs='write'):
    """Mark NDArrays as variables to compute gradient for autograd.

+    This is equivalent to calling the function .attach_grad() on a variable, but with this
+    call we can set the gradient to any value.
+
    Parameters
    ----------
    variables: NDArray or list of NDArray
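
A minimal sketch of the difference the added docstring sentence describes (not part of the patch; the concrete buffer values and `grad_reqs='add'` are illustrative assumptions):

```python
import mxnet as mx
from mxnet import autograd

x = mx.nd.array([1., 2., 3.])
# Instead of the zero buffer that x.attach_grad() would allocate, supply our own gradient NDArray.
grad_buf = mx.nd.array([10., 10., 10.])
autograd.mark_variables([x], [grad_buf], grad_reqs='add')

with autograd.record():
    y = x * x
y.backward()

# With grad_reqs='add', backward accumulates into the supplied buffer: 10 + 2*x.
print(x.grad)   # expected roughly [12. 14. 16.]
```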

python/mxnet/ndarray/ndarray.py

Lines changed: 2 additions & 0 deletions
@@ -2243,6 +2243,8 @@ def attach_grad(self, grad_req='write', stype=None):
        """Attach a gradient buffer to this NDArray, so that `backward`
        can compute gradient with respect to it.

+        The gradient is initialized to zeros.
+
        Parameters
        ----------
        grad_req : {'write', 'add', 'null'}
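
A minimal sketch of the behaviour documented by the added line (not part of the patch):

```python
import mxnet as mx

x = mx.nd.array([1., 2., 3.])
x.attach_grad()       # allocates the gradient buffer for x
print(x.grad)         # zeros until a backward pass writes into it: [0. 0. 0.]

with mx.autograd.record():
    y = 2 * x
y.backward()
print(x.grad)         # now holds dy/dx = [2. 2. 2.]
```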

0 commit comments
