@@ -42,16 +42,28 @@ to allocate space for the gradient. Then, start a `with autograd.record()` block
and do some computation. Finally, call `backward()` on the result:

```python
- >>> x = mx.nd.array([1,2,3,4])
- >>> x.attach_grad()
- >>> with mx.autograd.record():
- ...     y = x * x + 1
- >>> y.backward()
- >>> print(x.grad)
+ import mxnet as mx
+ x = mx.nd.array([1,2,3,4])
+ x.attach_grad()
+ with mx.autograd.record():
+     y = x * x + 1
+ y.backward()
+ print(x.grad)
+ ```
+
+ Which outputs:
+
+ ```
[ 2. 4. 6. 8.]
<NDArray 4 @cpu(0)>
```
+
+ This is the gradient of `x * x + 1` with respect to `x`, namely `2 * x`, evaluated at `x = [1,2,3,4]`.
+
+ Gradient recording is enabled during the scope of the `with mx.autograd.record():` statement, then
+ disabled when we go out of that scope.
+
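+ One way to see this is with `mx.autograd.is_recording()`, which returns the current recording
+ state. A minimal check, reusing `mx` from the example above, might look like this:
+
+ ```python
+ print(mx.autograd.is_recording())      # False: nothing is recorded outside the block
+ with mx.autograd.record():
+     print(mx.autograd.is_recording())  # True: operations in this scope are recorded
+ print(mx.autograd.is_recording())      # False again once we leave the scope
+ ```
+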
+ It can also be set manually by calling `mx.autograd.set_recording(True)`, and turned off with
+ `mx.autograd.set_recording(False)` when we no longer want to record operations.
+
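+ As a rough sketch, reusing `x` and `mx` from the example above, the manual form looks like this:
+
+ ```python
+ mx.autograd.set_recording(True)   # start recording operations
+ y = x * x + 1                     # this computation is added to the graph
+ mx.autograd.set_recording(False)  # stop recording
+ y.backward()                      # backward still works on the already recorded graph
+ print(x.grad)                     # same result as before: 2 * x
+ ```
+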
## Train mode and Predict Mode
@@ -76,8 +88,59 @@ Detailed tutorials are available in Part 1 of
[the MXNet gluon book](http://gluon.mxnet.io/).
+ ## Higher order gradient
+
+ Some operators support higher order gradients: some can be differentiated many times, others
+ only twice, and most just once.
+
+ For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+ and then call backward, or call `mx.autograd.grad` two times. If we do the latter, it is important
+ that the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+ `create_graph=False` and `retain_graph=False`. Otherwise we will not get the results that we want.
+ If we were to recreate the graph in the second call, we would end up with a graph of just the
+ backward nodes, not the full initial graph that includes the forward nodes.
+
+ The pattern to calculate higher order gradients is the following:
+
+ ```python
+ from mxnet import ndarray as nd
+ from mxnet import autograd as ag
+ x = nd.array([1,2,3])
+ x.attach_grad()
+ def f(x):
+     # Any function which supports higher order gradients
+     return nd.log(x)
+ ```
+
+ If the operators used in `f` don't support higher order gradients, you will get an error like
+ `operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+ that it doesn't support getting the gradient of the gradient, that is, running backward on
+ the backward graph.
+
+ Using `mxnet.autograd.grad` multiple times:
+
+ ```python
+ with ag.record():
+     y = f(x)
+     x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
+     x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=False)[0]
+ ```
+
+ Running backward on the backward graph:
+
+ ```python
+ with ag.record():
+     y = f(x)
+     x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
+ x_grad.backward()
+ x_grad_grad = x.grad
+ ```
+
+ Both methods are equivalent, except that in the second case `retain_graph` is set to `False` by
+ default when running backward. Both run a backward pass on the graph as usual to get the
+ gradient of the first gradient `x_grad` with respect to `x`, evaluated at the value of `x`.
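+
+ Since `f(x) = log(x)` in this example, the first gradient is `1/x` and the second gradient is
+ `-1/x^2`, so a quick check of the computed values could look like this:
+
+ ```python
+ print(x_grad)       # roughly [1.0, 0.5, 0.3333] for x = [1, 2, 3]
+ print(x_grad_grad)  # roughly [-1.0, -0.25, -0.1111]
+ ```
+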
+ For more examples, check the [higher order gradient unit tests](https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_higher_order_grad.py).
<script type="text/javascript" src='../../../_static/js/auto_module_index.js'></script>