
Commit 47939cb

laurilaatu and jmitrevs authored
Depthwise convolution for oneAPI (#1131)
* snapshot adding oneapi
* fix reduce constexpr
* further updates
* update the bridge and testbench
* fix issues discovered when compiling
* update bridge writing files
* build library (but not tested)
* fix a bug in testbench
* snapshot after some debugging
* remove forgotten debug printing
* add build
* pre-commit fixes
* fix more pre-commit
* fix more pre-commit errors
* snapshot of work before reworking types
* Use using to decide array type, some preliminary updates
* snapshot unifying types
* fix the testbench and bridge
* snapshot updating nnet_utils (not finished)
* define array in nnet_types for oneAPI
* fix parallel conv2d
* add back the streaming versions of algs, most unconverted
* tentatively complete streaming for dense but not functional
* first version that compiles streaming
* change how the pipe value type is extracted
* fix pre-commit error
* always treat elu as ELU class
* fix batchnorm
* snapshot towards fixing conv
* snapshot fixing test for streaming
* fix conv1d
* fix conv2d
* fix reshape and flatten for oneAPI
* initial oneAPI tests
* remove nnet_dense_compressed from oneAPI
* add merge functionality (untested)
* fix merge for oneAPI
* fix merge for oneAPI (missing commit)
* add zeropadding
* standardize parallelization spelling
* fix pointwise for oneAPI
* remove references to quartus
* more replace quartus with oneapi
* snapshot on the way towards implementing pooling
* fix io_stream pooling for oneAPI
* add fix for Conv2DBatchnorm
* accidentally committed CMakeLists.txt in my debug setup
* reshaping, not fully tested
* fix cloning of streams
* fix pytest library loading
* remove unused template
* fix some activation bugs
* fix the overwriting of directories in the pytest
* update version of test repository
* try to fix docker issue
* bump hls4ml-testing tag to 0.5.2
* try not restricting tensorflow-model-optimization
* Update to 0.5.3 for testing
* bump to docker image 0.5.4, suggested by Ben
* fix pre-commit warning
* dial down N_TESTS_PER_YAML to 4
* revert tensorflow-model-optimization change
* fix issue of saving in "obsolete" h5 format
* fix embedding for oneAPI
* First attempt at adding RNNs to oneAPI
* fix bug in array size
* fix order of indices
* make queues static in bridge
* fix logic error in repack stream
* changing the style, but functionally identical
* update pointwise optimizer for oneAPI
* add oneAPI to test_multi_dense.py
* fix updating weight types
* initial changes of templates, for testing
* fix weight naming, product selection
* make im2col the default; fix winograd size
* fix up streaming dense and convolution
* fix prelu, some batchnorm
* fix weight array of exponential types
* move ACExponentialPrecisionDefinition to oneapi_types
* attempt to fix batchnorm and recurrent
* fixed BatchNormalizationQuantizedTanhConfigTemplate template selection
* fix embedding_stream
* fix lstm and simple rnn
* fix GRU
* fix winograd, and also disable it by default
* fix threshold name
* split bn_quant to be backend-specific
* add type inference to oneAPI
* add oneAPI to pytorch tests
* fix pooling with padding for oneAPI and Quartus
* Compilation for larger models enabled by increasing -fconstexpr-steps
* add oneapi clone tests; remove redundant multi_clone test
* remove some attributes to avoid overwrite warnings
* make extra handling for oneAPI like others (as in PR #1067)
* remove warnings for extra optimizers that are not scheduled on purpose
* update parametrized activations
* initial depthconv2d implementation
* initial depthconv2d implementation
* Rename to depthconv, add strides and add tests
* Remove class for DepthwiseConv2D
* Remove Separable convolution template
* Remove layer optimizer for sepconv
* Loop unroll
* Pre-commit format
* Fix spelling
* depthconv1d, channel order in loop, product
* Gather result to accum

---------

Co-authored-by: Jovan Mitrevski <[email protected]>
Co-authored-by: Jovan Mitrevski <[email protected]>
1 parent 3c63e27 commit 47939cb

File tree

12 files changed: +222 −10 lines


hls4ml/backends/fpga/fpga_backend.py

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ def __init__(self, name):
             attrs.append(ConfigurableAttribute('reuse_factor', default=1, description=descriptions.reuse_factor))
             self.attribute_map[layer] = attrs

-        # seperable is kind of special because it is effectively two layers that will be split
+        # separable is kind of special because it is effectively two layers that will be split
         for layer in (SeparableConv1D, SeparableConv2D):
             attrs = self.attribute_map.get(layer, [])
             attrs.append(TypeAttribute('depthwise_accum'))

hls4ml/backends/oneapi/passes/convolution_templates.py

Lines changed: 37 additions & 3 deletions
@@ -1,7 +1,7 @@
 from hls4ml.backends.backend import get_backend
 from hls4ml.backends.oneapi.oneapi_template import StreamFunctionCallTemplate, TaskSequenceTemplate
 from hls4ml.backends.template import FunctionCallTemplate, LayerConfigTemplate
-from hls4ml.model.layers import Conv1D, Conv2D, Conv2DBatchnorm
+from hls4ml.model.layers import Conv1D, Conv2D, Conv2DBatchnorm, DepthwiseConv1D, DepthwiseConv2D

 # TODO - Dilation rate ?

@@ -70,9 +70,20 @@
 conv1d_include_list = ['nnet_utils/nnet_conv1d.h', 'nnet_utils/nnet_conv1d_stream.h']


+depthconv1d_function_template = (
+    'nnet::depthwise_conv_1d_{data_format}<{input_t}, {output_t}, {config}>({input}, {output}, {w}, {b});'
+)
+depthconv1d_include_list = [
+    'nnet_utils/nnet_conv1d.h',
+    'nnet_utils/nnet_conv1d_resource.h',
+    'nnet_utils/nnet_depthconv1d.h',
+    'nnet_utils/nnet_depthconv1d_resource.h',
+]
+
+
 class Conv1DConfigTemplate(LayerConfigTemplate):
     def __init__(self):
-        super().__init__(Conv1D)
+        super().__init__((Conv1D, DepthwiseConv1D))
         self.template = conv1d_config_template
         self.mult_template = conv_mult_config_template
@@ -137,6 +148,12 @@ def format(self, node):
         return self.template.format(**params)


+class DepthwiseConv1DFunctionTemplate(Conv1DFunctionTemplate):
+    def __init__(self):
+        super(Conv1DFunctionTemplate, self).__init__(DepthwiseConv1D, include_header=depthconv1d_include_list)
+        self.template = depthconv1d_function_template
+
+
 ''' 2D Conv '''
 conv2d_config_template = """struct config{index} : nnet::conv2d_config {{
     static const unsigned in_height = {in_height};
@@ -183,7 +200,7 @@ def format(self, node):

 class Conv2DConfigTemplate(LayerConfigTemplate):
     def __init__(self):
-        super().__init__((Conv2D, Conv2DBatchnorm))
+        super().__init__((Conv2D, Conv2DBatchnorm, DepthwiseConv2D))
         self.template = conv2d_config_template
         self.mult_template = conv_mult_config_template
@@ -233,3 +250,20 @@ def format(self, node):
             raise RuntimeError('channels_first not supported on oneAPI')
         params['data_format'] = 'cl'
         return self.template.format(**params)
+
+
+depthconv2d_function_template = (
+    'nnet::depthwise_conv_2d_{data_format}<{input_t}, {output_t}, {config}>({input}, {output}, {w}, {b});'
+)
+depthconv2d_include_list = [
+    'nnet_utils/nnet_conv2d.h',
+    'nnet_utils/nnet_conv2d_resource.h',
+    'nnet_utils/nnet_depthconv2d.h',
+    'nnet_utils/nnet_depthconv2d_resource.h',
+]
+
+
+class DepthwiseConv2DFunctionTemplate(Conv2DFunctionTemplate):
+    def __init__(self):
+        super(Conv2DFunctionTemplate, self).__init__(DepthwiseConv2D, include_header=depthconv2d_include_list)
+        self.template = depthconv2d_function_template
hls4ml/model/optimizer/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@
     'convert',
     [
         'channels_last_converter',
-        'seperable_to_depthwise_and_conv',
+        'separable_to_depthwise_and_conv',
         'remove_transpose_before_flatten',
         'remove_nop_transpose',
         'remove_single_channel_transpose',

hls4ml/model/optimizer/passes/seperable_to_dw_conv.py

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
 """
-This optimizer converts a seperable convolution to a depthwise followed by a regular convolution.
+This optimizer converts a separable convolution to a depthwise followed by a regular convolution.
 For backends with a custom pointwise implementations the regular convolution will subsequently
 be converted to a pointwise convolution by a different optimizer.
 """
@@ -10,8 +10,8 @@
 from hls4ml.model.optimizer import OptimizerPass


-class SeperableToDepthwiseAndConv(OptimizerPass):
-    """Convert Seperable to DepthwiseConv + Conv (potentially later Pointwise)"""
+class SeparableToDepthwiseAndConv(OptimizerPass):
+    """Convert Separable to DepthwiseConv + Conv (potentially later Pointwise)"""

     _dw_attributes = (
         'in_width',
@@ -70,7 +70,7 @@ def transform(self, model, node):
         model.config.parse_name_config(dw_name, dw_layer_config)

         # creating the attributes
-        dw_attributes = {k: node.attributes[k] for k in SeperableToDepthwiseAndConv._dw_attributes if k in node.attributes}
+        dw_attributes = {k: node.attributes[k] for k in SeparableToDepthwiseAndConv._dw_attributes if k in node.attributes}
         dw_attributes['n_filt'] = dw_attributes['n_chan'] * dw_attributes['depth_multiplier']
         dw_attributes['use_bias'] = False

@@ -100,7 +100,7 @@ def transform(self, model, node):
         model.config.parse_name_config(pw_name, pw_layer_config)

         # creating the attributes
-        pw_attributes = {k: node.attributes[k] for k in SeperableToDepthwiseAndConv._pw_attributes if k in node.attributes}
+        pw_attributes = {k: node.attributes[k] for k in SeparableToDepthwiseAndConv._pw_attributes if k in node.attributes}
         pw_attributes['filt_width'] = 1
         pw_attributes['filt_height'] = 1
         pw_attributes['stride_width'] = 1
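The identity this pass relies on is that a separable convolution factors exactly into a depthwise convolution followed by a 1x1 pointwise convolution. A NumPy sketch of that equivalence for the 1D, channels-last, `depth_multiplier=1`, valid-padding case (shapes and helper names are illustrative, not hls4ml's API):

```python
import numpy as np

def depthwise_conv1d(x, dw):
    # x: (width, channels), dw: (kernel, channels); each channel filtered alone
    K = dw.shape[0]
    return np.stack([(x[i:i + K] * dw).sum(axis=0) for i in range(x.shape[0] - K + 1)])

def pointwise_conv1d(x, pw):
    # pw: (channels, filters) -- a 1x1 conv is just a per-position matmul
    return x @ pw

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))    # 8 samples, 3 channels
dw = rng.normal(size=(4, 3))   # kernel width 4, one filter per channel
pw = rng.normal(size=(3, 5))   # 1x1 conv mixing 3 channels into 5 filters

sep = pointwise_conv1d(depthwise_conv1d(x, dw), pw)
print(sep.shape)  # (5, 5): 8-4+1 output positions, 5 filters
```

The split is what lets a backend with a custom pointwise kernel (as the module docstring notes) pick up the second half with a later optimizer.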
nnet_utils/nnet_depthconv1d.h

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#ifndef NNET_DEPTH_CONV1D_H_
+#define NNET_DEPTH_CONV1D_H_
+
+#include "nnet_common.h"
+#include "nnet_conv1d.h"
+#include "nnet_depthconv1d_resource.h"
+
+namespace nnet {
+
+template <class data_T, class res_T, typename CONFIG_T>
+void depthwise_conv_1d_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
+                          const typename CONFIG_T::bias_t &biases) {
+
+    depthwise_conv_1d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
+}
+
+} // namespace nnet
+
+#endif
nnet_utils/nnet_depthconv1d_resource.h

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+#ifndef NNET_DEPTH_CONV1D_LATENCY_H_
+#define NNET_DEPTH_CONV1D_LATENCY_H_
+
+#include "nnet_common.h"
+#include "nnet_conv1d_resource.h"
+#include "nnet_mult.h"
+
+namespace nnet {
+
+template <class data_T, class res_T, typename CONFIG_T>
+void depthwise_conv_1d_resource_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
+                                   const typename CONFIG_T::bias_t &biases) {
+
+    int depth_multiplier = CONFIG_T::n_filt / CONFIG_T::n_chan;
+    [[intel::fpga_register]] int res_idx = 0;
+
+    [[intel::fpga_register]] typename CONFIG_T::accum_t acc[CONFIG_T::out_width * CONFIG_T::n_filt];
+
+DM_LOOP:
+    #pragma unroll
+    for (int dm = 0; dm < depth_multiplier; dm++) {
+
+    WIDTH_LOOP:
+        #pragma unroll
+        for (int w = 0; w < CONFIG_T::out_width; w++) {
+
+        CHAN_LOOP:
+            #pragma unroll
+            for (int c = 0; c < CONFIG_T::n_chan; c++) {
+
+                res_idx = (w * CONFIG_T::n_filt) + (c * depth_multiplier) + dm;
+
+                acc[res_idx] = biases[c * depth_multiplier + dm];
+
+            KERNEL_W_LOOP:
+                #pragma unroll
+                for (int kw = 0; kw < CONFIG_T::filt_width; kw++) {
+
+                    int w_in = w * CONFIG_T::stride_width + kw - CONFIG_T::pad_left;
+
+                    if ((w_in >= 0) && (w_in < CONFIG_T::in_width)) {
+
+                        acc[res_idx] += CONFIG_T::mult_config::
+                            template product<typename data_T::value_type, typename CONFIG_T::weight_t::value_type>::product(
+                                data[(w_in)*CONFIG_T::n_chan + c],
+                                weights[(dm * CONFIG_T::filt_width * CONFIG_T::n_chan) + (kw * CONFIG_T::n_chan) + c]);
+                    }
+                }
+            }
+        }
+    }
+
+RESULT:
+    #pragma unroll
+    for (int ires = 0; ires < CONFIG_T::out_width * CONFIG_T::n_filt; ires++) {
+        res[ires] = cast<typename CONFIG_T::accum_t, typename res_T::value_type, CONFIG_T>(acc[ires]);
+    }
+}
+} // namespace nnet
+#endif
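A NumPy reference for the loop nest above can make the index math easier to audit. The sketch below assumes the same layouts the kernel uses: a flat channels-last input of length `in_width * n_chan`, weights ordered `[dm][kw][c]`, and output channel `c * depth_multiplier + dm`; it mirrors the HLS loops but is not the hls4ml implementation.

```python
import numpy as np

def depthwise_conv_1d_ref(data, weights, biases, n_chan, n_filt,
                          in_width, out_width, filt_width,
                          stride_width=1, pad_left=0):
    """Plain-Python mirror of depthwise_conv_1d_resource_cl's index math."""
    depth_multiplier = n_filt // n_chan
    acc = np.zeros(out_width * n_filt)
    for dm in range(depth_multiplier):
        for w in range(out_width):
            for c in range(n_chan):
                # Same flattened output index as the kernel's res_idx
                res_idx = w * n_filt + c * depth_multiplier + dm
                acc[res_idx] = biases[c * depth_multiplier + dm]
                for kw in range(filt_width):
                    w_in = w * stride_width + kw - pad_left
                    if 0 <= w_in < in_width:  # skip padded positions
                        acc[res_idx] += (
                            data[w_in * n_chan + c]
                            * weights[dm * filt_width * n_chan + kw * n_chan + c]
                        )
    return acc

# Example with made-up sizes: 2 channels, depth_multiplier 2, width 4, kernel 2.
out = depthwise_conv_1d_ref(
    data=np.arange(8, dtype=float), weights=np.ones(8), biases=np.zeros(4),
    n_chan=2, n_filt=4, in_width=4, out_width=3, filt_width=2)
print(out.shape)  # (12,) == out_width * n_filt
```

Note the kernel derives `depth_multiplier` as `n_filt / n_chan`, so it implicitly assumes `n_filt` is a whole multiple of `n_chan`.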
nnet_utils/nnet_depthconv2d.h

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#ifndef NNET_DEPTH_CONV2D_H_
+#define NNET_DEPTH_CONV2D_H_
+
+#include "nnet_common.h"
+#include "nnet_conv2d.h"
+#include "nnet_depthconv2d_resource.h"
+
+namespace nnet {
+
+template <class data_T, class res_T, typename CONFIG_T>
+void depthwise_conv_2d_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
+                          const typename CONFIG_T::bias_t &biases) {
+
+    depthwise_conv_2d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
+}
+
+} // namespace nnet
+
+#endif
nnet_utils/nnet_depthconv2d_resource.h

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+#ifndef NNET_SEPARABLE_CONV2D_LATENCY_H_
+#define NNET_SEPARABLE_CONV2D_LATENCY_H_
+
+#include "nnet_common.h"
+#include "nnet_conv2d_resource.h"
+#include "nnet_mult.h"
+
+namespace nnet {
+
+template <class data_T, class res_T, typename CONFIG_T>
+void depthwise_conv_2d_resource_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
+                                   const typename CONFIG_T::bias_t &biases) {
+
+    int depth_multiplier = CONFIG_T::n_filt / CONFIG_T::n_chan;
+    [[intel::fpga_register]] int res_idx = 0;
+
+    [[intel::fpga_register]] typename CONFIG_T::accum_t acc[CONFIG_T::out_width * CONFIG_T::out_height * CONFIG_T::n_filt];
+
+DM_LOOP:
+    #pragma unroll
+    for (int dm = 0; dm < depth_multiplier; dm++) {
+
+    HEIGHT_LOOP:
+        #pragma unroll
+        for (int h = 0; h < CONFIG_T::out_height; h++) {
+        WIDTH_LOOP:
+            #pragma unroll
+            for (int w = 0; w < CONFIG_T::out_width; w++) {
+
+            CHAN_LOOP:
+                #pragma unroll
+                for (int c = 0; c < CONFIG_T::n_chan; c++) {
+
+                    res_idx =
+                        (h * CONFIG_T::out_width * CONFIG_T::n_filt) + (w * CONFIG_T::n_filt) + (c * depth_multiplier) + dm;
+
+                    acc[res_idx] = biases[c * depth_multiplier + dm];
+
+                KERNEL_H_LOOP:
+                    #pragma unroll
+                    for (int kh = 0; kh < CONFIG_T::filt_height; kh++) {
+                    KERNEL_W_LOOP:
+                        #pragma unroll
+                        for (int kw = 0; kw < CONFIG_T::filt_width; kw++) {
+
+                            int h_in = h * CONFIG_T::stride_height + kh - CONFIG_T::pad_top;
+                            int w_in = w * CONFIG_T::stride_width + kw - CONFIG_T::pad_left;
+
+                            if ((h_in >= 0) && (h_in < CONFIG_T::in_height) && (w_in >= 0) && (w_in < CONFIG_T::in_width)) {
+
+                                acc[res_idx] +=
+                                    CONFIG_T::mult_config::template product<typename data_T::value_type,
+                                                                            typename CONFIG_T::weight_t::value_type>::
+                                        product(
+                                            data[(h_in)*CONFIG_T::in_width * CONFIG_T::n_chan + (w_in)*CONFIG_T::n_chan + c],
+                                            weights[(dm * CONFIG_T::filt_height * CONFIG_T::filt_width * CONFIG_T::n_chan) +
+                                                    (kh * CONFIG_T::filt_width * CONFIG_T::n_chan) +
+                                                    (kw * CONFIG_T::n_chan) + c]);
+
+                                ;
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+RESULT:
+    #pragma unroll
+    for (int ires = 0; ires < CONFIG_T::out_width * CONFIG_T::out_height * CONFIG_T::n_filt; ires++) {
+        res[ires] = cast<typename CONFIG_T::accum_t, typename res_T::value_type, CONFIG_T>(acc[ires]);
+    }
+}
+} // namespace nnet
+#endif
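Because every loop here is fully unrolled, correctness depends on the `res_idx` formula mapping each `(h, w, c, dm)` iteration to a distinct accumulator slot. A quick illustrative check of that bijection with small made-up dimensions:

```python
# Sanity check (illustrative sizes): the 2D res_idx formula should cover
# every slot of acc[out_height * out_width * n_filt] exactly once, so no
# two unrolled iterations alias the same accumulator.
out_height, out_width, n_chan, depth_multiplier = 3, 4, 2, 2
n_filt = n_chan * depth_multiplier  # mirrors n_filt / n_chan in the kernel

indices = {
    (h * out_width * n_filt) + (w * n_filt) + (c * depth_multiplier) + dm
    for h in range(out_height)
    for w in range(out_width)
    for c in range(n_chan)
    for dm in range(depth_multiplier)
}
assert indices == set(range(out_height * out_width * n_filt))
print(len(indices))  # 48 distinct slots for 3*4*4 iterations
```

The resulting output layout is channels-last with the `depth_multiplier` copies of each input channel stored contiguously, matching the `c * depth_multiplier + dm` bias indexing.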

test/pytest/test_depthconv1d.py

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@
 @pytest.mark.parametrize(
     'backend, io_type',
     [
+        ('oneAPI', 'io_parallel'),
         ('Vivado', 'io_parallel'),
         ('Vitis', 'io_parallel'),
         ('Vivado', 'io_stream'),

test/pytest/test_depthconv2d.py

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 @pytest.mark.parametrize(
     'backend, io_type',
     [
+        ('oneAPI', 'io_parallel'),
         ('Vivado', 'io_parallel'),
         ('Vitis', 'io_parallel'),
         ('Vivado', 'io_stream'),

test/pytest/test_sepconv1d.py

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@
 @pytest.mark.parametrize(
     'backend, io_type',
     [
+        ('oneAPI', 'io_parallel'),
         ('Vivado', 'io_parallel'),
         ('Vitis', 'io_parallel'),
         ('Vivado', 'io_stream'),

test/pytest/test_sepconv2d.py

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@
 @pytest.mark.parametrize(
     'backend, io_type',
     [
+        ('oneAPI', 'io_parallel'),
         ('Vivado', 'io_parallel'),
         ('Vitis', 'io_parallel'),
         ('Vivado', 'io_stream'),
