Depthwise convolution for oneAPI #1131
Merged
JanFSchulte
merged 129 commits into
fastmachinelearning:main
from
laurilaatu:oneapi_separableconv
Dec 18, 2024
Commits
ce287f0 (jmitrevs) snapshot adding oneapi
cd0a2b8 (jmitrevs) fix reduce constexpr
3b3d40d (jmitrevs) further updates
b742901 (jmitrevs) update the bridge and testbench
8f6ef78 (jmitrevs) fix issues discovered when compiling
2e56be4 (jmitrevs) update bridge writing files
db780f0 (jmitrevs) Merge remote-tracking branch 'upstream/main' into oneapi_backend
b90021f (jmitrevs) build library (but not tested)
f086aa2 (jmitrevs) fix a bug in testbench
1f28cbf (jmitrevs) snapshot after some debugging
3e69b9a (jmitrevs) remove forgotten debug printing
17e6856 (jmitrevs) add build
2766a6e (jmitrevs) pre-commit fixes
c4ce138 (jmitrevs) fix more pre-commit
354d708 (jmitrevs) fix more pre-commit errors
8119029 (jmitrevs) snapshot of work before reworking types
cae1a8a (jmitrevs) Use using to decide array type, some preliminary updates
06a8c27 (jmitrevs) snapshot unifying types
8f58778 (jmitrevs) fix the testbench and bridge
86b0f4b (jmitrevs) snapshot updating nnet_utils (not finished)
62c5ecb (jmitrevs) define array in nnet_types for oneAPI
d203b42 (jmitrevs) fix parallel conv2d
f983ece (jmitrevs) add back the streaming versions of algs, most unconverted
5dd9282 (jmitrevs) tentatively complete streaming for dense but not functional
09b9513 (jmitrevs) first version that compiles streaming
0e3f9ba (jmitrevs) change how the pipe value type is extracted
e9f49ad (jmitrevs) Merge remote-tracking branch 'upstream/main' into oneapi_backend
99038eb (jmitrevs) fix pre-commit error
3d555ac (jmitrevs) always treat elu as ELU class
68c6a51 (jmitrevs) fix batchnorm
a3f5b3c (jmitrevs) snapshot towards fixing conv
0cbf5be (jmitrevs) snapshot fixing test for streaming
75c9301 (jmitrevs) fix conv1d
ba2e283 (jmitrevs) fix conv2d
a7c08d3 (jmitrevs) fix reshape and flatten for oneAPI
fa05c8b (jmitrevs) initial oneAPI tests
36d5c85 (jmitrevs) remove nnet_dense_compressed from oneAPI
de8b76d (jmitrevs) add merge functionality (untested)
058adb4 (jmitrevs) fix merge for oneAPI
0a7c761 (jmitrevs) fix merge for oneAPI (missing commit)
4c847b2 (jmitrevs) add zeropadding
f690c98 (jmitrevs) standardize paralellization spelling
262bc0c (jmitrevs) fix pointwise for oneAPI
a8da30e (jmitrevs) remove references to quartus
46ccc1d (jmitrevs) more replace quartus with oneapi
8c9313b (jmitrevs) snapshot on the way towards implementing pooling
0498d44 (jmitrevs) fix io_stream pooling for oneAPI
7bd7ba5 (jmitrevs) add fix for Conv2DBatchnorm
4ff035f (jmitrevs) accidentally committed CMakeLists.txt in my debug setup
b754f76 (jmitrevs) reshaping, not fully tested
2e5a05e (jmitrevs) fix cloning of streams
77c5672 (jmitrevs) Merge remote-tracking branch 'upstream/main' into oneapi_backend
8470a6c (jmitrevs) fix pytest library loading
20128bb (jmitrevs) remove unused template
efb6a7a (jmitrevs) fix some activation bugs
6f439d5 (jmitrevs) fix the overwriting of directories in the pytest
637e192 (jmitrevs) update version of test repository
0f12c96 (jmitrevs) try to fix docker issue
a5aac2a (jmitrevs) bump hls4ml-testing tag to 0.5.2
412bd43 (jmitrevs) try not restricting tensorflow-model-optimizatoin
5cffadf (jmitrevs) Update to 0.5.3 for testing
d156339 (jmitrevs) bump to docker image 0.5.4, suggested by Ben
924af07 (jmitrevs) fix pre-commit warning
7ded550 (jmitrevs) dial down N_TESTS_PER_YAML to 4
e966b18 (jmitrevs) revert tensorflow-model-optimization change
e649f34 (jmitrevs) fix issue of saving in "obsolete" h5 format
4743a5d (jmitrevs) Merge remote-tracking branch 'upstream/main' into oneapi_backend
bf68958 (jmitrevs) fix embedding for oneAPI
d07985d (jmitrevs) First attempt at adding RNNs to oneAPI
a58e4f5 (jmitrevs) fix bug in array size
eb9575a (jmitrevs) fix order or indices
04e0fcf (jmitrevs) Merge branch 'main' into oneapi_backend
b4ed5bc (jmitrevs) make queues static in bridge
ba55211 (jmitrevs) fix logic error in repack stream
9b790c5 (jmitrevs) changing the style, but functionally identical
b4e8873 (jmitrevs) Merge remote-tracking branch 'upstream/main' into oneapi_backend
60fe56b (jmitrevs) Merge branch 'main' into oneapi_backend
056765e (jmitrevs) update pointwise optimizer for oneAPI
ee6817d (jmitrevs) add oneAPI to test_multi_dense.py
5a5b015 (jmitrevs) fix updating weight types
1d72aa8 (jmitrevs) initial changes of templates, for testing
106d578 (jmitrevs) fix weight naming, product selection
80902d7 (jmitrevs) make im2col the default; fix winograd size
ea213a3 (jmitrevs) fix up streaming dense and convolution
5ba9a29 (jmitrevs) fix prelu, some batchnorm
fdd0baf (jmitrevs) fix weight array of exponential types
3ff54a9 (jmitrevs) move ACExponentialPrecisionDefinition to oneapi_types
d6604f0 (jmitrevs) attempt to fix batchnorm and recurrent
0f74122 (jmitrevs) Merge branch 'main' into oneapi_backend
9ffd18e (jmitrevs) fixed BatchNormalizationQuantizedTanhConfigTemplate template selection
be08ad0 (jmitrevs) fix embedding_stream
c06beda (jmitrevs) fix lstm and simple rnn
5452fab (jmitrevs) fix GRU
e39e867 (jmitrevs) fix winograd, and also disable it by default
cfe229f (jmitrevs) fix threshold name
70617e1 (jmitrevs) split bn_quant to be backend-specific
5bc6cbe (jmitrevs) add type inference to oneAPI
c0cf580 (jmitrevs) add oneAPI to pytorch tests
8c827b8 (jmitrevs) fix pooling with padding for oneAPI and Quartus
a4f4bd9 (jmitrevs) Merge branch 'main' into oneapi_backend
f1c0301 (jmitrevs) Merge branch 'main' into oneapi_backend
7e0a8ca (laurilaatu) Compilation for larger models enabled by increasing -fconstexpr-steps
acdc363 (jmitrevs) Merge pull request #6 from laurilaatu/oneapi_constexpr_fix
d1e14de (jmitrevs) add oneapi clone tests; remove reduntand multi_clone test
1b78e57 (jmitrevs) remove some attributes to avoid overwrite warnings
865e2c8 (jmitrevs) Merge branch 'main' into oneapi_backend
f9a71f1 (jmitrevs) make extra handling for oneAPI like others (as in PR #1067)
320615d (jmitrevs) remove warnings for extra optimizers that are not scheduled on purpose
5d13de5 (jmitrevs) update parametrized activations
09c5d5b (laurilaatu) intial depthconv2d implementation
c92091b (laurilaatu) intial depthconv2d implementation
8403348 (laurilaatu) Merge remote-tracking branch 'refs/remotes/origin/oneapi_separablecon…
c596f30 (laurilaatu) Rename to depthconv, add strides and add tests
bcd8c70 (laurilaatu) Remove class for DepthwiseConv2D
8981112 (laurilaatu) Remove Separable convolution template
5ad1188 (laurilaatu) Remove layer optimizer for sepconv
3c5b633 (laurilaatu) Merge branch 'main' into oneapi_separableconv
6b9bf0c (laurilaatu) Loop unroll
09013a1 (laurilaatu) Merge remote-tracking branch 'origin' into oneapi_separableconv
21f21fc (laurilaatu) Pre-commit format
9536248 (laurilaatu) Fix spelling
8ebdf22 (laurilaatu) Merge branch 'fastmachinelearning:main' into oneapi_separableconv
0fb0997 (laurilaatu) depthconv1d, channel order in loop, product
d34876d (laurilaatu) Gather result to accum
7d9ec3a (laurilaatu) Merge branch 'main' into oneapi_separableconv
d1c10ca (laurilaatu) Merge branch 'main' into oneapi_separableconv
6de4043 (laurilaatu) Merge branch 'main' into oneapi_separableconv
326b188 (laurilaatu) Merge branch 'main' into oneapi_separableconv
c58db99 (laurilaatu) Merge branch 'main' into oneapi_separableconv
19 changes: 19 additions & 0 deletions
hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv1d.h
@@ -0,0 +1,19 @@
#ifndef NNET_DEPTH_CONV1D_H_
#define NNET_DEPTH_CONV1D_H_

#include "nnet_common.h"
#include "nnet_conv1d.h"
#include "nnet_depthconv1d_resource.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_1d_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
                          const typename CONFIG_T::bias_t &biases) {

    depthwise_conv_1d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
}

} // namespace nnet

#endif
60 changes: 60 additions & 0 deletions
hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv1d_resource.h
@@ -0,0 +1,60 @@
#ifndef NNET_DEPTH_CONV1D_LATENCY_H_
#define NNET_DEPTH_CONV1D_LATENCY_H_

#include "nnet_common.h"
#include "nnet_conv1d_resource.h"
#include "nnet_mult.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_1d_resource_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
                                   const typename CONFIG_T::bias_t &biases) {

    int depth_multiplier = CONFIG_T::n_filt / CONFIG_T::n_chan;
    [[intel::fpga_register]] int res_idx = 0;

    [[intel::fpga_register]] typename CONFIG_T::accum_t acc[CONFIG_T::out_width * CONFIG_T::n_filt];

    DM_LOOP:
    #pragma unroll
    for (int dm = 0; dm < depth_multiplier; dm++) {

        WIDTH_LOOP:
        #pragma unroll
        for (int w = 0; w < CONFIG_T::out_width; w++) {

            CHAN_LOOP:
            #pragma unroll
            for (int c = 0; c < CONFIG_T::n_chan; c++) {

                res_idx = (w * CONFIG_T::n_filt) + (c * depth_multiplier) + dm;

                acc[res_idx] = biases[c * depth_multiplier + dm];

                KERNEL_W_LOOP:
                #pragma unroll
                for (int kw = 0; kw < CONFIG_T::filt_width; kw++) {

                    int w_in = w * CONFIG_T::stride_width + kw - CONFIG_T::pad_left;

                    if ((w_in >= 0) && (w_in < CONFIG_T::in_width)) {

                        acc[res_idx] += CONFIG_T::mult_config::
                            template product<typename data_T::value_type, typename CONFIG_T::weight_t::value_type>::product(
                                data[(w_in)*CONFIG_T::n_chan + c],
                                weights[(dm * CONFIG_T::filt_width * CONFIG_T::n_chan) + (kw * CONFIG_T::n_chan) + c]);
                    }
                }
            }
        }
    }

    RESULT:
    #pragma unroll
    for (int ires = 0; ires < CONFIG_T::out_width * CONFIG_T::n_filt; ires++) {
        res[ires] = cast<typename CONFIG_T::accum_t, typename res_T::value_type, CONFIG_T>(acc[ires]);
    }
}
} // namespace nnet
#endif
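The index arithmetic in `depthwise_conv_1d_resource_cl` can be checked on the host with a plain C++ reference. The following is a minimal sketch, not part of the PR: `DepthConv1DShape` and `depthwise_conv_1d_ref` are hypothetical names, `float` stands in for the fixed-point types, and an ordinary multiply stands in for `CONFIG_T::mult_config::product`. The buffer layouts (channels-last data, weights indexed by `dm`, then `kw`, then `c`) are copied from the index expressions in the kernel above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical run-time description of the layer. The field names mirror the
// CONFIG_T members used by the kernel, but this struct is not part of hls4ml.
struct DepthConv1DShape {
    int in_width, n_chan, n_filt, filt_width, stride_width, pad_left, out_width;
};

// Reference depthwise 1D convolution on flat buffers, using the same index
// expressions as depthwise_conv_1d_resource_cl (float instead of fixed point).
std::vector<float> depthwise_conv_1d_ref(const std::vector<float> &data, const std::vector<float> &weights,
                                         const std::vector<float> &biases, const DepthConv1DShape &s) {
    assert(s.n_filt % s.n_chan == 0); // the kernel assumes an integer depth multiplier
    const int depth_multiplier = s.n_filt / s.n_chan;
    std::vector<float> res(s.out_width * s.n_filt);

    for (int dm = 0; dm < depth_multiplier; dm++) {
        for (int w = 0; w < s.out_width; w++) {
            for (int c = 0; c < s.n_chan; c++) {
                // Output filter index is c * depth_multiplier + dm, channels-last.
                const int res_idx = w * s.n_filt + c * depth_multiplier + dm;
                float acc = biases[c * depth_multiplier + dm];
                for (int kw = 0; kw < s.filt_width; kw++) {
                    const int w_in = w * s.stride_width + kw - s.pad_left;
                    // Positions that fall into the zero padding contribute nothing.
                    if (w_in >= 0 && w_in < s.in_width) {
                        acc += data[w_in * s.n_chan + c] *
                               weights[dm * s.filt_width * s.n_chan + kw * s.n_chan + c];
                    }
                }
                res[res_idx] = acc;
            }
        }
    }
    return res;
}
```

Because each input channel only ever reads its own column of `data` and `weights`, giving one channel an identity kernel and another a box kernel is a quick way to confirm that channels are not mixed.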
19 changes: 19 additions & 0 deletions
hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv2d.h
@@ -0,0 +1,19 @@
#ifndef NNET_DEPTH_CONV2D_H_
#define NNET_DEPTH_CONV2D_H_

#include "nnet_common.h"
#include "nnet_conv2d.h"
#include "nnet_depthconv2d_resource.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_2d_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
                          const typename CONFIG_T::bias_t &biases) {

    depthwise_conv_2d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
}

} // namespace nnet

#endif
76 changes: 76 additions & 0 deletions
hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv2d_resource.h
@@ -0,0 +1,76 @@
#ifndef NNET_SEPARABLE_CONV2D_LATENCY_H_
#define NNET_SEPARABLE_CONV2D_LATENCY_H_

#include "nnet_common.h"
#include "nnet_conv2d_resource.h"
#include "nnet_mult.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_2d_resource_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
                                   const typename CONFIG_T::bias_t &biases) {

    int depth_multiplier = CONFIG_T::n_filt / CONFIG_T::n_chan;
    [[intel::fpga_register]] int res_idx = 0;

    [[intel::fpga_register]] typename CONFIG_T::accum_t acc[CONFIG_T::out_width * CONFIG_T::out_height * CONFIG_T::n_filt];

    DM_LOOP:
    #pragma unroll
    for (int dm = 0; dm < depth_multiplier; dm++) {

        HEIGHT_LOOP:
        #pragma unroll
        for (int h = 0; h < CONFIG_T::out_height; h++) {
            WIDTH_LOOP:
            #pragma unroll
            for (int w = 0; w < CONFIG_T::out_width; w++) {

                CHAN_LOOP:
                #pragma unroll
                for (int c = 0; c < CONFIG_T::n_chan; c++) {

                    res_idx =
                        (h * CONFIG_T::out_width * CONFIG_T::n_filt) + (w * CONFIG_T::n_filt) + (c * depth_multiplier) + dm;

                    acc[res_idx] = biases[c * depth_multiplier + dm];

                    KERNEL_H_LOOP:
                    #pragma unroll
                    for (int kh = 0; kh < CONFIG_T::filt_height; kh++) {
                        KERNEL_W_LOOP:
                        #pragma unroll
                        for (int kw = 0; kw < CONFIG_T::filt_width; kw++) {

                            int h_in = h * CONFIG_T::stride_height + kh - CONFIG_T::pad_top;
                            int w_in = w * CONFIG_T::stride_width + kw - CONFIG_T::pad_left;

                            if ((h_in >= 0) && (h_in < CONFIG_T::in_height) && (w_in >= 0) && (w_in < CONFIG_T::in_width)) {

                                acc[res_idx] +=
                                    CONFIG_T::mult_config::template product<typename data_T::value_type,
                                                                            typename CONFIG_T::weight_t::value_type>::
                                        product(
                                            data[(h_in)*CONFIG_T::in_width * CONFIG_T::n_chan + (w_in)*CONFIG_T::n_chan + c],
                                            weights[(dm * CONFIG_T::filt_height * CONFIG_T::filt_width * CONFIG_T::n_chan) +
                                                    (kh * CONFIG_T::filt_width * CONFIG_T::n_chan) +
                                                    (kw * CONFIG_T::n_chan) + c]);

                                ;
                            }
                        }
                    }
                }
            }
        }
    }

    RESULT:
    #pragma unroll
    for (int ires = 0; ires < CONFIG_T::out_width * CONFIG_T::out_height * CONFIG_T::n_filt; ires++) {
        res[ires] = cast<typename CONFIG_T::accum_t, typename res_T::value_type, CONFIG_T>(acc[ires]);
    }
}
} // namespace nnet
#endif
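The `RESULT` loop in `depthwise_conv_2d_resource_cl` reads every element of `acc`, so the loop nest has to initialise each slot exactly once through `res_idx`. The sketch below checks that property for the 2D index expression. It is an illustration only: `depthconv2d_res_idx` and `res_idx_covers_output` are hypothetical helper names that do not exist in hls4ml, and the check assumes `n_filt == n_chan * depth_multiplier`, which is the case the integer division in the kernel relies on.

```cpp
#include <vector>

// Flat output index as computed in the 2D kernel: channels-last layout, with
// the filter index inside a pixel given by c * depth_multiplier + dm.
constexpr int depthconv2d_res_idx(int h, int w, int c, int dm, int out_width, int n_filt, int depth_multiplier) {
    return h * out_width * n_filt + w * n_filt + c * depth_multiplier + dm;
}

// Walks the same (dm, h, w, c) loop nest as the kernel and returns true if
// every output slot is visited exactly once.
bool res_idx_covers_output(int out_height, int out_width, int n_chan, int depth_multiplier) {
    const int n_filt = n_chan * depth_multiplier; // assumption: exact multiple
    std::vector<int> hits(out_height * out_width * n_filt, 0);

    for (int dm = 0; dm < depth_multiplier; dm++) {
        for (int h = 0; h < out_height; h++) {
            for (int w = 0; w < out_width; w++) {
                for (int c = 0; c < n_chan; c++) {
                    hits[depthconv2d_res_idx(h, w, c, dm, out_width, n_filt, depth_multiplier)]++;
                }
            }
        }
    }

    for (int n : hits) {
        if (n != 1) {
            return false;
        }
    }
    return true;
}
```

If `n_filt` were not an exact multiple of `n_chan`, the truncated `depth_multiplier` would leave some slots of `acc` unwritten, so a check of this kind could sit in a host-side test.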