## Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.
- Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
- Check that the issue hasn't already been reported, by checking the currently open issues.
- If there are steps to reproduce the problem, make sure to write them down below.
- If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.
## Quick summary

I am trying to convert a quantized LSTM from QKeras via hls4ml, but `hls_model.compile()` fails.
## Details

- TensorFlow version: 2.14.1
- QKeras version: 0.9.0
- hls4ml version: latest `main` branch, installed with `pip install -e .`
The error message is as follows:
```
2025-01-20 20:25:43.718528: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-01-20 20:25:43.718666: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-01-20 20:25:43.734997: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2211] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
/home/robin/miniconda3/envs/tf214/lib/python3.10/site-packages/keras/src/constraints.py:365: UserWarning: The `keras.constraints.serialize()` API should only be used for objects of type `keras.constraints.Constraint`. Found an instance of type <class 'qkeras.quantizers.quantized_bits'>, which may lead to improper serialization.
  warnings.warn(
Interpreting Sequential
Topology:
Layer name: qlstm_input, layer type: InputLayer, input shapes: [[None, 5, 2]], output shape: [None, 5, 2]
Layer name: qlstm, layer type: QLSTM, input shapes: [[None, 5, 2]], output shape: [None, 4]
Interpreting Sequential
Topology:
Layer name: qlstm_input, layer type: InputLayer, input shapes: [[None, 5, 2]], output shape: [None, 5, 2]
Layer name: qlstm, layer type: QLSTM, input shapes: [[None, 5, 2]], output shape: [None, 4]
Creating HLS model
Writing HLS project
Done
firmware/myproject.cpp: In function ‘void myproject(input_t*, result_t*)’:
firmware/myproject.cpp:36:49: error: no matching function for call to ‘lstm_stack<input_t, layer2_t, config2>(input_t*&, layer2_t [4], qlstm_weight_t [32], qlstm_recurrent_weight_t [64], qlstm_bias_t [16], qlstm_recurrent_bias_t [16])’
   36 |     nnet::lstm_stack<input_t, layer2_t, config2>(qlstm_input, layer2_out, w2, wr2, b2, br2); // qlstm
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from firmware/parameters.h:12,
                 from firmware/myproject.cpp:4:
firmware/nnet_utils/nnet_recurrent.h:190:6: note: candidate: ‘void nnet::lstm_stack(data_T*, res_T*, typename CONFIG_T::weight_t*, typename CONFIG_T::weight_t*, typename CONFIG_T::bias_t*, typename CONFIG_T::bias_t*) [with data_T = ap_fixed<8, 1>; res_T = ap_fixed<8, 1>; CONFIG_T = config2; typename CONFIG_T::weight_t = ap_fixed<4, 1>; typename CONFIG_T::bias_t = ap_fixed<4, 1>]’
  190 | void lstm_stack(data_T data[CONFIG_T::n_sequence * CONFIG_T::n_in], res_T res[CONFIG_T::n_sequence_out * CONFIG_T::n_state],
      |      ^~~~~~~~~~
firmware/nnet_utils/nnet_recurrent.h:192:45: note: no known conversion for argument 4 from ‘qlstm_recurrent_weight_t [64]’ {aka ‘ap_fixed<8, 1> [64]’} to ‘config2::weight_t*’ {aka ‘ap_fixed<4, 1>*’}
  192 |                 typename CONFIG_T::weight_t param_r[CONFIG_T::n_state * 4 * CONFIG_T::n_state],
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
firmware/nnet_utils/nnet_recurrent.h:235:6: note: candidate: ‘template<class data_T, class data2_T, class data3_T, class res_T, class CONFIG_T> void nnet::lstm_stack(data_T*, data2_T*, data3_T*, res_T*, typename CONFIG_T::weight_t*, typename CONFIG_T::weight_t*, typename CONFIG_T::bias_t*, typename CONFIG_T::bias_t*)’
  235 | void lstm_stack(data_T data[CONFIG_T::n_sequence * CONFIG_T::n_in], data2_T h_newstate[CONFIG_T::n_state],
      |      ^~~~~~~~~~
firmware/nnet_utils/nnet_recurrent.h:235:6: note: template argument deduction/substitution failed:
firmware/myproject.cpp:36:49: note: candidate expects 8 arguments, 6 provided
   36 |     nnet::lstm_stack<input_t, layer2_t, config2>(qlstm_input, layer2_out, w2, wr2, b2, br2); // qlstm
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from firmware/parameters.h:12,
                 from firmware/myproject.cpp:4:
firmware/nnet_utils/nnet_recurrent.h:270:6: note: candidate: ‘void nnet::lstm_stack(hls::stream<srcType>&, hls::stream<dstType>&, typename CONFIG_T::weight_t*, typename CONFIG_T::weight_t*, typename CONFIG_T::bias_t*, typename CONFIG_T::bias_t*) [with data_T = ap_fixed<8, 1>; res_T = ap_fixed<8, 1>; CONFIG_T = config2; typename CONFIG_T::weight_t = ap_fixed<4, 1>; typename CONFIG_T::bias_t = ap_fixed<4, 1>]’
  270 | void lstm_stack(hls::stream<data_T> &data_stream, hls::stream<res_T> &res_stream,
      |      ^~~~~~~~~~
firmware/nnet_utils/nnet_recurrent.h:270:38: note: no known conversion for argument 1 from ‘input_t*’ {aka ‘ap_fixed<8, 1>*’} to ‘hls::stream<ap_fixed<8, 1> >&’
  270 | void lstm_stack(hls::stream<data_T> &data_stream, hls::stream<res_T> &res_stream,
      |                 ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
```
- Diving into the issue: the datatype of `wr2` remains the default `ap_fixed<8,1>`. But in the function definition, the `kernel` and `recurrent_weight` arguments share the same datatype `weight_t`. So whenever the `kernel_quantizer` differs from the default precision, no matching overload of `lstm_stack` can be found. (A possible workaround sketch follows below.)
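As an untested workaround sketch, one could try forcing the recurrent weight precision to match the kernel precision so that the single-`weight_t` overload resolves. The `recurrent_weight` key is my assumption about the layout of the per-layer precision dict produced with `granularity='name'`, not something confirmed in this report:

```python
# Hypothetical workaround, not verified: after config_from_keras_model(...),
# override the recurrent weight precision of the QLSTM layer so it matches
# the kernel weight precision (ap_fixed<4,1>) instead of the default
# ap_fixed<8,1>. The 'recurrent_weight' key is an assumption about the
# per-layer precision dict hls4ml generates with granularity='name'.
config['LayerName']['qlstm']['Precision']['recurrent_weight'] = 'ap_fixed<4,1>'
```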
## Steps to Reproduce

- Clone the hls4ml repository
- Checkout the master branch, with commit hash: 92c8880
- Run the testing script below, adapted from the QLSTM test provided with hls4ml
```python
import numpy as np
import hls4ml
from qkeras import QLSTM
from tensorflow.keras.models import Sequential

backend = 'Vivado'  # assumed; the original snippet did not define `backend`


def test():
    X = np.linspace(-0.5, 0.5, 5)
    X = np.stack([X, X], axis=1).reshape(1, 5, 2)

    model = Sequential()
    model.add(
        QLSTM(
            4,
            input_shape=(5, 2),
            kernel_quantizer='quantized_bits(4,0, alpha=1)',
            recurrent_quantizer='quantized_bits(4,0, alpha=1)',
            bias_quantizer='quantized_bits(4,0, alpha=1)',
            state_quantizer='quantized_bits(4,0, alpha=1)',
            activation='tanh',
            recurrent_activation='sigmoid',
        )
    )
    model.compile()

    config = hls4ml.utils.config_from_keras_model(
        model, granularity='name', default_precision='ap_fixed<8,1>', backend=backend
    )
    output_dir = str(f'hls4mlprj_qkeras_qsimplernn_{backend}')
    hls_model = hls4ml.converters.convert_from_keras_model(
        model, hls_config=config, output_dir=output_dir, backend=backend, part='xcu50-fsvh2104-2-e'
    )
    hls_model.compile()

    y_qkeras = model.predict(X)
    y_hls4ml = hls_model.predict(X)
    np.testing.assert_allclose(y_qkeras, y_hls4ml.reshape(y_qkeras.shape), atol=0.1)

    hls_model.build(csim=False, vsynth=True)


test()
```
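To see the mismatch without reading the generated C++, one can dump the per-weight precisions that hls4ml derives for the layer. This is a sketch under the assumption that the per-layer `Precision` entry is a dict when `granularity='name'` is used; run it right after `config_from_keras_model` inside `test()`:

```python
# Hypothetical inspection, assuming config['LayerName']['qlstm']['Precision']
# is a per-weight dict: the kernel weight should show ap_fixed<4,1> from
# kernel_quantizer, while the recurrent weight stays at the default
# ap_fixed<8,1>, matching the overload-resolution failure above.
import pprint

pprint.pprint(config['LayerName']['qlstm']['Precision'])
```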
## Expected behavior

Compilation and build pass.
## Actual behavior

`hls_model.compile()` fails with the compiler error shown above.