Fixes for data type inconsistencies in quantised RNNs #1171


Merged
merged 3 commits into from
Feb 7, 2025

Conversation

bo3z
Contributor

@bo3z bo3z commented Jan 22, 2025

Description

📝 Various fixes for data type inconsistencies in quantised LSTM / GRU layers (see below for a detailed explanation)
First discovered via issue #1166; further investigation revealed more fundamental issues in the RNN layer implementations for the Vivado / Vitis backends

Type of change

  • Bug fix (non-breaking change that fixes an issue)

Tests

TODO: Add additional tests; so far it has only been confirmed that this fixes issue #1166. However, our current tests do not cover different data types between weights and recurrent weights.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

Description

  1. There was a mismatch between the quantizer parsed from QKeras and the weight type & name for recurrent weights, causing the recurrent weights to always fall back to the global default precision. This has been fixed in hls4ml/converters/keras/qkeras.py. This was the original error discovered by Datatype does not match in HLS converted quantized lstm from QKeras #1166
  2. However, this error further revealed that the HLS implementation assumes that weight and recurrent weight (as well as bias and recurrent bias) have the same data type, which is not necessarily true. QKeras lets us use different quantisers for weights and recurrent weights (although the recurrent bias does not have a custom quantiser). To address this, the HLS template and the corresponding Vivado / Vitis backend pass have been modified.
  3. The final issue discovered along the way is an inconsistency in activation data types. Namely, many of the activation calls had incorrect template parameters (defaulting to data_T or weight_T), when in fact they should be accum_T. I traced all the activations and they should now match the variables passed to them. However, this would be worth double-checking by someone else.
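The mapping problem in point 1 can be illustrated with a minimal sketch (the helper name `get_weight_precision`, the dictionary layout, and the default value are illustrative, not the actual hls4ml converter API): the quantizer parsed from QKeras must be looked up under the recurrent weight's own name, otherwise the layer silently falls back to the global default precision.

```python
# Illustrative sketch of the lookup described in point 1 -- not hls4ml code.
# Before the fix, recurrent weights were effectively looked up under the
# wrong key, so they always received the global default precision.

DEFAULT_PRECISION = 'ap_fixed<16,6>'  # stand-in for the global default


def get_weight_precision(layer_quantizers, weight_name):
    """Return the precision parsed from QKeras for this weight,
    falling back to the global default only when no quantiser exists."""
    q = layer_quantizers.get(weight_name)
    return q if q is not None else DEFAULT_PRECISION


# With the correct key, a separately quantised recurrent kernel resolves:
quantizers = {
    'kernel': 'ap_fixed<8,1>',            # e.g. quantized_bits(8, 0)
    'recurrent_kernel': 'ap_fixed<4,1>',  # e.g. quantized_bits(4, 0)
}
print(get_weight_precision(quantizers, 'recurrent_kernel'))  # ap_fixed<4,1>
print(get_weight_precision(quantizers, 'recurrent_bias'))    # ap_fixed<16,6>
```

The point of the sketch is only the lookup-by-name: the recurrent kernel's quantiser must survive the conversion under its own key instead of being shadowed by the kernel's.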

Additional TODOs:

  • Verify whether these issues are present for oneAPI and Catapult. I ran into some mysterious issues when compiling the code, likely because my environment is not up to date for these backends. It would be great if someone with more oneAPI and Catapult expertise could check this PR and the associated issue.
  • Add a test that confirms the fix works with "heterogeneous" quantisation.
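The template change from points 2 and 3 can be sketched as a simplified config struct. All names and the plain integer stand-in types below are illustrative; the real hls4ml templates generate per-layer `ap_fixed<>` types, which require Vivado headers to compile.

```cpp
#include <cstdint>

// Hypothetical, simplified version of the config change in point 2: weights
// and recurrent weights carry their own typedefs instead of sharing one.
// Plain integers stand in for ap_fixed<> so this compiles standalone.
struct lstm_config {
    typedef std::int8_t  weight_t;            // kernel, e.g. ap_fixed<8,1>
    typedef std::int8_t  bias_t;
    typedef std::int16_t recurrent_weight_t;  // may differ from weight_t (point 2)
    typedef std::int16_t recurrent_bias_t;
    typedef std::int32_t accum_t;             // activations take accum_t,
                                              // not data_T / weight_T (point 3)
};
```

With a single shared `weight_t`, a QKeras model quantising the kernel and recurrent kernel differently could not be represented; splitting the typedefs lets the backend pass propagate each parsed precision independently.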

@bo3z bo3z added the please test Trigger testing by creating local PR branch label Jan 22, 2025
@bo3z bo3z added this to the v1.1.0 milestone Jan 22, 2025
@JanFSchulte
Contributor

pre-commit.ci autofix

@JanFSchulte
Contributor

Looks good to me. Not sure we want to wait for someone to check oneAPI and Catapult before merging this, so with the added test it might be good to go already.

@vloncar vloncar removed the please test Trigger testing by creating local PR branch label Feb 6, 2025
@vloncar vloncar added the please test Trigger testing by creating local PR branch label Feb 6, 2025
@vloncar
Contributor

vloncar commented Feb 6, 2025

Looks good to me, we can follow up with Intel/Catapult PR. Let's see what the tests say before merging.

@JanFSchulte
Contributor

Tests look good, I'll merge.

@JanFSchulte JanFSchulte merged commit 90ada94 into fastmachinelearning:main Feb 7, 2025
9 checks passed