fix(chunker): correctly determine chunk midpoint when empty chunks are present #1800
Describe your changes
Problem:
The chunk list ["foo", "", "bar", "baz"] is token-counted as 'foobarbaz' rather than 'foo bar baz' when getting the midpoint index:
griptape/griptape/chunkers/base_chunker.py, line 106 in 41ad7f5
This yields an incorrect midpoint index, which in turn produces an incorrect chunk split. In certain cases the bad split can cause the chunker to exceed the maximum recursion depth.
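A minimal sketch of the discrepancy, not the griptape code itself. The `count_tokens` helper is a hypothetical stand-in for a real tokenizer (one token per whitespace-delimited word), and a single space is assumed as the separator:

```python
def count_tokens(text: str) -> int:
    # Hypothetical tokenizer stand-in: one token per whitespace-delimited word.
    return len(text.split())


chunks = ["foo", "", "bar", "baz"]

# Joining without the separator fuses the words into a single string.
joined_without_separator = "".join(chunks)  # "foobarbaz"

# Joining on the separator keeps the word boundaries intact.
joined_with_separator = " ".join(chunks)  # "foo  bar baz"

print(count_tokens(joined_without_separator))  # 1
print(count_tokens(joined_with_separator))  # 3
```

With a real tokenizer the exact counts will differ, but the fused string is still counted differently from the original text.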
Solution:
Join the chunks on the separator that we originally split them on:
griptape/griptape/chunkers/base_chunker.py, line 56 in 41ad7f5
griptape/griptape/chunkers/base_chunker.py, line 106 in 4b7bb05
With the separator restored, the midpoint index is calculated correctly, which results in a correct chunk split.
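The effect on the midpoint can be illustrated with a simplified search. This is a sketch under stated assumptions, not the logic in `base_chunker.py`: `midpoint_index` and `count_tokens` are hypothetical names, the tokenizer counts whitespace-delimited words, and the midpoint is taken as the first chunk at which the joined prefix holds at least half of the total tokens.

```python
def count_tokens(text: str) -> int:
    # Hypothetical tokenizer stand-in: one token per whitespace-delimited word.
    return len(text.split())


def midpoint_index(chunks: list[str], joiner: str) -> int:
    """Return the first index whose joined prefix holds at least half the tokens."""
    half = count_tokens(joiner.join(chunks)) / 2
    for i in range(len(chunks)):
        if count_tokens(joiner.join(chunks[: i + 1])) >= half:
            return i
    return len(chunks) - 1


chunks = ["foo", "", "bar", "baz"]

# Joining on "" sees one fused token, so the midpoint collapses to the first chunk.
index_without_separator = midpoint_index(chunks, "")

# Joining on the original separator sees three tokens and lands further in.
index_with_separator = midpoint_index(chunks, " ")

print(index_without_separator, index_with_separator)  # 0 2
```

A midpoint stuck at the first chunk leaves nearly all of the text in one half, which is consistent with the repeated splitting described in the problem statement.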
Other changes in the PR are updates to the tests because chunk boundaries have changed slightly.
Issue ticket number and link
Closes #1796