Feature request: "Multi" prefix extractor support #12824

Open
@zaidoon1

Say my key format is <account_id>:<user_id>:<some dynamic value>.

Today, we can create a prefix extractor/bloom filter on <account_id>:<user_id> to help queries that start with a known <account_id>:<user_id>. What we can't do today is ALSO set up a prefix extractor on <account_id> alone. With both, I could use bloom filters for queries that know the account id + user id combination as well as for queries that only know the account id. In DB/SQL terms, this is like creating multiple indexes on the "columns" to optimize both of these queries: select * from blah where account_id = 123, and select * from blah where account_id = 345 and user_id = 678.
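To make the request concrete, here is a minimal Python sketch of the behaviour being asked for. It is a toy model, not RocksDB code: `key_prefixes` and `ToyMultiPrefixFilter` are hypothetical names, and a plain set stands in for the bloom filter, so unlike a real bloom filter it never returns false positives.

```python
from typing import Iterable, List, Set

SEP = ":"  # separator used in <account_id>:<user_id>:<some dynamic value>


def key_prefixes(key: str, depths: Iterable[int] = (1, 2)) -> List[str]:
    """Return the prefixes built from the first n components, for each n in depths.

    depth 1 -> <account_id>, depth 2 -> <account_id>:<user_id>.
    """
    parts = key.split(SEP)
    return [SEP.join(parts[:n]) for n in depths if len(parts) >= n]


class ToyMultiPrefixFilter:
    """Hypothetical stand-in for a filter fed by several prefix extractors.

    A set replaces the bloom filter, so answers here are exact.
    """

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def add_key(self, key: str) -> None:
        # Index every configured prefix of the key, not just one.
        self._seen.update(key_prefixes(key))

    def may_contain_prefix(self, prefix: str) -> bool:
        return prefix in self._seen


toy_filter = ToyMultiPrefixFilter()
toy_filter.add_key("123:678:abc")
toy_filter.add_key("345:111:xyz")
```

With both prefixes indexed, a query that only knows the account id (`toy_filter.may_contain_prefix("123")`) and a query that knows account id + user id (`toy_filter.may_contain_prefix("123:678")`) can both be answered by the same filter.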

As far as I know, today we can only have one prefix extractor/bloom filter per column family (cf), so we are left with the following workarounds, neither of which is ideal:

  1. Create another cf that duplicates the data, so that one cf has the <account_id>:<user_id> prefix extractor and the other has the <account_id> prefix extractor. Depending on the query and what we already know, we look up the kv in the corresponding cf. The issue here is that storing the duplicate data uses more disk space.

  2. Given that <account_id> is common to both prefix extractors (in this use case) and we always have it, use it as the only prefix extractor. However, we miss the opportunity to optimize queries that also know <user_id>.
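The cost of each workaround can be illustrated with a small Python toy. Nothing here calls RocksDB: the sample keys and values are made up, the helper names are hypothetical, and sets again stand in for bloom filters.

```python
# Made-up sample data in the <account_id>:<user_id>:<dynamic> key format.
sample_kvs = {
    "123:678:a": "v1",
    "123:678:b": "v2",
    "345:111:c": "v3",
}


def stored_bytes(kvs: dict) -> int:
    """Rough size of the stored data: key bytes plus value bytes."""
    return sum(len(k) + len(v) for k, v in kvs.items())


# Workaround 1: two cfs holding the same data, one per prefix extractor.
single_cf_bytes = stored_bytes(sample_kvs)
two_cf_bytes = stored_bytes(sample_kvs) + stored_bytes(sample_kvs)

# Workaround 2: a single filter keyed on <account_id> only.
account_only_filter = {k.split(":")[0] for k in sample_kvs}


def account_filter_can_skip(account_id: str, user_id: str) -> bool:
    """True if the account-only filter proves the prefix is absent.

    user_id is ignored on purpose: the filter has no knowledge of it.
    """
    return account_id not in account_only_filter
```

In this toy, workaround 1 stores every key and value twice, and under workaround 2 a lookup for account `123` with a user id that does not exist still passes the filter, because only the account id is checked.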
