Proposal to create logical plans for operations #2006

Open
@Blajda

Description

This proposes further work I'd like to do on creating reusable logical relations. It also helps identify the relations we would need for Substrait.

Delta Find Files
Purpose: Identify files that contain records that satisfy a predicate.

This relation will generate a record batch stream with a single column called path. path will then map to an Add action in the Delta table.
This relation will also maintain a list of files that satisfy the predicate which can be passed sideways to relations downstream.
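To make the two outputs concrete, here is a minimal sketch of the idea using only the standard library. It is not the delta-rs or DataFusion API: `DataFile`, `MatchedFiles`, and `find_files` are hypothetical names, rows are plain integers, and the "stream" is modelled as a lazy iterator. The iterator yields each matching path as soon as it is found, while the shared list accumulates the same paths for relations downstream.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a data file tracked by an Add action.
#[derive(Debug, Clone)]
pub struct DataFile {
    pub path: String,
    pub rows: Vec<i64>,
}

/// Shared list of matched files that can be handed sideways to other relations.
pub type MatchedFiles = Arc<Mutex<Vec<String>>>;

/// Lazily yields the path of every file with at least one row satisfying
/// `predicate`, and records each yielded path in `matched`.
pub fn find_files<'a, P>(
    files: &'a [DataFile],
    predicate: P,
    matched: MatchedFiles,
) -> impl Iterator<Item = String> + 'a
where
    P: Fn(i64) -> bool + 'a,
{
    files.iter().filter_map(move |file| {
        if file.rows.iter().any(|row| predicate(*row)) {
            // Side channel: remember the file for downstream relations.
            matched.lock().unwrap().push(file.path.clone());
            // Main output: the single `path` column.
            Some(file.path.clone())
        } else {
            None
        }
    })
}
```

Because the iterator is lazy, a consumer can act on the first path before the remaining files have been inspected.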

Delta Scan
Purpose: Scan the Delta Table

Update DeltaScan to take an optional input stream that contains the paths of the files to be scanned. This will enable DeltaScan to consume the output of DeltaFindFiles.
Currently when using find files, we must wait for the entire operation to complete and then we build the scan. The change enables Delta Scan to start when the first candidate file is identified.
I think this will require some significant work since it will involve refactoring the current DeltaScan implementation.
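As a rough illustration of the optional input, the sketch below models the path stream as a `std::sync::mpsc::Receiver` and the table as an in-memory map. `Table` and `delta_scan` are hypothetical names for this example only and do not reflect the current DeltaScan implementation. With `Some(paths)` the scan reads each file as its path arrives; with `None` it scans every file.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::Receiver;

/// Hypothetical in-memory table: file path -> rows stored in that file.
pub type Table = BTreeMap<String, Vec<i64>>;

/// Scans the table. If `paths` is provided, only those files are read, and
/// each file is read as soon as its path is received.
pub fn delta_scan<'a>(
    table: &'a Table,
    paths: Option<Receiver<String>>,
) -> Box<dyn Iterator<Item = i64> + 'a> {
    match paths {
        // Pull paths one at a time; iteration ends when the sender is dropped.
        Some(rx) => Box::new(
            rx.into_iter()
                .filter_map(move |path| table.get(&path))
                .flat_map(|rows| rows.iter().copied()),
        ),
        // No input stream: fall back to a full table scan.
        None => Box::new(table.values().flat_map(|rows| rows.iter().copied())),
    }
}
```

If the sending side runs on another thread, the scan can start producing rows while candidate files are still being identified, which is the behaviour the change is after.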

Delta Write
Purpose: Write records to storage, conflict resolution, and commit creation

Takes a single input stream of data that matches the table's schema and creates an Add action for each new file.
Information can be passed sideways to include additional Delta actions in the commit. For example, DeltaDelete can provide a stream of Remove actions.
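A small sketch of that contract, under simplifying assumptions: each input batch becomes one file, the file names are made up, and conflict resolution and the actual commit are left out. `Action` and `delta_write` are hypothetical names, not delta-rs types.

```rust
/// Hypothetical, simplified Delta actions.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Add { path: String, num_records: usize },
    Remove { path: String },
}

/// Consumes the input batches, emits one Add action per written file, and
/// merges in the actions that were passed sideways (e.g. Remove actions).
pub fn delta_write(
    batches: impl Iterator<Item = Vec<i64>>,
    extra_actions: Vec<Action>,
) -> Vec<Action> {
    // Start the commit with the sideways actions.
    let mut commit = extra_actions;
    for (i, batch) in batches.enumerate() {
        // A real writer would persist the batch to storage here; the file
        // name below is an assumption for illustration.
        commit.push(Action::Add {
            path: format!("part-{i:05}.parquet"),
            num_records: batch.len(),
        });
    }
    commit
}
```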

Delta Delete
Purpose: Delete Records from the table.

Given a predicate, delete records from the Delta table.
Delta Delete can take an optional stream of records and will output the records that do NOT satisfy the predicate.
It will maintain a stream of Remove actions that can be passed sideways to other operations downstream.

The input stream is optional since there are cases where delete can determine which files to remove without needing a scan. An optimization phase can help determine when this is the case.
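The behaviour described above could look roughly like the following. This is only a sketch: `Remove`, `DeleteOutput`, and `delta_delete` are hypothetical names, the record stream is a plain `Vec`, and the list of matched files is assumed to come from Delta Find Files.

```rust
/// Hypothetical, simplified Remove action.
#[derive(Debug, Clone, PartialEq)]
pub struct Remove {
    pub path: String,
}

/// Output of the delete relation: surviving rows plus sideways Remove actions.
pub struct DeleteOutput {
    pub kept: Vec<i64>,
    pub removes: Vec<Remove>,
}

/// Filters out rows that satisfy `predicate` and emits a Remove action for
/// every matched file. `input` is `None` when no scan is needed, for example
/// when whole files can be dropped.
pub fn delta_delete(
    input: Option<Vec<i64>>,
    matched_files: &[String],
    predicate: impl Fn(i64) -> bool,
) -> DeleteOutput {
    // Keep only the rows that do NOT satisfy the predicate.
    let kept = input
        .unwrap_or_default()
        .into_iter()
        .filter(|row| !predicate(*row))
        .collect();
    // Every matched file is replaced, so each one gets a Remove action.
    let removes = matched_files
        .iter()
        .map(|path| Remove { path: path.clone() })
        .collect();
    DeleteOutput { kept, removes }
}
```

When `input` is `None`, the output contains no rows and only the Remove actions, which corresponds to the case where no scan is required.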

Diagram

High-level diagram of how these relations will connect.

               ┌───────────────────────┐
               │   Delta Find Files    │
               │                       │
               │  Predicate:           │
           ┌───┤    Version:           │
           │   │                       │
           │   └──────────┬────────────┘
           │              │
           │              ▼
   Files   │   ┌───────────────────────┐
  Matched  │   │     Delta Scan        │
   List    │   │                       │
           │   │   Version:            │
           │   │                       │
           │   │                       │
           │   └──────────┬────────────┘
           │              │
           │              ▼
           │   ┌───────────────────────┐
           └──►│     Delta Delete      │
               │                       │
               │  Predicate:           │
           ┌───┤                       │
           │   └──────────┬────────────┘
  Remove   │              │
  Actions  │              ▼
           │   ┌───────────────────────┐
           │   │     Delta Write       │
           └──►│                       │
               │                       │
               │                       │
               └───────────────────────┘

Converting the ReplaceWhere operation to a logical plan could look something like this:


               ┌───────────────────────┐
               │   Delta Find Files    │
               │                       │
               │  Predicate:           │
           ┌───┤    Version:           │
           │   │                       │
           │   └──────────┬────────────┘
           │              │
           │              ▼                     ┌────────────────────────────┐
   Files   │   ┌───────────────────────┐        │        Data  Source        │
  Matched  │   │     Delta Scan        │        │                            │
   List    │   │                       │        │                            │
           │   │   Version:            │        └────────────┬───────────────┘
           │   │                       │                     │
           │   │                       │                     ▼
           │   └──────────┬────────────┘        ┌────────────────────────────┐
           │              │                     │    Delta Constraint Check  │
           │              ▼                     │                            │
           │   ┌───────────────────────┐        └────────────┬───────────────┘
           └──►│     Delta Delete      │                     │
               │                       │                     │
               │  Predicate:           │                     │
           ┌───┤                       │                     │
           │   └──────────┬────────────┘                     │
  Remove   │              │                                  │
  Actions  │              └────────────────┐   ┌─────────────┘
           │                               ▼   ▼
           │       ┌──────────────────────────────────────────────────────────┐
           │       │                      Union                               │
           │       │                                                          │
           │       └─────────────────────────┬────────────────────────────────┘
           │                                 │
           │                                 ▼
           │                      ┌───────────────────────┐
           │                      │     Delta Write       │
           └──────────────────────┤                       │
                                  │                       │
                                  │                       │
                                  └───────────────────────┘
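For readers who prefer code to boxes, the ReplaceWhere plan above can be walked through end to end with a toy in-memory model. Everything here is hypothetical (`Table`, `Action`, `replace_where`, the output file name), and the constraint check is assumed to mean that incoming rows must satisfy the replace predicate; the diagram only names that node, so treat it as one possible reading.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory table: file path -> rows stored in that file.
pub type Table = BTreeMap<String, Vec<i64>>;

/// Hypothetical, simplified Delta actions.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Add { path: String, rows: Vec<i64> },
    Remove { path: String },
}

pub fn replace_where(
    table: &Table,
    predicate: impl Fn(i64) -> bool,
    source: Vec<i64>,
) -> Result<Vec<Action>, String> {
    // Delta Find Files: files with at least one row matching the predicate.
    let matched: Vec<(&String, &Vec<i64>)> = table
        .iter()
        .filter(|(_, rows)| rows.iter().any(|r| predicate(*r)))
        .collect();
    // Delta Scan + Delta Delete: read matched files, keep non-matching rows.
    let kept: Vec<i64> = matched
        .iter()
        .flat_map(|(_, rows)| rows.iter().copied())
        .filter(|r| !predicate(*r))
        .collect();
    // Delta Constraint Check (assumed semantics): new rows must match.
    if let Some(bad) = source.iter().find(|r| !predicate(**r)) {
        return Err(format!("row {bad} does not satisfy the replace predicate"));
    }
    // Union: surviving rows followed by the new data.
    let rows: Vec<i64> = kept.into_iter().chain(source).collect();
    // Delta Write: Remove actions arrive sideways, plus one Add for new data.
    let mut commit: Vec<Action> = matched
        .into_iter()
        .map(|(path, _)| Action::Remove { path: path.clone() })
        .collect();
    if !rows.is_empty() {
        commit.push(Action::Add { path: "part-00000.parquet".to_string(), rows });
    }
    Ok(commit)
}
```

The steps run eagerly here for brevity; in the proposal each step would be its own relation with streaming inputs and outputs.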

Use Case

Once we have logical plans for Update and Delete, we can expose new DataFusion SQL statements for them.
It may also help with reusing Delete and Update in other logical plans.

Related Issue(s)

Labels

binding/rust (Issues for the Rust crate), enhancement (New feature or request)
