-
Notifications
You must be signed in to change notification settings - Fork 1
[hooli-data-eng] componentize dbt #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Your pull request at commit
|
51b2772
to
0498942
Compare
we don't have snapshot tests in place so will have to take some care before landing, but this is at a spot to get feedback on the approach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really excited about how this feels and would be show-able to a team -- especially the YAML in terms of "after a one-time setup an analytics engineer could add whatever selections they want. I'll think about the YAML interface a bit in terms of whether there is anything else I'd want to see exposed or exposed in a different way, but what a great way to start the weekend!
Would also love to get @izzye84's feedback in particular on the YAML here
@@ -12,22 +13,13 @@ class ScheduledJobComponent(Component, Resolvable): | |||
|
|||
# added fields here will define yaml schema via Model | |||
cron_schedule: str | |||
dagster_selection: str | |||
asset_selection: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh this is 💯 better -- when I first wrote this it was "dbt_selection" but really it's just "asset_selection".
|
||
- selection: "orders_cleaned users_cleaned orders_augmented" | ||
partitioning: "daily" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow I love this so much! I need to spend some time thinking about this a bit more like if this is how we'd want to expose things vs. not, but the idea of showing someone how they would add a new dbt selection via YAML (partitioned or no), along with adding a schedule, is really really compelling.
attributes: | ||
job_name: dbt_components_job | ||
cron_schedule: "0 8 * * *" | ||
asset_selection: 'tag:"all_cleaned_models"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obviously this was my code not yours, but @alangenfeld is there any way to avoid the slightly ugly single quote double quote thing for asset selection? Not the end of the world but it annoyed me when I was writing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya i just copied it but looks like its still valid if you omit it
|
||
--- | ||
|
||
type: hooli_data_eng.components.ScheduledPartitionedJobComponent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how it's working but I do find it a bit strange that there is nothing about the partition defined in the YAML of ScheduledPartitionedJobComponent
0498942
to
eff115d
Compare
df3e803
to
7438b78
Compare
Attempt to move all the dbt definitions in to a custom component, leaving the important bits of distinction configured in yaml