[hooli-data-eng] componentize dbt #162

Open. alangenfeld wants to merge 1 commit into master from al/05-30-_hooli-data-eng_componentize_dbt

alangenfeld
Member

Attempt to move all the dbt definitions into a custom component, leaving the important distinguishing bits configured in YAML.

Member Author

alangenfeld commented May 30, 2025

This stack of pull requests is managed by Graphite.

github-actions bot commented May 30, 2025

Your pull request at commit 04989420cd933cab4bc992c391ba36a3300591dd is automatically being deployed to Dagster Cloud.

| Location | Status | Link | Updated |
| --- | --- | --- | --- |
| data-eng-pipeline | | View in Cloud | May 30, 2025 at 07:59 PM (UTC) |
| snowflake_insights | Building... | | May 30, 2025 at 07:56 PM (UTC) |
| basics | Building... | | May 30, 2025 at 07:56 PM (UTC) |
| batch_enrichment | Building... | | May 30, 2025 at 07:56 PM (UTC) |
| hooli_data_ingest | Building... | | May 30, 2025 at 07:56 PM (UTC) |
| hooli_bi | Building... | | May 30, 2025 at 07:56 PM (UTC) |
| hooli_airlift | Building... | | May 30, 2025 at 07:56 PM (UTC) |

@alangenfeld alangenfeld force-pushed the al/05-30-_hooli-data-eng_componentize_dbt branch from 51b2772 to 0498942 Compare May 30, 2025 19:54
@alangenfeld
Member Author

we don't have snapshot tests in place so will have to take some care before landing, but this is at a spot to get feedback on the approach

Contributor

@cnolanminich cnolanminich left a comment

Really excited about how this feels and how show-able it would be to a team, especially the YAML: after a one-time setup, an analytics engineer could add whatever selections they want. I'll think about the YAML interface a bit, in terms of whether there is anything else I'd want to see exposed or exposed in a different way, but what a great way to start the weekend!

Would also love to get @izzye84's feedback in particular on the YAML here

@@ -12,22 +13,13 @@ class ScheduledJobComponent(Component, Resolvable):

     # added fields here will define yaml schema via Model
     cron_schedule: str
-    dagster_selection: str
+    asset_selection: str
Contributor

oh this is 💯 better -- when I first wrote this it was "dbt_selection" but really it's just "asset_selection".
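For readers following the rename, here is a minimal stdlib-only sketch of the idea that the fields declared on the component class define which YAML attributes are accepted. It is not the Dagster implementation: `ScheduledJobAttributes` and `load_attributes` are hypothetical names, and the dict stands in for an already-parsed `attributes:` mapping.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScheduledJobAttributes:
    """Hypothetical stand-in for the fields declared on ScheduledJobComponent."""

    job_name: str
    cron_schedule: str
    asset_selection: str  # the field name this PR settles on


def load_attributes(raw: dict) -> ScheduledJobAttributes:
    """Check a parsed `attributes:` mapping against the declared fields."""
    expected = {f.name for f in fields(ScheduledJobAttributes)}
    unknown = set(raw) - expected
    missing = expected - set(raw)
    if unknown or missing:
        raise ValueError(
            f"unknown keys: {sorted(unknown)}; missing keys: {sorted(missing)}"
        )
    return ScheduledJobAttributes(**raw)


# Values taken from the YAML shown in this PR.
attrs = load_attributes(
    {
        "job_name": "dbt_components_job",
        "cron_schedule": "0 8 * * *",
        "asset_selection": 'tag:"all_cleaned_models"',
    }
)
```

Under this sketch, a YAML file still using the old key name would fail fast with a message naming the stale key, which is the kind of check worth having given the lack of snapshot tests.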


- selection: "orders_cleaned users_cleaned orders_augmented"
  partitioning: "daily"

Contributor

wow I love this so much! I need to spend some more time thinking about whether this is how we'd want to expose things, but the idea of showing someone how they would add a new dbt selection via YAML (partitioned or not), along with adding a schedule, is really really compelling.
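To make the "partitioned or not" point concrete, here is a rough sketch of how a list of selection entries could be turned into typed objects. Everything here is an assumption for illustration: `DbtSelection`, `parse_selections`, and `ALLOWED_PARTITIONING` are hypothetical names, `"daily"` is the only partitioning value that appears in this PR, and treating a missing `partitioning` key as "unpartitioned" is a guess at the intended behavior. The second entry's model name is made up.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only "daily" appears in this PR; other values are unknown, so reject them here.
ALLOWED_PARTITIONING = {"daily"}


@dataclass(frozen=True)
class DbtSelection:
    """Hypothetical typed form of one entry in the YAML selection list."""

    selection: str
    partitioning: Optional[str] = None  # None models an unpartitioned selection


def parse_selections(entries: List[dict]) -> List[DbtSelection]:
    """Convert parsed YAML list entries into DbtSelection objects."""
    parsed = []
    for entry in entries:
        partitioning = entry.get("partitioning")
        if partitioning is not None and partitioning not in ALLOWED_PARTITIONING:
            raise ValueError(f"unsupported partitioning: {partitioning!r}")
        parsed.append(
            DbtSelection(selection=entry["selection"], partitioning=partitioning)
        )
    return parsed


selections = parse_selections(
    [
        # Entry from this PR's YAML.
        {
            "selection": "orders_cleaned users_cleaned orders_augmented",
            "partitioning": "daily",
        },
        # Hypothetical unpartitioned entry an analytics engineer might add.
        {"selection": "some_unpartitioned_model"},
    ]
)
```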

attributes:
  job_name: dbt_components_job
  cron_schedule: "0 8 * * *"
  asset_selection: 'tag:"all_cleaned_models"'
Contributor

obviously this was my code not yours, but @alangenfeld is there any way to avoid the slightly ugly single quote double quote thing for asset selection? Not the end of the world but it annoyed me when I was writing it

Member Author

ya i just copied it but looks like it's still valid if you omit it


---

type: hooli_data_eng.components.ScheduledPartitionedJobComponent
Contributor

I see how it's working but I do find it a bit strange that there is nothing about the partition defined in the YAML of ScheduledPartitionedJobComponent

@alangenfeld alangenfeld changed the base branch from al/05-30-_hooli-data-eng_move_lib_to_components to graphite-base/162 May 30, 2025 20:37
@alangenfeld alangenfeld force-pushed the al/05-30-_hooli-data-eng_componentize_dbt branch from 0498942 to eff115d Compare May 30, 2025 20:37
@alangenfeld alangenfeld changed the base branch from graphite-base/162 to master May 30, 2025 20:37