mlops-club/metaflow-extensions-openlineage

 
 


OpenLineage

I want to get OpenLineage (OL) data into AWS DataZone from Python.

I'm starting with Marquez as the OL backend so I can get things working locally first. Logging OL data to AWS right out of the gate would stack the AWS learning curve on top of the OL one.

I followed the Marquez tutorial here.

Step 1 - Start Marquez locally

To start a Marquez server, run:

cd ./marquez
bash ./docker/up.sh --db-port 12345 --api-port 9000 --no-volumes --seed

Then go to http://localhost:3000 to see the UI. With the flags above, the Marquez API listens on port 9000 and the database on port 12345.
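Before logging events, it can help to confirm the API port is answering. The following is a minimal sketch, not part of this repo: the helper name `marquez_namespaces` is hypothetical, and it assumes the Marquez API serves `GET /api/v1/namespaces` with a JSON body containing a `namespaces` list.

```python
import json
import urllib.request


def marquez_namespaces(api_url="http://localhost:9000", timeout=3.0):
    """Return the namespace names Marquez reports, or None if the API is unreachable.

    Hypothetical helper; assumes GET /api/v1/namespaces returns
    {"namespaces": [{"name": ...}, ...]}.
    """
    try:
        with urllib.request.urlopen(f"{api_url}/api/v1/namespaces", timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors; ValueError covers bad JSON.
        return None
    return [ns["name"] for ns in body.get("namespaces", [])]
```

Calling `marquez_namespaces()` after `up.sh` finishes should return a list if the server started with `--api-port 9000`, and `None` if nothing is listening yet.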

Step 2 - Log some sample lineage events (no extension)

OPENLINEAGE__TRANSPORT__TYPE=http \
OPENLINEAGE__TRANSPORT__URL=http://localhost:9000 \
OPENLINEAGE__TRANSPORT__ENDPOINT=/api/v1/lineage \
OPENLINEAGE__TRANSPORT__COMPRESSION=gzip \
    uv run ./examples/log_lineage.py

Note: Marquez does not require an API key to accept data, which is why the environment variables above do not specify an auth type.
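To make the variables above more concrete, here is a rough stdlib-only sketch of what an HTTP transport with gzip compression roughly amounts to: a JSON run event, gzip-compressed, POSTed to the URL plus endpoint. This is not the contents of `./examples/log_lineage.py` and not the OpenLineage client's implementation. The function names, the `example-namespace` and producer values, and the schema version in `schemaURL` are all assumptions for illustration.

```python
import gzip
import json
import urllib.request
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, run_id, job_name, namespace="example-namespace"):
    """Build a minimal OpenLineage-style run event as a plain dict (illustrative values)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": "https://example.com/hypothetical-producer",  # placeholder
        # Assumed spec version; adjust to whatever your client emits.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def encode_body(event):
    """Serialize the event to JSON and gzip it (mirrors COMPRESSION=gzip)."""
    return gzip.compress(json.dumps(event).encode("utf-8"))


def send_event(event, url="http://localhost:9000", endpoint="/api/v1/lineage"):
    """POST the compressed event to URL + ENDPOINT; no auth header, as with Marquez."""
    request = urllib.request.Request(
        url.rstrip("/") + endpoint,
        data=encode_body(event),
        method="POST",
        headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A run would typically send a `START` event followed by a `COMPLETE` event that share one `runId`, for example `run_id = str(uuid.uuid4())` and then `send_event(build_run_event("START", run_id, "demo-job"))`.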

Step 3 - Run a metaflow flow

OPENLINEAGE__TRANSPORT__TYPE=http \
OPENLINEAGE__TRANSPORT__URL=http://localhost:9000 \
OPENLINEAGE__TRANSPORT__ENDPOINT=/api/v1/lineage \
OPENLINEAGE__TRANSPORT__COMPRESSION=gzip \
OPENLINEAGE_CLIENT_LOGGING=DEBUG \
    uv run ./examples/lineage_flow.py run
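The double-underscore variables in both steps follow a nesting convention: `OPENLINEAGE__TRANSPORT__URL` corresponds to the `url` key under `transport`. The sketch below only illustrates that naming convention with a hypothetical `config_from_env` helper; it is not how the OpenLineage client parses its environment. Note that `OPENLINEAGE_CLIENT_LOGGING` uses a single underscore and so falls outside this scheme.

```python
def config_from_env(environ):
    """Fold OPENLINEAGE__A__B=value variables into a nested dict {"a": {"b": value}}.

    Hypothetical helper illustrating the naming convention only.
    """
    prefix = "OPENLINEAGE__"
    config = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # skips e.g. OPENLINEAGE_CLIENT_LOGGING (single underscore)
        path = key[len(prefix):].lower().split("__")
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


example_env = {
    "OPENLINEAGE__TRANSPORT__TYPE": "http",
    "OPENLINEAGE__TRANSPORT__URL": "http://localhost:9000",
    "OPENLINEAGE__TRANSPORT__ENDPOINT": "/api/v1/lineage",
    "OPENLINEAGE__TRANSPORT__COMPRESSION": "gzip",
    "OPENLINEAGE_CLIENT_LOGGING": "DEBUG",
}
print(config_from_env(example_env))
```

Passing `os.environ` instead of `example_env` shows what the current shell would hand to the flow.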
