PR: Fix Duplicate Metric Logging in MLFlowLogger to Prevent MLflow Database Errors #20871

Open
wants to merge 4 commits into
base: master

@KAVYANSHTYAGI (Contributor) commented Jun 2, 2025

What does this PR do?

This PR fixes a long-standing issue in PyTorch Lightning’s MLFlowLogger: logging the same metric (same name and step) more than once in a run causes a unique constraint violation on certain MLflow backends (e.g., PostgreSQL).
MLFlowLogger now tracks (metric, step) pairs and skips duplicate metric logs within a run, which prevents these database errors.
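
As a rough illustration of the tracking described above, here is a minimal sketch of a (metric, step) filter. It is not the code from this PR: the function name `drop_logged_duplicates` and the `seen` set are hypothetical, and the sketch assumes the caller keeps one such set per run.

```python
from typing import Dict, Set, Tuple


def drop_logged_duplicates(
    metrics: Dict[str, float],
    step: int,
    seen: Set[Tuple[str, int]],
) -> Dict[str, float]:
    """Return only the metrics whose (name, step) pair has not been seen yet.

    Hypothetical helper for illustration; `seen` is mutated in place.
    """
    fresh: Dict[str, float] = {}
    for name, value in metrics.items():
        key = (name, step)
        if key in seen:
            continue  # already logged at this step: skip instead of re-sending
        seen.add(key)
        fresh[name] = value
    return fresh


# One set per run (assumption of this sketch).
seen: Set[Tuple[str, int]] = set()
first = drop_logged_duplicates({"lr": 0.01, "loss": 0.5}, step=3, seen=seen)
second = drop_logged_duplicates({"lr": 0.01, "acc": 0.9}, step=3, seen=seen)
later = drop_logged_duplicates({"lr": 0.01}, step=4, seen=seen)
```

In this sketch the second call drops `lr` because `("lr", 3)` was already recorded, while the same metric at step 4 passes through.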

This change also updates the class docstring to document this new behavior and adds a unit test to verify that duplicate metric logs are ignored as expected.

Fixes #20865

Motivation and Context

Some MLflow tracking servers (such as those backed by PostgreSQL) enforce a unique constraint on metrics.

If the same metric (with identical name and step) is logged more than once, MLflow returns an error and metric logging fails, potentially halting training.
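
To make the failure mode concrete, the sketch below reproduces a unique constraint violation with Python's built-in `sqlite3` module. The table is a simplified stand-in, not MLflow's actual schema, and the `insert_metric` helper is hypothetical; the point is only that a second insert with the same key columns is rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in schema (assumption): uniqueness on (run_id, key, step).
conn.execute(
    "CREATE TABLE metrics ("
    "run_id TEXT, key TEXT, step INTEGER, value REAL, "
    "PRIMARY KEY (run_id, key, step))"
)


def insert_metric(run_id: str, key: str, step: int, value: float) -> bool:
    """Insert one metric row; return False if the database rejects a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO metrics VALUES (?, ?, ?, ?)",
                (run_id, key, step, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = insert_metric("run-1", "lr", 0, 0.01)
duplicate_ok = insert_metric("run-1", "lr", 0, 0.01)  # same name and step
next_step_ok = insert_metric("run-1", "lr", 1, 0.01)
row_count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
```

Here the helper catches the error; an uncaught error of this kind during training is what the PR description says can halt a run.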

This situation often arises when users call .log() in multiple hooks or callbacks.

The deduplication logic ensures only the first log of a metric per (name, step) is recorded per run.
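
The "first log wins, per run" behavior can be sketched as follows. Both classes are hypothetical and exist only for this illustration: `RecordingClient` stands in for a tracking client and just records calls, and `DedupingLogger` is not MLFlowLogger, only a small model of the rule stated above.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


class RecordingClient:
    """Hypothetical stand-in for a tracking client; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, float, int]] = []

    def log_metric(self, run_id: str, key: str, value: float, step: int) -> None:
        self.calls.append((run_id, key, value, step))


class DedupingLogger:
    """Forwards a metric only the first time its (name, step) appears in a run."""

    def __init__(self, client: RecordingClient) -> None:
        self._client = client
        # One set of seen (name, step) pairs per run id.
        self._seen: DefaultDict[str, Set[Tuple[str, int]]] = defaultdict(set)

    def log_metrics(self, run_id: str, metrics: Dict[str, float], step: int) -> None:
        seen = self._seen[run_id]
        for name, value in metrics.items():
            if (name, step) in seen:
                continue  # later duplicates are dropped, even with a new value
            seen.add((name, step))
            self._client.log_metric(run_id, name, value, step)


client = RecordingClient()
logger = DedupingLogger(client)
logger.log_metrics("run-a", {"lr": 0.01}, step=0)  # e.g. logged from one hook
logger.log_metrics("run-a", {"lr": 0.5}, step=0)   # same pair from another hook
logger.log_metrics("run-b", {"lr": 0.02}, step=0)  # different run: not a duplicate
```

One consequence worth noting in this sketch: because the first value wins, a later call with a different value for the same (name, step) is discarded rather than overwriting the earlier one.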

Dependencies

No new dependencies are introduced.

Does your PR introduce any breaking changes?

No breaking changes. Existing behavior is preserved, except that duplicate metric logs are now skipped (users may see a log message when a duplicate is skipped).

Other Checklist Items

Documentation updated: yes (see the class docstring in MLFlowLogger)

New test added for deduplication: yes

Fun fact:
This change will help Lightning users avoid subtle training failures, especially with remote or production MLflow tracking servers!


📚 Documentation preview 📚: https://pytorch-lightning--20871.org.readthedocs.build/en/20871/

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Jun 2, 2025
Labels
pl Generic label for PyTorch Lightning package
Development

Successfully merging this pull request may close these issues.

Mlflow logging LR duplicate key issue with PostgreSQL DB #190