feat: Azure prediction provider #50

Merged
52 commits merged into main from cau/table-evaluation-updates on Apr 8, 2025

Conversation

Contributor

@cau-git cau-git commented Mar 25, 2025

This branch is a selection of units from branch praveenmidde/table-evaluation. It is in sync with the latest API on main.

Features picked

  • AzureDocIntelligencePredictionProvider class
  • New Azure dependencies in pyproject.toml

Features not included

  • COS source
  • Table dataset builders (they are already implemented in main)

PeterStaar-IBM and others added 30 commits February 12, 2025 12:48
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Need to work on image dimensions as the visualizations seem to be off
- Save the hyperscaler API jsons
- Save the docling document formats
- Temporary files are stored as below:
   - `intermediate_files` -- for parquet files
   - `microsoft` - Root folder that contains API output files
      - `docling_document` - docling document output of the MS outputs
   - `visualizations` - Output of table visualizations
Signed-off-by: Christoph Auer <[email protected]>
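The temporary-file layout described in the commit message above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `make_output_dirs` and the idea of a single `root` directory are assumptions; only the folder names come from the commit message.

```python
from pathlib import Path


def make_output_dirs(root: Path) -> dict[str, Path]:
    """Create the folder layout listed in the commit message (hypothetical helper)."""
    dirs = {
        # Parquet files
        "intermediate_files": root / "intermediate_files",
        # Root folder for the raw API output files
        "microsoft": root / "microsoft",
        # Docling documents converted from the MS outputs
        "docling_document": root / "microsoft" / "docling_document",
        # Table visualizations
        "visualizations": root / "visualizations",
    }
    for path in dirs.values():
        # parents=True also creates `microsoft` before `docling_document`
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling `make_output_dirs(Path("out"))` would create the four folders under `out/` and return their paths keyed by name.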
nikos-livathinos and others added 11 commits March 21, 2025 11:44
Signed-off-by: Nikos Livathinos <[email protected]>
Signed-off-by: Nikos Livathinos <[email protected]>
… accept external predictions.

And utility adapters.

Signed-off-by: Nikos Livathinos <[email protected]>
- New implementation for S3 source
- Save the intermediate files (predictions and DoclingDocuments for both prediction and ground truth) for debugging
- Implement the TEDS metric run for a (limited) FinTabNet dataset from COS; if API predictions are already available, use them instead of calling the API
- Pin the library versions of pydantic, url + s3 library
Signed-off-by: Christoph Auer <[email protected]>
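The behaviour of reusing saved API predictions instead of calling the API again can be sketched roughly as below. This is an assumption-laden illustration, not the implementation in this PR: `load_or_predict`, the JSON-per-document cache layout, and the `call_api` callback are all hypothetical names.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_predict(
    doc_id: str,
    cache_dir: Path,
    call_api: Callable[[str], dict],
) -> dict:
    """Return a saved prediction if one exists, otherwise call the API and save it."""
    cache_file = cache_dir / f"{doc_id}.json"
    if cache_file.exists():
        # A prediction was saved earlier: reuse it and skip the API call.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = call_api(doc_id)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

With this shape, a second evaluation run over the same documents would read the saved JSON files and make no further API calls.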
@cau-git cau-git requested a review from praveenmidde March 25, 2025 18:32
Signed-off-by: Christoph Auer <[email protected]>
@cau-git cau-git changed the base branch from praveenmidde/table-evaluation to main April 7, 2025 15:46
@cau-git cau-git force-pushed the cau/table-evaluation-updates branch from 23036df to be637c7 Compare April 7, 2025 18:21
Contributor

@praveenmidde praveenmidde left a comment

Will review and add more.

@cau-git cau-git changed the title from "Updates for table-evaluation branch" to "feat: Azure prediction provider" Apr 8, 2025
1. Clean up the Poetry configuration by removing libraries not needed for this PR
2. Fix passing the image bytes to the Azure API
3. Add log statements for the filename; also move the Azure API logs to warning level
4. Remove commented-out code as discussed
5. pytest: set "ignore_missing_predictions" to True to skip missing files and move forward
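The effect of a flag like "ignore_missing_predictions" can be illustrated with a small sketch. Only the flag name comes from the commit message; the function `select_predictions`, its arguments, and the choice to raise `KeyError` when the flag is off are assumptions made for illustration and may differ from the actual code.

```python
def select_predictions(
    doc_ids: list[str],
    predictions: dict[str, dict],
    ignore_missing_predictions: bool = False,
) -> dict[str, dict]:
    """Collect predictions for the given documents (hypothetical helper)."""
    selected = {}
    for doc_id in doc_ids:
        if doc_id not in predictions:
            if ignore_missing_predictions:
                # Skip documents without a prediction and move on.
                continue
            raise KeyError(f"No prediction found for {doc_id!r}")
        selected[doc_id] = predictions[doc_id]
    return selected
```

In a test, enabling the flag lets the run continue over the documents that do have predictions rather than failing on the first missing file.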
praveenmidde previously approved these changes Apr 8, 2025
Contributor

@praveenmidde praveenmidde left a comment

Changes look good. Approving.

cau-git added 3 commits April 8, 2025 13:38
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Contributor

@praveenmidde praveenmidde left a comment

Ran the test with the latest code. The test is successful.

@cau-git cau-git marked this pull request as ready for review April 8, 2025 12:22
@cau-git cau-git merged commit 683de7a into main Apr 8, 2025
8 checks passed
5 participants