feat: Azure prediction provider #50
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Need to work on image dimensions as the visualizations seem to be off
- Save the hyperscaler API JSONs
- Save the docling document formats
- Temporary files are stored as below:
  - `intermediate_files` - for parquet files
  - `microsoft` - Root folder that contains API output files
  - `docling_document` - docling document output of the MS outputs
  - `visualizations` - Output of table visualizations
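A minimal sketch of how the folder layout described in this commit message could be created. The folder names come from the message; the helper name `make_output_layout`, the flat nesting under a single root, and the use of `pathlib` are assumptions for illustration, not the PR's actual code.

```python
from pathlib import Path

# Folder names are taken from the commit message.
# Placing them all directly under one root is an assumption.
SUBFOLDERS = ("intermediate_files", "microsoft", "docling_document", "visualizations")


def make_output_layout(root: Path) -> dict[str, Path]:
    """Create each output folder under `root` and return a name -> path map."""
    layout = {}
    for name in SUBFOLDERS:
        path = root / name
        # exist_ok=True makes repeated runs safe.
        path.mkdir(parents=True, exist_ok=True)
        layout[name] = path
    return layout
```

Returning the mapping lets callers write, for example, parquet files to `layout["intermediate_files"]` without rebuilding paths by hand.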
Signed-off-by: Nikos Livathinos <[email protected]>
… accept external predictions. And utility adapters. Signed-off-by: Nikos Livathinos <[email protected]>
- New implementation for S3 source
- Save the intermediate files (prediction and doclingDocuments for pred+groundTruth) for debugging
- Implement the run for TEDS metric for (limited) Fintabnet dataset from cos; if the API predictions are available, will use them instead of calling the API
- Needed to pin library versions of pydantic, url + s3 library
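The "use saved API predictions instead of calling the API" behaviour mentioned above can be sketched roughly as follows. The function name `load_or_predict`, the one-JSON-file-per-document cache layout, and the `call_api` callback are hypothetical; the PR's real implementation may differ.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_predict(doc_id: str, cache_dir: Path, call_api: Callable[[str], dict]) -> dict:
    """Return the saved prediction for `doc_id`, calling the API only on a cache miss."""
    cache_file = cache_dir / f"{doc_id}.json"  # hypothetical cache layout
    if cache_file.exists():
        # A saved API response exists, so the remote call is skipped.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = call_api(doc_id)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Keeping the raw responses on disk also makes evaluation runs repeatable without incurring further API calls.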
…etRecord Signed-off-by: Christoph Auer <[email protected]>
…into cau/new-class-design
23036df to be637c7
Will review and add more..
1. Cleanup poetry of unwanted libraries for this PR
2. Address passing in the images bytes for Azure API
3. Added log statements for filename; also moved azure api logs to warning.
4. Remove commented out code as discussed
5. pytest - set "ignore_missing_predictions" to True to skip the missing files and move forward.
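As an illustration of item 5, the effect of a flag like `ignore_missing_predictions` can be sketched as below. Only the flag name comes from the comment above; the function `select_predictions` and its arguments are hypothetical and are not the project's API.

```python
import logging

log = logging.getLogger(__name__)


def select_predictions(doc_ids, predictions, ignore_missing_predictions=False):
    """Pair each document id with its prediction, optionally skipping gaps."""
    selected = {}
    for doc_id in doc_ids:
        if doc_id not in predictions:
            if ignore_missing_predictions:
                # Skip the missing file and move forward, as the comment describes.
                log.warning("No prediction for %s, skipping", doc_id)
                continue
            raise KeyError(f"Missing prediction for {doc_id}")
        selected[doc_id] = predictions[doc_id]
    return selected
```

With the flag set to `True`, a single failed API call (for example on an unsupported image) does not abort the whole evaluation run.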
Changes look good. Approving..
- Tested the code by running `pytest test_tables_azure.py --log-cli-level=INFO` from the tests folder.
- I could see the TEDS results for the fintabnet dataset for 4 files. (Note: end_index was set to 5, and the 3rd file in the dataset has a height of 42 pixels, which the Azure API fails to process. Refer to https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/model-overview?view=doc-intel-4.0.0#input-requirements)
- Note: `PubTabNetDatasetBuilder` has a problem with the conversion; that will be looked at later.
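The 42-pixel failure noted above could be caught before calling the API with a size pre-check. This is a sketch only: the 50 and 10,000 pixel limits are what the linked input-requirements page lists as far as I recall, so treat both constants as assumptions to verify, and `is_supported_image_size` is a hypothetical helper name.

```python
# Assumed limits from the linked Azure input-requirements page; verify before use.
MIN_SIDE_PX = 50
MAX_SIDE_PX = 10_000


def is_supported_image_size(width: int, height: int) -> bool:
    """Return True when both sides fall inside the assumed pixel limits."""
    return all(MIN_SIDE_PX <= side <= MAX_SIDE_PX for side in (width, height))
```

Under these assumed limits, an image with a height of 42 pixels is rejected locally, so the file can be logged and skipped instead of producing an API error.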
…ng-eval into cau/table-evaluation-updates
Ran the test with latest code.. Test is successful
This branch is a selection of units from branch `praveenmidde/table-evaluation`. It is in sync with the latest API on `main`.

Features picked
- `AzureDocIntelligencePredictionProvider` class
- `pyproject.toml`
Features not included
main
)