feat: OCR evaluator #63

Merged
merged 16 commits into from
Apr 22, 2025
Conversation

cau-git
Contributor

@cau-git cau-git commented Apr 8, 2025

Adds an OCR evaluator that computes the character error rate (CER).
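
For readers unfamiliar with the metric, CER (character error rate) is commonly defined as the character-level Levenshtein distance between the predicted and reference text, divided by the reference length. The sketch below is illustrative only and is not the code from this PR: the function names `levenshtein` and `cer` are hypothetical, and the handling of an empty reference is an assumed convention.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        # Assumed convention: an empty reference scores 0.0 only if the
        # hypothesis is also empty, otherwise 1.0.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters
```

Note that under this definition CER can exceed 1.0 when the prediction is much longer than the reference.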

Picked out and updated from branch integrate-ocr-benchmarks, see #46

⚠️ This code does not currently fit the docling-eval API and must be rewritten.

TODO

  • Rewrite OCREvaluator to fit the BaseEvaluator interface (especially arguments and return types of __call__)
  • Match the design of the other evaluator classes. No output_dir, and no writing to the file system anywhere.
  • Remove custom code for reading from JSONL, and custom visualization
  • Update and test with the generic visualize facility (see here).
  • Prove that it works as intended with a complete unit test. It must run at least with the PixParse OCR dataset and docling predictions (see feat: PixParse OCR dataset builder #61)

@cau-git cau-git requested a review from samiuc April 8, 2025 12:24
@samiuc samiuc marked this pull request as ready for review April 21, 2025 13:09
@samiuc samiuc requested a review from PeterStaar-IBM April 21, 2025 13:10
@samiuc
Contributor

samiuc commented Apr 21, 2025

@cau-git I've completed all the to-do items you listed in the description above. One note: we now calculate both CER and Character Accuracy. I've also added OCR tests for each hyperscaler, and the code now follows the same design pattern as the other modules.
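
The comment above does not say how Character Accuracy is derived. One common convention, assumed here and not confirmed by this PR, is the complement of CER, clipped at zero because CER can exceed 1.0. The helper name `character_accuracy` is hypothetical.

```python
def character_accuracy(cer: float) -> float:
    """Character accuracy as 1 - CER, clipped to be non-negative (assumed convention)."""
    if cer < 0:
        raise ValueError("CER must be non-negative")
    return max(0.0, 1.0 - cer)


print(character_accuracy(0.25))  # 0.75
```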

Contributor Author

@cau-git cau-git left a comment

@samiuc thanks for checking off the TODOs!
Below are a few remaining remarks from my end.

PeterStaar-IBM
PeterStaar-IBM previously approved these changes Apr 22, 2025
Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment

LGTM!

@cau-git cau-git requested a review from dolfim-ibm April 22, 2025 09:30
dolfim-ibm
dolfim-ibm previously approved these changes Apr 22, 2025
@cau-git cau-git merged commit be62102 into main Apr 22, 2025
8 of 11 checks passed
@cau-git cau-git deleted the dev/ocr-evaluator branch April 22, 2025 13:58