<p align="center">
  <a href="https://github.com/docling-project/docling-eval">
    <img loading="lazy" alt="Docling" src="docs/assets/docling-eval-pic.png" width="40%"/>
  </a>
</p>

# Docling-eval
Evaluate [Docling](https://github.com/DS4SD/docling) on various datasets. You can use the CLI:
```shell
$ poetry run docling_eval --help

 Usage: docling_eval [OPTIONS] COMMAND [ARGS]...

 Docling Evaluation CLI for benchmarking document processing tasks.

╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                 │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────╮
│ create        Create both ground truth and evaluation datasets in one step. │
│ create-eval   Create evaluation dataset from existing ground truth.         │
│ create-gt     Create ground truth dataset only.                             │
│ evaluate      Evaluate predictions against ground truth.                    │
│ visualize     Visualize evaluation results.                                 │
╰─────────────────────────────────────────────────────────────────────────────╯
```
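The subcommands are typically chained: build a ground-truth dataset, derive predictions, evaluate, then visualize. The sketch below is a hypothetical end-to-end run; the option names (`--benchmark`, `--output-dir`, `--modality`) and the `DPBench`/`layout` values are assumptions borrowed from an earlier version of this CLI's option list, so confirm them with each subcommand's `--help` before running.

```shell
# Hypothetical end-to-end run for one benchmark. Option names are assumptions;
# check `poetry run docling_eval <subcommand> --help` for the actual ones.
poetry run docling_eval create-gt   --benchmark DPBench --output-dir ./benchmarks/DPBench
poetry run docling_eval create-eval --benchmark DPBench --output-dir ./benchmarks/DPBench
poetry run docling_eval evaluate    --modality layout --benchmark DPBench --output-dir ./benchmarks/DPBench
poetry run docling_eval visualize   --modality layout --benchmark DPBench --output-dir ./benchmarks/DPBench
```

Pointing every step at the same output directory assumes `evaluate` and `visualize` pick up the datasets written by the `create-*` steps there.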
On our list for next benchmarks:

- [OmniOCR](https://github.com/getomni-ai/ocr-benchmark)
- Hyperscalers
- [CoMix](https://github.com/emanuelevivoli/CoMix/tree/main/docs/datasets)
- [DocVQA](https://huggingface.co/datasets/lmms-lab/DocVQA)
- [rd-tablebench](https://huggingface.co/datasets/reducto/rd-tablebench)

## Contributing

Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main/CONTRIBUTING.md) for details.