[Docs] Fix documentation #15477

Merged
merged 18 commits into from
Jun 5, 2025

Sam-gsj (Contributor) commented May 29, 2025

No description provided.

paddle-bot bot commented May 29, 2025

Thanks for your contribution!

@@ -44,15 +44,15 @@ comments: true
<td>103.08 / 103.08</td>
<td>197.99 / 197.99</td>
<td>6.9 M</td>
-<td rowspan="1">SLANet 是百度飞桨视觉团队自研的表格结构识别模型。该模型通过采用 CPU 友好型轻量级骨干网络 PP-LCNet、高低层特征融合模块 CSP-PAN、结构与位置信息对齐的特征解码模块 SLA Head,大幅提升了表格结构识别的精度和推理速度。</td>
+<td rowspan="1">SLANet 是百度飞桨视觉团队自研的表格结构识别模型。该模型通过采用 CPU 友好型轻量级骨干网络 PP-LCNet、高低层特征融合模块 CSP-PAN、结构与位置信息对齐的特征解码模块SLA Head,大幅提升了表格结构识别的精度和推理速度。</td>

Member commented

The space should not be removed.

Contributor Author commented

done

</tr>
<tr>
<td>SLANet_plus</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/SLANet_plus_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/SLANet_plus_pretrained.pdparams">训练模型</a></td>
<td>63.69</td>
<td>140.29 / 140.29</td>
<td>195.39 / 195.39</td>
<td>6.9 M</td>
-<td rowspan="1">SLANet_plus 是百度飞桨视觉团队自研的表格结构识别模型 SLANet 的增强版。相较于 SLANetSLANet_plus 对无线表、复杂表格的识别能力得到了大幅提升,并降低了模型对表格定位准确性的敏感度,即使表格定位出现偏移,也能够较准确地进行识别。
+<td rowspan="1">SLANet_plus 是百度飞桨视觉团队自研的表格结构识别模型 SLANet 的增强版。相较于 SLANet, SLANet_plus 对无线表、复杂表格的识别能力得到了大幅提升,并降低了模型对表格定位准确性的敏感度,即使表格定位出现偏移,也能够较准确地进行识别。

Member commented

This is a Chinese sentence, so it should use Chinese punctuation.

@@ -956,7 +956,7 @@ paddleocr table_recognition_v2 -i ./table_recognition_v2.jpg --device gpu
<tr>
<td><code>text_det_limit_type</code></td>
<td>文本检测的图像边长限制类型。
-支持 <code>min</code> 和 <code>max</code><code>min</code> 表示保证图像最短边不小于 <code>det_limit_side_len</code><code>max</code> 表示保证图像最长边不大于 <code>limit_side_len</code>。如果不设置,将默认使用产线初始化的该参数值,初始化为 <code>max</code>。
+支持 <code>min</code> 和 <code>max</code>, <code>min</code> 表示保证图像最短边不小于 <code>det_limit_side_len</code>, <code>max</code> 表示保证图像最长边不大于 <code>limit_side_len</code>。如果不设置,将默认使用产线初始化的该参数值,初始化为 <code>max</code>。

Member commented

Chinese sentences need Chinese punctuation. Cases that mix Chinese and English may need to be proofread by hand.

 - `doc_preprocessor_res`: `(Dict[str, Union[str, Dict[str, bool], int]])` 文档预处理子产线的输出结果。仅当`use_doc_preprocessor=True`时存在
 - `input_path`: `(Union[str, None])` 图像预处理子产线接受的图像路径,当输入为`numpy.ndarray`时,保存为`None`
 - `model_settings`: `(Dict)` 预处理子产线的模型配置参数
 - `use_doc_orientation_classify`: `(bool)` 控制是否启用文档方向分类
 - `use_doc_unwarping`: `(bool)` 控制是否启用文本图像矫正
-- `angle`: `(int)` 文档方向分类的预测结果。启用时取值为[0,1,2,3],分别对应[0°,90°,180°,270°];未启用时为-1
+- `angle`: `(int)` 文档方向分类的预测结果。启用时取值为[0, 1, 2, 3],分别对应[0°,90°,180°,270°];未启用时为-1

Member commented

For lists like this one, leaving out the space after each comma is more compact. The style should also be consistent: in this sentence one list has spaces and the other does not, which looks a little inconsistent.

Sam-gsj (Contributor Author) commented May 29, 2025

done

Bobholamovic (Member) left a comment

Apart from the comments I left, it seems that not every occurrence of the terms “模块” (module) and “功能” (feature) has been updated. All of the pipeline documents need to be checked.
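
One way to run that check across the docs is a per-file count of the terms in question. This is a minimal sketch under stated assumptions, not something taken from this PR: `count_terms` is a hypothetical helper, the docs are assumed to be UTF-8 Markdown files under a single root directory, and the terms to search for are supplied by the caller.

```python
from pathlib import Path


def count_terms(root: str, terms: list[str], suffix: str = ".md") -> dict[str, dict[str, int]]:
    """Map each file under `root` that contains any of `terms` to its per-term counts."""
    hits: dict[str, dict[str, int]] = {}
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8")
        counts = {term: text.count(term) for term in terms if term in text}
        if counts:
            # Key by the path relative to the docs root for readable reports.
            hits[path.relative_to(root).as_posix()] = counts
    return hits


if __name__ == "__main__":
    # "docs" is a placeholder path; point it at the documentation root to scan.
    for name, counts in count_terms("docs", ["模块", "功能"]).items():
        print(name, counts)
```

The counts only show where the terms still appear. Whether a given occurrence should change remains a judgment for the author and reviewer.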

@@ -1674,7 +1674,7 @@ for res in output:
 - `boxes`: `(List[Dict])` 版面印章区域的检测框列表,每个列表中的元素,包含以下字段
 - `cls_id`: `(int)` 检测框的印章类别id
 - `score`: `(float)` 检测框的置信度
-- `coordinate`: `(List[float])` 检测框的四个顶点坐标,顺序为x1,y1,x2,y2表示左上角的x坐标,左上角的y坐标,右下角x坐标,右下角的y坐标
+- `coordinate`: `(List[float])` 检测框的四个顶点坐标,顺序为x1, y1, x2, y2表示左上角的x坐标,左上角的y坐标,右下角x坐标,右下角的y坐标

Member commented

Chinese commas should be used here.

@@ -315,7 +315,7 @@ comments: true
<td rowspan="2">--</td>
<td rowspan="2">--</td>
<td rowspan="2">351M</td>
-<td rowspan="2">SLANeXt 系列是百度飞桨视觉团队自研的新一代表格结构识别模型。相较于 SLANet 和 SLANet_plusSLANeXt 专注于对表格结构进行识别,并且对有线表格(wired)和无线表格(wireless)的识别分别训练了专用的权重,对各类型表格的识别能力都得到了明显提高,特别是对有线表格的识别能力得到了大幅提升。</td>
+<td rowspan="2">SLANeXt 系列是百度飞桨视觉团队自研的新一代表格结构识别模型。相较于 SLANet 和 SLANet_plus, SLANeXt 专注于对表格结构进行识别,并且对有线表格(wired)和无线表格(wireless)的识别分别训练了专用的权重,对各类型表格的识别能力都得到了明显提高,特别是对有线表格的识别能力得到了大幅提升。</td>

Member commented

Use a Chinese comma.

@@ -2171,7 +2171,7 @@ for item in markdown_images:
<tr>
<td><code>layout_unclip_ratio</code></td>
<td>与实例化时的参数相同。</td>
-<td><code>float|Tuple[float,float]|dict</code></td>
+<td><code>float|Tuple[float, float]|dict</code></td>

Member commented

No space is needed here.

Contributor Author commented

done

Bobholamovic changed the title from "modify docs0529" to "[Docs] Fix documentation" on May 30, 2025

@@ -1619,19 +1619,19 @@ for item in markdown_images:
<td>版面区域检测模型检测框的扩张系数。
<ul>
<li><b>float</b>:任意大于 <code>0</code> 浮点数;</li>
-<li><b>Tuple[float, float]</b>:在横纵两个方向各自的扩张系数;</li>
+<li><b>Tuple[float,float]</b>:在横纵两个方向各自的扩张系数;</li>

Member commented

There is an extra colon.

Sam-gsj (Contributor Author) commented May 30, 2025

done

@@ -1352,7 +1356,7 @@ PP-ChatOCRv4 预测的流程、API说明、产出说明如下:
<td>版面模型得分阈值。
<ul>
<li><b>float</b>:<code>0-1</code> 之间的任意浮点数;</li>
-<li><b>dict</b>: <code>{0:0.1}</code> key为类别ID,value为该类别的阈值;</li>
+<li><b>字典</b>: <code>{0:0.1}</code> key为类别ID,value为该类别的阈值;</li>

Member commented

There is no need to translate dict as 字典. I suggest keeping dict, consistent with the "float" above.

</td>
<td><code>str</code></td>
<td></td>
</tr>
<tr>
<td><code>ocr_version</code></td>
-<td>OCR 版本。
+<td>OCR 版本,注意不是每个ocr_version都支持所有的lang

Member commented

You could point users to the table below. Also note that ocr_version and lang are parameter names, so they need to be wrapped in <code></code>.

<li><b>ka</b>:卡纳达文;
<li><b>ta</b>:泰米尔文;
-</ul>如果不设置,将默认使用<code>ch</code>。
+<a href="#languages">支持的语言列表</a>,如果不设置,将默认使用<code>ch</code>。

Member commented

Using href inside a table does not seem to take effect in the documentation generated by mkdocs. I suggest dropping the link and simply telling readers to refer to xxx below, which should be clear enough here.

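Bobholamovic (Member) commented

Because the code updates have higher priority, the code updates involved in this PR have been moved to #15561 and will be submitted together with another set of code updates.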

Sam-gsj (Contributor Author) commented Jun 5, 2025

done

Sam-gsj (Contributor Author) commented Jun 5, 2025

done

Bobholamovic merged commit 1b265d0 into PaddlePaddle:main on Jun 5, 2025
2 checks passed