Is your feature request related to a problem? Please describe.
In unstructured/partition/md.py, partition_md converts Markdown to HTML with a fixed extension list: html = markdown.markdown(text, extensions=["tables"]). Because the "fenced_code" extension is not enabled, fenced code blocks are not recognized, so # comments inside them are incorrectly parsed as <h1> headings.
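The behavior can be reproduced with Python-Markdown alone, without going through unstructured. The snippet below is a minimal sketch: `SAMPLE`, `to_html`, and `has_heading` are illustrative names that do not exist in unstructured, and the fence marker is built programmatically only to keep this snippet readable.

```python
FENCE = "`" * 3  # three backticks, assembled here to avoid a literal fence

# The test Markdown from this issue: a fenced bash block with a '#' comment.
SAMPLE = "\n".join(
    [
        FENCE + "bash",
        "# create the container",
        "docker run -dt --name unstructured "
        "downloads.unstructured.io/unstructured-io/unstructured:latest",
        FENCE,
    ]
)


def to_html(text, extensions):
    """Render Markdown to HTML with an explicit extension list."""
    # Imported lazily so this module loads even if Python-Markdown is missing.
    import markdown

    return markdown.markdown(text, extensions=extensions)


def has_heading(html):
    """Return True if the rendered HTML contains a level-1 heading."""
    return "<h1>" in html
```

With only ["tables"], `has_heading(to_html(SAMPLE, ["tables"]))` should be true, because the comment line is treated as a heading. With ["tables", "fenced_code"] the block should be rendered as a code block and no heading should appear.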
Describe the solution you'd like
Modify partition_md to accept custom Markdown extensions via kwargs:
- Read the extensions parameter from the method kwargs
- Default to ["tables"] if not specified (backward compatible)
- Pass extensions to markdown.markdown()
- Allow users to handle special cases (e.g. add "fenced_code" for code blocks)
Output of the test code before the change (the # comment becomes a Title element):
--- Element 0 ---
Type: Text
Category: UncategorizedText
Text: '```bash'
--- Element 1 ---
Type: Title
Category: Title
Text: 'create the container'
--- Element 2 ---
Type: Text
Category: UncategorizedText
Text: 'docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest ```'
Describe alternatives you've considered
Security Impact: None (parameter addition only)
Backward Compatibility:
- Calls that do not pass the extensions kwarg remain unchanged
- Behavior can change only for callers that already pass extensions in kwargs (unlikely given current usage)
Recommended Implementation Approach:
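A minimal sketch of how the kwargs handling could look, under stated assumptions: `DEFAULT_EXTENSIONS`, `resolve_extensions`, and `markdown_to_html` are hypothetical names used for illustration, not existing unstructured functions, and the real partition_md has more parameters than shown here.

```python
DEFAULT_EXTENSIONS = ["tables"]  # the extension list currently hard-coded


def resolve_extensions(kwargs):
    """Take `extensions` out of kwargs, falling back to the current default.

    The key is popped so that it is not forwarded to later calls that
    receive the same kwargs; this is an assumption about how kwargs are
    passed on, not a description of the existing code.
    """
    extensions = kwargs.pop("extensions", None)
    if extensions is None:
        # Return a copy so callers cannot mutate the shared default.
        return list(DEFAULT_EXTENSIONS)
    return list(extensions)


def markdown_to_html(text, **kwargs):
    """Hypothetical stand-in for the conversion step inside partition_md."""
    # Imported lazily so this module loads even if Python-Markdown is missing.
    import markdown

    return markdown.markdown(text, extensions=resolve_extensions(kwargs))
```

Under this sketch, a call without `extensions` behaves as before, while `markdown_to_html(text, extensions=["tables", "fenced_code"])` enables fenced code block handling.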
Additional context
The following is the test Markdown text (the two lines below, wrapped in a fenced bash code block):
# create the container
docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest
And the test code before the modification:
And the outputs:
Here is the test code after the modification:
And the outputs: