Closed
Description
I wonder if there is an easy way to extract comments embedded inside an HTML document.
I tried using html5ever with Floki. With the default parser, comments are present in the parsed document as
{:comment, "My Comment"}
but when I switch the parser to html5ever they are simply stripped. This can be verified by running:
html = """
<html><title>Some Title</title><body><!-- some comment --></body></html>
"""
Floki.parse_document(html)
|> IO.inspect()
Floki.parse_document(html, html_parser: Floki.HTMLParser.Html5ever)
|> IO.inspect()
which results in this output:
{:ok,
 [
   {"html", [],
    [{"title", [], ["Some Title"]}, {"body", [], [comment: " some comment "]}]}
 ]}
{:ok,
 [
   {"html", [],
    [{"head", [], [{"title", [], ["Some Title"]}]}, {"body", [], ["\n"]}]}
 ]}
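For context, this is roughly what I would like to do once the comments survive parsing. It is only a minimal sketch that walks the tree shape shown in the first output (default parser); `CommentExtractor` is a hypothetical helper of my own, not part of Floki, and it assumes comments appear as `{:comment, text}` tuples among the children:

```elixir
defmodule CommentExtractor do
  # Hypothetical helper (not a Floki API): collects the text of every
  # {:comment, text} node found anywhere in a parsed tree.

  # A list of nodes: collect from each node and concatenate the results.
  def comments(nodes) when is_list(nodes), do: Enum.flat_map(nodes, &comments/1)

  # A comment node: this is what we are looking for.
  def comments({:comment, text}), do: [text]

  # An element node {tag, attributes, children}: descend into the children.
  def comments({tag, _attrs, children}) when is_binary(tag) and is_list(children),
    do: comments(children)

  # Text nodes, doctypes and anything else contribute no comments.
  def comments(_other), do: []
end

# Tree copied from the default parser output above.
tree = [
  {"html", [],
   [{"title", [], ["Some Title"]}, {"body", [], [comment: " some comment "]}]}
]

tree |> CommentExtractor.comments() |> IO.inspect()
# prints [" some comment "]
```

With the html5ever output this returns an empty list, since the comment node is no longer in the tree.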