Parsing non-UTF-8 pages #6

Open
@edevil

Parsing pages not written in UTF-8 currently produces errors:

> %HTTPoison.Response{body: body} = HTTPoison.get!("http://manybooks.net/index.xml")
> Html5ever.parse(body)

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 4070 }', src/libcore/result.rs:859
note: Run with `RUST_BACKTRACE=1` for a backtrace.
{:error, "called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 4070 }"}

In this case, the feed specifies its encoding in the XML declaration:

<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
...

Can I get around this problem or can the library be fixed to handle this situation?
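One possible workaround, sketched here under the assumption that the body really is ISO-8859-1 as the declaration claims, is to transcode it to UTF-8 before handing it to the parser. The module name `Latin1Workaround` and its `to_utf8/1` function are hypothetical helpers, not part of Html5ever; the only library call used is Erlang's `:unicode.characters_to_binary/3`.

```elixir
defmodule Latin1Workaround do
  # Hypothetical helper, not part of Html5ever or HTTPoison.
  # Assumption: any body that is not valid UTF-8 is ISO-8859-1 (Latin-1).
  # A more careful version would read the declared encoding first.
  def to_utf8(body) when is_binary(body) do
    if String.valid?(body) do
      # Already valid UTF-8: pass through to avoid double encoding.
      body
    else
      # Every byte is a valid Latin-1 code point, so this conversion
      # returns a UTF-8 binary rather than an error tuple.
      :unicode.characters_to_binary(body, :latin1, :utf8)
    end
  end
end
```

With this in place, the call from the report could become `body |> Latin1Workaround.to_utf8() |> Html5ever.parse()`. This only sidesteps the panic on the caller's side; it does not detect other encodings, and a Latin-1 body that happens to be valid UTF-8 is passed through unchanged.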
