
Base64 MIME variant does not ignore white space chars as per RFC2045 #414

Closed
@tmoschou

Description


Hello,

The Base64 MIME variant is not compliant with RFC 2045, Section 6.8 (Base64 Content-Transfer-Encoding), which specifies:

Any characters outside of the base64 alphabet are to be ignored in
base64-encoded data.

In this regard, characters outside the alphabet, including newlines and other whitespace, may appear anywhere in the encoded data. They should not be required to fall on 4-character boundaries.
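A minimal sketch of the RFC 2045 rule, assuming the standard alphabet with `=` padding: drop every character outside the base64 alphabet before decoding, so line breaks can fall at any offset. The class and method names (`LenientMimeBase64`, `decode`) are hypothetical; this is an illustration, not Jackson's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper (not Jackson's API): base64 decoding per RFC 2045, section 6.8. */
public class LenientMimeBase64 {

    /** True for the 64 characters of the base64 alphabet plus the '=' pad character. */
    static boolean isBase64Char(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
    }

    /** Drops every non-alphabet character, wherever it appears, then decodes what is left. */
    public static byte[] decode(String encoded) {
        StringBuilder clean = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (isBase64Char(c)) {
                clean.append(c);
            }
        }
        // The basic JDK decoder rejects non-alphabet input, which the loop above has removed.
        return Base64.getDecoder().decode(clean.toString());
    }

    public static void main(String[] args) {
        // Line breaks after 3 and 6 characters: neither is on a 4-character boundary.
        byte[] bytes = decode("R0l\nGOD\r\nlh");
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints GIF89a
    }
}
```

Because filtering happens before grouping, where a line break falls has no effect on how the remaining characters are split into 4-character quanta.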

Here is a sample test case that demonstrates the issue. (Note that I'm using the YAML dataformat, since base64 MIME-encoded documents in JSON strings with (escaped) newlines seem completely broken, regardless of whether the newlines fall on 4-character boundaries or not.)

The input document is "Example 2.23. Various Explicit Tags" from the YAML spec (1.1 and 1.2), http://www.yaml.org/spec/1.2/spec.html. Each full line of the base64 value is 23 characters long, so the line breaks do not fall on 4-character boundaries:

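```yaml
---
picture: !!binary |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=
```

```java
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import com.fasterxml.jackson.core.Base64Variants;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
import org.junit.Test;

public class Base64MimeYamlTest {
    @Test
    public void testBinaryDecoder() throws IOException {
        final ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
        mapper.setBase64Variant(Base64Variants.MIME);
        // Test resource containing the YAML document above
        try (final InputStream inputStream = getClass().getResourceAsStream("yaml1.3-example2.23.yaml")) {
            final JsonNode bean = mapper.readTree(inputStream);
            final JsonNode picture = bean.get("picture");
            java.util.Base64.getMimeDecoder().decode(picture.asText()); // works fine
            final byte[] gif = picture.binaryValue(); // fails
            assertEquals(65, gif.length);
            final byte[] actualFileHeader = Arrays.copyOfRange(gif, 0, 6);
            final byte[] expectedFileHeader = new byte[]{'G', 'I', 'F', '8', '9', 'a'};
            assertArrayEquals(expectedFileHeader, actualFileHeader);
        }
    }
}
```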

Note that java.util.Base64.getMimeDecoder() has no issues decoding this document.
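For a Jackson-free check of the same input, the sketch below feeds the `!!binary` text (as a `|` block scalar delivers it, with a trailing newline) straight to `java.util.Base64.getMimeDecoder()`. The class name `JdkMimeDecoderCheck` is only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Standalone check (no Jackson) that the JDK MIME decoder accepts the YAML example. */
public class JdkMimeDecoderCheck {

    // The !!binary value from YAML spec Example 2.23: each full line is 23 characters, not a multiple of 4.
    static final String PICTURE =
            "R0lGODlhDAAMAIQAAP//9/X\n"
            + "17unp5WZmZgAAAOfn515eXv\n"
            + "Pz7Y6OjuDg4J+fn5OTk6enp\n"
            + "56enmleECcgggoBADs=\n";

    public static byte[] decodePicture() {
        // The MIME decoder ignores line separators and other non-alphabet characters, wherever they occur.
        return Base64.getMimeDecoder().decode(PICTURE);
    }

    public static void main(String[] args) {
        byte[] gif = decodePicture();
        byte[] header = Arrays.copyOfRange(gif, 0, 6);
        System.out.println(gif.length + " " + new String(header, StandardCharsets.US_ASCII)); // prints 65 GIF89a
    }
}
```

The 88 base64 characters form 22 quanta with one `=` pad, which is where the expected 65 bytes come from.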

Also note that, technically speaking, the YAML codec should not be using the MIME decoder but another variant, one not covered in core. http://yaml.org/type/binary.html specifies that:

Binary data is serialized using the base64 format as defined by RFC2045 (MIME), with the following notes:

  • The content is not restricted to lines of 76 characters or less.
  • Characters other than the base64 alphabet, line breaks and white space are considered an error.
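The two yaml.org notes translate into a decoder that is stricter than plain MIME (stray characters are errors, not ignored) and looser on layout (no line-length limit). A minimal sketch, assuming "line breaks and white space" means LF, CR, space and tab; `YamlBinaryDecoder` is a hypothetical name, not a Jackson or SnakeYAML API.

```java
import java.util.Base64;

/**
 * Hypothetical decoder following the notes on http://yaml.org/type/binary.html
 * (an illustration, not a Jackson or SnakeYAML API).
 */
public class YamlBinaryDecoder {

    static boolean isBase64Char(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
    }

    /** Assumption: "line breaks and white space" means LF, CR, space and tab. */
    static boolean isYamlSeparator(char c) {
        return c == '\n' || c == '\r' || c == ' ' || c == '\t';
    }

    /** Skips separators wherever they occur; no line-length limit is checked. */
    public static byte[] decode(String text) {
        StringBuilder clean = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isBase64Char(c)) {
                clean.append(c);
            } else if (!isYamlSeparator(c)) {
                // Unlike plain MIME decoding, any other character is an error rather than ignored.
                throw new IllegalArgumentException(
                        "Illegal character '" + c + "' at index " + i + " in !!binary value");
            }
        }
        return Base64.getDecoder().decode(clean.toString());
    }

    public static void main(String[] args) {
        System.out.println(decode(" R0lGODlh\n").length); // prints 6 (the bytes of "GIF89a")
        try {
            decode("R0lG*ODlh");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```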

However, as far as I am aware, the current API does not allow specifying such a custom variant. This should not have any bearing on the test case provided, since the document consists only of base64 alphabet characters and whitespace, and none of its lines exceeds 76 characters.
