|
1 | 1 | <!DOCTYPE html><html lang="en"><meta charset="utf-8">
|
2 |
| -<title>The "data" URL scheme</title> |
3 |
| -<link href="http://www.whatwg.org/style/specification" rel="stylesheet"> |
| 2 | +<title>Ambiguities in the "data" URL scheme</title> |
| 3 | +<link href="https://www.whatwg.org/style/specification" rel="stylesheet"> |
4 | 4 |
|
5 |
| -<div class="head"> |
6 |
| - <h1>The <code>data</code> URL scheme</h1> |
| 5 | +<h1>Ambiguities in the <code>data</code> URL scheme</h1> |
7 | 6 |
|
8 |
| - <dl> |
9 |
| - <dt>Latest version: |
10 |
| - <dd><a href="http://simonsapin.github.com/data-urls/">http://simonsapin.github.com/data-urls/</a> |
| 7 | +<p>Last updated on 11 November 2014. |
11 | 8 |
|
12 |
| - <dt>This version: |
13 |
| - <dd>Last updated on 13 March 2013. |
| 9 | +<dl> |
| 10 | + <dt>Feedback:</dt> |
| 11 | + <dd><a href="https://github.com/SimonSapin/data-urls/issues">File an issue</a> |
| 12 | + <dd><a href="https://wiki.whatwg.org/wiki/IRC">IRC: #whatwg on Freenode</a> |
| 13 | +</dl> |
14 | 14 |
|
15 |
| - <dt>Participate:</dt> |
16 |
| - <dd><a href="https://github.com/SimonSapin/data-urls/issues">File a bug</a> |
17 |
| - <dd><a href="http://wiki.whatwg.org/wiki/IRC">IRC: #whatwg on Freenode</a> |
18 |
| - |
19 |
| - <dt>Version History: |
20 |
| - <dd><a href="https://github.com/SimonSapin/data-urls/commits">https://github.com/SimonSapin/data-urls/commits</a> |
21 |
| - |
22 |
| - <dt>Editor: |
23 |
| - <dd><a href="http://exyr.org/">Simon Sapin</a> |
24 |
| - </dl> |
25 |
| - |
26 |
| - <p class="copyright"> |
27 |
| - <a href="http://creativecommons.org/publicdomain/zero/1.0/" rel="license"> |
28 |
| - <img alt="CC0" src="http://i.creativecommons.org/p/zero/1.0/80x15.png"></a> |
29 |
| - To the extent possible under law, the editors have waived all copyright and |
30 |
| - related or neighboring rights to this work. |
31 |
| - |
32 |
| -</div> |
33 |
| - |
34 |
| - |
35 |
| -<h2 id="introduction"><span class="secno">1 </span>Introduction</h2> |
36 | 15 |
|
37 | 16 | <p>
|
38 | 17 | The <code>data</code> URL scheme is defined by
|
39 | 18 | <a href="http://tools.ietf.org/html/rfc2397">RFC 2397</a>,
|
40 | 19 | which unfortunately is vague regarding many details of the syntax.
|
41 |
| - This document describes a more precise parsing algorithm for |
42 |
| - <code>data:</code> URLs. |
| 20 | + This document lists some of the details that should be specified in a future, |
| 21 | + more precise specification. |
43 | 22 |
|
44 | 23 | <p>
|
45 | 24 | See also
|
46 | 25 | <a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=19494">Bug 19494</a>
|
47 | 26 | on the W3C Bugzilla
|
48 | 27 | and other stuff linked from there.
|
49 | 28 |
|
| 29 | +<ul> |
| 30 | + <li> |
| 31 | + If the URL has a <a href="http://url.spec.whatwg.org/#concept-url-query">query</a>, |
| 32 | + the <code>?</code> separator and the query string should be part of the input |
| 33 | + to the <code>data</code> URL parsing algorithm. |
50 | 34 |
|
51 |
| -<h2 id="“fetching”-a-data:-url"><span class="secno">2 </span>“Fetching” a <code>data:</code> URL</h2> |
52 |
| - |
53 |
| -<p> |
54 |
| - This algorithm returns either a failure or two byte strings: |
55 |
| - a MIME type with parameters |
56 |
| - (as it would appear in a <var>Content-Type</var> HTTP header) |
57 |
| - and the decoded data. |
| 35 | + <li> |
| 36 | + However, if the URL has a <a href="http://url.spec.whatwg.org/#concept-url-fragment">fragment</a>, |
| 37 | + the <code>#</code> separator and the fragment identifier string should |
| 38 | + <strong>not</strong> be part of the input. |
| 39 | + Instead, the fragment identifier has the same meaning and behavior |
| 40 | + as it would with, e.g., an <code>http</code> URL. |
58 | 41 |
|
59 |
| -<p> |
60 |
| - To <b>obtain a resource</b> from a |
61 |
| - <a href="http://url.spec.whatwg.org/#concept-parsed-url">parsed URL</a> |
62 |
| - with the "<code>data</code>" scheme, |
63 |
| - run these steps: |
| 42 | + <li> |
| 43 | + Although it is often not necessary, |
| 44 | + <a href="https://url.spec.whatwg.org/#percent-encoded-bytes">percent-encoding</a> |
| 45 | + still applies to base64-encoded <code>data</code> URLs. |
64 | 46 |
|
65 |
| -<ol> |
66 | 47 | <li>
|
67 |
| - Let <i>input</i> be the URL’s |
68 |
| - <a href="http://url.spec.whatwg.org/#concept-url-scheme-data"> |
69 |
| - scheme data</a>. |
| 48 | + The first U+002C comma of the input separates the MIME type from the data. |
| 49 | + Does this still apply if that comma is inside a MIME quoted string for a parameter value? |
| 50 | + Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code> |
| 51 | + |
70 | 52 | <li>
|
71 |
| - If the URL’s |
72 |
| - <a href="http://url.spec.whatwg.org/#concept-url-query">query</a> |
73 |
| - is not null, append "<code>?</code>" and the query to <i>input</i>. |
| 53 | + What about a percent-encoded comma? |
| 54 | + Example: <code>data:text/plain;foo=bar%2Cbaz;charset=utf8,body</code> |
| 55 | + |
74 | 56 | <li>
|
75 |
| - If <i>input</i> does not contain a U+002C COMMA code point, |
76 |
| - return a failure and abort these steps. |
77 |
| - <p class="note"> |
78 |
| - The comma can come either from the scheme data or the query. |
| 57 | + <p>How strictly should the parser look for <code>;base64</code>? |
| 58 | + Examples: |
| 59 | + <pre>data:text/plain;base64,Rm9vCg== |
| 60 | +data:text/plain; base64,Rm9vCg== |
| 61 | +data:text/plain;base64 ,Rm9vCg== |
| 62 | +data:text/plain;base 64,Rm9vCg== |
| 63 | +data:text/plain;Base64,Rm9vCg== |
| 64 | +data:text/plain;%62ase64,Rm9vCg== |
| 65 | +data:text/plain%3Bbase64,Rm9vCg== |
| 66 | +</pre> |
| 67 | + <p>When RFC 2397 says: |
| 68 | + <blockquote> |
| 69 | + The ";base64" extension is distinguishable from a content-type |
| 70 | + parameter by the fact that it doesn't have a following "=" sign. |
| 71 | + </blockquote> |
| 72 | + <p>Does this mean that other MIME parsing rules apply? |
| 73 | + |
79 | 74 | <li>
|
80 |
| - Split <i>input</i> at the first comma. |
81 |
| - Let <i>mime_type</i> and <i>body</i> be the parts before and after the comma, |
82 |
| - respectively. |
83 |
| - <p class="XXX"> |
84 |
| - What if the comma is an a MIME quoted string for a parameter value? |
85 |
| - Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code> |
| 75 | + How should percent-encoding interact with MIME type parsing? |
| 76 | + Examples: |
| 77 | + <pre>data:text/plain;charset=utf8,%F0%9F%92%A9 |
| 78 | +data:text/plain%3Bcharset=utf8,%F0%9F%92%A9 |
| 79 | +data:text/plain;charset%3Dutf8,%F0%9F%92%A9 |
| 80 | +data:text/plain;charset="utf8%22,%F0%9F%92%A9 |
| 81 | +data:text/plain;charset=utf8,%F0%9F%92%A9 |
| 82 | +</pre> |
| 83 | + |
86 | 84 | <li>
|
87 |
| - Let <i>data</i> be the result of running |
88 |
| - <a href="http://url.spec.whatwg.org/#percent-decode">percent decode</a> |
89 |
| - on <i>body</i>. |
| 85 | + Although RFC 2397 doesn’t bother with a normative reference, |
| 86 | + base64 in IETF-land is defined by <a href="https://tools.ietf.org/html/rfc4648">RFC 4648</a>, |
| 87 | + which defines both <em>The Base 64 Alphabet</em> |
| 88 | + and <em>The "URL and Filename safe" Base 64 Alphabet</em>. |
| 89 | + Which of them should be used? |
| 90 | + The former looks like the one to be used by default, but the latter sounds kinda relevant. |
| 91 | + Or should both of them be accepted? |
| 92 | + |
90 | 93 | <li>
|
91 |
| - If <i>mime_type</i> ends with "<code>;base64</code>" then: |
92 |
| - <p class="XXX"> |
93 |
| - Match how strictly? Case sensitive or not? |
94 |
| - Allow whitespace? Percent-encoding? |
95 |
| - <ol> |
96 |
| - <li> |
97 |
| - Remove the matched substring from <i>mime_type</i> |
98 |
| - <li> |
99 |
| - Set <i>data</i> to the result of decoding <i>data</i> with the |
100 |
| - <a href="https://tools.ietf.org/html/rfc4648#section-4">Base 64 |
101 |
| - Encoding</a>. |
102 |
| - <p class="XXX"> |
103 |
| - Return a failure on "invalid" base64? |
104 |
| - What is invalid? |
105 |
| - Also accept the <i>URL and Filename Safe Alphabet</i>? |
106 |
| - Mixed alphabets in the same body? |
107 |
| - Ignore which non-alphabet bytes? |
108 |
| - Missing/too little/too much padding? |
109 |
| - </ol> |
| 94 | + What should happen to non-alphabet characters in base64 data? |
| 95 | + Options include ignoring them or making parsing fail (returning the equivalent of a network error). |
| 96 | + Should this differ for whitespace and other non-alphabet characters? |
| 97 | + |
110 | 98 | <li>
|
111 |
| - Return <i>mime_type</i> and <i>data</i>. |
112 |
| -</ol> |
113 |
| - |
114 |
| -<p class="XXX"> |
115 |
| - <b>TODO:</b> The algorithm is missing this part of RFC2397: |
116 |
| - |
117 |
| - <q>If <mediatype> is omitted, |
118 |
| - it defaults to text/plain;charset=US-ASCII. |
119 |
| - As a shorthand, "text/plain" can be omitted |
120 |
| - but the charset parameter supplied.</q> |
121 |
| - |
122 |
| -<p class="note"> |
123 |
| - This definition does not impose any length limit on data: URLs. |
124 |
| - |
125 |
| -<p class="note"> |
126 |
| - When doing <a href="http://url.spec.whatwg.org/#concept-url-parser"> |
127 |
| - URL parsing</a> followed by this algorithm, |
128 |
| - implementation are allowed to skip some intermediate steps |
129 |
| - in order to process large URLs efficiently, |
130 |
| - as long as the "black box" behavior the same. |
| 99 | + What should happen if base64 data has too little padding (including none) or too much? |
| 100 | + |
| 101 | +</ul> |
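To make a few of the open questions in the list above concrete, here is a minimal Python sketch of one possible reading, not a proposed algorithm. It assumes the input is split at the first literal comma, that the fragment is dropped and the query kept, and it contrasts a strict and a lenient test for `;base64`. The names `split_data_url`, `is_base64_strict`, `is_base64_lenient`, and `decode_body` are hypothetical and do not come from RFC 2397 or any existing specification.

```python
import base64
import binascii
from urllib.parse import unquote_to_bytes


def split_data_url(url):
    """Split a data: URL into (header, body) at the first literal comma.

    Assumption: the fragment is not part of the input, the query is.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data: URL")
    rest = url[len("data:"):]
    rest = rest.split("#", 1)[0]  # drop the fragment, keep any "?query"
    header, sep, body = rest.partition(",")
    if not sep:
        raise ValueError("no comma in data: URL")
    return header, body


def is_base64_strict(header):
    """Strict reading: the header ends with exactly ';base64'."""
    return header.endswith(";base64")


def is_base64_lenient(header):
    """Lenient reading: percent-decode first, then ignore case and
    whitespace around the last ';'-separated token."""
    decoded = unquote_to_bytes(header).decode("latin-1")
    parts = decoded.rsplit(";", 1)
    return len(parts) == 2 and parts[1].strip().lower() == "base64"


def decode_body(body, is_base64, fix_padding=False):
    """Percent-decode the body, then optionally base64-decode it.

    Uses the standard RFC 4648 alphabet only; non-alphabet bytes fail.
    With fix_padding=True, missing or surplus '=' is normalized first.
    """
    data = unquote_to_bytes(body)
    if not is_base64:
        return data
    if fix_padding:
        data = data.rstrip(b"=")
        data += b"=" * (-len(data) % 4)
    return base64.b64decode(data, validate=True)
```

Under these assumptions, `split_data_url('data:text/plain;foo="bar,baz";charset=utf8,body')` cuts inside the quoted string, which is exactly the ambiguity raised above. Of the `;base64` examples, the strict test accepts only the first, while the lenient test accepts every variant except `;base 64`. A different set of choices would be equally defensible; the sketch only shows that the answers change observable behavior.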