Commit 82fe8eb

Stop pretending this is a spec, make it a collection of issues instead.
1 parent d78fd5e commit 82fe8eb

2 files changed

+144
-182
lines changed

index.html

Lines changed: 74 additions & 103 deletions
@@ -1,130 +1,101 @@
11
<!DOCTYPE html><html lang="en"><meta charset="utf-8">
2-
<title>The "data" URL scheme</title>
3-
<link href="http://www.whatwg.org/style/specification" rel="stylesheet">
2+
<title>Ambiguities in the "data" URL scheme</title>
3+
<link href="https://www.whatwg.org/style/specification" rel="stylesheet">
44

5-
<div class="head">
6-
<h1>The <code>data</code> URL scheme</h1>
5+
<h1>Ambiguities in the <code>data</code> URL scheme</h1>
76

8-
<dl>
9-
<dt>Latest version:
10-
<dd><a href="http://simonsapin.github.com/data-urls/">http://simonsapin.github.com/data-urls/</a>
7+
<p>Last updated on 11 November 2014.
118

12-
<dt>This version:
13-
<dd>Last updated on 13 March 2013.
9+
<dl>
10+
<dt>Feedback:</dt>
11+
<dd><a href="https://github.com/SimonSapin/data-urls/issues">File an issue</a>
12+
<dd><a href="https://wiki.whatwg.org/wiki/IRC">IRC: #whatwg on Freenode</a>
13+
</dl>
1414

15-
<dt>Participate:</dt>
16-
<dd><a href="https://github.com/SimonSapin/data-urls/issues">File a bug</a>
17-
<dd><a href="http://wiki.whatwg.org/wiki/IRC">IRC: #whatwg on Freenode</a>
18-
19-
<dt>Version History:
20-
<dd><a href="https://github.com/SimonSapin/data-urls/commits">https://github.com/SimonSapin/data-urls/commits</a>
21-
22-
<dt>Editor:
23-
<dd><a href="http://exyr.org/">Simon Sapin</a>
24-
</dl>
25-
26-
<p class="copyright">
27-
<a href="http://creativecommons.org/publicdomain/zero/1.0/" rel="license">
28-
<img alt="CC0" src="http://i.creativecommons.org/p/zero/1.0/80x15.png"></a>
29-
To the extent possible under law, the editors have waived all copyright and
30-
related or neighboring rights to this work.
31-
32-
</div>
33-
34-
35-
<h2 id="introduction"><span class="secno">1 </span>Introduction</h2>
3615

3716
<p>
3817
The <code>data</code> URL scheme is defined by
3918
<a href="http://tools.ietf.org/html/rfc2397">RFC 2397</a>,
4019
which unfortunately is vague regarding many details of the syntax.
41-
This document describes a more precise parsing algorithm for
42-
<code>data:</code> URLs.
20+
This document lists some of the details that should be specified in a future,
21+
more precise specification.
4322

4423
<p>
4524
See also
4625
<a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=19494">Bug 19494</a>
4726
on the W3C Bugzilla
4827
and other stuff linked from there.
4928

29+
<ul>
30+
<li>
31+
If the URL has a <a href="http://url.spec.whatwg.org/#concept-url-query">query</a>,
32+
the <code>?</code> separator and the query string should be part of the input
33+
to the <code>data</code> URL parsing algorithm.
5034

51-
<h2 id="“fetching”-a-data:-url"><span class="secno">2 </span>“Fetching” a <code>data:</code> URL</h2>
52-
53-
<p>
54-
This algorithm returns either a failure or two byte strings:
55-
a MIME type with parameters
56-
(as it would appear in a <var>Content-Type</var> HTTP header)
57-
and the decoded data.
35+
<li>
36+
However if the URL has a <a href="http://url.spec.whatwg.org/#concept-url-fragment">fragment</a>,
37+
the <code>#</code> separator and the fragment identifier string should
38+
<strong>not</strong> be part of the input.
39+
Instead, the fragment identifier has the meaning and behavior
40+
as it would e.g. with an <code>http</code> URL.
5841

59-
<p>
60-
To <b>obtain a resource</b> from a
61-
<a href="http://url.spec.whatwg.org/#concept-parsed-url">parsed URL</a>
62-
with the "<code>data</code>" scheme,
63-
run these steps:
42+
<li>
43+
Although it is often not necessary,
44+
<a href="https://url.spec.whatwg.org/#percent-encoded-bytes">percent-encoding</a>
45+
still applies to base64-encoded <code>data</code> URLs.
6446

65-
<ol>
6647
<li>
67-
Let <i>input</i> be the URL’s
68-
<a href="http://url.spec.whatwg.org/#concept-url-scheme-data">
69-
scheme data</a>.
48+
The first U+002C comma of the input separates the MIME type from the data.
49+
Does this still apply if that comma is inside a MIME quoted string for a parameter value?
50+
Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code>
51+
7052
<li>
71-
If the URL’s
72-
<a href="http://url.spec.whatwg.org/#concept-url-query">query</a>
73-
is not null, append "<code>?</code>" and the query to <i>input</i>.
53+
What about a percent-encoded comma?
54+
Example: <code>data:text/plain;foo=bar%2Cbaz;charset=utf8,body</code>
55+
7456
<li>
75-
If <i>input</i> does not contain a U+002C COMMA code point,
76-
return a failure and abort these steps.
77-
<p class="note">
78-
The comma can come either from the scheme data or the query.
57+
<p>How strictly should the parser look for <code>;base64</code>?
58+
Examples:
59+
<pre>data:text/plain;base64,Rm9vCg==
60+
data:text/plain; base64,Rm9vCg==
61+
data:text/plain;base64 ,Rm9vCg==
62+
data:text/plain;base 64,Rm9vCg==
63+
data:text/plain;Base64,Rm9vCg==
64+
data:text/plain;%62ase64,Rm9vCg==
65+
data:text/plain%3Bbase64,Rm9vCg==
66+
</pre>
67+
<p>When RFC 2397 says:
68+
<blockquote>
69+
The ";base64" extension is distinguishable from a content-type
70+
parameter by the fact that it doesn't have a following "=" sign.
71+
</blockquote>
72+
<p>Does this mean that other MIME parsing rules apply?
73+
7974
<li>
80-
Split <i>input</i> at the first comma.
81-
Let <i>mime_type</i> and <i>body</i> be the parts before and after the comma,
82-
respectively.
83-
<p class="XXX">
84-
What if the comma is an a MIME quoted string for a parameter value?
85-
Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code>
75+
How should percent-encoding interact with MIME type parsing?
76+
Examples:
77+
<pre>data:text/plain;charset=utf8,%F0%9F%92%A9
78+
data:text/plain%3Bcharset=utf8,%F0%9F%92%A9
79+
data:text/plain;charset%3Dutf8,%F0%9F%92%A9
80+
data:text/plain;charset="utf8%22,%F0%9F%92%A9
81+
data:text/plain;charset=utf8,%F0%9F%92%A9
82+
</pre>
83+
8684
<li>
87-
Let <i>data</i> be the result of running
88-
<a href="http://url.spec.whatwg.org/#percent-decode">percent decode</a>
89-
on <i>body</i>.
85+
Although RFC 2397 doesn’t bother with a normative reference,
86+
base64 in IETF-land is defined by <a href="https://tools.ietf.org/html/rfc4648">RFC 4648</a>,
87+
which defines both <em>The Base 64 Alphabet</em>
88+
and <em>The "URL and Filename safe" Base 64 Alphabet</em>.
89+
Which of them should be used?
90+
The former looks like the one to be used by default, but the latter sounds kinda relevant.
91+
Or should both of them be accepted?
92+
9093
<li>
91-
If <i>mime_type</i> ends with "<code>;base64</code>" then:
92-
<p class="XXX">
93-
Match how strictly? Case sensitive or not?
94-
Allow whitespace? Percent-encoding?
95-
<ol>
96-
<li>
97-
Remove the matched substring from <i>mime_type</i>
98-
<li>
99-
Set <i>data</i> to the result of decoding <i>data</i> with the
100-
<a href="https://tools.ietf.org/html/rfc4648#section-4">Base 64
101-
Encoding</a>.
102-
<p class="XXX">
103-
Return a failure on "invalid" base64?
104-
What is invalid?
105-
Also accept the <i>URL and Filename Safe Alphabet</i>?
106-
Mixed alphabets in the same body?
107-
Ignore which non-alphabet bytes?
108-
Missing/too little/too much padding?
109-
</ol>
94+
What should happen to non-alphabet characters in base64 data?
95+
Options include ignoring them, or making parsing fail (return the equivalent of a network error.)
96+
Should this differ for whitespace and other non-alphabet characters?
97+
11098
<li>
111-
Return <i>mime_type</i> and <i>data</i>.
112-
</ol>
113-
114-
<p class="XXX">
115-
<b>TODO:</b> The algorithm is missing this part of RFC2397:
116-
117-
<q>If &lt;mediatype&gt; is omitted,
118-
it defaults to text/plain;charset=US-ASCII.
119-
As a shorthand, "text/plain" can be omitted
120-
but the charset parameter supplied.</q>
121-
122-
<p class="note">
123-
This definition does not impose any length limit on data: URLs.
124-
125-
<p class="note">
126-
When doing <a href="http://url.spec.whatwg.org/#concept-url-parser">
127-
URL parsing</a> followed by this algorithm,
128-
implementation are allowed to skip some intermediate steps
129-
in order to process large URLs efficiently,
130-
as long as the "black box" behavior the same.
99+
What should happen if base64 data has too little padding (including none) or too much?
100+
101+
</ul>
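
The comma questions raised in the list above can be made concrete with a small Python sketch. It is not part of the commit and does not claim to be the right parsing rule: the function names `split_first_comma` and `split_quote_aware` are hypothetical, and the quote handling is only an assumption about what "MIME quoted string" could mean here. It shows that the two readings of "the first comma" give different results for the examples in the list, and that the order of percent-decoding and splitting matters too.

```python
from urllib.parse import unquote


def split_first_comma(scheme_data: str) -> tuple[str, str]:
    """Split at the first U+002C comma, ignoring any MIME quoting."""
    head, sep, body = scheme_data.partition(",")
    if not sep:
        raise ValueError("no comma in data URL")
    return head, body


def split_quote_aware(scheme_data: str) -> tuple[str, str]:
    """Split at the first comma outside a double-quoted string (assumed rule)."""
    in_quotes = False
    escaped = False
    for i, ch in enumerate(scheme_data):
        if escaped:
            escaped = False          # character after a backslash is literal
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            return scheme_data[:i], scheme_data[i + 1:]
    raise ValueError("no unquoted comma in data URL")


quoted = 'text/plain;foo="bar,baz";charset=utf8,body'
encoded = "text/plain;foo=bar%2Cbaz;charset=utf8,body"

print(split_first_comma(quoted))            # splits inside the quoted value
print(split_quote_aware(quoted))            # splits before "body"
print(split_first_comma(encoded))           # split first, decode later
print(split_first_comma(unquote(encoded)))  # decode first, then split
```

With the quoted example, the naive split cuts the `foo` parameter in half, while the quote-aware split keeps it intact. With the percent-encoded example, decoding before splitting turns `%2C` into a separator, whereas splitting before decoding does not.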

index.src.html

Lines changed: 70 additions & 79 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,10 @@
11
<!doctype html>
22
<html lang=en>
33
<meta charset=utf-8>
4-
<title>The "data" URL scheme</title>
5-
<link rel=stylesheet href=http://www.whatwg.org/style/specification>
4+
<title>Ambiguities in the "data" URL scheme</title>
5+
<link rel=stylesheet href=https://www.whatwg.org/style/specification>
66

7-
<div class=head>
8-
<h1>The <code>data</code> URL scheme</h1>
7+
<h1>Ambiguities in the <code>data</code> URL scheme</h1>
98

109
<p>Last updated on [DATE].
1110

@@ -16,99 +15,91 @@ <h1>The <code>data</code> URL scheme</h1>
1615
</dl>
1716

1817

19-
<h2>Introduction</h2>
20-
2118
<p>
2219
The <code>data</code> URL scheme is defined by
2320
<a href="http://tools.ietf.org/html/rfc2397">RFC 2397</a>,
2421
which unfortunately is vague regarding many details of the syntax.
25-
This document describes a more precise parsing algorithm for
26-
<code>data:</code> URLs.
22+
This document lists some of the details that should be specified in a future,
23+
more precise specification.
2724

2825
<p>
2926
See also
3027
<a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=19494">Bug 19494</a>
3128
on the W3C Bugzilla
3229
and other stuff linked from there.
3330

31+
<ul>
32+
<li>
33+
If the URL has a <a href="http://url.spec.whatwg.org/#concept-url-query">query</a>,
34+
the <code>?</code> separator and the query string should be part of the input
35+
to the <code>data</code> URL parsing algorithm.
3436

35-
<h2>“Fetching” a <code>data:</code> URL</h2>
36-
37-
<p>
38-
This algorithm returns either a failure or two byte strings:
39-
a MIME type with parameters
40-
(as it would appear in a <var>Content-Type</var> HTTP header)
41-
and the decoded data.
37+
<li>
38+
However if the URL has a <a href="http://url.spec.whatwg.org/#concept-url-fragment">fragment</a>,
39+
the <code>#</code> separator and the fragment identifier string should
40+
<strong>not</strong> be part of the input.
41+
Instead, the fragment identifier has the meaning and behavior
42+
as it would e.g. with an <code>http</code> URL.
4243

43-
<p>
44-
To <b>obtain a resource</b> from a
45-
<a href="http://url.spec.whatwg.org/#concept-parsed-url">parsed URL</a>
46-
with the "<code>data</code>" scheme,
47-
run these steps:
44+
<li>
45+
Although it is often not necessary,
46+
<a href="https://url.spec.whatwg.org/#percent-encoded-bytes">percent-encoding</a>
47+
still applies to base64-encoded <code>data</code> URLs.
4848

49-
<ol>
5049
<li>
51-
Let <i>input</i> be the URL’s
52-
<a href="http://url.spec.whatwg.org/#concept-url-scheme-data">
53-
scheme data</a>.
50+
The first U+002C comma of the input separates the MIME type from the data.
51+
Does this still apply if that comma is inside a MIME quoted string for a parameter value?
52+
Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code>
53+
5454
<li>
55-
If the URL’s
56-
<a href="http://url.spec.whatwg.org/#concept-url-query">query</a>
57-
is not null, append "<code>?</code>" and the query to <i>input</i>.
55+
What about a percent-encoded comma?
56+
Example: <code>data:text/plain;foo=bar%2Cbaz;charset=utf8,body</code>
57+
5858
<li>
59-
If <i>input</i> does not contain a U+002C COMMA code point,
60-
return a failure and abort these steps.
61-
<p class=note>
62-
The comma can come either from the scheme data or the query.
59+
<p>How strictly should the parser look for <code>;base64</code>?
60+
Examples:
61+
<pre>
62+
data:text/plain;base64,Rm9vCg==
63+
data:text/plain; base64,Rm9vCg==
64+
data:text/plain;base64 ,Rm9vCg==
65+
data:text/plain;base 64,Rm9vCg==
66+
data:text/plain;Base64,Rm9vCg==
67+
data:text/plain;%62ase64,Rm9vCg==
68+
data:text/plain%3Bbase64,Rm9vCg==
69+
</pre>
70+
<p>When RFC 2397 says:
71+
<blockquote>
72+
The ";base64" extension is distinguishable from a content-type
73+
parameter by the fact that it doesn't have a following "=" sign.
74+
</blockquote>
75+
<p>Does this mean that other MIME parsing rules apply?
76+
6377
<li>
64-
Split <i>input</i> at the first comma.
65-
Let <i>mime_type</i> and <i>body</i> be the parts before and after the comma,
66-
respectively.
67-
<p class=XXX>
68-
What if the comma is an a MIME quoted string for a parameter value?
69-
Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code>
78+
How should percent-encoding interact with MIME type parsing?
79+
Examples:
80+
<pre>
81+
data:text/plain;charset=utf8,%F0%9F%92%A9
82+
data:text/plain%3Bcharset=utf8,%F0%9F%92%A9
83+
data:text/plain;charset%3Dutf8,%F0%9F%92%A9
84+
data:text/plain;charset="utf8%22,%F0%9F%92%A9
85+
data:text/plain;charset=utf8,%F0%9F%92%A9
86+
</pre>
87+
7088
<li>
71-
Let <i>data</i> be the result of running
72-
<a href="http://url.spec.whatwg.org/#percent-decode">percent decode</a>
73-
on <i>body</i>.
89+
Although RFC 2397 doesn’t bother with a normative reference,
90+
base64 in IETF-land is defined by <a href="https://tools.ietf.org/html/rfc4648">RFC 4648</a>,
91+
which defines both <em>The Base 64 Alphabet</em>
92+
and <em>The "URL and Filename safe" Base 64 Alphabet</em>.
93+
Which of them should be used?
94+
The former looks like the one to be used by default, but the latter sounds kinda relevant.
95+
Or should both of them be accepted?
96+
7497
<li>
75-
If <i>mime_type</i> ends with "<code>;base64</code>" then:
76-
<p class=XXX>
77-
Match how strictly? Case sensitive or not?
78-
Allow whitespace? Percent-encoding?
79-
<ol>
80-
<li>
81-
Remove the matched substring from <i>mime_type</i>
82-
<li>
83-
Set <i>data</i> to the result of decoding <i>data</i> with the
84-
<a href="https://tools.ietf.org/html/rfc4648#section-4">Base 64
85-
Encoding</a>.
86-
<p class=XXX>
87-
Return a failure on "invalid" base64?
88-
What is invalid?
89-
Also accept the <i>URL and Filename Safe Alphabet</i>?
90-
Mixed alphabets in the same body?
91-
Ignore which non-alphabet bytes?
92-
Missing/too little/too much padding?
93-
</ol>
98+
What should happen to non-alphabet characters in base64 data?
99+
Options include ignoring them, or making parsing fail (return the equivalent of a network error.)
100+
Should this differ for whitespace and other non-alphabet characters?
101+
94102
<li>
95-
Return <i>mime_type</i> and <i>data</i>.
96-
</ol>
97-
98-
<p class=XXX>
99-
<b>TODO:</b> The algorithm is missing this part of RFC2397:
100-
101-
<q>If &lt;mediatype&gt; is omitted,
102-
it defaults to text/plain;charset=US-ASCII.
103-
As a shorthand, "text/plain" can be omitted
104-
but the charset parameter supplied.</q>
105-
106-
<p class=note>
107-
This definition does not impose any length limit on data: URLs.
108-
109-
<p class=note>
110-
When doing <a href="http://url.spec.whatwg.org/#concept-url-parser">
111-
URL&nbsp;parsing</a> followed by this algorithm,
112-
implementation are allowed to skip some intermediate steps
113-
in order to process large URLs efficiently,
114-
as long as the "black box" behavior the same.
103+
What should happen if base64 data has too little padding (including none) or too much?
104+
105+
</ul>
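
The base64 questions at the end of the list (which alphabet, non-alphabet characters, padding) can be explored with Python's standard `base64` module. The sketch below is not part of the commit; the helper `decode_under_policies` and the policy names are hypothetical, and each policy is just one possible answer to the open questions, not a recommendation.

```python
import base64
import binascii


def decode_under_policies(payload: str) -> dict:
    """Decode one base64 payload under several candidate policies.

    Returns a dict mapping policy name to the decoded bytes,
    or None when that policy rejects the payload.
    """
    policies = {
        # Standard alphabet, non-alphabet characters silently discarded.
        "lenient": lambda s: base64.b64decode(s),
        # Standard alphabet, non-alphabet characters are an error.
        "strict": lambda s: base64.b64decode(s, validate=True),
        # "URL and Filename safe" alphabet ("-" and "_" instead of "+" and "/").
        "urlsafe": lambda s: base64.urlsafe_b64decode(s),
        # Strict, but missing padding is added back before decoding.
        "repad": lambda s: base64.b64decode(s + "=" * (-len(s) % 4), validate=True),
    }
    results = {}
    for name, decode in policies.items():
        try:
            results[name] = decode(payload)
        except binascii.Error:
            results[name] = None
    return results


for payload in ["Rm9vCg==", "Rm9v Cg==", "Rm9vCg", "-_8="]:
    print(repr(payload), decode_under_policies(payload))
```

The four payloads cover the well-formed example from the list, embedded whitespace, missing padding, and the URL-safe alphabet. Whether a given `None` should become a failure equivalent to a network error is exactly what the list leaves open.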
