Commit 36ef3c8

Define data: URL processing
Unfortunately RFC 2397 has some ambiguities and implementations never really followed it in detail. Tests: web-platform-tests/wpt#6890. Fixes #234.
1 parent 14858d3 commit 36ef3c8

1 file changed, with 93 additions and 24 deletions

fetch.bs

Lines changed: 93 additions & 24 deletions
@@ -61,11 +61,6 @@ url: https://tools.ietf.org/html/rfc7234#section-1.2.1;text:delta-seconds;type:d
 "publisher": "IETF",
 "title": "HTTP Client Hints"
 },
-"DATAURL": {
-"authors": ["Simon Sapin"],
-"href": "https://simonsapin.github.io/data-urls/",
-"title": "The data URL scheme"
-},
 "HTTPVERBSEC1": {
 "publisher": "US-CERT",
 "href": "https://www.kb.cert.org/vuls/id/867593",
@@ -151,13 +146,14 @@ of abstraction.
 
 <p>This specification depends on the Infra Standard. [[!INFRA]]
 
-<p>This specification uses terminology from the ABNF, Encoding, HTML, HTTP, IDL, Streams, and URL
-Standards.
+<p>This specification uses terminology from the ABNF, Encoding, HTML, HTTP, IDL, MIME Sniffing,
+Streams, and URL Standards.
 [[!ABNF]]
 [[!ENCODING]]
 [[!HTML]]
 [[!HTTP]]
 [[!WEBIDL]]
+[[!MIMESNIFF]]
 [[!STREAMS]]
 [[!URL]]
 
@@ -2983,23 +2979,21 @@ steps:
 
 <dt>"<code>data</code>"
 <dd>
-<p>If <a href=https://simonsapin.github.io/data-urls/>obtaining a resource</a> from
-<var>request</var>'s <a for=request>current url</a> does not return
-failure, then return a <a for=/>response</a> whose
-<a for=response>header list</a> consist of a single
-<a for=/>header</a> whose <a for=header>name</a> is
-`<code>Content-Type</code>` and <a for=header>value</a> is the
-MIME type and parameters returned from
-<a href=https://simonsapin.github.io/data-urls/>obtaining a resource</a>,
-<a for=response>body</a> is the data returned from
-<a href=https://simonsapin.github.io/data-urls/>obtaining a resource</a>, and
-<a for=response>HTTPS state</a> is <var>request</var>'s
-<a for=request>client</a>'s <a for="environment settings object">HTTPS state</a>
-if <var>request</var>'s <a for=request>client</a> is non-null.
-[[!DATAURL]]
-<!-- XXX "obtaining a resource" needs a better reference -->
-
-<p>Otherwise, return a <a>network error</a>.
+<ol>
+<li><p>Let <var>dataURLStruct</var> be the result of running the
+<a><code>data:</code> URL processor</a> on <var>request</var>'s <a for=request>current url</a>.
+
+<li><p>If <var>dataURLStruct</var> is failure, then return a <a>network error</a>.
+
+<li><p>Return a <a for=/>response</a> whose <a for=response>header list</a> consist of a single
+<a for=/>header</a> whose <a for=header>name</a> is `<code>Content-Type</code>` and
+<a for=header>value</a> is <var>dataURLStruct</var>'s <a for="data: URL struct">MIME type</a>,
+<a lt="serialize a MIME type to bytes">serialized</a>, whose <a for=response>body</a> is
+<var>dataURLStruct</var>'s <a for="data: URL struct">body</a>, and whose
+<a for=response>HTTPS state</a> is <var>request</var>'s <a for=request>client</a>'s
+<a for="environment settings object">HTTPS state</a> if <var>request</var>'s
+<a for=request>client</a> is non-null.
+</ol>
 
 <dt>"<code>file</code>"
 <dt>"<code>ftp</code>"
@@ -6055,6 +6049,78 @@ if the script checks that the URL has the right hostname.
 
 
 
+<h2 id=data-urls><code>data:</code> URLs</h2>
+
+<p>For an informative description of <code>data:</code> URLs, see RFC 2397. This section replaces
+that RFC's normative processing requirements to be compatible with deployed content. [[RFC2397]]
+
+<p>A <dfn><code>data:</code> URL struct</dfn> is a <a>struct</a> that consists of a
+<dfn for="data: URL struct">MIME type</dfn> (a <a for=/>MIME type</a>) and a
+<dfn for="data: URL struct">body</dfn> (a <a>byte sequence</a>).
+
+<p>The <dfn export><code>data:</code> URL processor</dfn> takes a <a for=/>URL</a>
+<var>dataURL</var> and then runs these steps:
+
+<ol>
+<li><p>Assert: <var>dataURL</var>'s <a for=url>scheme</a> is "<code>data</code>".
+
+<li><p>Let <var>input</var> be the result of running the <a>URL serializer</a> on
+<var>dataURL</var> with the <i>exclude fragment flag</i> set.
+
+<li><p>Remove the leading "<code>data:</code>" string from <var>input</var>.
+
+<li><p>Let <var>position</var> point at the start of <var>input</var>.
+
+<li><p>Let <var>mimeType</var> be the result of <a>collecting a sequence of code points</a> that
+are not equal to U+002C (,), given <var>position</var>.
+
+<li>
+<p><a>Strip leading and trailing ASCII whitespace</a> from <var>mimeType</var>.
+
+<p class="note">This will only remove U+0020 SPACE <a>code points</a>, if any.
+
+<li><p>If <var>position</var> is past the end of <var>input</var>, then return failure.
+
+<li><p>Advance <var>position</var> by 1.
+
+<li><p>Let <var>encodedBody</var> be the remainder of <var>input</var>.
+
+<li><p>Let <var>body</var> be the <a>string percent decoding</a> of <var>encodedBody</var>.
+
+<li>
+<p>If <var>mimeType</var> ends with U+003B (;), followed by zero or more U+0020 SPACE, followed by
+an <a>ASCII case-insensitive</a> match for "<code>base64</code>", then:
+
+<ol>
+<li><p>Let <var>stringBody</var> be the <a>isomorphic decode</a> of <var>body</var>.
+
+<li><p>Set <var>body</var> to the <a>forgiving-base64 decode</a> of <var>stringBody</var>.
+
+<li><p>If <var>body</var> is failure, then return failure.
+
+<li><p>Remove the last 6 <a>code points</a> from <var>mimeType</var>.
+
+<li><p>Remove trailing U+0020 SPACE <a>code points</a> from <var>mimeType</var>, if any.
+
+<li><p>Remove the last U+003B (;) <a>code point</a> from <var>mimeType</var>.
+</ol>
+
+<li><p>If <var>mimeType</var> starts with U+003B (;), then prepend "<code>text/plain</code>"
+to <var>mimeType</var>.
+
+<li><p>Let <var>mimeTypeRecord</var> be the result of <a lt="parse a MIME type">parsing</a>
+<var>mimeType</var>.
+
+<li><p>If <var>mimeTypeRecord</var> is failure, then set <var>mimeTypeRecord</var> to
+<code>text/plain;charset=US-ASCII</code>.
+
+<li><p>Return a new <a><code>data:</code> URL struct</a> whose
+<a for="data: URL struct">MIME type</a> is <var>mimeTypeRecord</var> and
+<a for="data: URL struct">body</a> is <var>body</var>.
+</ol>
+
+
+
 <h2 id=background-reading class=no-num>Background reading</h2>
 
 <p><em>This section and its subsections are informative only.</em>
@@ -6175,6 +6241,7 @@ Brad Porter,
 Bryan Smith,
 Caitlin Potter,
 Cameron McCormack,
+Chris Rebert,
 Clement Pellerin,
 Collin Jackson,
 Daniel Robertson,
@@ -6231,6 +6298,7 @@ Jxck,
 Keith Yeung,
 Kenji Baheux,
 Lachlan Hunt,
+Larry Masinter,
 Liam Brummitt,
 Louis Ryan,
 Lucas Gonze,
@@ -6276,6 +6344,7 @@ Sharath Udupa,
 Shivakumar Jagalur Matt,
 Sigbjørn Finne,
 Simon Pieters,
+Simon Sapin,
 Srirama Chandra Sekhar Mogali,
 Steven Salat,
 Sunava Dutta,
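
As a rough illustration of the `data:` URL processor steps this commit adds, here is a minimal Python sketch. It is not part of the commit or of the Fetch Standard. The names `process_data_url` and `forgiving_base64_decode` are hypothetical, the input is assumed to be an already parsed and serialized URL string, and MIME type parsing is reduced to a syntax check on the `type/subtype` essence instead of the full MIME Sniffing algorithm, so the MIME type is returned as an unnormalized string.

```python
import base64
import re
from urllib.parse import unquote_to_bytes

ASCII_WHITESPACE = "\t\n\x0c\r "
# HTTP token code points, used only for the simplified type/subtype check.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
FALLBACK_MIME_TYPE = "text/plain;charset=US-ASCII"


def forgiving_base64_decode(data):
    """Approximate the Infra Standard's forgiving-base64 decode; None on failure."""
    data = "".join(ch for ch in data if ch not in ASCII_WHITESPACE)
    # Padding is only stripped when the length is a multiple of four.
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    if len(data) % 4 == 1 or not re.fullmatch(r"[A-Za-z0-9+/]*", data):
        return None
    # Re-add padding so the standard library decoder accepts the input.
    return base64.b64decode(data + "=" * (-len(data) % 4))


def process_data_url(url):
    """Return (mime_type, body) for a serialized data: URL, or None on failure."""
    assert url[:5].lower() == "data:"
    # Exclude the fragment, then remove the leading "data:".
    input_ = url.split("#", 1)[0][5:]
    # Everything before the first comma is the MIME type string.
    mime_type, comma, encoded_body = input_.partition(",")
    mime_type = mime_type.strip(ASCII_WHITESPACE)
    if not comma:
        return None  # No comma: the position ran past the end of the input.
    # A str argument is UTF-8 encoded before its %xx escapes are decoded.
    body = unquote_to_bytes(encoded_body)
    # ";", optional spaces, then an ASCII case-insensitive "base64" at the end.
    suffix = re.search(r"; *base64\Z", mime_type, re.IGNORECASE | re.ASCII)
    if suffix:
        # latin-1 maps each byte to one code point (isomorphic decode).
        body = forgiving_base64_decode(body.decode("latin-1"))
        if body is None:
            return None
        mime_type = mime_type[:suffix.start()]
    if mime_type.startswith(";"):
        mime_type = "text/plain" + mime_type
    # Simplification: validate only the essence, not the parameters.
    essence = mime_type.split(";", 1)[0].rstrip("\t\n\r ")
    if not re.fullmatch(TOKEN + "/" + TOKEN, essence):
        mime_type = FALLBACK_MIME_TYPE
    return mime_type, body
```

Under these assumptions, `process_data_url("data:;base64,aGVsbG8=")` yields `("text/plain;charset=US-ASCII", b"hello")`, and a URL with no comma, such as `data:text/plain`, yields `None`, which the `data` scheme branch of the fetch algorithm would turn into a network error.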
