@@ -260,12 +260,11 @@ The above TOML maps to the following JSON.
## String

There are four ways to express strings: basic, multi-line basic, literal, and
- multi-line literal. All strings must contain only Unicode characters.
+ multi-line literal. All strings must be encoded as UTF-8.

- **Basic strings** are surrounded by quotation marks (`"`). Any Unicode character
- may be used except those that must be escaped: quotation mark, backslash, and
- the control characters other than tab (U+0000 to U+0008, U+000A to U+001F,
- U+007F).
+ **Basic strings** are surrounded by quotation marks (`"`). Any codepoint may be
+ used except those that must be escaped: quotation mark, backslash, and the
+ control characters other than tab (U+0000 to U+0008, U+000A to U+001F, U+007F).

```toml
str = "I'm a string. \"You can quote me\". Name\tJos\xE9\nLocation\tSF."
@@ -282,19 +281,18 @@ For convenience, some popular characters have a compact escape sequence.
\e - escape (U+001B)
\" - quote (U+0022)
\\ - backslash (U+005C)
- \xHH - unicode (U+00HH)
- \uHHHH - unicode (U+HHHH)
- \UHHHHHHHH - unicode (U+HHHHHHHH)
+ \xHH - codepoint (U+00HH)
+ \uHHHH - codepoint (U+HHHH)
+ \UHHHHHHHH - codepoint (U+HHHHHHHH)
```

- Any Unicode character may be escaped with the `\xHH`, `\uHHHH`, or `\UHHHHHHHH`
+ Any codepoint may be escaped with the `\xHH`, `\uHHHH`, or `\UHHHHHHHH`
forms. The escape codes must be Unicode
[scalar values](https://unicode.org/glossary/#unicode_scalar_value).

- Keep in mind that all TOML strings are sequences of Unicode characters, _not_
- byte sequences. For binary data, avoid using these escape codes. Instead,
- external binary-to-text encoding strategies, like hexadecimal sequences or
- [Base64](https://www.base64decode.org/), are recommended for converting between
+ All TOML strings are UTF-8 encoded, _not_ byte sequences. For binary data, avoid
+ using these escape codes. Instead, external binary-to-text encoding strategies,
+ like hexadecimal sequences or base64, are recommended for converting between
bytes and strings.

All other escape sequences not listed above are reserved; if they are used, TOML
@@ -307,6 +305,11 @@ like to break up a very long string into multiple lines. TOML makes this easy.
side and allow newlines. A newline immediately following the opening delimiter
will be trimmed. All other whitespace and newline characters remain intact.

+ Any codepoint may be used except those that must be escaped: backslash and the
+ control characters other than tab, line feed, and carriage return (U+0000 to
+ U+0008, U+000B, U+000C, U+000E to U+001F, U+007F). Carriage returns (U+000D) are
+ only allowed as part of a newline sequence.
+
```toml
str1 = """
Roses are red
@@ -349,11 +352,6 @@ str3 = """\
"""
```

- Any Unicode character may be used except those that must be escaped: backslash
- and the control characters other than tab, line feed, and carriage return
- (U+0000 to U+0008, U+000B, U+000C, U+000E to U+001F, U+007F). Carriage returns
- (U+000D) are only allowed as part of a newline sequence.
-
You can write a quotation mark, or two adjacent quotation marks, anywhere inside
a multi-line basic string. They can also be written just inside the delimiters.

@@ -371,8 +369,10 @@ If you're a frequent specifier of Windows paths or regular expressions, then
having to escape backslashes quickly becomes tedious and error-prone. To help,
TOML supports literal strings which do not allow escaping at all.

- **Literal strings** are surrounded by single quotes. Like basic strings, they
- must appear on a single line:
+ **Literal strings** are surrounded by single quotes and don't support `\`
+ escapes. Any codepoint may be used except for control characters other than tab.
+
+ Like basic strings, they must appear on a single line:

```toml
# What you see is what you get.
@@ -383,11 +383,13 @@ regex = '<\i\c*\s*>'
```

Since there is no escaping, there is no way to write a single quote inside a
- literal string enclosed by single quotes. Luckily, TOML supports a multi-line
- version of literal strings that solves this problem.
+ literal string enclosed by single quotes. TOML supports a multi-line version of
+ literal strings that solves this problem.

**Multi-line literal strings** are surrounded by three single quotes on each
- side and allow newlines. Like literal strings, there is no escaping whatsoever.
+ side and allow newlines. Like literal strings, there are no `\` escapes. Any
+ codepoint may be used except for control characters other than tab.
+
A newline immediately following the opening delimiter will be trimmed. TOML
parsers must normalize newlines in the same manner as multi-line basic strings.

@@ -417,8 +419,6 @@ apos15 = "Here are fifteen apostrophes: '''''''''''''''"
str = ''''That,' she said, 'is still pointless.''''
```

- Control characters other than tab are not permitted in a literal string.
-
## Integer

Integers are whole numbers. Positive numbers may be prefixed with a plus sign.
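The per-string-type codepoint rules that this change spells out can be summarized in a small checker. The following is a minimal Python sketch, not part of the change itself: the function names and the `kind` labels are hypothetical, and it only answers whether a single character may appear unescaped. It does not check that a carriage return is part of a newline sequence, nor the limits on runs of adjacent quotes in multi-line strings.

```python
def is_control(cp: int) -> bool:
    """True for the control characters named in the text: U+0000 to U+001F and U+007F."""
    return cp <= 0x1F or cp == 0x7F


def allowed_raw(ch: str, kind: str) -> bool:
    """Return True if the single character `ch` may appear unescaped.

    `kind` is one of "basic", "ml-basic", "literal", "ml-literal"
    (labels invented for this sketch, not taken from the spec).
    """
    cp = ord(ch)
    if kind == "basic":
        # Quotation mark, backslash, and control characters other than tab
        # must be escaped.
        return ch not in '"\\' and (ch == "\t" or not is_control(cp))
    if kind == "ml-basic":
        # Backslash and control characters other than tab, LF, and CR
        # must be escaped. CR-within-newline is not verified here.
        return ch != "\\" and (ch in "\t\n\r" or not is_control(cp))
    if kind == "literal":
        # No escapes exist, so a single quote cannot be written at all;
        # control characters other than tab are not permitted.
        return ch != "'" and (ch == "\t" or not is_control(cp))
    if kind == "ml-literal":
        # Newlines are allowed; other control characters except tab are not.
        return ch in "\t\n\r" or not is_control(cp)
    raise ValueError(f"unknown string kind: {kind}")
```

For example, `allowed_raw("\n", "basic")` is false because a basic string must stay on one line, while `allowed_raw("\n", "ml-basic")` is true.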