draft-ietf-cbor-7049bis-04.txt | draft-ietf-cbor-7049bis-05.txt | |||
---|---|---|---|---|
Network Working Group C. Bormann | Network Working Group C. Bormann | |||
Internet-Draft Universitaet Bremen TZI | Internet-Draft Universitaet Bremen TZI | |||
Intended status: Standards Track P. Hoffman | Intended status: Standards Track P. Hoffman | |||
Expires: April 26, 2019 ICANN | Expires: July 19, 2019 ICANN | |||
October 23, 2018 | January 15, 2019 | |||
Concise Binary Object Representation (CBOR) | Concise Binary Object Representation (CBOR) | |||
draft-ietf-cbor-7049bis-04 | draft-ietf-cbor-7049bis-05 | |||
Abstract | Abstract | |||
The Concise Binary Object Representation (CBOR) is a data format | The Concise Binary Object Representation (CBOR) is a data format | |||
whose design goals include the possibility of extremely small code | whose design goals include the possibility of extremely small code | |||
size, fairly small message size, and extensibility without the need | size, fairly small message size, and extensibility without the need | |||
for version negotiation. These design goals make it different from | for version negotiation. These design goals make it different from | |||
earlier binary serializations such as ASN.1 and MessagePack. | earlier binary serializations such as ASN.1 and MessagePack. | |||
Contributing | Contributing | |||
skipping to change at page 1, line 47 ¶ | skipping to change at page 1, line 47 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 26, 2019. | This Internet-Draft will expire on July 19, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | |||
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 8 | 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 | |||
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | |||
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 | 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 11 | 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 12 | |||
3.2.1. Indefinite-Length Arrays and Maps . . . . . . . . . . 12 | 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 12 | |||
3.2.2. Indefinite-Length Byte Strings and Text Strings . . . 14 | 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 12 | |||
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 14 | ||||
3.3. Floating-Point Numbers and Values with No Content . . . . 15 | 3.3. Floating-Point Numbers and Values with No Content . . . . 15 | |||
3.4. Optional Tagging of Items . . . . . . . . . . . . . . . . 16 | 3.4. Optional Tagging of Items . . . . . . . . . . . . . . . . 16 | |||
3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 18 | 3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 18 | |||
3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 18 | 3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 18 | |||
3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 18 | 3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 18 | |||
3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 19 | 3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 20 | 3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 20 | |||
3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 21 | 3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 21 | |||
3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 21 | 3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 21 | |||
3.4.6.2. Expected Later Encoding for CBOR-to-JSON | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON | |||
Converters . . . . . . . . . . . . . . . . . . . 21 | Converters . . . . . . . . . . . . . . . . . . . 22 | |||
3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 22 | 3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 22 | |||
3.4.7. Self-Describe CBOR . . . . . . . . . . . . . . . . . 22 | 3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 23 | |||
4. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 23 | 4. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 23 | |||
4.1. CBOR in Streaming Applications . . . . . . . . . . . . . 24 | 4.1. CBOR in Streaming Applications . . . . . . . . . . . . . 24 | |||
4.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 24 | 4.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 24 | |||
4.3. Syntax Errors . . . . . . . . . . . . . . . . . . . . . . 25 | 4.3. Syntax Errors . . . . . . . . . . . . . . . . . . . . . . 25 | |||
4.3.1. Incomplete CBOR Data Items . . . . . . . . . . . . . 25 | 4.3.1. Incomplete CBOR Data Items . . . . . . . . . . . . . 25 | |||
4.3.2. Malformed Indefinite-Length Items . . . . . . . . . . 25 | 4.3.2. Malformed Indefinite-Length Items . . . . . . . . . . 26 | |||
4.3.3. Unknown Additional Information Values . . . . . . . . 26 | 4.3.3. Unknown Additional Information Values . . . . . . . . 26 | |||
4.4. Other Decoding Errors . . . . . . . . . . . . . . . . . . 26 | 4.4. Other Decoding Errors . . . . . . . . . . . . . . . . . . 26 | |||
4.5. Handling Unknown Simple Values and Tags . . . . . . . . . 27 | 4.5. Handling Unknown Simple Values and Tags . . . . . . . . . 27 | |||
4.6. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 27 | 4.6. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
4.7. Specifying Keys for Maps . . . . . . . . . . . . . . . . 28 | 4.7. Specifying Keys for Maps . . . . . . . . . . . . . . . . 28 | |||
4.7.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 29 | 4.7.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 29 | |||
4.8. Undefined Values . . . . . . . . . . . . . . . . . . . . 30 | 4.8. Undefined Values . . . . . . . . . . . . . . . . . . . . 30 | |||
4.9. Preferred Serialization . . . . . . . . . . . . . . . . . 30 | 4.9. Preferred Serialization . . . . . . . . . . . . . . . . . 30 | |||
4.10. Canonical CBOR . . . . . . . . . . . . . . . . . . . . . 31 | 4.10. Canonically Encoded CBOR . . . . . . . . . . . . . . . . 31 | |||
4.10.1. Length-first map key ordering . . . . . . . . . . . 33 | 4.10.1. Length-first map key ordering . . . . . . . . . . . 33 | |||
4.11. Strict Mode . . . . . . . . . . . . . . . . . . . . . . . 34 | 4.11. Strict Decoding Mode . . . . . . . . . . . . . . . . . . 34 | |||
5. Converting Data between CBOR and JSON . . . . . . . . . . . . 35 | 5. Converting Data between CBOR and JSON . . . . . . . . . . . . 35 | |||
5.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 35 | 5.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 36 | |||
5.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 37 | 5.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 37 | |||
6. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 37 | 6. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 38 | |||
6.1. Extension Points . . . . . . . . . . . . . . . . . . . . 38 | 6.1. Extension Points . . . . . . . . . . . . . . . . . . . . 38 | |||
6.2. Curating the Additional Information Space . . . . . . . . 39 | 6.2. Curating the Additional Information Space . . . . . . . . 39 | |||
7. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 39 | 7. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 39 | |||
7.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 40 | 7.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 40 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41 | |||
8.1. Simple Values Registry . . . . . . . . . . . . . . . . . 41 | 8.1. Simple Values Registry . . . . . . . . . . . . . . . . . 41 | |||
8.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 41 | 8.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 42 | |||
8.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 42 | 8.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 42 | |||
8.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 42 | 8.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 43 | |||
8.5. The +cbor Structured Syntax Suffix Registration . . . . . 43 | 8.5. The +cbor Structured Syntax Suffix Registration . . . . . 43 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 44 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 44 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 44 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 45 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 45 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 45 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 46 | 11.2. Informative References . . . . . . . . . . . . . . . . . 46 | |||
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 48 | Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 48 | |||
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 52 | Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 52 | |||
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 55 | Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 55 | |||
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 57 | Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 57 | |||
Appendix E. Comparison of Other Binary Formats to CBOR's Design | Appendix E. Comparison of Other Binary Formats to CBOR's Design | |||
Objectives . . . . . . . . . . . . . . . . . . . . . 58 | Objectives . . . . . . . . . . . . . . . . . . . . . 58 | |||
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 59 | E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 59 | |||
skipping to change at page 9, line 51 ¶ | skipping to change at page 10, line 15 ¶ | |||
The meaning of this argument depends on the major type. For example, | The meaning of this argument depends on the major type. For example, | |||
in major type 0, the argument is the value of the data item itself | in major type 0, the argument is the value of the data item itself | |||
(and in major type 1 the value of the data item is computed from the | (and in major type 1 the value of the data item is computed from the | |||
argument); in major types 2 and 3 it gives the length of the string | argument); in major types 2 and 3 it gives the length of the string | |||
data in bytes that follows; and in major types 4 and 5 it is used to | data in bytes that follows; and in major types 4 and 5 it is used to | |||
determine the number of data items enclosed. | determine the number of data items enclosed. | |||
If the encoded sequence of bytes ends before the end of a data item | If the encoded sequence of bytes ends before the end of a data item | |||
would be reached, that encoding is not well-formed. If the encoded | would be reached, that encoding is not well-formed. If the encoded | |||
sequence of bytes still has bytes remaining after the outermost | sequence of bytes still has bytes remaining after the outermost | |||
encoded item is parsed, that encoding is not a single well-formed | encoded item is decoded, that encoding is not a single well-formed | |||
CBOR item. | CBOR item. | |||
A CBOR decoder implementation can be based on a jump table with all | A CBOR decoder implementation can be based on a jump table with all | |||
256 defined values for the initial byte (Table 5). A decoder in a | 256 defined values for the initial byte (Table 5). A decoder in a | |||
constrained implementation can instead use the structure of the | constrained implementation can instead use the structure of the | |||
initial byte and following bytes for more compact code (see | initial byte and following bytes for more compact code (see | |||
Appendix C for a rough impression of how this could look). | Appendix C for a rough impression of how this could look). | |||
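The structure of the initial byte described above can be illustrated with a minimal Python sketch. It is not taken from either draft revision. The function name decode_head and its return shape are hypothetical. The rule that additional information values 24 to 27 select a 1-, 2-, 4-, or 8-byte big-endian argument is assumed from the part of Section 3 not shown in this diff.

```python
def decode_head(data, pos=0):
    """Split one initial byte (plus any argument bytes) into its parts.

    Returns (major_type, additional_info, argument, next_pos).
    Hypothetical helper for illustration only.
    """
    if pos >= len(data):
        raise ValueError("not well-formed: input ends before the initial byte")
    initial = data[pos]
    major_type = initial >> 5      # high-order 3 bits
    info = initial & 0x1F          # low-order 5 bits
    pos += 1
    if info < 24:
        # The argument is the additional information value itself.
        return major_type, info, info, pos
    if info <= 27:
        # Assumption: 24, 25, 26, 27 mean 1, 2, 4, 8 argument bytes.
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise ValueError("not well-formed: input ends inside the argument")
        argument = int.from_bytes(data[pos:pos + size], "big")
        return major_type, info, argument, pos + size
    if info == 31:
        # No argument; whether 31 is acceptable depends on the major type,
        # which this sketch leaves to the caller.
        return major_type, info, None, pos
    raise ValueError("not well-formed: additional information 28..30")
```

For the initial byte 0x83 this sketch reports major type 4 with argument 3, that is, an array enclosing three data items.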
3.1. Major Types | 3.1. Major Types | |||
skipping to change at page 12, line 5 ¶ | skipping to change at page 12, line 17 ¶ | |||
Four CBOR items (arrays, maps, byte strings, and text strings) can be | Four CBOR items (arrays, maps, byte strings, and text strings) can be | |||
encoded with an indefinite length using additional information value | encoded with an indefinite length using additional information value | |||
31. This is useful if the encoding of the item needs to begin before | 31. This is useful if the encoding of the item needs to begin before | |||
the number of items inside the array or map, or the total length of | the number of items inside the array or map, or the total length of | |||
the string, is known. (The application of this is often referred to | the string, is known. (The application of this is often referred to | |||
as "streaming" within a data item.) | as "streaming" within a data item.) | |||
Indefinite-length arrays and maps are dealt with differently than | Indefinite-length arrays and maps are dealt with differently than | |||
indefinite-length byte strings and text strings. | indefinite-length byte strings and text strings. | |||
3.2.1. Indefinite-Length Arrays and Maps | 3.2.1. The "break" Stop Code | |||
Indefinite-length arrays and maps are simply opened without | The "break" stop code is encoded with major type 7 and additional | |||
indicating the number of data items that will be included in the | information value 31 (0b111_11111). It is not itself a data item: it | |||
array or map, using the additional information value of 31. The | is just a syntactic feature to close an indefinite-length item. | |||
initial major type and additional information byte is followed by the | ||||
elements of the array or map, just as they would be in other arrays | If the "break" stop code appears anywhere where a data item is | |||
or maps. The end of the array or map is indicated by encoding a | expected, other than directly inside an indefinite-length string, | |||
"break" stop code in a place where the next data item would normally | array, or map -- for example directly inside a definite-length array | |||
have been included. The "break" is encoded with major type 7 and | or map -- the enclosing item is not well-formed. | |||
additional information value 31 (0b111_11111) but is not itself a | ||||
data item: it is just a syntactic feature to close the array or map. | 3.2.2. Indefinite-Length Arrays and Maps | |||
That is, the "break" stop code comes after the last item in the array | ||||
or map, and it cannot occur anywhere else in place of a data item. | Indefinite-length arrays and maps are represented using their major | |||
In this way, indefinite-length arrays and maps look identical to | type with the additional information value of 31, followed by an | |||
arbitrary-length sequence of items for an array or key/value pairs | ||||
for a map, followed by the "break" stop code (Section 3.2.1). In | ||||
other words, indefinite-length arrays and maps look identical to | ||||
other arrays and maps except for beginning with the additional | other arrays and maps except for beginning with the additional | |||
information value 31 and ending with the "break" stop code. | information value of 31 and ending with the "break" stop code. | |||
Arrays and maps with indefinite lengths allow any number of items | If the break stop code appears after a key in a map, in place of that | |||
(for arrays) and key/value pairs (for maps) to be given before the | key's value, the map is not well-formed. | |||
"break" stop code. There is no restriction against nesting | ||||
indefinite-length array or map items. A "break" only terminates a | There is no restriction against nesting indefinite-length array or | |||
single item, so nested indefinite-length items need exactly as many | map items. A "break" only terminates a single item, so nested | |||
"break" stop codes as there are type bytes starting an indefinite- | indefinite-length items need exactly as many "break" stop codes as | |||
length item. | there are type bytes starting an indefinite-length item. | |||
For example, assume an encoder wants to represent the abstract array | For example, assume an encoder wants to represent the abstract array | |||
[1, [2, 3], [4, 5]]. The definite-length encoding would be | [1, [2, 3], [4, 5]]. The definite-length encoding would be | |||
0x8301820203820405: | 0x8301820203820405: | |||
83 -- Array of length 3 | 83 -- Array of length 3 | |||
01 -- 1 | 01 -- 1 | |||
82 -- Array of length 2 | 82 -- Array of length 2 | |||
02 -- 2 | 02 -- 2 | |||
03 -- 3 | 03 -- 3 | |||
skipping to change at page 14, line 15 ¶ | skipping to change at page 14, line 29 ¶ | |||
0xbf6346756ef563416d7421ff | 0xbf6346756ef563416d7421ff | |||
BF -- Start indefinite-length map | BF -- Start indefinite-length map | |||
63 -- First key, UTF-8 string length 3 | 63 -- First key, UTF-8 string length 3 | |||
46756e -- "Fun" | 46756e -- "Fun" | |||
F5 -- First value, true | F5 -- First value, true | |||
63 -- Second key, UTF-8 string length 3 | 63 -- Second key, UTF-8 string length 3 | |||
416d74 -- "Amt" | 416d74 -- "Amt" | |||
21 -- Second value, -2 | 21 -- Second value, -2 | |||
FF -- "break" | FF -- "break" | |||
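The handling of the "break" stop code in arrays and maps can be sketched as follows. This is an illustrative Python decoder for a small subset of CBOR (integers, definite-length text strings, false and true, arrays, and maps), not the pseudocode of Appendix C. The names BREAK and decode_item are hypothetical, and map keys are assumed to be hashable Python values.

```python
BREAK = object()  # sentinel: the "break" stop code is not a data item


def decode_item(data, pos=0, allow_break=False):
    """Decode one item of a CBOR subset; return (value, next_pos)."""
    if pos >= len(data):
        raise ValueError("not well-formed: input ends before the item does")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if major == 7 and info == 31:
        # "break" is accepted only directly inside an indefinite-length item.
        if not allow_break:
            raise ValueError("not well-formed: unexpected 'break'")
        return BREAK, pos
    if major == 7 and info in (20, 21):
        return info == 21, pos                    # false / true
    indefinite = info == 31 and major in (4, 5)
    arg = None
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise ValueError("not well-formed: input ends inside the argument")
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    elif not indefinite:
        raise ValueError("not handled by this sketch")
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major == 3:
        end = pos + arg
        if end > len(data):
            raise ValueError("not well-formed: input ends inside the string")
        return data[pos:end].decode("utf-8"), end
    if major == 4:
        items = []
        while indefinite or len(items) < arg:
            value, pos = decode_item(data, pos, allow_break=indefinite)
            if value is BREAK:
                break
            items.append(value)
        return items, pos
    if major == 5:
        result, count = {}, 0
        while indefinite or count < arg:
            key, pos = decode_item(data, pos, allow_break=indefinite)
            if key is BREAK:
                break
            # A "break" in place of a value is rejected (allow_break=False).
            value, pos = decode_item(data, pos)
            result[key] = value
            count += 1
        return result, pos
    raise ValueError("not handled by this sketch")
```

Applied to 0xbf6346756ef563416d7421ff, the sketch yields the map {"Fun": true, "Amt": -2}. Each nested indefinite-length item consumes exactly one "break", because the recursive call that opened the item is the one that sees it.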
3.2.2. Indefinite-Length Byte Strings and Text Strings | 3.2.3. Indefinite-Length Byte Strings and Text Strings | |||
Indefinite-length byte strings and text strings are actually a | Indefinite-length strings are represented by a byte containing the | |||
concatenation of zero or more definite-length byte or text strings | major type and additional information value of 31, followed by a | |||
("chunks") that are together treated as one contiguous string. | series of byte or text strings ("chunks") that have definite lengths, | |||
Indefinite-length strings are opened with the major type and | followed by the "break" stop code (Section 3.2.1). The data item | |||
additional information value of 31, but what follows are a series of | represented by the indefinite-length string is the concatenation of | |||
byte or text strings that have definite lengths (the chunks). The | the chunks. | |||
end of the series of chunks is indicated by encoding the "break" stop | ||||
code (0b111_11111) in a place where the next chunk in the series | ||||
would occur. The contents of the chunks are concatenated together, | ||||
and the overall length of the indefinite-length string will be the | ||||
sum of the lengths of all of the chunks. In summary, an indefinite- | ||||
length string is encoded similarly to how an indefinite-length array | ||||
of its chunks would be encoded, except that the major type of the | ||||
indefinite-length string is that of a (text or byte) string and | ||||
matches the major types of its chunks. | ||||
For indefinite-length byte strings, every data item (chunk) between | If any item between the indefinite-length string indicator | |||
the indefinite-length indicator and the "break" MUST be a definite- | (0b010_11111 or 0b011_11111) and the "break" stop code is not a | |||
length byte string item; if the parser sees any item type other than | definite-length string item of the same major type, the string is not | |||
a byte string before it sees the "break", it is an error. | well-formed. | |||
If any definite-length text string inside an indefinite-length text | ||||
string is invalid, the indefinite-length text string is invalid. | ||||
Note that this implies that the bytes of a single UTF-8 character | ||||
cannot be spread between chunks: a new chunk can only be started at a | ||||
character boundary. | ||||
For example, assume the sequence: | For example, assume the sequence: | |||
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 | 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 | |||
5F -- Start indefinite-length byte string | 5F -- Start indefinite-length byte string | |||
44 -- Byte string of length 4 | 44 -- Byte string of length 4 | |||
aabbccdd -- Bytes content | aabbccdd -- Bytes content | |||
43 -- Byte string of length 3 | 43 -- Byte string of length 3 | |||
eeff99 -- Bytes content | eeff99 -- Bytes content | |||
FF -- "break" | FF -- "break" | |||
After decoding, this results in a single byte string with seven | After decoding, this results in a single byte string with seven | |||
bytes: 0xaabbccddeeff99. | bytes: 0xaabbccddeeff99. | |||
Text strings with indefinite lengths act the same as byte strings | ||||
with indefinite lengths, except that all their chunks MUST be | ||||
definite-length text strings. Note that this implies that the bytes | ||||
of a single UTF-8 character cannot be spread between chunks: a new | ||||
chunk can only be started at a character boundary. | ||||
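The chunk rules for indefinite-length strings can be sketched in Python as follows. The function name decode_indefinite_string is hypothetical, and the argument sizes for additional information values 24 to 27 are assumed as in the rest of the encoding. The sketch rejects any chunk that is not a definite-length string of the same major type, and it validates each text chunk on its own, so a UTF-8 character split across two chunks is an error.

```python
def decode_indefinite_string(data, pos=0):
    """Decode an indefinite-length byte or text string.

    Returns (value, next_pos). Hypothetical helper for illustration only.
    """
    if pos >= len(data) or data[pos] not in (0x5F, 0x7F):
        raise ValueError("expected 0b010_11111 or 0b011_11111")
    major = data[pos] >> 5           # 2 = byte string, 3 = text string
    pos += 1
    chunks = []
    while True:
        if pos >= len(data):
            raise ValueError("not well-formed: missing 'break'")
        initial = data[pos]
        pos += 1
        if initial == 0xFF:          # the "break" stop code
            break
        info = initial & 0x1F
        if initial >> 5 != major or info == 31:
            raise ValueError("not well-formed: chunk must be a "
                             "definite-length string of the same major type")
        if info < 24:
            length = info
        elif info <= 27:
            size = 1 << (info - 24)  # 1, 2, 4, or 8 length bytes
            if pos + size > len(data):
                raise ValueError("not well-formed: input ends inside a length")
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("not well-formed: additional information 28..30")
        if pos + length > len(data):
            raise ValueError("not well-formed: input ends inside a chunk")
        chunk = data[pos:pos + length]
        if major == 3:
            chunk.decode("utf-8")    # raises if this chunk alone is invalid
        chunks.append(chunk)
        pos += length
    joined = b"".join(chunks)
    return (joined.decode("utf-8") if major == 3 else joined), pos
```

For the byte sequence of the example above, 0x5f44aabbccdd43eeff99ff, the sketch returns the seven bytes 0xaabbccddeeff99.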
3.3. Floating-Point Numbers and Values with No Content | 3.3. Floating-Point Numbers and Values with No Content | |||
Major type 7 is for two types of data: floating-point numbers and | Major type 7 is for two types of data: floating-point numbers and | |||
"simple values" that do not need any content. Each value of the | "simple values" that do not need any content. Each value of the | |||
5-bit additional information in the initial byte has its own separate | 5-bit additional information in the initial byte has its own separate | |||
meaning, as defined in Table 1. Like the major types for integers, | meaning, as defined in Table 1. Like the major types for integers, | |||
items of this major type do not carry content data; all the | items of this major type do not carry content data; all the | |||
information is in the initial bytes. | information is in the initial bytes. | |||
+-------------+--------------------------------------------------+ | +------------+------------------------------------------------------+ | |||
| 5-Bit Value | Semantics | | | 5-Bit | Semantics | | |||
+-------------+--------------------------------------------------+ | | Value | | | |||
| 0..23 | Simple value (value 0..23) | | +------------+------------------------------------------------------+ | |||
| | | | | 0..23 | Simple value (value 0..23) | | |||
| 24 | Simple value (value 32..255 in following byte) | | | | | | |||
| | | | | 24 | Simple value (value 32..255 in following byte) | | |||
| 25 | IEEE 754 Half-Precision Float (16 bits follow) | | | | | | |||
| | | | | 25 | IEEE 754 Half-Precision Float (16 bits follow) | | |||
| 26 | IEEE 754 Single-Precision Float (32 bits follow) | | | | | | |||
| | | | | 26 | IEEE 754 Single-Precision Float (32 bits follow) | | |||
| 27 | IEEE 754 Double-Precision Float (64 bits follow) | | | | | | |||
| | | | | 27 | IEEE 754 Double-Precision Float (64 bits follow) | | |||
| 28-30 | (Unassigned) | | | | | | |||
| | | | | 28-30 | (Unassigned) | | |||
| 31 | "break" stop code for indefinite-length items | | | | | | |||
+-------------+--------------------------------------------------+ | | 31 | "break" stop code for indefinite-length items | | |||
| | (Section 3.2.1) | | ||||
+------------+------------------------------------------------------+ | ||||
Table 1: Values for Additional Information in Major Type 7 | Table 1: Values for Additional Information in Major Type 7 | |||
As with all other major types, the 5-bit value 24 signifies a single- | As with all other major types, the 5-bit value 24 signifies a single- | |||
byte extension: it is followed by an additional byte to represent the | byte extension: it is followed by an additional byte to represent the | |||
simple value. (To minimize confusion, only the values 32 to 255 are | simple value. (To minimize confusion, only the values 32 to 255 are | |||
used.) This maintains the structure of the initial bytes: as for the | used.) This maintains the structure of the initial bytes: as for the | |||
other major types, the length of these always depends on the | other major types, the length of these always depends on the | |||
additional information in the first byte. Table 2 lists the values | additional information in the first byte. Table 2 lists the values | |||
assigned and available for simple types. | assigned and available for simple types. | |||
skipping to change at page 16, line 25 ¶ | skipping to change at page 16, line 25 ¶ | |||
| | | | | | | | |||
| 23 | Undefined value | | | 23 | Undefined value | | |||
| | | | | | | | |||
| 24..31 | (Reserved) | | | 24..31 | (Reserved) | | |||
| | | | | | | | |||
| 32..255 | (Unassigned) | | | 32..255 | (Unassigned) | | |||
+---------+-----------------+ | +---------+-----------------+ | |||
Table 2: Simple Values | Table 2: Simple Values | |||
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | ||||
IEEE 754 binary floating-point values [IEEE.754.2008]. These | ||||
floating-point values are encoded in the additional bytes of the | ||||
appropriate size. (See Appendix D for some information about 16-bit | ||||
floating point.) | ||||
An encoder MUST NOT encode False as the two-byte sequence of 0xf814, | An encoder MUST NOT encode False as the two-byte sequence of 0xf814, | |||
MUST NOT encode True as the two-byte sequence of 0xf815, MUST NOT | MUST NOT encode True as the two-byte sequence of 0xf815, MUST NOT | |||
encode Null as the two-byte sequence of 0xf816, and MUST NOT encode | encode Null as the two-byte sequence of 0xf816, and MUST NOT encode | |||
Undefined value as the two-byte sequence of 0xf817. A decoder MUST | Undefined value as the two-byte sequence of 0xf817. A decoder MUST | |||
treat these two-byte sequences as an error. Similar prohibitions | treat these two-byte sequences as an error. Similar prohibitions | |||
apply to the unassigned simple values as well. | apply to the unassigned simple values as well. | |||
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | ||||
IEEE 754 binary floating-point values [IEEE.754.2008]. These | ||||
floating-point values are encoded in the additional bytes of the | ||||
appropriate size. (See Appendix D for some information about 16-bit | ||||
floating point.) | ||||
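The rules for major type 7 in Tables 1 and 2 can be sketched in Python as follows. The function name decode_major7 and the ("simple", n) tuple used for simple values without a natural Python counterpart are hypothetical. The sketch assumes network byte order for the floating-point bytes and relies on the "e" (half-precision) format code of Python's struct module.

```python
import struct

# Additional information value -> (struct format, number of bytes that follow)
_FLOAT_FORMATS = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}
# Simple values with a direct Python counterpart (Table 2)
_SIMPLE = {20: False, 21: True, 22: None}


def decode_major7(data, pos=0):
    """Decode one major type 7 item; return (value, next_pos)."""
    if pos >= len(data) or data[pos] >> 5 != 7:
        raise ValueError("expected a major type 7 item")
    info = data[pos] & 0x1F
    pos += 1
    if info < 24:
        return _SIMPLE.get(info, ("simple", info)), pos
    if info == 24:
        if pos >= len(data):
            raise ValueError("not well-formed: input ends inside the item")
        value = data[pos]
        if value < 32:
            # For example 0xf814..0xf817: a decoder treats these as an error.
            raise ValueError("two-byte encoding of a simple value below 32")
        return ("simple", value), pos + 1
    if info in _FLOAT_FORMATS:
        fmt, size = _FLOAT_FORMATS[info]
        if pos + size > len(data):
            raise ValueError("not well-formed: input ends inside the float")
        return struct.unpack(fmt, data[pos:pos + size])[0], pos + size
    if info == 31:
        raise ValueError("'break' stop code is not a data item")
    raise ValueError("not well-formed: additional information 28..30")
```

With this sketch, 0xf93c00 decodes to the half-precision value 1.0, while 0xf814 is rejected rather than being read as False.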
3.4. Optional Tagging of Items | 3.4. Optional Tagging of Items | |||
In CBOR, a data item can optionally be preceded by a tag to give it | In CBOR, a data item can optionally be preceded by a tag to give it | |||
additional semantics while retaining its structure. The tag is major | additional semantics while retaining its structure. The tag is major | |||
type 6, and represents an integer number as indicated by the tag's | type 6, and represents an integer number as indicated by the tag's | |||
argument (Section 3); the (sole) data item is carried as content | argument (Section 3); the (sole) data item is carried as content | |||
data. If a tag requires structured data, this structure is encoded | data. If a tag requires structured data, this structure is encoded | |||
into the nested data item. The definition of a tag usually restricts | into the nested data item. The definition of a tag usually restricts | |||
what kinds of nested data item or items are valid. | what kinds of nested data item or items are valid. | |||
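As a rough illustration of how a tag precedes its single data item, the following Python sketch separates the tag number from the encoded bytes of the enclosed item. The function name split_tag is hypothetical, and the sketch does not decode or validate the enclosed item. The argument sizes for additional information values 24 to 27 are assumed as for the other major types.

```python
def split_tag(data, pos=0):
    """Return (tag_number, pos_of_enclosed_item) for a major type 6 head."""
    if pos >= len(data) or data[pos] >> 5 != 6:
        raise ValueError("expected a tag (major type 6)")
    info = data[pos] & 0x1F
    pos += 1
    if info < 24:
        return info, pos
    if info <= 27:
        size = 1 << (info - 24)      # 1, 2, 4, or 8 argument bytes
        if pos + size > len(data):
            raise ValueError("not well-formed: input ends inside the argument")
        return int.from_bytes(data[pos:pos + size], "big"), pos + size
    # 28..30 are unassigned, and a tag has no indefinite-length form.
    raise ValueError("not well-formed: additional information 28..31 in a tag")
```

For 0xc11a514b67b0 the sketch reports tag 1, with the enclosed item (the unsigned integer 1363896240) starting at offset 1.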
skipping to change at page 17, line 33 ¶ | skipping to change at page 17, line 33 ¶ | |||
value. The content of the tagged item is the data item (the value) | value. The content of the tagged item is the data item (the value) | |||
that is being tagged. | that is being tagged. | |||
IANA maintains a registry of tag values as described in Section 8.2. | IANA maintains a registry of tag values as described in Section 8.2. | |||
Table 3 provides a list of initial values, with definitions in the | Table 3 provides a list of initial values, with definitions in the | |||
rest of this section. | rest of this section. | |||
+-----------+--------------+----------------------------------------+ | +-----------+--------------+----------------------------------------+ | |||
| Tag | Data Item | Semantics | | | Tag | Data Item | Semantics | | |||
+-----------+--------------+----------------------------------------+ | +-----------+--------------+----------------------------------------+ | |||
| 0 | UTF-8 string | Standard date/time string; see Section | | | 0 | UTF-8 string | Standard date/time string; see | | |||
| | | 3.4.2 | | | | | Section 3.4.2 | | |||
| | | | | | | | | | |||
| 1 | multiple | Epoch-based date/time; see Section | | | 1 | multiple | Epoch-based date/time; see | | |||
| | | 3.4.3 | | | | | Section 3.4.3 | | |||
| | | | | | | | | | |||
| 2 | byte string | Positive bignum; see Section 3.4.4 | | | 2 | byte string | Positive bignum; see Section 3.4.4 | | |||
| | | | | | | | | | |||
| 3 | byte string | Negative bignum; see Section 3.4.4 | | | 3 | byte string | Negative bignum; see Section 3.4.4 | | |||
| | | | | | | | | | |||
| 4 | array | Decimal fraction; see Section 3.4.5 | | | 4 | array | Decimal fraction; see Section 3.4.5 | | |||
| | | | | | | | | | |||
| 5 | array | Bigfloat; see Section 3.4.5 | | | 5 | array | Bigfloat; see Section 3.4.5 | | |||
| | | | | | | | | | |||
| 6..20 | (Unassigned) | (Unassigned) | | | 6..20 | (Unassigned) | (Unassigned) | | |||
| | | | | | | | | | |||
| 21 | multiple | Expected conversion to base64url | | | 21 | multiple | Expected conversion to base64url | | |||
| | | encoding; see Section 3.4.6.2 | | | | | encoding; see Section 3.4.6.2 | | |||
| | | | | | | | | | |||
| 22 | multiple | Expected conversion to base64 | | | 22 | multiple | Expected conversion to base64 | | |||
| | | encoding; see Section 3.4.6.2 | | | | | encoding; see Section 3.4.6.2 | | |||
| | | | | | | | | | |||
| 23 | multiple | Expected conversion to base16 | | | 23 | multiple | Expected conversion to base16 | | |||
| | | encoding; see Section 3.4.6.2 | | | | | encoding; see Section 3.4.6.2 | | |||
| | | | | | | | | | |||
| 24 | byte string | Encoded CBOR data item; see Section | | | 24 | byte string | Encoded CBOR data item; see | | |||
| | | 3.4.6.1 | | | | | Section 3.4.6.1 | | |||
| | | | | | | | | | |||
| 25..31 | (Unassigned) | (Unassigned) | | | 25..31 | (Unassigned) | (Unassigned) | | |||
| | | | | | | | | | |||
| 32 | UTF-8 string | URI; see Section 3.4.6.3 | | | 32 | UTF-8 string | URI; see Section 3.4.6.3 | | |||
| | | | | | | | | | |||
| 33 | UTF-8 string | base64url; see Section 3.4.6.3 | | | 33 | UTF-8 string | base64url; see Section 3.4.6.3 | | |||
| | | | | | | | | | |||
| 34 | UTF-8 string | base64; see Section 3.4.6.3 | | | 34 | UTF-8 string | base64; see Section 3.4.6.3 | | |||
| | | | | | | | | | |||
| 35 | UTF-8 string | Regular expression; see Section | | | 35 | UTF-8 string | Regular expression; see | | |||
| | | 3.4.6.3 | | | | | Section 3.4.6.3 | | |||
| | | | | | | | | | |||
| 36 | UTF-8 string | MIME message; see Section 3.4.6.3 | | | 36 | UTF-8 string | MIME message; see Section 3.4.6.3 | | |||
| | | | | | | | | | |||
| 37..55798 | (Unassigned) | (Unassigned) | | | 37..55798 | (Unassigned) | (Unassigned) | | |||
| | | | | | | | | | |||
| 55799 | multiple | Self-describe CBOR; see Section 3.4.7 | | | 55799 | multiple | Self-described CBOR; see Section 3.4.7 | | |||
| | | | | | | | | | |||
| 55800+ | (Unassigned) | (Unassigned) | | | 55800+ | (Unassigned) | (Unassigned) | | |||
+-----------+--------------+----------------------------------------+ | +-----------+--------------+----------------------------------------+ | |||
Table 3: Values for Tags | Table 3: Values for Tags | |||
3.4.1. Date and Time | 3.4.1. Date and Time | |||
Protocols using tag values 0 and 1 extend the generic data model | Protocols using tag values 0 and 1 extend the generic data model | |||
(Section 2) with data items representing points in time. | (Section 2) with data items representing points in time. | |||
skipping to change at page 19, line 22 ¶ | skipping to change at page 19, line 22 ¶ | |||
known as UNIX Epoch time. Note that leap seconds are handled | known as UNIX Epoch time. Note that leap seconds are handled | |||
specially by POSIX time and this results in a 1 second discontinuity | specially by POSIX time and this results in a 1 second discontinuity | |||
several times per decade.) Note that applications that require the | several times per decade.) Note that applications that require the | |||
expression of times beyond early 2106 cannot leave out support of | expression of times beyond early 2106 cannot leave out support of | |||
64-bit integers for the tagged value. | 64-bit integers for the tagged value. | |||
Negative values (major type 1 and negative floating-point numbers) | Negative values (major type 1 and negative floating-point numbers) | |||
are interpreted as determined by the application requirements as | are interpreted as determined by the application requirements as | |||
there is no universal standard for UTC count-of-seconds time before | there is no universal standard for UTC count-of-seconds time before | |||
1970-01-01T00:00Z (this is particularly true for points in time that | 1970-01-01T00:00Z (this is particularly true for points in time that | |||
precede discontinuities in national calendars). | precede discontinuities in national calendars). The same applies to | |||
non-finite values. | ||||
To indicate fractional seconds, floating point values can be used | To indicate fractional seconds, floating point values can be used | |||
within Tag 1 instead of integer values. Note that this generally | within Tag 1 instead of integer values. Note that this generally | |||
requires binary64 support, as binary16 and binary32 provide non-zero | requires binary64 support, as binary16 and binary32 provide non-zero | |||
fractions of seconds only for a short period of time around early | fractions of seconds only for a short period of time around early | |||
1970. An application that requires Tag 1 support may restrict the | 1970. An application that requires Tag 1 support may restrict the | |||
tagged value to be an integer (or a floating-point value) only. | tagged value to be an integer (or a floating-point value) only. | |||
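The precision remark above can be checked with a small sketch. It assumes Python's standard `struct` module as a stand-in for the three IEEE 754 formats; the helper name `shortest_float_format` and the sample timestamps are illustrative only and do not come from the draft.

```python
import struct

# struct format codes for IEEE 754 binary16, binary32 and binary64 (big-endian).
_FORMATS = (("binary16", ">e"), ("binary32", ">f"), ("binary64", ">d"))


def shortest_float_format(value: float) -> str:
    """Return the narrowest format that round-trips `value` exactly (hypothetical helper)."""
    for name, code in _FORMATS:
        try:
            if struct.unpack(code, struct.pack(code, value))[0] == value:
                return name
        except (OverflowError, struct.error):
            continue  # value is out of range for this width
    raise ValueError("value is not representable (for example NaN)")


# A small value with a fraction fits in binary16 ...
small = shortest_float_format(1.5)
# ... but a 2019 timestamp with half a second needs binary64.
recent = shortest_float_format(1547510400.5)
```

For timestamps of this magnitude, adjacent binary32 values are 128 seconds apart, which is why the fractional part survives only in binary64.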
3.4.4. Bignums | 3.4.4. Bignums | |||
Protocols using tag values 2 and 3 extend the generic data model | Protocols using tag values 2 and 3 extend the generic data model | |||
(Section 2) with "bignums" representing arbitrary integers. In the | (Section 2) with "bignums" representing arbitrarily sized integers. | |||
generic data model, bignum values are not equal to integers from the | In the generic data model, bignum values are not equal to integers | |||
basic data model, but specific data models can define that | from the basic data model, but specific data models can define that | |||
equivalence. | equivalence, and preferred encoding never makes use of bignums that | |||
also can be expressed as basic integers (see below). | ||||
Bignums are encoded as a byte string data item, which is interpreted | Bignums are encoded as a byte string data item, which is interpreted | |||
as an unsigned integer n in network byte order. For tag value 2, the | as an unsigned integer n in network byte order. For tag value 2, the | |||
value of the bignum is n. For tag value 3, the value of the bignum | value of the bignum is n. For tag value 3, the value of the bignum | |||
is -1 - n. Decoders that understand these tags MUST be able to | is -1 - n. The preferred encoding of the byte string is to leave out | |||
decode bignums that have leading zeroes. | any leading zeroes (note that this means the preferred encoding for | |||
n = 0 is the empty byte string, but see below). Decoders that | ||||
understand these tags MUST be able to decode bignums that do have | ||||
leading zeroes. The preferred encoding of an integer that can be | ||||
represented using major type 0 or 1 is to encode it this way instead | ||||
of as a bignum (which means that the empty string never occurs in a | ||||
bignum when using preferred encoding). Note that this means the non- | ||||
preferred choice of a bignum representation instead of a basic | ||||
integer for encoding a number is not intended to have application | ||||
semantics (just as the choice of a longer basic integer | ||||
representation than needed, such as 0x1800 for 0x00, does not). | ||||
For example, the number 18446744073709551616 (2**64) is represented | For example, the number 18446744073709551616 (2**64) is represented | |||
as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major | as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major | |||
type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 | type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 | |||
and eight bytes 0x00). In hexadecimal: | and eight bytes 0x00). In hexadecimal: | |||
C2 -- Tag 2 | C2 -- Tag 2 | |||
49 -- Byte string of length 9 | 49 -- Byte string of length 9 | |||
010000000000000000 -- Bytes content | 010000000000000000 -- Bytes content | |||
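The bignum rules above can be sketched as follows. This is a minimal illustration, not a complete encoder: the function names (`encode_head`, `encode_integer`, `decode_bignum_content`) are assumptions made for this example, and only integers are handled.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode an initial byte plus the shortest form of its argument."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_integer(value: int) -> bytes:
    """Use major type 0 or 1 when possible; fall back to tag 2 or 3 otherwise."""
    n = value if value >= 0 else -1 - value
    if n < (1 << 64):
        return encode_head(0 if value >= 0 else 1, n)
    # Preferred encoding: network byte order without leading zero bytes.
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")
    tag = 2 if value >= 0 else 3
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload


def decode_bignum_content(tag: int, payload: bytes) -> int:
    """Interpret the byte string of a tag 2 or tag 3 item; leading zeroes are tolerated."""
    n = int.from_bytes(payload, "big")  # an empty byte string yields 0
    return n if tag == 2 else -1 - n


encoded = encode_integer(2**64)  # the example above: C2 49 01 00 ... 00
```

Because `encode_integer` only produces a bignum when the value does not fit a basic integer, the empty byte string never appears in its output, which matches the preferred-encoding remark in the text.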
skipping to change at page 21, line 35 ¶ | skipping to change at page 21, line 48 ¶ | |||
3.4.6. Content Hints | 3.4.6. Content Hints | |||
The tags in this section are for content hints that might be used by | The tags in this section are for content hints that might be used by | |||
generic CBOR processors. These content hints do not extend the | generic CBOR processors. These content hints do not extend the | |||
generic data model. | generic data model. | |||
3.4.6.1. Encoded CBOR Data Item | 3.4.6.1. Encoded CBOR Data Item | |||
Sometimes it is beneficial to carry an embedded CBOR data item that | Sometimes it is beneficial to carry an embedded CBOR data item that | |||
is not meant to be decoded immediately at the time the enclosing data | is not meant to be decoded immediately at the time the enclosing data | |||
item is being parsed. Tag 24 (CBOR data item) can be used to tag the | item is being decoded. Tag 24 (CBOR data item) can be used to tag | |||
embedded byte string as a data item encoded in CBOR format. | the embedded byte string as a data item encoded in CBOR format. | |||
3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | |||
Tags 21 to 23 indicate that a byte string might require a specific | Tags 21 to 23 indicate that a byte string might require a specific | |||
encoding when interoperating with a text-based representation. These | encoding when interoperating with a text-based representation. These | |||
tags are useful when an encoder knows that the byte string data it is | tags are useful when an encoder knows that the byte string data it is | |||
writing is likely to be later converted to a particular JSON-based | writing is likely to be later converted to a particular JSON-based | |||
usage. That usage specifies that some strings are encoded as base64, | usage. That usage specifies that some strings are encoded as base64, | |||
base64url, and so on. The encoder uses byte strings instead of doing | base64url, and so on. The encoder uses byte strings instead of doing | |||
the encoding itself to reduce the message size, to reduce the code | the encoding itself to reduce the message size, to reduce the code | |||
size of the encoder, or both. The encoder does not know whether or | size of the encoder, or both. The encoder does not know whether or | |||
not the converter will be generic, and therefore wants to say what it | not the converter will be generic, and therefore wants to say what it | |||
believes is the proper way to convert binary strings to JSON. | believes is the proper way to convert binary strings to JSON. | |||
The data item tagged can be a byte string or any other data item. In | The data item tagged can be a byte string or any other data item. In | |||
the latter case, the tag applies to all of the byte string data items | the latter case, the tag applies to all of the byte string data items | |||
contained in the data item, except for those contained in a nested | contained in the data item, except for those contained in a nested | |||
data item tagged with an expected conversion. | data item tagged with an expected conversion. | |||
These three tag types suggest conversions to three of the base data | These three tag types suggest conversions to three of the base data | |||
encodings defined in [RFC4648]. For base64url encoding, padding is | encodings defined in [RFC4648]. For base64url encoding (tag 21), | |||
not used (see Section 3.2 of RFC 4648); that is, all trailing equals | padding is not used (see Section 3.2 of RFC 4648); that is, all | |||
signs ("=") are removed from the base64url-encoded string. Later | trailing equals signs ("=") are removed from the encoded string. For | |||
tags might be defined for other data encodings of RFC 4648 or for | base64 encoding (tag 22), padding is used as defined in RFC 4648. | |||
other ways to encode binary data in strings. | For both base64url and base64, padding bits are set to zero (see | |||
Section 3.5 of RFC 4648), and encoding is performed without the | ||||
inclusion of any line breaks, whitespace, or other additional | ||||
characters. Note that, for all three tags, the encoding of the empty | ||||
byte string is the empty text string. | ||||
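As a rough illustration of the three conversions described in the newer text, the sketch below uses Python's standard `base64` module. The function name `expected_text` is an assumption for this example; a real converter would also have to walk nested data items, which is omitted here.

```python
import base64


def expected_text(tag: int, data: bytes) -> str:
    """Convert a byte string as hinted by tag 21, 22 or 23 (hypothetical helper)."""
    if tag == 21:
        # base64url: padding is not used, so trailing "=" signs are removed.
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    if tag == 22:
        # base64: padding is kept as defined in RFC 4648.
        return base64.b64encode(data).decode("ascii")
    if tag == 23:
        # base16: the RFC 4648 alphabet is upper case.
        return base64.b16encode(data).decode("ascii")
    raise ValueError("not an expected-conversion tag")


sample = b"\xfb\xff"
as_base64url = expected_text(21, sample)
as_base64 = expected_text(22, sample)
as_base16 = expected_text(23, sample)
```

None of the three standard-library calls inserts line breaks or whitespace, and each maps the empty byte string to the empty text string.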
3.4.6.3. Encoded Text | 3.4.6.3. Encoded Text | |||
Some text strings hold data that have formats widely used on the | Some text strings hold data that have formats widely used on the | |||
Internet, and sometimes those formats can be validated and presented | Internet, and sometimes those formats can be validated and presented | |||
to the application in appropriate form by the decoder. There are | to the application in appropriate form by the decoder. There are | |||
tags for some of these formats. | tags for some of these formats. | |||
o Tag 32 is for URIs, as defined in [RFC3986]; | o Tag 32 is for URIs, as defined in [RFC3986]; | |||
skipping to change at page 22, line 44 ¶ | skipping to change at page 23, line 12 ¶ | |||
expression, or more than just the text of the regular expression | expression, or more than just the text of the regular expression | |||
itself, need to be conveyed.) | itself, need to be conveyed.) | |||
o Tag 36 is for MIME messages (including all headers), as defined in | o Tag 36 is for MIME messages (including all headers), as defined in | |||
[RFC2045]; | [RFC2045]; | |||
Note that tags 33 and 34 differ from 21 and 22 in that the data is | Note that tags 33 and 34 differ from 21 and 22 in that the data is | |||
transported in base-encoded form for the former and in raw byte | transported in base-encoded form for the former and in raw byte | |||
string form for the latter. | string form for the latter. | |||
3.4.7. Self-Describe CBOR | 3.4.7. Self-Described CBOR | |||
In many applications, it will be clear from the context that CBOR is | In many applications, it will be clear from the context that CBOR is | |||
being employed for encoding a data item. For instance, a specific | being employed for encoding a data item. For instance, a specific | |||
protocol might specify the use of CBOR, or a media type is indicated | protocol might specify the use of CBOR, or a media type is indicated | |||
that specifies its use. However, there may be applications where | that specifies its use. However, there may be applications where | |||
such context information is not available, such as when CBOR data is | such context information is not available, such as when CBOR data is | |||
stored in a file and disambiguating metadata is not in use. Here, it | stored in a file and disambiguating metadata is not in use. Here, it | |||
may help to have some distinguishing characteristics for the data | may help to have some distinguishing characteristics for the data | |||
itself. | itself. | |||
Tag 55799 is defined for this purpose. It does not impart any | Tag 55799 is defined for this purpose. It does not impart any | |||
special semantics on the data item that follows; that is, the | special semantics on the data item that follows; that is, the | |||
semantics of a data item tagged with tag 55799 is exactly identical | semantics of a data item tagged with tag 55799 is exactly identical | |||
to the semantics of the data item itself. | to the semantics of the data item itself. | |||
The serialization of this tag is 0xd9d9f7, which appears not to be in | The serialization of this tag is 0xd9d9f7, which appears not to be in | |||
use as a distinguishing mark for frequently used file types. In | use as a distinguishing mark for frequently used file types. In | |||
particular, it is not a valid start of a Unicode text in any Unicode | particular, it is not a valid start of a Unicode text in any Unicode | |||
encoding if followed by a valid CBOR data item. | encoding if followed by a valid CBOR data item. | |||
For instance, a decoder might be able to parse both CBOR and JSON. | For instance, a decoder might be able to decode both CBOR and JSON. | |||
Such a decoder would need to mechanically distinguish the two | Such a decoder would need to mechanically distinguish the two | |||
formats. An easy way for an encoder to help the decoder would be to | formats. An easy way for an encoder to help the decoder would be to | |||
tag the entire CBOR item with tag 55799, the serialization of which | tag the entire CBOR item with tag 55799, the serialization of which | |||
will never be found at the beginning of a JSON text. | will never be found at the beginning of a JSON text. | |||
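A decoder-side check for this marker might look like the sketch below. The constant and function names are assumptions for illustration. The three bytes are the ones given in the text: 0xd9 is major type 6 with a two-byte argument, and 0xd9f7 is 55799.

```python
SELF_DESCRIBE_PREFIX = bytes.fromhex("d9d9f7")  # serialization of tag 55799


def add_self_describe_tag(encoded_item: bytes) -> bytes:
    """Prefix an already encoded CBOR data item with tag 55799."""
    return SELF_DESCRIBE_PREFIX + encoded_item


def looks_like_self_described_cbor(data: bytes) -> bool:
    """Cheap sniffing test; it does not validate the item that follows."""
    return data.startswith(SELF_DESCRIBE_PREFIX)


tagged = add_self_describe_tag(bytes.fromhex("1864"))  # the integer 100
```

A positive result only says that the marker is present; the data that follows still has to be decoded and checked in the usual way.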
4. Creating CBOR-Based Protocols | 4. Creating CBOR-Based Protocols | |||
Data formats such as CBOR are often used in environments where there | Data formats such as CBOR are often used in environments where there | |||
is no format negotiation. A specific design goal of CBOR is to not | is no format negotiation. A specific design goal of CBOR is to not | |||
need any included or assumed schema: a decoder can take a CBOR item | need any included or assumed schema: a decoder can take a CBOR item | |||
skipping to change at page 25, line 22 ¶ | skipping to change at page 25, line 34 ¶ | |||
generally can choose to completely fail the decoding (issue an error | generally can choose to completely fail the decoding (issue an error | |||
and/or stop processing altogether), substitute the problematic data | and/or stop processing altogether), substitute the problematic data | |||
and data items using a decoder-specific convention that clearly | and data items using a decoder-specific convention that clearly | |||
indicates there has been a problem, or take some other action. | indicates there has been a problem, or take some other action. | |||
4.3.1. Incomplete CBOR Data Items | 4.3.1. Incomplete CBOR Data Items | |||
The representation of a CBOR data item has a specific length, | The representation of a CBOR data item has a specific length, | |||
determined by its initial bytes and by the structure of any data | determined by its initial bytes and by the structure of any data | |||
items enclosed in the data items. If less data is available, this | items enclosed in the data items. If less data is available, this | |||
can be treated as a syntax error. A decoder may also implement | can be treated as a syntax error. A decoder may also decode | |||
incremental parsing, that is, decode the data item as far as it is | incrementally, that is, decode the data item as far as it is | |||
available and present the data found so far (such as in an event- | available and present the data found so far (such as in an event- | |||
based interface), with the option of continuing the decoding once | based interface), with the option of continuing the decoding once | |||
further data is available. | further data is available. | |||
Examples of incomplete data items include: | Examples of incomplete data items include: | |||
o A decoder expects a certain number of array or map entries but | o A decoder expects a certain number of array or map entries but | |||
instead encounters the end of the data. | instead encounters the end of the data. | |||
o A decoder processes what it expects to be the last pair in a map | o A decoder processes what it expects to be the last pair in a map | |||
skipping to change at page 26, line 16 ¶ | skipping to change at page 26, line 31 ¶ | |||
where there is no immediately enclosing (unclosed) indefinite-length | where there is no immediately enclosing (unclosed) indefinite-length | |||
item. | item. | |||
4.3.3. Unknown Additional Information Values | 4.3.3. Unknown Additional Information Values | |||
At the time of writing, some additional information values are | At the time of writing, some additional information values are | |||
unassigned and reserved for future versions of this document (see | unassigned and reserved for future versions of this document (see | |||
Section 6.2). Since the overall syntax for these additional | Section 6.2). Since the overall syntax for these additional | |||
information values is not yet defined, a decoder that sees an | information values is not yet defined, a decoder that sees an | |||
additional information value that it does not understand cannot | additional information value that it does not understand cannot | |||
continue parsing. | continue decoding. | |||
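The check described above can be sketched as follows. The sketch assumes that the unassigned additional information values are 28, 29 and 30, which is how the CBOR initial byte is laid out but is not spelled out in this excerpt; the names used are illustrative only.

```python
RESERVED_ADDITIONAL_INFO = frozenset({28, 29, 30})  # assumed unassigned values


class UnknownAdditionalInfo(ValueError):
    """Raised when the decoder cannot determine how the item continues."""


def split_initial_byte(initial: int):
    """Split an initial byte into (major type, additional information)."""
    major, info = initial >> 5, initial & 0x1F
    if info in RESERVED_ADDITIONAL_INFO:
        # The length of what follows is undefined, so decoding has to stop here.
        raise UnknownAdditionalInfo(f"additional information {info} is not assigned")
    return major, info


head = split_initial_byte(0xC2)  # tag 2: major type 6, additional information 2
```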
4.4. Other Decoding Errors | 4.4. Other Decoding Errors | |||
A CBOR data item may be syntactically well-formed but present a | A CBOR data item may be syntactically well-formed but present a | |||
problem with interpreting the data encoded in it in the CBOR data | problem with interpreting the data encoded in it in the CBOR data | |||
model. Generally speaking, a decoder that finds a data item with | model. Generally speaking, a decoder that finds a data item with | |||
such a problem might issue a warning, might stop processing | such a problem might issue a warning, might stop processing | |||
altogether, might handle the error and make the problematic value | altogether, might handle the error and make the problematic value | |||
available to the application as such, or take some other type of | available to the application as such, or take some other type of | |||
action. | action. | |||
skipping to change at page 31, line 13 ¶ | skipping to change at page 31, line 30 ¶ | |||
by definition variation-tolerant; the distinction is only relevant if | by definition variation-tolerant; the distinction is only relevant if | |||
a constrained implementation of a CBOR decoder meets a variant | a constrained implementation of a CBOR decoder meets a variant | |||
encoder. | encoder. | |||
The preferred serialization always uses the shortest form of | The preferred serialization always uses the shortest form of | |||
representing the argument (Section 3); it also uses the shortest | representing the argument (Section 3); it also uses the shortest | |||
floating point encoding that preserves the value being encoded (see | floating point encoding that preserves the value being encoded (see | |||
Section 4.6). Definite length encoding is preferred whenever the | Section 4.6). Definite length encoding is preferred whenever the | |||
length is known at the time the serialization of the item starts. | length is known at the time the serialization of the item starts. | |||
4.10. Canonical CBOR | 4.10. Canonically Encoded CBOR | |||
Some protocols may want encoders to only emit CBOR in a particular | Some protocols may want encoders to only emit CBOR in a particular | |||
canonical format; those protocols might also have the decoders check | canonical format; those protocols might also have the decoders check | |||
that their input is canonical. Those protocols are free to define | that their input is canonical. Those protocols are free to define | |||
what they mean by a canonical format and what encoders and decoders | what they mean by a canonical format and what encoders and decoders | |||
are expected to do. This section defines a set of restrictions that | are expected to do. This section defines a set of restrictions that | |||
can serve as the base of such a canonical format. | can serve as the base of such a canonical format. | |||
A CBOR encoding satisfies the "core canonicalization requirements" if | A CBOR encoding satisfies the "core canonicalization requirements" if | |||
it satisfies the following restrictions: | it satisfies the following restrictions: | |||
skipping to change at page 34, line 13 ¶ | skipping to change at page 34, line 30 ¶ | |||
4. 100, encoded as 0x1864. | 4. 100, encoded as 0x1864. | |||
5. "z", encoded as 0x617a. | 5. "z", encoded as 0x617a. | |||
6. [-1], encoded as 0x8120. | 6. [-1], encoded as 0x8120. | |||
7. "aa", encoded as 0x626161. | 7. "aa", encoded as 0x626161. | |||
8. [100], encoded as 0x811864. | 8. [100], encoded as 0x811864. | |||
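The sorting rule itself is not visible in this excerpt, so the sketch below assumes a length-first ordering (shorter encodings first, then bytewise comparison among encodings of equal length), which is consistent with items 4 to 8 shown above. The helper name is hypothetical.

```python
def length_first_key(encoded: bytes):
    """Sort key: encoded length first, then the bytes themselves."""
    return (len(encoded), encoded)


# Encoded forms of items 4 to 8 from the list above, deliberately shuffled.
keys = [bytes.fromhex(h) for h in ("811864", "617a", "626161", "1864", "8120")]
ordered = [k.hex() for k in sorted(keys, key=length_first_key)]
```

Sorting the encoded keys rather than the decoded values keeps the comparison independent of any language-specific ordering of mixed types.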
4.11. Strict Mode | 4.11. Strict Decoding Mode | |||
Some areas of application of CBOR do not require canonicalization | Some areas of application of CBOR do not require canonicalization | |||
(Section 4.10) but may require that different decoders reach the same | (Section 4.10) but may require that different decoders reach the same | |||
(semantically equivalent) results, even in the presence of | (semantically equivalent) results, even in the presence of | |||
potentially malicious data. This can be required if one application | potentially malicious data. This can be required if one application | |||
(such as a firewall or other protecting entity) makes a decision | (such as a firewall or other protecting entity) makes a decision | |||
based on the data that another application, which independently | based on the data that another application, which independently | |||
decodes the data, relies on. | decodes the data, relies on. | |||
Normally, it is the responsibility of the sender to avoid ambiguously | Normally, it is the responsibility of the sender to avoid ambiguously | |||
skipping to change at page 39, line 4 ¶ | skipping to change at page 39, line 24 ¶ | |||
part of the codepoint space has been allocated, and the space is | part of the codepoint space has been allocated, and the space is | |||
abundant (although the early numbers are more efficient than the | abundant (although the early numbers are more efficient than the | |||
later ones). Implementations receiving an unknown tag can choose | later ones). Implementations receiving an unknown tag can choose | |||
to simply ignore it or to process it as an unknown tag wrapping | to simply ignore it or to process it as an unknown tag wrapping | |||
the following data item. The IANA registry in Section 8.2 is the | the following data item. The IANA registry in Section 8.2 is the | |||
appropriate way to address the extensibility of this codepoint | appropriate way to address the extensibility of this codepoint | |||
space. | space. | |||
o the "additional information" space. An implementation receiving | o the "additional information" space. An implementation receiving | |||
an unknown additional information value has no way to continue | an unknown additional information value has no way to continue | |||
parsing, so allocating codepoints to this space is a major step. | decoding, so allocating codepoints to this space is a major step. | |||
There are also very few codepoints left. | There are also very few codepoints left. | |||
6.2. Curating the Additional Information Space | 6.2. Curating the Additional Information Space | |||
The human mind is sometimes drawn to filling in little perceived gaps | The human mind is sometimes drawn to filling in little perceived gaps | |||
to make something neat. We expect the remaining gaps in the | to make something neat. We expect the remaining gaps in the | |||
codepoint space for the additional information values to be an | codepoint space for the additional information values to be an | |||
attractor for new ideas, just because they are there. | attractor for new ideas, just because they are there. | |||
The present specification does not manage the additional information | The present specification does not manage the additional information | |||
skipping to change at page 44, line 22 ¶ | skipping to change at page 44, line 49 ¶ | |||
vulnerabilities by reducing parser complexity, by giving the entire | vulnerabilities by reducing parser complexity, by giving the entire | |||
range of encodable values a meaning where possible. | range of encodable values a meaning where possible. | |||
Resource exhaustion attacks might attempt to lure a decoder into | Resource exhaustion attacks might attempt to lure a decoder into | |||
allocating very big data items (strings, arrays, maps) or exhaust the | allocating very big data items (strings, arrays, maps) or exhaust the | |||
stack depth by setting up deeply nested items. Decoders need to have | stack depth by setting up deeply nested items. Decoders need to have | |||
appropriate resource management to mitigate these attacks. (Items | appropriate resource management to mitigate these attacks. (Items | |||
for which very large sizes are given can also attempt to exploit | for which very large sizes are given can also attempt to exploit | |||
integer overflow vulnerabilities.) | integer overflow vulnerabilities.) | |||
Applications where a CBOR data item is examined by a gatekeeper | Protocols that are used in a security context should be defined in | |||
function and later used by a different application may exhibit | such a way that potential multiple interpretations are reliably | |||
vulnerabilities when multiple interpretations of the data item are | reduced to a single one. For example, an attacker could make use of | |||
possible. For example, an attacker could make use of duplicate keys | duplicate keys in maps or precision issues in numbers to make one | |||
in maps and precision issues in numbers to make the gatekeeper base | decoder base its decisions on a different interpretation than the one | |||
its decisions on a different interpretation than the one that will be | that will be used by a second decoder. To facilitate this, encoder | |||
used by the second application. Protocols that are used in a | and decoder implementations used in such contexts should provide at | |||
security context should be defined in such a way that these multiple | least one strict mode of operation (Section 4.11). | |||
interpretations are reliably reduced to a single one. To facilitate | ||||
this, encoder and decoder implementations used in such contexts | ||||
should provide at least one strict mode of operation (Section 4.11). | ||||
10. Acknowledgements | 10. Acknowledgements | |||
CBOR was inspired by MessagePack. MessagePack was developed and | CBOR was inspired by MessagePack. MessagePack was developed and | |||
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | |||
MessagePack is solely for attribution; CBOR is not intended as a | MessagePack is solely for attribution; CBOR is not intended as a | |||
version of or replacement for MessagePack, as it has different design | version of or replacement for MessagePack, as it has different design | |||
goals and requirements. | goals and requirements. | |||
The need for functionality beyond the original MessagePack | The need for functionality beyond the original MessagePack | |||
skipping to change at page 54, line 15 ¶ | skipping to change at page 54, line 15 ¶ | |||
| | | | | | | | |||
| 0xba | map (four-byte uint32_t for n, and then n pairs of | | | 0xba | map (four-byte uint32_t for n, and then n pairs of | | |||
| | data items follow) | | | | data items follow) | | |||
| | | | | | | | |||
| 0xbb | map (eight-byte uint64_t for n, and then n pairs of | | | 0xbb | map (eight-byte uint64_t for n, and then n pairs of | | |||
| | data items follow) | | | | data items follow) | | |||
| | | | | | | | |||
| 0xbf | map, pairs of data items follow, terminated by | | | 0xbf | map, pairs of data items follow, terminated by | | |||
| | "break" | | | | "break" | | |||
| | | | | | | | |||
| 0xc0 | Text-based date/time (data item follows; see Section | | | 0xc0 | Text-based date/time (data item follows; see | | |||
| | 3.4.2) | | | | Section 3.4.2) | | |||
| | | | | | | | |||
| 0xc1 | Epoch-based date/time (data item follows; see | | | 0xc1 | Epoch-based date/time (data item follows; see | | |||
| | Section 3.4.3) | | | | Section 3.4.3) | | |||
| | | | | | | | |||
| 0xc2 | Positive bignum (data item "byte string" follows) | | | 0xc2 | Positive bignum (data item "byte string" follows) | | |||
| | | | | | | | |||
| 0xc3 | Negative bignum (data item "byte string" follows) | | | 0xc3 | Negative bignum (data item "byte string" follows) | | |||
| | | | | | | | |||
| 0xc4 | Decimal Fraction (data item "array" follows; see | | | 0xc4 | Decimal Fraction (data item "array" follows; see | | |||
| | Section 3.4.5) | | | | Section 3.4.5) | | |||
| | | | | | | | |||
| 0xc5 | Bigfloat (data item "array" follows; see Section | | | 0xc5 | Bigfloat (data item "array" follows; see | | |||
| | 3.4.5) | | | | Section 3.4.5) | | |||
| | | | | | | | |||
| 0xc6..0xd4 | (tagged item) | | | 0xc6..0xd4 | (tagged item) | | |||
| | | | | | | | |||
| 0xd5..0xd7 | Expected Conversion (data item follows; see Section | | | 0xd5..0xd7 | Expected Conversion (data item follows; see | | |||
| | 3.4.6.2) | | | | Section 3.4.6.2) | | |||
| | | | | | | | |||
| 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a data | | | 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a data | | |||
| | item follow) | | | | item follow) | | |||
| | | | | | | | |||
| 0xe0..0xf3 | (simple value) | | | 0xe0..0xf3 | (simple value) | | |||
| | | | | | | | |||
| 0xf4 | False | | | 0xf4 | False | | |||
| | | | | | | | |||
| 0xf5 | True | | | 0xf5 | True | | |||
| | | | | | | | |||
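The rows above all follow from splitting the initial byte into a 3-bit major type and 5 bits of additional information. As a minimal sketch, assuming hypothetical helper names (`cbor_initial_byte`, `split_initial_byte`, `argument_length`) that do not appear in the draft, the split could look like this in C:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type and function names, for illustration only. */
typedef struct {
    uint8_t major_type;  /* high-order 3 bits, 0..7 */
    uint8_t additional;  /* low-order 5 bits, 0..31 */
} cbor_initial_byte;

/* Split an initial byte into major type and additional information. */
static cbor_initial_byte split_initial_byte(uint8_t ib) {
    cbor_initial_byte r;
    r.major_type = (uint8_t)(ib >> 5);
    r.additional = (uint8_t)(ib & 0x1f);
    return r;
}

/* Number of bytes after the initial byte that carry the argument.
 * Returns -1 for additional information 28..30 (not well-formed)
 * and 31 (indefinite length or "break"), which carry no argument. */
static int argument_length(uint8_t additional) {
    assert(additional < 32);
    if (additional < 24)
        return 0;            /* value is in the initial byte itself */
    switch (additional) {
    case 24: return 1;
    case 25: return 2;
    case 26: return 4;
    case 27: return 8;
    default: return -1;
    }
}
```

For example, 0xba splits into major type 5 (map) with additional information 26, so a four-byte count follows; 0xc1 splits into major type 6 (tag) with tag number 1 held directly in the initial byte.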
skipping to change at page 56, line 51 ¶ | skipping to change at page 56, line 51 ¶ | |||
return -1; // signal break out | return -1; // signal break out | |||
else fail(); // no enclosing indefinite | else fail(); // no enclosing indefinite | |||
default: fail(); // wrong mt | default: fail(); // wrong mt | |||
} | } | |||
return 0; // no break out | return 0; // no break out | |||
} | } | |||
Figure 1: Pseudocode for Well-Formedness Check | Figure 1: Pseudocode for Well-Formedness Check | |||
Note that the remaining complexity of a complete CBOR decoder is | Note that the remaining complexity of a complete CBOR decoder is | |||
about presenting data that has been parsed to the application in an | about presenting data that has been decoded to the application in an | |||
appropriate form. | appropriate form. | |||
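The following is one possible runnable C rendering of a well-formedness check in the spirit of Figure 1. It is a sketch, not the draft's code: the `reader` type, the `WF_*` return codes, and all function names are assumptions made for this example. It also returns a distinct code for indefinite-length items so that a nested indefinite-length string is not accepted as a chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they are not defined by the draft. */
typedef struct { const uint8_t *p; size_t left; } reader;
enum { WF_FAIL = -2, WF_BREAK = -1, WF_INDEF = 99 };

static int wf_indefinite(reader *r, int mt, int breakable);

/* Read an n-byte big-endian unsigned integer (n is 1, 2, 4, or 8). */
static int read_uint(reader *r, size_t n, uint64_t *out) {
    if (n > r->left) return WF_FAIL;
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v = (v << 8) | r->p[i];
    r->p += n; r->left -= n; *out = v;
    return 0;
}

/* Returns the major type (0..7) of a definite-length item, WF_INDEF for
 * an indefinite-length item, WF_BREAK for an allowed "break", else WF_FAIL. */
static int wf_item(reader *r, int breakable) {
    if (r->left < 1) return WF_FAIL;
    uint8_t ib = *r->p++; r->left--;
    int mt = ib >> 5, ai = ib & 0x1f;
    uint64_t val = (uint64_t)ai;
    if (ai >= 24 && ai <= 27) {
        if (read_uint(r, (size_t)1 << (ai - 24), &val) != 0) return WF_FAIL;
    } else if (ai >= 28 && ai <= 30) {
        return WF_FAIL;                      /* reserved additional info */
    } else if (ai == 31) {
        return wf_indefinite(r, mt, breakable);
    }
    switch (mt) {                            /* 0 and 1 have no content */
    case 2: case 3:                          /* skip string payload */
        if (val > r->left) return WF_FAIL;
        r->p += (size_t)val; r->left -= (size_t)val;
        break;
    case 4:
        for (uint64_t i = 0; i < val; i++)
            if (wf_item(r, 0) < 0) return WF_FAIL;
        break;
    case 5:                                  /* key and value per pair */
        for (uint64_t i = 0; i < val; i++)
            if (wf_item(r, 0) < 0 || wf_item(r, 0) < 0) return WF_FAIL;
        break;
    case 6:                                  /* one tagged data item */
        if (wf_item(r, 0) < 0) return WF_FAIL;
        break;
    case 7:                                  /* bad two-byte simple value */
        if (ai == 24 && val < 32) return WF_FAIL;
        break;
    }
    return mt;
}

static int wf_indefinite(reader *r, int mt, int breakable) {
    int it;
    switch (mt) {
    case 2: case 3:                          /* chunks: definite, same type */
        while ((it = wf_item(r, 1)) != WF_BREAK)
            if (it != mt) return WF_FAIL;
        break;
    case 4:
        while ((it = wf_item(r, 1)) != WF_BREAK)
            if (it == WF_FAIL) return WF_FAIL;
        break;
    case 5:
        while ((it = wf_item(r, 1)) != WF_BREAK)
            if (it == WF_FAIL || wf_item(r, 0) < 0) return WF_FAIL;
        break;
    case 7:                                  /* the "break" stop code */
        return breakable ? WF_BREAK : WF_FAIL;
    default:
        return WF_FAIL;                      /* wrong major type */
    }
    return WF_INDEF;
}

/* 1 if buf holds exactly one well-formed data item, else 0. */
static int cbor_well_formed(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    reader r = { buf, len };
    return wf_item(&r, 0) >= 0 && r.left == 0;
}
```

Under these assumptions, the bytes 9f 01 02 ff (an indefinite-length array of 1 and 2) are accepted, while a lone ff, or 5f 61 61 ff (a text string chunk inside an indefinite-length byte string), are rejected. The recursion depth is unbounded here; an implementation handling untrusted input would likely want a nesting limit.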
Major types 0 and 1 are designed in such a way that they can be | Major types 0 and 1 are designed in such a way that they can be | |||
encoded in C from a signed integer without actually doing an if-then- | encoded in C from a signed integer without actually doing an if-then- | |||
else for positive/negative (Figure 2). This uses the fact that | else for positive/negative (Figure 2). This uses the fact that | |||
(-1-n), the transformation for major type 1, is the same as ~n | (-1-n), the transformation for major type 1, is the same as ~n | |||
(bitwise complement) in C unsigned arithmetic; ~n can then be | (bitwise complement) in C unsigned arithmetic; ~n can then be | |||
expressed as (-1)^n for the negative case, while 0^n leaves n | expressed as (-1)^n for the negative case, while 0^n leaves n | |||
unchanged for non-negative. The sign of a number can be converted to | unchanged for non-negative. The sign of a number can be converted to | |||
-1 for negative and 0 for non-negative (0 or positive) by arithmetic- | -1 for negative and 0 for non-negative (0 or positive) by arithmetic- | |||
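The sign-mask idea described above can be sketched in C as follows. This is not the draft's Figure 2: the function name and buffer-based interface are assumptions for this example, and the sign mask is derived with a logical shift on the unsigned value followed by a negation, because right-shifting a negative signed value is implementation-defined in C. The result is the same mask an arithmetic shift by 63 would give on common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode n as CBOR major type 0 or 1 into buf
 * (which must have room for 9 bytes); returns the number of bytes written. */
static size_t encode_sint(int64_t n, uint8_t *buf) {
    assert(buf != NULL);
    uint64_t ui = (uint64_t)n;                    /* two's complement bits */
    uint64_t sign = (uint64_t)0 - (ui >> 63);     /* 0, or all ones if n < 0 */
    uint8_t mt = (uint8_t)(sign & 0x20);          /* 0x00: type 0, 0x20: type 1 */
    ui ^= sign;                                   /* ~n == -1-n for negatives */

    if (ui < 24) {                                /* value fits in initial byte */
        buf[0] = (uint8_t)(mt | ui);
        return 1;
    }
    uint8_t ai;
    size_t nbytes;
    if (ui <= 0xff)             { ai = 24; nbytes = 1; }
    else if (ui <= 0xffff)      { ai = 25; nbytes = 2; }
    else if (ui <= 0xffffffffu) { ai = 26; nbytes = 4; }
    else                        { ai = 27; nbytes = 8; }
    buf[0] = (uint8_t)(mt | ai);
    for (size_t i = 0; i < nbytes; i++)           /* network byte order */
        buf[1 + i] = (uint8_t)(ui >> (8 * (nbytes - 1 - i)));
    return nbytes + 1;
}
```

With this sketch, 10 encodes as the single byte 0x0a, -1 as 0x20, -100 as 0x38 0x63, and 1000 as 0x19 0x03 0xe8. No branch on the sign is needed; the remaining branches only select the shortest argument length.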
skipping to change at page 61, line 32 ¶ | skipping to change at page 61, line 32 ¶ | |||
+-------------+--------------------------+--------------------------+ | +-------------+--------------------------+--------------------------+ | |||
Table 6: Examples for Different Levels of Conciseness | Table 6: Examples for Different Levels of Conciseness | |||
Appendix F. Changes from RFC 7049 | Appendix F. Changes from RFC 7049 | |||
The following is a list of known changes from RFC 7049. This list is | The following is a list of known changes from RFC 7049. This list is | |||
non-authoritative. It is meant to help reviewers see the significant | non-authoritative. It is meant to help reviewers see the significant | |||
differences. | differences. | |||
o Updated reference for [RFC4267] to [RFC8259] in many places | o Updated reference for [RFC4627] to [RFC8259] in many places | |||
o Updated reference for [CNN-TERMS] to [RFC7228] | o Updated reference for [CNN-TERMS] to [RFC7228] | |||
o Added a comment to the last example in Section 2.2.1 (added | o Added a comment to the last example in Section 2.2.1 (added | |||
"Second value") | "Second value") | |||
o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") | o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") | |||
o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> | o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> | |||
"0b000_11001") | "0b000_11001") | |||
End of changes. 55 change blocks. | ||||
145 lines changed or deleted | 153 lines changed or added | |||