draft-ietf-cbor-7049bis-07.txt | draft-ietf-cbor-7049bis-08.txt | |||
---|---|---|---|---|
Network Working Group C. Bormann | Network Working Group C. Bormann | |||
Internet-Draft Universitaet Bremen TZI | Internet-Draft Universitaet Bremen TZI | |||
Intended status: Standards Track P. Hoffman | Obsoletes: 7049 (if approved) P. Hoffman | |||
Expires: February 26, 2020 ICANN | Intended status: Standards Track ICANN | |||
August 25, 2019 | Expires: May 8, 2020 November 05, 2019 | |||
Concise Binary Object Representation (CBOR) | Concise Binary Object Representation (CBOR) | |||
draft-ietf-cbor-7049bis-07 | draft-ietf-cbor-7049bis-08 | |||
Abstract | Abstract | |||
The Concise Binary Object Representation (CBOR) is a data format | The Concise Binary Object Representation (CBOR) is a data format | |||
whose design goals include the possibility of extremely small code | whose design goals include the possibility of extremely small code | |||
size, fairly small message size, and extensibility without the need | size, fairly small message size, and extensibility without the need | |||
for version negotiation. These design goals make it different from | for version negotiation. These design goals make it different from | |||
earlier binary serializations such as ASN.1 and MessagePack. | earlier binary serializations such as ASN.1 and MessagePack. | |||
This document is a revised edition of RFC 7049, with editorial | This document is a revised edition of RFC 7049, with editorial | |||
skipping to change at page 2, line 7 | skipping to change at page 2, line 7 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on February 26, 2020. | This Internet-Draft will expire on May 8, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 33 | skipping to change at page 2, line 33 | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | |||
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 | 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 | |||
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | |||
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 | 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 | 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 | |||
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 | 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 | |||
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 13 | 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14 | |||
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 15 | 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16 | |||
3.3. Floating-Point Numbers and Values with No Content . . . . 16 | 3.3. Floating-Point Numbers and Values with No Content . . . . 16 | |||
3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 17 | 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 18 | |||
3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 19 | 3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 20 | |||
3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 19 | 3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 20 | |||
3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 20 | 3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 21 | |||
3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 20 | 3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 21 | 3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 22 | |||
3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 23 | 3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 24 | |||
3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 23 | 3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 24 | |||
3.4.6.2. Expected Later Encoding for CBOR-to-JSON | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON | |||
Converters . . . . . . . . . . . . . . . . . . . 23 | Converters . . . . . . . . . . . . . . . . . . . 24 | |||
3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 24 | 3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 25 | |||
3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 25 | 3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 26 | |||
4. Serialization Considerations . . . . . . . . . . . . . . . . 25 | 4. Serialization Considerations . . . . . . . . . . . . . . . . 26 | |||
4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 25 | 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 26 | |||
4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 26 | 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 27 | |||
4.2.1. Core Deterministic Encoding Requirements . . . . . . 26 | 4.2.1. Core Deterministic Encoding Requirements . . . . . . 27 | |||
4.2.2. Additional Deterministic Encoding Considerations . . 27 | 4.2.2. Additional Deterministic Encoding Considerations . . 28 | |||
4.2.3. Length-first map key ordering . . . . . . . . . . . . 28 | 4.2.3. Length-first map key ordering . . . . . . . . . . . . 30 | |||
5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 29 | 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 31 | |||
5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 30 | 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 31 | |||
5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 31 | 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 32 | |||
5.3. Invalid Items . . . . . . . . . . . . . . . . . . . . . . 31 | 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 32 | |||
5.4. Handling Unknown Simple Values and Tags . . . . . . . . . 32 | 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 33 | |||
5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 33 | 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 33 | |||
5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 33 | 5.4. Handling Unknown Simple Values and Tag numbers . . . . . 33 | |||
5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 34 | 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 35 | 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 35 | |||
5.8. Strict Decoding Mode . . . . . . . . . . . . . . . . . . 35 | 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 36 | |||
6. Converting Data between CBOR and JSON . . . . . . . . . . . . 37 | 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 37 | |||
6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 37 | 5.8. Validity Checking and Robustness . . . . . . . . . . . . 37 | |||
6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 38 | 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 38 | |||
7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 39 | 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 38 | |||
7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 40 | 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 40 | |||
7.2. Curating the Additional Information Space . . . . . . . . 40 | 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 41 | |||
8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 41 | 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 41 | |||
8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 42 | 7.2. Curating the Additional Information Space . . . . . . . . 42 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 | 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 42 | |||
9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 43 | 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 43 | |||
9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 43 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 | |||
9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 43 | 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 44 | |||
9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 44 | 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 44 | |||
9.5. The +cbor Structured Syntax Suffix Registration . . . . . 45 | 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 45 | |||
10. Security Considerations . . . . . . . . . . . . . . . . . . . 45 | 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 46 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 47 | 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 46 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 47 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 47 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 48 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 49 | |||
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 50 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 49 | |||
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 54 | 11.2. Informative References . . . . . . . . . . . . . . . . . 50 | |||
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 57 | Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 53 | |||
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 59 | Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 57 | |||
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 60 | ||||
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 62 | ||||
Appendix E. Comparison of Other Binary Formats to CBOR's Design | Appendix E. Comparison of Other Binary Formats to CBOR's Design | |||
Objectives . . . . . . . . . . . . . . . . . . . . . 60 | Objectives . . . . . . . . . . . . . . . . . . . . . 63 | |||
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 61 | E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 64 | |||
E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 61 | E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 64 | |||
E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 62 | E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 65 | |||
E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 62 | E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 65 | |||
E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 62 | E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 65 | |||
Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 63 | Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 66 | |||
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 63 | Appendix G. Well-formedness errors and examples . . . . . . . . 66 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64 | G.1. Examples for CBOR data items that are not well-formed . . 67 | |||
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 69 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 70 | ||||
1. Introduction | 1. Introduction | |||
There are hundreds of standardized formats for binary representation | There are hundreds of standardized formats for binary representation | |||
of structured data (also known as binary serialization formats). Of | of structured data (also known as binary serialization formats). Of | |||
those, some are for specific domains of information, while others are | those, some are for specific domains of information, while others are | |||
generalized for arbitrary data. In the IETF, probably the best-known | generalized for arbitrary data. In the IETF, probably the best-known | |||
formats in the latter category are ASN.1's BER and DER [ASN.1]. | formats in the latter category are ASN.1's BER and DER [ASN.1]. | |||
The format defined here follows some specific design goals that are | The format defined here follows some specific design goals that are | |||
skipping to change at page 6, line 29 | skipping to change at page 6, line 31 | |||
The term "byte" is used in its now-customary sense as a synonym for | The term "byte" is used in its now-customary sense as a synonym for | |||
"octet". All multi-byte values are encoded in network byte order | "octet". All multi-byte values are encoded in network byte order | |||
(that is, most significant byte first, also known as "big-endian"). | (that is, most significant byte first, also known as "big-endian"). | |||
This specification makes use of the following terminology: | This specification makes use of the following terminology: | |||
Data item: A single piece of CBOR data. The structure of a data | Data item: A single piece of CBOR data. The structure of a data | |||
item may contain zero, one, or more nested data items. The term | item may contain zero, one, or more nested data items. The term | |||
is used both for the data item in representation format and for | is used both for the data item in representation format and for | |||
the abstract idea that can be derived from that by a decoder. | the abstract idea that can be derived from that by a decoder; the | |||
former can be addressed specifically by using "encoded data item". | ||||
Decoder: A process that decodes a well-formed CBOR data item and | Decoder: A process that decodes a well-formed CBOR data item and | |||
makes it available to an application. Formally speaking, a | makes it available to an application. Formally speaking, a | |||
decoder contains a parser to break up the input using the syntax | decoder contains a parser to break up the input using the syntax | |||
rules of CBOR, as well as a semantic processor to prepare the data | rules of CBOR, as well as a semantic processor to prepare the data | |||
in a form suitable to the application. | in a form suitable to the application. | |||
Encoder: A process that generates the representation format of a | Encoder: A process that generates the representation format of a | |||
CBOR data item from application information. | CBOR data item from application information. | |||
skipping to change at page 6, line 49 | skipping to change at page 7, line 4 | |||
Data Stream: A sequence of zero or more data items, not further | Data Stream: A sequence of zero or more data items, not further | |||
assembled into a larger containing data item. The independent | assembled into a larger containing data item. The independent | |||
data items that make up a data stream are sometimes also referred | data items that make up a data stream are sometimes also referred | |||
to as "top-level data items". | to as "top-level data items". | |||
Well-formed: A data item that follows the syntactic structure of | Well-formed: A data item that follows the syntactic structure of | |||
CBOR. A well-formed data item uses the initial bytes and the byte | CBOR. A well-formed data item uses the initial bytes and the byte | |||
strings and/or data items that are implied by their values as | strings and/or data items that are implied by their values as | |||
defined in CBOR and does not include following extraneous data. | defined in CBOR and does not include following extraneous data. | |||
CBOR decoders by definition only return contents from well-formed | CBOR decoders by definition only return contents from well-formed | |||
data items. | data items. | |||
Valid: A data item that is well-formed and also follows the semantic | Valid: A data item that is well-formed and also follows the semantic | |||
restrictions that apply to CBOR data items. | restrictions that apply to CBOR data items. | |||
Expected: Besides its normal English meaning, the term "expected" is | ||||
used to describe requirements beyond CBOR validity that an | ||||
application has on its input data. Well-formed (processable at | ||||
all), valid (checked by a validity-checking generic decoder), and | ||||
expected (checked by the application) form a hierarchy of layers | ||||
of acceptability. | ||||
Stream decoder: A process that decodes a data stream and makes each | Stream decoder: A process that decodes a data stream and makes each | |||
of the data items in the sequence available to an application as | of the data items in the sequence available to an application as | |||
they are received. | they are received. | |||
Where bit arithmetic or data types are explained, this document uses | Where bit arithmetic or data types are explained, this document uses | |||
the notation familiar from the programming language C, except that | the notation familiar from the programming language C, except that | |||
"**" denotes exponentiation. Similar to the "0x" notation for | "**" denotes exponentiation. Similar to the "0x" notation for | |||
hexadecimal numbers, numbers in binary notation are prefixed with | hexadecimal numbers, numbers in binary notation are prefixed with | |||
"0b". Underscores can be added to such a number solely for | "0b". Underscores can be added to such a number solely for | |||
readability, so 0b00100001 (0x21) might be written 0b001_00001 to | readability, so 0b00100001 (0x21) might be written 0b001_00001 to | |||
skipping to change at page 12, line 13 | skipping to change at page 12, line 22 | |||
pairs) followed by the 18 remaining items. The first item is the | pairs) followed by the 18 remaining items. The first item is the | |||
first key, the second item is the first value, the third item is | first key, the second item is the first value, the third item is | |||
the second key, and so on. Because items in a map come in pairs, | the second key, and so on. Because items in a map come in pairs, | |||
their total number is always even: A map that contains an odd | their total number is always even: A map that contains an odd | |||
number of items (no value data present after the last key data | number of items (no value data present after the last key data | |||
item) is not well-formed. A map that has duplicate keys may be | item) is not well-formed. A map that has duplicate keys may be | |||
well-formed, but it is not valid, and thus it causes indeterminate | well-formed, but it is not valid, and thus it causes indeterminate | |||
decoding; see also Section 5.6. | decoding; see also Section 5.6. | |||
Major type 6: a tagged data item ("tag") whose tag number is the | Major type 6: a tagged data item ("tag") whose tag number is the | |||
argument and whose enclosed data item is the single encoded data | argument and whose enclosed data item ("tag content") is the | |||
item that follows the head. See Section 3.4. | single encoded data item that follows the head. See Section 3.4. | |||
Major type 7: floating-point numbers and simple values, as well as | Major type 7: floating-point numbers and simple values, as well as | |||
the "break" stop code. See Section 3.3. | the "break" stop code. See Section 3.3. | |||
These eight major types lead to a simple table showing which of the | These eight major types lead to a simple table showing which of the | |||
256 possible values for the initial byte of a data item are used | 256 possible values for the initial byte of a data item are used | |||
(Table 6). | (Table 6). | |||
In major types 6 and 7, many of the possible values are reserved for | In major types 6 and 7, many of the possible values are reserved for | |||
future specification. See Section 9 for more information on these | future specification. See Section 9 for more information on these | |||
skipping to change at page 18, line 12 | skipping to change at page 19, line 12 | |||
This would be marked as 0b110_00010 (major type 6, additional | This would be marked as 0b110_00010 (major type 6, additional | |||
information 2 for the tag number) followed by 0b010_01100 (major type | information 2 for the tag number) followed by 0b010_01100 (major type | |||
2, additional information of 12 for the length) followed by the 12 | 2, additional information of 12 for the length) followed by the 12 | |||
bytes of the bignum. | bytes of the bignum. | |||
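As an illustration only (it is not part of either draft), the two initial bytes quoted above can be taken apart with a shift and a mask: the top three bits give the major type and the low five bits give the additional information. The helper name `split_initial_byte` is hypothetical.

```python
def split_initial_byte(initial: int) -> tuple[int, int]:
    """Split a CBOR initial byte into (major type, additional information)."""
    if not 0 <= initial <= 0xFF:
        raise ValueError("initial byte must be in the range 0..255")
    major_type = initial >> 5          # high-order 3 bits
    additional_info = initial & 0x1F   # low-order 5 bits
    return major_type, additional_info


# The two bytes from the text, written with the same "0b" underscore notation.
tag_head = split_initial_byte(0b110_00010)     # major type 6, tag number 2
string_head = split_initial_byte(0b010_01100)  # major type 2, length 12
```

The sketch covers only additional information values below 24, which is all the example in the text needs.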
Decoders do not need to understand tags of every tag number, and tags | Decoders do not need to understand tags of every tag number, and tags | |||
may be of little value in applications where the implementation | may be of little value in applications where the implementation | |||
creating a particular CBOR data item and the implementation decoding | creating a particular CBOR data item and the implementation decoding | |||
that stream know the semantic meaning of each item in the data flow. | that stream know the semantic meaning of each item in the data flow. | |||
Their primary purpose in this specification is to define common data | Their primary purpose in this specification is to define common data | |||
types such as dates. A secondary purpose is to allow optional | types such as dates. A secondary purpose is to provide conversion | |||
tagging when the decoder is a generic CBOR decoder that might be able | hints when it is foreseen that the CBOR data item needs to be | |||
to benefit from hints about the content of items. Understanding the | translated into a different format, requiring hints about the content | |||
semantic tags is optional for a decoder; it can just jump over the | of items. Understanding the semantics of tags is optional for a | |||
initial bytes of the tag and interpret the tagged data item itself. | decoder; it can just jump over the initial bytes of the tag (that | |||
encode the tag number) and interpret the tag content itself, | ||||
presenting both tag number and tag content to the application. | ||||
A tag applies semantics to the data item it encloses. Thus, if tag A | A tag applies semantics to the data item it encloses. Thus, if tag A | |||
encloses tag B, which encloses data item C, tag A applies to the | encloses tag B, which encloses data item C, tag A applies to the | |||
result of applying tag B on data item C. That is, a tagged item is a | result of applying tag B on data item C. That is, a tag is a data | |||
data item consisting of a tag number and an enclosed value. The | item consisting of a tag number and an enclosed value. The content | |||
content of the tagged item (the enclosed data item) is the data item | of the tag (the enclosed data item) is the data item (the value) that | |||
(the value) that is being tagged. | is being tagged. | |||
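A minimal sketch of how a decoder that does not understand any tags might still step over nested tag heads, collecting the tag numbers from the outermost inward and leaving the tag content untouched. The function names `read_head` and `peel_tags` are hypothetical, the tag numbers 100 and 1000 in the usage lines are picked arbitrarily for illustration, and the argument sizes for additional information 24 to 27 (1, 2, 4, and 8 bytes) are assumed from the head encoding defined elsewhere in the specification.

```python
def read_head(data: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Return (major type, argument, position after the head)."""
    if pos >= len(data):
        raise ValueError("truncated input: no initial byte")
    initial = data[pos]
    major_type, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major_type, info, pos       # argument is in the initial byte
    if info <= 27:
        size = 1 << (info - 24)            # 1, 2, 4, or 8 following bytes
        if pos + size > len(data):
            raise ValueError("truncated input: incomplete argument")
        argument = int.from_bytes(data[pos:pos + size], "big")
        return major_type, argument, pos + size
    raise ValueError("additional information 28..31 is not handled here")


def peel_tags(data: bytes) -> tuple[list[int], bytes]:
    """Strip leading tag heads; return (tag numbers, remaining encoded bytes)."""
    tag_numbers = []
    pos = 0
    while True:
        major_type, argument, next_pos = read_head(data, pos)
        if major_type != 6:                # not a tag: this is the content
            return tag_numbers, data[pos:]
        tag_numbers.append(argument)       # outermost tag comes first
        pos = next_pos


# Tag 100 enclosing tag 1000 enclosing the unsigned integer 7.
tags, content = peel_tags(bytes.fromhex("D864 D903E8 07"))
```

With tag A listed before tag B, an application would apply B to the content first and then A to that result, matching the nesting order described above.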
IANA maintains a registry of tag numbers as described in Section 9.2. | IANA maintains a registry of tag numbers as described in Section 9.2. | |||
Table 4 provides a list of tag numbers that were defined in | Table 4 provides a list of tag numbers that were defined in | |||
[RFC7049], with definitions in the rest of this section. Note that | [RFC7049], with definitions in the rest of this section. Note that | |||
many other tag numbers have been defined since the publication of | many other tag numbers have been defined since the publication of | |||
[RFC7049]; see the registry described at Section 9.2 for the complete | [RFC7049]; see the registry described at Section 9.2 for the complete | |||
list. | list. | |||
+----------+----------+---------------------------------------------+ | +----------+----------+---------------------------------------------+ | |||
| Tag | Data | Semantics | | | Tag | Data | Semantics | | |||
skipping to change at page 20, line 33 | skipping to change at page 21, line 33 | |||
64-bit integers for the enclosed value. | 64-bit integers for the enclosed value. | |||
Negative values (major type 1 and negative floating-point numbers) | Negative values (major type 1 and negative floating-point numbers) | |||
are interpreted as determined by the application requirements as | are interpreted as determined by the application requirements as | |||
there is no universal standard for UTC count-of-seconds time before | there is no universal standard for UTC count-of-seconds time before | |||
1970-01-01T00:00Z (this is particularly true for points in time that | 1970-01-01T00:00Z (this is particularly true for points in time that | |||
precede discontinuities in national calendars). The same applies to | precede discontinuities in national calendars). The same applies to | |||
non-finite values. | non-finite values. | |||
To indicate fractional seconds, floating-point values can be used | To indicate fractional seconds, floating-point values can be used | |||
within Tag number 1 instead of integer values. Note that this | within tag number 1 instead of integer values. Note that this | |||
generally requires binary64 support, as binary16 and binary32 provide | generally requires binary64 support, as binary16 and binary32 provide | |||
non-zero fractions of seconds only for a short period of time around | non-zero fractions of seconds only for a short period of time around | |||
early 1970. An application that requires Tag number 1 support may | early 1970. An application that requires tag number 1 support may | |||
restrict the enclosed value to be an integer (or a floating-point | restrict the enclosed value to be an integer (or a floating-point | |||
value) only. | value) only. | |||
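The precision remark can be checked with a short sketch: a timestamp with a half-second fraction survives a round trip through binary64 but loses the fraction in binary32. The helper names `roundtrip` and `epoch_to_utc` and the sample value are hypothetical. Rejecting negative and non-finite values is one possible application policy, assumed here only because the text leaves their interpretation to the application.

```python
import math
import struct
from datetime import datetime, timezone


def roundtrip(seconds: float, fmt: str) -> float:
    """Encode and decode a value in the given big-endian IEEE 754 format."""
    return struct.unpack(fmt, struct.pack(fmt, seconds))[0]


def epoch_to_utc(seconds: float) -> datetime:
    """Interpret a tag number 1 value as a UTC point in time."""
    if not math.isfinite(seconds) or seconds < 0:
        # Application-specific choice (an assumption of this sketch).
        raise ValueError("negative or non-finite epoch value not accepted")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


sample = 1500000000.5                   # an arbitrary time with a fraction
as_binary32 = roundtrip(sample, ">f")   # fraction does not fit in binary32
as_binary64 = roundtrip(sample, ">d")   # fraction is preserved in binary64
```

At this magnitude adjacent binary32 values are 128 seconds apart, which is why the half second disappears.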
3.4.4. Bignums | 3.4.4. Bignums | |||
Protocols using tag numbers 2 and 3 extend the generic data model | Protocols using tag numbers 2 and 3 extend the generic data model | |||
(Section 2) with "bignums" representing arbitrarily sized integers. | (Section 2) with "bignums" representing arbitrarily sized integers. | |||
In the generic data model, bignum values are not equal to integers | In the generic data model, bignum values are not equal to integers | |||
from the basic data model, but specific data models can define that | from the basic data model, but specific data models can define that | |||
equivalence, and preferred encoding never makes use of bignums that | equivalence, and preferred encoding never makes use of bignums that | |||
skipping to change at page 21, line 29 | skipping to change at page 22, line 29 | |||
(major type 2, length 9), followed by 0x010000000000000000 (one byte | (major type 2, length 9), followed by 0x010000000000000000 (one byte | |||
0x01 and eight bytes 0x00). In hexadecimal: | 0x01 and eight bytes 0x00). In hexadecimal: | |||
C2 -- Tag 2 | C2 -- Tag 2 | |||
49 -- Byte string of length 9 | 49 -- Byte string of length 9 | |||
010000000000000000 -- Bytes content | 010000000000000000 -- Bytes content | |||
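The hexadecimal example above can be decoded with a few lines. This is a sketch under narrow assumptions: it accepts only tag number 2 followed by a definite-length byte string shorter than 24 bytes, which is all the example requires. The function name `decode_unsigned_bignum` is hypothetical.

```python
def decode_unsigned_bignum(encoded: bytes) -> int:
    """Decode tag number 2 + short byte string into a Python integer."""
    if len(encoded) < 2 or encoded[0] != 0xC2:
        raise ValueError("expected initial byte 0xC2 (tag number 2)")
    major_type, length = encoded[1] >> 5, encoded[1] & 0x1F
    if major_type != 2 or length >= 24:
        raise ValueError("expected a byte string with a length below 24")
    content = encoded[2:]
    if len(content) != length:
        raise ValueError("byte string length does not match its head")
    # The content is the magnitude in network byte order (big-endian).
    return int.from_bytes(content, "big")


value = decode_unsigned_bignum(bytes.fromhex("C2 49 010000000000000000"))
```

Here `value` equals 2**64, one more than the largest integer that fits in an eight-byte argument.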
3.4.5. Decimal Fractions and Bigfloats | 3.4.5. Decimal Fractions and Bigfloats | |||
Protocols using tag number 4 extend the generic data model with data | Protocols using tag number 4 extend the generic data model with data | |||
items representing arbitrary-length decimal fractions m*(10*e). | items representing arbitrary-length decimal fractions of the form | |||
Protocols using tag number 5 extend the generic data model with data | m*(10**e). Protocols using tag number 5 extend the generic data | |||
items representing arbitrary-length binary fractions m*(2*e). As | model with data items representing arbitrary-length binary fractions | |||
with bignums, values of different types are not equal in the generic | of the form m*(2**e). As with bignums, values of different types are | |||
data model. | not equal in the generic data model. | |||
Decimal fractions combine an integer mantissa with a base-10 scaling | Decimal fractions combine an integer mantissa with a base-10 scaling | |||
factor. They are most useful if an application needs the exact | factor. They are most useful if an application needs the exact | |||
representation of a decimal fraction such as 1.1 because there is no | representation of a decimal fraction such as 1.1 because there is no | |||
exact representation for many decimal fractions in binary floating | exact representation for many decimal fractions in binary floating | |||
point. | point. | |||
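To make the two formulas concrete, the following sketch evaluates m*(10**e) and m*(2**e) exactly using Python's standard library. It deals only with the arithmetic, not with the wire encoding of tag numbers 4 and 5, and the function names `decimal_fraction` and `bigfloat` are hypothetical.

```python
from decimal import Decimal
from fractions import Fraction


def decimal_fraction(mantissa: int, exponent: int) -> Decimal:
    """Exact value of mantissa * (10 ** exponent)."""
    return Decimal(mantissa).scaleb(exponent)


def bigfloat(mantissa: int, exponent: int) -> Fraction:
    """Exact value of mantissa * (2 ** exponent)."""
    return Fraction(mantissa) * Fraction(2) ** exponent


exact_1_1 = decimal_fraction(11, -1)   # 11 * 10**-1, exactly 1.1
one_and_a_half = bigfloat(3, -1)       # 3 * 2**-1, exactly 1.5

# The binary64 value nearest to 1.1 is not exactly 11/10.
float_is_exact = Fraction(1.1) == Fraction(11, 10)
```

`float_is_exact` comes out false, which is the situation the paragraph above describes: a decimal fraction keeps 1.1 exact where binary floating point cannot.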
Bigfloats combine an integer mantissa with a base-2 scaling factor. | Bigfloats combine an integer mantissa with a base-2 scaling factor. | |||
They are binary floating-point values that can exceed the range or | They are binary floating-point values that can exceed the range or | |||
the precision of the three IEEE 754 formats supported by CBOR | the precision of the three IEEE 754 formats supported by CBOR | |||
skipping to change at page 23, line 17 | skipping to change at page 24, line 17 | |||
The tags in this section are for content hints that might be used by | The tags in this section are for content hints that might be used by | |||
generic CBOR processors. These content hints do not extend the | generic CBOR processors. These content hints do not extend the | |||
generic data model. | generic data model. | |||
3.4.6.1. Encoded CBOR Data Item | 3.4.6.1. Encoded CBOR Data Item | |||
Sometimes it is beneficial to carry an embedded CBOR data item that | Sometimes it is beneficial to carry an embedded CBOR data item that | |||
is not meant to be decoded immediately at the time the enclosing data | is not meant to be decoded immediately at the time the enclosing data | |||
item is being decoded. Tag number 24 (CBOR data item) can be used to | item is being decoded. Tag number 24 (CBOR data item) can be used to | |||
tag the embedded byte string as a data item encoded in CBOR format. | tag the embedded byte string as a data item encoded in CBOR format. | |||
Contained items that aren't byte strings are invalid. Any contained | Contained items that aren't byte strings are invalid. A contained | |||
byte string is valid, even if it encodes an invalid or ill-formed | byte string is valid if it encodes a well-formed CBOR item; validity | |||
CBOR item. | checking of the decoded CBOR item is not required for tag validity | |||
(but could be offered by a generic decoder as a special option). | ||||
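
As an illustration of the tag 24 wrapping described above, here is a minimal Python sketch. It assumes the embedded item has already been encoded elsewhere; the helper names `encode_head` and `wrap_as_encoded_cbor` are hypothetical and do not come from the draft.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR initial byte followed by the shortest possible argument."""
    if argument < 0:
        raise ValueError("argument must be non-negative")
    if argument < 24:
        # Small arguments live in the low five bits of the initial byte.
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            initial = (major_type << 5) | additional_info
            return bytes([initial]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def wrap_as_encoded_cbor(encoded_item: bytes) -> bytes:
    """Wrap already-encoded CBOR bytes in a byte string tagged with tag 24."""
    tag_head = encode_head(6, 24)                   # major type 6: tag
    bytes_head = encode_head(2, len(encoded_item))  # major type 2: byte string
    return tag_head + bytes_head + encoded_item
```

For example, wrapping the five-byte encoding of the text string "IETF" (0x6449455446) yields 0xd818456449455446. The sketch does not check that the wrapped bytes are well-formed CBOR; a decoder that wants to enforce that would have to parse them.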
3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | |||
Tags number 21 to 23 indicate that a byte string might require a | Tags number 21 to 23 indicate that a byte string might require a | |||
specific encoding when interoperating with a text-based | specific encoding when interoperating with a text-based | |||
representation. These tags are useful when an encoder knows that the | representation. These tags are useful when an encoder knows that the | |||
byte string data it is writing is likely to be later converted to a | byte string data it is writing is likely to be later converted to a | |||
particular JSON-based usage. That usage specifies that some strings | particular JSON-based usage. That usage specifies that some strings | |||
are encoded as base64, base64url, and so on. The encoder uses byte | are encoded as base64, base64url, and so on. The encoder uses byte | |||
strings instead of doing the encoding itself to reduce the message | strings instead of doing the encoding itself to reduce the message | |||
skipping to change at page 24, line 44 ¶ | skipping to change at page 25, line 44 ¶ | |||
Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a | Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a | |||
version of the JavaScript regular expression syntax [ECMA262]. | version of the JavaScript regular expression syntax [ECMA262]. | |||
(Note that more specific identification may be necessary if the | (Note that more specific identification may be necessary if the | |||
actual version of the specification underlying the regular | actual version of the specification underlying the regular | |||
expression, or more than just the text of the regular expression | expression, or more than just the text of the regular expression | |||
itself, need to be conveyed.) Any contained string value is | itself, need to be conveyed.) Any contained string value is | |||
valid. | valid. | |||
o Tag number 36 is for MIME messages (including all headers), as | o Tag number 36 is for MIME messages (including all headers), as | |||
defined in [RFC2045]. A text string that isn't a valid MIME | defined in [RFC2045]. A text string that isn't a valid MIME | |||
message is invalid. | message is invalid. (For this tag, validity checking may be | |||
particularly onerous for a generic decoder and might therefore not | ||||
be offered.) | ||||
Note that tag numbers 33 and 34 differ from 21 and 22 in that the | Note that tag numbers 33 and 34 differ from 21 and 22 in that the | |||
data is transported in base-encoded form for the former and in raw | data is transported in base-encoded form for the former and in raw | |||
byte string form for the latter. | byte string form for the latter. | |||
3.4.7. Self-Described CBOR | 3.4.7. Self-Described CBOR | |||
In many applications, it will be clear from the context that CBOR is | In many applications, it will be clear from the context that CBOR is | |||
being employed for encoding a data item. For instance, a specific | being employed for encoding a data item. For instance, a specific | |||
protocol might specify the use of CBOR, or a media type is indicated | protocol might specify the use of CBOR, or a media type is indicated | |||
skipping to change at page 25, line 39 ¶ | skipping to change at page 26, line 39 ¶ | |||
formats. An easy way for an encoder to help the decoder would be to | formats. An easy way for an encoder to help the decoder would be to | |||
tag the entire CBOR item with tag number 55799, the serialization of | tag the entire CBOR item with tag number 55799, the serialization of | |||
which will never be found at the beginning of a JSON text. | which will never be found at the beginning of a JSON text. | |||
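
A small Python sketch of the self-description check: tag number 55799 serializes as the three bytes 0xd9d9f7 (initial byte 0xd9 for a tag with a 16-bit argument, then 0xd9f7 = 55799). The function names are hypothetical, and the check is only a heuristic on the leading bytes, not a full decode.

```python
# Head of tag 55799: major type 6 with a 16-bit argument.
SELF_DESCRIBED_PREFIX = bytes.fromhex("d9d9f7")


def add_self_description(encoded_item: bytes) -> bytes:
    """Prepend the tag 55799 head to an already-encoded CBOR item."""
    return SELF_DESCRIBED_PREFIX + encoded_item


def looks_like_self_described_cbor(data: bytes) -> bool:
    """Return True if the data starts with the tag 55799 head."""
    return data.startswith(SELF_DESCRIBED_PREFIX)
```

Since a JSON text cannot begin with the byte 0xd9, `looks_like_self_described_cbor(b'{"a": 1}')` returns False.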
4. Serialization Considerations | 4. Serialization Considerations | |||
4.1. Preferred Serialization | 4.1. Preferred Serialization | |||
For some values at the data model level, CBOR provides multiple | For some values at the data model level, CBOR provides multiple | |||
serializations. For many applications, it is desirable that an | serializations. For many applications, it is desirable that an | |||
encoder always chooses a preferred serialization; however, the | encoder always chooses a preferred serialization (preferred | |||
present specification does not put the burden of enforcing this | encoding); however, the present specification does not put the burden | |||
preference on either encoder or decoder. | of enforcing this preference on either encoder or decoder. | |||
Some constrained decoders may be limited in their ability to decode | Some constrained decoders may be limited in their ability to decode | |||
non-preferred serializations: For example, if only integers below | non-preferred serializations: For example, if only integers below | |||
1_000_000_000 are expected in an application, the decoder may leave | 1_000_000_000 are expected in an application, the decoder may leave | |||
out the code that would be needed to decode 64-bit arguments in | out the code that would be needed to decode 64-bit arguments in | |||
integers. An encoder that always uses preferred serialization | integers. An encoder that always uses preferred serialization | |||
("preferred encoder") interoperates with this decoder for the numbers | ("preferred encoder") interoperates with this decoder for the numbers | |||
that can occur in this application. More generally speaking, it | that can occur in this application. More generally speaking, it | |||
therefore can be said that a preferred encoder is more universally | therefore can be said that a preferred encoder is more universally | |||
interoperable (and also less wasteful) than one that, say, always | interoperable (and also less wasteful) than one that, say, always | |||
skipping to change at page 26, line 41 ¶ | skipping to change at page 27, line 41 ¶ | |||
protocols are free to define what they mean by a "deterministic | protocols are free to define what they mean by a "deterministic | |||
format" and what encoders and decoders are expected to do. This | format" and what encoders and decoders are expected to do. This | |||
section defines a set of restrictions that can serve as the base of | section defines a set of restrictions that can serve as the base of | |||
such a deterministic format. | such a deterministic format. | |||
4.2.1. Core Deterministic Encoding Requirements | 4.2.1. Core Deterministic Encoding Requirements | |||
A CBOR encoding satisfies the "core deterministic encoding | A CBOR encoding satisfies the "core deterministic encoding | |||
requirements" if it satisfies the following restrictions: | requirements" if it satisfies the following restrictions: | |||
o Arguments (see Section 3) for integers, lengths in major types 2 | o Preferred serialization MUST be used. In particular, this means | |||
through 5, and tags MUST be as short as possible. In particular: | that arguments (see Section 3) for integers, lengths in major | |||
types 2 through 5, and tags MUST be as short as possible, for | ||||
instance: | ||||
* 0 to 23 and -1 to -24 MUST be expressed in the same byte as the | * 0 to 23 and -1 to -24 MUST be expressed in the same byte as the | |||
major type; | major type; | |||
* 24 to 255 and -25 to -256 MUST be expressed only with an | * 24 to 255 and -25 to -256 MUST be expressed only with an | |||
additional uint8_t; | additional uint8_t; | |||
* 256 to 65535 and -257 to -65536 MUST be expressed only with an | * 256 to 65535 and -257 to -65536 MUST be expressed only with an | |||
additional uint16_t; | additional uint16_t; | |||
* 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed | * 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed | |||
only with an additional uint32_t. | only with an additional uint32_t. | |||
Floating point values also MUST use the shortest form that | ||||
preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5 | ||||
as 0xfa49742408. | ||||
o Indefinite-length items MUST NOT appear. They can be encoded as | ||||
definite-length items instead. | ||||
o The keys in every map MUST be sorted in the bytewise lexicographic | o The keys in every map MUST be sorted in the bytewise lexicographic | |||
order of their deterministic encodings. For example, the | order of their deterministic encodings. For example, the | |||
following keys are sorted correctly: | following keys are sorted correctly: | |||
1. 10, encoded as 0x0a. | 1. 10, encoded as 0x0a. | |||
2. 100, encoded as 0x1864. | 2. 100, encoded as 0x1864. | |||
3. -1, encoded as 0x20. | 3. -1, encoded as 0x20. | |||
4. "z", encoded as 0x617a. | 4. "z", encoded as 0x617a. | |||
5. "aa", encoded as 0x626161. | 5. "aa", encoded as 0x626161. | |||
6. [100], encoded as 0x811864. | 6. [100], encoded as 0x811864. | |||
7. [-1], encoded as 0x8120. | 7. [-1], encoded as 0x8120. | |||
8. false, encoded as 0xf4. | 8. false, encoded as 0xf4. | |||
o Indefinite-length items MUST NOT appear. They can be encoded as | ||||
definite-length items instead. | ||||
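
The shortest-argument and key-ordering rules above can be sketched in Python as follows. This is an illustrative subset only (integers, text strings, arrays and booleans); the names `encode_head`, `encode_key` and `sort_keys_deterministically` are hypothetical.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Initial byte plus the shortest argument that can hold the value."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            initial = (major_type << 5) | additional_info
            return bytes([initial]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_key(value) -> bytes:
    """Encode a small subset of CBOR using definite lengths only."""
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        if value >= 0:
            return encode_head(0, value)   # major type 0: unsigned integer
        return encode_head(1, -1 - value)  # major type 1: negative integer
    if isinstance(value, str):
        utf8 = value.encode("utf-8")
        return encode_head(3, len(utf8)) + utf8
    if isinstance(value, list):
        items = b"".join(encode_key(item) for item in value)
        return encode_head(4, len(value)) + items
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def sort_keys_deterministically(keys):
    """Sort keys by the bytewise lexicographic order of their encodings."""
    return sorted(keys, key=encode_key)
```

Python compares `bytes` objects lexicographically byte by byte, so sorting on the encoded form gives the order from the example list: 10, 100, -1, "z", "aa", [100], [-1], false.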
4.2.2. Additional Deterministic Encoding Considerations | 4.2.2. Additional Deterministic Encoding Considerations | |||
If a protocol allows for IEEE floats, then additional deterministic | If a protocol allows for IEEE floats, then additional deterministic | |||
encoding rules might need to be added. One example rule might be to | encoding rules might need to be added. One example rule might be to | |||
have all floats start as a 64-bit float, then do a test conversion to | have all floats start as a 64-bit float, then do a test conversion to | |||
a 32-bit float; if the result is the same numeric value, use the | a 32-bit float; if the result is the same numeric value, use the | |||
shorter value and repeat the process with a test conversion to a | shorter value and repeat the process with a test conversion to a | |||
16-bit float. (This rule selects 16-bit float for positive and | 16-bit float. (This rule selects 16-bit float for positive and | |||
negative Infinity as well.) Although IEEE floats can represent both | negative Infinity as well.) Although IEEE floats can represent both | |||
positive and negative zero as distinct values, the application might | positive and negative zero as distinct values, the application might | |||
not distinguish these and might decide to represent all zero values | not distinguish these and might decide to represent all zero values | |||
with a positive sign, disallowing negative zero. Also, there are | with a positive sign, disallowing negative zero. Also, there are | |||
many representations for NaN. If NaN is an allowed value, it must | many representations for NaN. If NaN is an allowed value, it must | |||
always be represented as 0xf97e00. | always be represented as 0xf97e00. | |||
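
The example float rule above might be sketched in Python as follows. The sketch tries the 16-bit format first and then the 32-bit format; this picks the same width as narrowing step by step from 64 bits, because every value that is exact in 16 bits is also exact in 32 bits. The function name is hypothetical, and the handling of negative zero is left unchanged here, which is an assumption a protocol may want to revisit.

```python
import math
import struct


def encode_shortest_float(value: float) -> bytes:
    """Encode a float in the shortest IEEE 754 width that preserves its value."""
    if math.isnan(value):
        # All NaNs collapse to the single representation named in the text.
        return bytes.fromhex("f97e00")
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    # Fall back to the full 64-bit representation.
    return b"\xfb" + struct.pack(">d", value)
```

With this sketch, 1.5 encodes as 0xf93e00 and 1000000.5 as 0xfa49742408, matching the examples in Section 4.2.1, while 1.1 needs the full 64 bits.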
CBOR tags present additional considerations for deterministic | CBOR tags present additional considerations for deterministic | |||
encoding. The absence or presence of tags in a deterministic format | encoding. If a CBOR-based protocol were to provide the same | |||
is determined by the optionality of the tags in the protocol. In a | semantics for the presence and absence of a specific tag (e.g., by | |||
CBOR-based protocol that allows optional tagging anywhere, the | allowing both tag 1 data items and raw numbers in a date/time | |||
deterministic format must not allow them. In a protocol that | position, treating the latter as if they were tagged), the | |||
requires tags in certain places, the tag needs to appear in the | deterministic format would not allow them. In a protocol that | |||
deterministic format. A CBOR-based protocol that uses deterministic | requires tags in certain places to obtain specific semantics, the tag | |||
encoding might instead say that all tags that appear in a message | needs to appear in the deterministic format as well. | |||
must be retained regardless of whether they are optional. | ||||
Protocols that include floating, big integer, or other complex values | Protocols that include floating, big integer, or other complex values | |||
need to define extra requirements on their deterministic encodings. | need to define extra requirements on their deterministic encodings. | |||
For example: | For example: | |||
o If a protocol includes a field that can express floating-point | o If a protocol includes a field that can express floating-point | |||
values (Section 3.3), the protocol's deterministic encoding needs | values (Section 3.3), the protocol's deterministic encoding needs | |||
to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, | to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, | |||
0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for | 0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for | |||
this are: | this are: | |||
skipping to change at page 29, line 39 ¶ | skipping to change at page 30, line 48 ¶ | |||
6. [-1], encoded as 0x8120. | 6. [-1], encoded as 0x8120. | |||
7. "aa", encoded as 0x626161. | 7. "aa", encoded as 0x626161. | |||
8. [100], encoded as 0x811864. | 8. [100], encoded as 0x811864. | |||
(Although [RFC7049] used the term "Canonical CBOR" for its form of | (Although [RFC7049] used the term "Canonical CBOR" for its form of | |||
requirements on deterministic encoding, this document avoids this | requirements on deterministic encoding, this document avoids this | |||
term because "canonicalization" is often associated with specific | term because "canonicalization" is often associated with specific | |||
uses of deterministic encoding only. The terms are essentially | uses of deterministic encoding only. The terms are essentially | |||
exchangeable, however, and the set of core requirements in this | interchangeable, however, and the set of core requirements in this | |||
document could also be called "Canonical CBOR", while the length- | document could also be called "Canonical CBOR", while the length- | |||
first-ordered version of that could be called "Old Canonical CBOR".) | first-ordered version of that could be called "Old Canonical CBOR".) | |||
5. Creating CBOR-Based Protocols | 5. Creating CBOR-Based Protocols | |||
Data formats such as CBOR are often used in environments where there | Data formats such as CBOR are often used in environments where there | |||
is no format negotiation. A specific design goal of CBOR is to not | is no format negotiation. A specific design goal of CBOR is to not | |||
need any included or assumed schema: a decoder can take a CBOR item | need any included or assumed schema: a decoder can take a CBOR item | |||
and decode it with no other knowledge. | and decode it with no other knowledge. | |||
skipping to change at page 31, line 31 ¶ | skipping to change at page 32, line 38 ¶ | |||
registered at the time the encoder/decoder is written (Section 5.4). | registered at the time the encoder/decoder is written (Section 5.4). | |||
Generic decoders provide ways to present well-formed CBOR values, | Generic decoders provide ways to present well-formed CBOR values, | |||
both valid and invalid, to an application. The diagnostic notation | both valid and invalid, to an application. The diagnostic notation | |||
(Section 8) may be used to present well-formed CBOR values to humans. | (Section 8) may be used to present well-formed CBOR values to humans. | |||
Generic encoders provide an application interface that allows the | Generic encoders provide an application interface that allows the | |||
application to specify any well-formed value, including simple values | application to specify any well-formed value, including simple values | |||
and tags unknown to the encoder. | and tags unknown to the encoder. | |||
5.3. Invalid Items | 5.3. Validity of Items | |||
A well-formed but invalid CBOR data item presents a problem with | A well-formed but invalid CBOR data item presents a problem with | |||
interpreting the data encoded in it in the CBOR data model. A CBOR- | interpreting the data encoded in it in the CBOR data model. A CBOR- | |||
based protocol could be specified in several layers, in which the | based protocol could be specified in several layers, in which the | |||
lower layers don't process the semantics of some of the CBOR data | lower layers don't process the semantics of some of the CBOR data | |||
they forward. These layers can't notice the invalidity in data they | they forward. These layers can't notice any validity errors in data | |||
don't process and MUST forward that data as-is. The first layer that | they don't process and MUST forward that data as-is. The first layer | |||
does process the semantics of an invalid CBOR item MUST take one of | that does process the semantics of an invalid CBOR item MUST take one | |||
two choices: | of two choices: | |||
1. Replace the problematic item with an error marker and continue | 1. Replace the problematic item with an error marker and continue | |||
with the next item, or | with the next item, or | |||
2. Issue an error and stop processing altogether. | 2. Issue an error and stop processing altogether. | |||
A CBOR-based protocol MUST specify which of these options its | A CBOR-based protocol MUST specify which of these options its | |||
decoders take, for each kind of invalid item they might encounter. | decoders take, for each kind of invalid item they might encounter. | |||
Such problems might include: | Such problems might occur at the basic validity level of CBOR or in | |||
the context of tags (tag validity). | ||||
5.3.1. Basic validity | ||||
Duplicate keys in a map: Generic decoders (Section 5.2) make data | Duplicate keys in a map: Generic decoders (Section 5.2) make data | |||
available to applications using the native CBOR data model. That | available to applications using the native CBOR data model. That | |||
data model includes maps (key-value mappings with unique keys), | data model includes maps (key-value mappings with unique keys), | |||
not multimaps (key-value mappings where multiple entries can have | not multimaps (key-value mappings where multiple entries can have | |||
the same key). Thus, a generic decoder that gets a CBOR map item | the same key). Thus, a generic decoder that gets a CBOR map item | |||
that has duplicate keys will decode to a map with only one | that has duplicate keys will decode to a map with only one | |||
instance of that key, or it might stop processing altogether. On | instance of that key, or it might stop processing altogether. On | |||
the other hand, a "streaming decoder" may not even be able to | the other hand, a "streaming decoder" may not even be able to | |||
notice (Section 5.6). | notice (Section 5.6). | |||
Inadmissible type on the value enclosed by a tag: Tags (Section 3.4) | ||||
specify what type of data item is supposed to be enclosed by the | ||||
tag; for example, the tags for positive or negative bignums are | ||||
supposed to be put on byte strings. A decoder that decodes the | ||||
tagged data item into a native representation (a native big | ||||
integer in this example) is expected to check the type of the data | ||||
item being tagged. Even decoders that don't have such native | ||||
representations available in their environment may perform the | ||||
check on those tags known to them and react appropriately. | ||||
Invalid UTF-8 string: A decoder might or might not want to verify | Invalid UTF-8 string: A decoder might or might not want to verify | |||
that the sequence of bytes in a UTF-8 string (major type 3) is | that the sequence of bytes in a UTF-8 string (major type 3) is | |||
actually valid UTF-8 and react appropriately. | actually valid UTF-8 and react appropriately. | |||
5.4. Handling Unknown Simple Values and Tags | 5.3.2. Tag validity | |||
Inadmissible type for tag content: Tags (Section 3.4) specify what | ||||
type of data item is supposed to be enclosed by the tag; for | ||||
example, the tags for positive or negative bignums are supposed to | ||||
be put on byte strings. A decoder that decodes the tagged data | ||||
item into a native representation (a native big integer in this | ||||
example) is expected to check the type of the data item being | ||||
tagged. Even decoders that don't have such native representations | ||||
available in their environment may perform the check on those tags | ||||
known to them and react appropriately. | ||||
Inadmissible value for tag content: The type of data item may be | ||||
admissible for a tag's content, but the specific value may not be; | ||||
e.g., a value of "yesterday" is not acceptable for the content of | ||||
tag 0, even though it properly is a text string. A decoder that | ||||
normally ingests such tags into equivalent platform types might | ||||
present this tag to the application in a similar way to how it | ||||
would present a tag with an unknown tag number (Section 5.4). | ||||
5.4. Handling Unknown Simple Values and Tag numbers | ||||
A decoder that comes across a simple value (Section 3.3) that it does | A decoder that comes across a simple value (Section 3.3) that it does | |||
not recognize, such as a value that was added to the IANA registry | not recognize, such as a value that was added to the IANA registry | |||
after the decoder was deployed or a value that the decoder chose not | after the decoder was deployed or a value that the decoder chose not | |||
to implement, might issue a warning, might stop processing | to implement, might issue a warning, might stop processing | |||
altogether, might handle the error by making the unknown value | altogether, might handle the error by making the unknown value | |||
available to the application as such (as is expected of generic | available to the application as such (as is expected of generic | |||
decoders), or take some other type of action. | decoders), or take some other type of action. | |||
A decoder that comes across a tag number (Section 3.4) that it does | A decoder that comes across a tag number (Section 3.4) that it does | |||
not recognize, such as a tag number that was added to the IANA | not recognize, such as a tag number that was added to the IANA | |||
registry after the decoder was deployed or a tag number that the | registry after the decoder was deployed or a tag number that the | |||
decoder chose not to implement, might issue a warning, might stop | decoder chose not to implement, might issue a warning, might stop | |||
processing altogether, might handle the error and present the unknown | processing altogether, might handle the error and present the unknown | |||
tag number together with the enclosed data item to the application | tag number together with the enclosed data item to the application | |||
(as is expected of generic decoders), might ignore the tag and simply | (as is expected of generic decoders), or take some other type of | |||
present the contained data item only to the application, or take some | action. | |||
other type of action. | ||||
5.5. Numbers | 5.5. Numbers | |||
CBOR-based protocols should take into account that different language | CBOR-based protocols should take into account that different language | |||
environments pose different restrictions on the range and precision | environments pose different restrictions on the range and precision | |||
of numbers that are representable. For example, the JavaScript | of numbers that are representable. For example, the JavaScript | |||
number system treats all numbers as floating point, which may result | number system treats all numbers as floating point, which may result | |||
in silent loss of precision in decoding integers with more than 53 | in silent loss of precision in decoding integers with more than 53 | |||
significant bits. A protocol that uses numbers should define its | significant bits. A protocol that uses numbers should define its | |||
expectations on the handling of non-trivial numbers in decoders and | expectations on the handling of non-trivial numbers in decoders and | |||
skipping to change at page 34, line 27 ¶ | skipping to change at page 35, line 38 ¶ | |||
the enclosing data item is completely available ("streaming encoder") | the enclosing data item is completely available ("streaming encoder") | |||
may want to reduce its overhead significantly by relying on its data | may want to reduce its overhead significantly by relying on its data | |||
source to maintain uniqueness. | source to maintain uniqueness. | |||
A CBOR-based protocol MUST define what to do when a receiving | A CBOR-based protocol MUST define what to do when a receiving | |||
application does see multiple identical keys in a map. The resulting | application does see multiple identical keys in a map. The resulting | |||
rule in the protocol MUST respect the CBOR data model: it cannot | rule in the protocol MUST respect the CBOR data model: it cannot | |||
prescribe a specific handling of the entries with the identical keys, | prescribe a specific handling of the entries with the identical keys, | |||
except that it might have a rule that having identical keys in a map | except that it might have a rule that having identical keys in a map | |||
indicates a malformed map and that the decoder has to stop with an | indicates a malformed map and that the decoder has to stop with an | |||
error. Duplicate keys are also prohibited by CBOR decoders that are | error. Duplicate keys are also prohibited by CBOR decoders that | |||
using strict mode (Section 5.8). | enforce validity (Section 5.8). | |||
The CBOR data model for maps does not allow ascribing semantics to | The CBOR data model for maps does not allow ascribing semantics to | |||
the order of the key/value pairs in the map representation. Thus, a | the order of the key/value pairs in the map representation. Thus, a | |||
CBOR-based protocol MUST NOT specify that changing the key/value pair | CBOR-based protocol MUST NOT specify that changing the key/value pair | |||
order in a map would change the semantics, except to specify that | order in a map would change the semantics, except to specify that | |||
some orders are disallowed, for example where they would not meet | some orders are disallowed, for example where they would not meet | |||
the requirements of a deterministic encoding (Section 4.2). (Any | the requirements of a deterministic encoding (Section 4.2). (Any | |||
secondary effects of map ordering such as on timing, cache usage, and | secondary effects of map ordering such as on timing, cache usage, and | |||
other potential side channels are not considered part of the | other potential side channels are not considered part of the | |||
semantics but may be enough reason on their own for a protocol to | semantics but may be enough reason on their own for a protocol to | |||
require a deterministic encoding format.) | require a deterministic encoding format.) | |||
Applications for constrained devices that have maps where a small | ||||
Applications for constrained devices that have maps with 24 or fewer | number of frequently used keys can be identified should consider | |||
frequently used keys should consider using small integers (and those | using small integers as keys; for instance, a set of 24 or fewer | |||
with up to 48 frequently used keys should consider also using small | frequent keys can be encoded in a single byte as unsigned integers, | |||
negative integers) because the keys can then be encoded in a single | up to 48 if negative integers are also used. Less frequently | |||
byte. | occurring keys can then use integers with longer encodings. | |||
5.6.1. Equivalence of Keys | 5.6.1. Equivalence of Keys | |||
The specific data model applying to a CBOR data item is used to | The specific data model applying to a CBOR data item is used to | |||
determine whether keys occurring in maps are duplicates or distinct. | determine whether keys occurring in maps are duplicates or distinct. | |||
At the generic data model level, numerically equivalent integer and | At the generic data model level, numerically equivalent integer and | |||
floating-point values are distinct from each other, as they are from | floating-point values are distinct from each other, as they are from | |||
the various big numbers (Tags 2 to 5). Similarly, text strings are | the various big numbers (Tags 2 to 5). Similarly, text strings are | |||
distinct from byte strings, even if composed of the same bytes. A | distinct from byte strings, even if composed of the same bytes. A | |||
tagged value is distinct from an untagged value or from a value | tagged value is distinct from an untagged value or from a value | |||
tagged with a different tag. | tagged with a different tag number. | |||
Within each of these groups, numeric values are distinct unless they | Within each of these groups, numeric values are distinct unless they | |||
are numerically equal (specifically, -0.0 is equal to 0.0); for the | are numerically equal (specifically, -0.0 is equal to 0.0); for the | |||
purpose of map key equivalence, NaN (not a number) values are | purpose of map key equivalence, NaN (not a number) values are | |||
equivalent if they have the same significand after zero-extending | equivalent if they have the same significand after zero-extending | |||
both significands at the right to 64 bits. | both significands at the right to 64 bits. | |||
(Byte and text) strings are compared byte by byte, arrays element by | (Byte and text) strings are compared byte by byte, arrays element by | |||
element, and are equal if they have the same number of bytes/elements | element, and are equal if they have the same number of bytes/elements | |||
and the same values at the same positions. Two maps are equal if | and the same values at the same positions. Two maps are equal if | |||
they have the same set of pairs regardless of their order; pairs are | they have the same set of pairs regardless of their order; pairs are | |||
equal if both the key and value are equal. | equal if both the key and value are equal. | |||
Tagged values are equal if both the tag number and the enclosed item | Tagged values are equal if both the tag number and the enclosed item | |||
are equal. Simple values are equal if they simply have the same | are equal. (Note that a generic decoder that provides processing for | |||
value. Nothing else is equal in the generic data model; a simple | a specific tag may not be able to distinguish some semantically | |||
value 2 is not equivalent to an integer 2, and an array is never | equivalent values, e.g. if leading zeroes occur in the content of tag | |||
equivalent to a map. | 2/3 (Section 3.4.4).) Simple values are equal if they simply have | |||
the same value. Nothing else is equal in the generic data model; a | ||||
simple value 2 is not equivalent to an integer 2, and an array is | ||||
never equivalent to a map. | ||||
As discussed in Section 2.2, specific data models can make values | As discussed in Section 2.2, specific data models can make values | |||
equivalent for the purpose of comparing map keys that are distinct in | equivalent for the purpose of comparing map keys that are distinct in | |||
the generic data model. Note that this implies that a generic | the generic data model. Note that this implies that a generic | |||
decoder may deliver a decoded map to an application that needs to be | decoder may deliver a decoded map to an application that needs to be | |||
checked for duplicate map keys by that application (alternatively, | checked for duplicate map keys by that application (alternatively, | |||
the decoder may provide a programming interface to perform this | the decoder may provide a programming interface to perform this | |||
service for the application). Specific data models cannot | service for the application). Specific data models cannot | |||
distinguish values for map keys that are equal for this purpose at | distinguish values for map keys that are equal for this purpose at | |||
the generic data model level. | the generic data model level. | |||
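
The equality rules above can be sketched in Python. This is a minimal illustration, not part of the draft: the classes Tag, Simple, and Map and the function cbor_equal are hypothetical names, maps are assumed to be free of duplicate keys, and the floating-point subtleties discussed earlier (NaN payloads, signed zeros) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:          # hypothetical wrapper for a tagged value
    number: int
    content: object


@dataclass(frozen=True)
class Simple:       # hypothetical wrapper for a simple value
    value: int


@dataclass(frozen=True)
class Map:          # pairs kept as a tuple so that keys may be any data item
    pairs: tuple


def cbor_equal(a, b):
    """Compare two decoded items under the generic data model (sketch)."""
    # Items of different kinds are never equal: integer 2 vs. simple
    # value 2, integer 2 vs. float 2.0, array vs. map, bytes vs. text.
    if type(a) is not type(b):
        return False
    if isinstance(a, list):
        # Arrays: same number of elements, equal values at equal positions.
        return len(a) == len(b) and all(cbor_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, Map):
        # Maps: same set of pairs regardless of order.
        return len(a.pairs) == len(b.pairs) and all(
            any(cbor_equal(ka, kb) and cbor_equal(va, vb) for kb, vb in b.pairs)
            for ka, va in a.pairs
        )
    if isinstance(a, Tag):
        # Tagged values: both tag number and enclosed item must be equal.
        return a.number == b.number and cbor_equal(a.content, b.content)
    # Integers, strings, and simple values compare by value.
    return a == b
```

The pairwise map comparison is quadratic; it is chosen here only because it works for keys that are not hashable, such as arrays.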
5.7. Undefined Values | 5.7. Undefined Values | |||
In some CBOR-based protocols, the simple value (Section 3.3) of | In some CBOR-based protocols, the simple value (Section 3.3) of | |||
Undefined might be used by an encoder as a substitute for a data item | Undefined might be used by an encoder as a substitute for a data item | |||
with an encoding problem, in order to allow the rest of the enclosing | with an encoding problem, in order to allow the rest of the enclosing | |||
data items to be encoded without harm. | data items to be encoded without harm. | |||
5.8. Strict Decoding Mode | 5.8. Validity Checking and Robustness | |||
Some areas of application of CBOR do not require deterministic | Some areas of application of CBOR do not require deterministic | |||
encoding (Section 4.2) but may require that different decoders reach | encoding (Section 4.2) but may require that different decoders reach | |||
the same (semantically equivalent) results, even in the presence of | the same (semantically equivalent) results, even in the presence of | |||
potentially malicious data. This can be required if one application | potentially malicious data. This can be required if one application | |||
(such as a firewall or other protecting entity) makes a decision | (such as a firewall or other protecting entity) makes a decision | |||
based on the data that another application, which independently | based on the data that another application, which independently | |||
decodes the data, relies on. | decodes the data, relies on. | |||
Normally, it is the responsibility of the sender to avoid ambiguously | Normally, it is the responsibility of the sender to avoid ambiguously | |||
decodable data. However, the sender might be an attacker specially | decodable data. However, the sender might be an attacker specially | |||
making up CBOR data such that it will be interpreted differently by | making up CBOR data such that it will be interpreted differently by | |||
different decoders in an attempt to exploit that as a vulnerability. | different decoders in an attempt to exploit that as a vulnerability. | |||
Generic decoders used in applications where this might be a problem | Generic decoders used in applications where this might be a problem | |||
need to support a strict mode in which it is also the responsibility | can help by providing a validity-checking mode in which it is also | |||
of the receiver to reject ambiguously decodable data. It is expected | the responsibility of the generic decoder to reject invalid data. It | |||
that firewalls and other security systems that decode CBOR will only | is expected that firewalls and other security systems that decode | |||
decode in strict mode. | CBOR will employ their decoders with validity checking applied. | |||
A decoder in strict mode will reliably reject any data that could be | A decoder with validity checking will expend the effort to reliably | |||
interpreted by other decoders in different ways. It will expend the | detect invalid data items (Section 5.3). For example, such a decoder | |||
effort to reliably detect invalid data items (Section 5.3). For | needs to have an API that reports an error (and does not return data) | |||
example, a strict decoder needs to have an API that reports an error | for a CBOR data item that contains any of the following: | |||
(and does not return data) for a CBOR data item that contains any of | ||||
the following: | ||||
o a map (major type 5) that has more than one entry with the same | o a map (major type 5) that has more than one entry with the same | |||
key | key | |||
o a tag that is used on a data item of the incorrect type | o a tag that is used on a data item of the incorrect type | |||
o a data item that is incorrectly formatted for the type given to | o a data item that is incorrectly formatted for the type given to | |||
it, such as invalid UTF-8 or data that cannot be interpreted with | it, such as invalid UTF-8 in a text string or data that (even if | |||
the specific tag number that it has been tagged with | of the correct type) cannot be interpreted with the specific tag | |||
number that it has been tagged with | ||||
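
Two of the checks listed above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: ValidityError, check_text_string, and check_map_pairs are hypothetical names, the map is assumed to arrive as a list of already decoded (key, value) pairs, and keys are assumed to be hashable Python values.

```python
class ValidityError(Exception):
    """Reported instead of returning data (hypothetical error type)."""


def check_text_string(raw: bytes) -> str:
    """Reject a text string (major type 3) whose content is not valid UTF-8."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValidityError("invalid UTF-8 in text string") from exc


def check_map_pairs(pairs):
    """Reject a map (major type 5) with more than one entry per key."""
    seen = set()
    for key, _value in pairs:
        # Include the Python type so that integer 1 and float 1.0, which
        # are distinct in the generic data model, are not conflated.
        marker = (type(key), key)
        if marker in seen:
            raise ValidityError(f"duplicate map key: {key!r}")
        seen.add(marker)
    return pairs
```

The set-based duplicate check costs extra memory proportional to the number of keys, which is one reason such checking may have an appreciable cost.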
A decoder in strict mode can do one of two things when it encounters | A validity-checking decoder can do one of two things when it | |||
a tag number or simple value that it does not recognize: | encounters a tag number or simple value that it does not recognize: | |||
o It can report an error (and not return data). | o It can report an error (and not return data). | |||
o It can emit the unknown item (type, value, and, for tags, the | o It can emit the unknown item (type, value, and, for tags, the | |||
decoded tagged data item) to the application calling the decoder | decoded tagged data item) to the application calling the decoder, | |||
with an indication that the decoder did not recognize that tag | with an indication that the decoder did not recognize that tag | |||
number or simple value. | number or simple value. | |||
The latter approach, which is also appropriate for non-strict | The latter approach, which is also appropriate for decoders that do | |||
decoders, supports forward compatibility with newly registered tags | not support validity checking, provides forward compatibility with | |||
and simple values without the requirement to update the encoder at | newly registered tags and simple values without the requirement to | |||
the same time as the calling application. (For this, the API for the | update the encoder at the same time as the calling application. (For | |||
decoder needs to have a way to mark unknown items so that the calling | this, the API for the decoder needs to have a way to mark unknown | |||
application can handle them in a manner appropriate for the program.) | items so that the calling application can handle them in a manner | |||
Since some of this processing may have an appreciable cost (in | appropriate for the program.) | |||
particular with duplicate detection for maps), support of strict mode | ||||
is not a requirement placed on all CBOR decoders. | Since some of the processing needed for validity checking may have an | |||
appreciable cost (in particular with duplicate detection for maps), | ||||
support of validity checking is not a requirement placed on all CBOR | ||||
decoders. | ||||
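
The two options for unrecognized tag numbers described above can be sketched as a small decoder hook. All names here (KNOWN_TAGS, UnknownTag, UnknownTagError, deliver_tag, reject_unknown) are hypothetical and only illustrate one possible API shape.

```python
from dataclasses import dataclass

# Hypothetical set of tag numbers this decoder knows how to process.
KNOWN_TAGS = {0, 1, 2, 3}


@dataclass(frozen=True)
class UnknownTag:
    """Marks a tag the decoder did not recognize, keeping its content."""
    number: int
    content: object


class UnknownTagError(Exception):
    """Raised when the caller asked for unknown tags to be rejected."""


def deliver_tag(number, content, reject_unknown=False):
    """Hand a decoded tag to the application (sketch)."""
    if number in KNOWN_TAGS:
        return ("tag", number, content)
    if reject_unknown:
        # First option: report an error and do not return data.
        raise UnknownTagError(f"unrecognized tag number {number}")
    # Second option: emit the item with an explicit "unknown" marking so
    # that the calling application can decide how to handle it.
    return UnknownTag(number, content)
```

With the second option, an application that learns about a newly registered tag can handle UnknownTag items itself without waiting for a decoder update.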
Some encoders will rely on their applications to provide input data | Some encoders will rely on their applications to provide input data | |||
in such a way that unambiguously decodable CBOR results. A generic | in such a way that valid CBOR results. A generic encoder also may | |||
encoder also may want to provide a strict mode where it reliably | want to provide a validity-checking mode where it reliably limits its | |||
limits its output to unambiguously decodable CBOR, independent of | output to valid CBOR, independent of whether or not its application | |||
whether or not its application is providing API-conformant data. | is providing API-conformant data. | |||
6. Converting Data between CBOR and JSON | 6. Converting Data between CBOR and JSON | |||
This section gives non-normative advice about converting between CBOR | This section gives non-normative advice about converting between CBOR | |||
and JSON. Implementations of converters are free to use whichever | and JSON. Implementations of converters are free to use whichever | |||
advice here they want. | advice here they want. | |||
It is worth noting that a JSON text is a sequence of characters, not | It is worth noting that a JSON text is a sequence of characters, not | |||
an encoded sequence of bytes, while a CBOR data item consists of | an encoded sequence of bytes, while a CBOR data item consists of | |||
bytes, not characters. | bytes, not characters. | |||
skipping to change at page 38, line 51 ¶ | skipping to change at page 40, line 21 ¶ | |||
6.2. Converting from JSON to CBOR | 6.2. Converting from JSON to CBOR | |||
All JSON values, once decoded, directly map into one or more CBOR | All JSON values, once decoded, directly map into one or more CBOR | |||
values. As with any kind of CBOR generation, decisions have to be | values. As with any kind of CBOR generation, decisions have to be | |||
made with respect to number representation. In a suggested | made with respect to number representation. In a suggested | |||
conversion: | conversion: | |||
o JSON numbers without fractional parts (integer numbers) are | o JSON numbers without fractional parts (integer numbers) are | |||
represented as integers (major types 0 and 1, possibly major type | represented as integers (major types 0 and 1, possibly major type | |||
6 tag number 2 and 3), choosing the shortest form; integers longer | 6 tag number 2 and 3), choosing the shortest form; integers longer | |||
than an implementation-defined threshold (which is usually either | than an implementation-defined threshold may instead be | |||
32 or 64 bits) may instead be represented as floating-point | represented as floating-point values. The default range that is | |||
values. (If the JSON was generated from a JavaScript | represented as integer is -2**53+1..2**53-1 (fully exploiting the | |||
implementation, its precision is already limited to 53 bits | range for exact integers in the binary64 representation often used | |||
maximum.) | for decoding JSON [RFC7493]), implementations may choose | |||
-2**32..2**32-1 or -2**64..2**64-1 (fully using the integer ranges | ||||
available in CBOR with uint32_t or uint64_t, respectively) or even | ||||
-2**31..2**31-1 or -2**63..2**63-1 (using popular ranges for two's | ||||
complement signed integers). (If the JSON was generated from a | ||||
JavaScript implementation, its precision is already limited to 53 | ||||
bits maximum.) | ||||
o Numbers with fractional parts are represented as floating-point | o Numbers with fractional parts are represented as floating-point | |||
values. Preferably, the shortest exact floating-point | values, performing the decimal-to-binary conversion based on the | |||
representation is used; for instance, 1.5 is represented in a | precision provided by IEEE 754 binary64. Then, when encoding in | |||
16-bit floating-point value (not all implementations will be | CBOR, the preferred serialization uses the shortest floating-point | |||
capable of efficiently finding the minimum form, though). There | representation exactly representing this conversion result; for | |||
may be an implementation-defined limit to the precision that will | instance, 1.5 is represented in a 16-bit floating-point value (not | |||
affect the precision of the represented values. Decimal | all implementations will be capable of efficiently finding the | |||
representation should only be used if that is specified in a | minimum form, though). Instead of using the default binary64 | |||
protocol. | precision, there may be an implementation-defined limit to the | |||
precision of the conversion that will affect the precision of the | ||||
represented values. Decimal representation should only be used on | ||||
the CBOR side if that is specified in a protocol. | ||||
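
The suggested number conversion can be sketched in Python. The function names are hypothetical, and the sketch assumes the default integer range of -2**53+1..2**53-1 and treats any JSON number literal containing ".", "e", or "E" as non-integer, which is a simplification.

```python
import struct


def encode_int(n: int) -> bytes:
    """Encode an integer as major type 0 or 1 with the shortest head."""
    major = 0 if n >= 0 else 1
    arg = n if n >= 0 else -1 - n
    if arg < 24:
        return bytes([major << 5 | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | ai]) + arg.to_bytes(size, "big")
    raise ValueError("integer needs a bignum (tag 2/3), not handled here")


def shortest_float_encoding(x: float) -> bytes:
    """Pick the shortest of binary16/32/64 that represents x exactly."""
    for initial, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            continue  # magnitude too large for this format
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([initial]) + packed
    return bytes([0xFB]) + struct.pack(">d", x)


def convert_json_number(text: str) -> bytes:
    """Convert one JSON number literal to encoded CBOR (sketch)."""
    if not any(c in text for c in ".eE"):
        n = int(text)
        if -(2**53) + 1 <= n <= 2**53 - 1:
            return encode_int(n)
    # Decimal-to-binary conversion at binary64 precision, then shortening.
    return shortest_float_encoding(float(text))
```

For instance, convert_json_number("1.5") yields the three bytes f9 3e 00, while "1.1" needs the full 64-bit form because no shorter format represents the binary64 result exactly.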
CBOR has been designed to generally provide a more compact encoding | CBOR has been designed to generally provide a more compact encoding | |||
than JSON. One implementation strategy that might come to mind is to | than JSON. One implementation strategy that might come to mind is to | |||
perform a JSON-to-CBOR encoding in place in a single buffer. This | perform a JSON-to-CBOR encoding in place in a single buffer. This | |||
strategy would need to carefully consider a number of pathological | strategy would need to carefully consider a number of pathological | |||
cases, such as that some strings represented with no or very few | cases, such as that some strings represented with no or very few | |||
escapes and longer (or much longer) than 255 bytes may expand when | escapes and longer (or much longer) than 255 bytes may expand when | |||
encoded as UTF-8 strings in CBOR. Similarly, a few of the binary | encoded as UTF-8 strings in CBOR. Similarly, a few of the binary | |||
floating-point representations might cause expansion from some short | floating-point representations might cause expansion from some short | |||
decimal representations (1.1, 1e9) in JSON. This may be hard to get | decimal representations (1.1, 1e9) in JSON. This may be hard to get | |||
skipping to change at page 41, line 43 ¶ | skipping to change at page 43, line 18 ¶ | |||
RFC 8259, extending it where needed. | RFC 8259, extending it where needed. | |||
The notation borrows the JSON syntax for numbers (integer and | The notation borrows the JSON syntax for numbers (integer and | |||
floating point), True (>true<), False (>false<), Null (>null<), UTF-8 | floating point), True (>true<), False (>false<), Null (>null<), UTF-8 | |||
strings, arrays, and maps (maps are called objects in JSON; the | strings, arrays, and maps (maps are called objects in JSON; the | |||
diagnostic notation extends JSON here by allowing any data item in | diagnostic notation extends JSON here by allowing any data item in | |||
the key position). Undefined is written >undefined< as in | the key position). Undefined is written >undefined< as in | |||
JavaScript. The non-finite floating-point numbers Infinity, | JavaScript. The non-finite floating-point numbers Infinity, | |||
-Infinity, and NaN are written exactly as in this sentence (this is | -Infinity, and NaN are written exactly as in this sentence (this is | |||
also a way they can be written in JavaScript, although JSON does not | also a way they can be written in JavaScript, although JSON does not | |||
allow them). A tagged item is written as an integer number for the | allow them). A tag is written as an integer number for the tag | |||
tag, followed by the item in parentheses; for instance, an RFC 3339 | number, followed by the tag content in parentheses; for instance, an | |||
(ISO 8601) date could be notated as: | RFC 3339 (ISO 8601) date could be notated as: | |||
0("2013-03-21T20:04:00Z") | 0("2013-03-21T20:04:00Z") | |||
or the equivalent relative time as | or the equivalent relative time as | |||
1(1363896240) | 1(1363896240) | |||
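
The two notations above can be reproduced with a short Python sketch. The helper diag_tag is a hypothetical name and handles only text-string and integer tag content; it is meant to show the shape of the notation, not to be a full diagnostic printer.

```python
import json
from datetime import datetime, timezone


def diag_tag(tag_number: int, content) -> str:
    """Render a tag as 'number(content)' in diagnostic notation (sketch)."""
    if isinstance(content, str):
        body = json.dumps(content)  # text strings use JSON string syntax
    else:
        body = repr(content)        # integers print as plain decimal
    return f"{tag_number}({body})"


# The date string from the example, converted to seconds since the epoch.
text = "2013-03-21T20:04:00Z"
moment = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
epoch = int(moment.timestamp())

print(diag_tag(0, text))
print(diag_tag(1, epoch))
```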
Byte strings are notated in one of the base encodings, without | Byte strings are notated in one of the base encodings, without | |||
padding, enclosed in single quotes, prefixed by >h< for base16, >b32< | padding, enclosed in single quotes, prefixed by >h< for base16, >b32< | |||
for base32, >h32< for base32hex, >b64< for base64 or base64url (the | for base32, >h32< for base32hex, >b64< for base64 or base64url (the | |||
skipping to change at page 47, line 4 ¶ | skipping to change at page 48, line 41 ¶ | |||
The input check itself may consume resources. This is usually linear | The input check itself may consume resources. This is usually linear | |||
in the size of the input, which means that an attacker has to spend | in the size of the input, which means that an attacker has to spend | |||
resources that are commensurate to the resources spent by the | resources that are commensurate to the resources spent by the | |||
defender on input validation. Processing for arbitrary-precision | defender on input validation. Processing for arbitrary-precision | |||
numbers may exceed linear effort. Also, some hash-table | numbers may exceed linear effort. Also, some hash-table | |||
implementations that are used by decoders to build in-memory | implementations that are used by decoders to build in-memory | |||
representations of maps can be attacked to spend quadratic effort, | representations of maps can be attacked to spend quadratic effort, | |||
unless a secret key is employed (see Section 7 of [SIPHASH]). Such | unless a secret key is employed (see Section 7 of [SIPHASH]). Such | |||
superlinear efforts can be employed by an attacker to exhaust | superlinear efforts can be employed by an attacker to exhaust | |||
resources at or before the input validator; they therefore need to be | resources at or before the input validator; they therefore need to be | |||
avoided in a CBOR decoder implementation. Note that Tag number | avoided in a CBOR decoder implementation. Note that tag number | |||
definitions and their implementations can add security considerations | definitions and their implementations can add security considerations | |||
of this kind; this should then be discussed in the security | of this kind; this should then be discussed in the security | |||
considerations of the Tag number definition. | considerations of the tag number definition. | |||
CBOR encoders do not receive input directly from the network and are | CBOR encoders do not receive input directly from the network and are | |||
thus not directly attackable in the same way as CBOR decoders. | thus not directly attackable in the same way as CBOR decoders. | |||
However, CBOR encoders often have an API that takes input from | However, CBOR encoders often have an API that takes input from | |||
another level in the implementation and can be attacked through that | another level in the implementation and can be attacked through that | |||
API. The design and implementation of that API should assume the | API. The design and implementation of that API should assume the | |||
behavior of its caller may be based on hostile input or on coding | behavior of its caller may be based on hostile input or on coding | |||
mistakes. It should check inputs for buffer overruns, overflow and | mistakes. It should check inputs for buffer overruns, overflow and | |||
underflow of integer arithmetic, and other such errors that are aimed | underflow of integer arithmetic, and other such errors that are aimed | |||
to disrupt the encoder. | to disrupt the encoder. | |||
Protocols that are used in a security context should be defined in | Protocols should be defined in such a way that potential multiple | |||
such a way that potential multiple interpretations are reliably | interpretations are reliably reduced to a single interpretation. For | |||
reduced to a single interpretation. For example, an attacker could | example, an attacker could make use of invalid input such as | |||
make use of invalid input such as duplicate keys in maps, or exploit | duplicate keys in maps, or exploit different precision in processing | |||
different precision in processing numbers to make one application | numbers to make one application base its decisions on a different | |||
base its decisions on a different interpretation than the one that | interpretation than the one that will be used by a second | |||
will be used by a second application. To facilitate consistent | application. To facilitate consistent interpretation, encoder and | |||
interpretation, encoder and decoder implementations used in such | decoder implementations should provide a validity checking mode of | |||
contexts should provide at least one strict mode of operation | operation (Section 5.8). Note, however, that a generic decoder | |||
(Section 5.8). | cannot know about all requirements that an application poses on its | |||
input data; it is therefore not relieving the application from | ||||
performing its own input checking. Also, since the set of defined | ||||
tag numbers evolves, the application may employ a tag number that is | ||||
not yet supported for validity checking by the generic decoder it | ||||
uses. Generic decoders therefore need to document which | ||||
tag numbers they support and what validity checking they can provide | ||||
for each of them as well as for basic CBOR validity (UTF-8 checking, | ||||
duplicate map key checking). | ||||
11. References | 11. References | |||
11.1. Normative References | 11.1. Normative References | |||
[ECMA262] Ecma International, "ECMAScript 2018 Language | [ECMA262] Ecma International, "ECMAScript 2018 Language | |||
Specification", ECMA Standard ECMA-262, 9th Edition, June | Specification", ECMA Standard ECMA-262, 9th Edition, June | |||
2018, <https://www.ecma- | 2018, <https://www.ecma- | |||
international.org/publications/files/ECMA-ST/ | international.org/publications/files/ECMA-ST/Ecma- | |||
Ecma-262.pdf>. | 262.pdf>. | |||
[IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE | [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE | |||
Std 754-2008. | Std 754-2008. | |||
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | |||
Extensions (MIME) Part One: Format of Internet Message | Extensions (MIME) Part One: Format of Internet Message | |||
Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, | Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, | |||
<https://www.rfc-editor.org/info/rfc2045>. | <https://www.rfc-editor.org/info/rfc2045>. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
skipping to change at page 49, line 5 ¶ | skipping to change at page 50, line 48 ¶ | |||
[ASN.1] International Telecommunication Union, "Information | [ASN.1] International Telecommunication Union, "Information | |||
Technology -- ASN.1 encoding rules: Specification of Basic | Technology -- ASN.1 encoding rules: Specification of Basic | |||
Encoding Rules (BER), Canonical Encoding Rules (CER) and | Encoding Rules (BER), Canonical Encoding Rules (CER) and | |||
Distinguished Encoding Rules (DER)", ITU-T Recommendation | Distinguished Encoding Rules (DER)", ITU-T Recommendation | |||
X.690, 1994. | X.690, 1994. | |||
[BSON] Various, "BSON - Binary JSON", 2013, | [BSON] Various, "BSON - Binary JSON", 2013, | |||
<http://bsonspec.org/>. | <http://bsonspec.org/>. | |||
[I-D.ietf-cbor-sequence] | ||||
Bormann, C., "Concise Binary Object Representation (CBOR) | ||||
Sequences", draft-ietf-cbor-sequence-02 (work in | ||||
progress), September 2019. | ||||
[IANA.cbor-simple-values] | [IANA.cbor-simple-values] | |||
IANA, "Concise Binary Object Representation (CBOR) Simple | IANA, "Concise Binary Object Representation (CBOR) Simple | |||
Values", | Values", | |||
<http://www.iana.org/assignments/cbor-simple-values>. | <http://www.iana.org/assignments/cbor-simple-values>. | |||
[IANA.cbor-tags] | [IANA.cbor-tags] | |||
IANA, "Concise Binary Object Representation (CBOR) Tags", | IANA, "Concise Binary Object Representation (CBOR) Tags", | |||
<http://www.iana.org/assignments/cbor-tags>. | <http://www.iana.org/assignments/cbor-tags>. | |||
[MessagePack] | [MessagePack] | |||
skipping to change at page 49, line 38 ¶ | skipping to change at page 51, line 38 ¶ | |||
[RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object | [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object | |||
Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, | Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, | |||
October 2013, <https://www.rfc-editor.org/info/rfc7049>. | October 2013, <https://www.rfc-editor.org/info/rfc7049>. | |||
[RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for | [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for | |||
Constrained-Node Networks", RFC 7228, | Constrained-Node Networks", RFC 7228, | |||
DOI 10.17487/RFC7228, May 2014, | DOI 10.17487/RFC7228, May 2014, | |||
<https://www.rfc-editor.org/info/rfc7228>. | <https://www.rfc-editor.org/info/rfc7228>. | |||
[RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, | ||||
DOI 10.17487/RFC7493, March 2015, | ||||
<https://www.rfc-editor.org/info/rfc7493>. | ||||
[RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data | [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data | |||
Interchange Format", STD 90, RFC 8259, | Interchange Format", STD 90, RFC 8259, | |||
DOI 10.17487/RFC8259, December 2017, | DOI 10.17487/RFC8259, December 2017, | |||
<https://www.rfc-editor.org/info/rfc8259>. | <https://www.rfc-editor.org/info/rfc8259>. | |||
[RFC8618] Dickinson, J., Hague, J., Dickinson, S., Manderson, T., | ||||
and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS | ||||
Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September | ||||
2019, <https://www.rfc-editor.org/info/rfc8618>. | ||||
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- | [SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- | |||
Input PRF", Lecture Notes in Computer Science pp. 489-508, | Input PRF", Lecture Notes in Computer Science pp. 489-508, | |||
DOI 10.1007/978-3-642-34931-7_28, 2012. | DOI 10.1007/978-3-642-34931-7_28, 2012. | |||
[YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup | [YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup | |||
Language (YAML[TM]) Version 1.2", 3rd Edition, October | Language (YAML[TM]) Version 1.2", 3rd Edition, October | |||
2009, <http://www.yaml.org/spec/1.2/spec.html>. | 2009, <http://www.yaml.org/spec/1.2/spec.html>. | |||
Appendix A. Examples | Appendix A. Examples | |||
skipping to change at page 56, line 29 ¶ | skipping to change at page 59, line 29 ¶ | |||
| 0xc2 | Positive bignum (data item "byte string" follows) | | | 0xc2 | Positive bignum (data item "byte string" follows) | | |||
| | | | | | | | |||
| 0xc3 | Negative bignum (data item "byte string" follows) | | | 0xc3 | Negative bignum (data item "byte string" follows) | | |||
| | | | | | | | |||
| 0xc4 | Decimal Fraction (data item "array" follows; see | | | 0xc4 | Decimal Fraction (data item "array" follows; see | | |||
| | Section 3.4.5) | | | | Section 3.4.5) | | |||
| | | | | | | | |||
| 0xc5 | Bigfloat (data item "array" follows; see | | | 0xc5 | Bigfloat (data item "array" follows; see | | |||
| | Section 3.4.5) | | | | Section 3.4.5) | | |||
| | | | | | | | |||
| 0xc6..0xd4 | (tagged item) | | | 0xc6..0xd4 | (tag) | | |||
| | | | | | | | |||
| 0xd5..0xd7 | Expected Conversion (data item follows; see | | | 0xd5..0xd7 | Expected Conversion (data item follows; see | | |||
| | Section 3.4.6.2) | | | | Section 3.4.6.2) | | |||
| | | | | | | | |||
| 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a data | | | 0xd8..0xdb | (more tags, 1/2/4/8 bytes and then a data item | | |||
| | item follow) | | | | follow) | | |||
| | | | | | | | |||
| 0xe0..0xf3 | (simple value) | | | 0xe0..0xf3 | (simple value) | | |||
| | | | | | | | |||
| 0xf4 | False | | | 0xf4 | False | | |||
| | | | | | | | |||
| 0xf5 | True | | | 0xf5 | True | | |||
| | | | | | | | |||
| 0xf6 | Null | | | 0xf6 | Null | | |||
| | | | | | | | |||
| 0xf7 | Undefined | | | 0xf7 | Undefined | | |||
skipping to change at page 61, line 22 ¶ | skipping to change at page 64, line 22 ¶ | |||
3. no schema description needed | 3. no schema description needed | |||
4. reasonably compact serialization | 4. reasonably compact serialization | |||
5. applicability to constrained and unconstrained applications | 5. applicability to constrained and unconstrained applications | |||
6. good JSON conversion | 6. good JSON conversion | |||
7. extensibility | 7. extensibility | |||
A discussion of CBOR and other formats with respect to a different | ||||
set of design objectives is provided in Section 5 and Appendix C of | ||||
[RFC8618]. | ||||
E.1. ASN.1 DER, BER, and PER | E.1. ASN.1 DER, BER, and PER | |||
[ASN.1] has many serializations. In the IETF, DER and BER are the | [ASN.1] has many serializations. In the IETF, DER and BER are the | |||
most common. The serialized output is not particularly compact for | most common. The serialized output is not particularly compact for | |||
many items, and the code needed to decode numeric items can be | many items, and the code needed to decode numeric items can be | |||
complex on a constrained device. | complex on a constrained device. | |||
Few (if any) IETF protocols have adopted one of the several variants | Few (if any) IETF protocols have adopted one of the several variants | |||
of Packed Encoding Rules (PER). There could be many reasons for | of Packed Encoding Rules (PER). There could be many reasons for | |||
this, but one that is commonly stated is that PER makes use of the | this, but one that is commonly stated is that PER makes use of the | |||
skipping to change at page 63, line 44 ¶ | skipping to change at page 66, line 44 ¶ | |||
o Updated reference for [CNN-TERMS] to [RFC7228] | o Updated reference for [CNN-TERMS] to [RFC7228] | |||
o Added a comment to the last example in Section 2.2.1 (added | o Added a comment to the last example in Section 2.2.1 (added | |||
"Second value") | "Second value") | |||
o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") | o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") | |||
o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> | o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> | |||
"0b000_11001") | "0b000_11001") | |||
Appendix G. Well-formedness errors and examples | ||||
There are three basic kinds of well-formedness errors that can occur | ||||
in decoding a CBOR data item: | ||||
o Too much data: There are input bytes left that were not consumed. | ||||
This is only an error if the application assumed that the input | ||||
bytes would span exactly one data item. Where the application | ||||
uses the self-delimiting nature of CBOR encoding to permit | ||||
additional data after the data item, as is for example done in | ||||
CBOR sequences [I-D.ietf-cbor-sequence], the CBOR decoder can | ||||
simply indicate what part of the input has not been consumed. | ||||
o Too little data: The input data available would need additional | ||||
bytes added at their end for a complete CBOR data item. This may | ||||
indicate the input is truncated; it is also a common error when | ||||
trying to decode random data as CBOR. For some applications | ||||
however, this may not actually be an error, as the application | ||||
may not be certain it has all the data yet and can obtain or wait | ||||
for additional input bytes. Some of these applications may have | ||||
an upper limit for how much additional data can show up; here the | ||||
decoder may be able to indicate that the encoded CBOR data item | ||||
cannot be completed within this limit. | ||||
o Syntax error: The input data are not consistent with the | ||||
requirements of the CBOR encoding, and this cannot be remedied by | ||||
adding (or removing) data at the end. | ||||
In Appendix C, errors of the first kind are addressed in the first | ||||
paragraph/bullet list (requiring "no bytes are left"), and errors of | ||||
the second kind are addressed in the second paragraph/bullet list | ||||
(failing "if n bytes are no longer available"). Errors of the third | ||||
kind are identified in the pseudocode by specific instances of | ||||
calling fail(), in order: | ||||
o a reserved value is used for additional information (28, 29, 30) | ||||
o major type 7, additional information 24, value < 32 (incorrect or | ||||
incorrectly encoded simple type) | ||||
o incorrect substructure of indefinite length byte/text string (may | ||||
only contain definite length strings of the same major type) | ||||
o break stop code (mt=7, ai=31) occurs in a value position of a | ||||
map, or anywhere other than a position directly in an indefinite | ||||
length item where another enclosed data item could also occur | ||||
o additional information 31 used with major type 0, 1, or 6 | ||||
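The three error kinds and the five syntax error subkinds listed above can be illustrated with a small well-formedness checker. The following is a minimal Python sketch modeled loosely on the structure of the Appendix C pseudocode; it is not that pseudocode. The names check, TooLittleData, CborSyntaxError, and the marker values BREAK and INDEFINITE are assumptions introduced only for this illustration.

```python
class TooLittleData(Exception):
    """Kind 2: the input ends before the data item is complete."""

class CborSyntaxError(Exception):
    """Kind 3: cannot be remedied by adding or removing bytes at the end."""

BREAK = -1       # hypothetical marker: a break stop code was consumed
INDEFINITE = 99  # hypothetical marker: a complete indefinite length item

def _take(data, pos, n):
    # Return the offset after consuming n bytes, or fail with kind 2.
    if pos + n > len(data):
        raise TooLittleData(f"need {n} byte(s) at offset {pos}")
    return pos + n

def _well_formed(data, pos, breakable=False):
    # Check one data item at pos; return (new_pos, major type or marker).
    end = _take(data, pos, 1)
    mt, ai = data[pos] >> 5, data[pos] & 0x1F
    pos, val = end, ai
    if 24 <= ai <= 27:
        end = _take(data, pos, 1 << (ai - 24))
        val = int.from_bytes(data[pos:end], "big")
        pos = end
    elif 28 <= ai <= 30:
        raise CborSyntaxError("reserved additional information")  # subkind 1
    elif ai == 31:
        return _indefinite(data, pos, mt, breakable)
    if mt in (2, 3):
        pos = _take(data, pos, val)          # string content
    elif mt in (4, 5):
        for _ in range(val * (mt - 3)):      # val items, or 2*val for maps
            pos, _kind = _well_formed(data, pos)
    elif mt == 6:
        pos, _kind = _well_formed(data, pos)  # one enclosed data item
    elif mt == 7 and ai == 24 and val < 32:
        raise CborSyntaxError("incorrectly encoded simple type")  # subkind 2
    return pos, mt

def _indefinite(data, pos, mt, breakable):
    if mt == 7:
        if not breakable:
            raise CborSyntaxError("break not allowed here")  # subkind 4
        return pos, BREAK
    if mt not in (2, 3, 4, 5):
        raise CborSyntaxError("ai 31 with major type 0, 1, or 6")  # subkind 5
    while True:
        pos, kind = _well_formed(data, pos, breakable=True)
        if kind == BREAK:
            return pos, INDEFINITE
        if mt in (2, 3) and kind != mt:
            raise CborSyntaxError("bad chunk in indefinite string")  # subkind 3
        if mt == 5:
            pos, _kind = _well_formed(data, pos)  # value: break not allowed

def check(data):
    """Classify data as well-formed or as one of the three error kinds."""
    try:
        pos, _kind = _well_formed(data, 0)
    except TooLittleData:
        return "too little data"
    except CborSyntaxError:
        return "syntax error"
    return "well-formed" if pos == len(data) else "too much data"
```

For instance, check(bytes.fromhex("bf00ff")) reports a syntax error, because the break stop code appears in a value position of the map. An application that permits additional data after the data item would treat the "too much data" result as information rather than as an error.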
G.1. Examples for CBOR data items that are not well-formed | ||||
This subsection shows a few examples for CBOR data items that are not | ||||
well-formed. Each example is a sequence of bytes each shown in | ||||
hexadecimal; multiple examples in a list are separated by commas. | ||||
Examples for well-formedness error kind 1 (too much data) can easily | ||||
be formed by adding data to a well-formed encoded CBOR data item. | ||||
Similarly, examples for well-formedness error kind 2 (too little | ||||
data) can be formed by truncating a well-formed encoded CBOR data | ||||
item. In test suites, it may be beneficial to specifically test with | ||||
incomplete data items that would require large amounts of additional | ||||
data to be completed (for instance by starting the encoding of a | ||||
string of a very large size). | ||||
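As an illustration of how such test inputs might be generated, the following Python sketch derives kind 1 and kind 2 inputs from a well-formed item and builds an incomplete string that announces a very large size. All function names (encode_head, too_much, truncations, huge_incomplete_string) are hypothetical helpers introduced for this example only.

```python
def encode_head(major_type, argument):
    """Encode a head for major types 0 to 6, using the shortest form."""
    if not 0 <= major_type <= 6:
        raise ValueError("major type must be in 0..6")
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument must fit in 64 bits")
    ib = major_type << 5
    if argument < 24:
        return bytes([ib | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([ib | ai]) + argument.to_bytes(size, "big")

def too_much(item, extra=b"\x00"):
    """Kind 1 input: a well-formed item followed by additional bytes."""
    return item + extra

def truncations(item):
    """Kind 2 inputs: every non-empty proper prefix of a well-formed item."""
    return [item[:n] for n in range(1, len(item))]

def huge_incomplete_string(length=2**32 - 1, present=1):
    """Byte string head announcing `length` bytes, with only `present` given."""
    return encode_head(2, length) + bytes(present)
```

With the default parameters, huge_incomplete_string() yields the bytes 5a ff ff ff ff 00, a byte string that would need billions of additional bytes to be completed. A decoder with an upper limit on additional data could be expected to reject this input early rather than wait for the remainder.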
A premature end of the input can occur in a head or within the | ||||
enclosed data, which may be bare strings or enclosed data items that | ||||
are either counted or should have been ended by a break stop code. | ||||
o End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 | ||||
03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa | ||||
00 00, fb 00 00 00 | ||||
o Definite length strings with short data: 41, 61, 5a ff ff ff ff | ||||
00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f | ||||
ff ff ff ff ff ff ff 01 02 03 | ||||
o Definite length maps and arrays not closed with enough items: 81, | ||||
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 | ||||
00 | ||||
o Indefinite length strings not closed by a break stop code: 5f 41 | ||||
00, 7f 61 00 | ||||
o Indefinite length maps and arrays not closed by a break stop code: | ||||
9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f | ||||
ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff | ||||
A few examples for the five subkinds of well-formedness error kind 3 | ||||
(syntax error) are shown below. | ||||
Subkind 1: | ||||
o Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, | ||||
5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, | ||||
fd, fe | ||||
Subkind 2: | ||||
o Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18, | ||||
f8 1f | ||||
Subkind 3: | ||||
o Indefinite length string chunks not of the correct type: 5f 00 ff, | ||||
5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff, | ||||
7f 41 00 ff | ||||
o Indefinite length string chunks not definite length: 5f 5f 41 00 | ||||
ff ff, 7f 7f 61 00 ff ff | ||||
Subkind 4: | ||||
o Break occurring on its own outside of an indefinite length item: | ||||
ff | ||||
o Break occurring in a definite length array or map or a tag: 81 ff, | ||||
82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82 | ||||
9f 81 9f 9f ff ff ff ff | ||||
o Break in indefinite length map would lead to odd number of items | ||||
(break in a value position): bf 00 ff, bf 00 00 00 ff | ||||
Subkind 5: | ||||
o Major type 0, 1, 6 with additional information 31: 1f, 3f, df | ||||
Acknowledgements | Acknowledgements | |||
CBOR was inspired by MessagePack. MessagePack was developed and | CBOR was inspired by MessagePack. MessagePack was developed and | |||
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | |||
MessagePack is solely for attribution; CBOR is not intended as a | MessagePack is solely for attribution; CBOR is not intended as a | |||
version of or replacement for MessagePack, as it has different design | version of or replacement for MessagePack, as it has different design | |||
goals and requirements. | goals and requirements. | |||
The need for functionality beyond the original MessagePack | The need for functionality beyond the original MessagePack | |||
Specification became obvious to many people at about the same time | Specification became obvious to many people at about the same time | |||
End of changes. 57 change blocks. | ||||
201 lines changed or deleted | 403 lines changed or added | |||