draft-ietf-cbor-7049bis-05.txt | draft-ietf-cbor-7049bis-06.txt | |||
---|---|---|---|---|
Network Working Group C. Bormann | Network Working Group C. Bormann | |||
Internet-Draft Universitaet Bremen TZI | Internet-Draft Universitaet Bremen TZI | |||
Intended status: Standards Track P. Hoffman | Intended status: Standards Track P. Hoffman | |||
Expires: July 19, 2019 ICANN | Expires: January 3, 2020 ICANN | |||
January 15, 2019 | July 02, 2019 | |||
Concise Binary Object Representation (CBOR) | Concise Binary Object Representation (CBOR) | |||
draft-ietf-cbor-7049bis-05 | draft-ietf-cbor-7049bis-06 | |||
Abstract | Abstract | |||
The Concise Binary Object Representation (CBOR) is a data format | The Concise Binary Object Representation (CBOR) is a data format | |||
whose design goals include the possibility of extremely small code | whose design goals include the possibility of extremely small code | |||
size, fairly small message size, and extensibility without the need | size, fairly small message size, and extensibility without the need | |||
for version negotiation. These design goals make it different from | for version negotiation. These design goals make it different from | |||
earlier binary serializations such as ASN.1 and MessagePack. | earlier binary serializations such as ASN.1 and MessagePack. | |||
This document obsoletes RFC 7049. | ||||
Contributing | Contributing | |||
This document is being worked on in the CBOR Working Group. Please | This document is being worked on in the CBOR Working Group. Please | |||
contribute on the mailing list there, or in the GitHub repository for | contribute on the mailing list there, or in the GitHub repository for | |||
this draft: https://github.com/cbor-wg/CBORbis | this draft: https://github.com/cbor-wg/CBORbis | |||
The charter for the CBOR Working Group says that the WG will update | The charter for the CBOR Working Group says that the WG will update | |||
RFC 7049 to fix verified errata. Security issues and clarifications | RFC 7049 to fix verified errata. Security issues and clarifications | |||
may be addressed, but changes to this document will ensure backward | may be addressed, but changes to this document will ensure backward | |||
compatibility for popular deployed codebases. This document will be | compatibility for popular deployed codebases. This document will be | |||
skipping to change at page 1, line 47 ¶ | skipping to change at page 1, line 49 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on July 19, 2019. | This Internet-Draft will expire on January 3, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | |||
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 | 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 8 | |||
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | |||
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 | 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 12 | 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 12 | |||
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 12 | 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 12 | |||
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 12 | 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 12 | |||
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 14 | 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 14 | |||
3.3. Floating-Point Numbers and Values with No Content . . . . 15 | 3.3. Floating-Point Numbers and Values with No Content . . . . 15 | |||
3.4. Optional Tagging of Items . . . . . . . . . . . . . . . . 16 | 3.4. Optional Tagging of Items . . . . . . . . . . . . . . . . 17 | |||
3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 18 | 3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 19 | |||
3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 18 | 3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 19 | |||
3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 18 | 3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 19 | |||
3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 19 | 3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 20 | 3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 21 | |||
3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 21 | 3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 22 | |||
3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 21 | 3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 22 | |||
3.4.6.2. Expected Later Encoding for CBOR-to-JSON | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON | |||
Converters . . . . . . . . . . . . . . . . . . . 22 | Converters . . . . . . . . . . . . . . . . . . . 23 | |||
3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 22 | 3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 23 | |||
3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 23 | 3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 24 | |||
4. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 23 | 4. Serialization Considerations . . . . . . . . . . . . . . . . 25 | |||
4.1. CBOR in Streaming Applications . . . . . . . . . . . . . 24 | 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 25 | |||
4.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 24 | 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 26 | |||
4.3. Syntax Errors . . . . . . . . . . . . . . . . . . . . . . 25 | 4.2.1. Core Deterministic Encoding Requirements . . . . . . 26 | |||
4.3.1. Incomplete CBOR Data Items . . . . . . . . . . . . . 25 | 4.2.2. Additional Deterministic Encoding Considerations . . 27 | |||
4.3.2. Malformed Indefinite-Length Items . . . . . . . . . . 26 | 4.2.3. Length-first map key ordering . . . . . . . . . . . . 28 | |||
4.3.3. Unknown Additional Information Values . . . . . . . . 26 | ||||
4.4. Other Decoding Errors . . . . . . . . . . . . . . . . . . 26 | 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 29 | |||
4.5. Handling Unknown Simple Values and Tags . . . . . . . . . 27 | 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 30 | |||
4.6. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 27 | 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 30 | |||
4.7. Specifying Keys for Maps . . . . . . . . . . . . . . . . 28 | 5.3. Invalid Items . . . . . . . . . . . . . . . . . . . . . . 31 | |||
4.7.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 29 | 5.4. Handling Unknown Simple Values and Tags . . . . . . . . . 32 | |||
4.8. Undefined Values . . . . . . . . . . . . . . . . . . . . 30 | 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
4.9. Preferred Serialization . . . . . . . . . . . . . . . . . 30 | 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 33 | |||
4.10. Canonically Encoded CBOR . . . . . . . . . . . . . . . . 31 | 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 34 | |||
4.10.1. Length-first map key ordering . . . . . . . . . . . 33 | 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 35 | |||
4.11. Strict Decoding Mode . . . . . . . . . . . . . . . . . . 34 | 5.8. Strict Decoding Mode . . . . . . . . . . . . . . . . . . 35 | |||
5. Converting Data between CBOR and JSON . . . . . . . . . . . . 35 | 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 36 | |||
5.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 36 | 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 36 | |||
5.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 37 | 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 38 | |||
6. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 38 | 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 39 | |||
6.1. Extension Points . . . . . . . . . . . . . . . . . . . . 38 | 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 39 | |||
6.2. Curating the Additional Information Space . . . . . . . . 39 | 7.2. Curating the Additional Information Space . . . . . . . . 40 | |||
7. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 39 | 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 40 | |||
7.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 40 | 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 41 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 | |||
8.1. Simple Values Registry . . . . . . . . . . . . . . . . . 41 | 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 42 | |||
8.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 42 | 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 42 | |||
8.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 42 | 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 43 | |||
8.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 43 | 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 44 | |||
8.5. The +cbor Structured Syntax Suffix Registration . . . . . 43 | 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 44 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 44 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 45 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 45 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 47 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 45 | 11.2. Informative References . . . . . . . . . . . . . . . . . 48 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 46 | Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 50 | |||
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 48 | Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 54 | |||
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 52 | Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 57 | |||
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 55 | Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 59 | |||
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 57 | ||||
Appendix E. Comparison of Other Binary Formats to CBOR's Design | Appendix E. Comparison of Other Binary Formats to CBOR's Design | |||
Objectives . . . . . . . . . . . . . . . . . . . . . 58 | Objectives . . . . . . . . . . . . . . . . . . . . . 60 | |||
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 59 | E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 61 | |||
E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 59 | E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 61 | |||
E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 60 | E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 62 | |||
E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 60 | E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 62 | |||
E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 60 | E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 62 | |||
Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 61 | Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 63 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61 | Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 63 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64 | ||||
1. Introduction | 1. Introduction | |||
There are hundreds of standardized formats for binary representation | There are hundreds of standardized formats for binary representation | |||
of structured data (also known as binary serialization formats). Of | of structured data (also known as binary serialization formats). Of | |||
those, some are for specific domains of information, while others are | those, some are for specific domains of information, while others are | |||
generalized for arbitrary data. In the IETF, probably the best-known | generalized for arbitrary data. In the IETF, probably the best-known | |||
formats in the latter category are ASN.1's BER and DER [ASN.1]. | formats in the latter category are ASN.1's BER and DER [ASN.1]. | |||
The format defined here follows some specific design goals that are | The format defined here follows some specific design goals that are | |||
skipping to change at page 4, line 26 ¶ | skipping to change at page 4, line 20 ¶ | |||
to note that this is not a proposal that the grammar in RFC 8259 be | to note that this is not a proposal that the grammar in RFC 8259 be | |||
extended in general, since doing so would cause a significant | extended in general, since doing so would cause a significant | |||
backwards incompatibility with already deployed JSON documents. | backwards incompatibility with already deployed JSON documents. | |||
Instead, this document simply defines its own data model that starts | Instead, this document simply defines its own data model that starts | |||
from JSON. | from JSON. | |||
Appendix E lists some existing binary formats and discusses how well | Appendix E lists some existing binary formats and discusses how well | |||
they do or do not fit the design objectives of the Concise Binary | they do or do not fit the design objectives of the Concise Binary | |||
Object Representation (CBOR). | Object Representation (CBOR). | |||
This document obsoletes [RFC7049]. | ||||
1.1. Objectives | 1.1. Objectives | |||
The objectives of CBOR, roughly in decreasing order of importance, | The objectives of CBOR, roughly in decreasing order of importance, | |||
are: | are: | |||
1. The representation must be able to unambiguously encode most | 1. The representation must be able to unambiguously encode most | |||
common data formats used in Internet standards. | common data formats used in Internet standards. | |||
* It must represent a reasonable set of basic data types and | * It must represent a reasonable set of basic data types and | |||
structures using binary encoding. "Reasonable" here is | structures using binary encoding. "Reasonable" here is | |||
skipping to change at page 6, line 8 ¶ | skipping to change at page 5, line 50 ¶ | |||
* The format must support a form of extensibility that allows | * The format must support a form of extensibility that allows | |||
fallback so that a decoder that does not understand an | fallback so that a decoder that does not understand an | |||
extension can still decode the message. | extension can still decode the message. | |||
* The format must be able to be extended in the future by later | * The format must be able to be extended in the future by later | |||
IETF standards. | IETF standards. | |||
1.2. Terminology | 1.2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
document are to be interpreted as described in RFC 2119, BCP 14 | "OPTIONAL" in this document are to be interpreted as described in | |||
[RFC2119] and indicate requirement levels for compliant CBOR | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
implementations. | capitals, as shown here. | |||
The term "byte" is used in its now-customary sense as a synonym for | The term "byte" is used in its now-customary sense as a synonym for | |||
"octet". All multi-byte values are encoded in network byte order | "octet". All multi-byte values are encoded in network byte order | |||
(that is, most significant byte first, also known as "big-endian"). | (that is, most significant byte first, also known as "big-endian"). | |||
This specification makes use of the following terminology: | This specification makes use of the following terminology: | |||
Data item: A single piece of CBOR data. The structure of a data | Data item: A single piece of CBOR data. The structure of a data | |||
item may contain zero, one, or more nested data items. The term | item may contain zero, one, or more nested data items. The term | |||
is used both for the data item in representation format and for | is used both for the data item in representation format and for | |||
the abstract idea that can be derived from that by a decoder. | the abstract idea that can be derived from that by a decoder. | |||
Decoder: A process that decodes a CBOR data item and makes it | Decoder: A process that decodes a well-formed CBOR data item and | |||
available to an application. Formally speaking, a decoder | makes it available to an application. Formally speaking, a | |||
contains a parser to break up the input using the syntax rules of | decoder contains a parser to break up the input using the syntax | |||
CBOR, as well as a semantic processor to prepare the data in a | rules of CBOR, as well as a semantic processor to prepare the data | |||
form suitable to the application. | in a form suitable to the application. | |||
Encoder: A process that generates the representation format of a | Encoder: A process that generates the representation format of a | |||
CBOR data item from application information. | CBOR data item from application information. | |||
Data Stream: A sequence of zero or more data items, not further | Data Stream: A sequence of zero or more data items, not further | |||
assembled into a larger containing data item. The independent | assembled into a larger containing data item. The independent | |||
data items that make up a data stream are sometimes also referred | data items that make up a data stream are sometimes also referred | |||
to as "top-level data items". | to as "top-level data items". | |||
Well-formed: A data item that follows the syntactic structure of | Well-formed: A data item that follows the syntactic structure of | |||
CBOR. A well-formed data item uses the initial bytes and the byte | CBOR. A well-formed data item uses the initial bytes and the byte | |||
strings and/or data items that are implied by their values as | strings and/or data items that are implied by their values as | |||
defined in CBOR and is not followed by extraneous data. | defined in CBOR and does not include following extraneous data. | |||
CBOR decoders by definition only return contents from well-formed | ||||
data items. | ||||
Valid: A data item that is well-formed and also follows the semantic | Valid: A data item that is well-formed and also follows the semantic | |||
restrictions that apply to CBOR data items. | restrictions that apply to CBOR data items. | |||
Stream decoder: A process that decodes a data stream and makes each | Stream decoder: A process that decodes a data stream and makes each | |||
of the data items in the sequence available to an application as | of the data items in the sequence available to an application as | |||
they are received. | they are received. | |||
Where bit arithmetic or data types are explained, this document uses | Where bit arithmetic or data types are explained, this document uses | |||
the notation familiar from the programming language C, except that | the notation familiar from the programming language C, except that | |||
skipping to change at page 7, line 40 ¶ | skipping to change at page 7, line 37 ¶ | |||
In the basic (un-extended) generic data model, a data item is one of: | In the basic (un-extended) generic data model, a data item is one of: | |||
o an integer in the range -2**64..2**64-1 inclusive | o an integer in the range -2**64..2**64-1 inclusive | |||
o a simple value, identified by a number between 0 and 255, but | o a simple value, identified by a number between 0 and 255, but | |||
distinct from that number | distinct from that number | |||
o a floating point value, distinct from an integer, out of the set | o a floating point value, distinct from an integer, out of the set | |||
representable by IEEE 754 binary64 (including non-finites) | representable by IEEE 754 binary64 (including non-finites) | |||
[IEEE.754.2008] | [IEEE754] | |||
o a sequence of zero or more bytes ("byte string") | o a sequence of zero or more bytes ("byte string") | |||
o a sequence of zero or more Unicode code points ("text string") | o a sequence of zero or more Unicode code points ("text string") | |||
o a sequence of zero or more data items ("array") | o a sequence of zero or more data items ("array") | |||
o a mapping (mathematical function) from zero or more data items | o a mapping (mathematical function) from zero or more data items | |||
("keys") each to a data item ("values"), ("map") | ("keys") each to a data item ("values"), ("map") | |||
skipping to change at page 9, line 30 ¶ | skipping to change at page 9, line 24 ¶ | |||
"0.0" as an integer (major type 0, Section 3.1). However, if a | "0.0" as an integer (major type 0, Section 3.1). However, if a | |||
specific data model declares that floating point and integer | specific data model declares that floating point and integer | |||
representations of integral values are equivalent, using both map | representations of integral values are equivalent, using both map | |||
keys "0" and "0.0" in a single map would be considered duplicates and | keys "0" and "0.0" in a single map would be considered duplicates and | |||
so invalid, and an encoder could encode integral-valued floats as | so invalid, and an encoder could encode integral-valued floats as | |||
integers or vice versa, perhaps to save encoded bytes. | integers or vice versa, perhaps to save encoded bytes. | |||
3. Specification of the CBOR Encoding | 3. Specification of the CBOR Encoding | |||
A CBOR data item (Section 2) is encoded to or decoded from a byte | A CBOR data item (Section 2) is encoded to or decoded from a byte | |||
string as described in this section. The encoding is summarized in | string carrying a well-formed encoded data item as described in this | |||
Table 5. | section. The encoding is summarized in Table 5. An encoder MUST | |||
produce only well-formed encoded data items. A decoder MUST NOT | ||||
return a decoded data item when it encounters input that is not a | ||||
well-formed encoded CBOR data item (this does not detract from the | ||||
usefulness of diagnostic and recovery tools that might make available | ||||
some information from a damaged encoded CBOR data item). | ||||
The initial byte of each encoded data item contains both information | The initial byte of each encoded data item contains both information | |||
about the major type (the high-order 3 bits, described in | about the major type (the high-order 3 bits, described in | |||
Section 3.1) and additional information (the low-order 5 bits). | Section 3.1) and additional information (the low-order 5 bits). With | |||
Additional information value 31 is used for indefinite-length items, | a few exceptions, the additional information's value describes how to | |||
described in Section 3.2. Additional information values 28 to 30 are | load an unsigned integer "argument": | |||
reserved for future expansion. | ||||
Additional information values from 0 to 27 describes how to construct | Less than 24: The argument's value is the value of the additional | |||
an "argument", possibly consuming additional bytes. For major type 7 | information. | |||
and additional information 25 to 27 (floating point numbers), there | ||||
is a special case; in all other cases the additional information | ||||
value, possibly combined with following bytes, the argument | ||||
constructed is an unsigned integer. | ||||
When the value of the additional information is less than 24, it is | 24, 25, 26, or 27: The argument's value is held in the following 1, | |||
directly used as the argument's value. When it is 24 to 27, the | 2, 4, or 8 bytes, respectively, in network byte order. For major | |||
argument's value is held in the following 1, 2, 4, or 8, | type 7 and additional information value 25, 26, 27, these bytes | |||
respectively, bytes, in network byte order. | are not used as an integer argument, but as a floating point value | |||
(see Section 3.3). | ||||
28, 29, 30: These values are reserved for future additions to the | ||||
CBOR format. In the present version of CBOR, the encoded item is | ||||
not well-formed. | ||||
31: No argument value is derived. If the major type is 0, 1, or 6, | ||||
the encoded item is not well-formed. For major types 2 to 5, the | ||||
item's length is indefinite, and for major type 7, the byte does | ||||
not constitute a data item at all but terminates an indefinite | ||||
length item; both are described in Section 3.2. | ||||
The meaning of this argument depends on the major type. For example, | The meaning of this argument depends on the major type. For example, | |||
in major type 0, the argument is the value of the data item itself | in major type 0, the argument is the value of the data item itself | |||
(and in major type 1 the value of the data item is computed from the | (and in major type 1 the value of the data item is computed from the | |||
argument); in major type 2 and 3 it gives the length of the string | argument); in major type 2 and 3 it gives the length of the string | |||
data in bytes that follows; and in major types 4 and 5 it is used to | data in bytes that follows; and in major types 4 and 5 it is used to | |||
determine the number of data items enclosed. | determine the number of data items enclosed. | |||
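As an illustration only, the rules for splitting the initial byte and loading the argument might be sketched in Python as follows. The function name decode_head and its return shape are assumptions made for this sketch and do not appear in either draft. It follows the -06 wording, in which additional information values 28 to 30, and value 31 for major types 0, 1, and 6, are not well-formed.

```python
def decode_head(data: bytes, offset: int = 0):
    """Return (major_type, additional_info, argument, next_offset).

    Hypothetical helper. `argument` is None when the additional
    information is 31, because no argument value is derived then.
    Raises ValueError when the head is not well-formed.
    """
    if offset >= len(data):
        raise ValueError("input ends before the initial byte")
    initial = data[offset]
    major = initial >> 5      # high-order 3 bits: major type
    info = initial & 0x1F     # low-order 5 bits: additional information
    offset += 1

    if info < 24:
        # The argument is the additional information itself.
        return major, info, info, offset
    if info <= 27:
        # The argument is held in the following 1, 2, 4, or 8 bytes,
        # in network byte order. For major type 7 with info 25..27 the
        # integer returned here is the raw bit pattern of a float, and
        # the caller would have to reinterpret it.
        size = 1 << (info - 24)
        raw = data[offset:offset + size]
        if len(raw) < size:
            raise ValueError("input ends inside the argument")
        return major, info, int.from_bytes(raw, "big"), offset + size
    if info < 31:
        raise ValueError("additional information 28..30 is reserved")
    if major in (0, 1, 6):
        raise ValueError("additional information 31 is not allowed here")
    # Major types 2..5: indefinite length. Major type 7: "break".
    return major, info, None, offset


print(decode_head(b"\x19\x03\xe8"))   # (0, 25, 1000, 3)
```

What the caller does with the argument then depends on the major type: for major type 0 it is the value itself, for major types 2 and 3 it is a byte count, and for major types 4 and 5 it is an item or pair count.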
If the encoded sequence of bytes ends before the end of a data item | If the encoded sequence of bytes ends before the end of a data item, | |||
would be reached, that encoding is not well-formed. If the encoded | that item is not well-formed. If the encoded sequence of bytes still | |||
sequence of bytes still has bytes remaining after the outermost | has bytes remaining after the outermost encoded item is decoded, that | |||
encoded item is decoded, that encoding is not a single well-formed | encoding is not a single well-formed CBOR item; depending on the | |||
CBOR item. | application, the decoder may either treat the encoding as not well- | |||
formed or just identify the start of the remaining bytes to the | ||||
application. | ||||
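The two conditions described here, input that ends early and input with bytes left over, can be illustrated with a small well-formedness walker. This is a rough sketch under stated assumptions, not the pseudocode of Appendix C: the names skip_item, _skip, _skip_indefinite, and split_first_item are hypothetical, and the sketch checks only the syntactic rules named in this section (it does not validate UTF-8 or the simple-value range, for instance).

```python
def skip_item(data: bytes, pos: int = 0) -> int:
    """Return the offset just past the well-formed item starting at pos."""
    end, was_break = _skip(data, pos)
    if was_break:
        raise ValueError('"break" outside an indefinite-length item')
    return end


def _skip(data: bytes, pos: int):
    """Return (next_pos, was_break) for the item or stop code at pos."""
    if pos >= len(data):
        raise ValueError("not well-formed: input ends before end of item")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info == 31:
        if major == 7:
            return pos, True                  # "break" stop code
        if major in (0, 1, 6):
            raise ValueError("not well-formed: 31 with major type 0, 1, 6")
        return _skip_indefinite(data, pos, major), False
    if info >= 28:
        raise ValueError("not well-formed: reserved additional information")
    arg = info
    if info >= 24:
        size = 1 << (info - 24)               # 1, 2, 4, or 8 bytes
        if pos + size > len(data):
            raise ValueError("not well-formed: argument is truncated")
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    if major in (2, 3):                       # string: arg is a byte count
        if pos + arg > len(data):
            raise ValueError("not well-formed: string is truncated")
        return pos + arg, False
    # Arrays hold arg items, maps 2*arg items, tags exactly one item.
    for _ in range({4: arg, 5: 2 * arg, 6: 1}.get(major, 0)):
        pos = skip_item(data, pos)
    return pos, False


def _skip_indefinite(data: bytes, pos: int, major: int) -> int:
    count = 0
    while True:
        start = pos
        pos, was_break = _skip(data, pos)
        if was_break:
            if major == 5 and count % 2:
                raise ValueError("not well-formed: map key without value")
            return pos
        if major in (2, 3):
            # Chunks must be definite-length strings of the same type.
            if data[start] >> 5 != major or data[start] & 0x1F == 31:
                raise ValueError("not well-formed: bad string chunk")
        count += 1


def split_first_item(data: bytes):
    """Return (encoded_item, remaining_bytes), leaving the policy on
    leftover bytes to the application."""
    end = skip_item(data)
    return data[:end], data[end:]


print(split_first_item(b"\x01\x02"))   # (b'\x01', b'\x02')
```

A caller that requires exactly one item would reject any input for which the second element returned by split_first_item is non-empty. A caller that reads a sequence of items would instead continue from that offset.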
A CBOR decoder implementation can be based on a jump table with all | A CBOR decoder implementation can be based on a jump table with all | |||
256 defined values for the initial byte (Table 5). A decoder in a | 256 defined values for the initial byte (Table 5). A decoder in a | |||
constrained implementation can instead use the structure of the | constrained implementation can instead use the structure of the | |||
initial byte and following bytes for more compact code (see | initial byte and following bytes for more compact code (see | |||
Appendix C for a rough impression of how this could look). | Appendix C for a rough impression of how this could look). | |||
3.1. Major Types | 3.1. Major Types | |||
The following lists the major types and the additional information | The following lists the major types and the additional information | |||
skipping to change at page 11, line 34 ¶ | skipping to change at page 11, line 42 ¶ | |||
tables, dictionaries, hashes, or objects (in JSON). A map is | tables, dictionaries, hashes, or objects (in JSON). A map is | |||
comprised of pairs of data items, each pair consisting of a key | comprised of pairs of data items, each pair consisting of a key | |||
that is immediately followed by a value. The argument is the | that is immediately followed by a value. The argument is the | |||
number of _pairs_ of data items in the map. For example, a map | number of _pairs_ of data items in the map. For example, a map | |||
that contains 9 pairs would have an initial byte of 0b101_01001 | that contains 9 pairs would have an initial byte of 0b101_01001 | |||
(major type of 5, additional information of 9 for the number of | (major type of 5, additional information of 9 for the number of | |||
pairs) followed by the 18 remaining items. The first item is the | pairs) followed by the 18 remaining items. The first item is the | |||
first key, the second item is the first value, the third item is | first key, the second item is the first value, the third item is | |||
the second key, and so on. A map that has duplicate keys may be | the second key, and so on. A map that has duplicate keys may be | |||
well-formed, but it is not valid, and thus it causes indeterminate | well-formed, but it is not valid, and thus it causes indeterminate | |||
decoding; see also Section 4.7. | decoding; see also Section 5.6. | |||
Major type 6: a tagged data item whose tag is the argument and whose | Major type 6: a tagged data item whose tag is the argument and whose | |||
value is the single following encoded item. See Section 3.4. | value is the single following encoded item. See Section 3.4. | |||
Major type 7: floating-point numbers and simple values, as well as | Major type 7: floating-point numbers and simple values, as well as | |||
the "break" stop code. See Section 3.3. | the "break" stop code. See Section 3.3. | |||
These eight major types lead to a simple table showing which of the | These eight major types lead to a simple table showing which of the | |||
256 possible values for the initial byte of a data item are used | 256 possible values for the initial byte of a data item are used | |||
(Table 5). | (Table 5). | |||
In major types 6 and 7, many of the possible values are reserved for | In major types 6 and 7, many of the possible values are reserved for | |||
future specification. See Section 8 for more information on these | future specification. See Section 9 for more information on these | |||
values. | values. | |||
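As an illustration of the structure described above, the initial byte can be split into its major type (the high-order 3 bits) and its additional information (the low-order 5 bits). The following is a minimal sketch only; the function name `split_initial_byte` is hypothetical and not part of the specification.

```python
def split_initial_byte(initial: int) -> tuple[int, int]:
    """Return (major type, additional information) for one initial byte."""
    if not 0 <= initial <= 0xFF:
        raise ValueError("initial byte must be in 0..255")
    # High-order 3 bits: major type; low-order 5 bits: additional information.
    return initial >> 5, initial & 0x1F


# The map example from the text: 0b101_01001 is major type 5 with 9 pairs.
major, info = split_initial_byte(0b101_01001)
print(major, info)  # prints: 5 9
```

A constrained decoder could branch on these two fields rather than on all 256 values of the initial byte.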
3.2. Indefinite Lengths for Some Major Types | 3.2. Indefinite Lengths for Some Major Types | |||
Four CBOR items (arrays, maps, byte strings, and text strings) can be | Four CBOR items (arrays, maps, byte strings, and text strings) can be | |||
encoded with an indefinite length using additional information value | encoded with an indefinite length using additional information value | |||
31. This is useful if the encoding of the item needs to begin before | 31. This is useful if the encoding of the item needs to begin before | |||
the number of items inside the array or map, or the total length of | the number of items inside the array or map, or the total length of | |||
the string, is known. (The application of this is often referred to | the string, is known. (The application of this is often referred to | |||
as "streaming" within a data item.) | as "streaming" within a data item.) | |||
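To make the use of additional information value 31 concrete, the sketch below computes the initial byte that opens an indefinite-length item for each of the four major types named above. The dictionary and function names are illustrative assumptions, not identifiers defined by this document.

```python
INDEFINITE = 31  # additional information value signalling indefinite length

# Major types that permit an indefinite-length encoding.
MAJOR_TYPES = {"byte string": 2, "text string": 3, "array": 4, "map": 5}


def indefinite_initial_byte(kind: str) -> int:
    """Return the initial byte that opens an indefinite-length item."""
    return (MAJOR_TYPES[kind] << 5) | INDEFINITE


for kind in MAJOR_TYPES:
    print(kind, hex(indefinite_initial_byte(kind)))
```

The same additional information value in major type 7 yields 0xff, the "break" stop code.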
skipping to change at page 15, line 37 ¶ | skipping to change at page 16, line 19 ¶ | |||
| 0..23 | Simple value (value 0..23) | | | 0..23 | Simple value (value 0..23) | | |||
| | | | | | | | |||
| 24 | Simple value (value 32..255 in following byte) | | | 24 | Simple value (value 32..255 in following byte) | | |||
| | | | | | | | |||
| 25 | IEEE 754 Half-Precision Float (16 bits follow) | | | 25 | IEEE 754 Half-Precision Float (16 bits follow) | | |||
| | | | | | | | |||
| 26 | IEEE 754 Single-Precision Float (32 bits follow) | | | 26 | IEEE 754 Single-Precision Float (32 bits follow) | | |||
| | | | | | | | |||
| 27 | IEEE 754 Double-Precision Float (64 bits follow) | | | 27 | IEEE 754 Double-Precision Float (64 bits follow) | | |||
| | | | | | | | |||
| 28-30 | (Unassigned) | | | 28-30 | Unassigned, not well-formed in the present document | | |||
| | | | | | | | |||
| 31 | "break" stop code for indefinite-length items | | | 31 | "break" stop code for indefinite-length items | | |||
| | (Section 3.2.1) | | | | (Section 3.2.1) | | |||
+------------+------------------------------------------------------+ | +------------+------------------------------------------------------+ | |||
Table 1: Values for Additional Information in Major Type 7 | Table 1: Values for Additional Information in Major Type 7 | |||
As with all other major types, the 5-bit value 24 signifies a single- | As with all other major types, the 5-bit value 24 signifies a single- | |||
byte extension: it is followed by an additional byte to represent the | byte extension: it is followed by an additional byte to represent the | |||
simple value. (To minimize confusion, only the values 32 to 255 are | simple value. (To minimize confusion, only the values 32 to 255 are | |||
skipping to change at page 16, line 25 ¶ | skipping to change at page 17, line 25 ¶ | |||
| | | | | | | | |||
| 23 | Undefined value | | | 23 | Undefined value | | |||
| | | | | | | | |||
| 24..31 | (Reserved) | | | 24..31 | (Reserved) | | |||
| | | | | | | | |||
| 32..255 | (Unassigned) | | | 32..255 | (Unassigned) | | |||
+---------+-----------------+ | +---------+-----------------+ | |||
Table 2: Simple Values | Table 2: Simple Values | |||
An encoder MUST NOT encode False as the two-byte sequence of 0xf814, | An encoder MUST NOT issue two-byte sequences that start with 0xf8 | |||
MUST NOT encode True as the two-byte sequence of 0xf815, MUST NOT | (major type = 7, additional information = 24) and continue with a | |||
encode Null as the two-byte sequence of 0xf816, and MUST NOT encode | byte less than 0x20 (32 decimal). Such sequences are not well- | |||
Undefined value as the two-byte sequence of 0xf817. A decoder MUST | formed. (This implies that an encoder cannot encode false, true, | |||
treat these two-byte sequences as an error. Similar prohibitions | null, or undefined in two-byte sequences, only the one-byte variants | |||
apply to the unassigned simple values as well. | of these are well-formed.) | |||
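The rule for two-byte simple values can be checked mechanically. The sketch below covers only the one-byte and two-byte simple value encodings of major type 7 (not floating-point values or the "break" stop code), and the function name is a hypothetical one chosen for this example.

```python
def simple_value_is_well_formed(encoded: bytes) -> bool:
    """Check a one- or two-byte major type 7 simple value encoding."""
    if len(encoded) == 1:
        # Additional information 0..23 carries the simple value directly.
        return 0xE0 <= encoded[0] <= 0xF7
    if len(encoded) == 2 and encoded[0] == 0xF8:
        # After 0xf8, only values 32..255 are well-formed.
        return encoded[1] >= 0x20
    return False


print(simple_value_is_well_formed(b"\xf4"))      # one-byte false: True
print(simple_value_is_well_formed(b"\xf8\x14"))  # two-byte false: False
```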
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | |||
IEEE 754 binary floating-point values [IEEE.754.2008]. These | IEEE 754 binary floating-point values [IEEE754]. These floating- | |||
floating-point values are encoded in the additional bytes of the | point values are encoded in the additional bytes of the appropriate | |||
appropriate size. (See Appendix D for some information about 16-bit | size. (See Appendix D for some information about 16-bit floating | |||
floating point.) | point.) | |||
3.4. Optional Tagging of Items | 3.4. Optional Tagging of Items | |||
In CBOR, a data item can optionally be preceded by a tag to give it | In CBOR, a data item can optionally be preceded by a tag to give it | |||
additional semantics while retaining its structure. The tag is major | additional semantics while retaining its structure. The tag is major | |||
type 6, and represents an integer number as indicated by the tag's | type 6, and represents an unsigned integer as indicated by the tag's | |||
argument (Section 3); the (sole) data item is carried as content | argument (Section 3); the (sole) data item is carried as content | |||
data. If a tag requires structured data, this structure is encoded | data. If a tag requires structured data, this structure is encoded | |||
into the nested data item. The definition of a tag usually restricts | into the nested data item. The definition of a tag usually restricts | |||
what kinds of nested data item or items are valid. | what kinds of nested data item or items are valid for this tag. | |||
The initial bytes of the tag follow the rules for positive integers | For example, assume that a byte string of length 12 is marked with a | |||
(major type 0). The tag is followed by a single data item of any | tag to indicate it is a positive bignum (Section 3.4.4). This would | |||
type. For example, assume that a byte string of length 12 is marked | be marked as 0b110_00010 (major type 6, additional information 2 for | |||
with a tag to indicate it is a positive bignum (Section 3.4.4). This | the tag) followed by 0b010_01100 (major type 2, additional | |||
would be marked as 0b110_00010 (major type 6, additional information | ||||
2 for the tag) followed by 0b010_01100 (major type 2, additional | ||||
information of 12 for the length) followed by the 12 bytes of the | information of 12 for the length) followed by the 12 bytes of the | |||
bignum. | bignum. | |||
Decoders do not need to understand tags, and thus tags may be of | Decoders do not need to understand tags, and thus tags may be of | |||
little value in applications where the implementation creating a | little value in applications where the implementation creating a | |||
particular CBOR data item and the implementation decoding that stream | particular CBOR data item and the implementation decoding that stream | |||
know the semantic meaning of each item in the data flow. Their | know the semantic meaning of each item in the data flow. Their | |||
primary purpose in this specification is to define common data types | primary purpose in this specification is to define common data types | |||
such as dates. A secondary purpose is to allow optional tagging when | such as dates. A secondary purpose is to allow optional tagging when | |||
the decoder is a generic CBOR decoder that might be able to benefit | the decoder is a generic CBOR decoder that might be able to benefit | |||
from hints about the content of items. Understanding the semantic | from hints about the content of items. Understanding the semantic | |||
tags is optional for a decoder; it can just jump over the initial | tags is optional for a decoder; it can just jump over the initial | |||
bytes of the tag and interpret the tagged data item itself. | bytes of the tag and interpret the tagged data item itself. | |||
A tag always applies to the item that is directly followed by it. | A tag always applies to the item that directly follows it. Thus, if | |||
Thus, if tag A is followed by tag B, which is followed by data item | tag A is followed by tag B, which is followed by data item C, tag A | |||
C, tag A applies to the result of applying tag B on data item C. | applies to the result of applying tag B on data item C. That is, a | |||
That is, a tagged item is a data item consisting of a tag and a | tagged item is a data item consisting of a tag and a value. The | |||
value. The content of the tagged item is the data item (the value) | content of the tagged item is the data item (the value) that is being | |||
that is being tagged. | tagged. | |||
IANA maintains a registry of tag values as described in Section 8.2. | IANA maintains a registry of tag values as described in Section 9.2. | |||
Table 3 provides a list of initial values, with definitions in the | Table 3 provides a list of values that were defined in [RFC7049], | |||
rest of this section. | with definitions in the rest of this section. Note that many other | |||
tags have been defined since the publication of [RFC7049]; see the | ||||
registry described at Section 9.2 for the complete list. | ||||
+-----------+--------------+----------------------------------------+ | +-------+-----------+-----------------------------------------------+ | |||
| Tag | Data Item | Semantics | | | Tag | Data Item | Semantics | | |||
+-----------+--------------+----------------------------------------+ | +-------+-----------+-----------------------------------------------+ | |||
| 0 | UTF-8 string | Standard date/time string; see | | | 0 | UTF-8 | Standard date/time string; see Section 3.4.2 | | |||
| | | Section 3.4.2 | | | | string | | | |||
| | | | | | | | | | |||
| 1 | multiple | Epoch-based date/time; see | | | 1 | multiple | Epoch-based date/time; see Section 3.4.3 | | |||
| | | Section 3.4.3 | | | | | | | |||
| | | | | | 2 | byte | Positive bignum; see Section 3.4.4 | | |||
| 2 | byte string | Positive bignum; see Section 3.4.4 | | | | string | | | |||
| | | | | | | | | | |||
| 3 | byte string | Negative bignum; see Section 3.4.4 | | | 3 | byte | Negative bignum; see Section 3.4.4 | | |||
| | | | | | | string | | | |||
| 4 | array | Decimal fraction; see Section 3.4.5 | | | | | | | |||
| | | | | | 4 | array | Decimal fraction; see Section 3.4.5 | | |||
| 5 | array | Bigfloat; see Section 3.4.5 | | | | | | | |||
| | | | | | 5 | array | Bigfloat; see Section 3.4.5 | | |||
| 6..20 | (Unassigned) | (Unassigned) | | | | | | | |||
| | | | | | 21 | multiple | Expected conversion to base64url encoding; | | |||
| 21 | multiple | Expected conversion to base64url | | | | | see Section 3.4.6.2 | | |||
| | | encoding; see Section 3.4.6.2 | | | | | | | |||
| | | | | | 22 | multiple | Expected conversion to base64 encoding; see | | |||
| 22 | multiple | Expected conversion to base64 | | | | | Section 3.4.6.2 | | |||
| | | encoding; see Section 3.4.6.2 | | | | | | | |||
| | | | | | 23 | multiple | Expected conversion to base16 encoding; see | | |||
| 23 | multiple | Expected conversion to base16 | | | | | Section 3.4.6.2 | | |||
| | | encoding; see Section 3.4.6.2 | | | | | | | |||
| | | | | | 24 | byte | Encoded CBOR data item; see Section 3.4.6.1 | | |||
| 24 | byte string | Encoded CBOR data item; see | | | | string | | | |||
| | | Section 3.4.6.1 | | | | | | | |||
| | | | | | 32 | UTF-8 | URI; see Section 3.4.6.3 | | |||
| 25..31 | (Unassigned) | (Unassigned) | | | | string | | | |||
| | | | | | | | | | |||
| 32 | UTF-8 string | URI; see Section 3.4.6.3 | | | 33 | UTF-8 | base64url; see Section 3.4.6.3 | | |||
| | | | | | | string | | | |||
| 33 | UTF-8 string | base64url; see Section 3.4.6.3 | | | | | | | |||
| | | | | | 34 | UTF-8 | base64; see Section 3.4.6.3 | | |||
| 34 | UTF-8 string | base64; see Section 3.4.6.3 | | | | string | | | |||
| | | | | | | | | | |||
| 35 | UTF-8 string | Regular expression; see | | | 35 | UTF-8 | Regular expression; see Section 3.4.6.3 | | |||
| | | Section 3.4.6.3 | | | | string | | | |||
| | | | | | | | | | |||
| 36 | UTF-8 string | MIME message; see Section 3.4.6.3 | | | 36 | UTF-8 | MIME message; see Section 3.4.6.3 | | |||
| | | | | | | string | | | |||
| 37..55798 | (Unassigned) | (Unassigned) | | | | | | | |||
| | | | | | 55799 | multiple | Self-described CBOR; see Section 3.4.7 | | |||
| 55799 | multiple | Self-described CBOR; see Section 3.4.7 | | +-------+-----------+-----------------------------------------------+ | |||
| | | | | ||||
| 55800+ | (Unassigned) | (Unassigned) | | ||||
+-----------+--------------+----------------------------------------+ | ||||
Table 3: Values for Tags | Table 3: Values for Tags | |||
3.4.1. Date and Time | 3.4.1. Date and Time | |||
Protocols using tag values 0 and 1 extend the generic data model | Protocols using tag values 0 and 1 extend the generic data model | |||
(Section 2) with data items representing points in time. | (Section 2) with data items representing points in time. | |||
3.4.2. Standard Date/Time String | 3.4.2. Standard Date/Time String | |||
Tag value 0 is for date/time strings that follow the standard format | Tag value 0 contains a text string in the standard format described | |||
described in [RFC3339], as refined by Section 3.3 of [RFC4287]. | by the "date-time" production in [RFC3339], as refined by Section 3.3 | |||
of [RFC4287], representing the point in time described there. A | ||||
nested item of another type or that doesn't match the [RFC4287] | ||||
format is invalid. | ||||
3.4.3. Epoch-based Date/Time | 3.4.3. Epoch-based Date/Time | |||
Tag value 1 is for numerical representation of civil time expressed | Tag value 1 contains a numerical value counting the number of seconds | |||
in seconds relative to 1970-01-01T00:00Z (in UTC time). | from 1970-01-01T00:00Z in UTC time to the represented point in civil | |||
time. | ||||
The tagged item MUST be an unsigned or negative integer (major types | The tagged item MUST be an unsigned or negative integer (major types | |||
0 and 1), or a floating-point number (major type 7 with additional | 0 and 1), or a floating-point number (major type 7 with additional | |||
information 25, 26, or 27). | information 25, 26, or 27). Other contained types are invalid. | |||
Non-negative values (major type 0 and non-negative floating-point | Non-negative values (major type 0 and non-negative floating-point | |||
numbers) stand for time values on or after 1970-01-01T00:00Z UTC and | numbers) stand for time values on or after 1970-01-01T00:00Z UTC and | |||
are interpreted according to POSIX [TIME_T]. (POSIX time is also | are interpreted according to POSIX [TIME_T]. (POSIX time is also | |||
known as UNIX Epoch time. Note that leap seconds are handled | known as UNIX Epoch time. Note that leap seconds are handled | |||
specially by POSIX time and this results in a 1 second discontinuity | specially by POSIX time and this results in a 1 second discontinuity | |||
several times per decade.) Note that applications that require the | several times per decade.) Note that applications that require the | |||
expression of times beyond early 2106 cannot leave out support of | expression of times beyond early 2106 cannot leave out support of | |||
64-bit integers for the tagged value. | 64-bit integers for the tagged value. | |||
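The following sketch illustrates how an application might convert the content of a tag 1 item into a point in time, and why 32-bit integers run out in early 2106. It assumes the tagged value has already been decoded into a Python int or float; the names `EPOCH` and `epoch_to_datetime` are hypothetical, and leap seconds are ignored, as in POSIX time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_datetime(value):
    """Convert the numeric content of a tag 1 item to a UTC datetime."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("tag 1 content must be an integer or a float")
    return EPOCH + timedelta(seconds=value)


# The first value that no longer fits into a 32-bit unsigned integer.
print(epoch_to_datetime(2**32).isoformat())  # 2106-02-07T06:28:16+00:00
```

Negative values are handled the same way and yield points in time before the epoch.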
skipping to change at page 19, line 42 ¶ | skipping to change at page 20, line 42 ¶ | |||
3.4.4. Bignums | 3.4.4. Bignums | |||
Protocols using tag values 2 and 3 extend the generic data model | Protocols using tag values 2 and 3 extend the generic data model | |||
(Section 2) with "bignums" representing arbitrarily sized integers. | (Section 2) with "bignums" representing arbitrarily sized integers. | |||
In the generic data model, bignum values are not equal to integers | In the generic data model, bignum values are not equal to integers | |||
from the basic data model, but specific data models can define that | from the basic data model, but specific data models can define that | |||
equivalence, and preferred encoding never makes use of bignums that | equivalence, and preferred encoding never makes use of bignums that | |||
also can be expressed as basic integers (see below). | also can be expressed as basic integers (see below). | |||
Bignums are encoded as a byte string data item, which is interpreted | Bignums are encoded as a byte string data item, which is interpreted | |||
as an unsigned integer n in network byte order. For tag value 2, the | as an unsigned integer n in network byte order. Contained items of | |||
value of the bignum is n. For tag value 3, the value of the bignum | other types are invalid. For tag value 2, the value of the bignum is | |||
is -1 - n. The preferred encoding of the byte string is to leave out | n. For tag value 3, the value of the bignum is -1 - n. The | |||
any leading zeroes (note that this means the preferred encoding for | preferred encoding of the byte string is to leave out any leading | |||
n = 0 is the empty byte string, but see below). Decoders that | zeroes (note that this means the preferred encoding for n = 0 is the | |||
understand these tags MUST be able to decode bignums that do have | empty byte string, but see below). Decoders that understand these | |||
leading zeroes. The preferred encoding of an integer that can be | tags MUST be able to decode bignums that do have leading zeroes. The | |||
represented using major type 0 or 1 is to encode it this way instead | preferred encoding of an integer that can be represented using major | |||
of as a bignum (which means that the empty string never occurs in a | type 0 or 1 is to encode it this way instead of as a bignum (which | |||
bignum when using preferred encoding). Note that this means the non- | means that the empty string never occurs in a bignum when using | |||
preferred choice of a bignum representation instead of a basic | preferred encoding). Note that this means the non-preferred choice | |||
integer for encoding a number is not intended to have application | of a bignum representation instead of a basic integer for encoding a | |||
semantics (just as the choice of a longer basic integer | number is not intended to have application semantics (just as the | |||
representation than needed, such as 0x1800 for 0x00 does not). | choice of a longer basic integer representation than needed, such as | |||
0x1800 for 0x00 does not). | ||||
For example, the number 18446744073709551616 (2**64) is represented | For example, the number 18446744073709551616 (2**64) is represented | |||
as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major | as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major | |||
type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 | type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 | |||
and eight bytes 0x00). In hexadecimal: | and eight bytes 0x00). In hexadecimal: | |||
C2 -- Tag 2 | C2 -- Tag 2 | |||
49 -- Byte string of length 9 | 49 -- Byte string of length 9 | |||
010000000000000000 -- Bytes content | 010000000000000000 -- Bytes content | |||
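The example above can be reproduced with a short sketch. It is deliberately limited to byte strings shorter than 24 bytes (so that the length fits in the additional information), it does not apply the preference for basic integers, and the function names `encode_bignum` and `decode_bignum` are assumptions made for illustration.

```python
def encode_bignum(value: int) -> bytes:
    """Encode an integer as a tag 2 or tag 3 bignum (short strings only)."""
    tag, n = (2, value) if value >= 0 else (3, -1 - value)
    # Shortest big-endian form without leading zeroes (empty for n == 0).
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(content) > 23:
        raise ValueError("sketch only handles byte strings shorter than 24 bytes")
    return bytes([0xC0 | tag, 0x40 | len(content)]) + content


def decode_bignum(data: bytes) -> int:
    """Decode a tag 2 or tag 3 bignum; leading zeroes are accepted."""
    if len(data) < 2 or data[0] not in (0xC2, 0xC3):
        raise ValueError("not a tag 2 or tag 3 item")
    if data[1] >> 5 != 2 or data[1] & 0x1F > 23:
        raise ValueError("content is not a short definite-length byte string")
    length = data[1] & 0x1F
    content = data[2:2 + length]
    if len(content) != length:
        raise ValueError("truncated byte string")
    n = int.from_bytes(content, "big")
    return n if data[0] == 0xC2 else -1 - n


print(encode_bignum(2**64).hex())  # c249010000000000000000
```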
skipping to change at page 20, line 46 ¶ | skipping to change at page 21, line 47 ¶ | |||
applications that need some basic binary floating-point capability | applications that need some basic binary floating-point capability | |||
without the need for supporting IEEE 754. | without the need for supporting IEEE 754. | |||
A decimal fraction or a bigfloat is represented as a tagged array | A decimal fraction or a bigfloat is represented as a tagged array | |||
that contains exactly two integer numbers: an exponent e and a | that contains exactly two integer numbers: an exponent e and a | |||
mantissa m. Decimal fractions (tag 4) use base-10 exponents; the | mantissa m. Decimal fractions (tag 4) use base-10 exponents; the | |||
value of a decimal fraction data item is m*(10**e). Bigfloats (tag | value of a decimal fraction data item is m*(10**e). Bigfloats (tag | |||
5) use base-2 exponents; the value of a bigfloat data item is | 5) use base-2 exponents; the value of a bigfloat data item is | |||
m*(2**e). The exponent e MUST be represented in an integer of major | m*(2**e). The exponent e MUST be represented in an integer of major | |||
type 0 or 1, while the mantissa also can be a bignum (Section 3.4.4). | type 0 or 1, while the mantissa also can be a bignum (Section 3.4.4). | |||
Contained items with other structures are invalid. | ||||
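The two value formulas can be evaluated exactly with rational arithmetic. The sketch below assumes the exponent e and mantissa m have already been extracted from the tagged array; the function names are hypothetical.

```python
from fractions import Fraction


def decimal_fraction_value(e: int, m: int) -> Fraction:
    """Value of a tag 4 item [e, m]: m * (10 ** e)."""
    return m * Fraction(10) ** e


def bigfloat_value(e: int, m: int) -> Fraction:
    """Value of a tag 5 item [e, m]: m * (2 ** e)."""
    return m * Fraction(2) ** e


# Exponent -2 and mantissa 27315 give 273.15.
print(float(decimal_fraction_value(-2, 27315)))  # 273.15
print(float(bigfloat_value(-1, 3)))              # 1.5
```

Using `Fraction` avoids the rounding that would occur if a negative exponent were evaluated in binary floating point.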
An example of a decimal fraction is that the number 273.15 could be | An example of a decimal fraction is that the number 273.15 could be | |||
represented as 0b110_00100 (major type of 6 for the tag, additional | represented as 0b110_00100 (major type of 6 for the tag, additional | |||
information of 4 for the type of tag), followed by 0b100_00010 (major | information of 4 for the type of tag), followed by 0b100_00010 (major | |||
type of 4 for the array, additional information of 2 for the length | type of 4 for the array, additional information of 2 for the length | |||
of the array), followed by 0b001_00001 (major type of 1 for the first | of the array), followed by 0b001_00001 (major type of 1 for the first | |||
integer, additional information of 1 for the value of -2), followed | integer, additional information of 1 for the value of -2), followed | |||
by 0b000_11001 (major type of 0 for the second integer, additional | by 0b000_11001 (major type of 0 for the second integer, additional | |||
information of 25 for a two-byte value), followed by | information of 25 for a two-byte value), followed by | |||
0b0110101010110011 (27315 in two bytes). In hexadecimal: | 0b0110101010110011 (27315 in two bytes). In hexadecimal: | |||
skipping to change at page 22, line 5 ¶ | skipping to change at page 23, line 5 ¶ | |||
generic CBOR processors. These content hints do not extend the | generic CBOR processors. These content hints do not extend the | |||
generic data model. | generic data model. | |||
3.4.6.1. Encoded CBOR Data Item | 3.4.6.1. Encoded CBOR Data Item | |||
Sometimes it is beneficial to carry an embedded CBOR data item that | Sometimes it is beneficial to carry an embedded CBOR data item that | |||
is not meant to be decoded immediately at the time the enclosing data | is not meant to be decoded immediately at the time the enclosing data | |||
item is being decoded. Tag 24 (CBOR data item) can be used to tag | item is being decoded. Tag 24 (CBOR data item) can be used to tag | |||
the embedded byte string as a data item encoded in CBOR format. | the embedded byte string as a data item encoded in CBOR format. | |||
Contained items that aren't byte strings are invalid. Any contained | ||||
byte string is valid, even if it encodes an invalid or ill-formed | ||||
CBOR item. | ||||
3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | |||
Tags 21 to 23 indicate that a byte string might require a specific | Tags 21 to 23 indicate that a byte string might require a specific | |||
encoding when interoperating with a text-based representation. These | encoding when interoperating with a text-based representation. These | |||
tags are useful when an encoder knows that the byte string data it is | tags are useful when an encoder knows that the byte string data it is | |||
writing is likely to be later converted to a particular JSON-based | writing is likely to be later converted to a particular JSON-based | |||
usage. That usage specifies that some strings are encoded as base64, | usage. That usage specifies that some strings are encoded as base64, | |||
base64url, and so on. The encoder uses byte strings instead of doing | base64url, and so on. The encoder uses byte strings instead of doing | |||
the encoding itself to reduce the message size, to reduce the code | the encoding itself to reduce the message size, to reduce the code | |||
size of the encoder, or both. The encoder does not know whether or | size of the encoder, or both. The encoder does not know whether or | |||
skipping to change at page 22, line 39 ¶ | skipping to change at page 23, line 43 ¶ | |||
Section 3.5 of RFC 4648), and encoding is performed without the | Section 3.5 of RFC 4648), and encoding is performed without the | |||
inclusion of any line breaks, whitespace, or other additional | inclusion of any line breaks, whitespace, or other additional | |||
characters. Note that, for all three tags, the encoding of the empty | characters. Note that, for all three tags, the encoding of the empty | |||
byte string is the empty text string. | byte string is the empty text string. | |||
3.4.6.3. Encoded Text | 3.4.6.3. Encoded Text | |||
Some text strings hold data that have formats widely used on the | Some text strings hold data that have formats widely used on the | |||
Internet, and sometimes those formats can be validated and presented | Internet, and sometimes those formats can be validated and presented | |||
to the application in appropriate form by the decoder. There are | to the application in appropriate form by the decoder. There are | |||
tags for some of these formats. | tags for some of these formats. As with tags 21 to 23, if these tags | |||
are applied to an item other than a text string, they apply to all | ||||
text string data items it contains. | ||||
o Tag 32 is for URIs, as defined in [RFC3986]; | o Tag 32 is for URIs, as defined in [RFC3986]. If the text string | |||
doesn't match the "URI-reference" production, the string is | ||||
invalid. | ||||
o Tags 33 and 34 are for base64url- and base64-encoded text strings, | o Tags 33 and 34 are for base64url- and base64-encoded text strings, | |||
as defined in [RFC4648]; | as defined in [RFC4648]. If any of: | |||
* the encoded text string contains non-alphabet characters or | ||||
only 1 character in the last block of 4, or | ||||
* the padding bits in a 2- or 3-character block are not 0, or | ||||
* the base64 encoding has the wrong number of padding characters, | ||||
or | ||||
* the base64url encoding has padding characters, | ||||
the string is invalid. | ||||
o Tag 35 is for regular expressions that are roughly in Perl | o Tag 35 is for regular expressions that are roughly in Perl | |||
Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a | Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a | |||
version of the JavaScript regular expression syntax [ECMA262]. | version of the JavaScript regular expression syntax [ECMA262]. | |||
(Note that more specific identification may be necessary if the | (Note that more specific identification may be necessary if the | |||
actual version of the specification underlying the regular | actual version of the specification underlying the regular | |||
expression, or more than just the text of the regular expression | expression, or more than just the text of the regular expression | |||
itself, need to be conveyed.) | itself, need to be conveyed.) Any contained string value is | |||
valid. | ||||
o Tag 36 is for MIME messages (including all headers), as defined in | o Tag 36 is for MIME messages (including all headers), as defined in | |||
[RFC2045]; | [RFC2045]. A text string that isn't a valid MIME message is | |||
invalid. | ||||
Note that tags 33 and 34 differ from 21 and 22 in that the data is | Note that tags 33 and 34 differ from 21 and 22 in that the data is | |||
transported in base-encoded form for the former and in raw byte | transported in base-encoded form for the former and in raw byte | |||
string form for the latter. | string form for the latter. | |||
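The validity conditions listed for tag 33 can be sketched as a simple check on the text string. This is one illustrative reading of those conditions for base64url only (tag 34 would additionally need the padding-count check); the names `B64URL_ALPHABET` and `is_valid_tag33_text` are assumptions of this example.

```python
import string

B64URL_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
)


def is_valid_tag33_text(text: str) -> bool:
    """Apply the base64url validity conditions to a text string."""
    # Non-alphabet characters (including "=" padding) make the string invalid.
    if any(ch not in B64URL_ALPHABET for ch in text):
        return False
    remainder = len(text) % 4
    if remainder == 1:
        return False  # only 1 character in the last block of 4
    if remainder == 0:
        return True
    # A 2-character block has 4 padding bits, a 3-character block has 2.
    padding_bits = 4 if remainder == 2 else 2
    last = B64URL_ALPHABET.index(text[-1])
    return last & ((1 << padding_bits) - 1) == 0


print(is_valid_tag33_text("AQ"))    # True: encodes the single byte 0x01
print(is_valid_tag33_text("AR"))    # False: padding bits are not 0
print(is_valid_tag33_text("AQ=="))  # False: base64url must not be padded
```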
3.4.7. Self-Described CBOR | 3.4.7. Self-Described CBOR | |||
In many applications, it will be clear from the context that CBOR is | In many applications, it will be clear from the context that CBOR is | |||
being employed for encoding a data item. For instance, a specific | being employed for encoding a data item. For instance, a specific | |||
protocol might specify the use of CBOR, or a media type is indicated | protocol might specify the use of CBOR, or a media type is indicated | |||
that specifies its use. However, there may be applications where | that specifies its use. However, there may be applications where | |||
such context information is not available, such as when CBOR data is | such context information is not available, such as when CBOR data is | |||
stored in a file and disambiguating metadata is not in use. Here, it | stored in a file that does not have disambiguating metadata. Here, | |||
may help to have some distinguishing characteristics for the data | it may help to have some distinguishing characteristics for the data | |||
itself. | itself. | |||
Tag 55799 is defined for this purpose. It does not impart any | Tag 55799 is defined for this purpose. It does not impart any | |||
special semantics on the data item that follows; that is, the | special semantics on the data item that follows; that is, the | |||
semantics of a data item tagged with tag 55799 is exactly identical | semantics of a data item tagged with tag 55799 is exactly identical | |||
to the semantics of the data item itself. | to the semantics of the data item itself. | |||
The serialization of this tag is 0xd9d9f7, which appears not to be in | The serialization of this tag is 0xd9d9f7, which does not appear to | |||
use as a distinguishing mark for frequently used file types. In | be in use as a distinguishing mark for any frequently used file | |||
particular, it is not a valid start of a Unicode text in any Unicode | types. In particular, 0xd9d9f7 is not a valid start of a Unicode | |||
encoding if followed by a valid CBOR data item. | text in any Unicode encoding if it is followed by a valid CBOR data | |||
item. | ||||
For instance, a decoder might be able to decode both CBOR and JSON. | For instance, a decoder might be able to decode both CBOR and JSON. | |||
Such a decoder would need to mechanically distinguish the two | Such a decoder would need to mechanically distinguish the two | |||
formats. An easy way for an encoder to help the decoder would be to | formats. An easy way for an encoder to help the decoder would be to | |||
tag the entire CBOR item with tag 55799, the serialization of which | tag the entire CBOR item with tag 55799, the serialization of which | |||
will never be found at the beginning of a JSON text. | will never be found at the beginning of a JSON text. | |||
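The serialization 0xd9d9f7 follows directly from the tag number: major type 6 with additional information 25 (a two-byte argument), followed by 55799 in network byte order. The sketch below derives the prefix and shows how a decoder might test for it; the helper names are hypothetical, and the check is a heuristic on the leading bytes only, not a full decode.

```python
SELF_DESCRIBED_TAG = 55799

# Major type 6 (0b110_00000) with additional information 25: 2-byte argument.
SELF_DESCRIBED_PREFIX = bytes([0xC0 | 25]) + SELF_DESCRIBED_TAG.to_bytes(2, "big")


def wrap_self_described(encoded_item: bytes) -> bytes:
    """Prepend tag 55799 to an already encoded CBOR data item."""
    return SELF_DESCRIBED_PREFIX + encoded_item


def looks_like_self_described_cbor(data: bytes) -> bool:
    """Heuristic: does the input start with the tag 55799 serialization?"""
    return data.startswith(SELF_DESCRIBED_PREFIX)


print(SELF_DESCRIBED_PREFIX.hex())                    # d9d9f7
print(looks_like_self_described_cbor(b'{"a": 1}'))    # False (JSON text)
```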
4. Creating CBOR-Based Protocols | 4. Serialization Considerations | |||
4.1. Preferred Serialization | ||||
For some values at the data model level, CBOR provides multiple | ||||
serializations. For many applications, it is desirable that an | ||||
encoder always chooses a preferred serialization; however, the | ||||
present specification does not put the burden of enforcing this | ||||
preference on either encoder or decoder. | ||||
Some constrained decoders may be limited in their ability to decode | ||||
non-preferred serializations: For example, if only integers below | ||||
1_000_000_000 are expected in an application, the decoder may leave | ||||
out the code that would be needed to decode 64-bit arguments in | ||||
integers. An encoder that always uses preferred serialization | ||||
("preferred encoder") interoperates with this decoder for the numbers | ||||
that can occur in this application. More generally speaking, it | ||||
therefore can be said that a preferred encoder is more universally | ||||
interoperable (and also less wasteful) than one that, say, always | ||||
uses 64-bit integers. | ||||
Similarly, a constrained encoder may be limited in the variety of | ||||
representation variants it supports in such a way that it does not | ||||
emit preferred serializations ("variant encoder"): Say, it could be | ||||
designed to always use the 32-bit variant for an integer that it | ||||
encodes even if a short representation is available (again, assuming | ||||
that there is no application need for integers that can only be | ||||
represented with the 64-bit variant). A decoder that does not rely | ||||
on only ever receiving preferred serializations ("variation-tolerant | ||||
decoder") can therefore be said to be more universally interoperable (it | ||||
might very well optimize for the case of receiving preferred | ||||
serializations, though). Full implementations of CBOR decoders are | ||||
by definition variation-tolerant; the distinction is only relevant if | ||||
a constrained implementation of a CBOR decoder meets a variant | ||||
encoder. | ||||
The preferred serialization always uses the shortest form of | ||||
representing the argument (Section 3); it also uses the shortest | ||||
floating point encoding that preserves the value being encoded (see | ||||
Section 5.5). Definite length encoding is preferred whenever the | ||||
length is known at the time the serialization of the item starts. | ||||
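
The "shortest form of representing the argument" rule can be sketched as follows. This is an illustrative sketch only, assuming the usual CBOR head layout (major type in the top three bits, additional information 24 to 27 selecting a 1-, 2-, 4-, or 8-byte argument); the function names `encode_head` and `encode_int` are hypothetical.

```python
import struct


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a major type and its argument in the shortest possible form."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must fit in 64 bits")
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])                 # same byte
    if argument < 2**8:
        return bytes([initial | 24, argument])             # uint8_t
    if argument < 2**16:
        return struct.pack(">BH", initial | 25, argument)  # uint16_t
    if argument < 2**32:
        return struct.pack(">BI", initial | 26, argument)  # uint32_t
    return struct.pack(">BQ", initial | 27, argument)      # uint64_t


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 (unsigned) or 1 (negative)."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)
```

For example, `encode_int(100)` yields the bytes 0x1864, where a "variant encoder" that always uses the 32-bit form would emit 0x1a00000064 instead.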
4.2. Deterministically Encoded CBOR | ||||
Some protocols may want encoders to only emit CBOR in a particular | ||||
deterministic format; those protocols might also have the decoders | ||||
check that their input is in that deterministic format. Those | ||||
protocols are free to define what they mean by a "deterministic | ||||
format" and what encoders and decoders are expected to do. This | ||||
section defines a set of restrictions that can serve as the base of | ||||
such a deterministic format. | ||||
4.2.1. Core Deterministic Encoding Requirements | ||||
A CBOR encoding satisfies the "core deterministic encoding | ||||
requirements" if it satisfies the following restrictions: | ||||
o Arguments (see Section 3) for integers, lengths in major types 2 | ||||
through 5, and tags MUST be as short as possible. In particular: | ||||
* 0 to 23 and -1 to -24 MUST be expressed in the same byte as the | ||||
major type; | ||||
* 24 to 255 and -25 to -256 MUST be expressed only with an | ||||
additional uint8_t; | ||||
* 256 to 65535 and -257 to -65536 MUST be expressed only with an | ||||
additional uint16_t; | ||||
* 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed | ||||
only with an additional uint32_t. | ||||
o The keys in every map MUST be sorted in the bytewise lexicographic | ||||
order of their deterministic encodings. For example, the | ||||
following keys are sorted correctly: | ||||
1. 10, encoded as 0x0a. | ||||
2. 100, encoded as 0x1864. | ||||
3. -1, encoded as 0x20. | ||||
4. "z", encoded as 0x617a. | ||||
5. "aa", encoded as 0x626161. | ||||
6. [100], encoded as 0x811864. | ||||
7. [-1], encoded as 0x8120. | ||||
8. false, encoded as 0xf4. | ||||
o Indefinite-length items MUST NOT appear. They can be encoded as | ||||
definite-length items instead. | ||||
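
The map key ordering rule above can be checked mechanically. The sketch below takes the eight example encodings from the list and sorts them with Python's built-in `bytes` comparison, which is bytewise lexicographic; the variable names are illustrative only.

```python
# Example key encodings from the list above, deliberately shuffled.
encoded_keys = [
    bytes.fromhex("f4"),      # false
    bytes.fromhex("626161"),  # "aa"
    bytes.fromhex("20"),      # -1
    bytes.fromhex("811864"),  # [100]
    bytes.fromhex("0a"),      # 10
    bytes.fromhex("8120"),    # [-1]
    bytes.fromhex("617a"),    # "z"
    bytes.fromhex("1864"),    # 100
]

# bytes objects compare bytewise lexicographically, so a plain sort
# produces the order required by the core deterministic requirements.
core_order = [key.hex() for key in sorted(encoded_keys)]
```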
4.2.2. Additional Deterministic Encoding Considerations | ||||
If a protocol allows for IEEE floats, then additional deterministic | ||||
encoding rules might need to be added. One example rule might be to | ||||
have all floats start as a 64-bit float, then do a test conversion to | ||||
a 32-bit float; if the result is the same numeric value, use the | ||||
shorter value and repeat the process with a test conversion to a | ||||
16-bit float. (This rule selects 16-bit float for positive and | ||||
negative Infinity as well.) Also, there are many representations for | ||||
NaN. If NaN is an allowed value, it must always be represented as | ||||
0xf97e00. | ||||
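
The example rule above could be sketched as follows. This is one possible reading, not a normative algorithm: it tries the narrowest width first instead of narrowing step by step from 64 bits, which selects the same encoding because every value that is exact in 16 bits is also exact in 32 bits. The function name `encode_float_shortest` is hypothetical.

```python
import math
import struct


def encode_float_shortest(value: float) -> bytes:
    """Encode a float in the shortest IEEE 754 width that preserves it."""
    if math.isnan(value):
        # Single NaN representation named in the text.
        return bytes.fromhex("f97e00")
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)
```

Under this sketch, positive and negative Infinity come out as 16-bit floats, matching the parenthetical remark in the rule.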
CBOR tags present additional considerations for deterministic | ||||
encoding. The absence or presence of tags in a deterministic format | ||||
is determined by the optionality of the tags in the protocol. In a | ||||
CBOR-based protocol that allows optional tagging anywhere, the | ||||
deterministic format must not allow them. In a protocol that | ||||
requires tags in certain places, the tag needs to appear in the | ||||
deterministic format. A CBOR-based protocol that uses deterministic | ||||
encoding might instead say that all tags that appear in a message | ||||
must be retained regardless of whether they are optional. | ||||
Protocols that include floating-point, big integer, or other complex values | ||||
need to define extra requirements on their deterministic encodings. | ||||
For example: | ||||
o If a protocol includes a field that can express floating values | ||||
(Section 3.3), the protocol's deterministic encoding needs to | ||||
specify whether the integral value 1.0 is encoded as 0x01, 0xf93c00, | ||||
0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for | ||||
this are: | ||||
1. Encode integral values that fit in 64 bits as values from | ||||
major types 0 and 1, and other values as the smallest of 16-, | ||||
32-, or 64-bit floating point that accurately represents the | ||||
value, | ||||
2. Encode all values as the smallest of 16-, 32-, or 64-bit | ||||
floating point that accurately represents the value, even for | ||||
integral values, or | ||||
3. Encode all values as 64-bit floating point. | ||||
If NaN is an allowed value, the protocol needs to pick a single | ||||
representation, for example 0xf97e00. | ||||
o If a protocol includes a field that can express integers larger | ||||
than 2^64 using tag 2 (Section 3.4.4), the protocol's | ||||
deterministic encoding needs to specify whether small integers are | ||||
expressed using the tag or major types 0 and 1. | ||||
o A protocol might give encoders the choice of representing a URL as | ||||
either a text string or, using Section 3.4.6.3, tag 32 containing | ||||
a text string. This protocol's deterministic encoding needs to | ||||
either require that the tag is present or require that it's | ||||
absent, not allow either one. | ||||
4.2.3. Length-First Map Key Ordering | ||||
The core deterministic encoding requirements sort map keys in a | ||||
different order from the one suggested by Section 3.9 of [RFC7049] | ||||
(called "Canonical CBOR" there). Protocols that need to be | ||||
compatible with [RFC7049]'s order can instead be specified in terms | ||||
of this specification's "length-first core deterministic encoding | ||||
requirements": | ||||
A CBOR encoding satisfies the "length-first core deterministic | ||||
encoding requirements" if it satisfies the core deterministic | ||||
encoding requirements except that the keys in every map MUST be | ||||
sorted such that: | ||||
1. If two keys have different lengths, the shorter one sorts | ||||
earlier; | ||||
2. If two keys have the same length, the one with the lower value in | ||||
(byte-wise) lexical order sorts earlier. | ||||
For example, under the length-first core deterministic encoding | ||||
requirements, the following keys are sorted correctly: | ||||
1. 10, encoded as 0x0a. | ||||
2. -1, encoded as 0x20. | ||||
3. false, encoded as 0xf4. | ||||
4. 100, encoded as 0x1864. | ||||
5. "z", encoded as 0x617a. | ||||
6. [-1], encoded as 0x8120. | ||||
7. "aa", encoded as 0x626161. | ||||
8. [100], encoded as 0x811864. | ||||
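
The length-first rule amounts to sorting on the pair (length of the encoded key, encoded key bytes). A minimal sketch using the example encodings from the list above; the variable names are illustrative only.

```python
# Example key encodings from the list above, in arbitrary order.
encoded_keys = [
    bytes.fromhex("811864"),  # [100]
    bytes.fromhex("617a"),    # "z"
    bytes.fromhex("f4"),      # false
    bytes.fromhex("1864"),    # 100
    bytes.fromhex("626161"),  # "aa"
    bytes.fromhex("0a"),      # 10
    bytes.fromhex("8120"),    # [-1]
    bytes.fromhex("20"),      # -1
]

# Shorter encodings sort first; ties are broken bytewise.
length_first_order = [
    key.hex() for key in sorted(encoded_keys, key=lambda k: (len(k), k))
]
```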
(Although [RFC7049] used the term "Canonical CBOR" for its form of | ||||
requirements on deterministic encoding, this document avoids this | ||||
term because "canonicalization" is often associated with specific | ||||
uses of deterministic encoding only. The terms are essentially | ||||
exchangeable, however, and the set of core requirements in this | ||||
document could also be called "Canonical CBOR", while the length- | ||||
first-ordered version of that could be called "Old Canonical CBOR".) | ||||
5. Creating CBOR-Based Protocols | ||||
Data formats such as CBOR are often used in environments where there | Data formats such as CBOR are often used in environments where there | |||
is no format negotiation. A specific design goal of CBOR is to not | is no format negotiation. A specific design goal of CBOR is to not | |||
need any included or assumed schema: a decoder can take a CBOR item | need any included or assumed schema: a decoder can take a CBOR item | |||
and decode it with no other knowledge. | and decode it with no other knowledge. | |||
Of course, in real-world implementations, the encoder and the decoder | Of course, in real-world implementations, the encoder and the decoder | |||
will have a shared view of what should be in a CBOR data item. For | will have a shared view of what should be in a CBOR data item. For | |||
example, an agreed-to format might be "the item is an array whose | example, an agreed-to format might be "the item is an array whose | |||
first value is a UTF-8 string, second value is an integer, and | first value is a UTF-8 string, second value is an integer, and | |||
subsequent values are zero or more floating-point numbers" or "the | subsequent values are zero or more floating-point numbers" or "the | |||
item is a map that has byte strings for keys and contains at least | item is a map that has byte strings for keys and contains at least | |||
one pair whose key is 0xab01". | one pair whose key is 0xab01". | |||
This specification puts no restrictions on CBOR-based protocols. An | CBOR-based protocols MUST specify how their decoders handle invalid | |||
encoder can be capable of encoding as many or as few types of values | and other unexpected data. CBOR-based protocols MAY specify that | |||
as is required by the protocol in which it is used; a decoder can be | they treat arbitrary valid data as unexpected. Encoders for CBOR- | |||
capable of understanding as many or as few types of values as is | based protocols MUST produce only valid items, that is, the protocol | |||
required by the protocols in which it is used. This lack of | cannot be designed to make use of invalid items. An encoder can be | |||
restrictions allows CBOR to be used in extremely constrained | capable of encoding as many or as few types of values as is required | |||
environments. | by the protocol in which it is used; a decoder can be capable of | |||
understanding as many or as few types of values as is required by the | ||||
protocols in which it is used. This lack of restrictions allows CBOR | ||||
to be used in extremely constrained environments. | ||||
This section discusses some considerations in creating CBOR-based | This section discusses some considerations in creating CBOR-based | |||
protocols. It is advisory only and explicitly excludes any language | protocols. With few exceptions, it is advisory only and explicitly | |||
from RFC 2119 other than words that could be interpreted as "MAY" in | excludes any language from BCP 14 other than words that could be | |||
the sense of RFC 2119. | interpreted as "MAY" in the sense of BCP 14. The exceptions aim at | |||
facilitating interoperability of CBOR-based protocols while making | ||||
use of a wide variety of both generic and application-specific | ||||
encoders and decoders. | ||||
4.1. CBOR in Streaming Applications | 5.1. CBOR in Streaming Applications | |||
In a streaming application, a data stream may be composed of a | In a streaming application, a data stream may be composed of a | |||
sequence of CBOR data items concatenated back-to-back. In such an | sequence of CBOR data items concatenated back-to-back. In such an | |||
environment, the decoder immediately begins decoding a new data item | environment, the decoder immediately begins decoding a new data item | |||
if data is found after the end of a previous data item. | if data is found after the end of a previous data item. | |||
Not all of the bytes making up a data item may be immediately | Not all of the bytes making up a data item may be immediately | |||
available to the decoder; some decoders will buffer additional data | available to the decoder; some decoders will buffer additional data | |||
until a complete data item can be presented to the application. | until a complete data item can be presented to the application. | |||
Other decoders can present partial information about a top-level data | Other decoders can present partial information about a top-level data | |||
skipping to change at page 24, line 40 ¶ | skipping to change at page 30, line 27 ¶ | |||
already be decoded, or even parts of a byte string that hasn't | already be decoded, or even parts of a byte string that hasn't | |||
completely arrived yet. | completely arrived yet. | |||
Note that some applications and protocols will not want to use | Note that some applications and protocols will not want to use | |||
indefinite-length encoding. Using indefinite-length encoding allows | indefinite-length encoding. Using indefinite-length encoding allows | |||
an encoder to not need to marshal all the data for counting, but it | an encoder to not need to marshal all the data for counting, but it | |||
requires a decoder to allocate increasing amounts of memory while | requires a decoder to allocate increasing amounts of memory while | |||
waiting for the end of the item. This might be fine for some | waiting for the end of the item. This might be fine for some | |||
applications but not others. | applications but not others. | |||
4.2. Generic Encoders and Decoders | 5.2. Generic Encoders and Decoders | |||
A generic CBOR decoder can decode all well-formed CBOR data and | A generic CBOR decoder can decode all well-formed CBOR data and | |||
present them to an application. CBOR data is well-formed if it uses | present them to an application. See Appendix C. | |||
the initial bytes, as well as the byte strings and/or data items that | ||||
are implied by their values, in the manner defined by CBOR, and no | ||||
extraneous data follows (Appendix C). | ||||
Even though CBOR attempts to minimize these cases, not all well- | Even though CBOR attempts to minimize these cases, not all well- | |||
formed CBOR data is valid: for example, the format excludes simple | formed CBOR data is valid: for example, the encoded text string | |||
values below 32 that are encoded with an extension byte. Also, | "0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR | |||
specific tags may make semantic constraints that may be violated, | item. Also, specific tags may make semantic constraints that may be | |||
such as by including a tag in a bignum tag or by following a byte | violated, such as a bignum tag containing another tag, or an instance | |||
string within a date tag. Finally, the data may be invalid, such as | of tag 0 containing a byte string or a text string with contents that | |||
invalid UTF-8 strings or date strings that do not conform to | do not match [RFC3339]'s "date-time" production. There is no | |||
[RFC3339]. There is no requirement that generic encoders and | requirement that generic encoders and decoders make unnatural choices | |||
decoders make unnatural choices for their application interface to | for their application interface to enable the processing of invalid | |||
enable the processing of invalid data. Generic encoders and decoders | data. Generic encoders and decoders are expected to forward simple | |||
are expected to forward simple values and tags even if their specific | values and tags even if their specific codepoints are not registered | |||
codepoints are not registered at the time the encoder/decoder is | at the time the encoder/decoder is written (Section 5.4). | |||
written (Section 4.5). | ||||
Generic decoders provide ways to present well-formed CBOR values, | Generic decoders provide ways to present well-formed CBOR values, | |||
both valid and invalid, to an application. The diagnostic notation | both valid and invalid, to an application. The diagnostic notation | |||
(Section 7) may be used to present well-formed CBOR values to humans. | (Section 8) may be used to present well-formed CBOR values to humans. | |||
Generic encoders provide an application interface that allows the | Generic encoders provide an application interface that allows the | |||
application to specify any well-formed value, including simple values | application to specify any well-formed value, including simple values | |||
and tags unknown to the encoder. | and tags unknown to the encoder. | |||
4.3. Syntax Errors | 5.3. Invalid Items | |||
A decoder encountering a CBOR data item that is not well-formed | ||||
generally can choose to completely fail the decoding (issue an error | ||||
and/or stop processing altogether), substitute the problematic data | ||||
and data items using a decoder-specific convention that clearly | ||||
indicates there has been a problem, or take some other action. | ||||
4.3.1. Incomplete CBOR Data Items | ||||
The representation of a CBOR data item has a specific length, | ||||
determined by its initial bytes and by the structure of any data | ||||
items enclosed in the data items. If less data is available, this | ||||
can be treated as a syntax error. A decoder may also decode | ||||
incrementally, that is, decode the data item as far as it is | ||||
available and present the data found so far (such as in an event- | ||||
based interface), with the option of continuing the decoding once | ||||
further data is available. | ||||
Examples of incomplete data items include: | ||||
o A decoder expects a certain number of array or map entries but | ||||
instead encounters the end of the data. | ||||
o A decoder processes what it expects to be the last pair in a map | ||||
and comes to the end of the data. | ||||
o A decoder has just seen a tag and then encounters the end of the | ||||
data. | ||||
o A decoder has seen the beginning of an indefinite-length item but | ||||
encounters the end of the data before it sees the "break" stop | ||||
code. | ||||
4.3.2. Malformed Indefinite-Length Items | ||||
Examples of malformed indefinite-length data items include: | ||||
o Within an indefinite-length byte string or text, a decoder finds | ||||
an item that is not of the appropriate major type before it finds | ||||
the "break" stop code. | ||||
o Within an indefinite-length map, a decoder encounters the "break" | ||||
stop code immediately after reading a key (the value is missing). | ||||
Another error is finding a "break" stop code at a point in the data | ||||
where there is no immediately enclosing (unclosed) indefinite-length | ||||
item. | ||||
4.3.3. Unknown Additional Information Values | A well-formed but invalid CBOR data item presents a problem with | |||
interpreting the data encoded in it in the CBOR data model. A CBOR- | ||||
based protocol could be specified in several layers, in which the | ||||
lower layers don't process the semantics of some of the CBOR data | ||||
they forward. These layers can't notice the invalidity in data they | ||||
don't process and MUST forward that data as-is. The first layer that | ||||
does process the semantics of an invalid CBOR item MUST take one of | ||||
two choices: | ||||
At the time of writing, some additional information values are | 1. Replace the problematic item with an error marker and continue | |||
unassigned and reserved for future versions of this document (see | with the next item, or | |||
Section 6.2). Since the overall syntax for these additional | ||||
information values is not yet defined, a decoder that sees an | ||||
additional information value that it does not understand cannot | ||||
continue decoding. | ||||
4.4. Other Decoding Errors | 2. Issue an error and stop processing altogether. | |||
A CBOR data item may be syntactically well-formed but present a | A CBOR-based protocol MUST specify which of these options its | |||
problem with interpreting the data encoded in it in the CBOR data | decoders take, for each kind of invalid item they might encounter. | |||
model. Generally speaking, a decoder that finds a data item with | ||||
such a problem might issue a warning, might stop processing | ||||
altogether, might handle the error and make the problematic value | ||||
available to the application as such, or take some other type of | ||||
action. | ||||
Such problems might include: | Such problems might include: | |||
Duplicate keys in a map: Generic decoders (Section 4.2) make data | Duplicate keys in a map: Generic decoders (Section 5.2) make data | |||
available to applications using the native CBOR data model. That | available to applications using the native CBOR data model. That | |||
data model includes maps (key-value mappings with unique keys), | data model includes maps (key-value mappings with unique keys), | |||
not multimaps (key-value mappings where multiple entries can have | not multimaps (key-value mappings where multiple entries can have | |||
the same key). Thus, a generic decoder that gets a CBOR map item | the same key). Thus, a generic decoder that gets a CBOR map item | |||
that has duplicate keys will decode to a map with only one | that has duplicate keys will decode to a map with only one | |||
instance of that key, or it might stop processing altogether. On | instance of that key, or it might stop processing altogether. On | |||
the other hand, a "streaming decoder" may not even be able to | the other hand, a "streaming decoder" may not even be able to | |||
notice (Section 4.7). | notice (Section 5.6). | |||
Inadmissible type on the value following a tag: Tags (Section 3.4) | Inadmissible type on the value following a tag: Tags (Section 3.4) | |||
specify what type of data item is supposed to follow the tag; for | specify what type of data item is supposed to follow the tag; for | |||
example, the tags for positive or negative bignums are supposed to | example, the tags for positive or negative bignums are supposed to | |||
be put on byte strings. A decoder that decodes the tagged data | be put on byte strings. A decoder that decodes the tagged data | |||
item into a native representation (a native big integer in this | item into a native representation (a native big integer in this | |||
example) is expected to check the type of the data item being | example) is expected to check the type of the data item being | |||
tagged. Even decoders that don't have such native representations | tagged. Even decoders that don't have such native representations | |||
available in their environment may perform the check on those tags | available in their environment may perform the check on those tags | |||
known to them and react appropriately. | known to them and react appropriately. | |||
Invalid UTF-8 string: A decoder might or might not want to verify | Invalid UTF-8 string: A decoder might or might not want to verify | |||
that the sequence of bytes in a UTF-8 string (major type 3) is | that the sequence of bytes in a UTF-8 string (major type 3) is | |||
actually valid UTF-8 and react appropriately. | actually valid UTF-8 and react appropriately. | |||
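
As one illustration of the UTF-8 case, the sketch below validates the payload of a text string and then either stops with an error or substitutes an error marker, depending on a policy chosen by the protocol. The names `INVALID`, `decode_text_string`, and the `on_invalid` parameter are hypothetical and are not defined by the draft.

```python
# Hypothetical sentinel standing in for a protocol-defined error marker.
INVALID = object()


def decode_text_string(payload: bytes, on_invalid: str = "stop"):
    """Decode the payload bytes of a major type 3 item.

    on_invalid="stop" raises ValueError on invalid UTF-8;
    on_invalid="mark" returns the INVALID sentinel instead.
    """
    try:
        # Python's strict UTF-8 decoder rejects overlong forms such as c0 ae.
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        if on_invalid == "mark":
            return INVALID
        raise ValueError("text string does not contain valid UTF-8") from None
```

With the payload bytes c0 ae from the encoded text string 0x62c0ae, the "mark" policy returns the sentinel and the "stop" policy raises.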
4.5. Handling Unknown Simple Values and Tags | 5.4. Handling Unknown Simple Values and Tags | |||
A decoder that comes across a simple value (Section 3.3) that it does | A decoder that comes across a simple value (Section 3.3) that it does | |||
not recognize, such as a value that was added to the IANA registry | not recognize, such as a value that was added to the IANA registry | |||
after the decoder was deployed or a value that the decoder chose not | after the decoder was deployed or a value that the decoder chose not | |||
to implement, might issue a warning, might stop processing | to implement, might issue a warning, might stop processing | |||
altogether, might handle the error by making the unknown value | altogether, might handle the error by making the unknown value | |||
available to the application as such (as is expected of generic | available to the application as such (as is expected of generic | |||
decoders), or take some other type of action. | decoders), or take some other type of action. | |||
A decoder that comes across a tag (Section 3.4) that it does not | A decoder that comes across a tag (Section 3.4) that it does not | |||
recognize, such as a tag that was added to the IANA registry after | recognize, such as a tag that was added to the IANA registry after | |||
the decoder was deployed or a tag that the decoder chose not to | the decoder was deployed or a tag that the decoder chose not to | |||
implement, might issue a warning, might stop processing altogether, | implement, might issue a warning, might stop processing altogether, | |||
might handle the error and present the unknown tag value together | might handle the error and present the unknown tag value together | |||
with the contained data item to the application (as is expected of | with the contained data item to the application (as is expected of | |||
generic decoders), might ignore the tag and simply present the | generic decoders), might ignore the tag and simply present the | |||
contained data item only to the application, or take some other type | contained data item only to the application, or take some other type | |||
of action. | of action. | |||
4.6. Numbers | 5.5. Numbers | |||
An application or protocol that uses CBOR might restrict the | ||||
representations of numbers. For instance, a protocol that only deals | ||||
with integers might say that floating-point numbers may not be used | ||||
and that decoders of that protocol do not need to be able to handle | ||||
floating-point numbers. Similarly, a protocol or application that | ||||
uses CBOR might say that decoders need to be able to handle either | ||||
type of number. | ||||
CBOR-based protocols should take into account that different language | CBOR-based protocols should take into account that different language | |||
environments pose different restrictions on the range and precision | environments pose different restrictions on the range and precision | |||
of numbers that are representable. For example, the JavaScript | of numbers that are representable. For example, the JavaScript | |||
number system treats all numbers as floating point, which may result | number system treats all numbers as floating point, which may result | |||
in silent loss of precision in decoding integers with more than 53 | in silent loss of precision in decoding integers with more than 53 | |||
significant bits. A protocol that uses numbers should define its | significant bits. A protocol that uses numbers should define its | |||
expectations on the handling of non-trivial numbers in decoders and | expectations on the handling of non-trivial numbers in decoders and | |||
receiving applications. | receiving applications. | |||
skipping to change at page 28, line 39 ¶ | skipping to change at page 33, line 14 ¶ | |||
The preferred encoding for a floating point value is the shortest | The preferred encoding for a floating point value is the shortest | |||
floating point encoding that preserves its value, e.g., 0xf94580 for | floating point encoding that preserves its value, e.g., 0xf94580 for | |||
the number 5.5, and 0xfa45ad9c00 for the number 5555.5, unless the | the number 5.5, and 0xfa45ad9c00 for the number 5555.5, unless the | |||
CBOR-based protocol specifically excludes the use of the shorter | CBOR-based protocol specifically excludes the use of the shorter | |||
floating point encodings. For NaN values, a shorter encoding is | floating point encodings. For NaN values, a shorter encoding is | |||
preferred if zero-padding the shorter significand towards the right | preferred if zero-padding the shorter significand towards the right | |||
reconstitutes the original NaN value (for many applications, the | reconstitutes the original NaN value (for many applications, the | |||
single NaN encoding 0xf97e00 will suffice). | single NaN encoding 0xf97e00 will suffice). | |||
4.7. Specifying Keys for Maps | 5.6. Specifying Keys for Maps | |||
The encoding and decoding applications need to agree on what types of | The encoding and decoding applications need to agree on what types of | |||
keys are going to be used in maps. In applications that need to | keys are going to be used in maps. In applications that need to | |||
interwork with JSON-based applications, keys probably should be | interwork with JSON-based applications, keys probably should be | |||
limited to UTF-8 strings only; otherwise, there has to be a specified | limited to UTF-8 strings only; otherwise, there has to be a specified | |||
mapping from the other CBOR types to Unicode characters, and this | mapping from the other CBOR types to Unicode characters, and this | |||
often leads to implementation errors. In applications where keys are | often leads to implementation errors. In applications where keys are | |||
numeric in nature and numeric ordering of keys is important to the | numeric in nature and numeric ordering of keys is important to the | |||
application, directly using the numbers for the keys is useful. | application, directly using the numbers for the keys is useful. | |||
skipping to change at page 29, line 17 ¶ | skipping to change at page 33, line 41 ¶ | |||
values of which happen to be integer numbers in the same map. | values of which happen to be integer numbers in the same map. | |||
Decoders that deliver data items nested within a CBOR data item | Decoders that deliver data items nested within a CBOR data item | |||
immediately on decoding them ("streaming decoders") often do not keep | immediately on decoding them ("streaming decoders") often do not keep | |||
the state that is necessary to ascertain uniqueness of a key in a | the state that is necessary to ascertain uniqueness of a key in a | |||
map. Similarly, an encoder that can start encoding data items before | map. Similarly, an encoder that can start encoding data items before | |||
the enclosing data item is completely available ("streaming encoder") | the enclosing data item is completely available ("streaming encoder") | |||
may want to reduce its overhead significantly by relying on its data | may want to reduce its overhead significantly by relying on its data | |||
source to maintain uniqueness. | source to maintain uniqueness. | |||
A CBOR-based protocol should make an intentional decision about what | A CBOR-based protocol MUST define what to do when a receiving | |||
to do when a receiving application does see multiple identical keys | application does see multiple identical keys in a map. The resulting | |||
in a map. The resulting rule in the protocol should respect the CBOR | rule in the protocol MUST respect the CBOR data model: it cannot | |||
data model: it cannot prescribe a specific handling of the entries | prescribe a specific handling of the entries with the identical keys, | |||
with the identical keys, except that it might have a rule that having | except that it might have a rule that having identical keys in a map | |||
identical keys in a map indicates a malformed map and that the | indicates a malformed map and that the decoder has to stop with an | |||
decoder has to stop with an error. Duplicate keys are also | error. Duplicate keys are also prohibited by CBOR decoders that are | |||
prohibited by CBOR decoders that are using strict mode | using strict mode (Section 5.8). | |||
(Section 4.11). | ||||
The CBOR data model for maps does not allow ascribing semantics to | The CBOR data model for maps does not allow ascribing semantics to | |||
the order of the key/value pairs in the map representation. Thus, a | the order of the key/value pairs in the map representation. Thus, a | |||
CBOR-based protocol MUST NOT specify that changing the key/value pair | CBOR-based protocol MUST NOT specify that changing the key/value pair | |||
order in a map would change the semantics, except to specify that | order in a map would change the semantics, except to specify that | |||
some, e.g. non-canonical, orders are disallowed. Timing, cache | some orders are disallowed, for example where they would not meet | |||
usage, and other side channels are not considered part of the | the requirements of a deterministic encoding (Section 4.2). (Any | |||
semantics. | secondary effects of map ordering such as on timing, cache usage, and | |||
other potential side channels are not considered part of the | ||||
semantics but may be enough reason on their own for a protocol to | ||||
require a deterministic encoding format.) | ||||
Applications for constrained devices that have maps with 24 or fewer | Applications for constrained devices that have maps with 24 or fewer | |||
frequently used keys should consider using small integers (and those | frequently used keys should consider using small integers (and those | |||
with up to 48 frequently used keys should consider also using small | with up to 48 frequently used keys should consider also using small | |||
negative integers) because the keys can then be encoded in a single | negative integers) because the keys can then be encoded in a single | |||
byte. | byte. | |||
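The single-byte claim above can be checked with a small sketch. The helper name `encode_int` is hypothetical and not part of any CBOR library; it only writes the head of an integer (major type 0 or 1) with the shortest argument form.

```python
import struct


def encode_int(n: int) -> bytes:
    """Encode an integer as major type 0 or 1 with the shortest argument."""
    major = 0 if n >= 0 else 1
    arg = n if n >= 0 else -1 - n  # major type 1 carries -1 - n
    if arg < 24:
        # The argument fits in the additional information bits: one byte total.
        return bytes([(major << 5) | arg])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("integer does not fit major types 0 and 1")


# Keys 0..23 and -1..-24 are the 48 integers that take exactly one byte.
single_byte_keys = [n for n in range(-30, 30) if len(encode_int(n)) == 1]
```

Under these assumptions, `single_byte_keys` runs from -24 to 23, which matches the 24 non-negative and 24 negative keys mentioned above.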
4.7.1. Equivalence of Keys | 5.6.1. Equivalence of Keys | |||
The specific data model applying to a CBOR data item is used to | The specific data model applying to a CBOR data item is used to | |||
determine whether keys occurring in maps are duplicates or distinct. | determine whether keys occurring in maps are duplicates or distinct. | |||
At the generic data model level, numerically equivalent integer and | At the generic data model level, numerically equivalent integer and | |||
floating point values are distinct from each other, as they are from | floating point values are distinct from each other, as they are from | |||
the various big numbers (Tags 2 to 5). Similarly, text strings are | the various big numbers (Tags 2 to 5). Similarly, text strings are | |||
distinct from byte strings, even if composed of the same bytes. A | distinct from byte strings, even if composed of the same bytes. A | |||
tagged value is distinct from an untagged value or from a value | tagged value is distinct from an untagged value or from a value | |||
tagged with a different tag. | tagged with a different tag. | |||
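A decoder written in a language whose native map type conflates these values needs some care. As an illustration only, the Python sketch below shows that a plain `dict` treats 1 and 1.0 as the same key, and one possible workaround; the helper `generic_key` is a hypothetical name, and tagged values are not covered.

```python
def generic_key(value):
    """Return a hashable key that keeps integers and floats distinct.

    Assumption of this sketch: decoded keys are plain Python int, float,
    str, or bytes objects. The type name becomes part of the key.
    """
    return (type(value).__name__, value)


plain = {}
plain[1] = "integer"
plain[1.0] = "float"  # overwrites the entry above: 1 == 1.0 in Python

distinct = {}
distinct[generic_key(1)] = "integer"
distinct[generic_key(1.0)] = "float"  # kept as a separate entry
```

Text strings and byte strings need no such wrapper in Python, since `"a"` and `b"a"` already compare unequal.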
skipping to change at page 30, line 32 ¶ | skipping to change at page 35, line 9 ¶ | |||
As discussed in Section 2.2, specific data models can make values | As discussed in Section 2.2, specific data models can make values | |||
equivalent for the purpose of comparing map keys that are distinct in | equivalent for the purpose of comparing map keys that are distinct in | |||
the generic data model. Note that this implies that a generic | the generic data model. Note that this implies that a generic | |||
decoder may deliver a decoded map to an application that needs to be | decoder may deliver a decoded map to an application that needs to be | |||
checked for duplicate map keys by that application (alternatively, | checked for duplicate map keys by that application (alternatively, | |||
the decoder may provide a programming interface to perform this | the decoder may provide a programming interface to perform this | |||
service for the application). Specific data models cannot | service for the application). Specific data models cannot | |||
distinguish values for map keys that are equal for this purpose at | distinguish values for map keys that are equal for this purpose at | |||
the generic data model level. | the generic data model level. | |||
4.8. Undefined Values | 5.7. Undefined Values | |||
In some CBOR-based protocols, the simple value (Section 3.3) of | In some CBOR-based protocols, the simple value (Section 3.3) of | |||
Undefined might be used by an encoder as a substitute for a data item | Undefined might be used by an encoder as a substitute for a data item | |||
with an encoding problem, in order to allow the rest of the enclosing | with an encoding problem, in order to allow the rest of the enclosing | |||
data items to be encoded without harm. | data items to be encoded without harm. | |||
4.9. Preferred Serialization | 5.8. Strict Decoding Mode | |||
For some values at the data model level, CBOR provides multiple | ||||
serializations. For many applications, it is desirable that an | ||||
encoder always chooses a preferred serialization; however, the | ||||
present specification does not put the burden of enforcing this | ||||
preference on either encoder or decoder. | ||||
Some constrained decoders may be limited in their ability to decode | ||||
non-preferred serializations: For example, if only integers below | ||||
1_000_000_000 are expected in an application, the decoder may leave | ||||
out the code that would be needed to decode 64-bit arguments in | ||||
integers. An encoder that always uses preferred serialization | ||||
("preferred encoder") interoperates with this decoder for the numbers | ||||
that can occur in this application. More generally speaking, it | ||||
therefore can be said that a preferred encoder is more universally | ||||
interoperable (and also less wasteful) than one that, say, always | ||||
uses 64-bit integers. | ||||
Similarly, a constrained encoder may be limited in the variety of | ||||
representation variants it supports in such a way that it does not | ||||
emit preferred serializations ("variant encoder"): Say, it could be | ||||
designed to always use the 32-bit variant for an integer that it | ||||
encodes even if a short representation is available (again, assuming | ||||
that there is no application need for integers that can only be | ||||
represented with the 64-bit variant). A decoder that does not rely | ||||
on only ever receiving preferred serializations ("variation-tolerant | ||||
decoder") can therefore be said to be more universally interoperable (it | ||||
might very well optimize for the case of receiving preferred | ||||
serializations, though). Full implementations of CBOR decoders are | ||||
by definition variation-tolerant; the distinction is only relevant if | ||||
a constrained implementation of a CBOR decoder meets a variant | ||||
encoder. | ||||
The preferred serialization always uses the shortest form of | ||||
representing the argument (Section 3); it also uses the shortest | ||||
floating point encoding that preserves the value being encoded (see | ||||
Section 4.6). Definite length encoding is preferred whenever the | ||||
length is known at the time the serialization of the item starts. | ||||
4.10. Canonically Encoded CBOR | ||||
Some protocols may want encoders to only emit CBOR in a particular | ||||
canonical format; those protocols might also have the decoders check | ||||
that their input is canonical. Those protocols are free to define | ||||
what they mean by a canonical format and what encoders and decoders | ||||
are expected to do. This section defines a set of restrictions that | ||||
can serve as the base of such a canonical format. | ||||
A CBOR encoding satisfies the "core canonicalization requirements" if | ||||
it satisfies the following restrictions: | ||||
o Arguments (see Section 3) for integers, lengths in major types 2 | ||||
through 5, and tags MUST be as short as possible. In particular: | ||||
* 0 to 23 and -1 to -24 MUST be expressed in the same byte as the | ||||
major type; | ||||
* 24 to 255 and -25 to -256 MUST be expressed only with an | ||||
additional uint8_t; | ||||
* 256 to 65535 and -257 to -65536 MUST be expressed only with an | ||||
additional uint16_t; | ||||
* 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed | ||||
only with an additional uint32_t. | ||||
o The keys in every map MUST be sorted in the bytewise lexicographic | ||||
order of their canonical encodings. For example, the following | ||||
keys are sorted correctly: | ||||
1. 10, encoded as 0x0a. | ||||
2. 100, encoded as 0x1864. | ||||
3. -1, encoded as 0x20. | ||||
4. "z", encoded as 0x617a. | ||||
5. "aa", encoded as 0x626161. | ||||
6. [100], encoded as 0x811864. | ||||
7. [-1], encoded as 0x8120. | ||||
8. false, encoded as 0xf4. | ||||
o Indefinite-length items MUST NOT appear. They can be encoded as | ||||
definite-length items instead. | ||||
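The map key ordering rule above can be sketched in a few lines of Python. The encodings are the ones from the example list; the function name `bytewise_order` is hypothetical. Python compares `bytes` objects lexicographically by unsigned byte value, which is the comparison the rule asks for.

```python
# (diagnostic form, encoding) pairs from the example list, deliberately shuffled.
keys = [
    ("false", bytes.fromhex("f4")),
    ('"aa"', bytes.fromhex("626161")),
    ("10", bytes.fromhex("0a")),
    ("[-1]", bytes.fromhex("8120")),
    ("100", bytes.fromhex("1864")),
    ('"z"', bytes.fromhex("617a")),
    ("-1", bytes.fromhex("20")),
    ("[100]", bytes.fromhex("811864")),
]


def bytewise_order(pairs):
    """Sort (label, encoding) pairs by the bytes of the encoding."""
    return sorted(pairs, key=lambda pair: pair[1])


ordered = [label for label, _ in bytewise_order(keys)]
```

Here `ordered` reproduces the sequence given in the list: 10, 100, -1, "z", "aa", [100], [-1], false.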
If a protocol allows for IEEE floats, then additional | ||||
canonicalization rules might need to be added. One example rule | ||||
might be to have all floats start as a 64-bit float, then do a test | ||||
conversion to a 32-bit float; if the result is the same numeric | ||||
value, use the shorter value and repeat the process with a test | ||||
conversion to a 16-bit float. (This rule selects 16-bit float for | ||||
positive and negative Infinity as well.) Also, there are many | ||||
representations for NaN. If NaN is an allowed value, it must always | ||||
be represented as 0xf97e00. | ||||
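The example rule for floats could look roughly like the following Python sketch. It is one possible reading of the rule, not a normative algorithm; `encode_float` is a hypothetical helper, and it relies on the `struct` module's `e`, `f`, and `d` formats for 16-, 32-, and 64-bit IEEE 754 values.

```python
import math
import struct


def encode_float(x: float) -> bytes:
    """Encode x as the shortest float that preserves its value."""
    if math.isnan(x):
        return bytes.fromhex("f97e00")  # the single NaN representation
    # Start from the 64-bit form, then test 32 bits, then 16 bits.
    best = b"\xfb" + struct.pack(">d", x)
    for initial, fmt in ((0xFA, ">f"), (0xF9, ">e")):
        try:
            packed = struct.pack(fmt, x)
        except (OverflowError, struct.error):
            break  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] != x:
            break  # the narrower form would change the value
        best = bytes([initial]) + packed
    return best
```

For instance, 1.0 ends up in the 16-bit form and 1.1 stays in the 64-bit form, while positive and negative Infinity select the 16-bit form as noted above.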
CBOR tags present additional considerations for canonicalization. | ||||
The absence or presence of tags in a canonical format is determined | ||||
by the optionality of the tags in the protocol. In a CBOR-based | ||||
protocol that allows optional tagging anywhere, the canonical format | ||||
must not allow them. In a protocol that requires tags in certain | ||||
places, the tag needs to appear in the canonical format. A CBOR- | ||||
based protocol that uses canonicalization might instead say that all | ||||
tags that appear in a message must be retained regardless of whether | ||||
they are optional. | ||||
Protocols that include floating, big integer, or other complex values | ||||
need to define extra requirements on their canonical encodings. For | ||||
example: | ||||
o If a protocol includes a field that can express floating values | ||||
(Section 3.3), the protocol's canonicalization needs to specify | ||||
whether the value 1.0 is encoded as 0x01, 0xf93c00, | ||||
0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for | ||||
this are: | ||||
1. Encode integral values that fit in 64 bits as values from | ||||
major types 0 and 1, and other values as the smallest of 16-, | ||||
32-, or 64-bit floating point that accurately represents the | ||||
value, | ||||
2. Encode all values as the smallest of 16-, 32-, or 64-bit | ||||
floating point that accurately represents the value, even for | ||||
integral values, or | ||||
3. Encode all values as 64-bit floating point. | ||||
If NaN is an allowed value, the protocol needs to pick a single | ||||
representation, for example 0xf97e00. | ||||
o If a protocol includes a field that can express integers larger | ||||
than 2^64 using tag 2 (Section 3.4.4), the protocol's | ||||
canonicalization needs to specify whether small integers are | ||||
expressed using the tag or major types 0 and 1. | ||||
o A protocol might give encoders the choice of representing a URL as | ||||
either a text string or, using Section 3.4.6.3, tag 32 containing | ||||
a text string. This protocol's canonicalization needs to either | ||||
require that the tag is present or require that it's absent, not | ||||
allow either one. | ||||
4.10.1. Length-first map key ordering | ||||
The core canonicalization requirements sort map keys in a different | ||||
order from the one suggested by [RFC7049]. Protocols that need to be | ||||
compatible with [RFC7049]'s order can instead be specified in terms | ||||
of this specification's "length-first core canonicalization | ||||
requirements": | ||||
A CBOR encoding satisfies the "length-first core canonicalization | ||||
requirements" if it satisfies the core canonicalization requirements | ||||
except that the keys in every map MUST be sorted such that: | ||||
1. If two keys have different lengths, the shorter one sorts | ||||
earlier; | ||||
2. If two keys have the same length, the one with the lower value in | ||||
(byte-wise) lexical order sorts earlier. | ||||
For example, under the length-first core canonicalization | ||||
requirements, the following keys are sorted correctly: | ||||
1. 10, encoded as 0x0a. | ||||
2. -1, encoded as 0x20. | ||||
3. false, encoded as 0xf4. | ||||
4. 100, encoded as 0x1864. | ||||
5. "z", encoded as 0x617a. | ||||
6. [-1], encoded as 0x8120. | ||||
7. "aa", encoded as 0x626161. | ||||
8. [100], encoded as 0x811864. | ||||
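The length-first rule differs from plain bytewise ordering only in its sort key. A minimal Python sketch, using the encodings from the list above and a hypothetical function name `length_first_order`:

```python
# (diagnostic form, encoding) pairs from the example list, deliberately shuffled.
keys = [
    ("[100]", bytes.fromhex("811864")),
    ("100", bytes.fromhex("1864")),
    ("false", bytes.fromhex("f4")),
    ('"z"', bytes.fromhex("617a")),
    ("10", bytes.fromhex("0a")),
    ('"aa"', bytes.fromhex("626161")),
    ("-1", bytes.fromhex("20")),
    ("[-1]", bytes.fromhex("8120")),
]


def length_first_order(pairs):
    """Sort by encoded length first, then bytewise within equal lengths."""
    return sorted(pairs, key=lambda pair: (len(pair[1]), pair[1]))


ordered = [label for label, _ in length_first_order(keys)]
```

With this key, `ordered` gives 10, -1, false, 100, "z", [-1], "aa", [100], the same sequence as the example.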
4.11. Strict Decoding Mode | ||||
Some areas of application of CBOR do not require canonicalization | Some areas of application of CBOR do not require deterministic | |||
(Section 4.10) but may require that different decoders reach the same | encoding (Section 4.2) but may require that different decoders reach | |||
(semantically equivalent) results, even in the presence of | the same (semantically equivalent) results, even in the presence of | |||
potentially malicious data. This can be required if one application | potentially malicious data. This can be required if one application | |||
(such as a firewall or other protecting entity) makes a decision | (such as a firewall or other protecting entity) makes a decision | |||
based on the data that another application, which independently | based on the data that another application, which independently | |||
decodes the data, relies on. | decodes the data, relies on. | |||
Normally, it is the responsibility of the sender to avoid ambiguously | Normally, it is the responsibility of the sender to avoid ambiguously | |||
decodable data. However, the sender might be an attacker specially | decodable data. However, the sender might be an attacker specially | |||
making up CBOR data such that it will be interpreted differently by | making up CBOR data such that it will be interpreted differently by | |||
different decoders in an attempt to exploit that as a vulnerability. | different decoders in an attempt to exploit that as a vulnerability. | |||
Generic decoders used in applications where this might be a problem | Generic decoders used in applications where this might be a problem | |||
need to support a strict mode in which it is also the responsibility | need to support a strict mode in which it is also the responsibility | |||
of the receiver to reject ambiguously decodable data. It is expected | of the receiver to reject ambiguously decodable data. It is expected | |||
that firewalls and other security systems that decode CBOR will only | that firewalls and other security systems that decode CBOR will only | |||
decode in strict mode. | decode in strict mode. | |||
A decoder in strict mode will reliably reject any data that could be | A decoder in strict mode will reliably reject any data that could be | |||
interpreted by other decoders in different ways. It will reliably | interpreted by other decoders in different ways. It will expend the | |||
reject data items with syntax errors (Section 4.3). It will also | effort to reliably detect invalid data items (Section 5.3). For | |||
expend the effort to reliably detect other decoding errors | example, a strict decoder needs to have an API that reports an error | |||
(Section 4.4). In particular, a strict decoder needs to have an API | (and does not return data) for a CBOR data item that contains any of | |||
that reports an error (and does not return data) for a CBOR data item | the following: | |||
that contains any of the following: | ||||
o a map (major type 5) that has more than one entry with the same | o a map (major type 5) that has more than one entry with the same | |||
key | key | |||
o a tag that is used on a data item of the incorrect type | o a tag that is used on a data item of the incorrect type | |||
o a data item that is incorrectly formatted for the type given to | o a data item that is incorrectly formatted for the type given to | |||
it, such as invalid UTF-8 or data that cannot be interpreted with | it, such as invalid UTF-8 or data that cannot be interpreted with | |||
the specific tag that it has been tagged with | the specific tag that it has been tagged with | |||
skipping to change at page 35, line 45 ¶ | skipping to change at page 36, line 32 ¶ | |||
Since some of this processing may have an appreciable cost (in | Since some of this processing may have an appreciable cost (in | |||
particular with duplicate detection for maps), support of strict mode | particular with duplicate detection for maps), support of strict mode | |||
is not a requirement placed on all CBOR decoders. | is not a requirement placed on all CBOR decoders. | |||
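The cost of duplicate detection comes from the state a decoder has to keep for every open map. The Python sketch below illustrates this under a stated assumption: a hypothetical decoder hands over each map entry as a `(key_bytes, value)` pair, where `key_bytes` is one normalized encoding of the key, so that equal keys always have equal bytes. The names `DuplicateKeyError` and `collect_map_strict` are invented for this example.

```python
class DuplicateKeyError(ValueError):
    """Raised when a map contains two entries with the same key."""


def collect_map_strict(entries):
    """Build a dict from (key_bytes, value) pairs, rejecting duplicates.

    Assumption of this sketch: each key arrives as the bytes of a single
    normalized encoding, so comparing bytes is enough to compare keys.
    """
    result = {}
    for key_bytes, value in entries:
        if key_bytes in result:
            # Strict mode: report an error and return no data at all.
            raise DuplicateKeyError(f"duplicate map key 0x{key_bytes.hex()}")
        result[key_bytes] = value
    return result
```

Every key seen so far stays in memory until the map is complete, which is the overhead a streaming decoder would otherwise avoid.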
Some encoders will rely on their applications to provide input data | Some encoders will rely on their applications to provide input data | |||
in such a way that unambiguously decodable CBOR results. A generic | in such a way that unambiguously decodable CBOR results. A generic | |||
encoder also may want to provide a strict mode where it reliably | encoder also may want to provide a strict mode where it reliably | |||
limits its output to unambiguously decodable CBOR, independent of | limits its output to unambiguously decodable CBOR, independent of | |||
whether or not its application is providing API-conformant data. | whether or not its application is providing API-conformant data. | |||
5. Converting Data between CBOR and JSON | 6. Converting Data between CBOR and JSON | |||
This section gives non-normative advice about converting between CBOR | This section gives non-normative advice about converting between CBOR | |||
and JSON. Implementations of converters are free to use whichever | and JSON. Implementations of converters are free to use whichever | |||
advice here they want. | advice here they want. | |||
It is worth noting that a JSON text is a sequence of characters, not | It is worth noting that a JSON text is a sequence of characters, not | |||
an encoded sequence of bytes, while a CBOR data item consists of | an encoded sequence of bytes, while a CBOR data item consists of | |||
bytes, not characters. | bytes, not characters. | |||
5.1. Converting from CBOR to JSON | 6.1. Converting from CBOR to JSON | |||
Most of the types in CBOR have direct analogs in JSON. However, some | Most of the types in CBOR have direct analogs in JSON. However, some | |||
do not, and someone implementing a CBOR-to-JSON converter has to | do not, and someone implementing a CBOR-to-JSON converter has to | |||
consider what to do in those cases. The following non-normative | consider what to do in those cases. The following non-normative | |||
advice deals with these by converting them to a single substitute | advice deals with these by converting them to a single substitute | |||
value, such as a JSON null. | value, such as a JSON null. | |||
o An integer (major type 0 or 1) becomes a JSON number. | o An integer (major type 0 or 1) becomes a JSON number. | |||
o A byte string (major type 2) that is not embedded in a tag that | o A byte string (major type 2) that is not embedded in a tag that | |||
skipping to change at page 36, line 35 ¶ | skipping to change at page 37, line 21 ¶ | |||
quotation mark (U+0022), reverse solidus (U+005C), and the "C0 | quotation mark (U+0022), reverse solidus (U+005C), and the "C0 | |||
control characters" (U+0000 through U+001F). All other characters | control characters" (U+0000 through U+001F). All other characters | |||
are copied unchanged into the JSON UTF-8 string. | are copied unchanged into the JSON UTF-8 string. | |||
o An array (major type 4) becomes a JSON array. | o An array (major type 4) becomes a JSON array. | |||
o A map (major type 5) becomes a JSON object. This is possible | o A map (major type 5) becomes a JSON object. This is possible | |||
directly only if all keys are UTF-8 strings. A converter might | directly only if all keys are UTF-8 strings. A converter might | |||
also convert other keys into UTF-8 strings (such as by converting | also convert other keys into UTF-8 strings (such as by converting | |||
integers into strings containing their decimal representation); | integers into strings containing their decimal representation); | |||
however, doing so introduces a danger of key collision. | however, doing so introduces a danger of key collision. Note also | |||
that, if tags on UTF-8 strings are ignored as proposed below, this | ||||
will cause a key collision if the tags are different but the | ||||
strings are the same. | ||||
o False (major type 7, additional information 20) becomes a JSON | o False (major type 7, additional information 20) becomes a JSON | |||
false. | false. | |||
o True (major type 7, additional information 21) becomes a JSON | o True (major type 7, additional information 21) becomes a JSON | |||
true. | true. | |||
o Null (major type 7, additional information 22) becomes a JSON | o Null (major type 7, additional information 22) becomes a JSON | |||
null. | null. | |||
skipping to change at page 37, line 24 ¶ | skipping to change at page 38, line 11 ¶ | |||
o A byte string with an encoding hint (major type 6, tag value 21 | o A byte string with an encoding hint (major type 6, tag value 21 | |||
through 23) is encoded as described and becomes a JSON string. | through 23) is encoded as described and becomes a JSON string. | |||
o For all other tags (major type 6, any other tag value), the | o For all other tags (major type 6, any other tag value), the | |||
embedded CBOR item is represented as a JSON value; the tag value | embedded CBOR item is represented as a JSON value; the tag value | |||
is ignored. | is ignored. | |||
o Indefinite-length items are made definite before conversion. | o Indefinite-length items are made definite before conversion. | |||
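The key collision risk mentioned for maps in the list above can be made concrete with a short Python sketch. It assumes a decoded map is available as a list of `(key, value)` pairs with integer or text string keys; the function name `map_to_json_object` is hypothetical, and rejecting the map is only one of several policies a converter might choose.

```python
import json


def map_to_json_object(pairs):
    """Convert (key, value) pairs into a dict usable as a JSON object."""
    obj = {}
    for key, value in pairs:
        if isinstance(key, str):
            name = key
        elif isinstance(key, int) and not isinstance(key, bool):
            name = str(key)  # decimal representation of the integer
        else:
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        if name in obj:
            # For example, the integer 1 and the text string "1" collide here.
            raise ValueError(f"key collision on {name!r}")
        obj[name] = value
    return obj


text = json.dumps(map_to_json_object([(1, "a"), ("2", "b")]))
```

A map holding both the integer 1 and the text string "1" as keys is valid CBOR, yet this conversion cannot represent it without losing an entry.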
5.2. Converting from JSON to CBOR | 6.2. Converting from JSON to CBOR | |||
All JSON values, once decoded, directly map into one or more CBOR | All JSON values, once decoded, directly map into one or more CBOR | |||
values. As with any kind of CBOR generation, decisions have to be | values. As with any kind of CBOR generation, decisions have to be | |||
made with respect to number representation. In a suggested | made with respect to number representation. In a suggested | |||
conversion: | conversion: | |||
o JSON numbers without fractional parts (integer numbers) are | o JSON numbers without fractional parts (integer numbers) are | |||
represented as integers (major types 0 and 1, possibly major type | represented as integers (major types 0 and 1, possibly major type | |||
6 tag value 2 and 3), choosing the shortest form; integers longer | 6 tag value 2 and 3), choosing the shortest form; integers longer | |||
than an implementation-defined threshold (which is usually either | than an implementation-defined threshold (which is usually either | |||
skipping to change at page 38, line 13 ¶ | skipping to change at page 39, line 5 ¶ | |||
perform a JSON-to-CBOR encoding in place in a single buffer. This | perform a JSON-to-CBOR encoding in place in a single buffer. This | |||
strategy would need to carefully consider a number of pathological | strategy would need to carefully consider a number of pathological | |||
cases, such as that some strings represented with no or very few | cases, such as that some strings represented with no or very few | |||
escapes and longer (or much longer) than 255 bytes may expand when | escapes and longer (or much longer) than 255 bytes may expand when | |||
encoded as UTF-8 strings in CBOR. Similarly, a few of the binary | encoded as UTF-8 strings in CBOR. Similarly, a few of the binary | |||
floating-point representations might cause expansion from some short | floating-point representations might cause expansion from some short | |||
decimal representations (1.1, 1e9) in JSON. This may be hard to get | decimal representations (1.1, 1e9) in JSON. This may be hard to get | |||
right, and any ensuing vulnerabilities may be exploited by an | right, and any ensuing vulnerabilities may be exploited by an | |||
attacker. | attacker. | |||
6. Future Evolution of CBOR | 7. Future Evolution of CBOR | |||
Successful protocols evolve over time. New ideas appear, | Successful protocols evolve over time. New ideas appear, | |||
implementation platforms improve, related protocols are developed and | implementation platforms improve, related protocols are developed and | |||
evolve, and new requirements from applications and protocols are | evolve, and new requirements from applications and protocols are | |||
added. Facilitating protocol evolution is therefore an important | added. Facilitating protocol evolution is therefore an important | |||
design consideration for any protocol development. | design consideration for any protocol development. | |||
For protocols that will use CBOR, CBOR provides some useful | For protocols that will use CBOR, CBOR provides some useful | |||
mechanisms to facilitate their evolution. Best practices for this | mechanisms to facilitate their evolution. Best practices for this | |||
are well known, particularly from JSON format development of JSON- | are well known, particularly from JSON format development of JSON- | |||
skipping to change at page 38, line 36 ¶ | skipping to change at page 39, line 28 ¶ | |||
However, facilitating the evolution of CBOR itself is very well | However, facilitating the evolution of CBOR itself is very well | |||
within its scope. CBOR is designed to both provide a stable basis | within its scope. CBOR is designed to both provide a stable basis | |||
for development of CBOR-based protocols and to be able to evolve. | for development of CBOR-based protocols and to be able to evolve. | |||
Since a successful protocol may live for decades, CBOR needs to be | Since a successful protocol may live for decades, CBOR needs to be | |||
designed for decades of use and evolution. This section provides | designed for decades of use and evolution. This section provides | |||
some guidance for the evolution of CBOR. It is necessarily more | some guidance for the evolution of CBOR. It is necessarily more | |||
subjective than other parts of this document. It is also necessarily | subjective than other parts of this document. It is also necessarily | |||
incomplete, lest it turn into a textbook on protocol development. | incomplete, lest it turn into a textbook on protocol development. | |||
6.1. Extension Points | 7.1. Extension Points | |||
In a protocol design, opportunities for evolution are often included | In a protocol design, opportunities for evolution are often included | |||
in the form of extension points. For example, there may be a | in the form of extension points. For example, there may be a | |||
codepoint space that is not fully allocated from the outset, and the | codepoint space that is not fully allocated from the outset, and the | |||
protocol is designed to tolerate and embrace implementations that | protocol is designed to tolerate and embrace implementations that | |||
start using more codepoints than initially allocated. | start using more codepoints than initially allocated. | |||
Sizing the codepoint space may be difficult because the range | Sizing the codepoint space may be difficult because the range | |||
required may be hard to predict. An attempt should be made to make | required may be hard to predict. An attempt should be made to make | |||
the codepoint space large enough so that it can slowly be filled over | the codepoint space large enough so that it can slowly be filled over | |||
the intended lifetime of the protocol. | the intended lifetime of the protocol. | |||
CBOR has three major extension points: | CBOR has three major extension points: | |||
o the "simple" space (values in major type 7). Of the 24 efficient | o the "simple" space (values in major type 7). Of the 24 efficient | |||
(and 224 slightly less efficient) values, only a small number have | (and 224 slightly less efficient) values, only a small number have | |||
been allocated. Implementations receiving an unknown simple data | been allocated. Implementations receiving an unknown simple data | |||
item may be able to process it as such, given that the structure | item may be able to process it as such, given that the structure | |||
of the value is indeed simple. The IANA registry in Section 8.1 | of the value is indeed simple. The IANA registry in Section 9.1 | |||
is the appropriate way to address the extensibility of this | is the appropriate way to address the extensibility of this | |||
codepoint space. | codepoint space. | |||
o the "tag" space (values in major type 6). Again, only a small | o the "tag" space (values in major type 6). Again, only a small | |||
part of the codepoint space has been allocated, and the space is | part of the codepoint space has been allocated, and the space is | |||
abundant (although the early numbers are more efficient than the | abundant (although the early numbers are more efficient than the | |||
later ones). Implementations receiving an unknown tag can choose | later ones). Implementations receiving an unknown tag can choose | |||
to simply ignore it or to process it as an unknown tag wrapping | to simply ignore it or to process it as an unknown tag wrapping | |||
the following data item. The IANA registry in Section 8.2 is the | the following data item. The IANA registry in Section 9.2 is the | |||
appropriate way to address the extensibility of this codepoint | appropriate way to address the extensibility of this codepoint | |||
space. | space. | |||
o the "additional information" space. An implementation receiving | o the "additional information" space. An implementation receiving | |||
an unknown additional information value has no way to continue | an unknown additional information value has no way to continue | |||
decoding, so allocating codepoints to this space is a major step. | decoding, so allocating codepoints to this space is a major step. | |||
There are also very few codepoints left. | There are also very few codepoints left. | |||
6.2. Curating the Additional Information Space | 7.2. Curating the Additional Information Space | |||
The human mind is sometimes drawn to filling in little perceived gaps | The human mind is sometimes drawn to filling in little perceived gaps | |||
to make something neat. We expect the remaining gaps in the | to make something neat. We expect the remaining gaps in the | |||
codepoint space for the additional information values to be an | codepoint space for the additional information values to be an | |||
attractor for new ideas, just because they are there. | attractor for new ideas, just because they are there. | |||
The present specification does not manage the additional information | The present specification does not manage the additional information | |||
codepoint space by an IANA registry. Instead, allocations out of | codepoint space by an IANA registry. Instead, allocations out of | |||
this space can only be done by updating this specification. | this space can only be done by updating this specification. | |||
For an additional information value of n >= 24, the size of the | For an additional information value of n >= 24, the size of the | |||
additional data typically is 2**(n-24) bytes. Therefore, additional | additional data typically is 2**(n-24) bytes. Therefore, additional | |||
information values 28 and 29 should be viewed as candidates for | information values 28 and 29 should be viewed as candidates for | |||
128-bit and 256-bit quantities, in case a need arises to add them to | 128-bit and 256-bit quantities, in case a need arises to add them to | |||
the protocol. Additional information value 30 is then the only | the protocol. Additional information value 30 is then the only | |||
additional information value available for general allocation, and | additional information value available for general allocation, and | |||
there should be a very good reason for allocating it before assigning | there should be a very good reason for allocating it before assigning | |||
it through an update of this protocol. | it through an update of this protocol. | |||
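As an illustration only (not part of either draft revision), the size rule above can be sketched in Python. The function name argument_size is hypothetical, and treating values 28 to 30 as errors is an assumption that reflects their unassigned status.

```python
def argument_size(ai: int) -> int:
    """Number of bytes that follow the initial byte for an
    additional information value (hypothetical helper)."""
    if not 0 <= ai <= 31:
        raise ValueError("additional information is a 5-bit value")
    if ai < 24:
        return 0                  # value is carried in the initial byte
    if ai <= 27:
        return 2 ** (ai - 24)     # 1, 2, 4, or 8 bytes
    if ai == 31:
        return 0                  # indefinite length or "break": no argument bytes
    raise ValueError("additional information 28..30 is unassigned")


# The same formula applied to the unassigned values shows why 28 and 29
# are natural candidates for 128-bit and 256-bit quantities.
candidate_bits = {n: 8 * 2 ** (n - 24) for n in (28, 29)}
print(candidate_bits)  # {28: 128, 29: 256}
```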
7. Diagnostic Notation | 8. Diagnostic Notation | |||
CBOR is a binary interchange format. To facilitate documentation and | CBOR is a binary interchange format. To facilitate documentation and | |||
debugging, and in particular to facilitate communication between | debugging, and in particular to facilitate communication between | |||
entities cooperating in debugging, this section defines a simple | entities cooperating in debugging, this section defines a simple | |||
human-readable diagnostic notation. All actual interchange always | human-readable diagnostic notation. All actual interchange always | |||
happens in the binary format. | happens in the binary format. | |||
Note that this truly is a diagnostic format; it is not meant to be | Note that this truly is a diagnostic format; it is not meant to be | |||
parsed. Therefore, no formal definition (as in ABNF) is given in | parsed. Therefore, no formal definition (as in ABNF) is given in | |||
this document. (Implementers looking for a text-based format for | this document. (Implementers looking for a text-based format for | |||
skipping to change at page 40, line 45 ¶ | skipping to change at page 41, line 34 ¶ | |||
padding, enclosed in single quotes, prefixed by >h< for base16, >b32< | padding, enclosed in single quotes, prefixed by >h< for base16, >b32< | |||
for base32, >h32< for base32hex, >b64< for base64 or base64url (the | for base32, >h32< for base32hex, >b64< for base64 or base64url (the | |||
actual encodings do not overlap, so the string remains unambiguous). | actual encodings do not overlap, so the string remains unambiguous). | |||
For example, the byte string 0x12345678 could be written h'12345678', | For example, the byte string 0x12345678 could be written h'12345678', | |||
b32'CI2FM6A', or b64'EjRWeA'. | b32'CI2FM6A', or b64'EjRWeA'. | |||
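The three spellings of the byte string 0x12345678 given above can be reproduced with Python's standard base64 module. This is a minimal sketch; the helper name diag_bytes is hypothetical, and the base32hex and base64url variants are left out for brevity.

```python
import base64


def diag_bytes(data: bytes, prefix: str = "h") -> str:
    """Render a byte string in diagnostic notation, without padding."""
    if prefix == "h":
        text = data.hex()
    elif prefix == "b32":
        text = base64.b32encode(data).decode("ascii").rstrip("=")
    elif prefix == "b64":
        text = base64.b64encode(data).decode("ascii").rstrip("=")
    else:
        raise ValueError("prefix not covered by this sketch")
    return f"{prefix}'{text}'"


data = bytes.fromhex("12345678")
print(diag_bytes(data, "h"))    # h'12345678'
print(diag_bytes(data, "b32"))  # b32'CI2FM6A'
print(diag_bytes(data, "b64"))  # b64'EjRWeA'
```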
Unassigned simple values are given as "simple()" with the appropriate | Unassigned simple values are given as "simple()" with the appropriate | |||
integer in the parentheses. For example, "simple(42)" indicates | integer in the parentheses. For example, "simple(42)" indicates | |||
major type 7, value 42. | major type 7, value 42. | |||
7.1. Encoding Indicators | 8.1. Encoding Indicators | |||
Sometimes it is useful to indicate in the diagnostic notation which | Sometimes it is useful to indicate in the diagnostic notation which | |||
of several alternative representations was actually used; for | of several alternative representations was actually used; for | |||
example, a data item written >1.5< by a diagnostic decoder might have | example, a data item written >1.5< by a diagnostic decoder might have | |||
been encoded as a half-, single-, or double-precision float. | been encoded as a half-, single-, or double-precision float. | |||
The convention for encoding indicators is that anything starting with | The convention for encoding indicators is that anything starting with | |||
an underscore and all following characters that are alphanumeric or | an underscore and all following characters that are alphanumeric or | |||
underscore, is an encoding indicator, and can be ignored by anyone | underscore, is an encoding indicator, and can be ignored by anyone | |||
not interested in this information. Encoding indicators are always | not interested in this information. Encoding indicators are always | |||
skipping to change at page 41, line 29 ¶ | skipping to change at page 42, line 17 ¶ | |||
preceding bracket or brace) was encoded with an additional | preceding bracket or brace) was encoded with an additional | |||
information value of 24+n. For example, 1.5_1 is a half-precision | information value of 24+n. For example, 1.5_1 is a half-precision | |||
floating-point number, while 1.5_3 is encoded as double precision. | floating-point number, while 1.5_3 is encoded as double precision. | |||
This encoding indicator is not shown in Appendix A. (Note that the | This encoding indicator is not shown in Appendix A. (Note that the | |||
encoding indicator "_" is thus an abbreviation of the full form "_7", | encoding indicator "_" is thus an abbreviation of the full form "_7", | |||
which is not used.) | which is not used.) | |||
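To make the relation between the indicator _n and the additional information value 24+n concrete, the following Python sketch encodes the value 1.5 in each of the three floating-point widths. The function name encode_float is hypothetical; the struct format codes "e", "f", and "d" are the standard library's half, single, and double precision formats.

```python
import struct

# Encoding indicator n -> struct format (big-endian, i.e. network byte order)
_FLOAT_FORMATS = {1: ">e", 2: ">f", 3: ">d"}


def encode_float(value: float, n: int) -> bytes:
    """Encode value as major type 7 with additional information 24+n."""
    initial = (7 << 5) | (24 + n)
    return bytes([initial]) + struct.pack(_FLOAT_FORMATS[n], value)


print(encode_float(1.5, 1).hex())  # f93e00              (1.5_1)
print(encode_float(1.5, 2).hex())  # fa3fc00000          (1.5_2)
print(encode_float(1.5, 3).hex())  # fb3ff8000000000000  (1.5_3)
```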
As a special case, byte and text strings of indefinite length can be | As a special case, byte and text strings of indefinite length can be | |||
notated in the form (_ h'0123', h'4567') and (_ "foo", "bar"). | notated in the form (_ h'0123', h'4567') and (_ "foo", "bar"). | |||
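For orientation, the two indefinite-length examples above correspond to the byte sequences produced by the following Python sketch. The function name encode_indefinite and its text parameter are hypothetical, and the sketch only handles chunks shorter than 24 bytes so that each chunk length fits in the initial byte.

```python
def encode_indefinite(chunks, text: bool = False) -> bytes:
    """Encode chunks as an indefinite-length byte or text string."""
    major = 3 if text else 2
    out = bytearray([(major << 5) | 31])      # 0x5f or 0x7f: indefinite length
    for chunk in chunks:
        raw = chunk.encode("utf-8") if text else bytes(chunk)
        if len(raw) > 23:
            raise ValueError("this sketch only handles short chunks")
        out.append((major << 5) | len(raw))   # definite-length chunk head
        out += raw
    out.append(0xFF)                          # "break" stop code
    return bytes(out)


# (_ h'0123', h'4567')
print(encode_indefinite([bytes.fromhex("0123"), bytes.fromhex("4567")]).hex())
# 5f420123424567ff

# (_ "foo", "bar")
print(encode_indefinite(["foo", "bar"], text=True).hex())
# 7f63666f6f63626172ff
```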
8. IANA Considerations | 9. IANA Considerations | |||
IANA has created two registries for new CBOR values. The registries | IANA has created two registries for new CBOR values. The registries | |||
are separate, that is, not under an umbrella registry, and follow the | are separate, that is, not under an umbrella registry, and follow the | |||
rules in [RFC8126]. IANA has also assigned a new MIME media type and | rules in [RFC8126]. IANA has also assigned a new MIME media type and | |||
an associated Constrained Application Protocol (CoAP) Content-Format | an associated Constrained Application Protocol (CoAP) Content-Format | |||
entry. | entry. | |||
8.1. Simple Values Registry | 9.1. Simple Values Registry | |||
IANA has created the "Concise Binary Object Representation (CBOR) | IANA has created the "Concise Binary Object Representation (CBOR) | |||
Simple Values" registry. The initial values are shown in Table 2. | Simple Values" registry at [IANA.cbor-simple-values]. The initial | |||
values are shown in Table 2. | ||||
New entries in the range 0 to 19 are assigned by Standards Action. | New entries in the range 0 to 19 are assigned by Standards Action. | |||
It is suggested that these Standards Actions allocate values starting | It is suggested that these Standards Actions allocate values starting | |||
with the number 16 in order to reserve the lower numbers for | with the number 16 in order to reserve the lower numbers for | |||
contiguous blocks (if any). | contiguous blocks (if any). | |||
New entries in the range 32 to 255 are assigned by Specification | New entries in the range 32 to 255 are assigned by Specification | |||
Required. | Required. | |||
8.2. Tags Registry | 9.2. Tags Registry | |||
IANA has created the "Concise Binary Object Representation (CBOR) | IANA has created the "Concise Binary Object Representation (CBOR) | |||
Tags" registry. The initial values are shown in Table 3. | Tags" registry at [IANA.cbor-tags]. The tags that were defined in | |||
[RFC7049] are described in detail in Section 3.4, but other tags have | ||||
already been defined. | ||||
New entries in the range 0 to 23 are assigned by Standards Action. | New entries in the range 0 to 23 are assigned by Standards Action. | |||
New entries in the range 24 to 255 are assigned by Specification | New entries in the range 24 to 255 are assigned by Specification | |||
Required. New entries in the range 256 to 18446744073709551615 are | Required. New entries in the range 256 to 18446744073709551615 are | |||
assigned by First Come First Served. The template for registration | assigned by First Come First Served. The template for registration | |||
requests is: | requests is: | |||
o Data item | o Data item | |||
o Semantics (short form) | o Semantics (short form) | |||
In addition, First Come First Served requests should include: | In addition, First Come First Served requests should include: | |||
o Point of contact | o Point of contact | |||
o Description of semantics (URL) - This description is optional; the | o Description of semantics (URL) - This description is optional; the | |||
URL can point to something like an Internet-Draft or a web page. | URL can point to something like an Internet-Draft or a web page. | |||
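The three tag ranges and their registration policies can be summarized in a small lookup. The following Python sketch is illustrative only; the function name tag_registration_policy is hypothetical.

```python
MAX_TAG = 18446744073709551615  # 2**64 - 1, the largest tag number


def tag_registration_policy(tag: int) -> str:
    """Return the registration policy that applies to a tag number."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag number out of range")
    if tag <= 23:
        return "Standards Action"
    if tag <= 255:
        return "Specification Required"
    return "First Come First Served"


print(tag_registration_policy(23))    # Standards Action
print(tag_registration_policy(24))    # Specification Required
print(tag_registration_policy(256))   # First Come First Served
```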
8.3. Media Type ("MIME Type") | 9.3. Media Type ("MIME Type") | |||
The Internet media type [RFC6838] for CBOR data is application/cbor. | The Internet media type [RFC6838] for a single encoded CBOR data item | |||
is application/cbor. | ||||
Type name: application | Type name: application | |||
Subtype name: cbor | Subtype name: cbor | |||
Required parameters: n/a | Required parameters: n/a | |||
Optional parameters: n/a | Optional parameters: n/a | |||
Encoding considerations: binary | Encoding considerations: binary | |||
Security considerations: See Section 9 of this document | Security considerations: See Section 10 of this document | |||
Interoperability considerations: n/a | Interoperability considerations: n/a | |||
Published specification: This document | Published specification: This document | |||
Applications that use this media type: None yet, but it is expected | Applications that use this media type: None yet, but it is expected | |||
that this format will be deployed in protocols and applications. | that this format will be deployed in protocols and applications. | |||
Additional information: | Additional information: | |||
Magic number(s): n/a | Magic number(s): n/a | |||
skipping to change at page 43, line 24 ¶ | skipping to change at page 44, line 24 ¶ | |||
Intended usage: COMMON | Intended usage: COMMON | |||
Restrictions on usage: none | Restrictions on usage: none | |||
Author: | Author: | |||
Carsten Bormann <cabo@tzi.org> | Carsten Bormann <cabo@tzi.org> | |||
Change controller: | Change controller: | |||
The IESG <iesg@ietf.org> | The IESG <iesg@ietf.org> | |||
8.4. CoAP Content-Format | 9.4. CoAP Content-Format | |||
Media Type: application/cbor | Media Type: application/cbor | |||
Encoding: - | Encoding: - | |||
Id: 60 | Id: 60 | |||
Reference: [RFCthis] | Reference: [RFCthis] | |||
8.5. The +cbor Structured Syntax Suffix Registration | 9.5. The +cbor Structured Syntax Suffix Registration | |||
Name: Concise Binary Object Representation (CBOR) | Name: Concise Binary Object Representation (CBOR) | |||
+suffix: +cbor | +suffix: +cbor | |||
References: [RFCthis] | References: [RFCthis] | |||
Encoding Considerations: CBOR is a binary format. | Encoding Considerations: CBOR is a binary format. | |||
Interoperability Considerations: n/a | Interoperability Considerations: n/a | |||
skipping to change at page 44, line 23 ¶ | skipping to change at page 45, line 23 ¶ | |||
For cases defined in +cbor, where the fragment identifier resolves | For cases defined in +cbor, where the fragment identifier resolves | |||
per the +cbor rules, then process as specified in +cbor. | per the +cbor rules, then process as specified in +cbor. | |||
For cases defined in +cbor, where the fragment identifier does | For cases defined in +cbor, where the fragment identifier does | |||
not resolve per the +cbor rules, then process as specified in | not resolve per the +cbor rules, then process as specified in | |||
"xxx/yyy+cbor". | "xxx/yyy+cbor". | |||
For cases not defined in +cbor, then process as specified in | For cases not defined in +cbor, then process as specified in | |||
"xxx/yyy+cbor". | "xxx/yyy+cbor". | |||
Security Considerations: See Section 9 of this document | Security Considerations: See Section 10 of this document | |||
Contact: | Contact: | |||
Apps Area Working Group (apps-discuss@ietf.org) | Apps Area Working Group (apps-discuss@ietf.org) | |||
Author/Change Controller: | Author/Change Controller: | |||
The Apps Area Working Group. | The Apps Area Working Group. | |||
The IESG has change control over this registration. | The IESG has change control over this registration. | |||
9. Security Considerations | 10. Security Considerations | |||
A network-facing application can exhibit vulnerabilities in its | A network-facing application can exhibit vulnerabilities in its | |||
processing logic for incoming data. Complex parsers are well known | processing logic for incoming data. Complex parsers are well known | |||
as a likely source of such vulnerabilities, such as the ability to | as a likely source of such vulnerabilities, such as the ability to | |||
remotely crash a node, or even remotely execute arbitrary code on it. | remotely crash a node, or even remotely execute arbitrary code on it. | |||
CBOR attempts to narrow the opportunities for introducing such | CBOR attempts to narrow the opportunities for introducing such | |||
vulnerabilities by reducing parser complexity, by giving the entire | vulnerabilities by reducing parser complexity, by giving the entire | |||
range of encodable values a meaning where possible. | range of encodable values a meaning where possible. | |||
Resource exhaustion attacks might attempt to lure a decoder into | Because CBOR decoders are often used as a first step in processing | |||
allocating very big data items (strings, arrays, maps) or exhaust the | unvalidated input, they need to be fully prepared for all types of | |||
stack depth by setting up deeply nested items. Decoders need to have | hostile input that may be designed to corrupt, overrun, or achieve | |||
appropriate resource management to mitigate these attacks. (Items | control of the system decoding the CBOR data item. A CBOR decoder | |||
for which very large sizes are given can also attempt to exploit | needs to assume that all input may be hostile even if it has been | |||
integer overflow vulnerabilities.) | checked by a firewall, has come over a TLS-secured channel, is | |||
encrypted or signed, or has come from some other source that is | ||||
Protocols that are used in a security context should be defined in | presumed trusted. | |||
such a way that potential multiple interpretations are reliably | ||||
reduced to a single one. For example, an attacker could make use of | ||||
duplicate keys in maps or precision issues in numbers to make one | ||||
decoder base its decisions on a different interpretation than the one | ||||
that will be used by a second decoder. To facilitate this, encoder | ||||
and decoder implementations used in such contexts should provide at | ||||
least one strict mode of operation (Section 4.11). | ||||
10. Acknowledgements | Hostile input may be constructed to overrun buffers, overflow or | |||
underflow integer arithmetic, or cause other decoding disruption. | ||||
CBOR data items might have lengths or sizes that are intentionally | ||||
extremely large or too short. Resource exhaustion attacks might | ||||
attempt to lure a decoder into allocating very big data items | ||||
(strings, arrays, maps) or exhaust the stack depth by setting up | ||||
deeply nested items. Decoders need to have appropriate resource | ||||
management to mitigate these attacks. (Items for which very large | ||||
sizes are given can also attempt to exploit integer overflow | ||||
vulnerabilities.) | ||||
CBOR was inspired by MessagePack. MessagePack was developed and | A CBOR decoder, by definition, only accepts well-formed CBOR; this is | |||
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | the first step to its robustness. Input that is not well-formed CBOR | |||
MessagePack is solely for attribution; CBOR is not intended as a | causes no further processing from the point where the lack of well- | |||
version of or replacement for MessagePack, as it has different design | formedness was detected. If possible, any data decoded up to this | |||
goals and requirements. | point should have no impact on the application using the CBOR | |||
decoder. | ||||
The need for functionality beyond the original MessagePack | In addition to ascertaining well-formedness, a CBOR decoder might | |||
Specification became obvious to many people at about the same time | also perform validity checks on the CBOR data. Alternatively, it can | |||
around the year 2012. BinaryPack is a minor derivation of | leave those checks to the application using the decoder. This choice | |||
MessagePack that was developed by Eric Zhang for the binaryjs | needs to be clearly documented in the decoder. Beyond the validity | |||
project. A similar, but different, extension was made by Tim Caswell | at the CBOR level, an application also needs to ascertain that the | |||
for his msgpack-js and msgpack-js-browser projects. Many people have | input is in alignment with the application protocol that is | |||
contributed to the recent discussion about extending MessagePack to | serialized in CBOR. | |||
separate text string representation from byte string representation. | ||||
The encoding of the additional information in CBOR was inspired by | CBOR encoders do not receive input directly from the network and are | |||
the encoding of length information designed by Klaus Hartke for CoAP. | thus not directly attackable in the same way as CBOR decoders. | |||
However, CBOR encoders often have an API that takes input from | ||||
another level in the implementation and can be attacked through that | ||||
API. The design and implementation of that API should assume the | ||||
behavior of its caller may be based on hostile input or on coding | ||||
mistakes. It should check inputs for buffer overruns, overflow and | ||||
underflow of integer arithmetic, and other such errors that are aimed | ||||
to disrupt the encoder. | ||||
This document also incorporates suggestions made by many people, | Protocols that are used in a security context should be defined in | |||
notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore, | such a way that potential multiple interpretations are reliably | |||
Laurence Lundblade, Matthew Lepinski, Michael Richardson, Nico | reduced to a single interpretation. For example, an attacker could | |||
Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray, Tony Finch, Tony | make use of invalid input such as duplicate keys in maps, or exploit | |||
Hansen, and Yaron Sheffer. | different precision in processing numbers to make one application | |||
base its decisions on a different interpretation than the one that | ||||
will be used by a second application. To facilitate consistent | ||||
interpretation, encoder and decoder implementations used in such | ||||
contexts should provide at least one strict mode of operation | ||||
(Section 5.8). | ||||
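One way to picture the resource management described above is a scanner that checks every claimed length against the bytes actually remaining and caps the nesting depth before recursing. The Python sketch below is an assumption-laden illustration, not the reference pseudocode of Appendix C: the names skip_item, NotWellFormed, and MAX_DEPTH are hypothetical, the depth limit of 32 is arbitrary, and indefinite-length items are rejected rather than handled, for brevity.

```python
class NotWellFormed(ValueError):
    pass


MAX_DEPTH = 32  # hypothetical limit; a real decoder would make this configurable


def skip_item(buf: bytes, pos: int = 0, depth: int = 0) -> int:
    """Return the position just past the data item that starts at pos."""
    if depth > MAX_DEPTH:
        raise NotWellFormed("nesting too deep")
    if pos >= len(buf):
        raise NotWellFormed("truncated input")
    mt, ai = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if ai < 24:
        val = ai
    elif ai <= 27:
        size = 1 << (ai - 24)
        if size > len(buf) - pos:
            raise NotWellFormed("truncated argument")
        val = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise NotWellFormed("ai 28..31 not handled in this sketch")
    if mt in (2, 3):                      # strings: check before "allocating"
        if val > len(buf) - pos:
            raise NotWellFormed("string length exceeds remaining input")
        return pos + val
    if mt in (4, 5):                      # arrays and maps
        count = val * 2 if mt == 5 else val
        if count > len(buf) - pos:        # every item needs at least one byte
            raise NotWellFormed("item count exceeds remaining input")
        for _ in range(count):
            pos = skip_item(buf, pos, depth + 1)
        return pos
    if mt == 6:                           # tag: one embedded data item
        return skip_item(buf, pos, depth + 1)
    if mt == 7 and ai == 24 and val < 32:
        raise NotWellFormed("bad simple value")
    return pos                            # mt 0, 1, 7: no content


print(skip_item(bytes.fromhex("83010203")))  # 4
```

Because Python integers do not overflow, the length comparisons here are safe as written; in a language with fixed-width integers, the comparison would need to be arranged (as above, by subtracting from the buffer length rather than adding to the position) so that a huge claimed size cannot wrap around.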
11. References | 11. References | |||
11.1. Normative References | 11.1. Normative References | |||
[ECMA262] Ecma International, "ECMAScript 2018 Language | [ECMA262] Ecma International, "ECMAScript 2018 Language | |||
Specification", ECMA Standard ECMA-262, 9th Edition, June | Specification", ECMA Standard ECMA-262, 9th Edition, June | |||
2018, <https://www.ecma- | 2018, <https://www.ecma- | |||
international.org/publications/files/ECMA-ST/ | international.org/publications/files/ECMA-ST/ | |||
Ecma-262.pdf>. | Ecma-262.pdf>. | |||
[IEEE.754.2008] | [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE | |||
Institute of Electrical and Electronics Engineers, "IEEE | Std 754-2008. | |||
Standard for Floating-Point Arithmetic", IEEE | ||||
Standard 754-2008, August 2008. | ||||
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | |||
Extensions (MIME) Part One: Format of Internet Message | Extensions (MIME) Part One: Format of Internet Message | |||
Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, | Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, | |||
<https://www.rfc-editor.org/info/rfc2045>. | <https://www.rfc-editor.org/info/rfc2045>. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
skipping to change at page 46, line 41 ¶ | skipping to change at page 48, line 5 ¶ | |||
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data | [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data | |||
Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, | Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, | |||
<https://www.rfc-editor.org/info/rfc4648>. | <https://www.rfc-editor.org/info/rfc4648>. | |||
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | |||
Writing an IANA Considerations Section in RFCs", BCP 26, | Writing an IANA Considerations Section in RFCs", BCP 26, | |||
RFC 8126, DOI 10.17487/RFC8126, June 2017, | RFC 8126, DOI 10.17487/RFC8126, June 2017, | |||
<https://www.rfc-editor.org/info/rfc8126>. | <https://www.rfc-editor.org/info/rfc8126>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | ||||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
[TIME_T] The Open Group Base Specifications, "Vol. 1: Base | [TIME_T] The Open Group Base Specifications, "Vol. 1: Base | |||
Definitions, Issue 7", Section 4.15 'Seconds Since the | Definitions, Issue 7", Section 4.15 'Seconds Since the | |||
Epoch', IEEE Std 1003.1, 2013 Edition, 2013, | Epoch', IEEE Std 1003.1, 2013 Edition, 2013, | |||
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/ | <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/ | |||
V1_chap04.html#tag_04_15>. | V1_chap04.html#tag_04_15>. | |||
11.2. Informative References | 11.2. Informative References | |||
[ASN.1] International Telecommunication Union, "Information | [ASN.1] International Telecommunication Union, "Information | |||
Technology -- ASN.1 encoding rules: Specification of Basic | Technology -- ASN.1 encoding rules: Specification of Basic | |||
Encoding Rules (BER), Canonical Encoding Rules (CER) and | Encoding Rules (BER), Canonical Encoding Rules (CER) and | |||
Distinguished Encoding Rules (DER)", ITU-T Recommendation | Distinguished Encoding Rules (DER)", ITU-T Recommendation | |||
X.690, 1994. | X.690, 1994. | |||
[BSON] Various, "BSON - Binary JSON", 2013, | [BSON] Various, "BSON - Binary JSON", 2013, | |||
<http://bsonspec.org/>. | <http://bsonspec.org/>. | |||
[IANA.cbor-simple-values] | ||||
IANA, "Concise Binary Object Representation (CBOR) Simple | ||||
Values", | ||||
<http://www.iana.org/assignments/cbor-simple-values>. | ||||
[IANA.cbor-tags] | ||||
IANA, "Concise Binary Object Representation (CBOR) Tags", | ||||
<http://www.iana.org/assignments/cbor-tags>. | ||||
[MessagePack] | [MessagePack] | |||
Furuhashi, S., "MessagePack", 2013, <http://msgpack.org/>. | Furuhashi, S., "MessagePack", 2013, <http://msgpack.org/>. | |||
[PCRE] Ho, A., "PCRE - Perl Compatible Regular Expressions", | [PCRE] Ho, A., "PCRE - Perl Compatible Regular Expressions", | |||
2018, <http://www.pcre.org/>. | 2018, <http://www.pcre.org/>. | |||
[RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission | [RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission | |||
Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976, | Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976, | |||
<https://www.rfc-editor.org/info/rfc713>. | <https://www.rfc-editor.org/info/rfc713>. | |||
skipping to change at page 50, line 20 ¶ | skipping to change at page 52, line 20 ¶ | |||
| false | 0xf4 | | | false | 0xf4 | | |||
| | | | | | | | |||
| true | 0xf5 | | | true | 0xf5 | | |||
| | | | | | | | |||
| null | 0xf6 | | | null | 0xf6 | | |||
| | | | | | | | |||
| undefined | 0xf7 | | | undefined | 0xf7 | | |||
| | | | | | | | |||
| simple(16) | 0xf0 | | | simple(16) | 0xf0 | | |||
| | | | | | | | |||
| simple(24) | 0xf818 | | ||||
| | | | ||||
| simple(255) | 0xf8ff | | | simple(255) | 0xf8ff | | |||
| | | | | | | | |||
| 0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a | | | 0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a | | |||
| | 30343a30305a | | | | 30343a30305a | | |||
| | | | | | | | |||
| 1(1363896240) | 0xc11a514b67b0 | | | 1(1363896240) | 0xc11a514b67b0 | | |||
| | | | | | | | |||
| 1(1363896240.5) | 0xc1fb41d452d9ec200000 | | | 1(1363896240.5) | 0xc1fb41d452d9ec200000 | | |||
| | | | | | | | |||
| 23(h'01020304') | 0xd74401020304 | | | 23(h'01020304') | 0xd74401020304 | | |||
skipping to change at page 56, line 26 ¶ | skipping to change at page 58, line 26 ¶ | |||
case 31: | case 31: | |||
return well_formed_indefinite(mt, breakable); | return well_formed_indefinite(mt, breakable); | |||
} | } | |||
// process content | // process content | |||
switch (mt) { | switch (mt) { | |||
// case 0, 1, 7 do not have content; just use val | // case 0, 1, 7 do not have content; just use val | |||
case 2: case 3: take(val); break; // bytes/UTF-8 | case 2: case 3: take(val); break; // bytes/UTF-8 | |||
case 4: for (i = 0; i < val; i++) well_formed(); break; | case 4: for (i = 0; i < val; i++) well_formed(); break; | |||
case 5: for (i = 0; i < val*2; i++) well_formed(); break; | case 5: for (i = 0; i < val*2; i++) well_formed(); break; | |||
case 6: well_formed(); break; // 1 embedded data item | case 6: well_formed(); break; // 1 embedded data item | |||
case 7: if (ai == 24 && val < 32) fail(); // bad simple | ||||
} | } | |||
return mt; // finite data item | return mt; // finite data item | |||
} | } | |||
well_formed_indefinite(mt, breakable) { | well_formed_indefinite(mt, breakable) { | |||
switch (mt) { | switch (mt) { | |||
case 2: case 3: | case 2: case 3: | |||
while ((it = well_formed(true)) != -1) | while ((it = well_formed(true)) != -1) | |||
if (it != mt) // need finite embedded | if (it != mt) // need finite embedded | |||
fail(); // of same type | fail(); // of same type | |||
skipping to change at page 57, line 33 ¶ | skipping to change at page 59, line 37 ¶ | |||
*p++ = mt + 24; | *p++ = mt + 24; | |||
*p++ = ui; | *p++ = ui; | |||
} else | } else | |||
... | ... | |||
Figure 2: Pseudocode for Encoding a Signed Integer | Figure 2: Pseudocode for Encoding a Signed Integer | |||
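For readers who prefer a runnable form, the same idea can be sketched in Python. This is not a transcription of Figure 2: the names encode_head and encode_sint are hypothetical, the major type is passed unshifted, and the sign handling uses Python's unbounded integers instead of the bit manipulation in the pseudocode.

```python
def encode_head(mt: int, ui: int) -> bytes:
    """Encode major type mt (0..7) with unsigned argument ui."""
    initial = mt << 5
    if ui < 24:
        return bytes([initial + ui])          # value fits in the initial byte
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if ui < 1 << (8 * size):
            return bytes([initial + ai]) + ui.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def encode_sint(n: int) -> bytes:
    """Encode a signed integer as major type 0 or 1."""
    if n < 0:
        return encode_head(1, -1 - n)         # negative: argument is -1 - n
    return encode_head(0, n)


print(encode_sint(10).hex())     # 0a
print(encode_sint(24).hex())     # 1818
print(encode_sint(1000).hex())   # 1903e8
print(encode_sint(-1).hex())     # 20
print(encode_sint(-1000).hex())  # 3903e7
```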
Appendix D. Half-Precision | Appendix D. Half-Precision | |||
As half-precision floating-point numbers were only added to IEEE 754 | As half-precision floating-point numbers were only added to IEEE 754 | |||
in 2008 [IEEE.754.2008], today's programming platforms often still | in 2008 [IEEE754], today's programming platforms often still only | |||
only have limited support for them. It is very easy to include at | have limited support for them. It is very easy to include at least | |||
least decoding support for them even without such support. An | decoding support for them even without such support. An example of a | |||
example of a small decoder for half-precision floating-point numbers | small decoder for half-precision floating-point numbers in the C | |||
in the C language is shown in Figure 3. A similar program for Python | language is shown in Figure 3. A similar program for Python is in | |||
is in Figure 4; this code assumes that the 2-byte value has already | Figure 4; this code assumes that the 2-byte value has already been | |||
been decoded as an (unsigned short) integer in network byte order (as | decoded as an (unsigned short) integer in network byte order (as | |||
would be done by the pseudocode in Appendix C). | would be done by the pseudocode in Appendix C). | |||
#include <math.h> | #include <math.h> | |||
double decode_half(unsigned char *halfp) { | double decode_half(unsigned char *halfp) { | |||
int half = (halfp[0] << 8) + halfp[1]; | int half = (halfp[0] << 8) + halfp[1]; | |||
int exp = (half >> 10) & 0x1f; | int exp = (half >> 10) & 0x1f; | |||
int mant = half & 0x3ff; | int mant = half & 0x3ff; | |||
double val; | double val; | |||
if (exp == 0) val = ldexp(mant, -24); | if (exp == 0) val = ldexp(mant, -24); | |||
skipping to change at page 61, line 44 ¶ | skipping to change at page 63, line 44 ¶ | |||
o Updated reference for [CNN-TERMS] to [RFC7228] | o Updated reference for [CNN-TERMS] to [RFC7228] | |||
o Added a comment to the last example in Section 2.2.1 (added | o Added a comment to the last example in Section 2.2.1 (added | |||
"Second value") | "Second value") | |||
o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") | o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") | |||
o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> | o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> | |||
"0b000_11001") | "0b000_11001") | |||
Acknowledgements | ||||
CBOR was inspired by MessagePack. MessagePack was developed and | ||||
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | ||||
MessagePack is solely for attribution; CBOR is not intended as a | ||||
version of or replacement for MessagePack, as it has different design | ||||
goals and requirements. | ||||
The need for functionality beyond the original MessagePack | ||||
Specification became obvious to many people at about the same time | ||||
around the year 2012. BinaryPack is a minor derivation of | ||||
MessagePack that was developed by Eric Zhang for the binaryjs | ||||
project. A similar, but different, extension was made by Tim Caswell | ||||
for his msgpack-js and msgpack-js-browser projects. Many people have | ||||
contributed to the discussion about extending MessagePack to separate | ||||
text string representation from byte string representation. | ||||
The encoding of the additional information in CBOR was inspired by | ||||
the encoding of length information designed by Klaus Hartke for CoAP. | ||||
This document also incorporates suggestions made by many people, | ||||
notably Dan Frost, James Manger, Jeffrey Yaskin, Joe Hildebrand, | ||||
Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael | ||||
Richardson, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray, | ||||
Tony Finch, Tony Hansen, and Yaron Sheffer. | ||||
Authors' Addresses | Authors' Addresses | |||
Carsten Bormann | Carsten Bormann | |||
Universitaet Bremen TZI | Universitaet Bremen TZI | |||
Postfach 330440 | Postfach 330440 | |||
D-28359 Bremen | D-28359 Bremen | |||
Germany | Germany | |||
Phone: +49-421-218-63921 | Phone: +49-421-218-63921 | |||
EMail: cabo@tzi.org | EMail: cabo@tzi.org | |||
Paul Hoffman | Paul Hoffman | |||
End of changes. 107 change blocks. | ||||
593 lines changed or deleted | 658 lines changed or added | |||