--- 1/draft-ietf-cbor-7049bis-02.txt  2018-09-20 15:13:24.606731490 -0700
+++ 2/draft-ietf-cbor-7049bis-03.txt  2018-09-20 15:13:24.734734623 -0700
@@ -1,19 +1,19 @@
 Network Working Group                                         C. Bormann
 Internet-Draft                                   Universitaet Bremen TZI
 Intended status: Standards Track                              P. Hoffman
-Expires: September 3, 2018                                         ICANN
-                                                          March 02, 2018
+Expires: March 24, 2019                                            ICANN
+                                                      September 20, 2018

            Concise Binary Object Representation (CBOR)
-                      draft-ietf-cbor-7049bis-02
+                      draft-ietf-cbor-7049bis-03

 Abstract

    The Concise Binary Object Representation (CBOR) is a data format
    whose design goals include the possibility of extremely small code
    size, fairly small message size, and extensibility without the need
    for version negotiation. These design goals make it different from
    earlier binary serializations such as ASN.1 and MessagePack.

 Contributing

@@ -36,21 +36,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF). Note that other groups may also distribute
    working documents as Internet-Drafts. The list of current Internet-
    Drafts is at https://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time. It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on September 3, 2018.
+   This Internet-Draft will expire on March 24, 2019.

 Copyright Notice

    Copyright (c) 2018 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (https://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
@@ -58,102 +58,101 @@
    to this document.
    Code Components extracted from this document must include
    Simplified BSD License text as described in Section 4.e of
    the Trust Legal Provisions and are provided without warranty as
    described in the Simplified BSD License.

 Table of Contents

    1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4
      1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
-   2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7
+   2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 6
      2.1. Extended Generic Data Models . . . . . . . . . . . . . . 7
      2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 8
-   3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9
+   3. Specification of the CBOR Encoding . . . . . . . . . . . . . 8
      3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 9
      3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 11
        3.2.1. Indefinite-Length Arrays and Maps . . . . . . . . . . 11
-       3.2.2. Indefinite-Length Byte Strings and Text Strings . . . 14
+       3.2.2. Indefinite-Length Byte Strings and Text Strings . . . 13
      3.3. Floating-Point Numbers and Values with No Content . . . . 14
      3.4. Optional Tagging of Items . . . . . . . . . . . . . . . . 16
        3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 18
-       3.4.2. Bignums . . . . . . . . . . . . . . . . . . . . . . . 19
+       3.4.2. Bignums . . . . . . . . . . . . . . . . . . . . . . . 18
        3.4.3. Decimal Fractions and Bigfloats . . . . . . . . . . . 19
-       3.4.4. Content Hints . . . . . . . . . . . . . . . . . . . . 21
-         3.4.4.1. Encoded CBOR Data Item . . . . . . . . . . . . . 21
+       3.4.4. Content Hints . . . . . . . . . . . . . . . . . . . . 20
+         3.4.4.1. Encoded CBOR Data Item . . . . . . . . . . . . . 20
          3.4.4.2. Expected Later Encoding for CBOR-to-JSON
-                  Converters . . . . . . . . . . . . . . . . . . . 21
+                  Converters . . . . . . . . . . . . . . . . . . . 20
          3.4.4.3. Encoded Text . . . . . . . . . . . . . . . . . . 21
-       3.4.5. Self-Describe CBOR . . . . . . . . . . . . . . . . . 22
-     3.5. CBOR Data Models . . . . . . . . . . . . . . . . . . . . 22
-   4. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 24
-     4.1. CBOR in Streaming Applications . . . . . . . . . . . . . 25
-     4.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 25
-     4.3. Syntax Errors . . . . . . . . . . . . . . . . . . . . . . 26
-       4.3.1. Incomplete CBOR Data Items . . . . . . . . . . . . . 26
-       4.3.2. Malformed Indefinite-Length Items . . . . . . . . . . 27
-       4.3.3. Unknown Additional Information Values . . . . . . . . 27
-     4.4. Other Decoding Errors . . . . . . . . . . . . . . . . . . 27
-     4.5. Handling Unknown Simple Values and Tags . . . . . . . . . 28
-     4.6. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 28
-     4.7. Specifying Keys for Maps . . . . . . . . . . . . . . . . 29
-       4.7.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 30
-     4.8. Undefined Values . . . . . . . . . . . . . . . . . . . . 31
-     4.9. Canonical CBOR . . . . . . . . . . . . . . . . . . . . . 31
-       4.9.1. Length-first map key ordering . . . . . . . . . . . . 33
-     4.10. Strict Mode . . . . . . . . . . . . . . . . . . . . . . . 34
-   5. Converting Data between CBOR and JSON . . . . . . . . . . . . 36
-     5.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 36
-     5.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 37
-   6. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 38
-     6.1. Extension Points . . . . . . . . . . . . . . . . . . . . 38
-     6.2. Curating the Additional Information Space . . . . . . . . 39
-   7. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 40
-     7.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 41
-   8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41
-     8.1. Simple Values Registry . . . . . . . . . . . . . . . . . 41
-     8.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 42
-     8.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 42
-     8.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 43
-     8.5. The +cbor Structured Syntax Suffix Registration . . . . . 43
-   9. Security Considerations . . . . . . . . . . . . . . . . . . . 44
-   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 45
-   11. References . . . . . . . . . . . . . . . . . . . . . . . . . 45
-     11.1. Normative References . . . . . . . . . . . . . . . . . . 45
-     11.2. Informative References . . . . . . . . . . . . . . . . . 46
-   Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 48
-   Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 52
-   Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 55
-   Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 57
+       3.4.5. Self-Describe CBOR . . . . . . . . . . . . . . . . . 21
+   4. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 22
+     4.1. CBOR in Streaming Applications . . . . . . . . . . . . . 23
+     4.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 23
+     4.3. Syntax Errors . . . . . . . . . . . . . . . . . . . . . . 24
+       4.3.1. Incomplete CBOR Data Items . . . . . . . . . . . . . 24
+       4.3.2. Malformed Indefinite-Length Items . . . . . . . . . . 24
+       4.3.3. Unknown Additional Information Values . . . . . . . . 25
+     4.4. Other Decoding Errors . . . . . . . . . . . . . . . . . . 25
+     4.5. Handling Unknown Simple Values and Tags . . . . . . . . . 26
+     4.6. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 26
+     4.7. Specifying Keys for Maps . . . . . . . . . . . . . . . . 27
+       4.7.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 28
+     4.8. Undefined Values . . . . . . . . . . . . . . . . . . . . 29
+     4.9. Canonical CBOR . . . . . . . . . . . . . . . . . . . . . 29
+       4.9.1. Length-first map key ordering . . . . . . . . . . . . 31
+     4.10. Strict Mode . . . . . . . . . . . . . . . . . . . . . . . 32
+   5. Converting Data between CBOR and JSON . . . . . . . . . . . . 33
+     5.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 33
+     5.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 35
+   6. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 36
+     6.1. Extension Points . . . . . . . . . . . . . . . . . . . . 36
+     6.2. Curating the Additional Information Space . . . . . . . . 37
+   7. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 37
+     7.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 38
+   8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39
+     8.1. Simple Values Registry . . . . . . . . . . . . . . . . . 39
+     8.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 39
+     8.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 40
+     8.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 41
+     8.5. The +cbor Structured Syntax Suffix Registration . . . . . 41
+   9. Security Considerations . . . . . . . . . . . . . . . . . . . 42
+   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 42
+   11. References . . . . . . . . . . . . . . . . . . . . . . . . . 43
+     11.1. Normative References . . . . . . . . . . . . . . . . . . 43
+     11.2. Informative References . . . . . . . . . . . . . . . . . 44
+   Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 46
+   Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 50
+   Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 53
+   Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 55
    Appendix E. Comparison of Other Binary Formats to CBOR's Design
-               Objectives . . . . . . . . . . . . . . . . . . . . . 58
-     E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 59
-     E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 59
-     E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 60
-     E.4. UBJSON . . . . . . . . . . . . . . . . . . . . . . . . . 60
-     E.5. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 60
-     E.6. Conciseness on the Wire . . . . . . . . . . . . . . . . . 60
-   Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 61
-   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61
+               Objectives . . . . . . . . . . . . . . . . . . . . . 56
+     E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 57
+     E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 57
+     E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 58
+     E.4. UBJSON . . . . . . . . . . . . . . . . . . . . . . . . . 58
+     E.5. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 58
+     E.6. Conciseness on the Wire . . . . . . . . . . . . . . . . . 58
+   Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 59
+   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 59

 1. Introduction

    There are hundreds of standardized formats for binary representation
    of structured data (also known as binary serialization formats). Of
    those, some are for specific domains of information, while others are
    generalized for arbitrary data. In the IETF, probably the best-known
    formats in the latter category are ASN.1's BER and DER [ASN.1].

    The format defined here follows some specific design goals that are
    not well met by current formats. The underlying data model is an
-   extended version of the JSON data model [RFC7159]. It is important
-   to note that this is not a proposal that the grammar in RFC 7159 be
+   extended version of the JSON data model [RFC8259]. It is important
+   to note that this is not a proposal that the grammar in RFC 8259 be
    extended in general, since doing so would cause a significant
    backwards incompatibility with already deployed JSON documents.
    Instead, this document simply defines its own data model that starts
    from JSON.

    Appendix E lists some existing binary formats and discusses how well
    they do or do not fit the design objectives of the Concise Binary
    Object Representation (CBOR).

 1.1. Objectives

@@ -330,39 +329,40 @@
    This basic generic data model comes pre-extended by the registration
    of a number of simple values and tags right in this document, such
    as:

    o  "false", "true", "null", and "undefined" (simple values identified
       by 20..23)

    o  integer and floating point values with a larger range and
       precision than the above (tags 2 to 5)

-   o  application data types such as a point in time (tags 1, 0)
+   o  application data types such as a point in time or an RFC 3339
+      date/time string (tags 1, 0)

    Further elements of the extended generic data model can be (and have
    been) defined via the IANA registries created for CBOR. Even if such
    an extension is unknown to a generic encoder or decoder, data items
    using that extension can be passed to or from the application by
    representing them at the interface to the application within the
    basic generic data model, i.e., as generic values of a simple type or
    generic tagged items.

    In other words, the basic generic data model is stable as defined in
    this document, while the extended generic data model expands by the
    registration of new simple values or tags, but never shrinks.

    While there is a strong expectation that generic encoders and
-   decoders can represent "false", "true", and "null" in the form
-   appropriate for their programming environment, implementation of the
-   data model extensions created by tags is truly optional and a matter
-   of implementation quality.
+   decoders can represent "false", "true", and "null" ("undefined" is
+   intentionally omitted) in the form appropriate for their programming
+   environment, implementation of the data model extensions created by
+   tags is truly optional and a matter of implementation quality.

 2.2. Specific Data Models

    The specific data model for a CBOR-based protocol usually subsets the
    extended generic data model and assigns application semantics to the
    data items within this subset and its components.
    When documenting such specific data models, where it is desired to
    specify the types of data items, it is preferred to identify the
    types by their names in the generic data model ("negative integer",
    "array") instead of by referring to aspects of their CBOR
    representation ("major type 1",
@@ -378,122 +378,124 @@
    duplicates and so invalid, and an encoder could encode integral-
    valued floats as integers or vice versa, perhaps to save encoded
    bytes.

 3. Specification of the CBOR Encoding

    A CBOR data item (Section 2) is encoded to or decoded from a byte
    string as described in this section. The encoding is summarized in
    Table 5.

-   The initial byte of each data item contains both information about
-   the major type (the high-order 3 bits, described in Section 3.1) and
-   additional information (the low-order 5 bits). When the value of the
-   additional information is less than 24, it is directly used as a
-   small unsigned integer. When it is 24 to 27, the additional bytes
-   for a variable-length integer immediately follow; the values 24 to 27
-   of the additional information specify that its length is a 1-, 2-,
-   4-, or 8-byte unsigned integer, respectively. Additional information
-   value 31 is used for indefinite-length items, described in
-   Section 3.2. Additional information values 28 to 30 are reserved for
-   future expansion.
+   The initial byte of each encoded data item contains both information
+   about the major type (the high-order 3 bits, described in
+   Section 3.1) and additional information (the low-order 5 bits).
+   Additional information value 31 is used for indefinite-length items,
+   described in Section 3.2. Additional information values 28 to 30 are
+   reserved for future expansion.

-   In all additional information values, the resulting integer is
-   interpreted depending on the major type. It may represent the actual
-   data: for example, in integer types, the resulting integer is used
-   for the value itself. It may instead supply length information: for
-   example, in byte strings it gives the length of the byte string data
-   that follows.
+   Additional information values from 0 to 27 describes how to construct
+   an "argument", possibly consuming additional bytes. For major type 7
+   and additional information 25 to 27 (floating point numbers), there
+   is a special case; in all other cases the additional information
+   value, possibly combined with following bytes, the argument
+   constructed is an unsigned integer.
+
+   When the value of the additional information is less than 24, it is
+   directly used as the argument's value. When it is 24 to 27, the
+   argument's value is held in the following 1, 2, 4, or 8,
+   respectively, bytes, in network byte order.
+
+   The meaning of this argument depends on the major type. For example,
+   in major type 0, the argument is the value of the data item itself
+   (and in major type 1 the value of the data item is computed from the
+   argument); in major type 2 and 3 it gives the length of the string
+   data in bytes that follows; and in major types 4 and 5 it is used to
+   determine the number of data items enclosed.
+
+   If the encoded sequence of bytes ends before the end of a data item
+   would be reached, that encoding is not well-formed. If the encoded
+   sequence of bytes still has bytes remaining after the outermost
+   encoded item is parsed, that encoding is not a single well-formed
+   CBOR item.

    A CBOR decoder implementation can be based on a jump table with all
    256 defined values for the initial byte (Table 5). A decoder in a
    constrained implementation can instead use the structure of the
    initial byte and following bytes for more compact code (see
    Appendix C for a rough impression of how this could look).

 3.1. Major Types

    The following lists the major types and the additional information
    and other bytes associated with the type.

-   Major type 0: an unsigned integer. The 5-bit additional information
-      is either the integer itself (for additional information values 0
-      through 23) or the length of additional data. Additional
-      information 24 means the value is represented in an additional
-      uint8_t, 25 means a uint16_t, 26 means a uint32_t, and 27 means a
-      uint64_t. For example, the integer 10 is denoted as the one byte
-      0b000_01010 (major type 0, additional information 10). The
-      integer 500 would be 0b000_11001 (major type 0, additional
-      information 25) followed by the two bytes 0x01f4, which is 500 in
-      decimal.
+   Major type 0: an integer in the range 0..2**64-1 inclusive. The
+      value of the encoded item is the argument itself. For example,
+      the integer 10 is denoted as the one byte 0b000_01010 (major type
+      0, additional information 10). The integer 500 would be
+      0b000_11001 (major type 0, additional information 25) followed by
+      the two bytes 0x01f4, which is 500 in decimal.

-   Major type 1: a negative integer. The encoding follows the rules
-      for unsigned integers (major type 0), except that the value is
-      then -1 minus the encoded unsigned integer. For example, the
+   Major type 1: a negative integer in the range -2**64..-1 inclusive.
+      The value of the item is -1 minus the argument. For example, the
       integer -500 would be 0b001_11001 (major type 1, additional
       information 25) followed by the two bytes 0x01f3, which is 499 in
       decimal.

-   Major type 2: a byte string. The string's length in bytes is
-      represented following the rules for positive integers (major type
-      0). For example, a byte string whose length is 5 would have an
-      initial byte of 0b010_00101 (major type 2, additional information
-      5 for the length), followed by 5 bytes of binary content. A byte
-      string whose length is 500 would have 3 initial bytes of
-      0b010_11001 (major type 2, additional information 25 to indicate a
-      two-byte length) followed by the two bytes 0x01f4 for a length of
-      500, followed by 500 bytes of binary content.
+   Major type 2: a byte string. The number of bytes in the string is
+      equal to the argument. For example, a byte string whose length is
+      5 would have an initial byte of 0b010_00101 (major type 2,
+      additional information 5 for the length), followed by 5 bytes of
+      binary content. A byte string whose length is 500 would have 3
+      initial bytes of 0b010_11001 (major type 2, additional information
+      25 to indicate a two-byte length) followed by the two bytes 0x01f4
+      for a length of 500, followed by 500 bytes of binary content.

-   Major type 3: a text string, specifically a string of Unicode
-      characters that is encoded as UTF-8 [RFC3629]. The format of this
-      type is identical to that of byte strings (major type 2), that is,
-      as with major type 2, the length gives the number of bytes. This
-      type is provided for systems that need to interpret or display
-      human-readable text, and allows the differentiation between
-      unstructured bytes and text that has a specified repertoire and
-      encoding. In contrast to formats such as JSON, the Unicode
-      characters in this type are never escaped. Thus, a newline
-      character (U+000A) is always represented in a string as the byte
-      0x0a, and never as the bytes 0x5c6e (the characters "\" and "n")
-      or as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and
-      "a").
+   Major type 3: a text string (Section 2), encoded as UTF-8
+      ([RFC3629]). The number of bytes in the string is equal to the
+      argument. A string containing an invalid UTF-8 sequence is well-
+      formed but invalid. This type is provided for systems that need
+      to interpret or display human-readable text, and allows the
+      differentiation between unstructured bytes and text that has a
+      specified repertoire and encoding. In contrast to formats such as
+      JSON, the Unicode characters in this type are never escaped.
+      Thus, a newline character (U+000A) is always represented in a
+      string as the byte 0x0a, and never as the bytes 0x5c6e (the
+      characters "\" and "n") or as 0x5c7530303061 (the characters "\",
+      "u", "0", "0", "0", and "a").

    Major type 4: an array of data items. Arrays are also called lists,
-      sequences, or tuples. The array's length follows the rules for
-      byte strings (major type 2), except that the length denotes the
-      number of data items, not the length in bytes that the array takes
-      up. Items in an array do not need to all be of the same type.
-      For example, an array that contains 10 items of any type would
-      have an initial byte of 0b100_01010 (major type of 4, additional
-      information of 10 for the length) followed by the 10 remaining
-      items.
+      sequences, or tuples. The argument is the number of data items in
+      the array. Items in an array do not need to all be of the same
+      type. For example, an array that contains 10 items of any type
+      would have an initial byte of 0b100_01010 (major type of 4,
+      additional information of 10 for the length) followed by the 10
+      remaining items.

    Major type 5: a map of pairs of data items. Maps are also called
       tables, dictionaries, hashes, or objects (in JSON). A map is
       comprised of pairs of data items, each pair consisting of a key
-      that is immediately followed by a value. The map's length follows
-      the rules for byte strings (major type 2), except that the length
-      denotes the number of pairs, not the length in bytes that the map
-      takes up. For example, a map that contains 9 pairs would have an
-      initial byte of 0b101_01001 (major type of 5, additional
-      information of 9 for the number of pairs) followed by the 18
-      remaining items. The first item is the first key, the second item
-      is the first value, the third item is the second key, and so on.
-      A map that has duplicate keys may be well-formed, but it is not
-      valid, and thus it causes indeterminate decoding; see also
-      Section 4.7.
+      that is immediately followed by a value. The argument is the
+      number of _pairs_ of data items in the map. For example, a map
+      that contains 9 pairs would have an initial byte of 0b101_01001
+      (major type of 5, additional information of 9 for the number of
+      pairs) followed by the 18 remaining items. The first item is the
+      first key, the second item is the first value, the third item is
+      the second key, and so on. A map that has duplicate keys may be
+      well-formed, but it is not valid, and thus it causes indeterminate
+      decoding; see also Section 4.7.

-   Major type 6: optional semantic tagging of other major types. See
-      Section 3.4.
+   Major type 6: a tagged data item whose tag is the argument and whose
+      value is the single following encoded item. See Section 3.4.

-   Major type 7: floating-point numbers and simple data types that need
-      no content, as well as the "break" stop code. See Section 3.3.
+   Major type 7: floating-point numbers and simple values, as well as
+      the "break" stop code. See Section 3.3.

    These eight major types lead to a simple table showing which of the
    256 possible values for the initial byte of a data item are used
    (Table 5).

    In major types 6 and 7, many of the possible values are reserved for
    future specification. See Section 8 for more information on these
    values.

 3.2. Indefinite Lengths for Some Major Types

@@ -720,24 +721,24 @@
    encode Null as the two-byte sequence of 0xf816, and MUST NOT encode
    Undefined value as the two-byte sequence of 0xf817. A decoder MUST
    treat these two-byte sequences as an error. Similar prohibitions
    apply to the unassigned simple values as well.

 3.4. Optional Tagging of Items

    In CBOR, a data item can optionally be preceded by a tag to give it
    additional semantics while retaining its structure. The tag is major
    type 6, and represents an integer number as indicated by the tag's
-   integer value; the (sole) data item is carried as content data. If a
-   tag requires structured data, this structure is encoded into the
-   nested data item. The definition of a tag usually restricts what
-   kinds of nested data item or items can be carried by a tag.
+   argument (Section 3); the (sole) data item is carried as content
+   data. If a tag requires structured data, this structure is encoded
+   into the nested data item. The definition of a tag usually restricts
+   what kinds of nested data item or items are valid.

    The initial bytes of the tag follow the rules for positive integers
    (major type 0). The tag is followed by a single data item of any
    type. For example, assume that a byte string of length 12 is marked
    with a tag to indicate it is a positive bignum (Section 3.4.2). This
    would be marked as 0b110_00010 (major type 6, additional information
    2 for the tag) followed by 0b010_01100 (major type 2, additional
    information of 12 for the length) followed by the 12 bytes of the
    bignum.

@@ -965,25 +966,31 @@
    other ways to encode binary data in strings.

 3.4.4.3. Encoded Text

    Some text strings hold data that have formats widely used on the
    Internet, and sometimes those formats can be validated and presented
    to the application in appropriate form by the decoder. There are
    tags for some of these formats.

    o  Tag 32 is for URIs, as defined in [RFC3986];
+
    o  Tags 33 and 34 are for base64url- and base64-encoded text strings,
       as defined in [RFC4648];

-   o  Tag 35 is for regular expressions in Perl Compatible Regular
-      Expressions (PCRE) / JavaScript syntax [ECMA262].
+   o  Tag 35 is for regular expressions that are roughly in Perl
+      Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
+      version of the JavaScript regular expression syntax [ECMA262].
+      (Note that more specific identification may be necessary if the
+      actual version of the specification underlying the regular
+      expression, or more than just the text of the regular expression
+      itself, need to be conveyed.)

    o  Tag 36 is for MIME messages (including all headers), as defined in
       [RFC2045];

    Note that tags 33 and 34 differ from 21 and 22 in that the data is
    transported in base-encoded form for the former and in raw byte
    string form for the latter.

 3.4.5. Self-Describe CBOR

@@ -1005,102 +1012,20 @@
    use as a distinguishing mark for frequently used file types. In
    particular, it is not a valid start of a Unicode text in any Unicode
    encoding if followed by a valid CBOR data item.

    For instance, a decoder might be able to parse both CBOR and JSON.
    Such a decoder would need to mechanically distinguish the two
    formats. An easy way for an encoder to help the decoder would be to
    tag the entire CBOR item with tag 55799, the serialization of which
    will never be found at the beginning of a JSON text.

-3.5. CBOR Data Models
-
-   CBOR is explicit about its generic data model, which defines the set
-   of all data items that can be represented in CBOR. Its basic generic
-   data model is extensible by the registration of simple type values
-   and tags. Applications can then subset the resulting extended
-   generic data model to build their specific data models.
-
-   Within environments that can represent the data items in the generic
-   data model, generic CBOR encoders and decoders can be implemented
-   (which usually involves defining additional implementation data types
-   for those data items that do not already have a natural
-   representation in the environment). The ability to provide generic
-   encoders and decoders is an explicit design goal of CBOR; however
-   many applications will provide their own application-specific
-   encoders and/or decoders.
-
-   In the basic (un-extended) generic data model, a data item is one of:
-
-   o  an integer in the range -2**64..2**64-1 inclusive
-
-   o  a simple value, identified by a number between 0 and 255, but
-      distinct from that number
-
-   o  a floating point value, distinct from an integer, out of the set
-      representable by IEEE 754 binary64 (including non-finites)
-
-   o  a sequence of zero or more bytes ("byte string")
-
-   o  a sequence of zero or more Unicode code points ("text string")
-
-   o  a sequence of zero or more data items ("array")
-
-   o  a mapping (mathematical function) from zero or more data items
-      ("keys") each to a data item ("values"), ("map")
-
-   o  a tagged data item, comprising a tag (an integer in the range
-      0..2**64-1) and a value (a data item)
-
-   Note that integer and floating-point values are distinct in this
-   model, even if they have the same numeric value.
-
-   This basic generic data model comes pre-extended by the registration
-   of a number of simple values and tags right in this document, such
-   as:
-
-   o  "false", "true", "null", and "undefined" (simple values identified
-      by 20..23)
-
-   o  integer and floating point values with a larger range and
-      precision than the above (tags 2 to 5)
-
-   o  application data types such as a point in time or an RFC 3339
-      date/time string (tags 1, 0)
-
-   Further elements of the extended generic data model can be (and have
-   been) defined via the IANA registries created for CBOR. Even if such
-   an extension is unknown to a generic encoder or decoder, data items
-   using that extension can be passed to or from the application by
-   representing them at the interface to the application within the
-   basic generic data model, i.e., as generic values of a simple type or
-   generic tagged items.
-
-   In other words, the basic generic data model is stable as defined in
-   this document, while the extended generic data model expands by the
-   registration of new simple values or tags, but never shrinks.
- - While there is a strong expectation that generic encoders and - decoders can represent "false", "true", and "null" ("undefined" is - intentionally omitted) in the form appropriate for their programming - environment, implementation of the data model extensions created by - tags is truly optional and a matter of implementation quality. - - A specific data model usually subsets the extended generic data model - and assigns application semantics to the data items within this - subset and its components. When documenting such specific data - models, where it is desired to specify the types of data items, it is - preferred to identify the types by their names in the generic data - model ("negative integer", "array") instead of by referring to - aspects of their CBOR representation ("major type 1", "major type - 4"). - 4. Creating CBOR-Based Protocols Data formats such as CBOR are often used in environments where there is no format negotiation. A specific design goal of CBOR is to not need any included or assumed schema: a decoder can take a CBOR item and decode it with no other knowledge. Of course, in real-world implementations, the encoder and the decoder will have a shared view of what should be in a CBOR data item. For example, an agreed-to format might be "the item is an array whose @@ -1330,51 +1256,50 @@ interwork with JSON-based applications, keys probably should be limited to UTF-8 strings only; otherwise, there has to be a specified mapping from the other CBOR types to Unicode characters, and this often leads to implementation errors. In applications where keys are numeric in nature and numeric ordering of keys is important to the application, directly using the numbers for the keys is useful. If multiple types of keys are to be used, consideration should be given to how these types would be represented in the specific programming environments that are to be used. 
    For example, in
-   JavaScript objects, a key of integer 1 cannot be distinguished from a
-   key of string "1". This means that, if integer keys are used, the
-   simultaneous use of string keys that look like numbers needs to be
-   avoided. Again, this leads to the conclusion that keys should be of
-   a single CBOR type.
+   JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished
+   from a key of floating point 1.0. This means that, if integer keys
+   are used, the protocol needs to avoid use of floating-point keys the
+   values of which happen to be integer numbers in the same map.

    Decoders that deliver data items nested within a CBOR data item
    immediately on decoding them ("streaming decoders") often do not keep
    the state that is necessary to ascertain uniqueness of a key in a
    map. Similarly, an encoder that can start encoding data items before
    the enclosing data item is completely available ("streaming encoder")
    may want to reduce its overhead significantly by relying on its data
    source to maintain uniqueness.

    A CBOR-based protocol should make an intentional decision about what
    to do when a receiving application does see multiple identical keys
    in a map. The resulting rule in the protocol should respect the CBOR
    data model: it cannot prescribe a specific handling of the entries
    with the identical keys, except that it might have a rule that having
    identical keys in a map indicates a malformed map and that the
    decoder has to stop with an error. Duplicate keys are also
    prohibited by CBOR decoders that are using strict mode
    (Section 4.10).

    The CBOR data model for maps does not allow ascribing semantics to
-   the order of the key/value pairs in the map representation. Thus, it
-   would be a very bad practice to define a CBOR-based protocol in such
-   a way that changing the key/value pair order in a map would change
-   the semantics, apart from trivial aspects (cache usage, etc.).
(A
-   CBOR-based protocol can prescribe a specific order of serialization,
-   such as for canonicalization.)
+   the order of the key/value pairs in the map representation. Thus, a
+   CBOR-based protocol MUST NOT specify that changing the key/value pair
+   order in a map would change the semantics, except to specify that
+   some, e.g. non-canonical, orders are disallowed. Timing, cache
+   usage, and other side channels are not considered part of the
+   semantics.

    Applications for constrained devices that have maps with 24 or fewer
    frequently used keys should consider using small integers (and those
    with up to 48 frequently used keys should consider also using small
    negative integers) because the keys can then be encoded in a single
    byte.

 4.7.1. Equivalence of Keys

    This notion of equivalence must be used to determine whether keys in
@@ -1466,21 +1391,21 @@
    4. "z", encoded as 0x617a.

    5. "aa", encoded as 0x626161.

    6. [100], encoded as 0x811864.

    7. [-1], encoded as 0x8120.

    8. false, encoded as 0xf4.

-   o  Indefinite-length items MUST not appear. They can be encoded as
+   o  Indefinite-length items MUST NOT appear. They can be encoded as
       definite-length items instead.

    If a protocol allows for IEEE floats, then additional
    canonicalization rules might need to be added. One example rule
    might be to have all floats start as a 64-bit float, then do a test
    conversion to a 32-bit float; if the result is the same numeric
    value, use the shorter value and repeat the process with a test
    conversion to a 16-bit float. (This rule selects 16-bit float for
    positive and negative Infinity as well.) Also, there are many
    representations for NaN. If NaN is an allowed value, it must always
@@ -1650,21 +1575,21 @@
    advice deals with these by converting them to a single substitute
    value, such as a JSON null.

    o  An integer (major type 0 or 1) becomes a JSON number.
    o  A byte string (major type 2) that is not embedded in a tag that
       specifies a proposed encoding is encoded in base64url without
       padding and becomes a JSON string.

    o  A UTF-8 string (major type 3) becomes a JSON string. Note that
-      JSON requires escaping certain characters (RFC 7159, Section 7):
+      JSON requires escaping certain characters ([RFC8259], Section 7):
       quotation mark (U+0022), reverse solidus (U+005C), and the "C0
       control characters" (U+0000 through U+001F). All other characters
       are copied unchanged into the JSON UTF-8 string.

    o  An array (major type 4) becomes a JSON array.

    o  A map (major type 5) becomes a JSON object. This is possible
       directly only if all keys are UTF-8 strings. A converter might
       also convert other keys into UTF-8 strings (such as by converting
       integers into strings containing their decimal representation);
@@ -1830,21 +1755,21 @@
    human-readable diagnostic notation. All actual interchange always
    happens in the binary format. Note that this truly is a diagnostic
    format; it is not meant to be parsed. Therefore, no formal
    definition (as in ABNF) is given in this document. (Implementers
    looking for a text-based format for representing CBOR data items in
    configuration files may also want to consider YAML [YAML].)

    The diagnostic notation is loosely based on JSON as it is defined in
-   RFC 7159, extending it where needed.
+   RFC 8259, extending it where needed.

    The notation borrows the JSON syntax for numbers (integer and
    floating point), True (>true<), False (>false<), Null (>null<), UTF-8
    strings, arrays, and maps (maps are called objects in JSON; the
    diagnostic notation extends JSON here by allowing any data item in
    the key position). Undefined is written >undefined< as in
    JavaScript. The non-finite floating-point numbers Infinity,
    -Infinity, and NaN are written exactly as in this sentence (this is
    also a way they can be written in JavaScript, although JSON does not
    allow them).
    A tagged item is written as an integer number for the
@@ -1896,21 +1821,21 @@
    encoding indicator "_" is thus an abbreviation of the full form "_7",
    which is not used.) As a special case, byte and text strings of
    indefinite length can be notated in the form (_ h'0123', h'4567') and
    (_ "foo", "bar").

 8. IANA Considerations

    IANA has created two registries for new CBOR values. The registries
    are separate, that is, not under an umbrella registry, and follow the
-   rules in [RFC5226]. IANA has also assigned a new MIME media type and
+   rules in [RFC8126]. IANA has also assigned a new MIME media type and
    an associated Constrained Application Protocol (CoAP) Content-Format
    entry.

 8.1. Simple Values Registry

    IANA has created the "Concise Binary Object Representation (CBOR)
    Simple Values" registry. The initial values are shown in Table 2.

    New entries in the range 0 to 19 are assigned by Standards Action.
    It is suggested that these Standards Actions allocate values starting
@@ -2084,25 +2008,25 @@
    This document also incorporates suggestions made by many people,
    notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore, Matthew
    Lepinski, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray,
    Tony Finch, Tony Hansen, and Yaron Sheffer.

 11. References

 11.1. Normative References

-   [ECMA262]  European Computer Manufacturers Association, "ECMAScript
-              Language Specification 5.1 Edition", ECMA Standard ECMA-
-              262, June 2011, .
+   [ECMA262]  Ecma International, "ECMAScript 2018 Language
+              Specification", ECMA Standard ECMA-262, 9th Edition, June
+              2018, .

    [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
               Extensions (MIME) Part One: Format of Internet Message
               Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, .

    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119,
               DOI 10.17487/RFC2119, March 1997, .
@@ -2121,24 +2045,24 @@
               .

    [RFC4287]  Nottingham, M., Ed. and R.
               Sayre, Ed., "The Atom Syndication Format", RFC 4287,
               DOI 10.17487/RFC4287, December 2005, .

    [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
               Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
               .

-   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
-              IANA Considerations Section in RFCs", RFC 5226,
-              DOI 10.17487/RFC5226, May 2008,
-              .
+   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
+              Writing an IANA Considerations Section in RFCs", BCP 26,
+              RFC 8126, DOI 10.17487/RFC8126, June 2017,
+              .

    [TIME_T]   The Open Group Base Specifications, "Vol. 1: Base
               Definitions, Issue 7", Section 4.15 'Seconds Since the
               Epoch', IEEE Std 1003.1, 2013 Edition, 2013, .

 11.2. Informative References

    [ASN.1]    International Telecommunication Union, "Information
@@ -2146,42 +2070,46 @@
               Encoding Rules (BER), Canonical Encoding Rules (CER) and
               Distinguished Encoding Rules (DER)", ITU-T Recommendation
               X.690, 1994.

    [BSON]     Various, "BSON - Binary JSON", 2013, .

    [MessagePack]
               Furuhashi, S., "MessagePack", 2013, .

+   [PCRE]     Hazel, P., "PCRE - Perl Compatible Regular Expressions",
+              2018, .
+
    [RFC0713]  Haverty, J., "MSDTP-Message Services Data Transmission
               Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976, .

    [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
               Specifications and Registration Procedures", BCP 13,
               RFC 6838, DOI 10.17487/RFC6838, January 2013, .

    [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
               Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
               October 2013, .

-   [RFC7159]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
-              Interchange Format", RFC 7159, DOI 10.17487/RFC7159, March
-              2014, .
-
    [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
               Constrained-Node Networks", RFC 7228,
               DOI 10.17487/RFC7228, May 2014, .

+   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
+              Interchange Format", STD 90, RFC 8259,
+              DOI 10.17487/RFC8259, December 2017,
+              .
+
    [UBJSON]   The Buzz Media, "Universal Binary JSON Specification",
               2013, .

    [YAML]     Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup
               Language (YAML[TM]) Version 1.2", 3rd Edition, October
               2009, .

 Appendix A. Examples

    The following table provides some CBOR-encoded values in hexadecimal
@@ -2800,21 +2728,21 @@
    +-------------+--------------------------+--------------------------+

           Table 6: Examples for Different Levels of Conciseness

 Appendix F. Changes from RFC 7049

    The following is a list of known changes from RFC 7049. This list is
    non-authoritative. It is meant to help reviewers see the significant
    differences.

-   o  Updated reference for [RFC4267] to [RFC7159] in many places
+   o  Updated reference for [RFC4267] to [RFC8259] in many places

    o  Updated reference for [CNN-TERMS] to [RFC7228]

    o  Added a comment to the last example in Section 2.2.1 (added
       "Second value")

    o  Fixed a bug in the example in Section 2.4.2 ("29" -> "49")

    o  Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" ->
       "0b000_11001")
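
Reviewer note on the canonicalization context shown with the "MUST NOT" hunk: the example float rule quoted there (start as a 64-bit float, test-convert to 32 bits, then to 16 bits, keeping the shorter form only while the numeric value is unchanged) can be sketched as below. This is a minimal illustration and not text from either draft. The function name encode_canonical_float is hypothetical, the initial bytes 0xfb/0xfa/0xf9 are the major type 7 headers for 64-, 32-, and 16-bit floats, and NaN is deliberately rejected because the quoted text leaves its canonical form to a separate rule.

```python
import math
import struct


def encode_canonical_float(value: float) -> bytes:
    """Encode a float as a CBOR major type 7 item, using the shortest
    width (64 -> 32 -> 16 bits) that preserves the numeric value.

    Hypothetical helper for illustration; it is not defined by the draft.
    """
    if math.isnan(value):
        # Assumption: NaN needs its own protocol-defined canonical form.
        raise ValueError("NaN canonicalization is left to the protocol")

    # Start with the 64-bit form (initial byte 0xfb).
    encoded = b"\xfb" + struct.pack(">d", value)

    # Test conversions, in order: 32-bit (0xfa), then 16-bit (0xf9).
    for initial_byte, fmt in ((b"\xfa", ">f"), (b"\xf9", ">e")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            break  # finite value too large for this width
        if struct.unpack(fmt, packed)[0] != value:
            break  # conversion changed the numeric value
        encoded = initial_byte + packed

    return encoded
```

With this sketch, 1.5 fits in 16 bits, 100000.0 stops at 32 bits, 1.1 stays at 64 bits, and both infinities end up as 16-bit floats, matching the parenthetical remark in the quoted rule.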