--- 1/draft-ietf-cbor-7049bis-08.txt 2019-11-04 17:13:09.854917601 -0800 +++ 2/draft-ietf-cbor-7049bis-09.txt 2019-11-04 17:13:10.002921371 -0800 @@ -1,19 +1,19 @@ Network Working Group C. Bormann Internet-Draft Universitaet Bremen TZI Obsoletes: 7049 (if approved) P. Hoffman Intended status: Standards Track ICANN Expires: May 8, 2020 November 05, 2019 Concise Binary Object Representation (CBOR) - draft-ietf-cbor-7049bis-08 + draft-ietf-cbor-7049bis-09 Abstract The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation. These design goals make it different from earlier binary serializations such as ASN.1 and MessagePack. This document is a revised edition of RFC 7049, with editorial @@ -75,53 +75,52 @@ 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16 3.3. Floating-Point Numbers and Values with No Content . . . . 16 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 18 - 3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 20 - 3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 20 + 3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 21 + 3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 21 3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 21 - 3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 21 + 3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 22 3.4.5. 
Decimal Fractions and Bigfloats . . . . . . . . . . . 22 3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 24 3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 24 3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters . . . . . . . . . . . . . . . . . . . 24 3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 25 3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 26 4. Serialization Considerations . . . . . . . . . . . . . . . . 26 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 26 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 27 - 4.2.1. Core Deterministic Encoding Requirements . . . . . . 27 - 4.2.2. Additional Deterministic Encoding Considerations . . 28 + 4.2.1. Core Deterministic Encoding Requirements . . . . . . 28 + 4.2.2. Additional Deterministic Encoding Considerations . . 29 4.2.3. Length-first map key ordering . . . . . . . . . . . . 30 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 31 - 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 31 + 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 32 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 32 - 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 32 + 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 33 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 33 - 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 33 - 5.4. Handling Unknown Simple Values and Tag numbers . . . . . 33 - 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 34 - 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 35 - 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 36 - 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 37 - 5.8. Validity Checking and Robustness . . . . . . . . . . . . 37 + 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 34 + 5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 
34 + 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 35 + 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 36 + 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 37 + 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 38 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 38 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 38 - 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 40 - 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 41 + 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 39 + 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 40 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 41 7.2. Curating the Additional Information Space . . . . . . . . 42 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 42 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 43 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 44 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 44 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 45 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 46 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 46 @@ -277,31 +277,30 @@ Data Stream: A sequence of zero or more data items, not further assembled into a larger containing data item. The independent data items that make up a data stream are sometimes also referred to as "top-level data items". Well-formed: A data item that follows the syntactic structure of CBOR. A well-formed data item uses the initial bytes and the byte strings and/or data items that are implied by their values as defined in CBOR and does not include following extraneous data. - CBOR decoders by definition only return contents from well-formed data items. 
Valid: A data item that is well-formed and also follows the semantic restrictions that apply to CBOR data items. Expected: Besides its normal English meaning, the term "expected" is used to describe requirements beyond CBOR validity that an application has on its input data. Well-formed (processable at - all), valid (checked by a valdity-checking generic decoder), and + all), valid (checked by a validity-checking generic decoder), and expected (checked by the application) form a hierarchy of layers of acceptability. Stream decoder: A process that decodes a data stream and makes each of the data items in the sequence available to an application as they are received. Where bit arithmetic or data types are explained, this document uses the notation familiar from the programming language C, except that "**" denotes exponentiation. Similar to the "0x" notation for @@ -898,20 +896,33 @@ | | string | | | | | | | 36 | text | MIME message; see Section 3.4.6.3 | | | string | | | | | | | 55799 | multiple | Self-described CBOR; see Section 3.4.7 | +----------+----------+---------------------------------------------+ Table 4: Tag numbers defined in RFC 7049 + Conceptually, tags are interpreted in the generic data model, not at + (de-)serialization time. A small number of tags (specifically, tag + number 25 and tag number 29) have been registered with semantics that + do require processing at (de-)serialization time: The decoder needs + to be aware and the encoder needs to be under control of the exact + sequence in which data items are encoded into the CBOR data stream. + This means these tags cannot be implemented on top of every generic + CBOR encoder/decoder (which might not reflect the serialization order + for entries in a map at the data model level and vice versa); their + implementation therefore typically needs to be integrated into the + generic encoder/decoder. The definition of new tags with this + property is NOT RECOMMENDED. + 3.4.1. 
Date and Time Protocols using tag numbers 0 and 1 extend the generic data model (Section 2) with data items representing points in time. 3.4.2. Standard Date/Time String Tag number 0 contains a text string in the standard format described by the "date-time" production in [RFC3339], as refined by Section 3.3 of [RFC4287], representing the point in time described there. A @@ -1139,21 +1150,25 @@ (Note that more specific identification may be necessary if the actual version of the specification underlying the regular expression, or more than just the text of the regular expression itself, need to be conveyed.) Any contained string value is valid. o Tag number 36 is for MIME messages (including all headers), as defined in [RFC2045]. A text string that isn't a valid MIME message is invalid. (For this tag, validity checking may be particularly onerous for a generic decoder and might therefore not - be offered.) + be offered. Note that many MIME messages are general binary data + and can therefore not be represented in a text string; + [IANA.cbor-tags] lists a registration for tag number 257 that is + similar to tag number 36 but is used with an enclosed byte + string.) Note that tag numbers 33 and 34 differ from 21 and 22 in that the data is transported in base-encoded form for the former and in raw byte string form for the latter. 3.4.7. Self-Described CBOR In many applications, it will be clear from the context that CBOR is being employed for encoding a data item. For instance, a specific protocol might specify the use of CBOR, or a media type is indicated @@ -1285,23 +1300,21 @@ If a protocol allows for IEEE floats, then additional deterministic encoding rules might need to be added. One example rule might be to have all floats start as a 64-bit float, then do a test conversion to a 32-bit float; if the result is the same numeric value, use the shorter value and repeat the process with a test conversion to a 16-bit float. 
(This rule selects 16-bit float for positive and negative Infinity as well.) Although IEEE floats can represent both positive and negative zero as distinct values, the application might not distinguish these and might decide to represent all zero values - with a positive sign, disallowing negative zero. Also, there are - many representations for NaN. If NaN is an allowed value, it must - always be represented as 0xf97e00. + with a positive sign, disallowing negative zero. CBOR tags present additional considerations for deterministic encoding. If a CBOR-based protocol were to provide the same semantics for the presence and absence of a specific tag (e.g., by allowing both tag 1 data items and raw numbers in a date/time position, treating the latter as if they were tagged), the deterministic format would not allow them. In a protocol that requires tags in certain places to obtain specific semantics, the tag needs to appear in the deterministic format as well. @@ -1319,22 +1332,34 @@ major types 0 and 1, and other values as the smallest of 16-, 32-, or 64-bit floating point that accurately represents the value, 2. Encode all values as the smallest of 16-, 32-, or 64-bit floating point that accurately represents the value, even for integral values, or 3. Encode all values as 64-bit floating point. - If NaN is an allowed value, the protocol needs to pick a single - representation, for example 0xf97e00. + Rule 1 straddles the boundaries between integers and floating + point values, and Rule 3 does not use preferred encoding, so Rule + 2 may be a good choice in many cases. + + If NaN is an allowed value and there is no intent to support NaN + payloads or signaling NaNs, the protocol needs to pick a single + representation, for example 0xf97e00. If that simple choice is + not possible, specific attention will be needed for NaN handling. 
+ + Subnormal numbers (nonzero numbers with the lowest possible + exponent of a given IEEE 754 number format) may be flushed to zero + outputs or be treated as zero inputs in some floating point + implementations. A protocol's deterministic encoding may want to + exclude them from interchange, interchanging zero instead. o If a protocol includes a field that can express integers with an absolute value of 2^64 or larger using tag numbers 2 or 3 (Section 3.4.4), the protocol's deterministic encoding needs to specify whether small integers are expressed using the tag or major types 0 and 1. o A protocol might give encoders the choice of representing a URL as either a text string or, using Section 3.4.6.3, tag number 32 containing a text string. This protocol's deterministic encoding @@ -1487,72 +1512,102 @@ 2. Issue an error and stop processing altogether. A CBOR-based protocol MUST specify which of these options its decoders take, for each kind of invalid item they might encounter. Such problems might occur at the basic validity level of CBOR or in the context of tags (tag validity). 5.3.1. Basic validity + Two kinds of validity errors can occur in the basic generic data + model: + Duplicate keys in a map: Generic decoders (Section 5.2) make data available to applications using the native CBOR data model. That data model includes maps (key-value mappings with unique keys), not multimaps (key-value mappings where multiple entries can have the same key). Thus, a generic decoder that gets a CBOR map item that has duplicate keys will decode to a map with only one instance of that key, or it might stop processing altogether. On the other hand, a "streaming decoder" may not even be able to notice (Section 5.6). Invalid UTF-8 string: A decoder might or might not want to verify that the sequence of bytes in a UTF-8 string (major type 3) is actually valid UTF-8 and react appropriately. 5.3.2. 
Tag validity + Two additional kinds of validity errors are introduced by adding tags + to the basic generic data model: + Inadmissible type for tag content: Tags (Section 3.4) specify what type of data item is supposed to be enclosed by the tag; for example, the tags for positive or negative bignums are supposed to be put on byte strings. A decoder that decodes the tagged data item into a native representation (a native big integer in this example) is expected to check the type of the data item being tagged. Even decoders that don't have such native representations available in their environment may perform the check on those tags known to them and react appropriately. Inadmissible value for tag content: The type of data item may be admissible for a tag's content, but the specific value may not be; e.g., a value of "yesterday" is not acceptable for the content of tag 0, even though it properly is a text string. A decoder that normally ingests such tags into equivalent platform types might present this tag to the application in a similar way to how it would present a tag with an unknown tag number (Section 5.4). -5.4. Handling Unknown Simple Values and Tag numbers +5.4. Validity and Evolution - A decoder that comes across a simple value (Section 3.3) that it does - not recognize, such as a value that was added to the IANA registry - after the decoder was deployed or a value that the decoder chose not - to implement, might issue a warning, might stop processing - altogether, might handle the error by making the unknown value - available to the application as such (as is expected of generic - decoders), or take some other type of action. + A decoder with validity checking will expend the effort to reliably + detect data items with validity errors. For example, such a decoder + needs to have an API that reports an error (and does not return data) + for a CBOR data item that contains any of the validity errors listed + in the previous subsection. 
- A decoder that comes across a tag number (Section 3.4) that it does - not recognize, such as a tag number that was added to the IANA - registry after the decoder was deployed or a tag number that the - decoder chose not to implement, might issue a warning, might stop - processing altogether, might handle the error and present the unknown - tag number together with the enclosed data item to the application - (as is expected of generic decoders), or take some other type of - action. + The set of tags defined in the tag registry (Section 9.2), as well as + the set of simple values defined in the simple values registry + (Section 9.1), can grow at any time beyond the set understood by a + generic decoder. A validity-checking decoder can do one of two + things when it encounters such a case that it does not recognize: + + o It can report an error (and not return data). Note that this + error is not a validity error per se. This kind of error is more + likely to be raised by a decoder that would be performing validity + checking if this were a known case. + + o It can emit the unknown item (type, value, and, for tags, the + decoded tagged data item) to the application calling the decoder, + with an indication that the decoder did not recognize that tag + number or simple value. + + The latter approach, which is also appropriate for decoders that do + not support validity checking, provides forward compatibility with + newly registered tags and simple values without the requirement to + update the encoder at the same time as the calling application. (For + this, the API for the decoder needs to have a way to mark unknown + items so that the calling application can handle them in a manner + appropriate for the program.) + + Since some of the processing needed for validity checking may have an + appreciable cost (in particular with duplicate detection for maps), + support of validity checking is not a requirement placed on all CBOR + decoders. 
+ Some encoders will rely on their applications to provide input data + in such a way that valid CBOR results from the encoder. A generic + encoder also may want to provide a validity-checking mode where it + reliably limits its output to valid CBOR, independent of whether or + not its application is indeed providing API-conformant data. 5.5. Numbers CBOR-based protocols should take into account that different language environments pose different restrictions on the range and precision of numbers that are representable. For example, the JavaScript number system treats all numbers as floating point, which may result in silent loss of precision in decoding integers with more than 53 significant bits. A protocol that uses numbers should define its expectations on the handling of non-trivial numbers in decoders and @@ -1610,21 +1665,21 @@ may want to reduce its overhead significantly by relying on its data source to maintain uniqueness. A CBOR-based protocol MUST define what to do when a receiving application does see multiple identical keys in a map. The resulting rule in the protocol MUST respect the CBOR data model: it cannot prescribe a specific handling of the entries with the identical keys, except that it might have a rule that having identical keys in a map indicates a malformed map and that the decoder has to stop with an error. Duplicate keys are also prohibited by CBOR decoders that - enforce validity (Section 5.8). + enforce validity (Section 5.4). The CBOR data model for maps does not allow ascribing semantics to the order of the key/value pairs in the map representation. Thus, a CBOR-based protocol MUST NOT specify that changing the key/value pair order in a map would change the semantics, except to specify that some orders are disallowed, for example where they would not meet the requirements of a deterministic encoding (Section 4.2).
(Any secondary effects of map ordering such as on timing, cache usage, and other potential side channels are not considered part of the semantics but may be enough reason on their own for a protocol to @@ -1679,84 +1735,20 @@ distinguish values for map keys that are equal for this purpose at the generic data model level. 5.7. Undefined Values In some CBOR-based protocols, the simple value (Section 3.3) of Undefined might be used by an encoder as a substitute for a data item with an encoding problem, in order to allow the rest of the enclosing data items to be encoded without harm. -5.8. Validity Checking and Robustness - - Some areas of application of CBOR do not require deterministic - encoding (Section 4.2) but may require that different decoders reach - the same (semantically equivalent) results, even in the presence of - potentially malicious data. This can be required if one application - (such as a firewall or other protecting entity) makes a decision - based on the data that another application, which independently - decodes the data, relies on. - - Normally, it is the responsibility of the sender to avoid ambiguously - decodable data. However, the sender might be an attacker specially - making up CBOR data such that it will be interpreted differently by - different decoders in an attempt to exploit that as a vulnerability. - Generic decoders used in applications where this might be a problem - can help by providing a validity-checking mode in which it is also - the responsibility of the generic decoder to reject invalid data. It - is expected that firewalls and other security systems that decode - CBOR will employ their decoders with validity checking applied. - - A decoder with validity checking will expend the effort to reliably - detect invalid data items (Section 5.3).
For example, such a decoder - needs to have an API that reports an error (and does not return data) - for a CBOR data item that contains any of the following: - - o a map (major type 5) that has more than one entry with the same - key - - o a tag that is used on a data item of the incorrect type - - o a data item that is incorrectly formatted for the type given to - it, such as invalid UTF-8 in a text string or data that (even if - of the correct type) cannot be interpreted with the specific tag - number that it has been tagged with - - A validity-checking decoder can do one of two things when it - encounters a tag number or simple value that it does not recognize: - - o It can report an error (and not return data). - - o It can emit the unknown item (type, value, and, for tags, the - decoded tagged data item) to the application calling the decoder, - with an indication that the decoder did not recognize that tag - number or simple value. - - The latter approach, which is also appropriate for decoders that do - not support validity checking, provides forward compatibility with - newly registered tags and simple values without the requirement to - update the encoder at the same time as the calling application. (For - this, the API for the decoder needs to have a way to mark unknown - items so that the calling application can handle them in a manner - appropriate for the program.) - - Since some of the processing needed for validity checking may have an - appreciable cost (in particular with duplicate detection for maps), - support of validity checking is not a requirement placed on all CBOR - decoders. - - Some encoders will rely on their applications to provide input data - in such a way that valid CBOR results. A generic encoder also may - want to provide a validity-checking mode where it reliably limits its - output to valid CBOR, independent of whether or not its application - is providing API-conformant data. - 6. 
Converting Data between CBOR and JSON This section gives non-normative advice about converting between CBOR and JSON. Implementations of converters are free to use whichever advice here they want. It is worth noting that a JSON text is a sequence of characters, not an encoded sequence of bytes, while a CBOR data item consists of bytes, not characters. @@ -2238,21 +2229,21 @@ to disrupt the encoder. Protocols should be defined in such a way that potential multiple interpretations are reliably reduced to a single interpretation. For example, an attacker could make use of invalid input such as duplicate keys in maps, or exploit different precision in processing numbers to make one application base its decisions on a different interpretation than the one that will be used by a second application. To facilitate consistent interpretation, encoder and decoder implementations should provide a validity checking mode of - operation (Section 5.8). Note, however, that a generic decoder + operation (Section 5.4). Note, however, that a generic decoder cannot know about all requirements that an application poses on its input data; it therefore does not relieve the application of performing its own input checking. Also, since the set of defined tag numbers evolves, the application may employ a tag number that is not yet supported for validity checking by the generic decoder it uses. Generic decoders therefore need to document which tag numbers they support and what validity checking they can provide for each of them as well as for basic CBOR validity (UTF-8 checking, duplicate map key checking).
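
The two basic validity checks named above (UTF-8 checking and duplicate map key checking) might be sketched as follows. This is an illustrative sketch only, not part of the draft: the names ValidityError, check_text_string, and check_unique_keys are hypothetical, and it assumes that some decoder has already produced the raw payload of a major type 3 item as bytes and the entries of a map as (key, value) pairs with hashable keys. Keys are compared together with their Python type so that integer 1 and floating-point 1.0 stay distinct, as they are in the generic data model; NaN keys and non-hashable keys (arrays, maps) are not handled.

```python
from typing import Any, Hashable, Iterable, Tuple


class ValidityError(ValueError):
    """Hypothetical error type: a basic validity check failed."""


def check_text_string(payload: bytes) -> str:
    """Return the decoded text, or raise if the payload is not valid UTF-8."""
    try:
        # bytes.decode uses strict error handling by default.
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValidityError(f"invalid UTF-8 in text string: {exc}") from exc


def check_unique_keys(entries: Iterable[Tuple[Hashable, Any]]) -> None:
    """Raise if two entries of one map carry the same key."""
    seen = set()
    for key, _value in entries:
        # Pair the key with its type: in Python 1 == 1.0 == True, but the
        # generic data model treats integers and floats as distinct keys.
        marker = (type(key), key)
        if marker in seen:
            raise ValidityError(f"duplicate map key: {key!r}")
        seen.add(marker)
```

A decoder offering a validity-checking mode would presumably run checks of this kind before returning any data, and report the error instead of a partial result. The set of seen keys is the main source of the "appreciable cost" of duplicate detection: its memory use grows with the number of entries in the map.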