--- 1/draft-ietf-cbor-packed-04.txt 2022-04-20 06:13:14.971886646 -0700 +++ 2/draft-ietf-cbor-packed-05.txt 2022-04-20 06:13:15.011887651 -0700 @@ -1,33 +1,33 @@ Network Working Group C. Bormann Internet-Draft Universität Bremen TZI -Intended status: Informational 13 February 2022 -Expires: 17 August 2022 +Intended status: Informational 20 April 2022 +Expires: 22 October 2022 Packed CBOR - draft-ietf-cbor-packed-04 + draft-ietf-cbor-packed-05 Abstract - The Concise Binary Object Representation (CBOR, RFC 8949) is a data - format whose design goals include the possibility of extremely small - code size, fairly small message size, and extensibility without the - need for version negotiation. + The Concise Binary Object Representation (CBOR, RFC 8949 == STD 94) + is a data format whose design goals include the possibility of + extremely small code size, fairly small message size, and + extensibility without the need for version negotiation. CBOR does not provide any forms of data compression. CBOR data - items, in particular when generated from legacy data models often + items, in particular when generated from legacy data models, often allow considerable gains in compactness when applying data compression. While traditional data compression techniques such as DEFLATE (RFC 1951) can work well for CBOR encoded data items, their - disadvantage is that the receiver needs to unpack the compressed form - to make use of data. + disadvantage is that the receiver needs to uncompress the compressed + form to make use of the data. This specification describes Packed CBOR, a simple transformation of a CBOR data item into another CBOR data item that is almost as easy to consume as the original CBOR data item. A separate decompression step is therefore often not required at the receiver. 
Note to Readers This is a working-group draft of the CBOR working group of the IETF, https://datatracker.ietf.org/wg/cbor/about/ @@ -45,21 +45,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on 17 August 2022. + This Internet-Draft will expire on 22 October 2022. Copyright Notice Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights @@ -67,44 +67,62 @@ extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 2. Packed CBOR . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Packing Tables . . . . . . . . . . . . . . . . . . . . . 4 - 2.2. Referencing Shared Items . . . . . . . . . . . . . . . . 4 - 2.3. Referencing Affix Items . . . . . . . . . . . . . . . . . 5 + 2.2. Referencing Shared Items . . . . . . . . . . . . . . . . 5 + 2.3. Referencing Affix Items . . . . . . . . . . . . . . . . . 6 2.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . 7 3. 
Table Setup . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1. Basic Packed CBOR . . . . . . . . . . . . . . . . . . . . 9 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 - 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 - 6.1. Normative References . . . . . . . . . . . . . . . . . . 11 + 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 6.1. Normative References . . . . . . . . . . . . . . . . . . 12 6.2. Informative References . . . . . . . . . . . . . . . . . 12 Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 13 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 17 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 18 1. Introduction - (TO DO, expand on text from abstract here; move references here and - neuter them in the abstract as per Section 4.3 of [RFC7322].) - The specification defines a transformation from a Packed CBOR data - item to the original CBOR data item; it does not define an algorithm - for an actual packer. Different packers can differ in the amount of - effort they invest in arriving at a minimal packed form. + The Concise Binary Object Representation (CBOR, [STD94]) is a data + format whose design goals include the possibility of extremely small + code size, fairly small message size, and extensibility without the + need for version negotiation. - Packed CBOR can employ two kinds of optimization: + CBOR does not provide any forms of data compression. CBOR data + items, in particular when generated from legacy data models, often + allow considerable gains in compactness when applying data + compression. While traditional data compression techniques such as + DEFLATE [RFC1951] can work well for CBOR encoded data items, their + disadvantage is that the receiver needs to uncompress the compressed + form to make use of the data. 
+ + This specification describes Packed CBOR, a simple transformation of + a CBOR data item into another CBOR data item that is almost as easy + to consume as the original CBOR data item. A separate decompression + step is therefore often not required at the receiver. + + This document defines the Packed CBOR format by specifying the + transformation from a Packed CBOR data item to the original CBOR data + item; it does not define an algorithm for a packer. Different + packers can differ in the amount of effort they invest in arriving at + a minimal packed form; often, they simply employ the sharing that is + natural for a specific application. + + Packed CBOR can make use of two kinds of optimization: * item sharing: substructures (data items) that occur repeatedly in the original CBOR data item can be collapsed to a simple reference to a common representation of that data item. The processing required during consumption is limited to following that reference. * affix sharing: data items (strings, containers) that share a prefix or suffix (affix) can be replaced by a reference to a common affix plus the rest of the data item. For strings, the @@ -202,32 +219,34 @@ +---------------------------+--------------+ | Tag 6(negative integer N) | 16 - 2*N - 1 | +---------------------------+--------------+ Table 1: Referencing Shared Values As examples in CBOR diagnostic notation (Section 8 of [STD94]), the first 22 elements of the shared item table are referenced by simple(0), simple(1), ... simple(15), 6(0), 6(-1), 6(1), 6(-2), 6(2), 6(-3). (The alternation between unsigned and negative integers for - even/odd table index values makes systematic use of shorter integer - encodings first.) + even/odd table index values -- "zigzag encoding" -- makes systematic + use of shorter integer encodings first.) 
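The index mapping listed above can be sketched in a few lines of Python. This is an illustration only, not part of the specification: the function names are hypothetical, references are rendered as diagnostic-notation strings rather than encoded CBOR, and the formula for unsigned tag content (16 + 2*N) is inferred from the examples 6(0), 6(1), 6(2) given in the text.

```python
def tag6_argument(index):
    """Integer carried as tag content of Tag 6 for a shared table index >= 16."""
    if index < 16:
        raise ValueError("indices 0..15 are referenced by simple values")
    k = index - 16
    if k % 2 == 0:
        return k // 2          # even offsets use unsigned integers: 0, 1, 2, ...
    return -((k + 1) // 2)     # odd offsets use negative integers: -1, -2, ...


def tag6_index(n):
    """Inverse mapping: shared table index for Tag 6 with integer content n."""
    if n >= 0:
        return 16 + 2 * n      # assumed from the examples 6(0), 6(1), 6(2)
    return 16 - 2 * n - 1      # formula from Table 1 for negative N


def shared_reference(index):
    """Diagnostic-notation string of the reference to a shared table index."""
    if index < 0:
        raise ValueError("table index must be non-negative")
    if index < 16:
        return f"simple({index})"
    return f"6({tag6_argument(index)})"


if __name__ == "__main__":
    print(", ".join(shared_reference(i) for i in range(22)))
```

The alternation between `n` and `-n - 1` is what keeps both one-byte integer ranges in use before any longer integer encoding is needed.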
Taking into account the encoding of these referring data items, there are 16 one-byte references, 48 two-byte references, 512 three-byte references, 131072 four-byte references, etc. As CBOR integers can - grow to very large (or negative) values, there is no practical limit - to how many shared items might be used in a Packed CBOR item. + grow to very large (or very negative) values, there is no practical + limit to how many shared items might be used in a Packed CBOR item. - Note that the semantics of Tag 6 depend on its content: An integer - turns the tag into a shared item reference, a string or container - (map or array) into a prefix reference (see Table 2). + Note that the semantics of Tag 6 depend on its tag content: An + integer turns the tag into a shared item reference, whereas a string + or container (map or array) turns it into a prefix reference (see + Table 2). Note also that the tag content of Tag 6 may itself be + packed, so it may need to be unpacked to make this determination. 2.3. Referencing Affix Items Prefix items are stored in the prefix table of the Current Set; suffix items are stored in the suffix table of the Current Set. We collectively call these items affix items; when referencing, which of the tables is actually used depends on whether a prefix or a suffix reference was used. +===================================+================+ @@ -250,42 +269,43 @@ | Tag 216-223(prefix) | 0-7 | +-----------------------------------+---------------+ | Tag 27647-28671(prefix) | 8-1023 | +-----------------------------------+---------------+ | Tag 1811940352-1879048191(prefix) | 1024-67108863 | +-----------------------------------+---------------+ Table 3: Referencing Suffix Values Affix data items are referenced by using the data items in Table 2 - and Table 3. Each of these implies the table used (prefix or - suffix), a table index (an unsigned integer) and contains a "rump - item". 
When reconstructing the original data item, such a reference - is replaced by a data item constructed from the referenced affix data - item (affix, which might need to be recursively unpacked first) - "concatenated" with the tag content (rump, again possibly recursively - unpacked). + and Table 3. The tag number indicates the table used (prefix or + suffix) and a table index (an unsigned integer); the tag content + contains a "rump item". When reconstructing the original data item, + such a reference is replaced by a data item constructed from the + referenced affix data item (affix, which might need to be recursively + unpacked first) "concatenated" with the tag content (rump, again + possibly recursively unpacked). * For a rump of type array and map, the affix also needs to be an array or a map. For an array, the elements from the prefix are - prepended, and the elements from a suffix are appended to the rump - array. For a map, the entries in the affix are added to those of - the rump; prefix and suffix references differ in how entries with - identical keys are combined: for prefix references, an entry in - the rump with the same key as an entry in the affix overrides the - one in the affix, while for suffix references, an entry in the + prepended to the rump array, while the elements from a suffix are + appended. For a map, the entries in the affix are added to those + of the rump; prefix and suffix references differ in how entries + with identical keys are combined: for prefix references, an entry + in the rump with the same key as an entry in the affix overrides + the one in the affix, while for suffix references, an entry in the affix overrides an entry in the rump that has the same key. - | NOTE: Not sure that we want to use the efficiencies of - | overriding, but having default values supplied out of a - | dictionary to be overridden by a rump sounds rather handy. - | Note that there is no way to remove a map entry from the table. 
+ | NOTE: One application of the rule for prefix references is to + | supply default values out of a dictionary, which can then be + | overridden by the entries in the map supplied as the rump + | value. Note that this pattern provides no way to remove a map + | entry from the prefix table entry. * For a rump of one of the string types, the affix also needs to be one of the string types; the bytes of the strings are concatenated as specified (prefix + rump, rump + suffix). The result of the concatenation gets the type of the rump; this way a single affix can be used to build both byte and text strings, depending on what type of rump is being used. As a contrived (but short) example, if the prefix table is ["foobar", "foob", "fo"], the following prefix references will all unpack to @@ -301,25 +321,25 @@ references are one quarter of those, except that there is no single- byte reference and 8 two-byte references. | Rationale: Experience suggests that prefix packing might be | more likely than suffix packing. Also for this reason, there | is no intent to spend a 1+0 tag value for suffix packing. 2.4. Discussion This specification uses up a large number of Simple Values and Tags, - in particular one of the rare one-byte tags and half of the one-byte - simple values. Since the objective is compression, this is warranted - if and only if there is consensus that this specific format could be - useful for a wide area of applications, while maintaining reasonable - simplicity in particular at the side of the consumer. + in particular one of the rare one-byte tags and two thirds of the + one-byte simple values. Since the objective is compression, this is + warranted only based on a consensus that this specific format could + be useful for a wide area of applications, while maintaining + reasonable simplicity in particular at the side of the consumer. A maliciously crafted Packed CBOR data item might contain a reference loop. A consumer/decompressor MUST protect against that. 
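One possible way for a consumer to protect against reference loops is to bound the number of indirections followed in a single reference chase. The following Python sketch assumes a hypothetical in-memory model in which a shared item reference is a `SharedRef` object and the shared item table is a list; the limit of 40 is an arbitrary illustrative value, not one mandated by this specification.

```python
from dataclasses import dataclass

MAX_INDIRECTIONS = 40  # assumption: arbitrary bound chosen for illustration


@dataclass(frozen=True)
class SharedRef:
    """Hypothetical stand-in for a decoded shared item reference."""
    index: int


class ReferenceLoopError(ValueError):
    """Raised when a reference chase exceeds the configured limit."""


def chase(item, shared_table, limit=MAX_INDIRECTIONS):
    """Follow shared item references until a non-reference item is reached."""
    hops = 0
    while isinstance(item, SharedRef):
        if hops >= limit:
            raise ReferenceLoopError("too many indirections, possible loop")
        item = shared_table[item.index]
        hops += 1
    return item


if __name__ == "__main__":
    table = ["payload", SharedRef(0), SharedRef(2)]  # entry 2 refers to itself
    print(chase(SharedRef(1), table))
    try:
        chase(SharedRef(2), table)
    except ReferenceLoopError as err:
        print("rejected:", err)
```

A counter is cheaper than tracking the set of visited indices, at the cost of also rejecting very long but finite chains.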
| Different strategies for decoding/consuming Packed CBOR are | available. | For example: | | * the decoder can decode and unpack the packed item, | presenting an unpacked data item to the application. In @@ -337,25 +357,26 @@ | CBOR. | | * hybrid models are possible, for instance: The decoder | builds a data item tree directly from the Packed CBOR as | if it were oblivious, but also provides accessors that | hide (resolve) the packing. In this specific case, the | onus of dealing with loops is on the accessors. | | In general, loop detection can be handled in a similar way in | which loops of symbolic links are handled in a file system: A - | system wide limit (often 31 or 40 indirections for symbolic + | system-wide limit (often 31 or 40 indirections for symbolic | links) is applied to any reference chase. - | ISSUE: The present specification does nothing to help with the - | packing of CBOR sequences [RFC8742]; maybe it should. + | NOTE: The present specification does nothing to help with the + | packing of CBOR sequences [RFC8742]; maybe such a specification + | should be added. 3. Table Setup The packing references described in Section 2 assume that packing tables have been set up. By default, all three tables are empty (zero-length arrays). Table setup can happen in one of two ways: @@ -409,21 +430,21 @@ content of the tag 51 are prepended to the tables for shared items, prefixes, and suffixes that apply to the entire tag (by default empty tables). The original CBOR data item can be reconstructed by recursively replacing shared, prefix, and suffix references encountered in the rump by their expansions. 
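The expansion of a single prefix or suffix reference, as described in Section 2.3, might be sketched as follows. This is a simplified illustration under stated assumptions: `apply_affix` is a hypothetical name, CBOR arrays, maps, text strings, and byte strings are modeled as Python `list`, `dict`, `str`, and `bytes`, both affix and rump are taken to be already unpacked, and an array rump is assumed to require an array affix (likewise for maps).

```python
def _as_bytes(s):
    """Bytes of a text string (UTF-8) or byte string."""
    return s.encode("utf-8") if isinstance(s, str) else bytes(s)


def apply_affix(affix, rump, is_prefix):
    """Combine an unpacked affix with an unpacked rump."""
    if isinstance(rump, (str, bytes)):
        if not isinstance(affix, (str, bytes)):
            raise TypeError("string rump needs a string affix")
        a, r = _as_bytes(affix), _as_bytes(rump)
        joined = a + r if is_prefix else r + a
        # The result gets the type of the rump.
        return joined.decode("utf-8") if isinstance(rump, str) else joined
    if isinstance(rump, list):
        if not isinstance(affix, list):
            raise TypeError("array rump needs an array affix (assumption)")
        return affix + rump if is_prefix else rump + affix
    if isinstance(rump, dict):
        if not isinstance(affix, dict):
            raise TypeError("map rump needs a map affix (assumption)")
        # Prefix: rump entries override affix entries with the same key.
        # Suffix: affix entries override rump entries with the same key.
        return {**affix, **rump} if is_prefix else {**rump, **affix}
    raise TypeError("rump must be a string, array, or map")


if __name__ == "__main__":
    print(apply_affix("foob", "ar", is_prefix=True))
    print(apply_affix({"unit": "m", "scale": 1}, {"scale": 10}, is_prefix=True))
```

Decoding the joined bytes raises an error if the combination is not valid UTF-8; whether a consumer should reject such input or handle it differently is outside the scope of this sketch.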
Packed item references in the newly constructed (low-numbered) parts of the table need to be interpreted in the number space of that table - (which includes the, now higher-numbered inherited parts), while + (which includes the, now higher-numbered, inherited parts), while references in any existing, inherited (higher-numbered) part continue to use the (more limited) number space of the inherited table. 4. IANA Considerations In the registry "CBOR Tags" [IANA.cbor-tags], IANA is requested to allocate the tags defined in Table 4. +=======================+========+=========+========================+ | Tag |Data |Semantics| Reference | @@ -535,33 +556,33 @@ JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, June 2019. [STD94] Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, December 2020. 6.2. Informative References + [RFC1951] Deutsch, P., "DEFLATE Compressed Data Format Specification + version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996. + [RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keranen, A., and P. Hallam-Baker, "Naming Things with Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013. [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, October 2013. - [RFC7322] Flanagan, H. and S. Ginoza, "RFC Style Guide", RFC 7322, - DOI 10.17487/RFC7322, September 2014. - [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020. [STD63] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2003. Appendix A. Examples