draft-ietf-cbor-packed-04.txt   draft-ietf-cbor-packed-05.txt 
Network Working Group C. Bormann Network Working Group C. Bormann
Internet-Draft Universität Bremen TZI Internet-Draft Universität Bremen TZI
Intended status: Informational 13 February 2022 Intended status: Informational 20 April 2022
Expires: 17 August 2022 Expires: 22 October 2022
Packed CBOR Packed CBOR
draft-ietf-cbor-packed-04 draft-ietf-cbor-packed-05
Abstract Abstract
The Concise Binary Object Representation (CBOR, RFC 8949) is a data The Concise Binary Object Representation (CBOR, RFC 8949 == STD 94)
format whose design goals include the possibility of extremely small is a data format whose design goals include the possibility of
code size, fairly small message size, and extensibility without the extremely small code size, fairly small message size, and
need for version negotiation. extensibility without the need for version negotiation.
CBOR does not provide any forms of data compression. CBOR data CBOR does not provide any forms of data compression. CBOR data
items, in particular when generated from legacy data models often items, in particular when generated from legacy data models, often
allow considerable gains in compactness when applying data allow considerable gains in compactness when applying data
compression. While traditional data compression techniques such as compression. While traditional data compression techniques such as
DEFLATE (RFC 1951) can work well for CBOR encoded data items, their DEFLATE (RFC 1951) can work well for CBOR encoded data items, their
disadvantage is that the receiver needs to unpack the compressed form disadvantage is that the receiver needs to uncompress the compressed
to make use of data. form to make use of the data.
This specification describes Packed CBOR, a simple transformation of This specification describes Packed CBOR, a simple transformation of
a CBOR data item into another CBOR data item that is almost as easy a CBOR data item into another CBOR data item that is almost as easy
to consume as the original CBOR data item. A separate decompression to consume as the original CBOR data item. A separate decompression
step is therefore often not required at the receiver. step is therefore often not required at the receiver.
Note to Readers Note to Readers
This is a working-group draft of the CBOR working group of the IETF, This is a working-group draft of the CBOR working group of the IETF,
https://datatracker.ietf.org/wg/cbor/about/ https://datatracker.ietf.org/wg/cbor/about/
skipping to change at page 2, line 10 skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 17 August 2022. This Internet-Draft will expire on 22 October 2022.
Copyright Notice Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 32 skipping to change at page 2, line 32
extracted from this document must include Revised BSD License text as extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License. provided without warranty as described in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
2. Packed CBOR . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Packed CBOR . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Packing Tables . . . . . . . . . . . . . . . . . . . . . 4 2.1. Packing Tables . . . . . . . . . . . . . . . . . . . . . 4
2.2. Referencing Shared Items . . . . . . . . . . . . . . . . 4 2.2. Referencing Shared Items . . . . . . . . . . . . . . . . 5
2.3. Referencing Affix Items . . . . . . . . . . . . . . . . . 5 2.3. Referencing Affix Items . . . . . . . . . . . . . . . . . 6
2.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . 7 2.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . 7
3. Table Setup . . . . . . . . . . . . . . . . . . . . . . . . . 8 3. Table Setup . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Basic Packed CBOR . . . . . . . . . . . . . . . . . . . . 9 3.1. Basic Packed CBOR . . . . . . . . . . . . . . . . . . . . 9
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 5. Security Considerations . . . . . . . . . . . . . . . . . . . 11
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 12
6.1. Normative References . . . . . . . . . . . . . . . . . . 11 6.1. Normative References . . . . . . . . . . . . . . . . . . 12
6.2. Informative References . . . . . . . . . . . . . . . . . 12 6.2. Informative References . . . . . . . . . . . . . . . . . 12
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 13 Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 13
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 17 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 17
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 18 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 18
1. Introduction 1. Introduction
(TO DO, expand on text from abstract here; move references here and The Concise Binary Object Representation (CBOR, [STD94]) is a data
neuter them in the abstract as per Section 4.3 of [RFC7322].) format whose design goals include the possibility of extremely small
The specification defines a transformation from a Packed CBOR data code size, fairly small message size, and extensibility without the
item to the original CBOR data item; it does not define an algorithm need for version negotiation.
for an actual packer. Different packers can differ in the amount of
effort they invest in arriving at a minimal packed form.
Packed CBOR can employ two kinds of optimization: CBOR does not provide any forms of data compression. CBOR data
items, in particular when generated from legacy data models, often
allow considerable gains in compactness when applying data
compression. While traditional data compression techniques such as
DEFLATE [RFC1951] can work well for CBOR encoded data items, their
disadvantage is that the receiver needs to uncompress the compressed
form to make use of the data.
This specification describes Packed CBOR, a simple transformation of
a CBOR data item into another CBOR data item that is almost as easy
to consume as the original CBOR data item. A separate decompression
step is therefore often not required at the receiver.
This document defines the Packed CBOR format by specifying the
transformation from a Packed CBOR data item to the original CBOR data
item; it does not define an algorithm for a packer. Different
packers can differ in the amount of effort they invest in arriving at
a minimal packed form; often, they simply employ the sharing that is
natural for a specific application.
Packed CBOR can make use of two kinds of optimization:
* item sharing: substructures (data items) that occur repeatedly in * item sharing: substructures (data items) that occur repeatedly in
the original CBOR data item can be collapsed to a simple reference the original CBOR data item can be collapsed to a simple reference
to a common representation of that data item. The processing to a common representation of that data item. The processing
required during consumption is limited to following that required during consumption is limited to following that
reference. reference.
* affix sharing: data items (strings, containers) that share a * affix sharing: data items (strings, containers) that share a
prefix or suffix (affix) can be replaced by a reference to a prefix or suffix (affix) can be replaced by a reference to a
common affix plus the rest of the data item. For strings, the common affix plus the rest of the data item. For strings, the
skipping to change at page 5, line 26 skipping to change at page 5, line 38
+---------------------------+--------------+ +---------------------------+--------------+
| Tag 6(negative integer N) | 16 - 2*N - 1 | | Tag 6(negative integer N) | 16 - 2*N - 1 |
+---------------------------+--------------+ +---------------------------+--------------+
Table 1: Referencing Shared Values Table 1: Referencing Shared Values
As examples in CBOR diagnostic notation (Section 8 of [STD94]), the As examples in CBOR diagnostic notation (Section 8 of [STD94]), the
first 22 elements of the shared item table are referenced by first 22 elements of the shared item table are referenced by
simple(0), simple(1), ... simple(15), 6(0), 6(-1), 6(1), 6(-2), 6(2), simple(0), simple(1), ... simple(15), 6(0), 6(-1), 6(1), 6(-2), 6(2),
6(-3). (The alternation between unsigned and negative integers for 6(-3). (The alternation between unsigned and negative integers for
even/odd table index values makes systematic use of shorter integer even/odd table index values -- "zigzag encoding" -- makes systematic
encodings first.) use of shorter integer encodings first.)
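As a rough illustration of the index mapping described above, the following Python sketch computes the shared item table index for a reference. The function name shared_index and the ("simple", n) / ("tag6", n) tuples are hypothetical stand-ins for decoded CBOR items and are not part of the draft. The formula for unsigned tag content (16 + 2*N) is inferred from the example sequence in the text; the negative case follows Table 1.

```python
def shared_index(ref):
    """Return the shared item table index for a reference.

    ref is a hypothetical (kind, n) pair:
      ("simple", n) stands for simple(n) with 0 <= n <= 15
      ("tag6", n)   stands for 6(n) with integer tag content n
    """
    kind, n = ref
    if kind == "simple":
        if not 0 <= n <= 15:
            raise ValueError("only simple(0)..simple(15) are references")
        return n
    if kind == "tag6":
        if n >= 0:
            # Unsigned tag content: assumed from the examples 6(0), 6(1), 6(2).
            return 16 + 2 * n
        # Negative tag content, as given in Table 1.
        return 16 - 2 * n - 1
    raise ValueError("not a shared item reference")


# The six Tag 6 references listed in the text, in table order.
refs = [("tag6", 0), ("tag6", -1), ("tag6", 1),
        ("tag6", -2), ("tag6", 2), ("tag6", -3)]
indices = [shared_index(r) for r in refs]
```

Under these assumptions, the references 6(0) through 6(-3) land on consecutive indices directly after the 16 simple values, which is the alternation the text describes.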
Taking into account the encoding of these referring data items, there Taking into account the encoding of these referring data items, there
are 16 one-byte references, 48 two-byte references, 512 three-byte are 16 one-byte references, 48 two-byte references, 512 three-byte
references, 131072 four-byte references, etc. As CBOR integers can references, 131072 four-byte references, etc. As CBOR integers can
grow to very large (or negative) values, there is no practical limit grow to very large (or very negative) values, there is no practical
to how many shared items might be used in a Packed CBOR item. limit to how many shared items might be used in a Packed CBOR item.
Note that the semantics of Tag 6 depend on its content: An integer Note that the semantics of Tag 6 depend on its tag content: An
turns the tag into a shared item reference, a string or container integer turns the tag into a shared item reference, whereas a string
(map or array) into a prefix reference (see Table 2). or container (map or array) turns it into a prefix reference (see
Table 2). Note also that the tag content of Tag 6 may itself be
packed, so it may need to be unpacked to make this determination.
2.3. Referencing Affix Items 2.3. Referencing Affix Items
Prefix items are stored in the prefix table of the Current Set; Prefix items are stored in the prefix table of the Current Set;
suffix items are stored in the suffix table of the Current Set. We suffix items are stored in the suffix table of the Current Set. We
collectively call these items affix items; when referencing, which of collectively call these items affix items; when referencing, which of
the tables is actually used depends on whether a prefix or a suffix the tables is actually used depends on whether a prefix or a suffix
reference was used. reference was used.
+===================================+================+ +===================================+================+
skipping to change at page 6, line 32 skipping to change at page 6, line 40
| Tag 216-223(suffix) | 0-7 | | Tag 216-223(suffix) | 0-7 |
+-----------------------------------+---------------+ +-----------------------------------+---------------+
| Tag 27647-28671(suffix) | 8-1023 | | Tag 27647-28671(suffix) | 8-1023 |
+-----------------------------------+---------------+ +-----------------------------------+---------------+
| Tag 1811940352-1879048191(suffix) | 1024-67108863 | | Tag 1811940352-1879048191(suffix) | 1024-67108863 |
+-----------------------------------+---------------+ +-----------------------------------+---------------+
Table 3: Referencing Suffix Values Table 3: Referencing Suffix Values
Affix data items are referenced by using the data items in Table 2 Affix data items are referenced by using the data items in Table 2
and Table 3. Each of these implies the table used (prefix or and Table 3. The tag number indicates the table used (prefix or
suffix), a table index (an unsigned integer) and contains a "rump suffix) and a table index (an unsigned integer); the tag content
item". When reconstructing the original data item, such a reference contains a "rump item". When reconstructing the original data item,
is replaced by a data item constructed from the referenced affix data such a reference is replaced by a data item constructed from the
item (affix, which might need to be recursively unpacked first) referenced affix data item (affix, which might need to be recursively
"concatenated" with the tag content (rump, again possibly recursively unpacked first) "concatenated" with the tag content (rump, again
unpacked). possibly recursively unpacked).
* For a rump of type array and map, the affix also needs to be an * For a rump of type array and map, the affix also needs to be an
array or a map. For an array, the elements from the prefix are array or a map. For an array, the elements from the prefix are
prepended, and the elements from a suffix are appended to the rump prepended to the rump array, while the elements from a suffix are
array. For a map, the entries in the affix are added to those of appended. For a map, the entries in the affix are added to those
the rump; prefix and suffix references differ in how entries with of the rump; prefix and suffix references differ in how entries
identical keys are combined: for prefix references, an entry in with identical keys are combined: for prefix references, an entry
the rump with the same key as an entry in the affix overrides the in the rump with the same key as an entry in the affix overrides
one in the affix, while for suffix references, an entry in the the one in the affix, while for suffix references, an entry in the
affix overrides an entry in the rump that has the same key. affix overrides an entry in the rump that has the same key.
| NOTE: Not sure that we want to use the efficiencies of | NOTE: One application of the rule for prefix references is to
| overriding, but having default values supplied out of a | supply default values out of a dictionary, which can then be
| dictionary to be overridden by a rump sounds rather handy. | overridden by the entries in the map supplied as the rump
| Note that there is no way to remove a map entry from the table. | value. Note that this pattern provides no way to remove a map
| entry from the prefix table entry.
* For a rump of one of the string types, the affix also needs to be * For a rump of one of the string types, the affix also needs to be
one of the string types; the bytes of the strings are concatenated one of the string types; the bytes of the strings are concatenated
as specified (prefix + rump, rump + suffix). The result of the as specified (prefix + rump, rump + suffix). The result of the
concatenation gets the type of the rump; this way a single affix concatenation gets the type of the rump; this way a single affix
can be used to build both byte and text strings, depending on what can be used to build both byte and text strings, depending on what
type of rump is being used. type of rump is being used.
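The concatenation rules above can be sketched in Python as follows. This is a minimal illustration, not a decoder: apply_affix is a hypothetical helper, Python lists, dicts, str and bytes stand in for CBOR arrays, maps, text strings and byte strings, and both arguments are assumed to be already unpacked. The sketch also assumes that an array rump needs an array affix and a map rump needs a map affix, which is one reading of the first rule.

```python
def apply_affix(affix, rump, is_prefix):
    """Combine an unpacked affix with an unpacked rump.

    is_prefix selects prefix semantics (True) or suffix semantics (False).
    """
    if isinstance(rump, list):
        if not isinstance(affix, list):
            raise TypeError("array rump needs an array affix (assumed)")
        # Prefix elements are prepended, suffix elements are appended.
        return affix + rump if is_prefix else rump + affix
    if isinstance(rump, dict):
        if not isinstance(affix, dict):
            raise TypeError("map rump needs a map affix (assumed)")
        if is_prefix:
            # Prefix reference: entries in the rump override the affix.
            return {**affix, **rump}
        # Suffix reference: entries in the affix override the rump.
        return {**rump, **affix}
    if isinstance(rump, (str, bytes)):
        if not isinstance(affix, (str, bytes)):
            raise TypeError("string rump needs a string affix")
        # Concatenate on the byte level, then give the result the rump's type.
        a = affix.encode("utf-8") if isinstance(affix, str) else affix
        r = rump.encode("utf-8") if isinstance(rump, str) else rump
        joined = a + r if is_prefix else r + a
        return joined.decode("utf-8") if isinstance(rump, str) else joined
    raise TypeError("rump must be an array, a map, or a string")


defaults = {"unit": "C", "scale": 1}
with_prefix = apply_affix(defaults, {"scale": 10}, is_prefix=True)
with_suffix = apply_affix(defaults, {"scale": 10}, is_prefix=False)
```

Here with_prefix keeps the rump's "scale" entry, matching the default-values pattern, while with_suffix keeps the affix's entry. A text string affix combined with a byte string rump yields a byte string, since the result takes the rump's type.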
As a contrived (but short) example, if the prefix table is ["foobar", As a contrived (but short) example, if the prefix table is ["foobar",
"foob", "fo"], the following prefix references will all unpack to "foob", "fo"], the following prefix references will all unpack to
skipping to change at page 7, line 38 skipping to change at page 7, line 43
references are one quarter of those, except that there is no single- references are one quarter of those, except that there is no single-
byte reference and 8 two-byte references. byte reference and 8 two-byte references.
| Rationale: Experience suggests that prefix packing might be | Rationale: Experience suggests that prefix packing might be
| more likely than suffix packing. Also for this reason, there | more likely than suffix packing. Also for this reason, there
| is no intent to spend a 1+0 tag value for suffix packing. | is no intent to spend a 1+0 tag value for suffix packing.
2.4. Discussion 2.4. Discussion
This specification uses up a large number of Simple Values and Tags, This specification uses up a large number of Simple Values and Tags,
in particular one of the rare one-byte tags and half of the one-byte in particular one of the rare one-byte tags and two thirds of the
simple values. Since the objective is compression, this is warranted one-byte simple values. Since the objective is compression, this is
if and only if there is consensus that this specific format could be warranted only based on a consensus that this specific format could
useful for a wide area of applications, while maintaining reasonable be useful for a wide area of applications, while maintaining
simplicity in particular at the side of the consumer. reasonable simplicity in particular at the side of the consumer.
A maliciously crafted Packed CBOR data item might contain a reference A maliciously crafted Packed CBOR data item might contain a reference
loop. A consumer/decompressor MUST protect against that. loop. A consumer/decompressor MUST protect against that.
| Different strategies for decoding/consuming Packed CBOR are | Different strategies for decoding/consuming Packed CBOR are
| available. | available.
| For example: | For example:
| |
| * the decoder can decode and unpack the packed item, | * the decoder can decode and unpack the packed item,
| presenting an unpacked data item to the application. In | presenting an unpacked data item to the application. In
skipping to change at page 8, line 27 skipping to change at page 8, line 32
| CBOR. | CBOR.
| |
| * hybrid models are possible, for instance: The decoder | * hybrid models are possible, for instance: The decoder
| builds a data item tree directly from the Packed CBOR as | builds a data item tree directly from the Packed CBOR as
| if it were oblivious, but also provides accessors that | if it were oblivious, but also provides accessors that
| hide (resolve) the packing. In this specific case, the | hide (resolve) the packing. In this specific case, the
| onus of dealing with loops is on the accessors. | onus of dealing with loops is on the accessors.
| |
| In general, loop detection can be handled in a similar way in | In general, loop detection can be handled in a similar way in
| which loops of symbolic links are handled in a file system: A | which loops of symbolic links are handled in a file system: A
| system wide limit (often 31 or 40 indirections for symbolic | system-wide limit (often 31 or 40 indirections for symbolic
| links) is applied to any reference chase. | links) is applied to any reference chase.
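The indirection limit mentioned above might look like the following Python sketch. Everything here is an assumption made for illustration: the Ref class is a hypothetical stand-in for a decoded shared item reference, resolve is a hypothetical accessor, and the limit of 40 is simply borrowed from the symbolic-link analogy in the text rather than mandated by the draft.

```python
MAX_INDIRECTIONS = 40  # assumed limit, borrowed from the symlink analogy


class Ref:
    """Hypothetical stand-in for a shared item reference."""

    def __init__(self, index):
        self.index = index


def resolve(item, shared, limit=MAX_INDIRECTIONS):
    """Follow shared item references until a non-reference item is found.

    Raises ValueError once more than `limit` indirections were needed,
    which also catches reference loops.
    """
    hops = 0
    while isinstance(item, Ref):
        if hops >= limit:
            raise ValueError("too many indirections (possible loop)")
        item = shared[item.index]
        hops += 1
    return item


# A short chain resolves normally; a two-entry loop is rejected.
chain = ["value", Ref(0), Ref(1)]
resolved = resolve(Ref(2), chain)

loop = [Ref(1), Ref(0)]
try:
    resolve(Ref(0), loop)
    loop_detected = False
except ValueError:
    loop_detected = True
```

A counter like this does not distinguish a genuine loop from a very long but finite chain; it trades that precision for constant memory, much as file systems do.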
| ISSUE: The present specification does nothing to help with the | NOTE: The present specification does nothing to help with the
| packing of CBOR sequences [RFC8742]; maybe it should. | packing of CBOR sequences [RFC8742]; maybe such a specification
| should be added.
3. Table Setup 3. Table Setup
The packing references described in Section 2 assume that packing The packing references described in Section 2 assume that packing
tables have been set up. tables have been set up.
By default, all three tables are empty (zero-length arrays). By default, all three tables are empty (zero-length arrays).
Table setup can happen in one of two ways: Table setup can happen in one of two ways:
skipping to change at page 10, line 7 skipping to change at page 10, line 11
content of the tag 51 are prepended to the tables for shared items, content of the tag 51 are prepended to the tables for shared items,
prefixes, and suffixes that apply to the entire tag (by default empty prefixes, and suffixes that apply to the entire tag (by default empty
tables). tables).
The original CBOR data item can be reconstructed by recursively The original CBOR data item can be reconstructed by recursively
replacing shared, prefix, and suffix references encountered in the replacing shared, prefix, and suffix references encountered in the
rump by their expansions. rump by their expansions.
Packed item references in the newly constructed (low-numbered) parts Packed item references in the newly constructed (low-numbered) parts
of the table need to be interpreted in the number space of that table of the table need to be interpreted in the number space of that table
(which includes the, now higher-numbered inherited parts), while (which includes the, now higher-numbered, inherited parts), while
references in any existing, inherited (higher-numbered) part continue references in any existing, inherited (higher-numbered) part continue
to use the (more limited) number space of the inherited table. to use the (more limited) number space of the inherited table.
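One possible way to keep track of the two number spaces described above is sketched below in Python. The representation is an assumption of this sketch and is not defined by the draft: each table entry is stored as an (item, base) pair, where base is the offset to add to any reference index found inside that item so that it addresses the combined table. The helper name prepend_table is hypothetical.

```python
def prepend_table(new_items, inherited):
    """Prepend new_items to an inherited table of (item, base) pairs.

    New entries get base 0: references inside them already use the
    number space of the combined table. Inherited entries were written
    against the older, shorter table, so their base grows by the number
    of entries that were prepended.
    """
    shift = len(new_items)
    combined = [(item, 0) for item in new_items]
    combined += [(item, base + shift) for item, base in inherited]
    return combined


# Outer table with two entries, then an inner table that adds one entry.
outer = prepend_table(["x", "y"], [])
inner = prepend_table(["a"], outer)
```

In this sketch, a reference with index 0 found inside the inherited entry "x" would be looked up at index 0 + 1 of the inner table, which is "x" itself in the outer numbering, while the same index inside the new entry "a" refers to position 0 of the inner table.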
4. IANA Considerations 4. IANA Considerations
In the registry "CBOR Tags" [IANA.cbor-tags], IANA is requested to In the registry "CBOR Tags" [IANA.cbor-tags], IANA is requested to
allocate the tags defined in Table 4. allocate the tags defined in Table 4.
+=======================+========+=========+========================+ +=======================+========+=========+========================+
| Tag |Data |Semantics| Reference | | Tag |Data |Semantics| Reference |
skipping to change at page 12, line 36 skipping to change at page 12, line 40
JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
June 2019, <https://www.rfc-editor.org/info/rfc8610>. June 2019, <https://www.rfc-editor.org/info/rfc8610>.
[STD94] Bormann, C. and P. Hoffman, "Concise Binary Object [STD94] Bormann, C. and P. Hoffman, "Concise Binary Object
Representation (CBOR)", STD 94, RFC 8949, Representation (CBOR)", STD 94, RFC 8949,
DOI 10.17487/RFC8949, December 2020, DOI 10.17487/RFC8949, December 2020,
<https://www.rfc-editor.org/info/rfc8949>. <https://www.rfc-editor.org/info/rfc8949>.
6.2. Informative References 6.2. Informative References
[RFC1951] Deutsch, P., "DEFLATE Compressed Data Format Specification
version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996,
<https://www.rfc-editor.org/info/rfc1951>.
[RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., [RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
Keranen, A., and P. Hallam-Baker, "Naming Things with Keranen, A., and P. Hallam-Baker, "Naming Things with
Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013, Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013,
<https://www.rfc-editor.org/info/rfc6920>. <https://www.rfc-editor.org/info/rfc6920>.
[RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object
Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
October 2013, <https://www.rfc-editor.org/info/rfc7049>. October 2013, <https://www.rfc-editor.org/info/rfc7049>.
[RFC7322] Flanagan, H. and S. Ginoza, "RFC Style Guide", RFC 7322,
DOI 10.17487/RFC7322, September 2014,
<https://www.rfc-editor.org/info/rfc7322>.
[RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR)
Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020,
<https://www.rfc-editor.org/info/rfc8742>. <https://www.rfc-editor.org/info/rfc8742>.
[STD63] Yergeau, F., "UTF-8, a transformation format of ISO [STD63] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2003, <https://www.rfc-editor.org/info/rfc3629>. 2003, <https://www.rfc-editor.org/info/rfc3629>.
Appendix A. Examples Appendix A. Examples
 End of changes. 22 change blocks. 
59 lines changed or deleted 81 lines changed or added
