Extensible Dynamic Binary XML,
Client/Server Binary XML Format
(XDBX)
Version 1.0
(July 14, 2010)

Permission to copy and display the Extensible Dynamic Binary XML, Client/Server Binary XML Format (XDBX) (the "Specification"), in any medium without fee or royalty is hereby granted by IBM (collectively, the "Authors"), provided that you include the following on ALL copies of the Specification, or portions thereof, that you make:

1. A link or URL to the Specification at one of the Authors websites.
2. The copyright notice as shown in the Specification.

The Authors each agree to grant you a royalty-free license, under reasonable, non-discriminatory terms and conditions to their respective patents that they deem necessary to implement the Specification.

THE SPECIFICATION IS PROVIDED "AS IS," AND THE AUTHORS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, OR TITLE; THAT THE CONTENTS OF THE SPECIFICATION ARE SUITABLE FOR ANY PURPOSE; NOR THAT THE IMPLEMENTATION OF SUCH CONTENTS WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

THE AUTHORS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATING TO ANY USE OR DISTRIBUTION OF THE SPECIFICATION.

The name and trademarks of the Authors may NOT be used in any manner, including advertising or publicity pertaining to the Specification or its contents without specific, written prior permission. Title to copyright in the Specification will at all times remain with the Authors.

No other rights are granted by implication, estoppel or otherwise.

© Copyright IBM Corporation 2010.

Abstract

The solution presented in this document allows an encoder to produce the binary XML format using one or more of a set of attributes. The encoder can choose which attributes to include based on knowledge of the receiver. A receiver that reads the binary XML format can inspect the format header to determine the attributes with which the data was encoded. This information can be purely informational, or it can give the receiver the opportunity to optimize its configuration to process the attributes contained in the format more efficiently.

Table of Contents

1 Motivation
2 Encoding overview
3 Format Header
3.1 Layout of the format header
3.2 XDBX Major Version
3.3 Encoding Flags
3.3.1 Document Type
3.3.2 StringID Flags
3.3.3 Valid Flag
3.4 Example of a Format header
4 Format Content
4.1 Conventions
4.1.1 How Values and Lengths are Encoded
4.2 Encoding of Single Documents and Sequences
4.3 Encoding of XML Declarations
4.4 Encoding of Elements
4.5 Encoding of Attributes
4.6 Encoding of Namespace Mappings
4.7 Encoding of Text
4.8 Encoding of Comments
4.9 Encoding of Processing Instructions
4.10 Encoding of Other Information
4.11 Reserved Values for Tags
5 Format details
5.1 Encoding Single Documents and Sequences
5.2 StringIDs
5.2.1 Examples of StringID Usage
5.3 StringID Notes
5.4 Text Notes
5.4.1 White Space
5.5 XML Declaration Tag Notes
5.6 DTD and DOCTYPE
5.7 Namespace Notes
5.8 Hint Tag Notes
5.9 Empty Sequence
5.10 Escaping of Characters
5.11 Private Extensions
5.12 Reserved Tags
6 Examples
6.1 Example 1 – Default encoding
6.2 Example 2 – Sequence
6.3 Example 3 – StringIDs
6.4 Example 4 – Namespaces with StringIDs
6.5 Example 5 – Mixed Content
6.6 Example 6 – White Space
Appendix A Complete XDBX BNF

1 Motivation

Binary serialization of XML is desirable because it allows XML data to be encoded in a smaller and more efficient form than the textual XML format. The binary XML format is more efficient for several reasons:

• Multiple occurrences of repeated text are condensed through the use of StringIDs. StringIDs are integer identifiers that replace text strings.
• When a parser processes data in a pretokenized format, the parser does not need to search for as many token delimiters in the content, or handle as many edge cases.
• All values are prefixed with their length. When the parser has length information, it does not need to search for the ends of element names or values.
• All entity references are expanded in the binary XML format, so the XML parser does not need to expand entity references.

The binary XML format has the following disadvantages:

• Loss of XML interoperability. Data that is in a proprietary format can be used only on systems that have the software to decode it.
• The encoder must do extra processing to:
  o Perform validation
  o Perform well-formedness checking
  o Resolve all entity references
  o Identify repeated tags for replacement with StringIDs

This binary XML format is not intended as a replacement for XML. It can provide better performance than XML when it is used in the implementation of some APIs.

In general, the benefits of the binary XML format outweigh the disadvantages. The additional processing time that the encoder requires is usually less than the processing time that is used for parsing an XML document, especially when the XML document must be parsed more than once.

2 Encoding overview

This binary XML representation contains a format header followed by a number of tags. The format header has encoding attributes that give the receiver some useful properties of the binary XML.

The following characteristics of the binary encoding are constant, regardless of the source document or how the binary encoding is performed:

• All text is encoded as UTF-8.
• All entity references in the source document are replaced by their values.
• Line breaks are normalized.
• Attributes are normalized.
• Where applicable, data is encoded in big-endian format.

The binary XML format is made up of various tokens (tags) and values. When the binary XML format is viewed with a standard text editor or as ASCII in a debugger, the tags display as single ASCII characters. This makes the binary XML format more human-readable, which can aid in debugging.

3 Format Header

3.1 Layout of the format header

The binary XML format contains a header with information about how the format was constructed. The header information allows the parser to configure itself in order to process the message most efficiently.

To identify the format and its attributes, the following scheme is used for the first set of bytes of the document:

(2 bytes) – Binary XML document identifier (“magic number”)
(1 byte) – Header length (not including the magic number or the length byte itself)
(1 byte) – XDBX major version
(4-byte integer) – Encoding flags

The “magic number” will always be this value in binary: 11001010 00111011

DocumentContent follows the Header. HeaderLength determines the length of the Header.

BNF

XDBX ::= Header DocumentContent
Header ::= DocIdentifier HeaderLength MajorVersion
           EncodingFlags HeaderFill
DocIdentifier ::= #xCA #x3B /* In binary: 11001010 00111011 */
HeaderLength ::= #x5
MajorVersion ::= #x1
EncodingFlags ::= FourBytes
HeaderFill ::= Byte*
FourBytes ::= Byte Byte Byte Byte
Byte ::= [#x0-#xFF]

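The header layout above can be read with a few lines of code. The following Python sketch is illustrative only: the function name parse_header and the keys of the returned dictionary are hypothetical, not part of this specification. It checks the magic number, reads HeaderLength, and skips any HeaderFill bytes so that DocumentContent starts at offset 3 + HeaderLength.

```python
import struct

DOC_IDENTIFIER = b"\xCA\x3B"  # 11001010 00111011


def parse_header(buf: bytes) -> dict:
    """Hypothetical helper: parse an XDBX format header."""
    if len(buf) < 3 or buf[:2] != DOC_IDENTIFIER:
        raise ValueError("not an XDBX stream (bad magic number)")
    header_length = buf[2]  # excludes the magic number and this length byte
    if header_length < 5:
        raise ValueError("HeaderLength must be at least 5 in version 1")
    content_offset = 3 + header_length
    if len(buf) < content_offset:
        raise ValueError("truncated header")
    major_version = buf[3]
    # Encoding flags: signed four-byte big-endian integer
    (flags,) = struct.unpack(">i", buf[4:8])
    # Bytes from offset 8 up to content_offset are HeaderFill and are skipped
    return {"major_version": major_version, "flags": flags,
            "content_offset": content_offset}
```

For the example header in section 3.4 followed by a 'Z' tag, the content starts at offset 8.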
3.2 XDBX Major Version

There is just one major version of XDBX, identified by the XDBX major version value of 0x01 (version 1). In this version, the HeaderLength must be at least 5.

XDBX version 1 streams can contain any of the following tags: 'e', 'X', 'x', 'z', 'a', 'Y', 'y', 'b', 'm', 'T', 'U', 'C', 'W', 'V', 'L', 'D', 't', 'I', 'Z', '@', 'd', 'P', 'c', 'H'.

The set of tags that an XDBX encoder generates is implementation defined. However, an XDBX encoder must assign a valid XDBX major version number to each generated stream, and ensure that each stream contains only tags that are allowed for that XDBX major version.

An XDBX decoder is required to fully support the tag set assigned to an implementation-defined XDBX major version level. It must be able to decode all valid tags from the corresponding tag set. However, XDBX decoders can reject XDBX streams that are identified by an XDBX major version that is higher than the version that the decoder supports.

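A decoder might apply the version rule and the tag list above as follows. This is a sketch only: the names are hypothetical, and the tag set is copied literally from the list above, which does not mention 'F' (DOCTYPE, section 4.2) or the application-reserved values 201 through 250 (section 4.11), so a real decoder would need to decide how to treat those.

```python
# Tag bytes listed for XDBX major version 1 (section 3.2), stored as integers
V1_TAGS = frozenset(b"eXxzaYybmTUCWVLDtIZ@dPcH")
SUPPORTED_MAJOR_VERSIONS = frozenset({1})


def check_major_version(version: int) -> None:
    """Reject streams whose major version this decoder does not support."""
    if version not in SUPPORTED_MAJOR_VERSIONS:
        raise ValueError(f"unsupported XDBX major version {version}")


def is_v1_tag(tag: int) -> bool:
    """True if the byte is one of the version 1 tags listed above."""
    return tag in V1_TAGS
```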
3.3 Encoding Flags

The format for encoding flags allows for future expansion: encoding flags, or features, can be added as needed. The header contains indicators that signal to a processor how the format is encoded. Each encoding flag is a bit in a four-byte integer field in the header.

The following encoding flags can be used in the binary XML format. Each encoding flag is listed along with its value in the four-byte integer header field.

3.3.1 Document Type

This flag indicates whether the binary stream represents one complete well-formed XML document or a sequence of items, as defined by the XQuery 1.0 specification.

• XML document (Value: x00000000)
• XML sequence (Value: x00000001)

3.3.2 StringID Flags

The flags that are associated with stringIDs are:

• StringID flag
• Dense stringIDs used

3.3.2.1 StringID Flag (required)

This encoding flag (x00000002) must be set.

3.3.2.2 Dense StringIDs Used Flag

Certain implementations might require the stringIDs that are used in the binary XML to be small numbers so that they can be used as indexes in an array (as opposed to a hash table).

When specified (x00000020), this encoding flag notifies the receiver that the stringIDs are small numbers. In general, such stringIDs are assigned as monotonically increasing numbers. The stringID value 0 (zero) is reserved.

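When the Dense StringIDs Used flag is set, a receiver can keep its dictionary in a plain array indexed by stringID instead of a hash table. The class below is a hypothetical sketch of that idea: it grows the array on demand and leaves slot 0 empty because stringID 0 is reserved. A production receiver would probably also cap the array size, since the flag itself does not state an upper bound on the IDs.

```python
from typing import List, Optional

FLAG_DENSE_STRING_IDS = 0x00000020


class DenseStringTable:
    """Hypothetical array-backed stringID dictionary for dense stringIDs."""

    def __init__(self) -> None:
        self._strings: List[Optional[str]] = [None]  # slot 0: reserved stringID

    def define(self, sid: int, text: str) -> None:
        if sid <= 0:
            raise ValueError("stringID 0 is reserved")
        if sid >= len(self._strings):
            # Grow the array so that index sid exists
            self._strings.extend([None] * (sid + 1 - len(self._strings)))
        self._strings[sid] = text

    def lookup(self, sid: int) -> str:
        text = self._strings[sid] if 0 < sid < len(self._strings) else None
        if text is None:
            raise KeyError(f"stringID {sid} is not defined")
        return text


def uses_dense_ids(flags: int) -> bool:
    """True if the Dense StringIDs Used flag is set in the header flags."""
    return bool(flags & FLAG_DENSE_STRING_IDS)
```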
3.3.3 Valid Flag

When specified (x00000080), this encoding flag notifies the receiver that the XML document or sequence of items conforms to a schema. This may have been determined by the use of a validating XML parser, or by construction from objects that are associated with a schema.

The use of this information by the receiver is beyond the scope of this specification. A receiver may choose to ignore this information.

3.4 Example of a Format header

Binary XML Document Identifier: 11001010 00111011
Header Length: 00000101
XDBX major version: 00000001
Encoding flags:
• Document Type (Bit 1): XML Document
• StringID (Bit 2): On

Magic Num          Hdr Len   Version   Encoding flags
11001010 00111011  00000101  00000001  00000000 00000000 00000000 00000010

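The example header can be reproduced programmatically. In this Python sketch, the constant names, build_header, and as_bits are hypothetical; the flag values are the ones given in section 3.3, and the flags are packed as a signed four-byte big-endian integer with no HeaderFill.

```python
import struct

DOC_IDENTIFIER = b"\xCA\x3B"
FLAG_SEQUENCE = 0x00000001          # 3.3.1 (flag off = XML document)
FLAG_STRING_ID = 0x00000002         # 3.3.2.1, required
FLAG_DENSE_STRING_IDS = 0x00000020  # 3.3.2.2
FLAG_VALID = 0x00000080             # 3.3.3


def build_header(flags: int, major_version: int = 1) -> bytes:
    if not flags & FLAG_STRING_ID:
        raise ValueError("the StringID flag (x00000002) must be set")
    # HeaderLength 5 = 1 version byte + 4 flag bytes
    return DOC_IDENTIFIER + bytes([5, major_version]) + struct.pack(">i", flags)


def as_bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit binary groups."""
    return " ".join(f"{b:08b}" for b in data)
```

Calling as_bits(build_header(FLAG_STRING_ID)) yields the bit string shown in the example above.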
4 Format Content

The following combinations of information are used in binary XML document encoding:

• TLV - Tag-Length-Value
• TV - Tag-Value
• LV - Length-Value
• TLVid - Tag-Length-Value-StringID
• ID - StringID

Some content is denoted by a TLV, while other content uses the shorter LV. This is done for compactness, where a second tag is unnecessary because it can be inferred from the previous tag. The specification also uses TV when the length is known to be one. In addition, TLVid is used when StringIDs are used; it is how the first occurrence of a string value is assigned its ID. Finally, the ID form is used when only the stringID is needed.

4.1 Conventions

All lengths are expressed as a number of bytes.

A summary of each tag in the format and its meaning is contained in the tables that follow. The values in the Value column are the decimal values of the tags. The values in the ASCII column are the ASCII encoding of the tag values.

The following conventions are used:

• TLV(localname) - a TLV for the localname, where 'Value' is the text of the localname.
• TLV(localname)/LV(prefix)/LV(uri) - a TLV for the localname, followed by an LV for the namespace prefix, followed by an LV for the namespace URI.
• TLVid(localname) - a TLVid for the localname, where the stringID is the ID assigned to the text of the localname.
• Tid(localname)/id(prefix)/id(uri) - a Tag-StringID for the localname, followed by the stringID of the namespace prefix, followed by the stringID of the namespace URI. Each StringID references a string in the dictionary.

BNF

LengthValue ::= Length Value
Length ::= VariableInteger
Value ::= Byte* /* Number of bytes governed by preceding length */
StringID ::= VariableInteger
VariableInteger ::= (LongLeading | ShortLeading)? LastByte
LongLeading ::= [#x81-#x8F]
                [#x80-#xFF]? [#x80-#xFF]? [#x80-#xFF]?
ShortLeading ::= [#x90-#xFF] [#x80-#xFF]? [#x80-#xFF]?
LastByte ::= [#x0-#x7F]

4.1.1 How Values and Lengths are Encoded

The encoding flags in the format header are always encoded as a signed four-byte integer in big-endian format.

For space efficiency, all other values and lengths are encoded as a variable number of bytes, with the first byte containing the highest-order bits of the integer, the next byte containing the next highest-order bits, and so on. This allows the encoding to represent any arbitrary integer in as few bytes as possible. However, this specification limits the integer to a value representable in a signed 32-bit integer (at most 2^31 - 1, so a length can describe at most about 2 GB). Each byte contains seven bits of the integer's value, with the highest-order bit of each byte designated as a flag bit. A byte's flag bit is off if the byte is the last byte (lowest-order byte) of the variable-length byte sequence for a number, and on otherwise. Because only as many bytes as necessary to represent an integer are used, integers between 0 and 127 are represented in one byte with the flag bit off. Integers between 128 and 16,383 are represented in two bytes with the flag bit set in the first byte, and so on.

Examples:

• A length of binary 00000101 means 5.
• A length of binary 10000101 00100001 means 673 (binary 1010100001): the first byte contributes the seven bits 0000101 (5) and the second byte the seven bits 0100001 (33), so the value is 5 * 128 + 33 = 673.

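The variable-length integer scheme can be sketched as an encoder/decoder pair. This is an illustrative Python implementation under the rules stated above: seven value bits per byte, highest-order group first, the flag bit on in every byte except the last, no redundant leading #x80 byte (the LongLeading and ShortLeading productions start at #x81), and values limited to the signed 32-bit range. The function names are hypothetical.

```python
from typing import Tuple

MAX_VALUE = 0x7FFFFFFF  # largest value representable in a signed 32-bit integer


def encode_varint(n: int) -> bytes:
    if not 0 <= n <= MAX_VALUE:
        raise ValueError(f"{n} is outside 0..{MAX_VALUE}")
    groups = [n & 0x7F]  # last (lowest-order) byte: flag bit off
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))  # earlier bytes: flag bit on
        n >>= 7
    return bytes(reversed(groups))  # highest-order group first


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position just after the integer)."""
    if pos < len(buf) and buf[pos] == 0x80:
        raise ValueError("non-minimal encoding (leading byte #x80)")
    value = 0
    for _ in range(5):  # the BNF allows at most 5 bytes
        if pos >= len(buf):
            raise ValueError("truncated integer")
        b = buf[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:  # flag bit off: this was the last byte
            if value > MAX_VALUE:
                raise ValueError("integer exceeds the signed 32-bit range")
            return value, pos
    raise ValueError("integer longer than 5 bytes")
```

For example, encode_varint(673) returns the two bytes 10000101 00100001 from the example above, and decode_varint on those bytes returns (673, 2).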
4.2 Encoding of Single Documents and Sequences

A binary stream can represent one complete well-formed XML document or a sequence of items, as defined by the XQuery specification. This information is encoded in the format header with the following encoding flags:

• XML Document (Value: x00000000)
• XML Sequence (Value: x00000001)

Each item in the sequence can be a complete document, a subtree, or an atomic value.

BNF

DocumentContent ::= (XMLDocument | XMLSequence) DocumentEnd
                    /* Which branch to choose is controlled by EncodingFlags */
DocumentEnd ::= 'Z'
XMLDocument ::= (Anywhere XMLDecl)? Misc*
                (DocType | Misc*)? Element Misc*
XMLSequence ::= (SequenceItem
                (SequenceSeparator SequenceItem)*)?
SequenceItem ::= Anywhere
                 (CompleteDoc | Comment | PI
                 | AtomicValue | Element)
                 Anywhere
SequenceSeparator ::= '@'
CompleteDoc ::= 'd' XMLDocument
Anywhere ::= (SI | Hint | Reserved)*
Misc ::= Comment | PI | SI | Hint
DocType ::= 'F' StringID StringID StringID

Tags

Value  ASCII  Meaning
90     Z      End of the binary stream
64     @      Separator for items in an XML sequence
100    d      Document node (assumed for XML documents, not assumed in XML sequences)
70     F      DOCTYPE in Tid(rootElementName)/id(systemID)/id(publicID) form

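As an illustration of the XMLSequence production, the following hypothetical Python sketch encodes a sequence made only of atomic values. It sets the XML sequence flag (together with the required StringID flag), writes each item as a 'V' tag with a LengthValue, separates items with '@', and ends the stream with 'Z'. Lengths use the variable-length integer encoding of section 4.1.1.

```python
import struct
from typing import List

FLAG_SEQUENCE = 0x00000001
FLAG_STRING_ID = 0x00000002


def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def length_value(data: bytes) -> bytes:
    return encode_varint(len(data)) + data


def encode_atomic_sequence(values: List[str]) -> bytes:
    header = b"\xCA\x3B" + bytes([5, 1]) + struct.pack(">i", FLAG_SEQUENCE | FLAG_STRING_ID)
    items = [b"V" + length_value(v.encode("utf-8")) for v in values]
    # SequenceSeparator '@' between items, DocumentEnd 'Z' at the end
    return header + b"@".join(items) + b"Z"
```

An empty list produces just the header followed by 'Z', which the BNF allows because the sequence content is optional.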
4.3 Encoding of XML Declarations

BNF

XMLDecl ::= XMLVersion Encoding? Standalone?
XMLVersion ::= 'L' LengthValue
               /* The value is a valid XML version:
                  "1.0" or "1.1" for now */
Encoding ::= 'D' LengthValue
Standalone ::= 't' BooleanValue
BooleanValue ::= False | True
False ::= #x0
True ::= #x1

Tags

Value  ASCII  Meaning
76     L      XML version in TLV(version) form.
68     D      Encoding in TLV(encoding) form.
116    t      Standalone in TV(standalone) form, where the value of 'standalone' is either 0 or 1.

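A hypothetical encoder for the XMLDecl production might look like the following Python sketch. It emits the required 'L' tag and then the optional 'D' and 't' tags in the order the BNF gives them. The helper names are illustrative, and the check on the version string simply mirrors the BNF comment.

```python
from typing import Optional


def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def length_value(text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_varint(len(data)) + data


def encode_xml_decl(version: str, encoding: Optional[str] = None,
                    standalone: Optional[bool] = None) -> bytes:
    if version not in ("1.0", "1.1"):
        raise ValueError("XML version must be '1.0' or '1.1'")
    out = b"L" + length_value(version)  # XMLVersion (required)
    if encoding is not None:
        out += b"D" + length_value(encoding)  # Encoding (optional)
    if standalone is not None:
        out += b"t" + bytes([1 if standalone else 0])  # Standalone: TV, one byte
    return out
```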
4.4 Encoding of Elements

BNF

Element ::= (ElementI | ElementSII | ElementIII)
            ElementContent
            EndElement
ElementI ::= 'e' StringID
ElementSII ::= 'X' LengthValue StringID StringID StringID
ElementIII ::= 'x' StringID StringID StringID
EndElement ::= 'z'
ElementContent ::= NSDecls Attributes Children
Children ::= (Misc | Element | Text)*

Tags

Value  ASCII  Meaning
101    e      Tid(localname)
              Used when the element is not associated with a namespace.
88     X      TLVid(localname)/id(prefix)/id(uri)
              Used when the stringID for the element name is not yet defined. If the element is in the default namespace, then the prefix stringID is zero. If the element is not in a namespace, then the URI stringID is zero.
120    x      Tid(localname)/id(prefix)/id(uri)
              Used when the stringID for the element name is already defined. If the element is in the default namespace, then the prefix stringID is zero. If the element is not in a namespace, then the URI stringID is zero.
122    z      End Element

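The choice among 'X', 'x', and 'e' depends only on whether the element's localname already has a stringID and whether a namespace is involved. The Python sketch below is hypothetical: it allocates stringIDs from 1 upward for element names only, whereas a real encoder shares one global stringID space with attribute names, prefixes, and URIs (section 5.3).

```python
from typing import Dict


def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


class ElementEncoder:
    """Hypothetical helper that picks the element start tag described above."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}
        self.next_id = 1  # stringID 0 is reserved

    def start(self, localname: str, prefix_id: int = 0, uri_id: int = 0) -> bytes:
        if localname not in self.ids:
            # First occurrence: 'X' defines the stringID inline (TLVid form)
            sid = self.next_id
            self.next_id += 1
            self.ids[localname] = sid
            name = localname.encode("utf-8")
            return (b"X" + encode_varint(len(name)) + name + encode_varint(sid)
                    + encode_varint(prefix_id) + encode_varint(uri_id))
        sid = self.ids[localname]
        if prefix_id == 0 and uri_id == 0:
            return b"e" + encode_varint(sid)  # no namespace: Tid(localname)
        return b"x" + encode_varint(sid) + encode_varint(prefix_id) + encode_varint(uri_id)

    @staticmethod
    def end() -> bytes:
        return b"z"
```

The first two calls to start("Address") produce the encodings written X7Address100 and e1 in section 5.2.1.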
4.5 Encoding of Attributes

BNF

Attributes ::= (Anywhere Attribute)*
Attribute ::= (AttributeI | AttributeSII | AttributeIII)
              AttributeValue
AttributeI ::= 'a' StringID
AttributeSII ::= 'Y' LengthValue StringID StringID StringID
AttributeIII ::= ('y' | 'b') StringID StringID StringID
AttributeValue ::= LengthValue
                   /* If 'b' is used, then no &, ', ", <,
                      >, #xD, #xA, #x9 can appear in the value */

Tags

Value  ASCII  Meaning
97     a      Tid(localname)/LV(attribute-value)
              Used when the attribute is not associated with a namespace.
89     Y      TLVid(localname)/id(prefix)/id(uri)/LV(attribute-value)
              Used when the stringID for the attribute name is not yet defined. If the attribute is not in a namespace, then the prefix stringID and URI stringID are zero.
121    y      Tid(localname)/id(prefix)/id(uri)/LV(attribute-value)
              Used when the stringID for the attribute name is already defined. If the attribute is not in a namespace, then the prefix stringID and URI stringID are zero.
98     b      Tid(localname)/id(prefix)/id(uri)/LV(attribute-value)
              Similar to the 'y' tag, except that the following characters cannot be used in the value:
              • '<' (#x3c)
              • '>' (#x3e)
              • '&' (#x26)
              • carriage return (#x0d)
              • single quote (#x27)
              • double quote (#x22)
              • tab (#x09)
              • linefeed (#x0a)
              Because no characters need to be escaped when this attribute node is serialized, this feature should speed up serialization.

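Because 'b' and 'y' share the same layout, an encoder can choose between them by scanning the value for the eight characters listed above. The following Python sketch is hypothetical and assumes the attribute name already has a stringID. Using 'b' is an optimization the encoder may choose, so always emitting 'y' would also be valid.

```python
def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


# Characters that may not appear in the value of a 'b' attribute
B_FORBIDDEN = frozenset("<>&\r'\"\t\n")


def encode_attribute(name_id: int, prefix_id: int, uri_id: int, value: str) -> bytes:
    """Encode an attribute whose name already has a stringID ('y' or 'b')."""
    tag = b"b" if B_FORBIDDEN.isdisjoint(value) else b"y"
    data = value.encode("utf-8")
    return (tag + encode_varint(name_id) + encode_varint(prefix_id)
            + encode_varint(uri_id) + encode_varint(len(data)) + data)
```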
4.6 Encoding of Namespace Mappings

BNF

NSDecls ::= (Anywhere NSDecl)*
NSDecl ::= NSDeclII
NSDeclII ::= 'm' StringID StringID

Tags

Value  ASCII  Meaning
109    m      Tid(prefix)/id(namespace-uri)
              Declares a namespace mapping of a prefix stringID to a namespace URI stringID. For default namespace declarations, the stringID for the prefix is zero.

4.7 Encoding of Text

BNF

Text ::= ('T' | 'U' | 'C' | 'W') LengthValue
AtomicValue ::= 'V' LengthValue

Tags

Value  ASCII  Meaning
84     T      Text node in TLV(text) form.
85     U      Text node in TLV(text) form. The '<' (#x3c), '>' (#x3e), '&' (#x26), and carriage return (#x0d) characters cannot be used in the value. Because no characters need to be escaped when this text node is serialized, this feature should speed up serialization.
67     C      CDATA string in TLV(text) form.
87     W      Text node containing only white space in TLV(text) form. White space consists of one or more space (#x20) characters, carriage returns (#x0d), line feeds (#x0a), tabs (#x09), Unicode line separator characters (#x2028), or NELs (#x85).
              Used when a text node contains only white space, unless the nearest containing element with an xml:space attribute specifies xml:space='preserve'.
86     V      Atomic Value in TLV(text) form.

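An encoder can choose among 'W', 'U', and 'T' with a simple scan of the text. This Python sketch is hypothetical: preserve_space stands for whether the nearest xml:space attribute says 'preserve', which the caller is assumed to track, and 'C' is left out because using it is only a matter of preserving CDATA syntax.

```python
def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


WHITESPACE = frozenset(" \r\n\t\u2028\u0085")  # white space as defined for 'W'
U_FORBIDDEN = frozenset("<>&\r")                # characters not allowed in 'U'


def text_tag(text: str, preserve_space: bool = False) -> bytes:
    if text and not preserve_space and all(ch in WHITESPACE for ch in text):
        return b"W"
    if U_FORBIDDEN.isdisjoint(text):
        return b"U"  # can be serialized without escaping
    return b"T"


def encode_text(text: str, preserve_space: bool = False) -> bytes:
    data = text.encode("utf-8")  # the length counts UTF-8 bytes, not characters
    return text_tag(text, preserve_space) + encode_varint(len(data)) + data
```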
4.8 Encoding of Comments

BNF

Comment ::= 'c' LengthValue

Tags

Value  ASCII  Meaning
99     c      Comment in TLV(comment) form.

4.9 Encoding of Processing Instructions

BNF

PI ::= PII
PII ::= 'P' StringID LengthValue

Tags

Value  ASCII  Meaning
80     P      Processing instruction in Tid(target)/LV(value) form.

The 'P' tag cannot declare an ID for the target of the processing instruction. Instead, an 'I' tag should be used to define the stringID for the target, and then the 'P' tag is used to define the processing instruction itself.

Although this is unlike the behavior for element and attribute tags, it was done to avoid creating several tags to describe a processing instruction.

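The two-step pattern for processing instructions can be sketched as follows. In this hypothetical Python example, ids is the encoder's string-to-stringID dictionary, and the allocation rule (next unused ID) is an assumption of the sketch. The first use of a target emits an 'I' definition before the 'P' tag; later uses emit only 'P'.

```python
from typing import Dict


def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def encode_pi(target: str, value: str, ids: Dict[str, int]) -> bytes:
    """Emit an 'I' definition for a new target, then the 'P' tag."""
    out = b""
    if target not in ids:
        # Hypothetical allocation: next unused stringID (0 is reserved)
        sid = max(ids.values(), default=0) + 1
        ids[target] = sid
        t = target.encode("utf-8")
        out += b"I" + encode_varint(len(t)) + t + encode_varint(sid)
    v = value.encode("utf-8")
    return out + b"P" + encode_varint(ids[target]) + encode_varint(len(v)) + v
```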
4.10 Encoding of Other Information

BNF

SI ::= 'I' LengthValue StringID
Hint ::= 'H' LengthValue LengthValue

Tags

Value  ASCII  Meaning
73     I      Definition of a stringID in TLVid(string) form. Used only when the StringID flag is set.
72     H      Hint in TLV/LV form.

4.11 Reserved Values for Tags

BNF

Reserved ::= [#xC9 - #xFA] Byte*

Tags

Value    ASCII  Meaning
201-250         Reserved for use by applications.

Values 201 through 250 are reserved for use by applications, and will not be used as tags in future versions of this specification. These reserved values can be used to define private extensions to the format for features not accounted for in this version of the specification. See section 5.11, Private Extensions, for more information.

5 Format details

This section provides additional details on the binary XML format.

5.1 Encoding Single Documents and Sequences

Whether an XDBX instance represents an XML document or a sequence of items is encoded in the XDBX header. Most commonly, the binary stream represents an XML document. In this case, the document node as defined by the XML data model is assumed; in other words, there is no need to start the document with a 'd' tag. If the binary stream represents an XML sequence, then the document node is not assumed, and any document node in the stream needs to be denoted with a 'd' tag. Note that XPath behaves differently depending on whether there is a document node.

It is important to note that if stringIDs are used, the encoder must ensure that all stringIDs are valid from one item to the next. In other words, stringIDs are global to the binary XML stream. Combining multiple documents as items in a sequence can therefore have a size advantage, because the stringIDs need to be defined only once.

5.2 StringIDs

Using stringIDs results in a smaller encoding, because StringIDs are typically smaller than the text they represent. In addition, the use of StringIDs can allow the data in binary XML format to be processed more efficiently. The receiver must be prepared to manage the StringIDs that appear in the document. This requires establishing and managing lookup tables to efficiently reconcile StringIDs with the text they represent.

In some encodings, the first occurrence of the text is written as text; wherever that text appears again, it is replaced with an ID that is computed during the processing of the first occurrence. In other encodings, all of the text, or only a portion of it, could be represented by IDs, where each ID is a reference to a dictionary that is contained in the message.

A StringID can be used only after the tag that defines it.

5.2.1 Examples of StringID Usage

The following shows example encodings of namespace declarations, elements, and attributes when StringIDs are used. In these examples, each length and stringID is written as a decimal number; because all of them are less than 128, each is actually encoded as a single byte.

Namespace Declaration:

The namespace declaration portion of the element tag <root xmlns:foo="bar"> is encoded as I3foo1I3bar2m12, where:

• 'I' assigns the StringID '1' to "foo" and '2' to "bar".
• 'm' declares the namespace mapping of the prefix "foo" (stringID '1') to the namespace URI "bar" (stringID '2').

Suppose that the namespace prefix is reassigned to a different URI later in the document. For example:

<Address xmlns:foo="baz">

The encoding of the namespace declaration is:

I3baz3m13, where '3' is the StringID assigned to "baz".

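The following hypothetical Python sketch builds the actual bytes for the two namespace declarations above. The helper names define and ns_decl are illustrative; each length and stringID is written with the variable-length integer encoding of section 4.1.1.

```python
def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def define(text: str, sid: int) -> bytes:
    """'I' tag: assign stringID sid to text (TLVid form)."""
    data = text.encode("utf-8")
    return b"I" + encode_varint(len(data)) + data + encode_varint(sid)


def ns_decl(prefix_id: int, uri_id: int) -> bytes:
    """'m' tag: map a prefix stringID to a namespace URI stringID."""
    return b"m" + encode_varint(prefix_id) + encode_varint(uri_id)


root_decl = define("foo", 1) + define("bar", 2) + ns_decl(1, 2)  # I3foo1I3bar2m12
address_decl = define("baz", 3) + ns_decl(1, 3)                  # I3baz3m13
```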
Element with no prefix and no namespace:

The first occurrence of <Address> is encoded as X7Address100, where:

• 'X' is the tag indicating that an element name is encoded with StringIDs, and that a length/value/ID tuple follows defining the localname and its associated ID, followed by the stringIDs for the namespace prefix and namespace URI.
• '7' is the length of the localname string "Address" and '1' is the assigned ID for that string.
• '0' is the stringID for "no namespace prefix".
• '0' is the stringID for "no namespace URI".

Subsequent occurrences of <Address> are encoded more compactly as e1, where '1' is the StringID for the string "Address".

Element with no prefix and the default namespace:

The first occurrence of <Address> is encoded as X7Address104, where:

• 'X' is the tag indicating that an element name is encoded with StringIDs, and that a length/value/ID tuple follows defining the localname and its associated ID.
• '0' is the stringID for the namespace prefix (because there is none).
• '4' is the stringID of the namespace URI.

Subsequent occurrences of <Address> are encoded more compactly as x104, where:

• '1' is the StringID for the string "Address".
• '0' is the stringID for the namespace prefix.
• '4' is the stringID for the namespace URI.

Element with prefix:

The first occurrence of <foo:Address> is encoded as X7Address154, where:

• '1' is the StringID assigned to the string "Address".
• '5' is the stringID that was previously assigned to "foo".
• '4' is the stringID that was previously assigned to the namespace URI.

Subsequent occurrences of <foo:Address> are encoded more compactly as x154, where '1' is the StringID for the string "Address".

Attribute with no prefix (and thus no namespace):

The first occurrence of the attribute portion of <name mgr="NO"> is encoded as Y3mgr9002NO, where:

• 'Y' is the tag indicating that an attribute name is encoded with StringIDs, followed by a length/value/ID tuple for the attribute name.
• '3' is the length of the attribute name "mgr".
• '9' is the StringID assigned to the string "mgr".
• '0' is the stringID of the prefix.
• '0' is the stringID of the URI.
• '2' is the length of the attribute value "NO".

Subsequent occurrences of the attribute portion of <name mgr="NO"> are encoded as a92NO, where:

• 'a' indicates an attribute whose name is given by a StringID.
• '9' is the stringID of the attribute name.
• '2' is the length of the attribute value "NO".

Attribute with prefix:

The first occurrence of the attribute portion of <name foo:mgr="NO"> is encoded as Y3mgr9542NO, where:

• 'Y' is the tag indicating that an attribute name is encoded with StringIDs, followed by a length/value/ID tuple for the attribute name.
• '3' is the length of the attribute name "mgr".
• '9' is the StringID assigned to the string "mgr".
• '5' is the stringID for the prefix.
• '4' is the stringID for the URI.
• '2' is the length of the attribute value "NO".

Subsequent occurrences of the attribute portion of <name foo:mgr="NO"> are encoded more compactly as y9542NO, where:

• 'y' is the tag indicating an attribute whose name is given by a StringID.
• '9' is the stringID for the attribute name.
• '5' is the stringID for the prefix.
• '4' is the stringID for the URI.
• '2' is the length of the attribute value "NO".

Elements, Text, and namespaceIDs:

This section ties together some of the concepts described above and assumes StringIDs are used. For example:

<root xmlns:foo="bar">
<foo:Address>ABC</foo:Address>
<foo:Address><![CDATA[DEF]]></foo:Address>
</root>

The namespace declaration in the above XML is encoded as I3foo1I3bar2m12, where:

• '1' is the StringID for "foo".
• '2' is the StringID for "bar".
• 'm12' identifies the mapping of foo ('1') to bar ('2').

Therefore, the first occurrence of foo:Address is encoded as follows:

X7Address912T3ABCz, where:

• 'X' indicates an element name expressed in TLVid form.
• '7Address' is the LV for the localname.
• '9' is the StringID for "Address".
• '12' gives the stringIDs of the prefix "foo" ('1') and the namespace URI "bar" ('2').
• 'T3ABC' is the TLV for the text node, and 'z' represents the end element tag.

Subsequent occurrences of foo:Address are encoded more compactly as follows:

x912C3DEFz, where:

• 'x' indicates an element name expressed in ID form.
• '9' is the StringID for "Address".
• '12' gives the stringIDs of the prefix "foo" and the namespace URI "bar".
• 'C3DEF' is the TLV for the CDATA.
• 'z' represents the end element tag. (NOTE: The encoder could choose to encode the CDATA as a text node via 'T'.)

The first occurrence of foo:Address must use the more expansive form of an element name, 'X', whereas the second occurrence can use the more compact form, 'x', because the element name is already encoded with a stringID.

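To check this example, the following hypothetical Python sketch assembles the two foo:Address elements as bytes, assuming (as above) that "foo" and "bar" already have stringIDs 1 and 2 and that "Address" is assigned 9. It then converts the bytes back into this section's notation by printing each byte below #x20 as a decimal number; that rendering is only a debugging aid and would be ambiguous for larger values.

```python
def encode_varint(n: int) -> bytes:
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def lv(text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_varint(len(data)) + data


def to_notation(buf: bytes) -> str:
    """Render bytes below #x20 as decimal numbers and all others as ASCII."""
    return "".join(str(b) if b < 0x20 else chr(b) for b in buf)


FOO, BAR, ADDRESS = 1, 2, 9  # stringIDs used in the example

first = (b"X" + lv("Address") + encode_varint(ADDRESS)
         + encode_varint(FOO) + encode_varint(BAR) + b"T" + lv("ABC") + b"z")
second = (b"x" + encode_varint(ADDRESS) + encode_varint(FOO)
          + encode_varint(BAR) + b"C" + lv("DEF") + b"z")
```

Here print(to_notation(first)) prints X7Address912T3ABCz, and to_notation(second) gives x912C3DEFz.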
The following table summarizes the encoding of an element in various forms with StringIDs on:

Element          No namespace,      No namespace,   Namespace,         Namespace,
                 first occurrence   subsequent      first occurrence   subsequent
<Address>        X7Address100       e1              X7Address902       x902
<foo:Address>    N/A                N/A             X7Address912       x912

The following table summarizes the encoding of an attribute in various forms with StringIDs on:

Attribute        No namespace,      No namespace,   Namespace,         Namespace,
                 first occurrence   subsequent      first occurrence   subsequent
<mgr="NO">       Y3mgr9002NO        a92NO           Y3mgr9022NO        y9022NO
<foo:mgr="NO">   N/A                N/A             Y3mgr9122NO        y9122NO

5.3 StringID Notes

StringIDs are considered global. For example, if the string "Person" is given the stringID 4, this assignment holds for the entire binary XML document. It is invalid for "Person" to be given a different stringID, or for 4 to be assigned to another string in the same binary XML document.

The stringID value 0 (zero) is reserved and is used to mark "no namespace prefix" and "no namespace URI".

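A receiver can enforce these rules with a pair of dictionaries. The following Python class is a hypothetical sketch: it rejects stringID 0, rejects conflicting assignments in either direction, and reports the use of an undefined stringID. It tolerates an exact repeat of an existing definition, which is an assumption of the sketch rather than something this section states.

```python
from typing import Dict


class GlobalStringTable:
    """Hypothetical receiver-side dictionary enforcing global stringIDs."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}
        self._by_str: Dict[str, int] = {}

    def define(self, sid: int, text: str) -> None:
        if sid == 0:
            raise ValueError("stringID 0 is reserved")
        if self._by_id.get(sid, text) != text:
            raise ValueError(f"stringID {sid} is already assigned to {self._by_id[sid]!r}")
        if self._by_str.get(text, sid) != sid:
            raise ValueError(f"{text!r} already has stringID {self._by_str[text]}")
        # An identical repeat definition falls through harmlessly (assumption)
        self._by_id[sid] = text
        self._by_str[text] = sid

    def lookup(self, sid: int) -> str:
        if sid not in self._by_id:
            raise KeyError(f"stringID {sid} used before it was defined")
        return self._by_id[sid]
```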
5.4 Text Notes

Multiple text and/or CDATA tags can appear one after another to handle arbitrarily large amounts of data. They are also used to encode mixed content.

It is up to the encoder whether to encode CDATA using the 'C' tag or a 'T' tag, because they are semantically identical. The 'C' tag exists for applications that want to preserve the CDATA syntax. Beyond the difference between CDATA and text as described in the XML specification, this binary XML specification treats them identically.

The 'U' tag is similar to the 'T' tag, except that the encoder guarantees that none of the characters in the 'U' tag need to be replaced with entity references if this text is serialized as XML. In other words, none of the following four characters are present in the text node: less-than “<” [&lt;], greater-than “>” [&gt;], ampersand “&” [&amp;], and carriage return [&#xD;].

5.4.1 White Space

The XMLPARSE function, which may be applied to an XML document that is passed to the receiver, offers the options STRIP WHITESPACE and PRESERVE WHITESPACE. STRIP WHITESPACE removes text nodes that contain only white space, unless the nearest containing element with an xml:space attribute specifies xml:space='preserve'.

To facilitate the processing of STRIP WHITESPACE, text nodes that would be stripped by this operation must be identified by the 'W' tag.

CDATA sections that contain only white space that would be stripped by STRIP WHITESPACE must likewise be identified by a 'W' tag rather than a 'C' tag. This is seen in the following examples, where each blank after a T or W length is the blank character itself:

Serialized XML: <a> <![CDATA[bcd]]> </a>
Binary XML:     X1a100T1 C3bcdT1 z

Serialized XML: <a> <![CDATA[ ]]> </a>
Binary XML:     X1a100W1 W1 W1 z
or
                X1a100W3   z

In the first example the text node as a whole (" bcd ") does not consist only of white space, so its blank segments are encoded with 'T'.

If a processor determines that certain white space characters can be removed (e.g. ignorable whitespace SAX events), they should be removed instead of being encoded in a 'W' tag.

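The examples above imply that the 'W' decision is made on the whole text node, with adjacent character data and CDATA segments taken together, rather than on each segment separately. A minimal Python sketch of that rule follows; the function name and its inputs are hypothetical, white space is taken to be the four XML white space characters (#x20, #x9, #xD, #xA), and the 'U' tag is ignored for brevity.

```python
XML_WHITESPACE = frozenset(" \t\r\n")  # #x20, #x9, #xD, #xA


def text_tags(segments, preserve=False):
    """Choose 'W', 'T', or 'C' for each segment of one text node.

    segments: list of (text, is_cdata) pairs that together make up a single
              text node (adjacent character data and CDATA sections).
    preserve: True when the nearest xml:space in scope is 'preserve'.
    """
    whole = "".join(text for text, _ in segments)
    strippable = (not preserve and whole != ""
                  and all(ch in XML_WHITESPACE for ch in whole))
    if strippable:
        # STRIP WHITESPACE would remove the whole node: mark every segment 'W'.
        return ["W"] * len(segments)
    return ["C" if is_cdata else "T" for _, is_cdata in segments]
```

With the segments of the two examples, this yields T, C, T for the first and W, W, W for the second.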
5.5 XML Declaration Tag Notes

Typically, binary XML has no XML declaration, because the binary XML encoding is always UTF-8. However, if the XML version is not 1.0, the XML declaration is mandatory, just as in serialized XML.

If the XML declaration tags are present in the binary XML, they must include the version tag ('L'); the encoding ('D') and standalone ('t') tags are optional.

Example encodings:

Serialized XML: <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
Binary XML:     L31.0D5UTF-8t0

Serialized XML: <?xml version="1.1" encoding="UTF-16" ?>
Binary XML:     L31.1D6UTF-16

Serialized XML: <?xml version="1.0" standalone="yes" ?>
Binary XML:     L31.0t1

Serialized XML: <?xml version="1.1" ?>
Binary XML:     L31.1

Apart from the version requirement above, the XML declaration tags are informational only and therefore optional. They carry into the binary encoding the information from the source document's XML declaration. For example, all text is encoded as UTF-8 in the binary encoding, even if the source document used UTF-16; the fact that the source document used UTF-16 can be communicated using these tags.

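The four example encodings above can be reproduced with a short Python sketch. The function name is hypothetical, and it assumes every length is below 128 so that it fits in a single LastByte of the VariableInteger grammar in Appendix A.

```python
def encode_xml_decl(version, encoding=None, standalone=None):
    """Encode XMLDecl ::= XMLVersion Encoding? Standalone? as bytes."""

    def lv(s):
        data = s.encode("utf-8")
        # Lengths 0-127 fit in one LastByte; longer values would need the
        # multi-byte VariableInteger form, which this sketch omits.
        if len(data) > 0x7F:
            raise ValueError("value too long for this sketch")
        return bytes([len(data)]) + data

    out = b"L" + lv(version)            # version tag is mandatory
    if encoding is not None:
        out += b"D" + lv(encoding)
    if standalone is not None:
        out += b"t" + bytes([1 if standalone else 0])  # BooleanValue
    return out
```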
5.6 DTD and DOCTYPE

This specification defines a tag ('F') for the DOCTYPE declaration. This tag cannot describe an internal DTD subset.

5.7 Namespace Notes

Each namespace declaration in the source XML document must have a corresponding 'm' tag in the binary encoding, even if the same namespace mapping has already been declared elsewhere. For example:

<Name xmlns:foo="bar">
   ...
</Name>
<Person xmlns:foo="bar">
   ...
</Person>

When the Name and Person elements are encoded, each must contain its own explicit namespace mapping using the 'm' tag.

The namespace declarations appear immediately after the element tag on which they were declared.

An undeclared default namespace is encoded as m00. Elements within undeclared namespaces can be encoded with the 'e' tag, or with the 'X' or 'x' tag using 0 for both the prefix and URI StringIDs (shown as 00). Attributes within undeclared namespaces can be encoded with the 'a' tag, or with the 'Y' or 'y' tag using 0 for both the prefix and URI StringIDs.

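A hypothetical Python helper illustrates the rule: it emits an element's start tag followed by one 'm' tag per declaration on that element, with no de-duplication across elements. It produces the readable notation used in this document and assumes the empty string maps to the reserved stringID 0.

```python
def start_tokens(element_tag, ns_decls, ids):
    """Element tag followed by one 'm' tag per namespace declaration.

    ns_decls: (prefix, uri) pairs declared on this element, in source order.
    ids: string -> stringID map; "" must map to the reserved value 0.
    """
    tokens = [element_tag]
    for prefix, uri in ns_decls:
        # No de-duplication: a mapping already declared on another element
        # is still encoded again here.
        tokens.append(f"m{ids[prefix]}{ids[uri]}")
    return tokens
```

For the Name and Person example, both calls produce an 'm' tag for foo="bar", and an undeclared default namespace (xmlns="") produces m00.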
5.8 Hint Tag Notes

The hint tag is a way to add arbitrary information to the binary encoding, analogous to the use of xsd:appinfo in XML Schema. It consists of a TLV followed by an LV. The value field of the 'H' TLV defines what is contained in the following LV. If the reader sees the initial TLV and does not understand it or does not want to process it, it can use the length of the following LV to skip it; otherwise, the reader can consume the information. For example, if validation was performed in a database against a schema in the database's schema repository, the encoder may want to record exactly which schema was used. The encoding could then be:

H11schema-used12http://x.y.z

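A reader that skips hints it does not recognize could look like the Python sketch below. The function names and the handler dictionary are hypothetical, and lengths are assumed to fit in a single byte (below 128); a full reader would decode the VariableInteger lengths defined in Appendix A.

```python
def read_lv(buf, pos):
    """Read a LengthValue whose length is a single LastByte (0-127)."""
    length = buf[pos]
    if length > 0x7F:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    end = pos + 1 + length
    if end > len(buf):
        raise ValueError("truncated LengthValue")
    return buf[pos + 1:end], end


def read_hint(buf, pos, handlers):
    """Parse Hint ::= 'H' LengthValue LengthValue at buf[pos].

    handlers: maps a hint name (the first value) to a function that consumes
    the second value. Returns (handler result or None, position after hint).
    """
    if buf[pos:pos + 1] != b"H":
        raise ValueError("not a hint")
    name, pos = read_lv(buf, pos + 1)
    handler = handlers.get(name.decode("utf-8"))
    value, pos = read_lv(buf, pos)  # the length alone tells us where it ends
    if handler is None:
        return None, pos  # unknown hint: skipped without interpreting the value
    return handler(value), pos
```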
5.9 Empty Sequence

XQuery defines an empty sequence. This is represented in the binary stream as a header followed by a 'Z' tag.

5.10 Escaping of Characters

The 'U' and 'b' tags enable XDBX to record that none of the characters in a text node or an attribute value, respectively, need to be escaped via an entity reference. The goal of this feature is to speed up serialization of the XDBX binary stream: when either tag is used, none of the characters in the text or attribute value need to be examined to determine whether they need escaping.

The 'U' tag can only be used if none of the characters in the text node are:
• carriage return (#xD)
• ampersand (#x26)
• greater than (#x3E)
• less than (#x3C)

The 'b' tag can only be used if none of the characters in the attribute value are:
• carriage return (#xD)
• ampersand (#x26)
• greater than (#x3E)
• less than (#x3C)
• single quote (#x27)
• double quote (#x22)
• tab (#x9)
• linefeed (#xA)

Note that this applies only to serialization to a Unicode encoding. Serialization to other encodings might still require numeric character references for characters that the target code page cannot represent.

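The two character lists translate directly into the following Python sketch. The function names are hypothetical; an encoder can always fall back to 'T' (text) or 'y' (attribute) when a value contains one of the listed characters. As noted above, the guarantee holds only for serialization to a Unicode encoding.

```python
# Characters that must not appear in a 'U' text node or a 'b' attribute value.
U_FORBIDDEN = frozenset("\r&><")
B_FORBIDDEN = frozenset("\r&><'\"\t\n")


def text_tag(text):
    """'U' when the text needs no entity references, else 'T'."""
    return "T" if any(ch in U_FORBIDDEN for ch in text) else "U"


def attribute_tag_letter(value):
    """'b' when the attribute value needs no escaping, else 'y'."""
    return "y" if any(ch in B_FORBIDDEN for ch in value) else "b"
```

Tab and linefeed are allowed in a 'U' text node but not in a 'b' attribute value, because attribute-value normalization would otherwise turn them into spaces.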
5.11 Private Extensions

Assuming agreement between a sender and receiver, the specification allows for the definition and use of private extensions. These allow the format to support additional features that are not explicitly documented in this specification. One example is typed encoding of element and attribute data in a specific, non-text format, which lets the encoder encode the data in the form that is most efficient for the receiver. For example, consider the element "weight" that is of type float:

<weight>75.4</weight>

Using one of the reserved tags, the encoder can inform the receiver of an alternative, more efficient encoding. This is also useful for user-defined types. Assuming StringIDs are off, the preceding element could be encoded as:

2016weight002407xxxxxxxz

Where:
• '201' (#xC9) is a reserved tag defined by the encoder and receiver to denote this special element encoding.
• '6' is the length of the string "weight".
• '0' is the prefix length.
• '0' is the URI length.
• '240' (#xF0) is another reserved tag, used to indicate that the data is encoded as an IEEE float.
• '7' is the length of the encoded data, and 'xxxxxxx' represents the binary encoding of the value as a float.
• 'z' is the end element tag.

Similarly, another reserved tag is used to encode attribute values. For example:

<Person weight = "75.4">Joe</Person>

Assuming StringIDs are off, the attribute portion of this element could be encoded as:

2106weight002407xxxxxxx

Where:
• '210' (#xD2) is the reserved tag defined by the encoder and receiver to denote this special attribute encoding.
• '6' is the length of the string "weight".
• '0' is the prefix length.
• '0' is the URI length.
• '240' (#xF0) is another reserved tag, used to indicate that the data is encoded as an IEEE float.
• '7' is the length of the encoded data, and 'xxxxxxx' represents the binary encoding of the value as a float.

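The element and attribute encodings above can be sketched in Python with the standard `struct` module. Everything about the payload is a private agreement, so the details here are assumptions: the float is packed as a 4-byte big-endian IEEE 754 single, which makes its length 4 rather than the placeholder 7 used above, and every length is assumed to be below 128 so that it fits in one byte. The constant and function names are hypothetical.

```python
import struct

# Hypothetical private agreement, using the tag values from the example.
TYPED_ELEMENT = 201    # #xC9
TYPED_ATTRIBUTE = 210  # #xD2
IEEE_FLOAT = 240       # #xF0


def lv(data):
    """LengthValue with a single-byte length (assumes len(data) < 128)."""
    if len(data) > 0x7F:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    return bytes([len(data)]) + data


def typed_float(tag, name, value):
    # tag, name, empty prefix, empty URI (StringIDs off), then the payload.
    payload = struct.pack(">f", value)  # 4-byte big-endian IEEE 754 single
    return (bytes([tag]) + lv(name.encode("utf-8")) + lv(b"") + lv(b"")
            + bytes([IEEE_FLOAT]) + lv(payload))


def typed_float_element(name, value):
    return typed_float(TYPED_ELEMENT, name, value) + b"z"  # 'z' ends the element


def typed_float_attribute(name, value):
    return typed_float(TYPED_ATTRIBUTE, name, value)
```

All three tag values fall inside the reserved range #xC9 to #xFA listed in Appendix A.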
5.12 Reserved Tags

The reserved tags (#xC9 through #xFA) are for use by encoders that have an agreement with their receivers on the tags' meaning. These reserved tags will not be reassigned in future versions of this specification, which ensures forward and backward compatibility for implementations that choose to use them.

6 Examples

This section presents examples of serialized XML and the corresponding binary XML format when various encoding flags are used.

Note: The serialized XML values in these examples are shown with line breaks and indentation to make them more readable. These characters are not included in the byte counts shown in the example statistics, except in Example 6, where the white space is part of the document content and is counted.

6.1 Example 1 – Default encoding

This example shows an XML document and its binary encoding with all the default encoding flags.

Encoding Flags:
Document Type: XML Document
StringIDs (required): On

XML:
<root>
   <name mgr = "NO">Joe</name>
   <name>Susan</name>
   <name>Bill</name>
</root>

Binary XML (Excluding the header):
X4root100X4name200Y3mgr3002NOT3Joezx200T5Susanzx200T4BillzzZ

Element, attribute, prefix, and URI IDs are:
1==root, 2==name, 3==mgr

Statistics:
75 bytes of XML
60 bytes of binary + 8-byte header = 68 bytes of binary XML

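A minimal Python encoder that reproduces the 60-byte stream of Example 1 is sketched below. It is a hypothetical illustration, not a complete XDBX encoder: it handles only elements, attributes, and text with no namespaces, assigns stringIDs in order of first use, always uses 'T' for text, and assumes every length and stringID is below 128 so that each fits in one byte. For repeated element names it uses the 'x' form with 00, as Example 1 does.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    name: str
    attrs: list = field(default_factory=list)     # (name, value) pairs
    children: list = field(default_factory=list)  # Elem instances or str


class Encoder:
    """Sketch: no namespaces, StringIDs on, one byte per length/stringID."""

    def __init__(self):
        self.ids = {}            # string -> stringID, numbered from 1
        self.out = bytearray()

    def _int(self, n):
        # Values 0-127 are a single LastByte; larger ones are out of scope.
        if not 0 <= n <= 0x7F:
            raise ValueError("needs a multi-byte VariableInteger")
        self.out.append(n)

    def _lv(self, text):
        data = text.encode("utf-8")
        self._int(len(data))
        self.out += data

    def _define(self, tag, name):
        # First occurrence: tag, name, new stringID, prefix 0, URI 0.
        self.ids[name] = len(self.ids) + 1
        self.out += tag
        self._lv(name)
        for n in (self.ids[name], 0, 0):
            self._int(n)

    def element(self, el):
        if el.name in self.ids:
            self.out += b"x"  # repeated name: 'x' with 00, as in Example 1
            for n in (self.ids[el.name], 0, 0):
                self._int(n)
        else:
            self._define(b"X", el.name)
        for name, value in el.attrs:
            if name in self.ids:
                self.out += b"a"
                self._int(self.ids[name])
            else:
                self._define(b"Y", name)
            self._lv(value)
        for child in el.children:
            if isinstance(child, str):
                self.out += b"T"
                self._lv(child)
            else:
                self.element(child)
        self.out += b"z"

    def document(self, root):
        self.element(root)
        self.out += b"Z"
        return bytes(self.out)


example1 = Elem("root", children=[
    Elem("name", [("mgr", "NO")], ["Joe"]),
    Elem("name", children=["Susan"]),
    Elem("name", children=["Bill"]),
])
print(len(Encoder().document(example1)))  # 60
```

The same encoder, given the tree of Example 5, produces that example's 32-byte stream; a new Encoder is needed per document because stringIDs are scoped to one document.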
6.2 Example 2 – Sequence

This example shows an XML sequence with multiple items, including a comment node, a document node, an element node, and an atomic value. In the binary XML, 'b' is used to denote a blank.

Encoding Flags:
Document Type: XML Sequence
StringIDs (required): On

XML:
<!--comment-->
<name mgr = "NO">  Joe  </name>
Susan
<name>Bill</name>

Binary XML (Excluding the header):
c7comment@dX4name100Y3mgr2002NOT7bbJoebbz@V5Susan@x100T4BillzZ

Element, attribute, prefix, and URI IDs are:
1==name, 2==mgr

Statistics:
67 bytes of XML
62 bytes of binary + 8-byte header = 70 bytes of binary XML

6.3 Example 3 – StringIDs

This example shows an XML document and its binary encoding with stringIDs on.

Encoding Flags:
Document Type: XML Document
StringIDs (required): On

XML:
<root xmlns:foo = "bar">
   <Person>
      <name mgr = "NO">Bill</name>
      <foo:age>35</foo:age>
   </Person>
   <Person>
      <name mgr = "NO">Joe</name>
      <foo:age>45</foo:age>
   </Person>
</root>

Binary XML (Excluding the header):
I3foo1I3bar2X4root300m12X6Person400X4name500Y3mgr6002NOT4BillzX3age712T235zze4e5a62NOT3Joezx712T245zzzZ

Element, attribute, prefix, and URI IDs are:
1==foo, 2==bar, 3==root, 4==Person, 5==name, 6==mgr, 7==age

Statistics:
162 bytes of XML
103 bytes of binary + 8-byte header = 111 bytes of binary XML

6.4 Example 4 – Namespaces with StringIDs

This example shows an XML document with multiple namespaces and its binary encoding with stringIDs.

Encoding Flags:
Document Type: XML Document
StringIDs (required): On

XML:
<root>
   <Person xmlns:foo = "bar">
      <name mgr = "NO">Bill</name>
      <foo:age>35</foo:age>
   </Person>
   <Person xmlns:foo = "baz">
      <name foo:mgr = "NO">Joe</name>
      <foo:age>45</foo:age>
   </Person>
   <Person xmlns:bar = "food">
      <name bar:mgr = "YES">Susan</name>
   </Person>
   <Person xmlns:bar = "foo">
      <name bar:exec = "YES">Amy</name>
   </Person>
</root>

Binary XML (Excluding the header):
X4root100I3foo2I3bar3X6Person400m23X4name500Y3mgr6002NOT4BillzX3age723T235zzI3baz8e4m28e5y6282NOT3Joezx728T245zzI4food9e4m39e5y6393YEST5Susanzze4m32e5Y4exec10323YEST3AmyzzzZ

Element, attribute, prefix, and URI IDs are:
1==root, 2==foo, 3==bar, 4==Person, 5==name, 6==mgr, 7==age, 8==baz, 9==food, 10==exec

Statistics:
322 bytes of XML
173 bytes of binary + 8-byte header = 181 bytes of binary XML

6.5 Example 5 – Mixed Content

This example shows how mixed content is encoded.

Encoding Flags:
Document Type: XML Document
StringIDs (required): On

XML:
<a>text<b/>more text</a>

Binary XML (Excluding the header):
X1a100T4textX1b200zT9more textzZ

Element, attribute, prefix, and URI IDs are:
1==a, 2==b

Statistics:
24 bytes of XML
32 bytes of binary + 8-byte header = 40 bytes of binary XML

6.6 Example 6 – White Space

This example shows a binary XML document that encodes all of the white space characters shown in the corresponding serialized XML document. In the binary XML, 'b' is used to denote a blank and 'a' is used to denote a linefeed character.

Encoding Flags:
Document Type: XML Document
StringIDs (required): On

XML:
<employee>
   <name xml:space="preserve"><fn>Susan</fn> <ln>Smith</ln></name>
   <address xml:space="default">
      <state>MA</state>
   </address>
</employee>

Binary XML (Excluding the header):
X8employee100W4abbbX4name200I3xml3I5space4y4308preserveX2fn600T5SusanzT1bX2ln700T5SmithzzW4abbbX7address800y4307defaultW7abbbbbbX5state900T2MAzW4abbbzW1azZ

Element, attribute, prefix, and URI IDs are:
1==employee, 2==name, 3==xml, 4==space, 5==name, 6==fn, 7==ln, 8==address, 9==state

Statistics:
160 bytes of XML
155 bytes of binary + 8-byte header = 163 bytes of binary XML

Appendix A Complete XDBX BNF

XDBX            ::= Header DocumentContent
Header          ::= DocIdentifier HeaderLength MajorVersion
                    EncodingFlags HeaderFill
DocIdentifier   ::= #xCA #x3B       /* In binary: 11001010 00111011 */
HeaderLength    ::= #x5
MajorVersion    ::= #x1
EncodingFlags   ::= FourBytes
HeaderFill      ::= Byte*
DocumentContent ::= (XMLDocument | XMLSequence) DocumentEnd
                    /* Which branch to choose is controlled
                       by EncodingFlags */
DocumentEnd     ::= 'Z'
XMLSequence     ::= (SequenceItem ('@' SequenceItem)*)?
SequenceItem    ::= Anywhere
                    (CompleteDoc | Comment | PI | AtomicValue
                     | Element)
                    Anywhere
CompleteDoc     ::= 'd' XMLDocument
AtomicValue     ::= 'V' LengthValue
XMLDocument     ::= (Anywhere XMLDecl)? Misc*
                    (DocType | Misc*)? Element Misc*
Anywhere        ::= (SI | Hint | Reserved)*
Misc            ::= Comment | PI | SI | Hint
DocType         ::= 'F' StringID StringID StringID
XMLDecl         ::= XMLVersion Encoding? Standalone?
XMLVersion      ::= 'L' LengthValue
                    /* The value is a valid XML version: "1.0"
                       or "1.1" for now */
Encoding        ::= 'D' LengthValue
Standalone      ::= 't' BooleanValue
Element         ::= (ElementI | ElementSII | ElementIII)
                    ElementContent
                    EndElement
ElementI        ::= 'e' StringID
ElementSII      ::= 'X' LengthValue StringID StringID StringID
ElementIII      ::= 'x' StringID StringID StringID
EndElement      ::= 'z'
ElementContent  ::= NSDecls Attributes Children
NSDecls         ::= (Anywhere NSDecl)*
NSDecl          ::= NSDeclII
NSDeclII        ::= 'm' StringID StringID
Attributes      ::= (Anywhere Attribute)*
Attribute       ::= (AttributeI | AttributeSII | AttributeIII)
                    AttributeValue
AttributeI      ::= 'a' StringID
AttributeSII    ::= 'Y' LengthValue StringID StringID StringID
AttributeIII    ::= ('y' | 'b') StringID StringID StringID
AttributeValue  ::= LengthValue
                    /* If 'b' is used, then no &,',",
                       <,>,#xD,#xA,#x9 can appear in value */
Children        ::= (Misc | Element | Text)*
Text            ::= ('T' | 'U' | 'C' | 'W') LengthValue
Comment         ::= 'c' LengthValue
PI              ::= PII
PII             ::= 'P' StringID LengthValue
SI              ::= 'I' LengthValue StringID
Hint            ::= 'H' LengthValue LengthValue
Reserved        ::= [#xC9 - #xFA] Byte*
LengthValue     ::= Length Value
Length          ::= VariableInteger
Value           ::= Byte*
                    /* Number of bytes governed by preceding
                       length */
StringID        ::= VariableInteger
TypeID          ::= VariableInteger
VariableInteger ::= (LongLeading | ShortLeading)? LastByte
LongLeading     ::= [#x81-#x8F] [#x80-#xFF]? [#x80-#xFF]?
                    [#x80-#xFF]?
ShortLeading    ::= [#x90-#xFF] [#x80-#xFF]? [#x80-#xFF]?
LastByte        ::= [#x0-#x7F]
BooleanValue    ::= False | True
False           ::= #x0
True            ::= #x1
FourBytes       ::= Byte Byte Byte Byte
Byte            ::= [#x0-#xFF]
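
The VariableInteger productions admit one natural reading, sketched below in Python: the value is split into 7-bit groups, most significant group first, and every byte except the last has its high bit set. This appendix does not spell that reading out, so treat it as an assumption. Under it, the grammar's byte ranges enforce a minimal encoding (a leading #x80 is not allowed) and a 32-bit limit (five bytes only when the first byte is #x81 through #x8F). The function names are hypothetical.

```python
def encode_varint(n):
    """Encode 0 <= n < 2**32 as big-endian 7-bit groups (assumed reading)."""
    if not 0 <= n < 2**32:
        raise ValueError("VariableInteger is limited to 32 bits")
    groups = []
    while True:
        groups.append(n & 0x7F)
        n >>= 7
        if n == 0:
            break
    groups.reverse()  # most significant group first
    # High bit set on every byte but the last (LongLeading/ShortLeading).
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_varint(buf, pos=0):
    """Return (value, next_pos), enforcing the byte ranges of the grammar."""
    start = pos
    first = buf[pos]
    if first == 0x80:
        raise ValueError("#x80 cannot begin a VariableInteger")
    # LongLeading (#x81-#x8F) allows 5 bytes in total, ShortLeading only 4.
    max_len = 5 if first <= 0x8F else 4
    value = 0
    while True:
        if pos - start == max_len:
            raise ValueError("VariableInteger is too long")
        b = buf[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if b <= 0x7F:  # LastByte ends the integer
            return value, pos
```

Under this reading, the stringID 10 in Example 4 is the single byte #x0A, and the value 300 would be encoded as #x82 #x2C.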