- <!-- neon XML interface -*- text -*- -->
- <sect1 id="xml">
- <title>Parsing XML</title>
- <para>The &neon; XML interface is exposed by the
- <filename>ne_xml.h</filename> header file. This interface wraps
- the standard <ulink
- url="http://www.saxproject.org/">SAX</ulink> API used by XML
- parsers, adds one further abstraction, <firstterm>stacked SAX
- handlers</firstterm>, and provides consistent <ulink
- url="http://www.w3.org/TR/REC-xml-names">XML Namespace</ulink> support.</para>
- <sect2 id="xml-sax">
- <title>Introduction to SAX</title>
- <para>A SAX-based parser works by emitting a sequence of
- <firstterm>events</firstterm> to reflect the tokens being parsed
- from the XML document. For example, parsing the following document
- fragment:
- <programlisting><![CDATA[
- <hello>world</hello>
- ]]></programlisting>
- results in the following events:
- <orderedlist>
- <listitem>
- <simpara>&startelm; "hello"</simpara>
- </listitem>
- <listitem>
- <simpara>&cdata; "world"</simpara>
- </listitem>
- <listitem>
- <simpara>&endelm; "hello"</simpara>
- </listitem>
- </orderedlist>
- This example demonstrates the three event types used in the
- subset of SAX exposed by the &neon; XML interface: &startelm;,
- &cdata; and &endelm;. In a C API, an <quote>event</quote> is
- implemented as a function callback; three callback types are used in
- &neon;, one for each type of event.</para>
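- <para>As a rough illustration of events delivered as callbacks,
- the sketch below pairs a toy scanner with three logging callbacks.
- Every name in it (<literal>toy_parse</literal>,
- <literal>start_cb</literal>, <literal>struct log</literal> and so
- on) is hypothetical and is not part of <filename>ne_xml.h</filename>;
- the scanner handles only bare tags and text, so it should be read as
- a model of the idea rather than of &neon; itself.
- <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback types, one per event type. These are illustrative
 * only and are NOT the types declared in ne_xml.h. */
typedef void start_cb(void *userdata, const char *name);
typedef void cdata_cb(void *userdata, const char *text, size_t len);
typedef void end_cb(void *userdata, const char *name);

/* Toy scanner: understands only <name>, </name> and plain text (no
 * attributes, entities, comments or namespaces). It exists solely to
 * show events being delivered as function calls. */
void toy_parse(const char *doc, start_cb *start, cdata_cb *cdata,
               end_cb *end, void *userdata)
{
    const char *p = doc;

    assert(doc != NULL);
    while (*p != '\0') {
        if (*p == '<') {
            const char *close = strchr(p, '>');
            char name[32];
            int is_end;
            size_t n;

            if (close == NULL)
                return;                 /* malformed input: stop */
            is_end = (p[1] == '/');
            p += is_end ? 2 : 1;        /* skip "<" or "</" */
            n = (size_t)(close - p);
            if (n >= sizeof name)
                n = sizeof name - 1;    /* truncate over-long names */
            memcpy(name, p, n);
            name[n] = '\0';
            if (is_end)
                end(userdata, name);
            else
                start(userdata, name);
            p = close + 1;
        } else {
            size_t n = strcspn(p, "<"); /* text up to the next tag */
            cdata(userdata, p, n);
            p += n;
        }
    }
}

/* Example callbacks: append a readable trace to a caller-owned buffer. */
struct log { char buf[128]; };

void log_start(void *userdata, const char *name)
{
    struct log *l = userdata;
    size_t used = strlen(l->buf);
    snprintf(l->buf + used, sizeof l->buf - used, "start(%s) ", name);
}

void log_cdata(void *userdata, const char *text, size_t len)
{
    struct log *l = userdata;
    size_t used = strlen(l->buf);
    snprintf(l->buf + used, sizeof l->buf - used, "cdata(%.*s) ",
             (int)len, text);
}

void log_end(void *userdata, const char *name)
{
    struct log *l = userdata;
    size_t used = strlen(l->buf);
    snprintf(l->buf + used, sizeof l->buf - used, "end(%s) ", name);
}
```
- ]]></programlisting>
- Passing the document fragment above to
- <literal>toy_parse</literal> together with the three
- <literal>log_</literal> callbacks and an empty <literal>struct
- log</literal> records one start, one character data and one end
- event, in that order.</para>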
- </sect2>
- <sect2 id="xml-stacked">
- <title>Stacked SAX handlers</title>
- <para>WebDAV property values are represented as fragments of XML,
- transmitted as parts of larger XML documents over HTTP (notably in
- the body of the response to a <literal>PROPFIND</literal> request).
- When &neon; parses such documents, the SAX events generated for
- these property value fragments may need to be handled by the
- application, since &neon; has no knowledge of the structure of
- properties used by the application.</para>
- <para>To solve this problem<footnote id="foot.xml.sax"><para>This
- <quote>problem</quote> only needs solving because the SAX interface
- is so inflexible when implemented as C function callbacks; a better
- approach would be to use an XML parser interface which is not based
- on callbacks.</para></footnote> the &neon; XML interface introduces
- the concept of a <firstterm>SAX handler</firstterm>. A SAX handler
- comprises a &startelm;, a &cdata; and an &endelm; callback; the
- &startelm; callback is defined such that each handler may
- <emphasis>accept</emphasis> or <emphasis>decline</emphasis> the
- &startelm; event. Handlers are composed into a <firstterm>handler
- stack</firstterm> before parsing a document. When a new &startelm;
- event is generated by the XML parser, &neon; invokes each &startelm;
- callback in the handler stack in turn until one accepts the event.
- The handler which accepts the event will subsequently be
- passed &cdata; events if the element contains character data,
- followed by an &endelm; event when the element is closed. If no
- handler in the stack accepts a &startelm; event, the branch of the
- tree is ignored.</para>
- <para>To illustrate, given a handler A, which accepts the
- <literal>cat</literal> and <literal>age</literal> elements, and a
- handler B, which accepts the <literal>name</literal> element, the
- following document:
- <example id="xml-example">
- <title>An example XML document</title>
- <programlisting><![CDATA[
- <cat>
- <age>3</age>
- <name>Bob</name>
- </cat>
- ]]></programlisting></example>
- would be parsed as follows:
-
- <orderedlist>
- <listitem>
- <simpara>A &startelm; "cat" → <emphasis>accept</emphasis></simpara>
- </listitem>
- <listitem>
- <simpara>A &startelm; "age" → <emphasis>accept</emphasis></simpara>
- </listitem>
- <listitem>
- <simpara>A &cdata; "3"</simpara>
- </listitem>
- <listitem>
- <simpara>A &endelm; "age"</simpara>
- </listitem>
- <listitem>
- <simpara>A &startelm; "name" → <emphasis>decline</emphasis></simpara>
- </listitem>
- <listitem>
- <simpara>B &startelm; "name" → <emphasis>accept</emphasis></simpara>
- </listitem>
- <listitem>
- <simpara>B &cdata; "Bob"</simpara>
- </listitem>
- <listitem>
- <simpara>B &endelm; "name"</simpara>
- </listitem>
- <listitem>
- <simpara>A &endelm; "cat"</simpara>
- </listitem>
- </orderedlist></para>
- <para>The search for a handler which will accept a &startelm; event
- begins at the handler of the parent element and continues toward the
- top of the stack. For the root element, it begins at the base of
- the stack. In the above example, handler A is at the base, and
- handler B at the top; if the <literal>name</literal> element had any
- children, only B's &startelm; would be invoked to accept
- them.</para>
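- <para>The search order described above can be modelled in a few
- lines of C. The sketch below is not &neon;'s implementation: the
- types and functions (<literal>struct handler</literal>,
- <literal>struct parser</literal>, <literal>start_element</literal>
- and the rest) are hypothetical, state integers are left out, and
- events are fed in by hand rather than produced by a real XML parser.
- It records which handler is offered, and which accepts, each event.
- <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DECLINE 0       /* hypothetical "decline" return value */
#define MAX_DEPTH 16    /* arbitrary nesting limit for this sketch */

/* One handler: a label for tracing plus its start-element callback,
 * which returns nonzero to accept the element. */
struct handler {
    const char *label;
    int (*startelm)(const char *name);
};

/* Handler A accepts "cat" and "age"; handler B accepts "name". */
int a_start(const char *name)
{
    return strcmp(name, "cat") == 0 || strcmp(name, "age") == 0;
}

int b_start(const char *name)
{
    return strcmp(name, "name") == 0;
}

struct parser {
    const struct handler *stack;  /* stack[0] is the base of the stack */
    size_t nhandlers;
    int owner[MAX_DEPTH];         /* accepting handler per open element,
                                   * or -1 for an ignored branch */
    size_t depth;
    char trace[256];
};

static void note(struct parser *p, const char *label, const char *what,
                 const char *arg)
{
    size_t used = strlen(p->trace);
    snprintf(p->trace + used, sizeof p->trace - used, "%s %s %s; ",
             label, what, arg);
}

void start_element(struct parser *p, const char *name)
{
    int found = -1;

    assert(p->depth < MAX_DEPTH);
    if (p->depth == 0 || p->owner[p->depth - 1] >= 0) {
        /* Begin at the parent's handler (or the base, for the root)
         * and continue toward the top of the stack. */
        size_t i = p->depth > 0 ? (size_t)p->owner[p->depth - 1] : 0;

        for (; i < p->nhandlers; i++) {
            if (p->stack[i].startelm(name) != DECLINE) {
                note(p, p->stack[i].label, "accept", name);
                found = (int)i;
                break;
            }
            note(p, p->stack[i].label, "decline", name);
        }
    }
    /* Otherwise the parent was ignored, so this element is too. */
    p->owner[p->depth++] = found;
}

void char_data(struct parser *p, const char *text)
{
    int h = p->depth > 0 ? p->owner[p->depth - 1] : -1;

    if (h >= 0)
        note(p, p->stack[h].label, "cdata", text);
}

void end_element(struct parser *p, const char *name)
{
    int h;

    assert(p->depth > 0);
    h = p->owner[--p->depth];
    if (h >= 0)
        note(p, p->stack[h].label, "end", name);
}
```
- ]]></programlisting>
- With A at the base and B above it, feeding this model the events
- for <xref linkend="xml-example"/> reproduces the nine steps listed
- earlier, including A declining <literal>name</literal> before B
- accepts it.</para>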
- </sect2>
- <sect2 id="xml-state">
- <title>Maintaining state</title>
- <para>To facilitate communication between independent handlers, a
- <firstterm>state integer</firstterm> is associated with each element
- being parsed. This integer is returned by the &startelm; callback
- and is passed to the subsequent &cdata; and &endelm; callbacks
- associated with the element. The state integer of the parent
- element is also passed to each &startelm; callback; the value zero
- is used for the root element (which by definition has no
- parent).</para>
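- <para>To further extend <xref linkend="xml-example"/>: if handler A
- assigns state <literal>42</literal> to the root element
- <sgmltag>cat</sgmltag> and state <literal>50</literal> to
- <sgmltag>age</sgmltag>, and handler B assigns state
- <literal>99</literal> to <sgmltag>name</sgmltag>, the event trace
- would be as follows: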
- <orderedlist>
- <listitem>
- <simpara>A &startelm; (parent = 0, "cat") →
- <emphasis>accept</emphasis>, state = 42
- </simpara>
- </listitem>
- <listitem>
- <simpara>A &startelm; (parent = 42, "age") →
- <emphasis>accept</emphasis>, state = 50
- </simpara>
- </listitem>
- <listitem>
- <simpara>A &cdata; (state = 50, "3")</simpara>
- </listitem>
- <listitem>
- <simpara>A &endelm; (state = 50, "age")</simpara>
- </listitem>
- <listitem>
- <simpara>A &startelm; (parent = 42, "name") →
- <emphasis>decline</emphasis></simpara>
- </listitem>
- <listitem>
- <simpara>B &startelm; (parent = 42, "name") →
- <emphasis>accept</emphasis>, state = 99</simpara>
- </listitem>
- <listitem>
- <simpara>B &cdata; (state = 99, "Bob")</simpara>
- </listitem>
- <listitem>
- <simpara>B &endelm; (state = 99, "name")</simpara>
- </listitem>
- <listitem>
- <simpara>A &endelm; (state = 42, "cat")</simpara>
- </listitem>
- </orderedlist></para>
- <para>To avoid collisions between state integers used by different
- handlers, the interface definition of any handler includes the range
- of integers it will use.</para>
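- <para>A minimal sketch of how two handlers might use state
- integers follows. It reuses the values 42, 50 and 99 from the trace
- above; the ranges 1 to 50 and 51 to 100, the callback signatures and
- <literal>struct cat</literal> are assumptions made for the
- illustration, and zero is assumed to mean
- <emphasis>decline</emphasis>. None of this is taken from
- <filename>ne_xml.h</filename>.
- <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DECLINE = 0 };             /* assumed "decline" value */
enum { A_CAT = 42, A_AGE = 50 };  /* handler A: assumed range 1..50 */
enum { B_NAME = 99 };             /* handler B: assumed range 51..100 */

/* Data collected by both handlers (hypothetical). */
struct cat {
    int age;
    char name[32];
};

/* Handler A: accepts <cat> at the root and <age> inside <cat>. */
int a_startelm(int parent, const char *name)
{
    if (parent == 0 && strcmp(name, "cat") == 0)
        return A_CAT;
    if (parent == A_CAT && strcmp(name, "age") == 0)
        return A_AGE;
    return DECLINE;
}

/* Handler B: accepts <name>, but only directly inside an element that
 * handler A put into state A_CAT. B can rely on that value because A
 * publishes the states it uses. */
int b_startelm(int parent, const char *name)
{
    if (parent == A_CAT && strcmp(name, "name") == 0)
        return B_NAME;
    return DECLINE;
}

/* The state tells a cdata callback which element the text belongs to. */
void a_cdata(struct cat *c, int state, const char *text, size_t len)
{
    size_t i;

    if (state != A_AGE)
        return;     /* e.g. whitespace directly inside <cat> */
    for (i = 0; i < len && text[i] >= '0' && text[i] <= '9'; i++)
        c->age = c->age * 10 + (text[i] - '0');
}

void b_cdata(struct cat *c, int state, const char *text, size_t len)
{
    size_t used = strlen(c->name);

    if (state != B_NAME)
        return;
    assert(used < sizeof c->name);
    if (len > sizeof c->name - 1 - used)
        len = sizeof c->name - 1 - used;    /* truncate to fit */
    memcpy(c->name + used, text, len);
    c->name[used + len] = '\0';
}
```
- ]]></programlisting>
- Because the two ranges do not overlap, a state value seen by either
- handler identifies unambiguously which handler accepted the
- element.</para>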
- </sect2>
- <sect2 id="xml-ns">
- <title>XML namespaces</title>
- <para>To support XML namespaces, every element name is represented
- as a <emphasis>(namespace, name)</emphasis> pair. The &startelm;
- and &endelm; callbacks are passed namespace and name strings
- accordingly. If an element in the XML document has no declared
- namespace, the namespace given will be the empty string,
- <literal>""</literal>.</para>
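- <para>In practice a callback compares both halves of the pair.
- The helper below is a hypothetical sketch
- (<literal>struct elm_id</literal> and
- <literal>elm_matches</literal> are not &neon; names); it treats the
- empty string as <quote>no namespace</quote>, as described above, and
- uses the WebDAV namespace <literal>DAV:</literal> as its example.
- <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (namespace, name) pair identifying an element. */
struct elm_id {
    const char *nspace;   /* "" when the element has no namespace */
    const char *name;
};

/* Example identifiers: a WebDAV property and a namespace-less element. */
const struct elm_id dav_displayname = { "DAV:", "displayname" };
const struct elm_id plain_cat = { "", "cat" };

/* Nonzero when the pair passed to a callback matches `want`. Both
 * halves must be equal: the same local name in a different namespace
 * is a different element. */
int elm_matches(const char *nspace, const char *name,
                const struct elm_id *want)
{
    assert(nspace != NULL && name != NULL && want != NULL);
    return strcmp(nspace, want->nspace) == 0
        && strcmp(name, want->name) == 0;
}
```
- ]]></programlisting>
- Comparing on the local name alone would wrongly treat a
- <literal>displayname</literal> element outside the
- <literal>DAV:</literal> namespace as the WebDAV property.</para>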
- </sect2>
- </sect1>