<!-- neon XML interface -*- text -*- -->
<sect1 id="xml">
<title>Parsing XML</title>
<para>The &neon; XML interface is exposed by the
<filename>ne_xml.h</filename> header file. This interface provides a
wrapper around the standard <ulink
url="http://www.saxproject.org/">SAX</ulink> API used by XML
parsers, adding an abstraction called <firstterm>stacked SAX
handlers</firstterm> and consistent <ulink
url="http://www.w3.org/TR/REC-xml-names">XML Namespace</ulink> support.</para>
<sect2 id="xml-sax">
<title>Introduction to SAX</title>
<para>A SAX-based parser works by emitting a sequence of
<firstterm>events</firstterm> to reflect the tokens being parsed
from the XML document. For example, parsing the following document
fragment:
<programlisting><![CDATA[
<hello>world</hello>
]]></programlisting>
results in the following events:
<orderedlist>
<listitem>
<simpara>&startelm; "hello"</simpara>
</listitem>
<listitem>
<simpara>&cdata; "world"</simpara>
</listitem>
<listitem>
<simpara>&endelm; "hello"</simpara>
</listitem>
</orderedlist>
This example demonstrates the three event types used in the
subset of SAX exposed by the &neon; XML interface: &startelm;,
&cdata; and &endelm;. In a C API, an <quote>event</quote> is
implemented as a function callback; &neon; uses three callback
types, one for each type of event.</para>
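<para>The event model can be illustrated with a small, self-contained C
sketch. It does not use <filename>ne_xml.h</filename>: the callback
typedefs (<type>startelm_fn</type>, <type>cdata_fn</type>,
<type>endelm_fn</type>) and the toy tokenizer
<function>toy_parse</function> are hypothetical, and the tokenizer
understands only bare start tags, end tags and text (no attributes,
entities, comments or empty-element tags). Each callback appends a
token to a trace buffer so that the emitted event sequence can be
inspected.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback types, one per event kind. They model the
 * three-event subset of SAX; they are not the ne_xml.h typedefs. */
typedef void startelm_fn(void *userdata, const char *name);
typedef void cdata_fn(void *userdata, const char *text, size_t len);
typedef void endelm_fn(void *userdata, const char *name);

/* Accumulates events as a comma-separated trace. */
struct trace { char buf[256]; };

static void record(struct trace *t, const char *kind, const char *s, size_t len)
{
    size_t used = strlen(t->buf);
    int n = snprintf(t->buf + used, sizeof t->buf - used, "%s%s:%.*s",
                     used ? "," : "", kind, (int)len, s);
    assert(n >= 0 && (size_t)n < sizeof t->buf - used); /* no truncation */
}

static void on_start(void *ud, const char *name) { record(ud, "start", name, strlen(name)); }
static void on_cdata(void *ud, const char *text, size_t len) { record(ud, "cdata", text, len); }
static void on_end(void *ud, const char *name) { record(ud, "end", name, strlen(name)); }

/* Toy tokenizer: emits one event per start tag, end tag or run of text.
 * Returns 0 on success, -1 on an unterminated, empty or over-long tag. */
static int toy_parse(const char *doc, startelm_fn *startelm, cdata_fn *cdata,
                     endelm_fn *endelm, void *userdata)
{
    char name[64];
    const char *p = doc;

    while (*p != '\0') {
        if (*p == '<') {
            int closing = (p[1] == '/');
            const char *start = p + 1 + closing;
            const char *gt = strchr(start, '>');
            size_t len;

            if (gt == NULL)
                return -1;                 /* unterminated tag */
            len = (size_t)(gt - start);
            if (len == 0 || len >= sizeof name)
                return -1;                 /* empty or too long */
            memcpy(name, start, len);
            name[len] = '\0';
            if (closing)
                endelm(userdata, name);
            else
                startelm(userdata, name);
            p = gt + 1;
        } else {
            /* Character data runs up to the next '<' or end of input. */
            const char *lt = strchr(p, '<');
            size_t len = lt ? (size_t)(lt - p) : strlen(p);
            cdata(userdata, p, len);
            p += len;
        }
    }
    return 0;
}
]]></programlisting>
<para>Feeding <literal>&lt;hello&gt;world&lt;/hello&gt;</literal> to
<function>toy_parse</function> with the three recording callbacks
yields the trace <literal>start:hello,cdata:world,end:hello</literal>,
matching the three events listed above.</para>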
</sect2>
<sect2 id="xml-stacked">
<title>Stacked SAX handlers</title>
<para>WebDAV property values are represented as fragments of XML,
transmitted as parts of larger XML documents over HTTP (notably in
the body of the response to a <literal>PROPFIND</literal> request).
When &neon; parses such documents, the SAX events generated for
these property value fragments may need to be handled by the
application, since &neon; has no knowledge of the structure of
properties used by the application.</para>
<para>To solve this problem<footnote id="foot.xml.sax"><para>This
<quote>problem</quote> only needs solving because the SAX interface
is so inflexible when implemented as C function callbacks; a better
approach would be to use an XML parser interface which is not based
on callbacks.</para></footnote>, the &neon; XML interface introduces
the concept of a <firstterm>SAX handler</firstterm>. A SAX handler
comprises &startelm;, &cdata; and &endelm; callbacks; the
&startelm; callback is defined such that each handler may
<emphasis>accept</emphasis> or <emphasis>decline</emphasis> the
&startelm; event. Handlers are composed into a <firstterm>handler
stack</firstterm> before parsing a document. When a new &startelm;
event is generated by the XML parser, &neon; invokes each &startelm;
callback in the handler stack in turn until one accepts the event.
The handler which accepts the event will subsequently be
passed &cdata; events if the element contains character data,
followed by an &endelm; event when the element is closed. If no
handler in the stack accepts a &startelm; event, that element and
its descendants (the whole branch of the tree) are ignored.</para>
<para>To illustrate, given a handler A, which accepts the
<literal>cat</literal> and <literal>age</literal> elements, and a
handler B, which accepts the <literal>name</literal> element, the
following document:
<example id="xml-example">
<title>An example XML document</title>
<programlisting><![CDATA[
<cat>
<age>3</age>
<name>Bob</name>
</cat>
]]></programlisting></example>
would be parsed as follows:
<orderedlist>
<listitem>
<simpara>A &startelm; "cat" &rarr; <emphasis>accept</emphasis></simpara>
</listitem>
<listitem>
<simpara>A &startelm; "age" &rarr; <emphasis>accept</emphasis></simpara>
</listitem>
<listitem>
<simpara>A &cdata; "3"</simpara>
</listitem>
<listitem>
<simpara>A &endelm; "age"</simpara>
</listitem>
<listitem>
<simpara>A &startelm; "name" &rarr; <emphasis>decline</emphasis></simpara>
</listitem>
<listitem>
<simpara>B &startelm; "name" &rarr; <emphasis>accept</emphasis></simpara>
</listitem>
<listitem>
<simpara>B &cdata; "Bob"</simpara>
</listitem>
<listitem>
<simpara>B &endelm; "name"</simpara>
</listitem>
<listitem>
<simpara>A &endelm; "cat"</simpara>
</listitem>
</orderedlist></para>
<para>The search for a handler which will accept a &startelm; event
begins at the handler of the parent element and continues toward the
top of the stack. For the root element, it begins at the base of
the stack. In the above example, handler A is at the base, and
handler B at the top; if the <literal>name</literal> element had any
children, only B's &startelm; would be invoked to accept
them.</para>
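<para>The accept/decline rule and the search order can be modelled in
a few lines of C. This is a sketch under stated assumptions, not
&neon;'s implementation: each hypothetical handler is reduced to a
label and an <function>accepts</function> predicate, a fixed event
array stands in for the parser, and the stack is an array with index 0
as its base. An element that no handler accepts is marked as ignored,
and so are its descendants, because their search would have to start
from an ignored parent.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Model of stacked-handler dispatch. Handlers and events are
 * hypothetical stand-ins; nothing here is part of the neon API. */
enum ev_kind { EV_START, EV_CDATA, EV_END };
struct event { enum ev_kind kind; const char *text; };

struct handler {
    char label;                          /* e.g. 'A' or 'B' */
    int (*accepts)(const char *name);    /* nonzero means accept */
};

/* Handler A accepts cat and age; handler B accepts name. */
static int a_accepts(const char *n) { return strcmp(n, "cat") == 0 || strcmp(n, "age") == 0; }
static int b_accepts(const char *n) { return strcmp(n, "name") == 0; }

#define MAX_DEPTH 32
#define IGNORED (-1)

/* Appends "who:what:arg" to a comma-separated trace. */
static void log_ev(char *out, size_t size, char who, const char *what, const char *arg)
{
    size_t used = strlen(out);
    int n = snprintf(out + used, size - used, "%s%c:%s:%s",
                     used ? "," : "", who, what, arg);
    assert(n >= 0 && (size_t)n < size - used);
}

/* stack[0] is the base. For each start event the search begins at the
 * parent element's handler (the base for the root) and moves toward
 * the top. If nobody accepts, the element is IGNORED, and so are its
 * descendants, because their search has nowhere to start. */
static void dispatch(const struct handler *stack, int nhandlers,
                     const struct event *evs, int nevs, char *out, size_t size)
{
    int owner[MAX_DEPTH];   /* accepting handler per open element */
    int depth = 0;

    out[0] = '\0';
    for (int i = 0; i < nevs; i++) {
        const struct event *ev = &evs[i];

        if (ev->kind == EV_START) {
            int from = depth > 0 ? owner[depth - 1] : 0;
            int found = IGNORED;

            assert(depth < MAX_DEPTH);
            for (int h = from; h != IGNORED && h < nhandlers; h++) {
                int ok = stack[h].accepts(ev->text);
                log_ev(out, size, stack[h].label, ok ? "accept" : "decline", ev->text);
                if (ok) {
                    found = h;
                    break;
                }
            }
            owner[depth++] = found;
        } else {
            /* cdata and end events go only to the element's owner. */
            assert(depth > 0);
            if (owner[depth - 1] != IGNORED)
                log_ev(out, size, stack[owner[depth - 1]].label,
                       ev->kind == EV_CDATA ? "cdata" : "end", ev->text);
            if (ev->kind == EV_END)
                depth--;
        }
    }
}
]]></programlisting>
<para>With A at index 0 and B at index 1, the events for <xref
linkend="xml-example"/> reproduce the trace listed above. Adding a
child element inside <literal>name</literal> shows the search-order
rule: only B is asked about it, and when B declines, the child is
ignored without A ever being consulted.</para>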
</sect2>
<sect2 id="xml-state">
<title>Maintaining state</title>
<para>To facilitate communication between independent handlers, a
<firstterm>state integer</firstterm> is associated with each element
being parsed. This integer is returned by the &startelm; callback and
is passed to the subsequent &cdata; and &endelm; callbacks
associated with the element. The state integer of the parent
element is also passed to each &startelm; callback; the value zero
is used for the root element (which by definition has no
parent).</para>
<para>To further extend <xref linkend="xml-example"/>: if handler A
defines the state of the root element <sgmltag>cat</sgmltag> to be
<literal>42</literal> and the state of <sgmltag>age</sgmltag> to be
<literal>50</literal>, and handler B uses <literal>99</literal> for
<sgmltag>name</sgmltag>, the event trace would be as
follows:
<orderedlist>
<listitem>
<simpara>A &startelm; (parent = 0, "cat") &rarr;
<emphasis>accept</emphasis>, state = 42
</simpara>
</listitem>
<listitem>
<simpara>A &startelm; (parent = 42, "age") &rarr;
<emphasis>accept</emphasis>, state = 50
</simpara>
</listitem>
<listitem>
<simpara>A &cdata; (state = 50, "3")</simpara>
</listitem>
<listitem>
<simpara>A &endelm; (state = 50, "age")</simpara>
</listitem>
<listitem>
<simpara>A &startelm; (parent = 42, "name") &rarr;
<emphasis>decline</emphasis></simpara>
</listitem>
<listitem>
<simpara>B &startelm; (parent = 42, "name") &rarr;
<emphasis>accept</emphasis>, state = 99</simpara>
</listitem>
<listitem>
<simpara>B &cdata; (state = 99, "Bob")</simpara>
</listitem>
<listitem>
<simpara>B &endelm; (state = 99, "name")</simpara>
</listitem>
<listitem>
<simpara>A &endelm; (state = 42, "cat")</simpara>
</listitem>
</orderedlist></para>
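<para>A second sketch, again hypothetical and independent of the real
<filename>ne_xml.h</filename> signatures, threads state integers
through the dispatch. It assumes a convention in which a &startelm;
callback returns a positive state to accept and zero to decline.
Because zero is then never an accepted state, it can also stand for the
missing parent of the root element. Each open element records its
accepting handler and state; the state is handed to that element's
&cdata; and &endelm; callbacks and, as the parent state, to the
&startelm; callbacks of its children. Handler A uses the parent state
to accept <literal>age</literal> only inside
<literal>cat</literal>.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed convention for this sketch: return a positive state to
 * accept, 0 to decline. 0 also denotes the root element's parent. */
typedef int start_fn(int parent, const char *name);

/* Handler A: cat at the root -> 42, age inside cat -> 50. */
static int a_start(int parent, const char *name)
{
    if (parent == 0 && strcmp(name, "cat") == 0) return 42;
    if (parent == 42 && strcmp(name, "age") == 0) return 50;
    return 0;
}

/* Handler B: name inside cat -> 99. */
static int b_start(int parent, const char *name)
{
    return (parent == 42 && strcmp(name, "name") == 0) ? 99 : 0;
}

enum ev_kind { EV_START, EV_CDATA, EV_END };
struct event { enum ev_kind kind; const char *text; };
struct handler { char label; start_fn *start; };
struct open_elm { int handler; int state; };   /* handler -1 = ignored */

static void append(char *out, size_t size, const char *tok)
{
    size_t used = strlen(out);
    int n = snprintf(out + used, size - used, "%s%s", used ? "," : "", tok);
    assert(n >= 0 && (size_t)n < size - used);
}

static void run(const struct handler *hs, int nh, const struct event *evs,
                int nevs, char *out, size_t size)
{
    struct open_elm elms[32];
    int depth = 0;
    char tok[64];

    out[0] = '\0';
    for (int i = 0; i < nevs; i++) {
        const char *text = evs[i].text;

        if (evs[i].kind == EV_START) {
            int parent = depth > 0 ? elms[depth - 1].state : 0;
            int from = depth > 0 ? elms[depth - 1].handler : 0;
            struct open_elm elm = { -1, 0 };

            assert(depth < 32);
            for (int h = from; h >= 0 && h < nh; h++) {
                int st = hs[h].start(parent, text);
                snprintf(tok, sizeof tok, "%c:start(%d,%s)=%d",
                         hs[h].label, parent, text, st);
                append(out, size, tok);
                if (st > 0) {
                    elm.handler = h;
                    elm.state = st;
                    break;
                }
            }
            elms[depth++] = elm;
        } else {
            struct open_elm elm;

            assert(depth > 0);
            elm = elms[depth - 1];
            if (elm.handler >= 0) {
                /* cdata and end-element receive the element's own state. */
                snprintf(tok, sizeof tok, "%c:%s(%d,%s)", hs[elm.handler].label,
                         evs[i].kind == EV_CDATA ? "cdata" : "end", elm.state, text);
                append(out, size, tok);
            }
            if (evs[i].kind == EV_END)
                depth--;
        }
    }
}
]]></programlisting>
<para>Run over the example events, this produces tokens such as
<literal>A:start(0,cat)=42</literal> and
<literal>B:cdata(99,Bob)</literal>, in the same order as the trace
above.</para>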
<para>To avoid collisions between state integers used by different
handlers, the interface definition of any handler includes the range
of integers it will use.</para>
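<para>One way to honour this convention in C is to publish each
handler's range as constants alongside its interface and let the
compiler reject overlaps. The sketch below is illustrative only: the
macro names and bounds are hypothetical, chosen so that the ranges
contain the states 42, 50 and 99 from the example above, and the
compile-time checks use C11 <literal>_Static_assert</literal>.</para>
<programlisting><![CDATA[
#include <assert.h>

/* Hypothetical inclusive state ranges, as two independent handlers
 * might publish them. 0 stays unused: it means "no parent". */
#define A_STATE_FIRST 40
#define A_STATE_LAST  59
#define B_STATE_FIRST 90
#define B_STATE_LAST  99

/* States each handler actually returns, kept inside its range. */
enum { A_STATE_CAT = 42, A_STATE_AGE = 50 };
enum { B_STATE_NAME = 99 };

/* Build fails if the published ranges share any integer... */
_Static_assert(A_STATE_LAST < B_STATE_FIRST || B_STATE_LAST < A_STATE_FIRST,
               "handler state ranges overlap");
/* ...or if a handler returns a state outside its own range. */
_Static_assert(A_STATE_CAT >= A_STATE_FIRST && A_STATE_AGE <= A_STATE_LAST,
               "handler A state out of range");
_Static_assert(B_STATE_NAME >= B_STATE_FIRST && B_STATE_NAME <= B_STATE_LAST,
               "handler B state out of range");

struct state_range { int first, last; };   /* inclusive bounds */

static struct state_range make_range(int first, int last)
{
    struct state_range r = { first, last };
    assert(first <= last);
    return r;
}

/* Nonzero if ranges a and b share at least one integer. */
static int ranges_overlap(struct state_range a, struct state_range b)
{
    return a.first <= b.last && b.first <= a.last;
}

/* Nonzero if state s lies in range r; a handler could use this to
 * check that a state passed to its callbacks is one of its own. */
static int owns_state(struct state_range r, int s)
{
    return r.first <= s && s <= r.last;
}
]]></programlisting>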
</sect2>
<sect2 id="xml-ns">
<title>XML namespaces</title>
<para>To support XML namespaces, every element name is represented
as a <emphasis>(namespace, name)</emphasis> pair. The &startelm;
and &endelm; callbacks are passed namespace and name strings
accordingly. If an element in the XML document has no declared
namespace, the namespace given will be the empty string,
<literal>""</literal>.</para>
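<para>Since the callbacks receive the namespace and name already
separated, applications never resolve prefixes themselves. As a rough
model of what that resolution involves, the C sketch below splits a
qualified element name such as <literal>D:prop</literal> at the colon
and looks the prefix up in a flat table of bindings. The
<type>ns_binding</type> table and the <function>resolve</function>
function are hypothetical; a real parser scopes
<literal>xmlns</literal> declarations to the element on which they
appear. An unprefixed name with no default namespace resolves to the
empty string, as described above.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix-to-URI binding; prefix "" is the default namespace. */
struct ns_binding { const char *prefix; const char *uri; };

/* The (namespace, name) pair handed to callbacks in this model. */
struct qname { const char *nspace; const char *name; };

/* Splits an element name at the first ':' and resolves the prefix
 * against a flat binding table. An unprefixed name with no default
 * binding gets the empty namespace "". Returns 0 on success, or -1 if
 * the prefix is undeclared. */
static int resolve(const char *raw, const struct ns_binding *b, size_t nb,
                   struct qname *out)
{
    const char *colon;
    size_t plen;

    assert(raw != NULL && out != NULL);
    colon = strchr(raw, ':');
    plen = colon ? (size_t)(colon - raw) : 0;   /* 0 = unprefixed */

    out->name = colon ? colon + 1 : raw;
    out->nspace = "";
    for (size_t i = 0; i < nb; i++) {
        if (strlen(b[i].prefix) == plen && strncmp(b[i].prefix, raw, plen) == 0) {
            out->nspace = b[i].uri;
            return 0;
        }
    }
    /* Unprefixed names may legitimately have no namespace. */
    return colon ? -1 : 0;
}
]]></programlisting>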
</sect2>
</sect1>