- <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Parsing XML</title><link rel="stylesheet" type="text/css" href="../manual.css"><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="neon HTTP/WebDAV client library"><link rel="up" href="api.html" title="Chapter 2. The neon C language interface"><link rel="prev" href="api.html" title="Chapter 2. The neon C language interface"><link rel="next" href="ref.html" title="neon API reference"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Parsing XML</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="api.html">Prev</a> </td><th width="60%" align="center">Chapter 2. The neon C language interface</th><td width="20%" align="right"> <a accesskey="n" href="ref.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="xml"></a>Parsing XML</h2></div></div></div><p>The neon XML interface is exposed by the
- <code class="filename">ne_xml.h</code> header file. This interface wraps
- the standard <a class="ulink" href="http://www.saxproject.org/" target="_top">SAX</a> API used by XML
- parsers, adds a further abstraction, <em class="firstterm">stacked SAX
- handlers</em>, and provides consistent <a class="ulink" href="http://www.w3.org/TR/REC-xml-names" target="_top">XML Namespace</a> support.</p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="xml-sax"></a>Introduction to SAX</h3></div></div></div><p>A SAX-based parser works by emitting a sequence of
- <em class="firstterm">events</em> to reflect the tokens being parsed
- from the XML document. For example, parsing the following document
- fragment:
- </p><pre class="programlisting">
- &lt;hello&gt;world&lt;/hello&gt;
- </pre><p>
- results in the following events:
- </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><span class="emphasis"><em>start-element</em></span> "hello"</li><li class="listitem"><span class="emphasis"><em>character-data</em></span> "world"</li><li class="listitem"><span class="emphasis"><em>end-element</em></span> "hello"</li></ol></div><p>
- This example demonstrates the three event types used in the
- subset of SAX exposed by the neon XML interface: <span class="emphasis"><em>start-element</em></span>,
- <span class="emphasis"><em>character-data</em></span> and <span class="emphasis"><em>end-element</em></span>. In a C API, an <span class="quote">“<span class="quote">event</span>”</span> is
- implemented as a function callback; three callback types are used in
- neon, one for each type of event.</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="xml-stacked"></a>Stacked SAX handlers</h3></div></div></div><p>WebDAV property values are represented as fragments of XML,
- transmitted as parts of larger XML documents over HTTP (notably in
- the body of the response to a <code class="literal">PROPFIND</code> request).
- When neon parses such documents, the SAX events generated for
- these property value fragments may need to be handled by the
- application, since neon has no knowledge of the structure of
- properties used by the application.</p><p>To solve this problem<a href="#ftn.foot.xml.sax" class="footnote" name="foot.xml.sax"><sup class="footnote">[1]</sup></a> the neon XML interface introduces
- the concept of a <em class="firstterm">SAX handler</em>. A SAX handler
- comprises a <span class="emphasis"><em>start-element</em></span>, <span class="emphasis"><em>character-data</em></span> and <span class="emphasis"><em>end-element</em></span> callback; the
- <span class="emphasis"><em>start-element</em></span> callback is defined such that each handler may
- <span class="emphasis"><em>accept</em></span> or <span class="emphasis"><em>decline</em></span> the
- <span class="emphasis"><em>start-element</em></span> event. Handlers are composed into a <em class="firstterm">handler
- stack</em> before parsing a document. When a new <span class="emphasis"><em>start-element</em></span>
- event is generated by the XML parser, neon invokes each <span class="emphasis"><em>start-element</em></span>
- callback in the handler stack in turn until one accepts the event.
- The handler which accepts the event will subsequently be
- passed <span class="emphasis"><em>character-data</em></span> events if the element contains character data,
- followed by an <span class="emphasis"><em>end-element</em></span> event when the element is closed. If no
- handler in the stack accepts a <span class="emphasis"><em>start-element</em></span> event, the branch of the
- tree is ignored.</p><p>To illustrate, given a handler A, which accepts the
- <code class="literal">cat</code> and <code class="literal">age</code> elements, and a
- handler B, which accepts the <code class="literal">name</code> element, the
- following document:
- </p><div class="example"><a name="xml-example"></a><p class="title"><b>Example 2.1. An example XML document</b></p><div class="example-contents"><pre class="programlisting">
- &lt;cat&gt;
- &lt;age&gt;3&lt;/age&gt;
- &lt;name&gt;Bob&lt;/name&gt;
- &lt;/cat&gt;
- </pre></div></div><p><br class="example-break">
- would be parsed as follows:
-
- </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem">A <span class="emphasis"><em>start-element</em></span> "cat" → <span class="emphasis"><em>accept</em></span></li><li class="listitem">A <span class="emphasis"><em>start-element</em></span> "age" → <span class="emphasis"><em>accept</em></span></li><li class="listitem">A <span class="emphasis"><em>character-data</em></span> "3"</li><li class="listitem">A <span class="emphasis"><em>end-element</em></span> "age"</li><li class="listitem">A <span class="emphasis"><em>start-element</em></span> "name" → <span class="emphasis"><em>decline</em></span></li><li class="listitem">B <span class="emphasis"><em>start-element</em></span> "name" → <span class="emphasis"><em>accept</em></span></li><li class="listitem">B <span class="emphasis"><em>character-data</em></span> "Bob"</li><li class="listitem">B <span class="emphasis"><em>end-element</em></span> "name"</li><li class="listitem">A <span class="emphasis"><em>end-element</em></span> "cat"</li></ol></div><p>The search for a handler which will accept a <span class="emphasis"><em>start-element</em></span> event
- begins at the handler of the parent element and continues toward the
- top of the stack. For the root element, it begins at the base of
- the stack. In the above example, handler A is at the base, and
- handler B at the top; if the <code class="literal">name</code> element had any
- children, only B's <span class="emphasis"><em>start-element</em></span> would be invoked to accept
- them.</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="xml-state"></a>Maintaining state</h3></div></div></div><p>To facilitate communication between independent handlers, a
- <em class="firstterm">state integer</em> is associated with each element
- being parsed. This integer is returned by the <span class="emphasis"><em>start-element</em></span> callback and
- is passed to the subsequent <span class="emphasis"><em>character-data</em></span> and <span class="emphasis"><em>end-element</em></span> callbacks
- associated with the element. The state integer of the parent
- element is also passed to each <span class="emphasis"><em>start-element</em></span> callback, with the value zero
- used for the root element (which by definition has no
- parent).</p><p>To further extend <a class="xref" href="xml.html#xml-example" title="Example 2.1. An example XML document">Example 2.1, “An example XML document”</a>: if handler A
- defines that the state of the root element <code class="sgmltag-element">cat</code>
- will be <code class="literal">42</code>, the event trace would be as
- follows:
- </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem">A <span class="emphasis"><em>start-element</em></span> (parent = 0, "cat") →
- <span class="emphasis"><em>accept</em></span>, state = 42
- </li><li class="listitem">A <span class="emphasis"><em>start-element</em></span> (parent = 42, "age") →
- <span class="emphasis"><em>accept</em></span>, state = 50
- </li><li class="listitem">A <span class="emphasis"><em>character-data</em></span> (state = 50, "3")</li><li class="listitem">A <span class="emphasis"><em>end-element</em></span> (state = 50, "age")</li><li class="listitem">A <span class="emphasis"><em>start-element</em></span> (parent = 42, "name") →
- <span class="emphasis"><em>decline</em></span></li><li class="listitem">B <span class="emphasis"><em>start-element</em></span> (parent = 42, "name") →
- <span class="emphasis"><em>accept</em></span>, state = 99</li><li class="listitem">B <span class="emphasis"><em>character-data</em></span> (state = 99, "Bob")</li><li class="listitem">B <span class="emphasis"><em>end-element</em></span> (state = 99, "name")</li><li class="listitem">A <span class="emphasis"><em>end-element</em></span> (state = 42, "cat")</li></ol></div><p>To avoid collisions between state integers used by different
- handlers, the interface definition of any handler includes the range
- of integers it will use.</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="xml-ns"></a>XML namespaces</h3></div></div></div><p>To support XML namespaces, every element name is represented
- as a <span class="emphasis"><em>(namespace, name)</em></span> pair. The <span class="emphasis"><em>start-element</em></span>
- and <span class="emphasis"><em>end-element</em></span> callbacks are passed namespace and name strings
- accordingly. If an element in the XML document has no declared
- namespace, the namespace given will be the empty string,
- <code class="literal">""</code>.</p></div><div class="footnotes"><br><hr style="width:100; text-align:left;margin-left: 0"><div id="ftn.foot.xml.sax" class="footnote"><p><a href="#foot.xml.sax" class="para"><sup class="para">[1] </sup></a>This
- <span class="quote">“<span class="quote">problem</span>”</span> only needs solving because the SAX interface
- is so inflexible when implemented as C function callbacks; a better
- approach would be to use an XML parser interface which is not based
- on callbacks.</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="api.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="api.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ref.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 2. The neon C language interface </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> neon API reference</td></tr></table></div></body></html>