<!DOCTYPE refentry [
<!-- Fill in your name for FIRSTNAME and SURNAME. -->
<!ENTITY dhfirstname "<firstname>Scott</firstname>">
<!ENTITY dhsurname "<surname>Bronson</surname>">
<!-- Please adjust the date whenever revising the manpage. -->
<!ENTITY dhdate "<date>March 11, 2016</date>">
<!-- SECTION should be 1-8, maybe w/ subsection other parameters are
allowed: see man(7), man(1). -->
<!ENTITY dhsection "<manvolnum>1</manvolnum>">
<!ENTITY dhemail "<email>[email protected]</email>">
<!ENTITY dhusername "Scott Bronson">
<!ENTITY dhucpackage "<refentrytitle>XMLWF</refentrytitle>">
<!ENTITY dhpackage "xmlwf">
<!ENTITY debian "<productname>Debian GNU/Linux</productname>">
<!ENTITY gnu "<acronym>GNU</acronym>">
]>
<refentry>
<refentryinfo>
<address>
&dhemail;
</address>
<author>
&dhfirstname;
&dhsurname;
</author>
<copyright>
<year>2001</year>
<holder>&dhusername;</holder>
</copyright>
&dhdate;
</refentryinfo>
<refmeta>
&dhucpackage;
&dhsection;
</refmeta>
<refnamediv>
<refname>&dhpackage;</refname>
<refpurpose>Determines if an XML document is well-formed</refpurpose>
</refnamediv>
<refsynopsisdiv>
<cmdsynopsis>
<command>&dhpackage;</command>
<arg><option>-s</option></arg>
<arg><option>-n</option></arg>
<arg><option>-p</option></arg>
<arg><option>-x</option></arg>
<arg><option>-e <replaceable>encoding</replaceable></option></arg>
<arg><option>-w</option></arg>
<arg><option>-d <replaceable>output-dir</replaceable></option></arg>
<arg><option>-c</option></arg>
<arg><option>-m</option></arg>
<arg><option>-r</option></arg>
<arg><option>-t</option></arg>
<arg><option>-N</option></arg>
<arg><option>-v</option></arg>
<arg>file ...</arg>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
<title>DESCRIPTION</title>
<para>
<command>&dhpackage;</command> uses the Expat library to
determine if an XML document is well-formed. It is
non-validating.
</para>
<para>
If you do not specify any files on the command line, and you
have a recent version of <command>&dhpackage;</command>, the
input is read from standard input.
</para>
</refsect1>
<refsect1>
<title>WELL-FORMED DOCUMENTS</title>
<para>
A well-formed document must adhere to the
following rules:
</para>
<itemizedlist>
<listitem><para>
The file begins with an XML declaration. For instance,
<literal>&lt;?xml version="1.0" standalone="yes"?&gt;</literal>.
(XML 1.0 recommends the declaration but does not strictly
require it.)
<emphasis>NOTE:</emphasis>
<command>&dhpackage;</command> does not currently
check for a valid XML declaration.
</para></listitem>
<listitem><para>
Every start tag is either empty (&lt;tag/&gt;)
or has a corresponding end tag.
</para></listitem>
<listitem><para>
There is exactly one root element. This element must contain
all other elements in the document. Only comments, white
space, and processing instructions may come after the close
of the root element.
</para></listitem>
<listitem><para>
All elements nest properly.
</para></listitem>
<listitem><para>
All attribute values are enclosed in quotes (either single
or double).
</para></listitem>
</itemizedlist>
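<para>
As an illustration (these samples are invented for this page and are
not taken from the <command>&dhpackage;</command> distribution), the
following small document satisfies every rule above:
<literallayout>
&lt;?xml version="1.0" standalone="yes"?&gt;
&lt;doc&gt;
&lt;item id="1"&gt;text&lt;/item&gt;
&lt;empty/&gt;
&lt;/doc&gt;
</literallayout>
By contrast, this one is not well-formed: the attribute value is
unquoted, and the <literal>b</literal> and <literal>i</literal>
elements overlap instead of nesting:
<literallayout>
&lt;doc&gt;
&lt;b&gt;&lt;i id=1&gt;text&lt;/b&gt;&lt;/i&gt;
&lt;/doc&gt;
</literallayout>
</para>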
<para>
If the document has a DTD, and it strictly complies with that
DTD, then the document is also considered <emphasis>valid</emphasis>.
<command>&dhpackage;</command> is a non-validating parser:
it does not check the document against the DTD. However, it does
support external entities (see the <option>-x</option> option).
</para>
</refsect1>
<refsect1>
<title>OPTIONS</title>
<para>
When an option includes an argument, you may specify the argument either
separately ("<option>-d</option> output") or concatenated with the
option ("<option>-d</option>output"). <command>&dhpackage;</command>
supports both.
</para>
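<para>
For example, assuming a hypothetical input file
<filename>doc.xml</filename> and an existing directory named
<filename>output</filename>, these two invocations should be
equivalent:
</para>
<literallayout>
&dhpackage; -d output doc.xml
&dhpackage; -doutput doc.xml
</literallayout>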
<variablelist>
<varlistentry>
<term><option>-c</option></term>
<listitem>
<para>
If the input file is well-formed and <command>&dhpackage;</command>
doesn't encounter any errors, the input file is simply copied to
the output directory unchanged.
This implies no namespaces (turns off <option>-n</option>) and
requires <option>-d</option> to specify an output directory.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-d output-dir</option></term>
<listitem>
<para>
Specifies a directory to contain transformed
representations of the input files.
By default, <option>-d</option> outputs a canonical representation
(described below).
You can select different output formats using <option>-c</option>,
<option>-m</option> and <option>-N</option>.
</para>
<para>
The output filenames will
be exactly the same as the input filenames or "STDIN" if the input is
coming from standard input. Therefore, you must be careful that the
output file does not go into the same directory as the input
file. Otherwise, <command>&dhpackage;</command> will delete the
input file before it generates the output file (just like running
<literal>cat &lt; file &gt; file</literal> in most shells).
</para>
<para>
Two structurally equivalent XML documents have a byte-for-byte
identical canonical XML representation.
Note that ignorable white space is considered significant and
is treated equivalently to data.
More on canonical XML can be found at
http://www.jclark.com/xml/canonxml.html .
</para>
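<para>
As a rough sketch of what canonical output looks like (the file and
directory names are hypothetical, and the directory is assumed to
exist already), suppose <filename>doc.xml</filename> contains:
<literallayout>
&lt;?xml version="1.0"?&gt;
&lt;doc b='2' a="1"&gt;&lt;empty/&gt;&lt;/doc&gt;
</literallayout>
Running
<literallayout>
&dhpackage; -d out doc.xml
</literallayout>
should then write a file <filename>out/doc.xml</filename> along these
lines, with the XML declaration dropped, the attributes sorted by name
and double-quoted, and the empty element written as a start tag
followed by an end tag:
<literallayout>
&lt;doc a="1" b="2"&gt;&lt;empty&gt;&lt;/empty&gt;&lt;/doc&gt;
</literallayout>
</para>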
</listitem>
</varlistentry>
<varlistentry>
<term><option>-e encoding</option></term>
<listitem>
<para>
Specifies the character encoding for the document, overriding
any document encoding declaration. <command>&dhpackage;</command>
supports four built-in encodings:
<literal>US-ASCII</literal>,
<literal>UTF-8</literal>,
<literal>UTF-16</literal>, and
<literal>ISO-8859-1</literal>.
Also see the <option>-w</option> option.
</para>
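<para>
For instance, to parse a hypothetical file
<filename>legacy.xml</filename> as Latin-1 regardless of what its own
encoding declaration says:
<literallayout>
&dhpackage; -e ISO-8859-1 legacy.xml
</literallayout>
</para>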
</listitem>
</varlistentry>
<varlistentry>
<term><option>-m</option></term>
<listitem>
<para>
Outputs an XML "meta" document that describes the input file in
detail, including character positions.
Requires <option>-d</option> to specify an output directory.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-n</option></term>
<listitem>
<para>
Turns on namespace processing: element and attribute names are
interpreted according to the XML Namespaces rules, so every prefix
that is used must also be declared.
<option>-c</option> disables namespaces.
</para>
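<para>
As a sketch of the difference (the file name is hypothetical), a
document consisting only of
<literallayout>
&lt;a:doc/&gt;
</literallayout>
is well-formed as plain XML 1.0, so <command>&dhpackage;</command>
accepts it by default. With <option>-n</option>, the prefix
<literal>a</literal> has to be bound by an <literal>xmlns:a</literal>
declaration, so
<literallayout>
&dhpackage; -n doc.xml
</literallayout>
is expected to report an unbound prefix.
</para>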
</listitem>
</varlistentry>
<varlistentry>
<term><option>-N</option></term>
<listitem>
<para>
Adds a doctype and notation declarations to canonical XML output.
This matches the example output used by the formal XML test cases.
Requires <option>-d</option> to specify an output directory.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-p</option></term>
<listitem>
<para>
Tells <command>&dhpackage;</command> to process external DTDs and
parameter entities.
</para>
<para>
Normally <command>&dhpackage;</command> never parses parameter
entities. <option>-p</option> tells it to always parse them.
<option>-p</option> implies <option>-x</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
<listitem>
<para>
Normally <command>&dhpackage;</command> memory-maps the XML file
before parsing; this can result in faster parsing on many
platforms.
<option>-r</option> turns off memory-mapping and uses normal file
IO calls instead.
Of course, memory-mapping is automatically turned off
when reading from standard input.
</para>
<para>
Use of memory-mapping can cause some platforms to report
substantially higher memory usage for
<command>&dhpackage;</command>, but this appears to be a matter of
the operating system reporting memory in a strange way; there is
not a leak in <command>&dhpackage;</command>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-s</option></term>
<listitem>
<para>
Prints an error if the document is not standalone.
A document is standalone if it has no external subset and no
references to parameter entities.
</para>
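<para>
For example, a document that begins as follows has an external subset
(the file names are hypothetical), so
<literal>&dhpackage; -s doc.xml</literal> is expected to report that it
is not standalone:
<literallayout>
&lt;!DOCTYPE doc SYSTEM "doc.dtd"&gt;
&lt;doc/&gt;
</literallayout>
</para>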
</listitem>
</varlistentry>
<varlistentry>
<term><option>-t</option></term>
<listitem>
<para>
Turns on timings. This tells Expat to parse the entire file,
but not perform any processing.
This gives a fairly accurate idea of the raw speed of Expat itself
without client overhead.
<option>-t</option> turns off most of the output options
(<option>-d</option>, <option>-m</option>, <option>-c</option>, ...).
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-v</option></term>
<listitem>
<para>
Prints the version of the Expat library being used, including some
information on the compile-time configuration of the library, and
then exits.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-w</option></term>
<listitem>
<para>
Enables support for Windows code pages.
Normally, <command>&dhpackage;</command> reports an error if it
runs across an encoding that it is not equipped to handle itself. With
<option>-w</option>, <command>&dhpackage;</command> will try to use a
Windows code page. See also <option>-e</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-x</option></term>
<listitem>
<para>
Turns on parsing external entities.
</para>
<para>
Non-validating parsers are not required to resolve external
entities, or even expand entities at all.
Expat always expands internal entities,
but external entity parsing must be enabled explicitly.
</para>
<para>
External entities are simply entities that obtain their
data from outside the XML file currently being parsed.
</para>
<para>
This is an example of an internal entity:
<literallayout>
&lt;!ENTITY vers '1.0.2'&gt;
</literallayout>
</para>
<para>
And here are some examples of external entities:
<literallayout>
&lt;!ENTITY header SYSTEM "header-&amp;vers;.xml"&gt; (parsed)
&lt;!ENTITY logo SYSTEM "logo.png" PNG&gt; (unparsed)
</literallayout>
</para>
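<para>
To illustrate (all file names here are hypothetical), a document that
pulls in a parsed external entity might look like this:
<literallayout>
&lt;!DOCTYPE doc [
&lt;!ENTITY header SYSTEM "header.xml"&gt;
]&gt;
&lt;doc&gt;&amp;header;&lt;/doc&gt;
</literallayout>
By default <command>&dhpackage;</command> does not read the file behind
the <literal>header</literal> entity. With
<literallayout>
&dhpackage; -x doc.xml
</literallayout>
it should also open and parse <filename>header.xml</filename>, so
errors in that file are reported as well.
</para>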
</listitem>
</varlistentry>
<varlistentry>
<term><option>--</option></term>
<listitem>
<para>
(Two hyphens.)
Terminates the list of options. This is only needed if a filename
starts with a hyphen. For example:
</para>
<literallayout>
&dhpackage; -- -myfile.xml
</literallayout>
<para>
will run <command>&dhpackage;</command> on the file
<filename>-myfile.xml</filename>.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
Older versions of <command>&dhpackage;</command> do not support
reading from standard input.
</para>
</refsect1>
<refsect1>
<title>OUTPUT</title>
<para>
If an input file is not well-formed,
<command>&dhpackage;</command> prints a single line describing
the problem to standard output. If a file is well-formed,
<command>&dhpackage;</command> outputs nothing.
</para>
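<para>
For instance, checking a hypothetical file whose tags do not match
might produce a line of roughly this shape: the file name, then the
line and column, then the message from Expat. The values shown are
illustrative only.
<literallayout>
broken.xml:3:2: mismatched tag
</literallayout>
</para>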
</refsect1>
<refsect1>
<title>EXIT STATUS</title>
<para>
For option <option>-v</option> or <option>-h</option>,
<command>&dhpackage;</command> always exits with status code 0.
For other cases, the following exit status codes are returned:
<variablelist>
<varlistentry>
<term><option>0</option></term>
<listitem><para>The input files are well-formed.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>1</option></term>
<listitem><para>An internal error occurred.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>2</option></term>
<listitem><para>An input file was not well-formed or could not be parsed.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>3</option></term>
<listitem><para>If using the <option>-d</option> option, an error occurred opening an output file.</para>
</listitem>
</varlistentry>
</variablelist>
</para>
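<para>
A shell script might act on these codes along the following lines.
This is only a sketch, and <filename>doc.xml</filename> is a
hypothetical file name:
<literallayout>
&dhpackage; doc.xml
case $? in
0) echo "well-formed" ;;
2) echo "not well-formed" ;;
*) echo "other failure" ;;
esac
</literallayout>
</para>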
</refsect1>
<refsect1>
<title>BUGS</title>
<para>
The errors should go to standard error, not standard output.
</para>
<para>
There should be a way to get <option>-d</option> to send its
output to standard output rather than forcing the user to send
it to a file.
</para>
<para>
I have no idea why anyone would want to use the
<option>-d</option>, <option>-c</option>, and
<option>-m</option> options. If someone could explain it to
me, I'd like to add this information to this manpage.
</para>
</refsect1>
<refsect1>
<title>ALTERNATIVES</title>
<para>
Here are some XML validators on the web:
<literallayout>
http://www.hcrc.ed.ac.uk/~richard/xml-check.html
http://www.stg.brown.edu/service/xmlvalid/
http://www.scripting.com/frontier5/xml/code/xmlValidator.html
http://www.xml.com/pub/a/tools/ruwf/check.html
</literallayout>
</para>
</refsect1>
<refsect1>
<title>SEE ALSO</title>
<para>
<literallayout>
The Expat home page: http://www.libexpat.org/
The W3C XML specification: http://www.w3.org/TR/REC-xml
</literallayout>
</para>
</refsect1>
<refsect1>
<title>AUTHOR</title>
<para>
This manual page was written by &dhusername; &dhemail; for
the &debian; system (but may be used by others). Permission is
granted to copy, distribute and/or modify this document under
the terms of the <acronym>GNU</acronym> Free Documentation
License, Version 1.1.
</para>
</refsect1>
</refentry>