At 10:21 AM -0700 6/4/04, Howard Katz wrote:
>I don't understand this last point, Elliotte. How can a properly designed
>application ask whether a document contains the information it needs without
>knowing about the document's structure? If you add information, you're most
>likely changing the structure, and consequently the schema. How can an
>application cope with ad hoc changes like that w/out looking at the schema,
>ie without doing validation?
Let me answer with an example. Suppose you want to extract today's
news from Cafe con Leche, an invalid XHTML document. The following
XPath will do it:
//html:today
(assuming the html prefix has been bound to the XHTML namespace in
whatever environment you're using). You need to know nothing else
about what surrounds the today element, where it's positioned in the
document, or even how many today elements there are. You don't care
what the today elements contain. You don't care what contains them.
It is a very robust solution, much more so than solutions based on
explicit knowledge that the today element is the seventh child of a
td element that is the first child of a tr element that is the
first child of a table element that is a child of the only table
element that is the second child of a body element that is the only
body child of an html element which is the root element of the
document.
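To make the contrast concrete, here is a minimal sketch in Python using the standard library's ElementTree. The two sample pages are made-up stand-ins, not the real Cafe con Leche markup, and the function names are only for illustration. ElementTree does not accept an absolute path on an element, so the query is written .//html:today rather than //html:today.

```python
import xml.etree.ElementTree as ET

# Bind the html prefix to the XHTML namespace, as the XPath assumes.
NS = {"html": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample pages: well-formed, but "today" is not an XHTML element.
PAGE_V1 = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table><tr><td>
      <p>Intro</p>
      <today><p>News item A</p></today>
    </td></tr></table>
  </body>
</html>"""

# Same information after a redesign that drops the layout table.
PAGE_V2 = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div>banner</div>
    <div><today><p>News item A</p></today></div>
  </body>
</html>"""


def todays_news(xml_text):
    """Find every today element, wherever it sits in the document."""
    root = ET.fromstring(xml_text)
    return root.findall(".//html:today", NS)


def todays_news_by_position(xml_text):
    """Find today elements only along one fixed path from the root."""
    root = ET.fromstring(xml_text)
    path = "html:body/html:table/html:tr/html:td/html:today"
    return root.findall(path, NS)


if __name__ == "__main__":
    for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        robust = todays_news(page)
        brittle = todays_news_by_position(page)
        print(name, "descendant query:", len(robust),
              "positional query:", len(brittle))
```

The descendant query keeps working on the second page, while the positional one silently returns nothing once the table goes away. Neither function consults a schema.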
At no point do you need to know the schema for the page in order to
extract information from it. Indeed if you tried to do that, you'd
fail because the page is invalid and the relevant information is
found in elements that don't even exist in the schema.
--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA