Elliotte Rusty Harold wrote:
> At 1:11 PM +0200 6/5/04, Bjoern Hoehrmann wrote:
[a given XPath being useful to extract a certain thing from a certain
URL's response to GET]
>> How do you know?
> View source.
That's the thing. Viewing the source once (or, indeed, N times), seeing
a pattern (today's stories being at //html:today), and assuming it will
keep working in future is itself a rather informal kind of schema.
At least, it is by my definition of schema as learnt from the database
world, which is something like "a convention on how a given abstract
piece of information is represented". I'm not talking here about schema
in the sense perhaps more commonly found in XML, as a "validity constraint".
As well as that XPath, there's probably more to the informal schema
being used here. Unless the software that uses that XPath to extract
today's news is a totally generic XML browser supporting XSLT, CSS, etc.,
there's probably also an assumption that the document is XHTML, and that
it's human-readable text in some human language (perhaps even in a
specific dialect of English).
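
To make that concrete, here is a rough Python sketch of a client that
writes its informal schema down as named assumptions instead of leaving
them implicit. The names (TODAY_PATH, InformalSchemaBroken, extract_today)
and the sample page are made up for illustration; only the //html:today
pattern comes from the discussion above, and binding the html prefix to
the XHTML namespace is my own assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: the page is well-formed XHTML, so its elements live in the
# XHTML namespace, and "html" is the prefix we bind to it.
XHTML_NS = {"html": "http://www.w3.org/1999/xhtml"}

# Assumption gleaned from "view source": today's stories are html:today
# elements somewhere below the root.
TODAY_PATH = ".//html:today"


class InformalSchemaBroken(Exception):
    """The page no longer matches what we saw when we viewed the source."""


def extract_today(page_text):
    """Return the text of each html:today element, or fail loudly."""
    try:
        root = ET.fromstring(page_text)
    except ET.ParseError as exc:
        raise InformalSchemaBroken("page is not well-formed XML") from exc
    matches = root.findall(TODAY_PATH, XHTML_NS)
    if not matches:
        raise InformalSchemaBroken("no html:today element found")
    return ["".join(m.itertext()).strip() for m in matches]


# Hypothetical page shaped the way the informal schema expects.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<today>Story one</today><today>Story two</today>"
    "</body></html>"
)
print(extract_today(SAMPLE))  # ['Story one', 'Story two']
```

The distinct exception is the point: when the site changes, the failure
names the assumption that stopped holding rather than quietly returning
an empty list.
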
Information about the structure of a site gleaned from viewing the
source may be subject to random change; if the site published a schema
(be it a formal machine-readable schema or a paragraph of text like the
one above), they would then have the opportunity to also state how far
users can rely on that structure not changing in future. They may lie,
of course, but
people will have more cause to complain if they "said" they wouldn't
change it; so when some software that relies on it breaks, the author of
the software can say "Hey! The news site broke its promise" rather than
"Uh, I made an assumption that no longer holds"...