- To: "Elliotte Rusty Harold" <email@example.com>
- Subject: RE: [xml-dev] What is the rule for parsing XML in a namespace inside HTML?
- From: "Joshua Allen" <firstname.lastname@example.org>
- Date: Thu, 15 Jul 2004 10:41:50 -0700
- Cc: "XML Developers List" <email@example.com>
- Thread-index: AcRqXoYZqKPi1p73RwewJD2k35hvdQAMWXmQ
- Thread-topic: [xml-dev] What is the rule for parsing XML in a namespace inside HTML?
> You persist in seeing this as two different things, which is
> twice as much work. People want to read web pages in their
> browsers. They also want to be able to process it with
You have perfectly described where our disagreement is. People want to
be able to write web pages which can be read in web browsers. That is
the overwhelming majority use case.
It is not a common goal for page authors to make web pages that can
optionally be imported into a contacts database, consumed in a news
aggregator, imported into a sales order system, and so on. Not only is
this an uncommon use case; it is a terrible idea. If anything, people
who provide data for such applications might choose to make their data
optionally visible within a web browser. This can be done with
XML+XSLT+CSS, as I explained, and I have seen it done with all three of
the examples I gave.
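To make the XML+XSLT+CSS approach concrete, here is a minimal sketch: a
data file with an xml-stylesheet processing instruction, plus a stylesheet
that renders it as HTML when opened in a browser. The file names, element
names, and CSS reference are illustrative, not taken from any of the
examples mentioned above.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="contacts.xsl"?>
<!-- contacts.xml: the machine-readable data (element names illustrative) -->
<contacts>
  <contact>
    <name>Jane Doe</name>
    <email>jane@example.org</email>
  </contact>
</contacts>
```

```xml
<?xml version="1.0"?>
<!-- contacts.xsl: transforms the data into HTML for human viewing;
     presentation details (fonts, colors) would live in contacts.css -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/contacts">
    <html>
      <head><link rel="stylesheet" type="text/css" href="contacts.css"/></head>
      <body>
        <xsl:for-each select="contact">
          <p>
            <b><xsl:value-of select="name"/></b>
            (<xsl:value-of select="email"/>)
          </p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The point is that the data file stays machine-primary; the browser view is
layered on top rather than scraped back out.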
So, yes, I *do* feel that machine-readable and human-readable are
normally separate scenarios. Furthermore, I feel that my suggestion of
XML+XSLT+CSS is the only reasonable choice when the two are combined.
When you want a document to serve *both* human-readable and machine
purposes, you have to decide whether human-readable is primary or
secondary. The history of the industry is littered with failed attempts
to force machines to read human-primary documents. How many times did we
try to force document authors to embed various markers in their
documents, use specific fonts, wrap boxes around fields, etc. in order
to assist the work of OCR "form readers"? How well did that work?
Screen scraping is not much different. Screen scraping is something you
use when the system was improperly designed and you have no other
choice; it is not something you intentionally design for.
Here is another way to look at it. Out of 100 random cases where someone
decides to create a new document (in HTML or XML), how many do you think
fall into each of these four scenarios? Here are my guesses:
A) Web page intended primarily for human consumption: 68
B) Same as A, but might like to support import into contacts database,
purchasing system, etc: 2
C) Data file intended primarily for use in a contacts database,
purchasing system, etc: 20
D) Same as C, but might want to let people view in a web browser: 10