- From: Rick JELLIFFE <firstname.lastname@example.org>
- To: email@example.com
- Date: Tue, 01 Aug 2000 20:45:36 +0800
Sean McGrath wrote:
> [Rick Jelliffe]
> >That there can be several different lexical forms in XML for the same
> >information item allows one to use text-based tools such as UNIX tools.
> >(The one I recommend is always to keep markup and data for titles and
> >searchable strings on a single line, so that greps will work.)
> But the fact that the "same item" can have so many different lexical
> forms means that getting the right answer every time necessitates
> parsing the XML.
Err, yes. If you parse some text as a series of lines you will get one
result, if you parse it as XML you will get another, and if you parse it
as C yet another. So what?
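
To make the point concrete, here is a rough sketch in Python. The sample
documents, the pattern, and the helper names grep_lines and title_text are
all made up for illustration; nothing here comes from a real toolchain. Both
strings carry the same title information, but only the one-line form is
found by a line-oriented search, while an XML parser reports the same text
for both.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical lexical forms of the same information:
# form_a keeps the title markup and data on a single line,
# form_b spreads the tags over several lines and uses a
# character reference (&#38;) instead of &amp;.
form_a = '<doc><title lang="en">Parsing &amp; Infosets</title></doc>'
form_b = "<doc><title\n  lang='en'\n>Parsing &#38; Infosets</title\n></doc>"

# A grep-like pattern that expects the whole title element on one line.
TITLE_PATTERN = r"<title[^>]*>.*</title>"


def grep_lines(text, pattern):
    """Return the lines of text that match pattern, as grep would."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


def title_text(xml_source):
    """Parse the source as XML and return the character data of <title>."""
    return ET.fromstring(xml_source).find("title").text


if __name__ == "__main__":
    for name, form in (("form_a", form_a), ("form_b", form_b)):
        print(name, "lines matched:", len(grep_lines(form, TITLE_PATTERN)))
        print(name, "parsed title:", title_text(form))
```

Parsed as lines, the two forms give different answers; parsed as XML, they
give the same one. Which answer is "right" depends only on which kind of
parse was asked for.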
> >The infoset lets people know what information will be in the parsed XML,
> >regardless of which lexical form was used.
> All forms of accurate XML data processing - even the dumbest
> lexical processing - involve parsing of some form. The idea
> non-parsed <--> parsed
> are two ends of an extreme with clear blue water in
> between does not seem right to me.
I would agree, but in discussing XML processing, "parsed" is shorthand for
"parsed-as-XML" and "non-parsed" is shorthand for "not parsed-as-XML". There
is no need for me to write "what information will be in the parsed-as-XML
XML": in the context of talking about XML processing, what is meant by
"parsed XML" should be obvious.
The XML Infoset is not a set of categories determined by science or
nature, it is a policy document derived by engineering and negotiation
which identifies and grades the various kinds of information that a
parsed XML document has, for use in various W3C specs. Having an infoset
spec gives spec-making groups an indication of what mainstream
requirements are: for example, "should the DOM report line numbers?" is
the kind of question that the infoset could help answer.