> ___Note that a SystemLiteral can be
> parsed without scanning for markup.___
> I wonder what the last sentence really wants to say.
>
> I think I must have missed certain background knowledge about it.
>
> Could you give me some explanation ?
>
> Thank you!
Well, I can't tell you why this sentence is _there_, but I suspect it
refers for the most part to entity literal values. Entity literals may
contain markup constructs, as in:
<!ENTITY foo "<foo></foo>">
What the sentence is saying is that you can "parse" this literal without
scanning for the markup. While this is technically true, there are some
caveats. Entity literals must be well-formed on their own, but only when
they are referenced (i.e., &foo;...); this one I learned from Richard
Tobin. So technically you can parse the literal without scanning for
markup, but eventually you need to insert it at the point where it is
referenced and then check that it is well-formed (which can be done by
simply doing a standard parse and checking that the start and end tags
are well balanced, which I learned from Bob Foster).
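To make the "only when referenced" point concrete, here is a minimal sketch in Python. It assumes the expat-based parser behind the standard library's xml.etree.ElementTree; the helper name is_well_formed and the sample documents are made up for illustration, and other parsers may well report things differently.

```python
import xml.etree.ElementTree as ET

# Two internal subsets: one entity with balanced markup, one without.
DTD_OK = '<!DOCTYPE r [<!ENTITY foo "<foo></foo>">]>'
DTD_BAD = '<!DOCTYPE r [<!ENTITY foo "<foo>">]>'  # start tag is never closed


def is_well_formed(document):
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The unbalanced literal is declared but never referenced.
unreferenced_bad = DTD_BAD + "<r/>"
# The same literal, now pulled into the content via &foo;.
referenced_bad = DTD_BAD + "<r>&foo;</r>"
# The balanced literal, referenced in the content.
referenced_ok = DTD_OK + "<r>&foo;</r>"

for name, doc in [
    ("unreferenced_bad", unreferenced_bad),
    ("referenced_bad", referenced_bad),
    ("referenced_ok", referenced_ok),
]:
    print(name, is_well_formed(doc))
```

With this parser, the declaration alone goes through untouched; the unbalanced markup only becomes an error once the reference forces the replacement text to be parsed as content.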
Another important caveat is that while parsing without scanning for
markup, you do actually need to scan for references: character
references and parameter entity references need to be expanded, and you
need to check that regular (general) entity references actually refer to
declared entities (even if the entity you are currently parsing is never
referenced). Very few parsers actually go to the trouble on that last
point, though...
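As a rough illustration of that kind of scan, the sketch below walks an entity literal looking only for references and ignores any markup in it. Everything here is an assumption for the sake of the example: the names REFERENCE, scan_references and undeclared_general are invented, and the name pattern is a simplification of the XML Name production, not the real thing.

```python
import re

# Simplified reference syntax; the Name part is an approximation only.
REFERENCE = re.compile(
    r"&#x(?P<hex>[0-9A-Fa-f]+);"        # hexadecimal character reference
    r"|&#(?P<dec>[0-9]+);"              # decimal character reference
    r"|%(?P<pe>[A-Za-z_:][\w.:-]*);"    # parameter entity reference
    r"|&(?P<ge>[A-Za-z_:][\w.:-]*);"    # general entity reference
)


def scan_references(literal):
    """Return (kind, value) pairs for each reference found in the literal."""
    found = []
    for m in REFERENCE.finditer(literal):
        if m.group("hex") is not None:
            found.append(("char", int(m.group("hex"), 16)))
        elif m.group("dec") is not None:
            found.append(("char", int(m.group("dec"))))
        elif m.group("pe") is not None:
            found.append(("parameter", m.group("pe")))
        else:
            found.append(("general", m.group("ge")))
    return found


def undeclared_general(literal, declared):
    """List general entity names in the literal that are not in `declared`."""
    return [
        value
        for kind, value in scan_references(literal)
        if kind == "general" and value not in declared
    ]


print(scan_references("<foo>&#65;&#x42;%pe;&bar;</foo>"))
print(undeclared_general("&bar;&baz;", {"bar"}))
```

Note that the tags in the first literal are never looked at; only the references are picked out, which is the sense in which the literal is parsed without scanning for markup.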
I am babbling... I spent a ton of time on this about a month and a half
ago. It is useful stuff; if you want to read more, simply look for a
bunch of silly questions from me and the very brilliant responses from
the smart folks...
Cheers,
Jeff Rafter