   Re: [xml-dev] SAX and Locator


>>I am new to XML so if this is a dumb question, please don't be too
hard on me. <<

Of course not!

>>I need to know exactly where in the original document certain SAX
events originate, so I am using the Locator mechanism.<<

SAX has never been very concerned with lexical information. The locator is
intended to give accurate information but the implementations I have tried
are fairly non-conformant (if there is "conformance" at that level). This is
something some people would like to see improved (myself included) but it is
not high priority. That said, the spec itself should provide enough

>>Would it be possible to have Locator return the starting character
offset and the ending character offset of the sequence of characters
that generated the SAX event? For example, after a tag is processed,
within the corresponding startElement(...) callback a call to
Locator.getStartCharacter() would return the location of the "<" at
the start of the tag and a call to Locator.getEndCharacter() would
return the location  of the ">" at the end. The location would simply
be the index of the characters within the original stream.<<

You could get the start character for the whole section of character data
(presumably), as that is where startElement is called, and that is more reliable:

<foo>Some data</foo>

Again it is because SAX is trying to omit ignorable/unimportant lexical
information like the amount of space between attributes:

<foo att1="1"

    att2="2">Some data</foo>

The space between att1 and att2 is unimportant (as is the start element
itself). The SAX spec, with respect to locator info, takes the view (from my
experience) that document structures are mostly syntactic and ignorable; the
text is the renderable portion (in, say, HTML). I don't know if any of this helps.
What also won't help (at least not right away) is to add an RFE on the SAX
Project page at sourceforge. I am not sure how soon this type of thing will
be addressed though-- historically SAX has avoided any type of lexical

Your best bet would be to (bite my tongue) modify a parser if you need the
info. Of course interop goes out the window then.
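
For reference, here is a minimal sketch of what the standard Locator does give you: a line and column per event, and no character offsets. The class name LocatorDemo and the helpers trace() and toOffset() are made up for this example; only setDocumentLocator(), getLineNumber() and getColumnNumber() come from SAX itself. The Locator documentation describes the reported position as roughly the first character after the text that triggered the event and as an approximation meant for diagnostics, so treat the numbers as parser-dependent. The offset conversion assumes "\n" line endings and one Java char per column, which will not hold for every document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo extends DefaultHandler {
    // Handed over by the parser before startDocument(); stays null if the
    // parser does not supply a locator (supplying one is optional in SAX).
    private Locator locator;

    // Recorded events as "name@line:column" (format invented for this sketch).
    public final List<String> events = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        record("startElement(" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may split one run of text into several characters() calls.
        record("characters");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        record("endElement(" + qName + ")");
    }

    private void record(String event) {
        // -1 means "not available", matching the Locator convention.
        int line = (locator == null) ? -1 : locator.getLineNumber();
        int column = (locator == null) ? -1 : locator.getColumnNumber();
        events.add(event + "@" + line + ":" + column);
    }

    // Hypothetical helper: parse a string and return the recorded events.
    public static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LocatorDemo handler = new LocatorDemo();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    // Hypothetical helper: turn a 1-based line/column pair into a 0-based
    // index into the text. Assumes "\n" line endings and one char per column.
    // Returns -1 if the text has fewer lines than requested.
    public static int toOffset(String text, int line, int column) {
        int offset = 0;
        for (int i = 1; i < line; i++) {
            int newline = text.indexOf('\n', offset);
            if (newline < 0) {
                return -1;
            }
            offset = newline + 1;
        }
        return offset + column - 1;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo att1=\"1\"\n\n    att2=\"2\">Some data</foo>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

Running it against the two-attribute example above shows what your parser reports for each callback. Whatever it prints is a single position per event, so the start and end of a tag cannot both be recovered from the Locator alone.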

All the best,
Jeff Rafter
Defined Systems
XML Development and Developer Web Hosting



Copyright 2001 XML.org. This site is hosted by OASIS