XML.org: OASIS Mailing List Archives

Determining the text of a leaf node is wicked hard

Hi Folks,

What's the content of this leaf element?

<Test>Hello, world</Test>

The content is "Hello, world"

Easy, right?

Not so fast.

Let's look at some other leaf elements.

What's the content of this leaf element?

<Test>Harper &amp; Row</Test>

The text inside the element is interrupted by an entity reference (&amp;). The reference must be resolved and its replacement text spliced together with the text before and after it. The content is this: "Harper & Row"
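
For comparison, here is a minimal sketch of what an existing parser hands back for this element. It uses Python's standard xml.etree.ElementTree module, which is my choice for illustration and not something this post depends on. The variable names are made up.

```python
import xml.etree.ElementTree as ET

# Parse the leaf element; the parser resolves &amp; while it reads the text.
elem = ET.fromstring("<Test>Harper &amp; Row</Test>")

# For a leaf element, .text holds the character data as one string.
content = elem.text
print(content)
```

The parser does the resolving and splicing internally, so the caller never sees the three separate pieces.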

How about this one? What's its content?

<Test>Equation <![CDATA[A < B]]> done</Test>

The text inside the element is interrupted by a CDATA section. The data inside the CDATA section must be extracted, the CDATA delimiters discarded, and then the remaining pieces spliced together. The content is this: "Equation A < B done"
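
To make the "pieces" visible, here is a small sketch using Python's xml.dom.minidom (again just an illustrative choice). As far as I know, minidom keeps a CDATA section as its own node by default, so the splicing has to be done by hand.

```python
from xml.dom.minidom import parseString

doc = parseString("<Test>Equation <![CDATA[A < B]]> done</Test>")

# Each child of <Test> is a separate node; record its kind and its data.
pieces = [(node.nodeName, node.data) for node in doc.documentElement.childNodes]
for name, data in pieces:
    print(name, repr(data))

# Splice the pieces together to get the content of the leaf.
content = "".join(data for _, data in pieces)
print(content)
```

The CDATA delimiters are already gone from node.data; only the joining is left to the caller.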

Here's a leaf element that has a comment:

<Test>John, <!-- blah, blah -->Paul, and Ringo</Test>

The text inside the element is interrupted by a comment. The comment must be discarded, and the remaining pieces spliced together. The content is this: "John, Paul, and Ringo"
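
At the event level the same thing looks like this. The sketch below drives Python's xml.parsers.expat directly (an assumption on my part that an event-based view is helpful here). The parser reports character data and comments through separate callbacks, so discarding the comment just means not adding it to the text.

```python
from xml.parsers import expat

chunks = []    # character data, in document order
comments = []  # comment bodies, collected only to show they arrive separately

parser = expat.ParserCreate()
parser.CharacterDataHandler = chunks.append
parser.CommentHandler = comments.append
parser.Parse("<Test>John, <!-- blah, blah -->Paul, and Ringo</Test>", True)

# The text may arrive in several chunks; splice them together.
content = "".join(chunks)
print(content)
```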

Now let's mix things together:

<Test>Harper &amp; Row. Equation <![CDATA[A < B]]> done. John, <!-- blah, blah -->Paul, and Ringo</Test>

The text inside the element is interrupted by an entity reference, a CDATA section, and a comment. The entity reference must be resolved, the data in the CDATA section extracted, the CDATA delimiters discarded, the comment discarded, and then the remaining pieces spliced together. The content is this: "Harper & Row. Equation A < B done. John, Paul, and Ringo"
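
The steps above can be sketched over a DOM tree. This is only one possible approach, using Python's xml.dom.minidom. The function name leaf_text is hypothetical, and the example string is written without whitespace around the markup so that the result is easy to compare.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def leaf_text(element):
    """Return the spliced character content of a leaf element."""
    pieces = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            pieces.append(child.data)   # entity references are already resolved
        elif child.nodeType in (Node.COMMENT_NODE,
                                Node.PROCESSING_INSTRUCTION_NODE):
            continue                    # discarded, contributes no text
        else:
            raise ValueError("not a leaf element")
    return "".join(pieces)

doc = parseString(
    "<Test>Harper &amp; Row. Equation <![CDATA[A < B]]> done. "
    "John, <!-- blah, blah -->Paul, and Ringo</Test>"
)
content = leaf_text(doc.documentElement)
print(content)
```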

There are also numeric character references (such as &#38; or &#x26;) and processing instructions (PIs) to handle. Anything else?
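
Here is a short sketch of those two cases, again with Python's xml.etree.ElementTree as an illustrative parser. The element content and the PI target "note" are invented for the example. The default tree builder drops processing instructions, just as it drops comments.

```python
import xml.etree.ElementTree as ET

# &#38; is "&" (decimal), &#x3C; is "<" (hexadecimal); the PI carries no text.
elem = ET.fromstring("<Test>A&#38;B &#x3C; C<?note ignore me?> done</Test>")
content = elem.text
print(content)
```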

Imagine trying to write a lexical analyzer (scanner) that handles all these cases and generates a single text node. Not a trivial task. It will be wicked hard.
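
To give a feel for the job, here is a rough sketch of such a scanner in Python, covering only the constructs mentioned in this message. Everything in it (the names TOKEN, PREDEFINED, leaf_content) is hypothetical. It assumes the input is the raw text between the start tag and end tag of a leaf element, knows only the five predefined entities, and makes no attempt at well-formedness checking, DTD-declared entities, or line-end normalization. Those omissions are where much of the difficulty lives.

```python
import re

# The five entities every XML processor must recognize.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}

TOKEN = re.compile(
    r"""
      <!--.*?-->                       # comment: discarded
    | <\?.*?\?>                        # processing instruction: discarded
    | <!\[CDATA\[(?P<cdata>.*?)\]\]>   # CDATA section: keep the inner data
    | &\#x(?P<hex>[0-9A-Fa-f]+);       # hexadecimal character reference
    | &\#(?P<dec>[0-9]+);              # decimal character reference
    | &(?P<name>[A-Za-z_:][\w.:-]*);   # named entity reference
    | (?P<text>[^<&]+)                 # plain character data
    """,
    re.VERBOSE | re.DOTALL,
)

def leaf_content(raw):
    """Scan the raw content of a leaf element and return one text string."""
    out = []
    pos = 0
    while pos < len(raw):
        m = TOKEN.match(raw, pos)
        if m is None:
            raise ValueError(f"unsupported markup at offset {pos}")
        kind = m.lastgroup              # None for comments and PIs
        if kind in ("text", "cdata"):
            out.append(m.group(kind))
        elif kind == "hex":
            out.append(chr(int(m.group("hex"), 16)))
        elif kind == "dec":
            out.append(chr(int(m.group("dec"))))
        elif kind == "name":
            name = m.group("name")
            if name not in PREDEFINED:
                raise ValueError(f"unknown entity: {name}")
            out.append(PREDEFINED[name])
        pos = m.end()
    return "".join(out)

print(leaf_content(
    "Harper &amp; Row. Equation <![CDATA[A < B]]> done. "
    "John, <!-- blah, blah -->Paul, and Ringo"
))
```

Even this toy version needs seven token kinds and two error paths, and it still rejects anything outside the narrow subset above.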

/Roger



Copyright 1993-2007 XML.org. This site is hosted by OASIS