Re: [xml-dev] Generic XML Tag Closer </> (GXTC)
- From: <juanrgonzaleza@xxxxxxxxxxxxxxxxxxxx>
- To: <xml-dev@xxxxxxxxxxxxx>
- Date: Sat, 26 Aug 2006 04:22:12 -0700 (PDT)
Rick Marshall said:
>
> juanrgonzaleza@canonicalscience.com wrote:
>
>>Rick Marshall said:
>>
>>
>>>my 5c
>>>
>>></> is a syntax element and as long as something else understands the
>>> semantics - it will do fine
>>>
>>>however...
>>>
>>></tag> is semantic which means the parser/processor does not need
>>> external information to make a decision about the correctness and
>>> completeness of the information.
>>>
>>>
>>
>><tag1>content1<tag2>content2</tag2></tag1>
>>
>>Once it has parsed <tag1> and <tag2>, the parser finds the "</" and
>> *expects* a "tag2", because of the consistency of the XML. The same
>> happens when it finds the </tag1>.
>>
>><tag1>content1<tag2>content2</></>
>>
>>Once it has parsed <tag1> and <tag2>, the parser finds the "</" and
>> knows/assumes it is closing the open tag2, because of the consistency
>> of XML. The same happens when it finds the last </>.
>>
>>
> assuming is not the same as knowing.
Sure, see below.
> i mean lets have some real fun - like in html and allow </> to close off
> all unclosed tags. saves a few more keystrokes.
>
> <tag1>...<tag2>...</>
>
> now that's even more compact and it's sort of what html does - but we
> all know how often it gets it wrong.
>
> you see the problem is that when the parser comes across </> it has to
> ASSUME that it is closing the last declared tag. however if you
> accidentally left out a close tag then it's wrong and it will take
> counting opening and closing tags right to the end of the document to
> know that the document is valid.
>
> bye bye sax
>
> rick
Right! The parser assumes that an empty end tag closes the last open tag.
If the doc is syntactically correct (i.e. tags, parentheses, curly
braces... all match) then the parser knows which tag each one closes. If
some start or end tag is missing, then you get an error. Is that an
advantage of full end tags over short forms such as </>? Again, no.
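The behavior described above can be sketched with a small stack-based
checker. This is only a toy illustration, not how any real XML parser is
implemented: the names TOKEN and check are hypothetical, and the sketch
assumes tags without attributes, comments, or CDATA sections.

```python
import re

# Matches <name>, </name>, and the generic closer </> (toy grammar, no attributes).
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)?>")

def check(doc):
    """Return (ok, message) for a toy tag language with optional </> closers."""
    stack = []  # names of the currently open tags
    for m in TOKEN.finditer(doc):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            if name is None:
                return False, f"empty start tag at offset {m.start()}"
            stack.append(name)
        elif not stack:
            return False, f"unexpected end tag at offset {m.start()}"
        else:
            opened = stack.pop()
            # A named end tag must match; </> simply closes the last open tag.
            if name is not None and name != opened:
                return False, f"</{name}> closes <{opened}> at offset {m.start()}"
    if stack:
        return False, f"unclosed <{stack[-1]}> at end of document"
    return True, "well-formed"
```

In this sketch both forms report a missing end tag as an error. The named
form can report it at the first mismatching end tag, while the </> form
reports it only when the input is exhausted.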
Again, I discussed this a bit in
[http://canonicalscience.blogspot.com/2006/04/canonml-markup-language-beyond-tex-xml.html]
The doc is a bit outdated by recent improvements, but there I review Paul
Prescod's arguments in favor of full end tags, and that part is still
valid. He compared XML syntax and S-expressions, but the same applies to
the special XML syntax discussed here, because </> plays the role of the
")" of LISP. Well, he omitted a </footnote> and showed that the XML parser
does not need to parse the entire doc. Three comments:
- He used an XML-oriented example. If the </para> after the </footnote>
is missing instead, then the XML parser needs to parse the entire doc
(except the root) before finding the error. In that case the advantage of
the full end tag is lost.
- Unlike XML, it is trivial to run a pre-parsing step that verifies
syntactic correctness, simply by counting the number of "{" and "}" (TeX,
C...) or the number of "(" and ")" (LISP, Scheme...) or "[" and "]"
(CanonML) in the full doc.
- It is also simpler to type [] and then type the content inside than to
type <my-tag></my-tag>. In my own experience the number of errors when
typing XML is more than double. This can be illustrated as,
Case "[]": Errors possible? omission of some tag.
Case <tag></tag>: Errors possible? i) omission of some tag ii) incorrect
writing of a tag, e.g. <tag><tag> , <tag></tah>, <tag><7tag>...
This is also noted in
[http://www-128.ibm.com/developerworks/xml/library/x-syntax.html]
<blockquote>
The extra typing required to open and close tags and escape special
characters not only wastes time, but introduces more possibility for
error.
</blockquote>
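The pre-parsing step from the second comment above can be sketched as
follows. This is a minimal illustration under stated assumptions: the
function name balance_error is hypothetical, and the sketch assumes that
brackets never appear escaped or as literal content. It tracks a running
depth rather than two totals, so it also catches a closing bracket that
comes before its opening one.

```python
def balance_error(text, open_ch="[", close_ch="]"):
    """Return None if brackets balance, else a short description of the problem."""
    depth = 0  # number of brackets currently open
    for offset, ch in enumerate(text):
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                return f"unmatched {close_ch!r} at offset {offset}"
    if depth > 0:
        return f"{depth} unclosed {open_ch!r}"
    return None
```

The same function covers the "{" "}" and "(" ")" cases by passing other
values for open_ch and close_ch.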
I would also remark that there are ways to improve the readability of the
() {} [] cases over the XML syntax, and this also applies to the <tag></>
case.
Therefore I summarize.
Advantages of XML standard syntax:
1) Better readability in some special cases.
2) In some special cases, the parser can find errors without parsing the
full document. This is a real advantage only for large documents, and only
if the error is at the very beginning. One can assume that 1/4 of errors
fall in the 4th (last) part of a doc and 1/4 in the 1st part. Therefore,
for each case where XML has a real advantage, you can find a case where it
has none.
Disadvantages of XML standard syntax:
1) It is easy to generate more errors.
2) To avoid the point above you need a special editor.
3) It is more verbose and less efficient to parse.
4) It cannot deal with all cases, such as those addressed by ConciseXML
(and others).
5) The (small) advantage of full end tags is lost when generalizing the
system to non-hierarchies, one of the unsolved problems in XML. For
instance, the GODDAG approach
<a/ ... <b/ ... /a> ... /b>
Advantages of special syntaxes:
I mean SXML (), CanonML [], ConciseXML <tag></>, ..., XUL-C...
1) Very good readability, especially with long tag names and/or
namespaces, or when the markup is several orders of magnitude longer than
the content.
In my experience, XSLT becomes more readable when full end tags are
omitted and the code is indented as shown here. Others think the same:
<blockquote>
XSLT is often considered to be too verbose. As stylesheet code grows, it
tends to be unreadable.
</blockquote>
[http://www.xml.com/lpt/a/1226]
2) Lightweight parsers, especially with SXML and CanonML.
3) Less verbosity. And this _is_ a point with very large datuments; hence
some guides on the famous element vs. attribute dilemma recommend the use
of attributes when size is an issue.
4) No need for special forms such as <tag></tag> vs. <tag/>. Therefore
the small advantage of simple XML approaches is lost with special
syntaxes.
Disadvantages of special syntaxes:
1) In some cases it is more difficult to find syntax errors.
2) In some cases and for some people the XML end tags increase
readability.
3) If end tags are left as an option, then parsers become more complex
than when parsing only XML.
4) They cannot deal with non-hierarchies in a direct way.
Everyone can decide for her/himself. In some cases XML is good; in others
you need alternative syntaxes or the </> option... There is no absolute
answer, even if some XML folks desire one.
Juan R.
Center for CANONICAL |SCIENCE)