xml-dev - exact input reporting for XML [Re: [xml-dev] XML's Scylla and Charybdis-


To: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: exact input reporting for XML [Re: [xml-dev] XML's Scylla and Charybdis- parse and regexp]
From: Aleksander Slominski <aslom@cs.indiana.edu>
Date: Tue, 01 Apr 2003 14:13:04 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <r01050400-1024-F4680634645111D7B1500003937A08C2@[192.168.124.11]>
References: <r01050400-1024-F4680634645111D7B1500003937A08C2@[192.168.124.11]>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2) Gecko/20030210

Simon St.Laurent wrote:

>sean.mcgrath@propylon.com (Sean McGrath) writes:
>
>>Correctness or input fidelity - pick one - you cannot have both.
>
>Of course you can have both, if you haven't been lulled to sleep by
>chants of "Infoset, Infoset" or "XPath is the data model."  Heck, you
>can even have both and deal with the PSVI, if you're that much of a
>masochist.
>
>When XML first appeared, it seemed important that parsers be small and
>easy to write.  XML 1.0 gave parser writers escape hatches on a number
>of things, and developers frequently wrote to that minimum.  XML 1.0
>locked some functionality in the parser, and developers never went to
>the effort of exposing it.
>
>Since then, we've built huge edifices of code on top of these parsers,
>but I haven't seen anyone go back to retrieve what was thrown away in
>the first round.  The Desperate Perl Hacker has been quite thoroughly
>betrayed, first by XML 1.0, then by namespaces, then by a variety of
>other devices that further separated the text from its supposed meaning.
>
>There's nothing inherent in XML or in the languages used to process XML
>that requires this division.  Java is plenty capable of providing text
>renditions to accompany events or objects, if anyone thinks it valuable.
>Perl, Python, C# - heck, I think I could do this in Pascal or AppleSoft
>BASIC if I really had to do it.  The problem isn't the code - it's the
>will.  It certainly takes extra effort.
>
>I've been poking at this for years now, stuffing bits of code between
>books and other projects.  I wrote up pretty much my whole process at
>http://lists.xml.org/archives/xml-dev/200303/msg00568.html, and I'm
>finally reaching the point where a framework is emerging that supports
>text, events, and objects.  
>
I am not sure exactly what your requirements are, but the XmlPull API
provides an optional feature to enable exact roundtripping, which I
implemented in MXP1. This can be used as an efficient lower-level layer
on top of which higher layers of events, trees, or whatever can be built.

>When I'm done, you'll be able to collect a series of parsing events into
>an object tree, play with the text, re-serialize that into a tree, and
>drop that tree into events.  You'll be able to make changes to the
>events or the object tree and have your changes made with minimal impact
>on the original surrounding text - no need to obliterate all your entity
>references to make changes in a document.
>
>I'm not claiming that this framework will be the most efficient way to
>process XML, or that it will solve all problems.  There's a huge amount
>of work yet to do (an XPath implementation is crucial, and I've not yet
>started that), and the primary interface for it is still through javadoc
>and code.  
>
>I intend, however, to demonstrate that "you can have both", and
>hopefully other programmers will pick up on that and let more of us have
>the benefits of both.
>
And yes, you can have both: by default this feature is turned off to
keep compatibility with XML, since XML is better dealt with at the
infoset level for most applications, but when it is enabled you will not
miss anything from the original XML input (if you *really* need to do
roundtripping ...).
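
To make the idea concrete, here is a rough sketch in plain Java. It is
not XmlPull and not MXP1 code: the class RoundtripSketch, its Event type,
and the roundtrip flag are all made up for illustration, and it only
understands start tags, end tags, character data, and the five predefined
entities (no comments, CDATA, empty-element tags, or '>' inside attribute
values). It only shows the principle: each event can optionally carry the
exact slice of input it came from, and that is off unless asked for.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of exact input reporting; not a real parser. */
final class RoundtripSketch {

    enum Kind { START_TAG, END_TAG, TEXT }

    /** One parsing event; raw is null unless roundtrip reporting was requested. */
    static final class Event {
        final Kind kind;
        final String value; // element name, or entity-decoded character data
        final String raw;   // exact input slice for this event, or null

        Event(Kind kind, String value, String raw) {
            this.kind = kind;
            this.value = value;
            this.raw = raw;
        }
    }

    /** Splits the input into events, keeping the source text only if roundtrip is true. */
    static List<Event> tokenize(String xml, boolean roundtrip) {
        List<Event> events = new ArrayList<>();
        int pos = 0;
        while (pos < xml.length()) {
            int end;
            Kind kind;
            String value;
            if (xml.charAt(pos) == '<') {
                end = xml.indexOf('>', pos);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated tag at " + pos);
                }
                end++; // include the closing '>'
                boolean closing = xml.startsWith("</", pos);
                kind = closing ? Kind.END_TAG : Kind.START_TAG;
                String inside = xml.substring(pos + (closing ? 2 : 1), end - 1).trim();
                value = inside.split("\\s+", 2)[0]; // element name only
            } else {
                end = xml.indexOf('<', pos);
                if (end < 0) {
                    end = xml.length();
                }
                kind = Kind.TEXT;
                value = decode(xml.substring(pos, end));
            }
            events.add(new Event(kind, value, roundtrip ? xml.substring(pos, end) : null));
            pos = end;
        }
        return events;
    }

    /** Decodes the five predefined entities; "&amp;" goes last to avoid double decoding. */
    static String decode(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    /** Concatenates the raw slices; only meaningful for events produced with roundtrip on. */
    static String reassemble(List<Event> events) {
        StringBuilder sb = new StringBuilder();
        for (Event e : events) {
            if (e.raw == null) {
                throw new IllegalStateException("event has no raw text; roundtrip was off");
            }
            sb.append(e.raw);
        }
        return sb.toString();
    }
}
```

With the flag on, reassemble(tokenize(input, true)) gives back the input
character for character, including odd whitespace inside tags and the
original entity references, while value still holds the decoded view that
higher layers would normally work with. With the flag off, nothing extra
is stored.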

thanks,

alek

-- 
"Mr. Pauli, we in the audience are all agreed that your theory is crazy. 
What divides us is whether it is crazy enough to be true." Niels H. D. Bohr
