Re: [xml-dev] Do you enjoy neighborhoods where every house looks the same?

From: "Jeremy H. Griffith" <jeremy@omsys.com>
To: xml-dev@lists.xml.org
Date: Wed, 28 Aug 2013 13:41:58 -0500

On Wed, 28 Aug 2013 10:23:59 -0700, Lauren Wood <lauren@textuality.com> wrote:

>The hard part with these fixes is knowing when to stop. The law of 
>diminishing returns kicks in fairly quickly on error conditions, 

Yes, it does.  But not as quickly as the Draconian
error handling practiced for XML would imply.
MicroXML takes a different approach:

https://dvcs.w3.org/hg/microxml/raw-file/tip/spec/microxml.html#characters

4.2 Parser Conformance
 ...
A MicroXML parser MAY perform error correction, by providing 
an abstract data model even for sequences of bytes that are 
not conforming MicroXML documents. It MUST, however, still 
comply with the requirement of the first paragraph to report 
that the sequence of bytes is not a conforming MicroXML 
document.

We find that the ability to report and continue, 
rather than fall over dead at the first error,
saves us a lot of time.  When we get all the
errors reported on the first pass of a new doc, 
we can correct them all at once, instead of
one at a time with reprocessing in between.
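
As an illustration only (not the actual DITA2Go code), here
is a minimal Python sketch of report-and-continue.  The name
check_tags and the toy tag grammar (no attributes, comments,
or entities) are assumptions made for the example.  It
records each tag error with its source line and keeps
scanning instead of stopping at the first one.

```python
import re

# Toy tag pattern: <name>, </name>, or <name/>; no attributes or comments.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)(/?)>")

def check_tags(text):
    """Return a list of (line, message) for every tag error found."""
    errors = []
    stack = []  # currently open elements as (name, line opened)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in TAG.finditer(line):
            closing, name, empty = m.groups()
            if empty:
                continue  # <name/> opens and closes at once
            if not closing:
                stack.append((name, lineno))
                continue
            if name not in [n for n, _ in stack]:
                # Report the stray end tag, then carry on scanning.
                errors.append((lineno, f"end tag </{name}> has no start tag"))
                continue
            # Close any elements left open inside the one being ended.
            while stack[-1][0] != name:
                missing, opened = stack.pop()
                errors.append(
                    (lineno, f"<{missing}> opened on line {opened} "
                             "is missing its end tag"))
            stack.pop()
    for name, opened in stack:
        errors.append((opened, f"<{name}> is never closed"))
    return errors

# Hypothetical input with three separate mistakes.
doc = "<a>\n<b>text\n</a>\n</c>\n<d>"
for lineno, message in check_tags(doc):
    print(f"line {lineno}: {message}")
```

All three mistakes in the sample come back from a single
pass, so they can be fixed together before the next run.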

>especially when the schema isn't constrained. For example, it's much 
>easier to correctly correct the missing end tags when the schema is 
>constrained (e.g., you at least know which elements are meant to be 
>empty, and which not). 

We have two layers here.  The parser just creates
a data model, so it is concerned only with the
text being well-formed MicroXML.  Then the processor
sees whether it is also sensible.  

If for example the slash is omitted at the end of 
an empty-element tag, the parser would treat it 
as a start tag, and provide an end tag before the 
next end tag it saw.  That would put the intervening 
text inside the empty element, where it would probably 
not be sensible.

But the processor, which knows what elements should
be empty based on their properties, would complain too.
With the well-formedness error from the parser,
immediately followed by the "Non-text element <data> 
has text:" complaint from the processor, the writer
has a very good idea of what happened.
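
The two layers can be sketched roughly as follows, in Python.
This is a hedged illustration, not the real implementation:
parse, process, Node, and the EMPTY_ELEMENTS set are all
hypothetical names, the grammar is a toy one without
attributes, and the text after "has text:" is a guess at what
such a message might carry.

```python
import re

# Toy tokens: a tag (<name>, </name>, <name/>) or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)(/?)>|([^<]+)")

class Node:
    def __init__(self, name, line):
        self.name, self.line = name, line
        self.children = []  # child Nodes
        self.text = ""      # direct text content

def parse(text):
    """Layer 1: build a tree, supplying and reporting missing end tags."""
    errors, root, line = [], Node("#root", 0), 1
    stack = [root]
    for m in TOKEN.finditer(text):
        closing, name, empty, chars = m.groups()
        if chars is not None:
            stack[-1].text += chars
        elif empty:
            stack[-1].children.append(Node(name, line))
        elif not closing:
            node = Node(name, line)
            stack[-1].children.append(node)
            stack.append(node)
        elif name in [n.name for n in stack[1:]]:
            # Supply end tags for anything still open inside this element.
            while stack[-1].name != name:
                bad = stack.pop()
                errors.append(
                    (line, f"missing end tag for <{bad.name}> "
                           f"opened on line {bad.line}"))
            stack.pop()
        else:
            errors.append((line, f"stray end tag </{name}>"))
        line += m.group(0).count("\n")
    while len(stack) > 1:
        bad = stack.pop()
        errors.append(
            (line, f"<{bad.name}> opened on line {bad.line} is never closed"))
    return root, errors

EMPTY_ELEMENTS = {"data"}  # assumption: elements the processor knows are empty

def process(node, errors):
    """Layer 2: check that the well-formed tree is also sensible."""
    if node.name in EMPTY_ELEMENTS and node.text.strip():
        errors.append(
            (node.line, f"Non-text element <{node.name}> has text: "
                        f"{node.text.strip()!r}"))
    for child in node.children:
        process(child, errors)

# <data> is missing its closing slash, so it swallows the following text.
source = "<topic>\n<data>\nSome paragraph text.\n</topic>\n"
root, errors = parse(source)
process(root, errors)
for line, message in errors:
    print(f"line {line}: {message}")
```

The parser's missing-end-tag report and the processor's
complaint both name <data>, which is what makes the missing
slash easy to spot.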

>In my experience if your parser makes the wrong 
>choice and therefore 'corrects' the wrong thing, or corrects it in the 
>wrong way, the resulting mess can be difficult to fix properly. Of 
>course, depending on your downstream processing, that may or may not matter.

If you get an error from parser or processor, it
is best to act on it immediately and fix the source
of the problem.  Since we are talking about processing
times that are typically seconds, not even minutes,
what happened downstream doesn't really matter.  For
example, the module that reads the MicroXML writes
a parse tree to a file, which the next module, the one
that writes the HTML or Word file, digests.  If the XML
reader reports a problem, you don't even bother looking
at the output the second module made five seconds later.

And the error report is hard to overlook; we put
it up in your editor right in front of you at the 
end of processing.  With source line numbers.  ;-)

-- Jeremy H. Griffith <jeremy@omsys.com>
   DITA2Go site:  http://www.dita2go.com/
