   Re: [xml-dev] [off-topic] xtext -- encoding declarations for text

From: "John Cowan" <jcowan@reutershealth.com>
> Rick Jelliffe scripsit:
>
> > Comments welcome.
>
> The following regex (to be interpreted as referring to bytes, not
> characters) should reliably detect the presence of an xtext declaration.
>
> \A(\0*.){1,4}\0*(x\0*t\0*e\0*x\0*t\0* |\xA7\xA3\x85\xA7\xA3\x40)
>
> \A means the beginning of the string, \0* skips any number of null bytes
> introduced by UTF-16 or UTF-32 encodings, and the \xA7...\x40 is "xtext "
> in the common subset of EBCDIC.
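
For anyone who wants to try the quoted pattern, here is a minimal Python
sketch that compiles it as a bytes regex and runs it over one sample in
several encodings. The sample string '<?xtext ', the function name, and the
use of re.DOTALL (so that "." matches any byte) are assumptions made for
illustration; none of them come from the xtext proposal itself.

```python
import re

# The quoted pattern, applied to bytes. re.DOTALL is an assumption so that
# "." also matches a newline byte; the original post does not say.
XTEXT = re.compile(
    rb'\A(\0*.){1,4}\0*(x\0*t\0*e\0*x\0*t\0* |\xA7\xA3\x85\xA7\xA3\x40)',
    re.DOTALL,
)

# Hypothetical sample: two leading characters, then "xtext" and a space.
SAMPLE = '<?xtext '


def has_xtext_declaration(data: bytes) -> bool:
    """Return True if the quoted pattern matches at the start of data."""
    return XTEXT.match(data) is not None


# cp037 is one EBCDIC code page in which "xtext " is A7 A3 85 A7 A3 40.
ENCODINGS = ['ascii', 'utf-16-le', 'utf-16-be',
             'utf-32-le', 'utf-32-be', 'cp037']

for enc in ENCODINGS:
    print(enc, has_xtext_declaration(SAMPLE.encode(enc)))

# Letters are accepted as leading characters, and a UTF-8 BOM pushes the
# sample past the four-character limit.
print(has_xtext_declaration(b'abxtext '))
print(has_xtext_declaration(b'\xef\xbb\xbf' + SAMPLE.encode('utf-8')))
```

For this particular sample the pattern matches in both byte orders, since
\0* is allowed on either side of each letter. It also accepts b'abxtext ',
and it rejects the sample when a three-byte UTF-8 BOM precedes it.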

No space at the end of ASCII 'xtext'. It's not just that null bytes are
noise, but that the characters may be re-ordered along big/little-endian
lines. How is the requirement enforced that the 1-4 leading characters not
contain a-z etc.?

It would be interesting to see a regex that actually solved this problem,
including the optional BOM, but I doubt I would want to use it. ;-}

Bob
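
As a rough illustration of what such a regex might look like, the Python
sketch below adds an optional BOM and restricts the leading characters. It
is a sketch under stated assumptions, not a solution from the thread: "a-z
etc." is read here as ASCII letters and digits, the EBCDIC alternative is
left out, the sample '<?xtext ' is hypothetical, and all names are invented
for the example.

```python
import codecs
import re

# Optional byte order mark. The 4-byte UTF-32 marks are listed before the
# 2-byte UTF-16 marks so the longer form is tried first.
BOM = (rb'(?:\xEF\xBB\xBF|\xFF\xFE\x00\x00|\x00\x00\xFE\xFF'
       rb'|\xFF\xFE|\xFE\xFF)?')

# ASSUMPTION: "a-z etc." is taken to mean ASCII letters and digits, so each
# of the 1-4 leading characters must be some other non-null byte.
LEAD = rb'(?:\x00*[^\x00A-Za-z0-9]){1,4}'

# "xtext" plus a trailing space, with null padding allowed around each byte.
NAME = rb'\x00*x\x00*t\x00*e\x00*x\x00*t\x00* '

STRICT_XTEXT = re.compile(rb'\A' + BOM + LEAD + NAME, re.DOTALL)


def has_xtext_declaration_strict(data: bytes) -> bool:
    """Return True if the stricter sketch pattern matches at the start."""
    return STRICT_XTEXT.match(data) is not None


# Hypothetical sample: two leading characters, then "xtext" and a space.
SAMPLE = '<?xtext '

CASES = [
    (b'', 'utf-8'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
]

for bom, enc in CASES:
    print(enc, bool(bom), has_xtext_declaration_strict(bom + SAMPLE.encode(enc)))
```

This version still treats null bytes as skippable padding rather than
checking that they fall where a given encoding would put them, so it remains
looser than a real per-encoding check.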





 

Copyright 2001 XML.org. This site is hosted by OASIS