From: "Richard Tobin" <richard@cogsci.ed.ac.uk>
> > * else look for EBCDIC/ASCII signature (use string
> > "[^a-zA-Z0-9]{1,4}xtext\b" rather than "<?xml\b")
>
> For XML, it's only necessary to look at the first four bytes to cover
> Unicode encodings, ascii supersets and ebcdic. In the xtext case, you
> will have to compare a string at several different positions or apply
> a regular expression. Certainly doable, but certainly more complex too!
1. Based on the zero patterns in the first four octets, fill a 10-character
array with candidate characters. (With a four-octet encoding this may require
reading up to 40 octets.)
2. Scan zero-based array locations 3 through 6 inclusive for 'e' in both
ASCII (0x65) and EBCDIC (0x85). If neither is found, it is not an xtext. If
the EBCDIC 'e' is found and the first four octets contained a zero byte, it is
not an xtext. Otherwise, using the appropriate charset, verify that the 'e'
appears in the correct context.
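A minimal Python sketch of the two steps, under stated assumptions: the function names candidate_chars and is_xtext are hypothetical; the zero patterns (UCS-4 and UTF-16 in either byte order, otherwise one octet per character) are modeled on the XML autodetection idea quoted above; EBCDIC is assumed to be code page 037; and the "correct context" check reuses the quoted signature with {1,4} as the quantifier. When both an ASCII and an EBCDIC 'e' turn up, the sketch simply tries each charset in turn.

```python
import re

ASCII_E, EBCDIC_E = 0x65, 0x85  # 'e' in ASCII and in EBCDIC
# Context check: the quoted signature, anchored at the start of the text.
SIGNATURE = re.compile(r"[^a-zA-Z0-9]{1,4}xtext\b")

def candidate_chars(data: bytes, count: int = 10) -> list[int]:
    """Step 1 (hypothetical helper): choose a code-unit width and byte
    order from the zero pattern of the first four octets, then read up to
    `count` characters (at most 4 * count = 40 octets)."""
    zeros = tuple(b == 0 for b in data[:4])
    layouts = {
        (True, True, True, False): (4, "big"),      # UCS-4 big-endian
        (False, True, True, True): (4, "little"),   # UCS-4 little-endian
        (True, False, True, False): (2, "big"),     # UTF-16 big-endian
        (False, True, False, True): (2, "little"),  # UTF-16 little-endian
    }
    width, order = layouts.get(zeros, (1, "big"))   # else one octet per char
    end = min(len(data), width * count) - width + 1
    return [int.from_bytes(data[i:i + width], order)
            for i in range(0, end, width)]

def is_xtext(data: bytes) -> bool:
    """Step 2 (hypothetical helper): look for 'e' at zero-based positions
    3..6, then verify the context in the charset that produced the hit."""
    chars = candidate_chars(data)
    window = chars[3:7]
    has_zero = 0 in data[:4]
    charsets = []
    if ASCII_E in window:
        charsets.append("ascii")
    if EBCDIC_E in window and not has_zero:  # EBCDIC is single-octet, no zeros
        charsets.append("ebcdic")
    for charset in charsets:
        if charset == "ebcdic":
            # Assumption: EBCDIC code page 037; width is 1 here, so bytes() works.
            text = bytes(chars).decode("cp037")
        else:
            # ASCII supersets, UTF-16 and UCS-4 share the ASCII code points.
            text = "".join(chr(c) if c <= 0x10FFFF else "\ufffd" for c in chars)
        if SIGNATURE.match(text):
            return True
    return False
```

For example, is_xtext(" xtext ".encode("ascii")) and is_xtext("<?xtext ".encode("cp037")) should both return True, while an ordinary "<?xml version" header fails at the scan in step 2 because no 'e' falls in positions 3 through 6. The scan acts as a cheap filter, so the regular expression only runs on the few inputs that pass it.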
Bob