> From: Mike Champion
> I'd be interested in hearing others' reactions to this IBM
> DeveloperWorks article and about actual experiences in the field.
Deja vu for me. About two years ago, we built a text-to-XML converter for a
medical transcription company that uses implicit markup and some (very cool
IMHO) lexicon-driven technology to convert around 25,000 documents per day
from text in a proprietary word processor format (DOS Medrite, for those who
may be familiar) to XML and back. The XML->text portion includes fancy
formatting that used to waste lots of transcriptionist time, so this was a
big operational win for the transcription provider.
The current incarnation of the parser is Java with a SAX back-end (it
implements XMLReader), runs very fast, and generates fewer than 10
exceptions (i.e., documents with indecipherable markup) on an average day.
As an odd parallel to the developerWorks article, the original
proof-of-concept parser (implemented around 3 years ago...) was in Python.
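
For anyone wondering what "implements XMLReader" can look like when the input is not XML at all, here is a minimal sketch of the general idea. It is not our production code. The class name TextToSaxReader, the "HEADING: text" line format, the two-entry lexicon, and the report/history/diagnosis element names are all made up for illustration. The reader walks plain text line by line, looks each heading up in the lexicon, and reports SAX events to whatever ContentHandler is registered. A line it cannot decipher raises a SAXException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: turns "HEADING: text" lines into SAX events.
class TextToSaxReader implements XMLReader {
    // Assumed lexicon mapping implicit headings to element names.
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("HISTORY", "history");
        LEXICON.put("DIAGNOSIS", "diagnosis");
    }

    private ContentHandler contentHandler = new DefaultHandler();
    private ErrorHandler errorHandler;
    private DTDHandler dtdHandler;
    private EntityResolver entityResolver;

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        if (input.getCharacterStream() == null) {
            throw new SAXException("this sketch only reads character streams");
        }
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        AttributesImpl noAttrs = new AttributesImpl();
        contentHandler.startDocument();
        contentHandler.startElement("", "report", "report", noAttrs);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) continue;  // skip blank lines
            int colon = line.indexOf(':');
            String element = colon < 0
                ? null : LEXICON.get(line.substring(0, colon).trim());
            if (element == null) {
                // The "indecipherable markup" case: give up on the document.
                throw new SAXException("unrecognized markup: " + line);
            }
            char[] text = line.substring(colon + 1).trim().toCharArray();
            contentHandler.startElement("", element, element, noAttrs);
            contentHandler.characters(text, 0, text.length);
            contentHandler.endElement("", element, element);
        }
        contentHandler.endElement("", "report", "report");
        contentHandler.endDocument();
    }

    @Override
    public void parse(String systemId) throws IOException, SAXException {
        parse(new InputSource(systemId));
    }

    // This sketch recognizes no SAX features or properties.
    @Override public boolean getFeature(String name) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }
    @Override public void setFeature(String name, boolean value) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }
    @Override public Object getProperty(String name) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }
    @Override public void setProperty(String name, Object value) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }

    @Override public void setContentHandler(ContentHandler h) { contentHandler = h; }
    @Override public ContentHandler getContentHandler() { return contentHandler; }
    @Override public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
    @Override public ErrorHandler getErrorHandler() { return errorHandler; }
    @Override public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
    @Override public DTDHandler getDTDHandler() { return dtdHandler; }
    @Override public void setEntityResolver(EntityResolver r) { entityResolver = r; }
    @Override public EntityResolver getEntityResolver() { return entityResolver; }

    // Demo helper: runs the reader and renders the events as tags (no escaping).
    static String toXml(String text) throws IOException, SAXException {
        StringBuilder out = new StringBuilder();
        TextToSaxReader reader = new TextToSaxReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }
            @Override public void endElement(String uri, String local, String qName) {
                out.append("</").append(qName).append('>');
            }
            @Override public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(text)));
        return out.toString();
    }
}
```

Calling TextToSaxReader.toXml("HISTORY: cough\nDIAGNOSIS: bronchitis") walks the two lines and hands the handler a report element wrapping a history and a diagnosis element. The nice part of sitting behind XMLReader is that downstream code (a serializer, a filter chain, a transformer) does not need to know the input was never XML to begin with.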