Yes. The problem is familiar from systems that pay
a lot of attention to the metadata but, for the meat
of the record, offer only a few text boxes (usually
two: request and response).
XMLers focus on the tags and the structure. Everyone
else is focused on the text nodes and the CDATA.
Schemas scale only as far as the domain is bounded.
Content assembly systems (rules for merging
boilerplate) are there for the overlaps. If
there are no overlaps, they aren't applicable.
The trick here isn't scalability but decidability.
From: Cox, Bruce [mailto:Bruce.Cox@USPTO.GOV]
I think CAM is not useful for me in that patents are not assembled from
boilerplate. Each one is unique. Even in a large organization that
produces many patents, only the most trivial of content is reused from
one patent to the next (company name, attorney name). I could be
mistaken, but what I saw in CAM was not a really rich content validation
mechanism but a framework within which, in my case, there would still be
lots of custom work to do. XPath is cool for validating across
elements, but most of what I want to do is within a single element in a
single document (even though there are six to eight thousand per week).
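
To make the distinction concrete, here is a minimal sketch of validation that stays inside a single element's text, as opposed to cross-element checks. Everything in it is an assumption for illustration: the element names (patent, claims, claim), the num attribute, and both rules are hypothetical and are not taken from any USPTO schema or from CAM. It uses only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record: element names, attributes, and rules are
# illustrative assumptions, not a real patent schema.
DOC = """<patent>
  <applicant>Example Corp</applicant>
  <claims>
    <claim num="1">A widget comprising a frame and a lever.</claim>
    <claim num="2">The widget of claim 1, wherein the lever is steel.</claim>
    <claim num="3">The widget of claim 7, wherein the frame is hollow</claim>
  </claims>
</patent>"""

# Matches references such as "claim 1" inside the claim text.
CLAIM_REF = re.compile(r"\bclaim (\d+)\b")


def check_claim_text(num, text):
    """Check one claim using only its own number and its own text."""
    problems = []
    # Rule 1 (assumed): the claim text ends with a period.
    if not text.rstrip().endswith("."):
        problems.append(f"claim {num}: does not end with a period")
    # Rule 2 (assumed): a claim may only refer to an earlier claim.
    for ref in CLAIM_REF.findall(text):
        if int(ref) >= num:
            problems.append(
                f"claim {num}: refers to claim {ref}, "
                "which is not an earlier claim"
            )
    return problems


def validate(xml_text):
    """Apply the within-element checks to every claim in the record."""
    root = ET.fromstring(xml_text)
    problems = []
    for claim in root.findall("./claims/claim"):
        num = int(claim.get("num"))
        text = "".join(claim.itertext())  # text content, ignoring any markup
        problems.extend(check_claim_text(num, text))
    return problems


if __name__ == "__main__":
    for problem in validate(DOC):
        print(problem)
```

The path expression only locates the elements; the rules themselves are ordinary string and regular-expression code run against the text node, which is where the custom work ends up regardless of the surrounding framework.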