Re: [xml-dev] Auto schema/xpath generation from doc collection
- From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
- To: xml-dev@lists.xml.org
- Date: Wed, 20 May 2009 09:47:55 -0400
At 2009-05-20 06:20 -0700, Paul M wrote:
>Say one has a collection of docs:
>
>doc1
><para><sentence><bold>That</bold></sentence></para>
>doc2
>
><para><sentence><bold>That</bold></sentence></para>
>doc3
>
><paragraph><strong>That</strong></paragraph>
>....doc20000 (many docs)
>
>I am looking for a solution(application, ideas, designs) that would return:
>1. A listing of xpaths to elements
>para
>para/sentence
>paragraph/strong
I formalized that by creating an XML vocabulary for what I've termed
"An XPath file". Such an instance turns out to be very useful in
specifying the required behaviours in a stylesheet
specification. The first support of XPath files I released converted
a W3C Schema for UBL into an XPath file, but this proves unwieldy for
document models such as UBL Order with 880,000 elements and
attributes, not including recursion, for a single instance.
To make things more manageable, but also more fragile, I created a
stylesheet to read an XML document and enumerate the elements and
attributes found therein:
http://www.CraneSoftwrights.com/resources/ubl/index.htm#xpathins
I say "fragile" because changing an instance happens more frequently
than changing the model. The UBL document model hasn't changed in
over two years, while adding a single element to an instance will
change the enumeration of subsequent elements in that instance's XPath file.
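As a rough illustration of the instance-based enumeration and its
fragility, here is a minimal sketch in Python. It is not the
stylesheet linked above: the function names enumerate_paths and
distinct_paths are hypothetical, the numbering order (an element,
then its attributes, then its children) is an assumption, and
namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def enumerate_paths(xml_text):
    """Number every element and attribute of one instance in document order.

    Returns a list of (ordinal, path) pairs. Assumed order: an element,
    then its attributes, then its child elements.
    """
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
        paths.append(path)
        for name in elem.attrib:
            paths.append(f"{path}/@{name}")
        for child in elem:
            walk(child, path)

    walk(root, "")
    return list(enumerate(paths, start=1))


def distinct_paths(docs):
    """Collect the distinct element/attribute paths seen across many instances."""
    seen = set()
    for xml_text in docs:
        seen.update(path for _, path in enumerate_paths(xml_text))
    return sorted(seen)


if __name__ == "__main__":
    docs = [
        "<para><sentence><bold>That</bold></sentence></para>",
        "<paragraph><strong>That</strong></paragraph>",
    ]
    for path in distinct_paths(docs):
        print(path)

    # Fragility: inserting one element renumbers everything after it.
    print(enumerate_paths('<a x="1"><b/><c/></a>'))
    print(enumerate_paths('<a x="1"><n/><b/><c/></a>'))
```

In the last two calls, a/b is numbered 3 in the first instance and 4
in the second, which is the renumbering effect described above.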
>OR
>2. A schema from the docs in a collection.
>OR
>3. Other ideas?
For what? You've described a facility you need ... is that the
entire problem or are you using this in a particular context that
might spark other ideas?
As I said, the context for me for creating XPath files was/is for the
specification of stylesheet behaviours: one prints off a blank UN
Layout Key form and manually annotates the form with the reference
numbers enumerating the desired elements and attributes that belong
in each box. That becomes one component of the specification of what
goes where. One of the outputs from an "XPath file" is an XML
instance that instantiates every element and attribute, using each
element's or attribute's ordinal as its content. Then when you run
that instance through your development stylesheet, the result should
be filled with numbers that match your manually-created specification.
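To make the numbered test instance concrete, here is a small Python
sketch of the idea. It is an assumption-laden illustration, not the
tooling described above: fill_with_ordinals is a hypothetical name,
it starts from an existing sample instance rather than from an XPath
file, it numbers each element before its attributes, and it writes
the ordinal only into leaf elements so that container elements keep
their children.

```python
import xml.etree.ElementTree as ET


def fill_with_ordinals(xml_text):
    """Replace attribute values and leaf-element text with running ordinals."""
    root = ET.fromstring(xml_text)
    counter = 0
    for elem in root.iter():  # iter() visits elements in document order
        counter += 1
        elem_ordinal = counter
        for name in list(elem.attrib):
            counter += 1
            elem.set(name, str(counter))
        if len(elem) == 0:
            # Leaf element: its ordinal becomes its only content.
            elem.text = str(elem_ordinal)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fill_with_ordinals('<a x="q"><b>t</b><c/></a>'))
```

Running a stylesheet under development against such an instance
should place each number in the spot that the annotated form assigns
to it.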
This was presented at XML Europe 2004 but I see that the IDEAlliance
archives cannot be accessed to review a copy of my paper.
I hope this helps.
. . . . . . . . . . . . Ken
--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview: http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal