- From: Len Bullard <cbullard@hiwaay.net>
- To: "Gorden-Ozgul, Patricia E" <gorden@bnl.gov>
- Date: Tue, 09 Nov 1999 20:06:28 -0600
Gorden-Ozgul, Patricia E wrote:
>
> This may seem like a dumb question, but I am new to this 'document
> processing' field.
Len writes to Warren?
> . How should an XML DTD designer signify that this DTD is a
> . conformant subset or a variant of another DTD? Are
> . these one, two, or three namespaces IF a namespace identifier
> . resolves to a schema? This matters if the FPI is ROA for the
> . DTD and a label for the namespace identifier.
Is the FormalPublicIdentifier the legal name of the namespace?
Outermost parentheses in the oldTongue? ROA: record of
authority. Dominant namespace for aggregate. If, in a contract,
I must cite the record of authority for the defined item,
I require it to be a singleton. No Bifurcation at Root.
Sorry, ma'am, if the thread avoided your question.
The crazies say, "Welcome!"
XML: Eight Noble Concepts (it's all about names)
1. Markup: Trees of names.
2. Hyperness: Locations are points in space or time.
3. Identity: Locations bound to names enable persistence and
uniqueness.
4. Systems: Identities bound into a namespace.
5. Schemas: Systems to bind identities
6. Mapping: Schemas whose trees are related by n-dimensional bindings
7. N-dimensional binding: A named vector of the schemas that produces
an intersection space
8. Facts: The named values within the intersections
I won't go into that. :-)
Your problem:
> I have an industry-provided bibliographic DTD to which I need to apply data
> from Word documents. Other than a manual solution (clerical cut/paste from
> Word doc to DTD doc) how would one create the DTD ASCII file for the data
> exchange.
You want to map the names
o in a source namespace (RTF)
o to a target namespace (DTD).
The source is the collection of instances, or documentation of instances
to be transformed to the target namespace. Create a table where
the names in the source definition are mapped to the target.
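Such a table can start life as a plain dictionary. The sketch below is a minimal illustration in Python; the source names (an RTF control word and two Word style names) and the target element names (`p`, `title`, `author`) are hypothetical placeholders, not taken from any real bibliographic DTD. It also reports the source names that have no mapping yet, which is where the real work lies.

```python
# Hypothetical mapping table: names seen in the source (RTF control words
# or Word style names) on the left, element names in the target DTD on the right.
SOURCE_TO_TARGET = {
    "par": "p",          # RTF paragraph mark -> paragraph element (assumed)
    "Title": "title",    # Word style name -> title element (assumed)
    "Author": "author",  # Word style name -> author element (assumed)
}

def map_names(source_names):
    """Split source names into mapped pairs and names still needing a decision."""
    mapped, unmapped = {}, []
    for name in source_names:
        if name in SOURCE_TO_TARGET:
            mapped[name] = SOURCE_TO_TARGET[name]
        else:
            unmapped.append(name)
    return mapped, unmapped

mapped, unmapped = map_names(["par", "Title", "b", "Author"])
print(mapped)    # {'par': 'p', 'Title': 'title', 'Author': 'author'}
print(unmapped)  # ['b']
```

Anything left in `unmapped` either gets a row in the table or is deliberately dropped as formatting.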
DTD = target namespace. You are transforming the source to this target.
The DTD describes a tree of names. It's just a tree of named things.
Look at the TreeView object you use every day in many
applications, and that is a good geometric model for what
XML elements/attributes (trees of names) model.
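To see the "tree of names" directly, the sketch below parses a tiny XML instance with Python's standard `xml.etree.ElementTree` and prints only the element names, indented the way a TreeView would show them. The instance and its element names are hypothetical and stand in for whatever the real bibliographic DTD defines.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical bibliographic instance; the element names are
# illustrative only and do not come from any particular industry DTD.
SAMPLE = """<biblio>
  <entry><title>On Names</title><author>A. Writer</author></entry>
  <entry><title>Trees</title><author>B. Writer</author></entry>
</biblio>"""

def name_tree(elem, depth=0):
    """Return the element names as indented lines, like a TreeView."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(name_tree(child, depth + 1))
    return lines

outline = name_tree(ET.fromstring(SAMPLE))
print("\n".join(outline))
```

The text content is ignored on purpose: only the names and their nesting are shown.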
DOC = source collection. (Word does not have a DTD.) You
must use an export format and figure out which one you
want to work with. Some choices here are the HTML SaveAs or the RTF
(Rich Text Format, doc's native format for all practical purposes).
o If you do not have it, download the RTF spec.
The export format with the most information is also the one that is
hardest to use: RTF. The RTF namespace is complex, but it is documented
and reasonably regular. The problem is working out what in that
namespace matches the names in the DTD.
E.g., how do you get \par to become <p>? If you
don't have an RTF book, use the RTF SaveAs, open that file as ASCII text,
then use the replace command of an editor like Professional File Editor
(PFE) to substitute \n\par for \par, so each paragraph mark starts a new
line. The editor also matches strings inside braces, so substitute \n{
for { to separate the sections.
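The same substitutions can be sketched in Python, with regular expressions standing in for the editor's replace command. This is a minimal sketch, assuming the RTF has already been saved from Word and read into a string; it leaves `\pard` ("paragraph defaults") alone, and it is rough: a group that opens right after an escaped backslash (`\\{`) is not split.

```python
import re

def split_rtf(rtf):
    """Put each RTF group and each paragraph mark on its own line.

    A newline goes before every unescaped '{' and every \\par control
    word; \\pard is not matched because of the word boundary.
    """
    rtf = re.sub(r"(?<!\\)\{", lambda m: "\n" + m.group(0), rtf)
    rtf = re.sub(r"\\par\b", lambda m: "\n" + m.group(0), rtf)
    return rtf.lstrip("\n")

sample = r"{\rtf1\ansi{\b Title}\par Body text\par}"
print(split_rtf(sample))
```

Once every group and paragraph sits on its own line, the control words that remain are the names to look up in the spec.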
Do that with all of the names provided in the RTF namespace. Looking them
up in the RTF spec, you will find these are the attributes that set things
like bold, font name, and so on. You use that information to figure out
what the format identifies in the namespace of the TARGET DTD. Sorry, but
because of the way this works, and because the SaveAs RTF feature
produces such badly formatted data for reuse, you get to work a while
at this: you are actually tagging someone else's style and
using their choices to infer names in the target. The bad news:
this is an inconsistent source of sources. The good news: for practical
purposes, people are reasonably consistent about how they do this.
So... slug work for the conversionHead.
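Before looking names up, it helps to know which control words a given export actually uses. A minimal Python sketch, assuming the RTF is in a string: it counts control words (a backslash, lowercase letters, an optional signed number) and ignores their parameters, so `\b` and `\b0` both count as `b`. Control symbols and escaped backslashes are not handled.

```python
import re
from collections import Counter

# RTF control words: a backslash, lowercase letters, and an optional
# (possibly negative) numeric parameter, e.g. \b, \fs24, \li-360.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)?")

def control_word_counts(rtf):
    """Count how often each control word (ignoring its parameter) appears."""
    return Counter(m.group(1) for m in CONTROL_WORD.finditer(rtf))

sample = r"{\rtf1\ansi{\f0\fs24\b Smith, J.}\b0\par{\i Title}\i0\par}"
for word, count in control_word_counts(sample).most_common():
    print(word, count)
```

The resulting list is the checklist for the RTF spec: each word either maps to a target name, feeds a style inference, or is discarded.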
Or find some shareware or freeware that preformats RTF for conversion.
My guess is, XSLT can be used to build this now. :-)
The easy manual way is to use SaveAsHTML and map the HTML to the
target DTD directly. Why? Depending on how consistently the styles
are applied, the productions in the HTML source are regular enough
to capture most of the important information
for a downtranslation with restore. By restore, I mean that the application
of the target system restores the lost information. Up translation
is usually a little lossy, but not excessively so in this case.
The truth is, if the source is doc, most of the important information
is in the headers. Get the text nodes out of the formatting
goop, and you will have most of what you need. What you get
will look a lot like... SaveAsHTML. Bare, but a simple enough
subset of HTML that mapping back up is easy because the productions
are regular.
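Getting the text nodes out of the SaveAsHTML goop can be sketched with Python's standard `html.parser`. This is a minimal sketch: the set of tags treated as structure, and the sample markup imitating Word's output, are assumptions, and nested blocks (a `<p>` inside an `<li>`) are not handled.

```python
from html.parser import HTMLParser

# Block-level tags whose text we keep; everything else (fonts, spans,
# style attributes) is treated as formatting. The tag set is an assumption.
BLOCKS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}

class TextNodes(HTMLParser):
    """Collect (block tag, text) pairs from HTML, dropping inline formatting."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # list of (tag, text) pairs
        self._tag = None  # block element we are currently inside
        self._buf = []    # text fragments for the current block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            self._tag, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._buf).split())  # normalize whitespace
            if text:
                self.nodes.append((tag, text))
            self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

sample_html = ('<h1><font face="Arial">Annual <b>Report</b></font></h1>'
               '<p class=MsoNormal><span style="font-size:10pt">Smith, J.</span></p>')
parser = TextNodes()
parser.feed(sample_html)
parser.close()
print(parser.nodes)  # [('h1', 'Annual Report'), ('p', 'Smith, J.')]
```

What is left is a flat list of block names and text, which is the regular structure that makes mapping back up straightforward.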
If page fidelity is not an issue, analyzing the RTF won't yield enough
extra information to make the mapping much more useful. If page
fidelity is still an issue, you have to analyze the RTF to get
closeEnoughForLegalWord fidelity. Otherwise, just map the easily
recognized structures (what SaveAsHTML actually produces) and
clean up a bit afterwards. Map the HTML structures/names, which you
know, to the matching structures/names (if any) of the target DTD.
If they don't match lexically (p Is p), you work out the match
semantically (P isA p). For a biblio DTD, that should be really
straightforward. You may have to track down some authors to get
the information Word puts in those nice sets of global doc
attributes that no one uses.
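The two-step match can be sketched in a few lines of Python. XML names are case-sensitive, so in this sketch `p` matches lexically while `P` needs an entry in the hand-built semantic table. The target names and every semantic entry are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical element names in the target bibliographic DTD.
TARGET_NAMES = {"title", "author", "p"}

# Hand-built "isA" table for names that do not match lexically.
# Every entry here is an illustrative assumption, not taken from a real DTD.
SEMANTIC = {"P": "p", "h1": "title", "h2": "author"}

def target_name(html_name):
    """Map an HTML element name to a target DTD name, or None if unmatched."""
    # Lexical match: the identical name (XML names are case-sensitive).
    if html_name in TARGET_NAMES:
        return html_name
    # Semantic match: look the name up in the hand-built isA table.
    return SEMANTIC.get(html_name)

for name in ["p", "P", "h1", "table"]:
    print(name, "->", target_name(name))
```

Names that come back as `None` are the ones to take up with the authors or drop.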
I hope this helps. If not, ask more questions. This list still
answers questions as best it can.
peace,
len
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)