Re: Generating a DTD from XML files?

At 7:21 AM -0700 2001-05-16, Rob Lawson wrote:
>Does anyone know a utility or package to generate a basic 
>DTD from XML files?
>Second question, does anyone have a link to useful XML FAQs, 
>so I don't ask anymore possibly silly questions?
>Many thanks for any help,
>Robert G. Lawson (rob.lawson@ktiworld.com)
>KPM Consultant
>Knowledge Technologies International Ltd.
>Phone: +44 (0) 7866 610409
>Fax: +44 (0) 7970 030914

There was a good paper at SGML '95 on this.  See "Creating DTDs 
via the GB-Engine and Fred" by Keith E. Shafer at:


See especially sections 4, "Automatic DTD Creation Process" 
and 5, "Reductions".  It should help a lot. 

I'll include an Awk program that I use as the first step when 
creating a DTD from a collection of tagged documents.  It might 
help you get started.  

It reads the ESIS output from SGMLS and writes out the full 
"path" for each element like this:

doc (
doc chapter (
doc chapter section (
doc chapter section para (
doc chapter section para )
doc chapter section para (
doc chapter section para xref (
doc chapter section para xref )
doc chapter section para )

The same approach could be used with SAX events.  
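
As a rough illustration of the SAX variant, here is a minimal sketch in Python. It assumes the standard library's xml.sax module; the names PathHandler and gi_paths are made up for this example and are not part of the Awk utility below.

```python
import xml.sax


class PathHandler(xml.sax.ContentHandler):
    """Collect one 'path (' or 'path )' line per start and end tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # GIs from the root down to the current element
        self.lines = []   # output lines, in document order

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.lines.append(" ".join(self.stack) + " (")

    def endElement(self, name):
        # Emit the path while the closing element is still on the stack.
        self.lines.append(" ".join(self.stack) + " )")
        self.stack.pop()


def gi_paths(xml_text):
    """Return the path lines for an XML document given as a string."""
    handler = PathHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines


if __name__ == "__main__":
    sample = "<doc><chapter><section><para>See <xref/>.</para></section></chapter></doc>"
    print("\n".join(gi_paths(sample)))
```

The handler keeps the same kind of stack as the Awk program, so the two should produce interchangeable output for well-formed input.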

You can then write some other utilities that use this output to 
count the various nestings and have a better chance of getting the 
cardinality constraints a little tighter, instead of just using 
* or + for each element.  At the least, it makes it easy to build 
loose content models such as (X | Y | Z)* in order to get started.  
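
One such follow-on utility could be sketched as below, again in Python. It reads the path lines, records which child GIs appear under each parent, and builds a loose model of the form (X | Y | Z)*. The function name loose_content_models and the returned pair counts are assumptions for illustration. Elements that never have children are left out, since the path output alone does not say whether they are EMPTY or contain text.

```python
from collections import Counter, defaultdict


def loose_content_models(path_lines):
    """Map each parent GI to a loose model such as '(X | Y | Z)*'.

    Only start-tag lines (ending in '(') are used. Children are listed
    in order of first appearance.
    """
    children = defaultdict(list)   # parent GI -> child GIs, first-seen order
    pair_counts = Counter()        # (parent, child) -> number of occurrences
    for line in path_lines:
        fields = line.split()
        if len(fields) < 3 or fields[-1] != "(":
            continue               # end tags, the root element, other lines
        parent, child = fields[-3], fields[-2]
        if child not in children[parent]:
            children[parent].append(child)
        pair_counts[(parent, child)] += 1
    models = {p: "(" + " | ".join(kids) + ")*" for p, kids in children.items()}
    return models, pair_counts


if __name__ == "__main__":
    import sys
    models, pair_counts = loose_content_models(sys.stdin)
    for parent, model in models.items():
        print("<!ELEMENT %s %s>" % (parent, model))
```

The pair counts are only a starting point for tightening cardinality; deciding between ?, + and * still needs per-instance counts of each child under each parent.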

/s/ Ernest G. Allen


##  GI_path.awk -- accepts ESIS input, writes the full path 
#   from the root element to each element start and end tag.
#   Uses "(" and ")" at the end of each line of output to 
#   indicate that the last GI on the line is a start tag or 
#   end tag, respectively.
#   by Ernest G. Allen, 1995-2001
#   No copyrights held, placed into the Public Domain.

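BEGIN { stack_ptr = 0; }    # index of the next free stack slot

# "(GI" marks an element start: strip the "(" and push the GI.
/^\(/ { GI = $1; GI = substr(GI, 2); push(GI); next; }

# ")GI" marks an element end.
/^\)/ { pop(); next; }

# "?..." is a processing instruction: pass it through unchanged.
/^\?/ { print; }

# Store the new GI, print the path down to it, then mark it as a start tag.
function push(s) {
    stack[stack_ptr] = s;
    print_stack();
    print "(";
    stack_ptr++;
}

# Step back to the closing element, print its path, mark it as an end tag.
function pop() {
    stack_ptr--;
    print_stack();
    print ")";
}

# Print the GIs from the root through stack[stack_ptr]; i is a local.
function print_stack(    i) {
    for (i = 0; i <= stack_ptr; i++) {
        printf("%s ", stack[i]);
    }
}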