Re: [xml-dev] converting 1-20 GB xml to xsd, visualizing on webpage
- From: "Steven J. DeRose" <sderose@acm.org>
- To: "Farkas, Illes" <illes.farkas@gmail.com>, xml-dev@lists.xml.org
- Date: Mon, 20 Oct 2008 13:56:25 -0400
There are infinitely many schemas that will match any given set
of data, so there is no single schema to extract....
If you just want *some* schema under which the document(s) are
valid, you could just extract all the element types that occur, say
with something like
grep -o '<[a-zA-Z_:][-_.:a-zA-Z0-9]*' documentname.xml | sort | uniq
and then, using some global changes in an editor, create a
declaration for each one that allows it to have unrestricted
content. You'd need to do something similar for attributes, but then
it should all validate.
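
The same harvesting can be done in a single streaming pass, which matters for a 1-20 GB file. Below is a minimal sketch in Python using the standard xml.sax parser. The names NameCollector, collect, and permissive_dtd are made up for illustration, and declaring every attribute as CDATA #IMPLIED is an assumption chosen to be as permissive as possible, not something the message above prescribes.

```python
import xml.sax
from collections import defaultdict


class NameCollector(xml.sax.ContentHandler):
    """Record every element type and the attribute names seen on it."""

    def __init__(self):
        super().__init__()
        # element name -> set of attribute names seen on that element
        self.attributes = defaultdict(set)

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so names arrive as
        # written in the document (prefix included).
        self.attributes[name].update(attrs.getNames())


def collect(source):
    """Stream `source` (a filename or binary file object) through SAX."""
    handler = NameCollector()
    xml.sax.parse(source, handler)
    return handler.attributes


def permissive_dtd(attributes):
    """Emit DTD declarations: ANY content, every seen attribute optional."""
    lines = []
    for element in sorted(attributes):
        lines.append(f"<!ELEMENT {element} ANY>")
        for attr in sorted(attributes[element]):
            # Assumption: CDATA #IMPLIED accepts any value and any absence.
            lines.append(f"<!ATTLIST {element} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)
```

Calling print(permissive_dtd(collect("documentname.xml"))) would write the declarations to standard output. SAX keeps only the current event in memory, so the file size should not be a problem, though the run time will still scale with it.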
If you want a little more information so you can build a more
detailed schema, my xmlstats utility (in Perl) at
http://derose.net/steve/utilities/xmlstats has options to tell you
what element types occur within what other ones, and from that you
could derive a more restrictive schema. The most obvious approach is
to declare each element to permit the OR of all the element types that
ever occur in it. That misses useful restrictions, for example
that TITLE must occur only once in each DIV and be its first child
element, but it's better than just ANY for everything.
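
As a rough illustration of the "OR of everything that occurs" idea (this is not how xmlstats works internally, which the message does not describe), the sketch below tracks parent/child pairs with a stack during a SAX pass and turns them into DTD content models. ChildCollector and content_model are hypothetical names. Treating childless elements as (#PCDATA)* is an assumption made so that empty and text-only instances both validate.

```python
import xml.sax
from collections import defaultdict


class ChildCollector(xml.sax.ContentHandler):
    """Record which element types occur directly inside which others."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # names of currently open elements
        self.children = defaultdict(set)  # parent name -> child names
        self.has_text = set()             # elements with non-blank text

    def startElement(self, name, attrs):
        if self.stack:
            self.children[self.stack[-1]].add(name)
        self.children[name]  # make sure childless elements get an entry
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # Whitespace between child elements does not make content mixed.
        if self.stack and content.strip():
            self.has_text.add(self.stack[-1])


def content_model(name, children, has_text):
    """Return a DTD content model permitting any seen child, in any order."""
    kids = sorted(children[name])
    if name in has_text or not kids:
        # Mixed content: #PCDATA must be listed first in a DTD.
        return "(" + " | ".join(["#PCDATA"] + kids) + ")*"
    return "(" + " | ".join(kids) + ")*"


def declarations(handler):
    """One <!ELEMENT ...> line per element type seen by the handler."""
    return "\n".join(
        f"<!ELEMENT {n} {content_model(n, handler.children, handler.has_text)}>"
        for n in sorted(handler.children)
    )
```

A model such as (p | title)* still allows any number of TITLE elements in any position, which is exactly the kind of restriction this approach cannot recover.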
There used to be some nice utilities for extracting a reasonable
DTD from SGML documents; perhaps someone has one handy for XML?
Steve
At 7:24 AM +0200 10/20/08, Farkas, Illes wrote:
Dear List Members,
Do you happen to know of a Linux tool (or
tools) that can extract the schema from a 1-20 GB XML file and
visualize it for users, similarly to this page: http://psidev.sourceforge.net/mi/rel25/doc/
Thanks in advance,
Illes Farkas, Ph.D.
http://angel.elte.hu/fij
--
Steve DeRose -- http://www.derose.net, email sderose@acm.org