Re: [xml-dev] Schemaless XML?

While I do agree with Debbie's use case and the value of grammars in this
case as an analytic tool, just to be pedantic: here they are being used as
a way to capture a particular type of analysis, and that same analysis
could be captured in other ways that might be just as useful for
communicating findings to clients or for subsequently evaluating the
results (e.g., how many times a particular combination pattern occurs).

That is, it's not the use of grammars here that's interesting but the
analysis those grammars reflect and the knowledge gained from the
analysis. That analysis might be more effectively presented or captured as
a spreadsheet or, dare I say it, a set of RDF triples in order to then try
to do some machine learning analysis, or any number of other forms that
provide a good match to the tools at hand and the skills, expertise, and
expectations of the people involved.
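
As a rough illustration of capturing that kind of analysis as a spreadsheet
rather than a grammar, the following Python sketch tallies parent/child
element pairs across a set of documents and writes the counts as CSV. The
function names (tally_patterns, to_csv) and the sample documents are made up
for illustration; this is one possible approach under those assumptions, not
a tool anyone in the thread described.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter


def tally_patterns(xml_documents):
    """Count parent/child element pairs across an iterable of XML strings.

    Returns (occurrences, doc_counts):
      occurrences[(parent, child)] -> total times the pair appears
      doc_counts[(parent, child)]  -> number of documents containing the pair
    """
    occurrences = Counter()
    doc_counts = Counter()
    for text in xml_documents:
        root = ET.fromstring(text)
        seen_in_this_doc = set()
        # root.iter() visits the root and every descendant element.
        for parent in root.iter():
            for child in parent:  # direct children only
                pair = (parent.tag, child.tag)
                occurrences[pair] += 1
                seen_in_this_doc.add(pair)
        doc_counts.update(seen_in_this_doc)
    return occurrences, doc_counts


def to_csv(occurrences, doc_counts):
    """Render the tallies as CSV text, one row per parent/child pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["parent", "child", "occurrences", "documents"])
    for (parent, child), total in sorted(occurrences.items()):
        writer.writerow([parent, child, total, doc_counts[(parent, child)]])
    return buffer.getvalue()


# Hypothetical sample data standing in for a client's document set.
SAMPLE_DOCS = [
    "<article><title>A</title><sec><p>x</p><p>y</p></sec></article>",
    "<article><title>B</title><sec><list><item>z</item></list></sec></article>",
]

if __name__ == "__main__":
    occ, docs = tally_patterns(SAMPLE_DOCS)
    print(to_csv(occ, docs))
```

A table like this answers the "that never happens" question directly: a pair
with a small document count is either a rare real case or an error worth
checking.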


Eliot Kimber

On 10/12/16, 11:10 AM, "dal" <dalapeyre@mulberrytech.com> wrote:

>> On Oct 12, 2016, at 4:23 AM, Michael Kay <mike@saxonica.com> wrote:
>> In trying to form an understanding of a large mass of undocumented XML
>>content, I have sometimes found it useful to derive a schema. It's not
>>that the schema contains any information that wasn't in the data; it's
>>just that it provides a distillation that may be more tractable: it
>>tells you what elements are present and how they relate.
>I agree completely. Mulberry does the same. A client sends
>us a batch of XML, maybe with a schema, maybe not. Even
>if it says it uses a schema, we derive one to see what has
>actually been done.
>Is this a useful schema for authoring or any of the purposes
>Eliot mentioned? Of course not. But it is very useful to
>determine and communicate what is in this mess-of-XML for real.
>For example, if you are doing an XML-to-XML conversion, it
>could take a very long time to translate ALL of one schema
>to ALL of another, if it is even possible. But if you can say
>that this entire branch has never been used (or has been used
>3 times in 250,000 documents), things get simpler.
>Or during a data analysis, if the client says “oh, that never
>happens”, it is useful to be able to say, “well, there are 150
>of them in your current 4 million document database. Are
>these real or errors?”
>Or just to point out that there are 2 very reasonable ways to
>tag structure X, and they seem to have used both. Historical
>accident? Mistake? Two varying situations that might be clarified?
>Sorry to state the obvious, but schemas are so very useful.
>Not essential, just very very useful.
>Deborah A Lapeyre              mailto:dalapeyre@mulberrytech.com
>Mulberry Technologies, Inc.      http://www.mulberrytech.com
>17 West Jefferson Street         Phone: 301-315-9631 (USA)
>Suite 207                        Fax:   301-315-8385
>Rockville, MD 20850
>Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
>XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>to support XML implementation and development. To minimize
>spam in the archives, you must subscribe before posting.
>[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>subscribe: xml-dev-subscribe@lists.xml.org
>List archive: http://lists.xml.org/archives/xml-dev/
>List Guidelines: http://www.oasis-open.org/maillists/guidelines.php