Re: [xml-dev] Schemaless XML?


From: Thomas Passin <list1@tompassin.net>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Tue, 11 Oct 2016 09:23:09 -0400

What gives you the idea that knowing a schema allows you to understand and "process" the data contained in a document?

In every toy example you have given for these kinds of questions, you have used element or attribute names that suggest something to humans. I think you are fooling yourself if you believe automated machine processing would know what to do with them just because *you* think you know what, e.g., "temperature" or "type='graduation'" means.
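
To make that concrete, here is a rough Python sketch, not a claim about any real application. The two sample documents and the `anonymous_shape` helper are made up for illustration; only "temperature" and "type='graduation'" come from the examples under discussion. It compares a document that uses suggestive names with one that uses opaque tokens, looking only at what a purely structural processor can see.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same structure, different names.
READABLE = '<reading type="graduation"><temperature>72</temperature></reading>'
OPAQUE = '<a1 b2="graduation"><c3>72</c3></a1>'


def anonymous_shape(xml_text):
    """Describe a document with every element and attribute name replaced
    by a number assigned in order of first appearance."""
    ids = {}

    def name_id(name):
        # A new name gets the next unused number; a known name keeps its own.
        return ids.setdefault(name, len(ids))

    def walk(elem):
        tag = name_id(elem.tag)
        attrs = tuple((name_id(k), v) for k, v in elem.attrib.items())
        text = (elem.text or "").strip()
        children = tuple(walk(child) for child in elem)
        return (tag, attrs, text, children)

    return walk(ET.fromstring(xml_text))


# True: once the names are treated as opaque labels, the documents match.
same = anonymous_shape(READABLE) == anonymous_shape(OPAQUE)
print(same)
```

Both documents reduce to the same description, so whatever "temperature" adds for a human reader has to be supplied to the program from somewhere other than the name.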

TomP

On 10/11/2016 8:53 AM, Costello, Roger L. wrote:

Hi Folks,

Scenario: You are building an application that receives XML documents
from various sources. The kinds of data in the XML documents are varied.
The XML documents themselves are structured in various ways. Over time,
new XML documents are received, containing new, unanticipated kinds of data.

How will your application handle such diversity?

One approach is to create an XML Schema that models all the various
kinds of XML documents that will be received. When the application needs
to process new XML documents, the XSD is updated. The disadvantage of
this approach is that the processing of the new XML documents will be
delayed as the XSD is updated and as the application is updated to
handle the new data. The advantage of this approach is that the
application knows exactly what the data is and can process it efficiently.

An alternate approach is for the application to go “schemaless.” The
application performs machine learning on the data it receives. I’m not
sure what “machine learning on the data” means. I suspect that it means
that an internal schema (in some form or another) is dynamically
generated. Do you agree? If so, then the approach is not actually
schemaless; rather, there is a dynamically generated schema. Do you
agree? Is machine learning technology sufficiently advanced that it can
classify and understand the data to the same degree as a carefully
crafted schema and carefully crafted application code? Have you gone
schemaless?
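
As a rough illustration of what a "dynamically generated schema" could look like in its simplest form, the Python sketch below records which child elements and attributes have been observed for each element name. This is an assumption about one possible reading of the idea, not machine learning and not a description of any existing product; the sample documents and the `infer_structure` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_structure(documents):
    """Record, for each element name, the child element names and
    attribute names observed across all documents seen so far."""
    summary = defaultdict(lambda: {"children": set(), "attributes": set()})
    for xml_text in documents:
        # iter() visits the root element and every descendant.
        for elem in ET.fromstring(xml_text).iter():
            entry = summary[elem.tag]
            entry["attributes"].update(elem.attrib)
            entry["children"].update(child.tag for child in elem)
    return dict(summary)


# Hypothetical input: the second document adds an unanticipated element.
docs = [
    '<reading><temperature unit="F">72</temperature></reading>',
    '<reading site="A"><temperature>21</temperature><humidity>40</humidity></reading>',
]
structure = infer_structure(docs)
for name in sorted(structure):
    print(name, sorted(structure[name]["children"]), sorted(structure[name]["attributes"]))
```

A summary like this grows as new documents arrive, which fits the suspicion that such an approach is not really schemaless. It captures structure only and says nothing about what the data means.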

/Roger

Follow-Ups:
- Re: [xml-dev] Schemaless XML?
  - From: Eliot Kimber <ekimber@contrext.com>

References:
- Schemaless XML?
  - From: "Costello, Roger L." <costello@mitre.org>
