OASIS Mailing List Archives


Schemaless XML?

Hi Folks,

Scenario: You are building an application that receives XML documents from various sources. The kinds of data in the XML documents are varied. The XML documents themselves are structured in various ways. Over time, new XML documents are received, containing new, unanticipated kinds of data.

How will your application handle such diversity?

One approach is to create an XML Schema that models all the various kinds of XML documents that will be received. When the application needs to process new XML documents, the XSD is updated. The disadvantage of this approach is that the processing of the new XML documents will be delayed as the XSD is updated and as the application is updated to handle the new data. The advantage of this approach is that the application knows exactly what the data is and can process it efficiently.
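To make the trade-off concrete, here is a minimal sketch of the schema-first approach. It assumes a hand-written Python dictionary standing in for a real XSD (the Python standard library has no XSD validator), and the `SCHEMA` table, the `validate` function, and the `order`/`customer`/`item` vocabulary are all hypothetical. It shows a document with an unanticipated element being rejected until someone updates the schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XSD: element name -> allowed child element names.
SCHEMA = {
    "order": {"customer", "item"},
    "customer": set(),
    "item": set(),
}

def validate(xml_text, schema):
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    stack = [ET.fromstring(xml_text)]
    while stack:
        elem = stack.pop()
        if elem.tag not in schema:
            problems.append(f"unknown element <{elem.tag}>")
            continue  # children of an unknown element cannot be checked
        for child in elem:
            if child.tag not in schema[elem.tag]:
                problems.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
            else:
                stack.append(child)
    return problems

known = "<order><customer>Ann</customer><item>Pen</item></order>"
novel = "<order><customer>Ann</customer><giftWrap>yes</giftWrap></order>"

print(validate(known, SCHEMA))  # conforms, so no problems are reported
print(validate(novel, SCHEMA))  # <giftWrap> is flagged until SCHEMA is updated
```

The second document stays unprocessable until `SCHEMA` gains a `giftWrap` entry, which is the delay described above.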

An alternative approach is for the application to go “schemaless.” The application performs machine learning on the data it receives. I’m not sure what “machine learning on the data” means. I suspect that it means that an internal schema (in some form or another) is dynamically generated. Do you agree? If so, then the approach is not actually schemaless; rather, there is a dynamically generated schema. Do you agree? Is machine learning technology sufficiently advanced that it can classify and understand the data to the same degree as a carefully crafted schema and carefully crafted application code? Have you gone schemaless?
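One way to picture a "dynamically generated schema" is the sketch below. It is only an assumption about what such a system might do: it uses plain structural inference rather than machine learning, and the `infer_schema` function and the sample documents are hypothetical. It records, for each element name seen so far, which child elements and attributes have been observed:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def infer_schema(documents):
    """Map element name -> observed child element names and attribute names."""
    schema = defaultdict(lambda: {"children": set(), "attributes": set()})
    for xml_text in documents:
        # iter() walks the root element and all of its descendants
        for elem in ET.fromstring(xml_text).iter():
            entry = schema[elem.tag]
            entry["attributes"].update(elem.attrib)  # adds the attribute names
            entry["children"].update(child.tag for child in elem)
    return dict(schema)

docs = [
    "<order id='1'><customer>Ann</customer></order>",
    "<order><customer>Bob</customer><giftWrap>yes</giftWrap></order>",
]
schema = infer_schema(docs)
print(sorted(schema["order"]["children"]))
print(sorted(schema["order"]["attributes"]))
```

Each new document can only widen the inferred structure, so the application never rejects input. What the sketch does not capture is meaning: it learns that `giftWrap` can appear inside `order`, but not what the application should do with it.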




Copyright 1993-2007 XML.org. This site is hosted by OASIS