Re: [xml-dev] Dealing with lots of chunks of highly interrelated data?

To expand on what Steve said: you've basically set up a false dichotomy,
because "vanilla XML" has no (useful) mechanism for representing
relationships among XML elements other than hierarchical ones. [I'm
ignoring the ID/IDREF mechanism in XML because (A) it has no built-in
semantics and (B) it's insufficiently complete as an addressing mechanism
to even be the basis for more general link representation. So basically
any addressing mechanism will necessarily be a layer added on top of the
base XML syntax, whether it's XPointer or URIs or something else.]
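
To make that concrete, here is a minimal sketch (the document and element
names are invented for illustration) of why ID/IDREF-style pointing carries
no semantics of its own: the "link" is just a string match, and what the
reference *means* is entirely up to the application layer.

```python
import xml.etree.ElementTree as ET

# A hypothetical document using an ID/IDREF-style pointer. XML itself
# says nothing about what the "aircraft" attribute *means*; the
# application must decide.
doc = """
<flights>
  <aircraft id="a1" model="737-800"/>
  <flight number="UA100" aircraft="a1"/>
</flights>
"""

root = ET.fromstring(doc)

# Resolving the reference is an application-level layer on top of the
# base XML syntax, exactly as it would be with XPointer or URIs.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
flight = root.find("flight")
target = by_id[flight.get("aircraft")]
print(target.get("model"))  # the application decides this means "is flown by"
```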

So any use of XML to represent relationships among things will necessarily
be a semantic application on top of the base XML content. But that is true
of *any* non-trivial XML application.

XML, as a serialization of structured objects, is just a data format for
objects that are themselves projections of more abstract objects, whether
those objects are documents (in the publishing sense) or airplanes, their
parts, the actors that operate on those things, and the events those
actors caused to happen.

That is, any non-trivial XML application always becomes a system of
interrelated things. Whether you call it a hyperdocument or a database is
really just a matter of perspective or emphasis: fundamentally it is a set
of things linked together for some reason.

To manage such a data set you need some kind of data management
technology. Whether that's a relational database or a triple store or a
box full of index cards is entirely an implementation decision.

If you need to interchange and interoperate on this data then certainly
trying to use applicable standards makes sense, assuming applicable
standards exist. 

One challenge with standardization in this realm is that beyond the
details of addressing (e.g., URIs and fragment identifiers) and object
representation, everything else is very much application specific. As the
scope of the application becomes more general, the cost of standardization
rises while its practical value remains relatively low, beyond enabling
basic data processing (e.g., resolving references, correctly recognizing
or inferring relationships, managing the structured objects that are
related). The actual semantic processing will always require specialized
code, and the more specialized it is, the less useful (or even possible)
it is to standardize.

If you look at something like RDF or Topic Maps, they let you represent
relationships and you can *state* any set of relationships you like to any
degree of precision you want. But they say nothing about how you
*understand* the statements: understanding is the domain of knowledge
processing, that is, intelligence, artificial or otherwise. Understanding
can be embodied in code (that is, I can write data processing applications
that understand a given set of relationships as applied to a known set of
object types for a given purpose) but that is a far cry from having some
sort of general-purpose system that can take an arbitrary set of generic
relationship instances and tell you something both true and useful about
them.

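The stating/understanding distinction can be sketched in a few lines of
Python (the relationship names here are invented for illustration): any
triple whatsoever can be *stated*, but only purpose-built code
*understands* a known subset of predicates for a known purpose.

```python
# Any relationship can be *stated* as a (subject, predicate, object)
# triple, RDF-style. Stating is cheap and fully general.
triples = {
    ("UA100", "operatedBy", "737-800"),
    ("737-800", "hasMaxAltitude", "41000ft"),
    ("UA100", "colorOfPilotsSocks", "argyle"),  # statable -- but who understands it?
}

# *Understanding* lives in special-purpose code that knows what a given
# predicate means for a given purpose -- here, a hypothetical lookup that
# chains two known predicates.
def max_altitude(flight):
    for s, p, o in triples:
        if s == flight and p == "operatedBy":
            aircraft = o
            for s2, p2, o2 in triples:
                if s2 == aircraft and p2 == "hasMaxAltitude":
                    return o2
    return None

print(max_altitude("UA100"))  # -> 41000ft
```

Nothing in the triple store itself knows that "operatedBy" relates a
flight to an aircraft type; that knowledge is entirely in the code.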
In the context of the application you described, that sounds like a pretty
workaday relational application to which it might be useful to apply
additional taxonomies or ontologies that allow for discovery or inference,
but that is because the scope is pretty narrow and it's therefore
practical to define *agreed upon* data models, relationship types,
taxonomies, and ontologies. Given that, it's then relatively easy to do
data processing with whatever technology makes the most implementation
sense at the moment. Whether that's RDF-and-OWL or Topic Maps or DITA
subject scheme maps or ad-hoc processing using XQuery doesn't really
matter: that's an implementation detail.

If the goal for this particular data set is to enable interchange and
interoperation of the data *and the processing* then somebody will need to
standardize the data model abstractions, the governing taxonomies and
ontologies, the relationship type models, the population constraints, the
business rules, etc. That is, they will need to define the application to
a sufficient degree of detail that it is realistic to expect different
implementations to provide compatible behavior.

To answer your question about validation: XML Schema is not sufficient for
validating this data because it has no mechanism (beyond XSD 1.1
assertions) to validate the semantics of relationships (e.g., is a
relationship of type X correctly applied to objects of types A and B with
specific property sets K and J?). One can imagine such a language but it
would start to look like a programming language...
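
For instance, an XSD 1.1 assertion can check a structural co-occurrence
constraint on a relationship element, but nothing deeper (the element,
attribute, and type names below are invented for illustration):

```xml
<xs:complexType name="Relationship">
  <xs:sequence>
    <xs:element name="member" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="type"/>
  <!-- Can check that a "tracks" relationship has exactly two members... -->
  <xs:assert test="@type ne 'tracks' or count(member) eq 2"/>
  <!-- ...but not whether those members are a radar and an aircraft with
       the right property sets: that knowledge lives outside the schema. -->
</xs:complexType>
```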

The answer to your last question must be "true" simply because XSD is not
capable of validating such a system (if by "validate" we mean check the
semantic, population, and business rule constraints).
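
A sketch of what that "beyond-XSD" validation looks like in practice (the
relationship types and rules below are invented for illustration): it is
ordinary application code checking constraints the schema language cannot
express.

```python
# Hypothetical relationship-type constraints: which object types a
# relationship of a given type may legally connect. XSD has no
# vocabulary for rules like these; they are application semantics.
ALLOWED = {
    "tracks": ("radar", "aircraft"),
    "operates": ("actor", "aircraft"),
}

def validate(rel_type, source_type, target_type):
    """Return a list of semantic-constraint violations (empty = valid)."""
    expected = ALLOWED.get(rel_type)
    if expected is None:
        return [f"unknown relationship type: {rel_type}"]
    errors = []
    if (source_type, target_type) != expected:
        errors.append(
            f"{rel_type} must connect {expected[0]} -> {expected[1]}, "
            f"got {source_type} -> {target_type}")
    return errors

print(validate("tracks", "radar", "aircraft"))   # -> []
print(validate("tracks", "passenger", "radar"))  # one violation
```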



Eliot Kimber, Owner
Contrext, LLC

On 6/10/16, 3:03 PM, "Steve Newcomb" <srn@coolheads.com> wrote:

>> the relationships between all these chunks of data must be captured
>Capture is one thing.  Interchange of information between dissimilar
>systems is another.
>Moreover, there are no simple rules for deciding which kind of tool or
>representation is best for a given situation.  Here I'm including
>"moment in history" in the general definition of "situation", where
>"situation" also includes, but is not limited to, all the issues that
>arise from the distinctions that may need to be made between
>interchange, capture, and processing-for-some-purpose.
>So, in the general case, these true-or-false questions are all vexed.
>There's another pair of problems, too:
>     * Is there some reason to limit all relationship-modeling to RDF?
>More generally, is there *any* relationship-model that doesn't fail
>somehow?
>     * Is there some reason to limit all interchange representation to
>XML?  More generally, is there *any* interchange representation that
>doesn't fail somehow?
>For good or ill, choices must be made, regardless of the complexity of
>the issues involved in making them.  There's no substitute for real
>On 06/10/2016 02:35 PM, Costello, Roger L. wrote:
>> Hi Folks,
>> So, you have lots of chunks of data and the chunks are highly
>>interrelated. Over time more chunks are added and those new chunks are
>easily connected in with the other chunks, i.e., the system expands over
>>time.
>> Example of highly interrelated chunks of data: You have data about an
>>aircraft in flight: its current location, speed, heading, etc. You have
>>data about the aircraft itself: fuel capacity, max speed, max altitude,
>>cruising speed, date built, model number, etc. You have data from FAA
>>radars tracking the aircraft. You have data about the destination and
>>originating points of the aircraft. You have data about the passengers
>>on the aircraft. And you have many other chunks of relevant data. And,
>>the relationships between all these chunks of data must be captured.
>> [True or False] The only reasonable way to express an expandable system
>>with lots of chunks of highly interrelated data is with RDF, RDF Schema,
>>and/or OWL. It is not practical to express an expandable system with
>>lots of chunks of highly interrelated data using vanilla XML.
>> [True or False] When dealing with an expandable system with lots of
>>chunks of highly interrelated data it's not practical to use XML Schema
>>to validate the data.
>> [True or False] When dealing with an expandable system with lots of
>>chunks of highly interrelated data you must adopt the Semantic Web
>>technologies such as inference engines, triple stores, SPARQL, knowledge
>>bases. And your trading partners will naturally join you in your
>>adoption of the Semantic Web technologies.
>> [True or False] Nobody has ever implemented an expandable system with
>>lots of chunks of highly interrelated data using vanilla XML and using
>>XML Schema for validation. Or, if they did, it was a complete failure.
>> /Roger
>> _______________________________________________________________________
>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> to support XML implementation and development. To minimize
>> spam in the archives, you must subscribe before posting.
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org
>> List archive: http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php



Copyright 1993-2007 XML.org. This site is hosted by OASIS