XML, Topic Maps and Everything (was Re: [xml-dev] Relationships [was RE: James Clark: XML versus the Web])

From: "Andrew S. Townley" <ast@atownley.org>
To: "Costello, Roger L." <costello@mitre.org>
Date: Wed, 8 Dec 2010 02:47:48 +0000

On 6 Dec 2010, at 2:17 PM, Costello, Roger L. wrote:

> Andrew Townley wrote:
>
>> ...but I'm happy to try and help explain where
>> I see some potential benefits [of using Topic Map
>> concepts to express relationships in XML] if anyone's
>> interested in hearing it.
>
> Yes, I am definitely interested.
>
> /Roger

Hi Roger,

Apologies for the delay in getting back to you about this. It's been a crazy week.

Here are some of the initial thoughts I had on how to apply TMRM fundamentals to the problems being discussed on the list about where to go with XML. I don't claim these are fully baked, and there are likely holes large enough to drive a fleet of trucks through, but I did want to try to explain some of the things I see based on my experience over the last few years designing, implementing and using an information management system based on the TMRM. Part of my thinking isn't "pure" TMRM: it has also been influenced by work the authors of the specification have done in implementing it and explaining it to others[1], which puts a particular slant on things as well. Any errors in expressing the TMRM or Topic Maps are purely my own and should not be considered flaws of the specs themselves. ;)

The TMRM in 10 Minutes or Less

The Topic Maps Reference Model (TMRM) is an abstract model in which the rest of the current Topic Maps specifications can be described. It came after the original Topic Maps specifications (originally developed in HyTime), and it describes a self-recursive mechanism for representing directed graphs using both late binding and lazy resolution.

The fundamental construct of the TMRM is the Subject Proxy. Subject Proxies are representations of anything you wish to talk about (the Subject), expressing a relationship similar to Plato's shadows on the wall between the subject and the proxy.

Subject Proxies represent vertices and have zero or more Proxy Properties defining directed edges of the graph representing a particular Subject Map. Each Proxy Property has a label (key) and a value which is one of nil, a reference to another Subject Proxy or a literal. I generally use the term "links" to describe the properties referencing other proxies to differentiate them from literal property values. Each label is a symbol referencing another Subject Proxy, and all you need to do to declare a Subject Proxy is reference it.
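To make the proxy/property structure concrete, here is a minimal Ruby sketch. Everything in it (the SubjectMap class, the Link wrapper, the method names) is hypothetical and not taken from the TMRM, and it simplifies by keeping one value per key. It shows the three kinds of property value and the rule that referencing a label is enough to declare its proxy.

```ruby
# A minimal, hypothetical Subject Map: a Hash from proxy label (Symbol)
# to that proxy's properties. None of these names come from the TMRM.
class SubjectMap
  # Wrapper marking a property value as a link to another proxy,
  # as opposed to a literal value or nil.
  Link = Struct.new(:target)

  def initialize
    # Merely referencing a label is enough to declare its proxy.
    @proxies = Hash.new { |hash, label| hash[label] = {} }
  end

  def proxy(label)
    @proxies[label]
  end

  # Set a property. The key is itself a label, so it is declared too,
  # as is the target of any link.
  def set(label, key, value)
    proxy(key)
    proxy(value.target) if value.is_a?(Link)
    proxy(label)[key] = value
  end

  def labels
    @proxies.keys
  end
end

map = SubjectMap.new
map.set(:tmrmspec, :title, "Topic Maps Reference Model, 13250-5") # literal
map.set(:tmrmspec, :author, SubjectMap::Link.new(:patrick))       # link
map.set(:tmrmspec, :editor, nil)                                  # nil
```

After these three calls the map also contains empty proxies for :title, :author, :editor and :patrick, even though nothing was ever said about them directly.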

Subject Proxies have two pre-defined relationships, superclass-subclass and class-instance, both of which must be interpreted relative to the particular Subject Map. The superclass-subclass relationship is reflexive and transitive, and circular relationships are possible in any given map. The class-instance relationship is non-reflexive, and where a proxy is an instance of another, it is also an instance of any superclass of that proxy. The TMRM does not specify a representation for either of these relationships.
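Since the TMRM leaves the representation open, the following Ruby sketch simply assumes one: direct superclasses live under a :subclass_of key and direct classes under an :isa key. Both labels, and the sample proxies, are made up for illustration. It shows the reflexive, transitive (and cycle-tolerant) superclass closure and the non-reflexive instance test that inherits through superclasses.

```ruby
require "set"

# Hypothetical representation: each proxy lists its direct superclasses
# under :subclass_of and its direct classes under :isa.
map = {
  document:      {},
  specification: { subclass_of: [:document] },
  standard:      { subclass_of: [:specification] },
  tmrmspec:      { isa: [:standard] }
}

# All superclasses of a label: reflexive (includes the label itself),
# transitive, and safe against circular subclass declarations.
def superclasses(map, label, seen = Set.new)
  return seen unless seen.add?(label) # already visited: stop here
  map.fetch(label, {}).fetch(:subclass_of, []).each do |parent|
    superclasses(map, parent, seen)
  end
  seen
end

# Non-reflexive: a proxy is an instance of its direct classes and of
# every superclass of those classes, but not of itself.
def instance_of?(map, label, klass)
  map.fetch(label, {}).fetch(:isa, []).any? do |direct|
    superclasses(map, direct).include?(klass)
  end
end

instance_of?(map, :tmrmspec, :document)  # true, via standard and specification
instance_of?(map, :tmrmspec, :tmrmspec)  # false, the relation is non-reflexive
```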

As a Subject Map is a directed graph, the TMRM also defines a basic path language for navigating and extracting information from a Subject Map. The fundamental operations on the graph are (using the notation in Annex C of TMRM v7):

- keys (p \) defines a postfix operator to return all of the property labels defined for p

- remote keys (p /) defines a postfix operator to return all of the property labels whose value is the proxy p

- values (p -> k) defines a postfix operator to return all of the values of the particular property label k for the proxy p

- proxies (v <- k) defines a postfix operator to return all of the proxies having the value v for the specified property label k. By definition, this result is also a Subject Map.

Subject Maps may be merged such that all proxies found to be about the same subject may be combined and referenced as a single proxy label. The operation is defined in terms of constraints applied to two proxies to generate a third proxy and therefore may only apply to some proxies and properties in a given subject map.

Merge operations and identity constraints are not defined by the specification itself. Instead, they are defined in terms of a Subject Map legend: a finite set of constraints that represents a particular way of interpreting the information in the map. Legends may be applied to more than one map, and more than one legend can be applied to a single map to produce the desired views of the information.
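As an illustration of legend-driven merging, here is a small Ruby sketch. The legend is invented for the example: it treats two proxies as being about the same subject when they share a :subject_indicator value, and merges them by taking the union of values per key. The module, the method names and the sample data are all assumptions, and the sketch further assumes that labels do not collide across the two maps.

```ruby
# Hypothetical legend, not taken from any Topic Maps specification.
module IndicatorLegend
  module_function

  # Identity constraint: the proxies share at least one subject indicator.
  def same_subject?(a, b)
    !(a.fetch(:subject_indicator, []) & b.fetch(:subject_indicator, [])).empty?
  end

  # Merge rule: build a third proxy holding the union of values per key.
  def merge(a, b)
    a.merge(b) { |_key, left, right| left | right }
  end
end

# Merge two Subject Maps (label => proxy) under a legend. Proxies the
# legend does not identify with anything are carried over untouched.
# Assumes labels are unique across the two maps.
def merge_maps(legend, left, right)
  result = left.dup
  right.each do |label, proxy|
    match = result.find { |_, candidate| legend.same_subject?(candidate, proxy) }
    if match
      result[match.first] = legend.merge(match.last, proxy)
    else
      result[label] = proxy
    end
  end
  result
end

map_a = { pd: { name: ["Patrick Durusau"], subject_indicator: ["http://tm.durusau.net"] } }
map_b = {
  patrick: { isa: [:person], subject_indicator: ["http://tm.durusau.net"] },
  steve:   { name: ["Steve Newcomb"] }
}
merged = merge_maps(IndicatorLegend, map_a, map_b)
# merged has two proxies: :pd (now also carrying isa) and :steve
```

Swapping in a different legend module changes which proxies merge without touching the maps themselves, which is the point of keeping the constraints outside the data.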

Topic Maps Identity Semantics

The Topic Maps specification defines two very precise ways to indicate the identity of a subject for any given proxy. Either the subject may be indicated by a resource which is primarily about the subject, or the subject may be an addressable resource. Each relationship uses particular constructs to avoid the ambiguity present in other systems in identifying the subject a proxy represents.

The TMRM in Practice

Since the TMRM is abstract, you need to define specific syntax representations of it in order to do anything useful. As you are dealing with the most fundamental conceptual way to represent information through proxies and properties, key-value pairs or EAV/OAV models, you are only limited by your imagination in terms of what this representation can be. You can also use the same mechanism to represent information about anything of interest to you, and this fundamental model can be expanded to more efficient and expressive operations grounded in the fundamentals of proxies, the path language and the particular legend you choose (this is the "Everything" part of the subject with apologies to Douglas Adams).

You can quite easily define formalizations (legends) to represent proxies using programming language constructs from structures to classes and objects to hashes and maps. What matters most is what labels you define as reserved for any particular application and the semantics by which properties using those labels are interpreted by your particular application.

For example, you could represent the TMRM specification itself very simply in JSON as:

tmrmspec = {
  isa: "document",
  reifies: "http://www.isotopicmaps.org/TMRM/TMRM-7.0/tmrm7.pdf",
  title: "Topic Maps Reference Model, 13250-5",
  name: "The Topic Maps Reference Model, version 7",
  authors: {
    isa: "part-whole",
    parts: {
      patrick: {
        name: "Patrick Durusau",
        isa: "person",
        "subject-indicator": "http://tm.durusau.net", ... },
      steve: { name: "Steve Newcomb", ... },
      robert: { name: "Robert Barta", ... }
    }, ...
  }
};

I've chosen to explicitly make the authorship relation a proxy instance, but your legend could say that it was simply an array of anonymous proxy objects. In this example the Subject Map is defined in terms of the JSON object itself, but you could relax the JSON referential constraints to easily treat your entire program as a subject map.

You can also go backwards, so that given any JSON or JavaScript object, you can treat it as a set of proxies and property values. Of course, the above example could use string keys to reference any existing ontology for representing known information like Dublin Core. The syntax of JSON requires the labels be represented as strings if they're QNames, but that doesn't change the underlying model.
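Going backwards can be sketched in a few lines of Ruby. This is only one possible reading: nested objects become proxies of their own and are replaced by a link (a Symbol), everything else (including arrays) is kept as a literal, and the path-style labels such as :"tmrmspec.author" are an arbitrary choice made for the example. The function name and the sample JSON are hypothetical.

```ruby
require "json"

# Flatten a parsed JSON object into a Subject Map (label => properties).
# Nested objects become their own proxies and are replaced by a link
# (a Symbol); the path-based labels are an arbitrary, hypothetical choice.
def to_proxies(object, label, map = {})
  map[label] = object.each_with_object({}) do |(key, child), props|
    if child.is_a?(Hash)
      child_label = :"#{label}.#{key}"
      to_proxies(child, child_label, map)
      props[key] = child_label # link to the nested proxy
    else
      props[key] = child       # literal value, kept as-is
    end
  end
  map
end

doc = JSON.parse('{"isa": "document", "author": {"isa": "person", "name": "Patrick Durusau"}}')
map = to_proxies(doc, :tmrmspec)
# map[:tmrmspec]          => {"isa"=>"document", "author"=>:"tmrmspec.author"}
# map[:"tmrmspec.author"] => {"isa"=>"person", "name"=>"Patrick Durusau"}
```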

For XML, you can even treat individual elements as proxies too. A Book element carrying an isbn, a title and a myns:animal value of "dolphin" might conceptually represent the following proxy (as a Ruby Hash):

example2 = {
  :isa => :Book,
  :isbn => "...",
  :title => "...",
  "myns:animal" => "dolphin"
}

Or you could get really crazy and define a standard XML Element to Proxy mapping (based on things we already know):

<xyzzy:Book isbn="12345">
I think this is a really cool book!
</xyzzy:Book>

As JSON again:

example3 = {
  isa: "xyzzy:Book",
  isbn: "...",
  text: "I think..."
}
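A minimal sketch of such an element-to-proxy mapping, using Ruby's REXML library. The mapping rules (qualified element name to isa, attributes to properties, text content to a text property) follow the example above, but the function name is hypothetical, and the namespace URI is made up because the original element leaves the xyzzy prefix undeclared.

```ruby
require "rexml/document"

# Map one XML element to a proxy: the qualified element name becomes
# "isa", each attribute becomes a property, and any text content is
# stored under "text". Namespace declarations are skipped.
def element_to_proxy(element)
  proxy = { "isa" => element.expanded_name }
  element.attributes.each do |name, value|
    next if name == "xmlns" || name.start_with?("xmlns:")
    proxy[name] = value
  end
  text = element.text.to_s.strip
  proxy["text"] = text unless text.empty?
  proxy
end

# The namespace URI below is an assumption made so the sample parses.
xml = <<~XML
  <xyzzy:Book xmlns:xyzzy="http://example.org/xyzzy" isbn="12345">
  I think this is a really cool book!
  </xyzzy:Book>
XML

proxy = element_to_proxy(REXML::Document.new(xml).root)
# proxy["isa"]  => "xyzzy:Book"
# proxy["isbn"] => "12345"
# proxy["text"] => "I think this is a really cool book!"
```

Child elements are ignored here; a fuller mapping would have to decide whether they become links to further proxies or nested properties, which is exactly the kind of decision a legend would pin down.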

Or you can define a legend that maps particular elements to proxy types and properties using XPath or Schematron, or whatever.

Similar to the call for "just show me the nodes", I think the TMRM would provide a consistent model where you could manipulate both the structure of the encoding (or the representation) as well as the information being encoded, so that you can represent what you like in the format of your choice, but manipulate the information and the serialization using the same fundamental constructs.

Admittedly, the examples above are contrived, and, as I said, I'm not saying this is a finished proposal, so there's plenty left to argue about! :)

One of the key things I think XML 2.0 should do is worry less about the syntax of a representation and focus more on allowing you to co-mingle any structured domain representations of information and manipulate them using a consistent model. If you need a binary format, fine, then we can easily define a legend for ASN.1 representations or your own proprietary one. If you need to serialize/deserialize into programming language constructs, that's no problem either. There's a formalized legend for languages X, Y, Z and even Q. Writing a new language? No problem, here's how you define a legend so it all plays nicely.

I also think it's possible to achieve the above with a very limited set of pre-defined proxy types and labels. Let the applications worry about how to interpret higher-level things the same as now, but at least give people a consistent way to work with information, slice and dice it, transform it and materialize it on the other end.

Expressing most of what XML technologies already do in terms of a standard, unifying model would allow those technologies to be applied to any structured data representation. To me, that would be a real win for XML 2.0 and, more importantly, a real win for the application developers in the trenches every day. The vendors might not be so keen about it, but I think they'll get over that.

Having worked with a TMRM-based information management system and done a lot of structured format transformations in and out of it (CSV, XML, database tables, custom formats, JSON, Ruby objects), I believe that what I'm describing is certainly possible. It just needs to be seen as a relevant goal for the community.

Hopefully, the above is a reasonable enough balance between detail and concepts to allow you to understand where I'm coming from and why I think it's important.

I look forward to your comments, flames and feedback. :)

Cheers,

ast

[1] http://www.acs.org.au/documents/public/crpit/CRPITV43Barta.pdf
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org
