xml-dev - XML waves (was : Divorcing Data Model and Syntax)


To: "Jeff Lowery" <jlowery@scenicsoft.com>
Subject: XML waves (was : Divorcing Data Model and Syntax)
From: "Danny Ayers" <danny666@virgilio.it>
Date: Mon, 14 Oct 2002 12:19:37 +0200
Cc: "XML DEV" <xml-dev@lists.xml.org>
Importance: Normal
In-reply-to: <D60D8F448216D611A21A00508B62B5BC5D9F@exchange-us.scenicsoft.com>

[Jeff Lowery]
>http://aurora.phys.utk.edu/~forrest/papers/fourier/#fourier
>
>Now, if we could just represent an XML document as a wave...


For analysis purposes, that could be pretty trivial if you view each
character in a text node as representing a unit time delay, and the
depth of nesting as the amplitude. Not sure how you'd handle attribute
data, perhaps in another dimension. What could you do with it? Well, a lot
of the smarter search/index clustering techniques are effectively
signal-based. Doing a (discrete) Fourier transform of an XML wave would
provide a kind of signature, which would be shared by documents that are
structurally similar. There are loads of filtering and cross-correlation
techniques available once you have your wave.
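
A rough sketch of the mapping described above, in Python (chosen purely for
illustration). Everything here is an assumption rather than an established
method: the names DepthWave, xml_wave and dft_magnitudes are hypothetical,
whitespace in text nodes is counted like any other character, attributes are
ignored, and the "signature" is simply the magnitude of the first few DFT
coefficients divided by the wave length.

```python
import cmath
import xml.sax


class DepthWave(xml.sax.ContentHandler):
    """Collects one sample per text character; the sample value is the
    element nesting depth at that character (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.samples = []

    def startElement(self, name, attrs):
        self.depth += 1          # attributes are ignored in this sketch

    def endElement(self, name):
        self.depth -= 1

    def characters(self, content):
        # One unit of "time" per character, amplitude = current depth.
        self.samples.extend([self.depth] * len(content))


def xml_wave(xml_text):
    """Return the depth-vs-character wave for an XML string."""
    handler = DepthWave()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.samples


def dft_magnitudes(samples, bins=8):
    """Naive O(n * bins) discrete Fourier transform; returns the
    magnitudes of the first `bins` coefficients, divided by n."""
    n = len(samples)
    if n == 0:
        return []
    return [
        abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(samples))) / n
        for k in range(min(bins, n))
    ]


if __name__ == "__main__":
    wave = xml_wave("<a>xy<b>z</b>w</a>")
    print(wave)
    print(dft_magnitudes(wave))
```

Because only depth and text length feed the wave, two documents with the
same nesting and text lengths but different element names or characters give
identical waves, which is the sense in which the signature is structural.
Comparing documents of different lengths would need some resampling or
windowing step, which this sketch leaves out.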

I doubt whether the inverse transform would be all that useful, but: years
back I had an incredibly dull electronics lecturer. The only time I was
really awake in front of him was when he covered Laplace transforms. He
showed a kilometre of formulae describing the relationship between the input
& output of a circuit. Then he demonstrated the Laplace technique - did a
transform, then a sum, then the inverse transform - exactly the same result.
Gobsmackingly elegant. Perhaps the XML wave + Fourier view has similar
potential.

The signal analysis approach to documents is wide open, but in many/most
minds it's orthogonal to the logical approach found in e.g. RDF.
But if you look at how productive simple numeric algorithms like those used
by Google can be for automatically discovering semantic relations (however
weak in Google's case) then IMHO it becomes clear that Semantic Web research
should be delving into these areas. Word/character analysis can be very
useful, as can statistical/neural techniques like Bayesian analysis and
Kohonen's SOM, but these are usually applied at a low level (in respect of
structure) to documents. Google takes into account the linking, but this
information is still (apparently) handled at quite a trivial level.
So a wave view of XML is a great idea!

Cheers,
Danny.

