OASIS Mailing List Archives





   [OFFLIST] A new approach to XML Validation


I hope all is well with you. We are busy as usual over here...

I'm sending this to you offlist as I want to get my thoughts in
shape before I go public with this on xml-dev.

As you know, in a previous life I worked in the development
of financial trading systems in London. What you probably
don't know is that I worked on what is called "technical analysis"
of historical price and volume movements in financial markets.
Many traders swear by technical analysis as it helps them
develop an intuition - a "feel" for how a market is likely
to move. Simply put, technical analysis is a mathematical
analysis of historical data used to generate visual/numerical
guides to future market movements.
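For anyone who hasn't met technical analysis, one of its most basic tools
is the simple moving average (the same idea I use over character counts
in the IBVL formulae). A minimal Python sketch, assuming closing prices
arrive as a plain list of floats; the function name is just for
illustration:

```python
from collections import deque
from typing import Deque, Iterable, List


def simple_moving_average(prices: Iterable[float], window: int) -> List[float]:
    """Return the simple moving average of `prices` over a sliding window.

    One value is emitted per position once the window is full, so a
    series of n prices yields n - window + 1 averages (none if n < window).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf: Deque[float] = deque()
    total = 0.0
    out: List[float] = []
    for p in prices:
        buf.append(p)
        total += p
        # Drop the oldest price once the window overflows.
        if len(buf) > window:
            total -= buf.popleft()
        if len(buf) == window:
            out.append(total / window)
    return out


if __name__ == "__main__":
    closes = [10.0, 11.0, 12.0, 13.0, 14.0]
    print(simple_moving_average(closes, 3))  # [11.0, 12.0, 13.0]
```

Traders plot a line like this over the raw prices; crossings between a
short-window and a long-window average are the classic "signal".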

Some traders claim to use technical analysis exclusively - others
combine it with analysis of market fundamentals - e.g. economic
data. Most of our clients in those days used both.

Anyway, I thought it would be interesting to apply the same technical
analysis ideas from the financial markets to XML validation.

In the XML world, developers learn the hard way to develop a "feel" for
documents and learn to spot those that are likely to be troublesome to
process. This intuition is orthogonal to the analysis achievable
with grammar or rule based validation. Documents can parse
beautifully yet be very difficult to program against, and vice versa.

I've written a couple of XSLT programs (using some Java extensions)
that generate some numbers based on a technical analysis
inspired treatment of XML instance data.

The embedded formulae are based on smoothing the element/PCDATA
structure of an XML instance into a Fourier series. Elements are
used to generate sine waves, with attributes used to calculate the
modulation. PCDATA is used to generate cosine waves, with
moving averages over character count used to calculate
the modulation. The two are then combined into an infinite
series, which is summed to generate some numbers.
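I haven't written the formulae out here, so what follows is only a guess
at their general shape, as a Python sketch rather than XSLT. Every
specific is an assumption: the function name ibvl_score, the evaluation
point t, the moving-average window, the amplitude rules and the fold
into [0, 10) are all hypothetical. Because a document is finite, the
"infinite" series is necessarily truncated to one term per node:

```python
import math
import xml.etree.ElementTree as ET
from collections import deque
from typing import Deque


def ibvl_score(xml_text: str, t: float = 1.0, window: int = 3) -> float:
    """Hypothetical IBVL-style score in [0, 10).

    Assumptions (illustrative only):
    - the k-th element in pre-order contributes
      (1 + number of attributes) * sin(k * t);
    - the j-th element with non-blank text contributes
      (moving average of the last `window` text lengths) * cos(j * t);
    - tail text (mixed content after a child) is ignored for simplicity;
    - the finite sum is folded into [0, 10) with abs() and % 10.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    root = ET.fromstring(xml_text)
    sine_sum = 0.0
    cosine_sum = 0.0
    recent_lengths: Deque[int] = deque(maxlen=window)
    text_index = 0

    for k, elem in enumerate(root.iter(), start=1):
        # Element term: attribute count modulates the sine amplitude.
        sine_sum += (1 + len(elem.attrib)) * math.sin(k * t)

        # PCDATA term: moving average of character counts sets the
        # cosine amplitude.
        text = elem.text.strip() if elem.text else ""
        if text:
            text_index += 1
            recent_lengths.append(len(text))
            amplitude = sum(recent_lengths) / len(recent_lengths)
            cosine_sum += amplitude * math.cos(text_index * t)

    return abs(sine_sum + cosine_sum) % 10.0


if __name__ == "__main__":
    print(ibvl_score("<PLAY><TITLE>Hamlet</TITLE></PLAY>"))
```

Any real scheme would need far more care over which nodes count and in
what order; the point is only to show sine terms from elements and
cosine terms from PCDATA being summed into one number.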

I've created a basic syntax for IBVL and I've run some IBVL
schemas against some XML collections - Jon Bosak's Shakespeare,
an ebXML test suite and some RSS feeds.

The current algorithms are designed to generate a single
number [0..<10]. I've generated a bunch of these from
the above XML documents. The number 1.618
has occurred a number of times on documents
that I would consider "programmer friendly" and the
number 4.669 on documents that are dogs to
write software for.

I need to chase down the significance (if any) of these
numbers. It could be a problem in the XSLT extension.

I think of these algorithms as a new form of XML validation.
We have grammar based validation (DTD, XSD, RNG), rule based validation
(Schematron), example based validation (Examplotron) and now
intuition based validation (IBVL) with my stuff.

Anyway, I just wanted to let you know what I'm up to
and prepare you for the day I ask your help on
some of this! Your knowledge of the territory could
help me avoid making Foolish mistakes in the
IBVL algorithms.

Will be in touch.




Copyright 2001 XML.org. This site is hosted by OASIS