hackable xml

From: Andrew Welch <andrew.j.welch@gmail.com>
To: xml-dev <xml-dev@lists.xml.org>
Date: Mon, 26 Jul 2010 12:08:24 +0100

One of the reasons I think non-XML devs struggle with XML is that,
while it looks simple, it's actually very complex. It appears to be
just angle brackets and can be treated as a string (and often is),
but it really must be parsed and serialised using a specialist tool
for the job. And even then, the APIs for working with it aren't
exactly dev-friendly.

Namespaces, encoding, entities etc all prevent XML from being read and
written as a string, keeping its "hackability" low, and causing
frustration amongst non-XML devs (who then look for alternatives).
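
To make the string-handling problem concrete, here is a minimal
sketch in Python using the standard library's xml.etree.ElementTree.
The element name "item" and the sample text are invented for
illustration; they are assumptions, not part of the proposal.

```python
import xml.etree.ElementTree as ET

title = "Fish & Chips <large>"

# Naive string building: "&" and "<" are copied through verbatim,
# so the result is not well-formed XML.
naive = "<item>" + title + "</item>"


def is_well_formed(text):
    """Return True if ElementTree can parse the text, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Going through a serialiser instead: special characters are escaped.
item = ET.Element("item")
item.text = title
serialised = ET.tostring(item, encoding="unicode")

print(is_well_formed(naive))       # False
print(serialised)                  # <item>Fish &amp; Chips &lt;large&gt;</item>
print(is_well_formed(serialised))  # True
```

The naive version is what a dev treating XML as a string tends to
write first, and it only fails once the data happens to contain a
special character.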

So to simplify XML and make it easier for the masses to handle, I
think a minimalist subset is needed - the absolute minimum to keep
mixed content and attributes, and that's all.  I know this has been
talked about before, but perhaps opinions have changed over time,
especially given how XML is perceived these days.

Remember the goal is a simplified minimalist XML to complement the
older bigger sibling XML, not replace it:

1. Elements with no prefix are in no namespace

2. Entity refs no longer exist, other than the inbuilt ones.  There is
no DTD.  (Numeric character refs remain.)

3. PIs and CDATA sections are gone

4. Encoding must be UTF-8 (or some similar rule: it's to remove the
potential mismatch between the encoding in the prolog and the actual
encoding)

5. Lone special characters such as "&" in the lexical XML are
automatically parsed as &amp; rather than treated as an error (#2
above might enable this). The same goes for a lone "<".

6a. Namespaces no longer exist - there is no ability to differentiate
elements with the same name in the same document.

or

6b. The namespace prefix is significant and is not mapped. A name
consists of just a prefix and a local name, nothing else. For example,
a well-formed document would be:

   <foo:bar/>

Just differentiating on prefix would cover 100% of the cases I've ever
been involved with.  I have never, ever, seen the same prefix bound to
two different namespaces in the same document.  There is no need to
map a prefix to a namespace: the prefix provides all the uniqueness
necessary within a domain, and global uniqueness isn't needed.  This
would simplify a huge number of issues, both for devs and for
implementors.  One simple example: the problem of how to map "foo"
before you can use the XPath "/foo:bar" goes away. The XPath is
self-contained for the first time, and running it is a one-liner.
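
For comparison, this is roughly what the mapping step looks like
today, sketched with Python's xml.etree.ElementTree. The namespace
URI "urn:example:foo" and the child element "foo:baz" are made up
for the example.

```python
import xml.etree.ElementTree as ET

doc = '<foo:bar xmlns:foo="urn:example:foo"><foo:baz>hi</foo:baz></foo:bar>'
root = ET.fromstring(doc)

# After parsing, the prefix is gone; the name carries the URI instead.
print(root.tag)  # {urn:example:foo}bar

# To use a prefixed path, the caller must supply a prefix-to-URI map.
ns = {"foo": "urn:example:foo"}
baz = root.find("foo:baz", ns)
print(baz.text)  # hi

# The prefix itself is not significant: any prefix mapped to the
# same URI selects the same element.
same = root.find("x:baz", {"x": "urn:example:foo"})
print(same.text)  # hi
```

Under 6b the map would disappear, because "foo:baz" in the path would
simply be compared with "foo:baz" in the document.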

So the goal is a minimalist XML that strips as much as possible away
to make it "hackable" by the masses, keeping mixed content and
attributes, which are the reasons you would use XML in the first
place.
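
As a rough idea of what rule 5 might mean for a parser, here is a
sketch of a pre-pass in Python that escapes lone "&" and "<"
characters. The function name repair and both regular expressions
are my own assumptions, not an existing API, and the "<" test is
only a heuristic (a "<" followed directly by a letter is still read
as the start of a tag).

```python
import re
import xml.etree.ElementTree as ET

# "&" that does not start an inbuilt entity ref or a numeric ref.
LONE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")

# "<" that does not look like the start of a tag, end tag or comment.
LONE_LT = re.compile(r"<(?![A-Za-z_/!])")


def repair(text):
    """Escape lone "&" and "<" so a strict parser accepts the text."""
    text = LONE_AMP.sub("&amp;", text)
    return LONE_LT.sub("&lt;", text)


raw = "<p>a & b < c &lt; d &#38; e</p>"
fixed = repair(raw)
print(fixed)                     # <p>a &amp; b &lt; c &lt; d &#38; e</p>
print(ET.fromstring(fixed).text)  # a & b < c < d & e
```

With no DTD (rule 2) the set of legal entity names is fixed, which is
what makes the "&" half of this safe to decide without any context.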

The need is there - is there a reason why this can't be done?


-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

Follow-Ups:
- Re: [xml-dev] hackable xml
  - From: Stephen Green <stephengreenubl@gmail.com>
- Re: [xml-dev] hackable xml
  - From: Amelia A Lewis <amyzing@talsever.com>
- Re: [xml-dev] hackable xml
  - From: David Carlisle <davidc@nag.co.uk>
