Re: [xml-dev] The problems and the future of the web and a formal internet technology proposal
Dear Raphael,
I appreciate your attempt to bring back some life to this list. Let me
start by giving an alternate view of the last one-and-a-half decades'
developments on the web:
WHATWG formed as a consortium to propose new features for web
browsers, and there's nothing wrong with that; especially since W3C's
XHTML 2 (XForms and co) had no chance of ever becoming part of
browsers. Regarding the leap to defining a "standard", however, I
fully agree with your notion that there's a distinction to be made
between an app platform (that nobody called for, and only browser
vendors had a vested interest in building) and a document format used
as a primary means for communication in politics, law, education,
medical, personal, etc, etc. Let me put it this (somewhat
grandstanding) way: the idea that a self-proclaimed "standardization
body" can usurp and proclaim ownership of the way humanity communicates
digitally is ... odd.
I don't agree with your criticism of Ian Hickson's work. AFAICS (and
I've probably studied HTML 5 in detail more than most people [1]) he
did a very good job of capturing HTML 4 rules, and added a couple of
not-too-controversial elements on his own. Where it's gone wrong is
that the syntax presentation for HTML 5 (as opposed to the historical
HTML 4 DTD) doesn't convey its basic construction as a span-level
markup vocabulary extended with block-level elements. You can see this
with the definition of the paragraph ("p") element which includes an
enumeration of paragraph-terminating elements rather than referring to
the category of block-only elements. Consequently, when new elements
were added, the spec authors "forgot" to include them in the
enumerated list of p-terminating elements, making the spec bogus. In
other words, the HTML 5.1 spec process lost control over their
workflow, and didn't want to employ SGML or other formal markup tech
to quality-assure their work either, which easily would have (and has
[1]) spotted these flaws.
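
To make the enumeration-vs-category point concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration:
the element sets are abbreviated, "newblock" is a hypothetical element
standing in for any later addition, and the token format is made up.
It is not the HTML 5 parsing algorithm, just the end-tag inference
idea on a flat token stream.

```python
# Hypothetical, abbreviated element sets; not copied from any HTML spec.
# "newblock" is a made-up element standing in for a later addition.
BLOCK_ELEMENTS = {"blockquote", "div", "h1", "h2", "ol", "p", "pre",
                  "table", "ul", "newblock"}

# A hand-maintained enumeration that was not updated when "newblock"
# joined the block category.
ENUMERATED_P_TERMINATORS = {"blockquote", "div", "h1", "h2", "ol", "p",
                            "pre", "table", "ul"}


def closes_p_enumerated(tag):
    """Rule expressed as an explicit list of terminating elements."""
    return tag in ENUMERATED_P_TERMINATORS


def closes_p_by_category(tag):
    """Rule expressed by reference to the block category."""
    return tag in BLOCK_ELEMENTS


def infer_p_end_tags(tokens, closes_p):
    """Insert omitted </p> end tags into a flat token stream.

    Tokens are (kind, value) pairs with kind in {"start", "end", "text"}.
    """
    out = []
    p_open = False
    for kind, value in tokens:
        # A start tag that terminates paragraphs implies </p> first.
        if kind == "start" and p_open and closes_p(value):
            out.append(("end", "p"))
            p_open = False
        if kind == "start" and value == "p":
            p_open = True
        elif kind == "end" and value == "p":
            p_open = False
        out.append((kind, value))
    if p_open:
        # End of input closes a paragraph that is still open.
        out.append(("end", "p"))
    return out
```

With the category-based rule, a "newblock" start tag closes the open
paragraph; with the stale enumeration, the paragraph silently swallows
the new element, which is the kind of flaw described above.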
In this context, let me also say that the notion of "tag soup" markup
is a myth. All versions of HTML, including HTML 5.x, can be parsed by
SGML using well-known, formal rules for tag minimization/inference and
other shortform syntax (save for the definition of the "script" and
"style" elements which were bogusly introduced into HTML such that
they would be treated as comments by legacy browsers). It's odd that
XML heads (sorry) refuse to acknowledge and apply document engineering
practices known since the 1980s or earlier when these techniques were
discussed at great length on this very xml-dev mailing list (in
1996/97). SGML has additional concepts on top of XML very relevant
today, such as custom Wiki syntaxes (lots of tech content is written
using markdown today), type-safe/injection-free HTML-aware templating,
etc. And SGML also offers a formal way to integrate HTML content into
"canonical" markup (ie XML) [3].
> XML-based standards from OASIS, such as Docbook
Just fyi, Docbook was an SGML-based vocabulary most of the time;
it's only since version 5 that dedicated SGML-based formulations have
been dropped from the spec (since XML is just a subset of SGML
anyway). I agree though OASIS (fka SGML/Open group) has put out useful
standards, and is an org I'd love to see helping to bring our
stagnation to an end.
> replace the current selectors with XPath based selectors [...] the
> inconvenient (sic) of not being fully XML/XPath based [...] XML
> reformulation of [...] CSS3
I can only recommend looking, once in a while, at techniques outside
the XML-centric world.
> Python, [...], Ruby, [...], ISLisp [...]
I'm sorry but this merely reads as a list of personal preferences.
There's nothing wrong with JavaScript; it's a language ultimately
derived from an awk-like syntax (so is very adequate for text
processing even though shortforms for pattern matching didn't make it
into the language), and is probably the most portable language out
there today.
> the language Mercury
Mercury is a fine statically-typed, restricted variant of Prolog, and
even used as implementation language for Prince XML, but if you want
to push for logic programming on document data, I'd recommend
sticking to ISO Prolog, which has many, many implementations. In fact,
basic Prolog (on a suitable term representation for documents) can be
used to implement a large subset of CSS selectors *and*
layout/rendering algorithms using constraint-based formulations. Its
role could be to give an executable/formal spec for CSS, just like it
has been used for giving an executable spec for XML Schema by one of
the editors of the XML Schema spec (ie. by C. M. Sperberg-McQueen
[2]).
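
As a rough illustration of what "selectors over a term representation"
means, here is a small sketch. It is written in Python rather than
Prolog purely as a stand-in, and the term shape (tag, attributes,
children), the function names, and the supported selector subset (type
and class selectors joined by the descendant combinator) are all my
assumptions, not part of any spec or existing implementation.

```python
# A document term is (tag, attributes, children); text nodes are strings.
# This shape is an assumption chosen for illustration.

def descendants(node, ancestors=()):
    """Yield (element, ancestors) pairs in document order."""
    if isinstance(node, str):
        return
    yield node, ancestors
    _tag, _attrs, children = node
    for child in children:
        yield from descendants(child, ancestors + (node,))


def matches_simple(node, simple):
    """Match 'tag', '.class', or 'tag.class' against one element."""
    tag, attrs, _children = node
    want_tag, _, want_class = simple.partition(".")
    if want_tag and want_tag != tag:
        return False
    if want_class and want_class not in attrs.get("class", "").split():
        return False
    return True


def select(root, selector):
    """Return elements matching a descendant selector like 'div.note p'."""
    parts = selector.split()
    results = []
    for node, ancestors in descendants(root):
        if not matches_simple(node, parts[-1]):
            continue
        # Walk outward from the nearest ancestor, consuming the remaining
        # selector parts from right to left.
        i = len(parts) - 2
        for anc in reversed(ancestors):
            if i >= 0 and matches_simple(anc, parts[i]):
                i -= 1
        if i < 0:
            results.append(node)
    return results
```

In Prolog the same thing would be a handful of clauses over document
terms, with backtracking doing the tree walk; the point here is only
that selector matching reduces to simple relations on such terms.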
> DRM is fundamentally wrong and constitutes a stupid and useless idea
I don't like DRM either, but just as with RDF, the question is how to
finance content creation when the only or primary income is ads rather
than public funding. It's all too well-known that monopolization and
platform economy is what's happening in eg. the music "industry".
> May XML live-on till the end of times
I have no doubt that XML will live and prosper, but my point is that
XML is a means to an end, not an end in itself. The "end" towards
which markup should strive is to give a reasonable, canonical way for
editing, publishing, and preserving rich text for a layman (or
realistically, a power user) on the web and elsewhere. Ask yourself
what practical, real-world solutions based on XML are left today for
this purpose that could challenge eg. WordPress.
Let me close by listing a couple of practical initiatives for bringing
us closer to that goal, rather than going all-in on an overreaching,
XML-centric roadmap that we've already seen failing on the web:
- register a new application/html (as opposed to text/html) IANA MIME
type for web apps such that pure markup sites not relying on
JavaScript can be found easily and preferably over script-heavy sites;
over time, make it bad form to serve script-heavy sites as text/html,
and penalize accordingly
- question the pay-as-you-go status of W3C spec work, and push for
public funding of HTML standardization (it's been a while since W3C
published an HTML spec; their HTML page just links to the WHATWG HTML
"standard" HEAD on github)
- work towards identifying a reasonable set of expected visual idioms
on the modern web (such as menus and other generic controls) for which
we want to have a declarative/markup-based rather than (or in addition
to) a programmatic implementation
- push/finance W3C or other body to publish formal specs for CSS
(which is where a lot of complexity sits in today's web); try and
define reasonable CSS subsets for document-oriented use cases; try and
establish forward-compatible CSS versions/levels a site can announce
such that we can eventually see new browsers being developed
- for the same reason, push for proper *versioned* HTML spec documents
rather than "living standards" (an oxymoron in more than one way).
Maybe the community is welcoming to such efforts. I think that last
decade's SiliCon-dominated scene has definitely lost its appeal, and
there's a growing concern about monopolies, platforms, and verticals
and the attention economy in general.
Best regards,
M. Reichardt
sgmljs.net
[1]: http://sgmljs.net/docs/html5.html
[2]: http://cmsmcq.com/2004/podcg.html
[3]: http://sgmljs.net/docs/sgml-html-tutorial.html