Re: [xml-dev] The problems and the future of the web and a formal internet technology proposal
- From: Raphaël Hendricks <rhendricks@netcmail.com>
- To: Marcus Reichardt <u123724@gmail.com>,Sir Timothy Berners-Lee <cos@timbl.com>,Liam Quin <liam@fromoldbooks.org>,Ivan Herman <ivan@w3.org>,Eric Prud'hommeau <eric@w3.org>,IETF Discussion Mailing List <ietf@ietf.org>,xml-dev@lists.xml.org,public-exploresemdata@w3.org,public-philoweb@w3.org,public-web-copyright@w3.org,public-dntrack@w3.org,project-admin@oasis-open.org
- Date: Sun, 24 Jan 2021 17:22:08 -0500
Dear Marcus, here is the answer you have been waiting for.
To all list users: I am sorry for the delay. There have been many
days when I was too weak to answer, since I am battling chronic
fatigue syndrome and have many days when I am non-functional;
moreover, during the few days when I felt better, I had some
pressing issues to handle until a week ago. I then felt bad from
Monday to Thursday, and since I felt better afterwards, I prepared
the answers for the list on Friday, Saturday and Sunday.
> I fully agree with your notion that there's a distinction to be made
> between an app platform (that nobody called for, and only browser
> vendors had a vetted interest in building) and a document format used
> as a primary means for communication in politics, law, education,
> medical, personal, etc, etc.
I am glad to read that I am not the only one to see that such a
distinction is needed.
> I don't agree with your criticism of Ian Hickson's work. AFAICS (and
> I've probably studied HTML 5 in detail more than most people [1]) he
> made a very good job of capturing HTML 4 rules, and added a couple of
> not-too-controversial elements on his own.
The problems with HTML5 are several. First, it starts from the
principle that the browser developers define the formats, which is
wrong. Standards should be developed by consortiums bringing together
specialists, developers and the people who use the specifications.
Second, in the same vein, they reintroduced elements which had been
rejected by the W3C as the wrong solution when the HTML4
specification was being written. The most visible case is the
reintroduction of the embed element, which was really a Netscape
proprietary extension. The W3C chose the object element for embedded
multimedia objects, and rightly so. They did not do it in the exact
same way as the Microsoft implementation: they did keep the classid
attribute but made it optional, so one can identify the multimedia
object by giving its URI as the value of the data attribute or as the
value of the classid attribute, with no need to specify a hexadecimal
clsid value a la Microsoft.
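As a small illustration, here is roughly what the W3C-sanctioned
markup looks like (the file name and MIME type are made up for the
example; no clsid is involved):

    <object data="movie.ogv" type="video/ogg" width="320" height="240">
      A textual fallback for user agents that cannot render the object.
    </object>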
> Where it's gone wrong is
> that the syntax presentation for HTML 5 (as opposed to the historical
> HTML 4 DTD) doesn't convey its basic construction as a span-level
> markup vocabulary extended with block-level elements. You can see
> this
> with the definition of the paragraph ("p") element which includes an
> enumeration of paragraph-terminating elements rather than
> referring to
> the category of block-only elements. Consequently, when new elements
> where added, the spec authors "forgot" to include elements into the
> enumerated list of p-terminating elements, making the spec bogus. In
> other words, the HTML 5.1 spec process lost control over their
> workflow, and didn't want to employ SGML or other formal markup tech
> to quality-assure their work either, which easily would have (and has
> [1]) spotted these flaws.
Perhaps, but even if it is so, this is only a small part of the problem.
> In this context, let me also say that the notion of "tag soup" markup
> is a myth. All versions of HTML, including HTML 5.x, can be parsed by
> SGML using well-known, formal rules for tag minimization/inference
> and
> other shortform syntax (save for the definition of the "script" and
> "style" elements which were bogusly introduced into HTML such that
> they would be treated as comments by legacy browsers).
I disagree: the soup is the excessive creation of unjustified tags,
not tags that are impossible to parse. All the presentational markup
tags are part of the tag soup. While they filled an unanswered need,
since there was no adequate stylesheet mechanism when Netscape 2.0
was released (Netscape 2.0 was released in 1995 and CSS1 in 1996),
they had the terrible result of mixing markup and presentation. The
W3C answered with separate versions including and excluding the
presentational markup, the Transitional and Strict versions, starting
with HTML 4.0. Internet Explorer was just as guilty, with elements
such as the marquee element and so on. Many elements introduced by
the browser vendors should never have been introduced.
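To make the mixing concrete, here is a rough sketch of the same
sentence in presentational tag soup and in structural markup (the
class name and the stylesheet rule are made up for the example):

    <font color="red" size="4"><b>Warning: do not unplug the device.</b></font>

    <p class="warning">Warning: do not unplug the device.</p>
    /* in the stylesheet: p.warning { color: red; font-weight: bold; } */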
> SGML has additional concepts on top of XML very relevant
> today, such as custom Wiki syntaxes (lots of tech content is written
> using markdown today), type-safe/injection-free HTML-aware
> templating,
> etc. And SGML also offers a formal way to integrate HTML content into
> "canonical" markup (ie XML) [3].
While, from what I have read (since, unlike XML, SGML is a language
with which I have no experience), it is true that some capabilities
of SGML (in DTDs particularly) were lost when creating XML, which may
be a weak point of XML, it is also important to note that XML has at
least three major advantages over SGML.

The first is a stricter syntax: in XML, general identifiers and
attribute names are case sensitive, there are no elements with
optional closing tags, and so on; this allows for more efficient
parsing and teaches more rigorous syntax writing to the users.

The second advantage is the addition of a second layer of
verification in the form of a well-formedness requirement
complementing validation. Validation is not always used; some
daughter languages, such as XSLT, while XML-based, are non-validating,
yet they can still be verified for well-formedness. This dual-layer
verification is a huge advantage of XML, where the absence of
well-formedness is simply not allowed. The absence of a
well-formedness requirement in HTML was part of what made it so
degenerate: the HTML-generating public would take advantage of the
actual behaviour of web browsers even when it violated syntactic
rules, often geared to the behaviour of a single browser; the other
browsers would then try to make those pages work, most often
introducing new quirks along the way; and the end result was a
vicious cycle of browser vendors always trying to make more pages
work, introducing quirks along the line, and HTML-generating authors
using more and more lousy markup based on the quirks introduced by
the browser vendors, which would then lead to browsers being even
more tolerant, and so on. With XHTML, being XML-based,
well-formedness is verified before displaying the page and any error
is brought up, instead of trying to make broken pages work (making
broken pages work gives the wrong message to the authors, namely that
it is okay to make bogus pages as long as the browser can get them
working nonetheless).
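A tiny sketch of the difference (the text is invented for the
example): HTML browsers will happily render mis-nested, unclosed
markup such as

    <p>An <b>important <i>point</b></i>
    <br>

whereas an XML parser only accepts the well-formed equivalent:

    <p>An <b>important <i>point</i></b></p>
    <br />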
The third advantage of XML over SGML is that with XML comes a
complementary language in the form of XPath, which is used to express
everything which cannot properly be formulated with pure XML; the
XML/XPath combination is extremely strong. This strong combination
allows creating languages such as XSLT, Schematron, XForms and so on.
If some features available in SGML are badly missing in XML, it is
probably best to create an XML 2.0 adding those features in a manner
compatible with the rest of the language, rather than switching to
SGML. Also, with XML, using Schematron allows data verification at a
level way beyond anything SGML DTDs will ever allow.
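For instance, here is a minimal Schematron sketch expressing a
constraint no DTD can state (the element and attribute names are
invented for the example):

    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern>
        <rule context="invoice">
          <!-- an ordinary XPath expression serves as the constraint -->
          <assert test="sum(line/@amount) = @total">
            The invoice total must equal the sum of its line amounts.
          </assert>
        </rule>
      </pattern>
    </schema>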
As for the markdown issue, the idea with XML is to use specific
languages for specific tasks and to attach an XSLT stylesheet which
converts the content, on arrival, into the final format, be it XHTML,
SVG or whatever.
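In practice this is just an xml-stylesheet processing instruction on
the document plus a stylesheet; a minimal sketch, with invented
element names and file names:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="article-to-xhtml.xsl"?>
    <article>
      <title>On markup</title>
      <para>Content is converted on arrival.</para>
    </article>

    <!-- article-to-xhtml.xsl -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/1999/xhtml">
      <xsl:template match="/article">
        <html><body>
          <h1><xsl:value-of select="title"/></h1>
          <xsl:apply-templates select="para"/>
        </body></html>
      </xsl:template>
      <xsl:template match="para">
        <p><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>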
> XML-based standards from OASIS, such as Docbook
>
> Just fyi, Docbook was as an SGML-based vocabulary most of the time;
> it's only since version 5 that dedicated SGML-based formulations have
> been dropped from the spec (since XML is just a subset of SGML
> anyway). I agree though OASIS (fka SGML/Open group) has put out
> useful
> standards, and is an org I'd love to see helping to bring our
> stagnation to an end.
Well, since DocBook is now an XML-based format, it can serve as a
basis for further XML efforts. Moreover, as you state, OASIS is an
organization which can help to further the XML effort. If only they
could start work on a "docarticle" format, with support for comments
and hypertext links via extensions. This could form the basis of a
reborn web, based on XML.
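To give an idea of the starting point, a minimal DocBook 5 article
already looks like this (a sketch; the hypothetical "docarticle"
profile would add the commenting and linking extensions mentioned
above):

    <article xmlns="http://docbook.org/ns/docbook"
             xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
      <title>A reborn web</title>
      <para>Hypertext links can be expressed with
        <link xlink:href="https://www.oasis-open.org/">XLink attributes</link>.</para>
    </article>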
> replace the current selectors with XPath based selectors [...] the
> inconvenient (sic) of not being fully XML/XPath based [...] XML
> reformulation of [...] CSS3
>
> I can only recommend to look, once in a while, at techniques outside
> the XML-centric world.
A fully XML-based solution allows using standard XML tools, including
XSLT, for manipulation.
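As a rough illustration of the correspondence (the class name is made
up), the CSS selector

    p.warning > em

could be written as the XPath expression

    //p[contains(concat(' ', normalize-space(@class), ' '), ' warning ')]/em

where the concat/normalize-space test reproduces CSS's
whitespace-separated class matching.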
> Python, [...], Ruby, [...], ISLisp [...]
>
> I'm sorry but this merely reads as a list of personal preferences.
I have stated why I suggested those. Python is becoming the main
interpreted programming language in the Unix world, Ruby is its main
competitor, and for ISLisp, I already stated that
> those programmers who do not identify with the unix culture
> often are adepts of Lisp and ISLisp is lightweight (and as
> such better suited to this use case) and consists of the
> common subset of the major Lisp variants,
If you have a better list to suggest, by all means, please do so.
What I am trying to say is that a remote software execution platform
should break cleanly with the WWW legacy and use real programming
languages, both interpreted and bytecode-compiled.
> There's nothing wrong with JavaScript; it's a language ultimately
> derived from an awk-like syntax (so is very adequate for text
> processing even though shortforms for pattern matching didn't make it
> into the language), and is probably the most portable language out
> there today.
JavaScript had its beginning under the name LiveScript, which was
introduced in Netscape 2.0 to add a bit of dynamic capability to web
pages, particularly to page formatting; it was not meant for writing
software. It was soon renamed JavaScript (and further extended in
Netscape 3.0), borrowing concepts from the Java programming language,
with the major difference that Java is class-based while JavaScript
is prototype-based. It was meant to be easy to use by non-programmers
(and it succeeded in being so), which most web authors were expected
to be, and there is nothing wrong with that; but it was not meant for
writing software. It was then extended several times, during which
time Microsoft designed its own partly-compatible version called
JScript. The common subset of the two scripting languages was
standardized as ECMA-262 under the name ECMAScript. Instead of
switching to standard ECMAScript, as would have made sense, the
Mozilla team, which inherited the Netscape legacy, continued to push
JavaScript, extending the language with capabilities aimed at less
web-centric, more generic uses.
> the language Mercury
>
> Mercury is a fine statically-typed, restricted variant of Prolog, and
> even used as implementation language for Prince XML, but if you want
> to push for logical programming on document data, I'd recommend to
> stick to ISO Prolog which has many, many implementations.
> In fact,
> basic Prolog (on a suitable term representation for documents) can be
> used to implement a large subset of CSS selectors *and*
> layout/rendering algorithms using constraint-based formulations.
As I stated in my first short reply, I am not suggesting to use the
Mercury programming language for the XML-based, structural and
semantic, reborn web platform. I am suggesting to use it for the
second proposed platform, that of remote software execution, which
should rid itself of its markup legacy (HTML and XML). On the first
platform, the one for content-oriented websites, XForms should be
used for form validation, and other "programming" needs should be
handled through either XSLT or the combination of XML, XPath and
XML Events.
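To make the XForms point concrete, here is a minimal sketch of
declarative form validation (the instance data and element names are
invented for the example):

    <model xmlns="http://www.w3.org/2002/xforms"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <instance>
        <order xmlns=""><quantity>1</quantity></order>
      </instance>
      <!-- the constraint is a plain XPath expression; no scripting involved -->
      <bind nodeset="quantity" type="xs:integer"
            constraint=". &gt;= 1 and . &lt;= 100"/>
    </model>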
The reasons why I included Mercury in the list are the following.
First, it would put a purely declarative language on the list as an
alternative to the imperative or hybrid languages which would
constitute the other choices; Mercury allows the use of three
declarative programming paradigms (logic, functional, and the
declarative sub-variant of object orientation). Most purely
declarative programming languages are either purely functional or
purely logic. Besides Mercury, I have never heard of a purely
declarative programming language supplying facilities for declarative
object orientation (as opposed to the imperative object orientation
supplied by languages such as Java). Prolog, which you mentioned
above, is meant for logic programming. If, while using Prolog, one
wanted to also use the functional paradigm and the
declarative-object-oriented paradigm (as present in Mercury), it
would likely require some homemade solution (possibly written in
Prolog) beyond the base language. I am not, however, suggesting that
Prolog would necessarily be a bad choice, just that a single-paradigm
language of this kind would bring more limitations than a
triple-paradigm language such as Mercury. In fact, I strongly
encourage you to suggest your own list of programming languages for
the second platform, that of remote software execution, with the
reason for each choice; my list is just a mere suggestion. The second
reason why I suggested Mercury is that, unlike most purely
declarative languages, it has a little bit of uptake in the industry,
while most declarative programming languages are used only in the
academic world and research institutes.
>> DRM is fundamentally wrong and constitutes a stupid and useless idea
>
> I don't like DRM either, but just as with RDF, the question is how to
> finance content creation when the only or primary income is ads
> rather
> than public funding. It's all too well-known that monopolization and
> platform economy is what's happening in eg. the music "industry".
Your answer makes me think I should expand on the ugly three-headed
monster concept. In my original message I wrote:
> The current World Wide Web [Consortium], from a great
> organization has turned into an ugly three-headed monster,
> one head is the semantic web / XML / RDF people, the second
> head is the WHATWG people trying to turn the web into a
> remote application execution framework, the third and final
> head is the copyright industry.
When it comes to content creation, there are people affiliated with
two different heads. You state that the issue is the financing of
content creation. I need to say that not all digital content is
created and paid for the same way. Some content is created by
university faculty, research institutes, governments and NGOs; that
content doesn't rely on advertising or subscriptions for its
financing. The same can be said of content put up by
non-media-related, non-copyright-industry-related companies, such as,
for example, IBM putting up descriptions of the products and services
they have to offer. One can also add the amateur websites (which can
often be as good as or better than professionally produced content)
and blogs. One can even add the few people who produce paid content
but who are sensible enough to consider that once customers have paid
for the content they have unrestricted, DRM-free access to it, and
who also don't wish to use the absence of structural and semantic
information as a form of DRM-lite (see my original message about
this). All of this type of content is perfectly compatible with the
first platform proposal, the one for structurally encoded and
possibly semantically encoded content, based on XML and meant for
content-oriented websites. These content-producing people can easily
be affiliated with the first head. You say that the primary means of
financing is ads rather than public funding, but all the
aforementioned content relies on neither ads nor subscriptions for
its financing. Contrary to what you seem to imply, even if ads were
definitively abolished, there would still be content available.
On the other side, there are the content producers who consider that
they own their monetized content and that they have the right to
control its use, hence want DRM. These people's vision is the
antithesis of the open web which the XML-based approach, the semantic
web and XPath/RDF are trying to achieve. These people are the third
head. The proper thing to do about them and their content is not to
compromise with them, which would invariably compromise the openness
of the web and the core principles it should follow, but to keep them
out of the web, having them put their content somewhere else. Those
people do not want, anyway, to make content openly available; on the
contrary, they consider that "accessing their content is a service".
This brings an association with the second head, that of the people
trying to turn the web into a remote application execution framework,
or, in other words, an online service platform. Since it has been
established that there should be two separate platforms for the two
uses (one for content-oriented websites, the other for remote service
access), it becomes clear that the place for
usage-restriction-encumbered content is not on the platform for
openly accessible content-oriented websites; therefore, it is best to
have it on the remote service access platform. This is even truer
when considering that the corporate media and the copyright industry
are trying to turn access to their content into a service anyway.
There is no use for markup on content where DRM disallows the very
uses which the markup would have facilitated, and there is no use for
markup on content where the markup was voluntarily misused to create
a DRM-lite situation, again disallowing the uses which the markup is
meant to facilitate.

As a final note, while it is true that a competitive market would be
way better than the monopolies and oligopolies currently prevailing
in the domain, I am afraid that there is little that a platform
development / standardization effort can do to fight such a situation
besides trying to stay away from proprietary technology and choosing
technologies which allow easier indexing (such as the semantic web /
RDF / RDFa) by many competing companies, since monopolies and
oligopolies stem largely from political elements rather than from
standardization or technical elements. As for DRM, I also want to add
that it has never prevented downloading of any kind. At the current
time, anyone with half a brain can download all DRM-protected ebooks,
movies or music files through IRC/Torrents/Overnet/ed2k, regardless
of the fact that the original version was DRM-protected.
> I have no doubt that XML will live and prosper, but my point is that
> XML is a means to an end, not an end in itself. The "end" towards
> which markup should strive is to give a reasonable, canonical way for
> editing, publishing, and preserving rich text for a layman (or
> realistically, a power user) on the web and elsewhere. Ask yourself
> what practical, real-world solutions are there left today based on
> XML
> for this purpose that could challenge eg. WordPress?
Anything published as HTML is plagued by the loads of problems which
I addressed in my original message, and the fact that the content is
prepared with something such as WordPress makes the resulting content
even less accessible. XML allows the content to be more easily
indexed, more easily analyzed and more easily reused or further
processed, thanks to its excellent markup.
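A small sketch of what "more easily indexed" means in practice (the
element names are invented for the example): given content marked up
as

    <article>
      <title>On markup</title>
      <author><name>R. Hendricks</name></author>
    </article>

a single XPath expression such as

    //article/author/name

retrieves every author name, whereas extracting the same information
from presentational HTML means guessing from class names and layout.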
> Let me close by listing a couple of practical initiatives for
> bringing
> us closer to that goal, rather than going all-in on an overreaching,
> XML-centric roadmap that we've already seen failing on the web:
>
> - register a new application/html (as opposed to text/html) IANA MIME
> type for web apps such that pure markup sites not relying on
> JavaScript can be found easily and preferably over script-heavy
> sites;
> over time, make it bad form and penalize accordingly to serve
> script-heavy sites as text/html
I personally believe that this is not a solution to the problem;
however, it can help with the transition to something better. There
should be a cleaned-up version of HTML5, or better yet XHTML5,
separate from the full version, in the same way that the W3C created
Transitional and Strict versions of HTML4, and in the same way that
they created XHTML 1.0, having more or less backward compatibility
with HTML4, while preparing XHTML 2.0 (which sadly never saw the
light of day). The main weak point of this approach is of course that
it doesn't erect the required iron curtain between the two platforms.
This approach allows a content page to point via a hypertext link to
an application-based site, which should not be allowed. It also
allows both types of results to be mixed in search engines and so on,
all of which should be prohibited. It would still be an improvement,
however, especially as a first step toward cleaning up the whole mess.
> - question the pay-as-you-go status of W3C spec work, and push for
> public funding of HTML standardization (it's been a while that W3C
> has
> published an HTML spec; their HTML page just links to the WHATWG HTML
> "standard" HEAD on github)
Public institutions and public bodies can be, and often are,
manipulated by monopolies or by oligopoly-backed lobbies, in which
case there isn't much difference compared to the corporations doing
the standardization directly; in fact, it may have the sole effect of
adding another administrative layer, making the process even heavier.
> - work towards identifying a reasonable set of expected visual idioms
> on the modern web (such as menus and other generic controls) for
> which
> we want to have a declarative/markup-based rather than (or in
> addition
> to) a programmatic implementation
I am not sure what to think about this. I think that any effort based
on HTML5 should put the emphasis on clean-up rather than extension.
> - push/finance W3C or other body to publish formal specs for CSS
> (which is where a lot of complexity sits in today's web); try and
> define reasonable CSS subsets for document-oriented use cases; try
> and
> establish forward-compatible CSS versions/levels a site can anounce
> such that we can eventually see new browsers being developed
My answer is the same as for the proposal to create a new internet
media type for application-oriented websites, distinct from that of
content-oriented websites. I think it won't solve the problem;
however, defining cleaned-up versions, separate from the full
versions, as a transitional measure would be a step in the right
direction.
> - for the same reason, push for proper *versioned* HTML spec
> documents
> rather than "living standards" (an oxymoron in more than one way).
Well, if cleaned-up versions of HTML5/XHTML5 and CSS3 are published,
it is obvious that these would need to be properly defined,
fixed-in-time versions.
> Maybe the community is welcoming to such efforts. I think that last
> decade's SiliCon-dominated scene has definitely lost its appeal, and
> there's a growing concern towards monopolies, platforms, and
> verticals
> and the attention economy in general.
The 2010s are probably the most disgusting decade ever seen in the
world of computing.
> Not sure W3C is the proper recipient for what you seem to push for,
> simply because W3C has been in the web standardization game for most
> of its existence, yet wasn't able to prevent the demise of the web
> (not out of bad faith or something). It's my opinion that, If
> anything, if you want to see a big-time XML+RDF agenda of the scope
> you envisioned in your original mail, you'll risk evoking a bitter
> controversy over past decisions, at best an "a fortiori" reaction
> (such as TBL's SOLID project), but realistically nothing at all given
> that most of the things have been discussed to death in their heyday,
> but failed on the web. In fact, I believe W3C should disorganize
> under
> its current statue, and make room for others, if only to let die the
> illusion of the general population sitting at the table when it comes
> to define the future of the web. But anyway, I look forward to your
> detailed reply.
Can you be so kind as to state what would be the proper recipient for
the proposal? In fact, in my original message, I suggested replacing
the current W3C with two new consortiums: one a reborn W3C with
strong OASIS and IETF participation but keeping Sir Timothy and the
key XML / Semantic Web / XPath people, and the other a completely
separate consortium to handle the remote software execution platform.
> Not sure IETF as such is the right recipient either. After all, you
> are free to create your own RFCs as you see fit. IETF hasn't
> prevented
> (nor should it have) HTTP/2 and HTTP/3 with its scope creep/land-grab
> of lower IP networking layers (which now turn out to be bogus eg.
> Chrome dropping support for push resources), keeping controversial
> features such as DoH in conflict with other RFCs. Leaving a situation
> where new browsers and network stacks can only be approached by state
> actors or very large corporations, which is exactly the kind of
> situation that bona fide "standardization bodies" should strive to
> prevent.
I am hoping that IETF participation can help keep the projects
sufficiently open.
> I wholeheartedly agree with your opinion that web apps (as opposed to
> content-oriented web sites) should use programming languages for
> their
> own sanity rather than a mish-mash of markup and programmatic
> techniques a la React (which otherwise I don't think is half-bad at
> all; it just wish e4x rather than jsx had "won"). But to challenge
> that, the best way would be to establish a new JavaScript component
> framework that folks would want to use rather than approach this from
> a standardization angle; easier said than done, though.
The approach which you suggest keeps the content platform and the
application platform tightly linked. For the sake of sanity, it is
important to set an iron curtain between the two platforms. The
remote software execution platform should completely give up its web
legacy.
When I talk about an iron curtain, I mean that the remote software
execution platform should be completely separate from the reborn web
platform. It should not be accessed (and this is important) using the
same software as the reborn web, it should not be based on the same
formats, it should not use the same protocols, and so on. Perhaps it
can even be meant to be accessed from different devices; about this I
recommend reading again the section of my original message about DRM,
user-owned devices and user-rented devices. About this last point, I
wish to state that a user doesn't gain much from owning devices which
become obsolete every few years and, in such a case, can just as well
rent them.
The vision proposed is one where there are two different platforms
available, with nothing in common between the two, and where neither
keeps the legacy of the current web. Perhaps one platform can
manifest itself as various subscription services, where users
subscribe to the platform access which gives access to online
services, gain access to edge computing services (included in the
subscription), where the subscription possibly includes an access
device, and where the user subscribes to other paid or
advertising-supported services available on the platform. Perhaps the
other platform can manifest itself as open access to (hand-encoded or
XSLT-generated) content based on an XML format (possibly served
through a SOAP-over-TCP protocol), where the access is through
software running on user-owned hardware, where most of the content is
freely available for non-commercial use, where indexing and analyzing
the data is easy, and where there are no restrictions put up by
people who consider that they own the content and have the right to
restrict its use. Of course, the HTML-based web as it currently
exists should be killed once and for all. The two new platforms
should break compatibility with the past and be incompatible between
themselves. It would not be unreasonable to say that one platform
would be a digital business platform and that the other platform, the
reborn web, would be an open content sharing platform, even if this
description wouldn't hold true one hundred percent of the time; after
all, when Sir Timothy created the web in the beginning, at CERN, it
was to allow open sharing of scientific papers.
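To sketch what serving a document over SOAP could look like
(everything here is hypothetical: the operation name, the namespace
and the doc:// URI scheme are invented for the example), a request on
the reborn web platform might be as simple as:

    <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
      <env:Body>
        <getDocument xmlns="http://example.org/reborn-web">
          <uri>doc://example.org/articles/2021/xml-platform</uri>
        </getDocument>
      </env:Body>
    </env:Envelope>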
The other point which I want to bring up is that you seem to think
that the resistance to a switch to XML/XPath and the semantic web is
too big for the change to happen, and that the money speaks in favour
of maintaining the HTML5/JavaScript/JSON nonsense. By seeing it this
way, you do not seem to take into consideration the fact that the
people who are against XML/XPath and the semantic web (and who are
pushing the HTML5 nonsense) are the very same people trying to turn
the web into a remote software execution platform. If they are
redirected to a new platform meant for remote software execution, and
the entities (which include most browser makers) and money behind
them are also redirected to the remote software execution platform,
then suddenly there would be no more force behind the HTML5 effort
and almost no one fighting against the switch to XML/XPath and the
semantic web. If the people who want to turn the web into a remote
software execution platform are given the opportunity to switch to a
new platform better suited for remote software execution, with proper
mechanisms to integrate edge computing and supplying corporate
requirements as standard from the beginning, including security
mechanisms and, yuk!, copy protection, they will hopefully do so and
reap its benefits; the web environment will then be mostly free of
XML/XPath and semantic web opponents, and the switch to XML/XPath and
the semantic web can then happen with little opposition. As already
stated, if a new remote software execution platform is to be created,
it should be now, when edge computing is about to become important,
so as to integrate it from the start; afterwards, the opportunity
will have passed. Of course, as I already stated, it is best to rid
the reborn web of the old names (html, xhtml, http, etc.) to avoid
raising false expectations; a new set of names would be best for the
new technologies.
As an extra note, I see that you do not touch at all on the subject
of the appropriateness of integrating standard mechanisms for edge
computing into the remote software execution platform; is this
deliberate? I do believe that the coming of widely-used edge
computing is the very reason why the people trying to turn the web
into a remote software execution platform and pushing
HTML5/JSON/JavaScript may be willing to allow the schism (into two
new platforms), as a new platform for remote software execution could
offer proper mechanisms for edge computing integration from the
start, instead of having to pile another hack on top of the current
pile.
Raphaël Hendricks