Re: [xml-dev] The problems and the future of the web and a formal internet technology proposal
- From: Raphaël Hendricks <rhendricks@netcmail.com>
- To: Marcus Reichardt <u123724@gmail.com>,Sir Timothy Berners-Lee <cos@timbl.com>,Liam Quin <liam@fromoldbooks.org>,Ivan Herman <ivan@w3.org>,Eric Prud'hommeau <eric@w3.org>,IETF Discussion Mailing List <ietf@ietf.org>,xml-dev@lists.xml.org,public-exploresemdata@w3.org,public-philoweb@w3.org,public-web-copyright@w3.org,public-dntrack@w3.org,project-admin@oasis-open.org
- Date: Sun, 24 Jan 2021 17:22:08 -0500
Dear Marcus, here is the answer you have been waiting for.
To all list users: I am sorry for the delay. There have been many
days when I was too weak to answer, since I am battling chronic
fatigue syndrome and have many days when I am non-functional;
moreover, during the few days when I felt better, I had some
pressing issues to handle until a week ago. I then felt bad from
Monday to Thursday, and since I felt better afterwards, I prepared
the answers for the list on Friday, Saturday and Sunday.
> I fully agree with your notion that there's a distinction to be made
> between an app platform (that nobody called for, and only browser
> vendors had a vetted interest in building) and a document format used
> as a primary means for communication in politics, law, education,
> medical, personal, etc, etc.
I am glad to read that I am not the only one to see that such a
distinction is needed.
> I don't agree with your criticism of Ian Hickson's work. AFAICS (and
> I've probably studied HTML 5 in detail more than most people [1]) he
> made a very good job of capturing HTML 4 rules, and added a couple of
> not-too-controversial elements on his own.
The problems with HTML5 are several. First, it starts from the
principle that the browser developers define the formats, which is
wrong. Standards should be developed by consortiums bringing together
specialists, developers and the people who use the specifications.
Second, in the same vein, they reintroduced elements which had been
rejected by the W3C as the wrong solution when the HTML4
specification was being written. The most visible case is the
reintroduction of the embed element, which was really a Netscape
proprietary extension. The W3C chose the object element for embedded
multimedia objects, and rightly so. They did not do it in the exact
same way as the Microsoft implementation: they did keep the classid
attribute but made it optional, so one can identify the multimedia
object by giving its URI as the value of the data attribute or as the
value of the classid attribute, with no need to specify a hexadecimal
clsid value a la Microsoft.
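As a small illustration, here is roughly what the W3C-sanctioned
markup looks like (the file name and MIME type are made up for the
example; no clsid is involved):

    <object data="movie.ogv" type="video/ogg" width="320" height="240">
      A textual fallback for user agents that cannot render the object.
    </object>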
> Where it's gone wrong is
> that the syntax presentation for HTML 5 (as opposed to the historical
> HTML 4 DTD) doesn't convey its basic construction as a span-level
> markup vocabulary extended with block-level elements. You can see
> this
> with the definition of the paragraph ("p") element which includes an
> enumeration of paragraph-terminating elements rather than
> referring to
> the category of block-only elements. Consequently, when new elements
> where added, the spec authors "forgot" to include elements into the
> enumerated list of p-terminating elements, making the spec bogus. In
> other words, the HTML 5.1 spec process lost control over their
> workflow, and didn't want to employ SGML or other formal markup tech
> to quality-assure their work either, which easily would have (and has
> [1]) spotted these flaws.
Perhaps, but even if it is so, this is only a small part of the problem.
> In this context, let me also say that the notion of "tag soup" markup
> is a myth. All versions of HTML, including HTML 5.x, can be parsed by
> SGML using well-known, formal rules for tag minimization/inference
> and
> other shortform syntax (save for the definition of the "script" and
> "style" elements which were bogusly introduced into HTML such that
> they would be treated as comments by legacy browsers).
I disagree: the soup is the excessive creation of unjustified tags,
not tags that are impossible to parse. All the presentational markup
tags are part of the tag soup. While they filled an unanswered need,
since there was no adequate stylesheet mechanism when Netscape 2.0
was released (Netscape 2.0 was released in 1995 and CSS1 in 1996),
they had the terrible result of mixing markup and presentation. The
W3C answered with separate versions including and excluding the
presentational markup, the Transitional and Strict versions, starting
with HTML 4.0. Internet Explorer was just as guilty, with elements
such as the marquee element and so on. Many elements introduced by
the browser vendors should never have been introduced.
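To make the mixing concrete, here is a rough sketch of the same
sentence in presentational tag soup and in structural markup (the
class name and the stylesheet rule are made up for the example):

    <font color="red" size="4"><b>Warning: do not unplug the device.</b></font>

    <p class="warning">Warning: do not unplug the device.</p>
    /* in the stylesheet: p.warning { color: red; font-weight: bold; } */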
> SGML has additional concepts on top of XML very relevant
> today, such as custom Wiki syntaxes (lots of tech content is written
> using markdown today), type-safe/injection-free HTML-aware
> templating,
> etc. And SGML also offers a formal way to integrate HTML content into
> "canonical" markup (ie XML) [3].
While, from what I have read (since, unlike XML, SGML is a language
with which I have no experience), it is true that some capabilities
of SGML (in DTDs particularly) were lost when creating XML, which may
be a weak point of XML, it is also important to note that XML has at
least three major advantages over SGML.

The first is a stricter syntax: in XML, general identifiers and
attribute names are case sensitive, there are no elements with
optional closing tags, and so on; this allows for more efficient
parsing and teaches more rigorous syntax writing to the users.

The second advantage is the addition of a second layer of
verification in the form of a well-formedness requirement
complementing validation. Validation is not always used; some
daughter languages, such as XSLT, while XML-based, are non-validating,
yet they can still be verified for well-formedness. This dual-layer
verification is a huge advantage of XML, where the absence of
well-formedness is simply not allowed. The absence of a
well-formedness requirement in HTML was part of what made it so
degenerate: the HTML-generating public would take advantage of the
actual behaviour of web browsers even when it violated syntactic
rules, often geared to the behaviour of a single browser; the other
browsers would then try to make those pages work, most often
introducing new quirks along the way; and the end result was a
vicious cycle of browser vendors always trying to make more pages
work, introducing quirks along the line, and HTML-generating authors
using more and more lousy markup based on the quirks introduced by
the browser vendors, which would then lead to browsers being even
more tolerant, and so on. With XHTML, being XML-based,
well-formedness is verified before displaying the page and any error
is brought up, instead of trying to make broken pages work (making
broken pages work gives the wrong message to the authors, namely that
it is okay to make bogus pages as long as the browser can get them
working nonetheless).
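A tiny sketch of the difference (the text is invented for the
example): HTML browsers will happily render mis-nested, unclosed
markup such as

    <p>An <b>important <i>point</b></i>
    <br>

whereas an XML parser only accepts the well-formed equivalent:

    <p>An <b>important <i>point</i></b></p>
    <br />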
The third advantage of XML over SGML is that with XML comes a
complementary language in the form of XPath, which is used to express
everything which cannot properly be formulated with pure XML; the
XML/XPath combination is extremely strong. This strong combination
allows creating languages such as XSLT, Schematron, XForms and so on.
If some features available in SGML are badly missing in XML, it is
probably best to create an XML 2.0 adding those features in a manner
compatible with the rest of the language, rather than switching to
SGML. Also, with XML, using Schematron allows data verification at a
level way beyond anything SGML DTDs will ever allow.
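For instance, here is a minimal Schematron sketch expressing a
constraint no DTD can state (the element and attribute names are
invented for the example):

    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern>
        <rule context="invoice">
          <!-- an ordinary XPath expression serves as the constraint -->
          <assert test="sum(line/@amount) = @total">
            The invoice total must equal the sum of its line amounts.
          </assert>
        </rule>
      </pattern>
    </schema>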
As for the markdown issue, the idea with XML is to use specific
languages for specific tasks and to attach an XSLT stylesheet which
converts the content, on arrival, into the final format, be it XHTML,
SVG or whatever.
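In practice this is just an xml-stylesheet processing instruction on
the document plus a stylesheet; a minimal sketch, with invented
element names and file names:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="article-to-xhtml.xsl"?>
    <article>
      <title>On markup</title>
      <para>Content is converted on arrival.</para>
    </article>

    <!-- article-to-xhtml.xsl -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/1999/xhtml">
      <xsl:template match="/article">
        <html><body>
          <h1><xsl:value-of select="title"/></h1>
          <xsl:apply-templates select="para"/>
        </body></html>
      </xsl:template>
      <xsl:template match="para">
        <p><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>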
> XML-based standards from OASIS, such as Docbook
>
> Just fyi, Docbook was as an SGML-based vocabulary most of the time;
> it's only since version 5 that dedicated SGML-based formulations have
> been dropped from the spec (since XML is just a subset of SGML
> anyway). I agree though OASIS (fka SGML/Open group) has put out
> useful
> standards, and is an org I'd love to see helping to bring our
> stagnation to an end.
Well, since DocBook is now an XML-based format, it can serve as a
basis for further XML efforts. Moreover, as you state, OASIS is an
organization which can help to further the XML effort. If only they
could start work on a "docarticle" format, with support for comments
and hypertext links via extensions. This could form the basis of a
reborn web, based on XML.
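To give an idea of the starting point, a minimal DocBook 5 article
already looks like this (a sketch; the hypothetical "docarticle"
profile would add the commenting and linking extensions mentioned
above):

    <article xmlns="http://docbook.org/ns/docbook"
             xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
      <title>A reborn web</title>
      <para>Hypertext links can be expressed with
        <link xlink:href="https://www.oasis-open.org/">XLink attributes</link>.</para>
    </article>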
> replace the current selectors with XPath based selectors [...] the
> inconvenient (sic) of not being fully XML/XPath based [...] XML
> reformulation of [...] CSS3
>
> I can only recommend to look, once in a while, at techniques outside
> the XML-centric world.
A fully XML-based solution allows using standard XML tools, including
XSLT, for manipulation.
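As a rough illustration of the correspondence (the class name is made
up), the CSS selector

    p.warning > em

could be written as the XPath expression

    //p[contains(concat(' ', normalize-space(@class), ' '), ' warning ')]/em

where the concat/normalize-space test reproduces CSS's
whitespace-separated class matching.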
> Python, [...], Ruby, [...], ISLisp [...]
>
> I'm sorry but this merely reads as a list of personal preferences.
I have stated why I suggested those. Python is becoming the main
interpreted programming language in the Unix world, Ruby is its main
competitor, and for ISLisp, I already stated that
> those programmers who do not identify with the unix culture
> often are adepts of Lisp and ISLisp is lightweight (and as
> such better suited to this use case) and consists of the
> common subset of the major Lisp variants,
If you have a better list to suggest, by all means, please do so.
What I am trying to say is that a remote software execution platform
should break cleanly with the WWW legacy and use real programming
languages, both interpreted and bytecode-compiled.
> There's nothing wrong with JavaScript; it's a language ultimately
> derived from an awk-like syntax (so is very adequate for text
> processing even though shortforms for pattern matching didn't make it
> into the language), and is probably the most portable language out
> there today.
JavaScript had its beginning under the name LiveScript, which was
introduced in Netscape 2.0 to add a bit of dynamic capability to web
pages, particularly to page formatting; it was not meant for writing
software. It was soon renamed JavaScript (and further extended in
Netscape 3.0), borrowing concepts from the Java programming language,
with the major difference that Java is class-based while JavaScript
is prototype-based. It was meant to be easy to use by non-programmers
(and it succeeded in being so), which most web authors were expected
to be, and there is nothing wrong with that; but it was not meant for
writing software. It was then extended several times, during which
time Microsoft designed its own partly-compatible version called
JScript. The common subset of the two scripting languages was
standardized as ECMA-262 under the name ECMAScript. Instead of
switching to standard ECMAScript, as would have made sense, the
Mozilla team, which inherited the Netscape legacy, continued to push
JavaScript, extending the language with capabilities aimed at less
web-centric, more generic uses.
> the language Mercury
>
> Mercury is a fine statically-typed, restricted variant of Prolog, and
> even used as implementation language for Prince XML, but if you want
> to push for logical programming on document data, I'd recommend to
> stick to ISO Prolog which has many, many implementations.
> In fact,
> basic Prolog (on a suitable term representation for documents) can be
> used to implement a large subset of CSS selectors *and*
> layout/rendering algorithms using constraint-based formulations.
As I stated in my first short reply, I am not suggesting to use the
Mercury programming language for the XML-based, structural and
semantic, reborn web platform. I am suggesting to use it for the
second proposed platform, that of remote software execution, which
should rid itself of its markup legacy (HTML and XML). On the first
platform, the one for content-oriented websites, XForms should be
used for form validation, and other "programming" needs should be
handled through either XSLT or the combination of XML, XPath and
XML Events.
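To make the XForms point concrete, here is a minimal sketch of
declarative form validation (the instance data and element names are
invented for the example):

    <model xmlns="http://www.w3.org/2002/xforms"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <instance>
        <order xmlns=""><quantity>1</quantity></order>
      </instance>
      <!-- the constraint is a plain XPath expression; no scripting involved -->
      <bind nodeset="quantity" type="xs:integer"
            constraint=". &gt;= 1 and . &lt;= 100"/>
    </model>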
The reasons why I included Mercury in the list are the following.
First, it would put a purely declarative language on the list as an
alternative to the imperative or hybrid languages which would
constitute the other choices; Mercury allows the use of three
declarative programming paradigms (logic, functional, and the
declarative sub-variant of object orientation). Most purely
declarative programming languages are either purely functional or
purely logic. Besides Mercury, I have never heard of a purely
declarative programming language supplying facilities for declarative
object orientation (as opposed to the imperative object orientation
supplied by languages such as Java). Prolog, which you mentioned
above, is meant for logic programming. If, while using Prolog, one
wanted to also use the functional paradigm and the
declarative-object-oriented paradigm (as present in Mercury), it
would likely require some homemade solution (possibly written in
Prolog) beyond the base language. I am not, however, suggesting that
Prolog would necessarily be a bad choice, just that a single-paradigm
language of this kind would bring more limitations than a
triple-paradigm language such as Mercury. In fact, I strongly
encourage you to suggest your own list of programming languages for
the second platform, that of remote software execution, with the
reason for each choice; my list is just a mere suggestion. The second
reason why I suggested Mercury is that, unlike most purely
declarative languages, it has a little bit of uptake in the industry,
while most declarative programming languages are used only in the
academic world and research institutes.
>> DRM is fundamentally wrong and constitutes a stupid and useless idea
>
> I don't like DRM either, but just as with RDF, the question is how to
> finance content creation when the only or primary income is ads
> rather
> than public funding. It's all too well-known that monopolization and
> platform economy is what's happening in eg. the music "industry".
Your answer makes me think I should expand on the ugly three-headed
monster concept. In my original message I wrote:
> The current World Wide Web [Consortium], from a great
> organization has turned into an ugly three-headed monster,
> one head is the semantic web / XML / RDF people, the second
> head is the WHATWG people trying to turn the web into a
> remote application execution framework, the third and final
> head is the copyright industry.
When it comes to content creation, there are people affiliated with
two different heads. You state that the issue is the financing of
content creation. I need to say that not all digital content is
created and paid for the same way. Some content is created by
university faculty, research institutes, governments and NGOs; that
content doesn't rely on advertising or subscriptions for its
financing. The same can be said of content put up by
non-media-related, non-copyright-industry-related companies, such as,
for example, IBM putting up descriptions of the products and services
they have to offer. One can also add the amateur websites (which can
often be as good as or better than professionally produced content)
and blogs. One can even add the few people who produce paid content
but who are sensible enough to consider that once customers have paid
for the content they have unrestricted, DRM-free access to it, and
who also don't wish to use the absence of structural and semantic
information as a form of DRM-lite (see my original message about
this). All of this type of content is perfectly compatible with the
first platform proposal, the one for structurally encoded and
possibly semantically encoded content, based on XML and meant for
content-oriented websites. These content-producing people can easily
be affiliated with the first head. You say that the primary means of
financing is ads rather than public funding, but all the
aforementioned content relies on neither ads nor subscriptions for
its financing. Contrary to what you seem to imply, even if ads were
definitively abolished, there would still be content available.
On the other side, there are the content producers who consider that
they own their monetized content and that they have the right to
control its use, hence want DRM. These people's vision is the
antithesis of the open web which the XML-based approach, the semantic
web and XPath/RDF are trying to achieve. These people are the third
head. The proper thing to do about them and their content is not to
compromise with them, which would invariably compromise the openness
of the web and the core principles it should follow, but to keep them
out of the web, having them put their content somewhere else. Those
people do not want, anyway, to make content openly available; on the
contrary, they consider that "accessing their content is a service".
This brings an association with the second head, that of the people
trying to turn the web into a remote application execution framework,
or, in other words, an online service platform. Since it has been
established that there should be two separate platforms for the two
uses (one for content-oriented websites, the other for remote service
access), it becomes clear that the place for
usage-restriction-encumbered content is not on the platform for
openly accessible content-oriented websites; therefore, it is best to
have it on the remote service access platform. This is even truer
when considering that the corporate media and the copyright industry
are trying to turn access to their content into a service anyway.
There is no use for markup on content where DRM disallows the very
uses which the markup would have facilitated, and there is no use for
markup on content where the markup was voluntarily misused to create
a DRM-lite situation, again disallowing the uses which the markup is
meant to facilitate.

As a final note, while it is true that a competitive market would be
way better than the monopolies and oligopolies currently prevailing
in the domain, I am afraid that there is little that a platform
development / standardization effort can do to fight such a situation
besides trying to stay away from proprietary technology and choosing
technologies which allow easier indexing (such as the semantic web /
RDF / RDFa) by many competing companies, since monopolies and
oligopolies stem largely from political elements rather than from
standardization or technical elements. As for DRM, I also want to add
that it has never prevented downloading of any kind. At the current
time, anyone with half a brain can download all DRM-protected ebooks,
movies or music files through IRC/Torrents/Overnet/ed2k, regardless
of the fact that the original version was DRM-protected.
> I have no doubt that XML will live and prosper, but my point is that
> XML is a means to an end, not an end in itself. The "end" towards
> which markup should strive is to give a reasonable, canonical way for
> editing, publishing, and preserving rich text for a layman (or
> realistically, a power user) on the web and elsewhere. Ask yourself
> what practical, real-world solutions are there left today based on
> XML
> for this purpose that could challenge eg. WordPress?
Anything published as HTML is plagued by the loads of problems which
I addressed in my original message, and the fact that the content is
prepared with something such as WordPress makes the resulting content
even less accessible. XML allows the content to be more easily
indexed, more easily analyzed and more easily reused or further
processed, thanks to its excellent markup.
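A small sketch of what "more easily indexed" means in practice (the
element names are invented for the example): given content marked up
as

    <article>
      <title>On markup</title>
      <author><name>R. Hendricks</name></author>
    </article>

a single XPath expression such as

    //article/author/name

retrieves every author name, whereas extracting the same information
from presentational HTML means guessing from class names and layout.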
> Let me close by listing a couple of practical initiatives for
> bringing
> us closer to that goal, rather than going all-in on an overreaching,
> XML-centric roadmap that we've already seen failing on the web:
>
> - register a new application/html (as opposed to text/html) IANA MIME
> type for web apps such that pure markup sites not relying on
> JavaScript can be found easily and preferably over script-heavy
> sites;
> over time, make it bad form and penalize accordingly to serve
> script-heavy sites as text/html
I personally believe that this is not a solution to the problem;
however, it can help with the transition to something better. There
should be a cleaned-up version of HTML5, or better yet XHTML5,
separate from the full version, in the same way that the W3C created
Transitional and Strict versions of HTML4, and in the same way that
they created XHTML 1.0, having more or less backward compatibility
with HTML4, while preparing XHTML 2.0 (which sadly never saw the
light of day). The main weak point of this approach is of course that
it doesn't erect the required iron curtain between the two platforms.
This approach allows a content page to point via a hypertext link to
an application-based site, which should not be allowed. It also
allows both types of results to be mixed in search engines and so on,
all of which should be prohibited. It would still be an improvement,
however, especially as a first step toward cleaning up the whole mess.
> - question the pay-as-you-go status of W3C spec work, and push for
> public funding of HTML standardization (it's been a while that W3C
> has
> published an HTML spec; their HTML page just links to the WHATWG HTML
> "standard" HEAD on github)
Public institutions and public bodies can be, and often are,
manipulated by monopolies or by oligopoly-backed lobbies, in which
case there isn't much difference compared to the corporations doing
the standardization directly; in fact, it may have the sole effect of
adding another administrative layer, making the process even heavier.
> - work towards identifying a reasonable set of expected visual idioms
> on the modern web (such as menus and other generic controls) for
> which
> we want to have a declarative/markup-based rather than (or in
> addition
> to) a programmatic implementation
I am not sure what to think about this. I think that any effort based
on HTML5 should put the emphasis on clean-up rather than extension.
> - push/finance W3C or other body to publish formal specs for CSS
> (which is where a lot of complexity sits in today's web); try and
> define reasonable CSS subsets for document-oriented use cases; try
> and
> establish forward-compatible CSS versions/levels a site can anounce
> such that we can eventually see new browsers being developed
My answer is the same as for the proposal to create a new internet
media type for application-oriented websites, distinct from that of
content-oriented websites. I think it won't solve the problem;
however, defining cleaned-up versions, separate from the full
versions, as a transitional measure would be a step in the right
direction.
> - for the same reason, push for proper *versioned* HTML spec
> documents
> rather than "living standards" (an oxymoron in more than one way).
Well, if cleaned-up versions of HTML5/XHTML5 and CSS3 are published,
it is obvious that these would need to be properly defined,
fixed-in-time versions.
> Maybe the community is welcoming to such efforts. I think that last
> decade's SiliCon-dominated scene has definitely lost its appeal, and
> there's a growing concern towards monopolies, platforms, and
> verticals
> and the attention economy in general.
The 2010s are probably the most disgusting decade ever seen in the
world of computing.
> Not sure W3C is the proper recipient for what you seem to push for,
> simply because W3C has been in the web standardization game for most
> of its existence, yet wasn't able to prevent the demise of the web
> (not out of bad faith or something). It's my opinion that, If
> anything, if you want to see a big-time XML+RDF agenda of the scope
> you envisioned in your original mail, you'll risk evoking a bitter
> controversy over past decisions, at best an "a fortiori" reaction
> (such as TBL's SOLID project), but realistically nothing at all given
> that most of the things have been discussed to death in their heyday,
> but failed on the web. In fact, I believe W3C should disorganize
> under
> its current statue, and make room for others, if only to let die the
> illusion of the general population sitting at the table when it comes
> to define the future of the web. But anyway, I look forward to your
> detailed reply.
Can you be so kind as to state what would be the proper recipient for
the proposal? In fact, in my original message, I suggested replacing
the current W3C with two new consortiums: one a reborn W3C with
strong OASIS and IETF participation but keeping Sir Timothy and the
key XML / Semantic Web / XPath people, and the other a completely
separate consortium to handle the remote software execution platform.
> Not sure IETF as such is the right recipient either. After all, you
> are free to create your own RFCs as you see fit. IETF hasn't
> prevented
> (nor should it have) HTTP/2 and HTTP/3 with its scope creep/land-grab
> of lower IP networking layers (which now turn out to be bogus eg.
> Chrome dropping support for push resources), keeping controversial
> features such as DoH in conflict with other RFCs. Leaving a situation
> where new browsers and network stacks can only be approached by state
> actors or very large corporations, which is exactly the kind of
> situation that bona fide "standardization bodies" should strive to
> prevent.
I am hoping that IETF participation can help keep the projects
sufficiently open.
> I wholeheartedly agree with your opinion that web apps (as opposed to
> content-oriented web sites) should use programming languages for
> their
> own sanity rather than a mish-mash of markup and programmatic
> techniques a la React (which otherwise I don't think is half-bad at
> all; it just wish e4x rather than jsx had "won"). But to challenge
> that, the best way would be to establish a new JavaScript component
> framework that folks would want to use rather than approach this from
> a standardization angle; easier said than done, though.
The approach which you suggest keeps the content platform and the
application platform tightly linked. For the sake of sanity, it is
important to set an iron curtain between the two platforms. The
remote software execution platform should completely give up its web
legacy.
When I talk about an iron curtain, I mean that the remote software
execution platform should be completely separate from the reborn web
platform. It should not be accessed (and this is important) using the
same software as the reborn web, it should not be based on the same
formats, it should not use the same protocols, and so on. Perhaps it
can even be meant to be accessed from different devices; about this I
recommend reading again the section of my original message about DRM,
user-owned devices and user-rented devices. About this last point, I
wish to state that a user doesn't gain much from owning devices which
become obsolete every few years and, in such a case, can just as well
rent them.
The vision proposed is one where there are two different platforms
available, with nothing in common between the two, and where neither
keeps the legacy of the current web. Perhaps one platform can
manifest itself as various subscription services, where users
subscribe to the platform access which gives access to online
services, gain access to edge computing services (included in the
subscription), where the subscription possibly includes an access
device, and where the user subscribes to other paid or
advertising-supported services available on the platform. Perhaps the
other platform can manifest itself as open access to (hand-encoded or
XSLT-generated) content based on an XML format (possibly served
through a SOAP-over-TCP protocol), where the access is through
software running on user-owned hardware, where most of the content is
freely available for non-commercial use, where indexing and analyzing
the data is easy, and where there are no restrictions put up by
people who consider that they own the content and have the right to
restrict its use. Of course, the HTML-based web as it currently
exists should be killed once and for all. The two new platforms
should break compatibility with the past and be incompatible between
themselves. It would not be unreasonable to say that one platform
would be a digital business platform and that the other platform, the
reborn web, would be an open content sharing platform, even if this
description wouldn't hold true one hundred percent of the time; after
all, when Sir Timothy created the web in the beginning, at CERN, it
was to allow open sharing of scientific papers.
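To sketch what serving a document over SOAP could look like
(everything here is hypothetical: the operation name, the namespace
and the doc:// URI scheme are invented for the example), a request on
the reborn web platform might be as simple as:

    <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
      <env:Body>
        <getDocument xmlns="http://example.org/reborn-web">
          <uri>doc://example.org/articles/2021/xml-platform</uri>
        </getDocument>
      </env:Body>
    </env:Envelope>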
The other point which I want to bring up is that you seem to think
that the resistance to a switch to XML/XPath and the semantic web is
too big for the change to happen, and that the money speaks in favour
of maintaining the HTML5/JavaScript/JSON nonsense. By seeing it this
way, you do not seem to take into consideration the fact that the
people who are against XML/XPath and the semantic web (and who are
pushing the HTML5 nonsense) are the very same people trying to turn
the web into a remote software execution platform. If they are
redirected to a new platform meant for remote software execution, and
the entities (which include most browser makers) and money behind
them are also redirected to the remote software execution platform,
then suddenly there would be no more force behind the HTML5 effort
and almost no one fighting against the switch to XML/XPath and the
semantic web. If the people who want to turn the web into a remote
software execution platform are given the opportunity to switch to a
new platform better suited for remote software execution, with proper
mechanisms to integrate edge computing and supplying corporate
requirements as standard from the beginning, including security
mechanisms and, yuk!, copy protection, they will hopefully do so and
reap its benefits; the web environment will then be mostly free of
XML/XPath and semantic web opponents, and the switch to XML/XPath and
the semantic web can then happen with little opposition. As already
stated, if a new remote software execution platform is to be created,
it should be now, when edge computing is about to become important,
so as to integrate it from the start; afterwards, the opportunity
will have passed. Of course, as I already stated, it is best to rid
the reborn web of the old names (html, xhtml, http, etc.) to avoid
raising false expectations; a new set of names would be best for the
new technologies.
As an extra note, I see that you do not touch at all on the subject
of the appropriateness of integrating standard mechanisms for edge
computing into the remote software execution platform; is this
deliberate? I do believe that the coming of widely-used edge
computing is the very reason why the people trying to turn the web
into a remote software execution platform and pushing
HTML5/JSON/JavaScript may be willing to allow the schism (into two
new platforms), as a new platform for remote software execution could
offer proper mechanisms for edge computing integration from the
start, instead of having to pile another hack on top of the current
pile.
Raphaël Hendricks