Re: [xml-dev] Re: The Goals of XML at 25, and the one thing that XML now needs
- From: Marcus Reichardt <u123724@gmail.com>
- To: Rick Jelliffe <rjelliffe@allette.com.au>
- Date: Thu, 22 Jul 2021 13:04:19 +0200
> But I suggest that modes preclude many and perhaps most parallel implementation methods,
> which has helped XML adroitly avoid most of the improvements to CPUs in the
> last 25 years. And a technology in that situation is not in a healthy place.
XML and SGML don't have escape characters, owing to their IBM (as
opposed to Unix) heritage, which also shows in their commenting syntax
(technically, SGML has MSOCHAR, MSICHAR, and MSSCHAR, but these are not
assigned in the default concrete syntax). Verbatim inclusion of markup
delimiters is performed through entity references (e.g. &lt;). So isn't
SGML/XML already doing what you're postulating, i.e. avoiding the need
for arbitrary lookback to assess whether a given escape is itself
escaped, etc., which would be necessary with backslash escapes in the
Unix way, and which, by the way, can't be captured by regexps, or only
in a bounded way?
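To make the lookback point concrete, here is a minimal Python sketch (my
own illustration, not taken from any spec): with backslash escaping,
classifying a single quote character requires scanning left over an
unbounded run of backslashes, whereas an entity reference like &lt; is
self-delimiting and needs no left context at all.

```python
# Backslash escaping: to decide whether the quote at position i is
# escaped, we must scan left over an unbounded run of backslashes.
def quote_is_escaped(s: str, i: int) -> bool:
    backslashes = 0
    j = i - 1
    while j >= 0 and s[j] == "\\":
        backslashes += 1
        j -= 1
    # An odd-length run means the quote itself is escaped.
    return backslashes % 2 == 1

assert quote_is_escaped('\\"', 1)        # \"  -> the quote is escaped
assert not quote_is_escaped('\\\\"', 2)  # \\" -> the backslash is escaped

# By contrast, an entity reference such as "&lt;" means '<' wherever it
# appears; no lookback is needed to classify the characters around it.
```

This unbounded lookback is exactly what a bounded-lookbehind regexp cannot
express, whereas matching `&lt;` is a plain fixed-string scan.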
>> I doubt many people care about WebSGML today.
> Indeed. But 100% of the people who do are probably on this list. Anyway, the point is
> exclusion again: if it is limited to what SGML can do (without the stable door of SEEALSO)
> then there is no point.
At the risk of stating the obvious, SGML remains the only standardized
markup meta-language capable of dealing with HTML, by far the most
important application of markup technology. The XML specification
itself starts with these words:
The Extensible Markup Language (XML) is a subset of SGML that is
completely described in this document. Its goal is to enable generic
SGML to be served, received, and processed on the Web in the way that
is now possible with HTML.
Considering the fact that XML has failed on the web (while it's still
hugely useful in other ways I wouldn't want to miss, to be sure), is it
not relevant to attempt to avoid
- [having] to work at every computer task as if it had no need to
share data with all their other computer tasks
- [having] to act as if the computer is simply a complicated,
slightly-more-lively replacement for paper
- [having] to spend days trying to make dissimilar computers communicate
- [having] to appease software programs that seem to be at war
with one another
(citing from the first paragraph of Yuri Rubinsky's foreword to The
SGML Handbook)?
You know, as opposed to obsessing over (the trivial, accidental
format that is) JSON for like 15 years?
Worse, HTML parsing complexity pales in comparison to the unhealthy
number of CSS specs the W3C is churning out as its last remaining
stake in the web stack, with thousands upon thousands of microsyntaxes,
layout models, and idiosyncrasies making Web content largely
incomprehensible for generations to come unless you're kissing the
ring of an ad company. Not to speak of the tens of thousands of
Chrome-only JS API functions.
But this seems not to be on the radar here either, and *that* is
what's unhealthy.
On 7/22/21, Rick Jelliffe <rjelliffe@allette.com.au> wrote:
> (Follow on)
>
> On Thu, Jul 22, 2021 at 7:42 AM Liam R. E. Quin <liam@fromoldbooks.org>
> wrote:
>
>> On Wed, 2021-07-21 at 14:29 +1000, Rick Jelliffe wrote:
>>
>> > *NON-GOALS*
>> >
>> > 1. The language* MUST NOT* be lexically identical to or a subset of
>> > XML.
>> >
>> So, deliberately incompatible. Or do you mean, the process of
>> developing the language must not be constrained to be, forced to be,
>> lexically identical to.. etc etc?
>>
>
> Neither. Deliberately not a subset or identical. But no gratuitous
> differences.
>
>> 2. The language *MUST NOT* have an identical or subset infoset to the
>> > XML Infoset.
>> Strictly speaking the XML Information Set is a vocabulary of terms. In
>> particular it is emphatically not a data model.
>>
>
> Good point. But no change is needed. Because the new language MUST do
> something
> different, it cannot have an identical or subset vocabulary: it must have
> something else.
>
>
>> >
>> > 3. The language *MUST NOT* be characterizable by WebSGML
>>
>> I doubt many people care about WebSGML today.
>>
>
> Indeed. But 100% of the people who do are probably on this list. Anyway,
> the point is
> exclusion again: if it is limited to what SGML can do (without the stable
> door of SEEALSO)
> then there is no point.
>
>
>> > 4. The language *MUST NOT* be, for every possible document,
>> > completely interconvertible with JSON.
>>
>> As John Cowan pointed out, this is a nonsense.
>>
> I don't see it. What standard method do you have of converting a JSON
> document with
> type information to XML with no schema, using only mechanisms in XML and no
> schema? You end
> up with a bag of names, and nothing in the XML rules lets you infer a
> relationship between some
> value and a JSON storage type.
>
> Please note "inter-convertible with", not merely "convertible to", JSON.
> That you can add an extra
> processing layer is irrelevant, since we are talking language features not
> subsequent processing;
> or I am at least.
>
> Are we perhaps meaning something different by "every" here? I mean no more
> than the other requirements.
> If there is some JSON document that cannot be directly represented in this
> language, it is no problem, and
> quite likely. As for the vice versa, without knowing what the features are
> of the language there is
> no need to assert that every document could be round-tripped into JSON and
> back, even though it is certainly
> likely, not a goal.
>
>
>> >
>> > 5. The language *MUST NOT* support all declarative possibilities of
>> > XML
>> > Namespaces.
>>
>> So it must be a subset of a spec that doesn't do very much...
>>
>
> Made me laugh. Every non-goal does nothing in the final result, in a
> sense... :-)
>
> We already have XML Namespaces: what is the point in merely replicating its
> virtues and flaws?
>
>
>> > It *MUST* be possible to know that a name has a namespace from
>> > its lexical form.
>>
>> So, no default namespaces. This removes support for some of the use cases
>> we had, of course.
>>
>
> Yes indeed. The road to hell is paved with good intentions.
>
> One of the reasons people like Schematron is that it uses that regime.
>
> I don't think it is inconceivable that apart from human writers/readers,
> there may be
> some processing and developer benefit if, when we see a name, we
> immediately know
> whether we have to look up a namespace, and that (if there is no
> redeclaration allowed)
> if we compare two names, we can do it merely using the prefixes not the
> URLs. In an XML parser, I expect
> that there would be code and data arranged for efficiency, but it is otiose
> to the requirement of knowing whether
> two names are the same and binding them to some other process that is free
> to use its own prefixes.
>
>
>> > It *MUST* be possible to determine a namespace URL by
>> > scanning back far enough in the document to find the lexically most
>> > recent
>> > xmlns:XX declaration for that value
>>
>> This is not the case in XML today, since attributes using a prefix can
>> appear lexically before the declaration.
>>
>
> Yes.
>
>
>> >
>> > 6. Language design choices *MUST NOT* be made which compromise the
>> > potential efficiency of parsing,
>>
>> So, developers are more important than users
>>
>
> Developers are humans too, and just as much worthy of a standards-maker's
> consideration
> as Joe Public, surely? Isn't that the basis of all RFC-based
> technologies? And (oops this
> sentence probably has regressed into trolling) we already have XML 1.n
> --as it has turned out with XSD etc--
> capably fulfilling the niche of something that is way too difficult for
> non-corporate developers
> to implement, yes?
>
> More seriously, isn't that a false opposition? Why isn't it possible to
> have both: a language that users
> and developers will find convenient enough, though not as user-friendly in
> some cases as XML or as
> developer-friendly in probably all cases as JSON? Yet one that can commend
> itself by also supporting
> something they don't?
>
>
>> Pfooey.
>>
>
> Duck!
>
>>
>> > 0. The language is a markup language. It should support mixed
>> > content. It
>> > should support humans.
>>
>> Doesn't this contradict must-not goal 6?
>>
>
> No. 6 tempers this.
>
> I thought there should be some very vague scoping statement, to say it is
> not an EXI
> or JSON substitute, but in a similar family to XML and HTML. (On the
> rationale that
> making it easy to tart up existing (XML) parsers is a proven method of
> bootstrapping.)
> But this is just my opinion.
>
>
>> >
>> > 1. The language should support non-modal parsing: at every point in a
>> > document, the parsing mode can be re-established by scanning forward
>> > without knowledge of prior context until a milestone is found.
>>
>> The second sentence does not expound upon the first. The use of tags
>> implies a modal parser - in-tag or outside-tag.
>>
>
> If it reads clearer to have a ";" rather than the ":", please consider
> that.
>
> But I think the second sentence does expound on the first: it gives an
> intended
> consequence of the non-modal parsing, though it does not define it.
> Let's consider a non-modal parser as one which does not need to know
> the current state in order to parse.
>
> I think it is kinda the difference between these
> modal = B* ( "<" B* ">" B*)*
> non-modal = B* ("<" | ">")* B*
> Say we generate a string using "modal". Not every substring of it also
> matches "modal". But if we take that same generated string, every
> substring of it should match "non-modal" (if I have it right).
>
> More background might make my comment less confusing. It is often
> possible to unwrap a grammar that we might think should use a simple
> left-to-right state or stack machine as, instead, a series of simpler
> passes. Indeed, where some productions in a grammar are to be
> interpreted as longest-match-first (greedy) but others as
> shortest-match-first, it may be the most straightforward method. A good
> example of this is tokenizing: we may find the end of the token using
> one rule (e.g. whitespace), which then simplifies our parsing/lexing
> inside or with that token.
>
> I'll post a little example grammar separately to be more concrete, and seek
> out better terminology, perhaps
>
>
>> > In other words, [ "<" and ">"] must only ever be delimiters or
>> > part of delimiter strings.
>>
>> It's true that unquoted > makes some parsing techniques difficult -
>> when i added XML support to lq-text i used backwards parsing, and > in
>> text content confuses it. The answer was to ignore > and look only for
>> < though.
>>
>
> Yes, only "<" is strictly necessary. But the more that a parallel process
> has to look outside the block it is allocated to
> parse (or the more that it has too assign initial strings as unknown, to be
> reconciled by a stitch process) the less useful
> it is to have had parallel parsing in the first place. So knowing where
> data content begins (and that, say, our begining state
> is as part of an attribute), the better.
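(Another aside from me: the point about blocks and stitch processes can be
sketched. A toy parallel scan, my own illustration and not a real parser,
where each worker re-establishes sync purely by scanning forward to the
next "<" inside its own block, so no work ever crosses block boundaries:)

```python
from concurrent.futures import ThreadPoolExecutor

def count_tag_opens(doc: str, start: int, end: int) -> int:
    # Re-establish sync with no prior context: skip forward to the first
    # '<' at or after `start`, then count tag openings up to `end`.
    i = doc.find("<", start)
    n = 0
    while 0 <= i < end:
        n += 1
        i = doc.find("<", i + 1)
    return n

doc = "<a>x</a><b>y<c/>z</b>" * 1000
blocks = [(k, min(k + 512, len(doc))) for k in range(0, len(doc), 512)]
with ThreadPoolExecutor() as pool:
    total = sum(pool.map(lambda b: count_tag_opens(doc, *b), blocks))
assert total == doc.count("<")   # block results stitch together trivially
```

The moment a worker would also need to know whether its block begins
inside a tag, an attribute value, or a comment (i.e. a mode), this cheap
independent re-sync is no longer possible.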
>
>
>> >
>> > 2. The language should support straightforward right-to-left parsing
>> > with
>> > the same ultimate result as left-to-right parsing.
>>
>> oops see above.
>> >
>> > 3. The language should support arbitrary streams of elements,
>>
>> the Jabber folks would have loved this.
>>
>> >
>> > 4. The language must support some significant extra features to XML,
>>
>> This, i think, is the crux of the matter - "we must add a killer
>> feature so that people want our system, even in a world in which data
>> transfer formats are not considered exciting."
>>
>
> If you are saying that there is nothing new under the sun, I cannot agree.
>
> If you are saying that you don't expect any new markup language to have
> nearly
> the same hype curve as XML and JSON, then I completely agree.
>
>
>> > It should attempt to do this by assigning meaning to existing
>> > lexical characteristics: these alternatives include the empty-end
>> > tag versus a matched pair, or attribute values with no delimiter,
>> > or double quotes versus apostrophes.
>>
>> Simon's xmlents did this two decades ago. Since the XML stack has
>> irregular escaping, you end up with problems when e.g. you want to have
>> a double quote inside a string in an XPath expression in an attribute.
>>
>
> Yes, XML compatibility is a rock that many a good idea has foundered on.
>
> But 20 years ago, it made sense to make sure XML did not fragment;
> and a big selling point for it was that developers would be more productive
> if they didn't have to invent new syntaxes for essentially the same thing.
>
> But now XML is well established, and JSON has
> relieved XML of the need to do that kind of datatyping.
> But haven't JSON and time also shown that though "terseness is of
> minimal importance" was a fine rule-of-thumb for figleafing the
> amputation of SGML's extra limbs and carbuncles, it is not actually a
> good principle in itself?
>
>
>> I think if i had to redo XML without backward compatibility constraints
>> i'd want to have a reliable escaping mechanism (even though, like XML
>> text entity references, you end up with yet another parsing mode).
>>
>
> Yes, modes are great! Scala is good for that, and now Java's """ text
> blocks,
> and I guess the poster boy for modes is RTF, where you could even have
> chunks using different character encodings inside the same file.
>
> But I suggest that modes preclude many and perhaps most parallel
> implementation methods,
> which has helped XML adroitly avoid most of the improvements to CPUs in the
> last 25 years. And a technology in that situation is not in a healthy
> place.
>
> Cheers
> Rick
>
>>
>>
>