XML.org
Re: [xml-dev] Are multi-language languages unique to the XML family of languages?



On Sun, 6 Mar. 2022, 10:02 Roger L Costello, <costello@mitre.org> wrote:
> Hi Folks,
>
> Consider the C language. It is one language. It doesn't use (host) other languages. Consequently, it is relatively straightforward to create a single grammar for the C language. Once the grammar is created, a robust parser can be created.
>
> Ditto for every other programming language.

No.   

It is common to embed other languages in C, Java, etc.: SQL, regex, XPath, URLs, even shell commands. Common practice for 40+ years. And almost every language has conventions for formatted strings with variable substitutions.
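As a minimal sketch of that kind of embedding (Python is used only for illustration; the table, the names, and the function are all made up for this example), here is a host program carrying two little languages as plain strings:

```python
import re
import sqlite3

# To the Python parser these are ordinary string literals; nothing in the
# host grammar marks them as SQL or as a regular expression.
QUERY = "SELECT name FROM people WHERE age > ? ORDER BY name"
PATTERN = r"^[A-Z][a-z]+$"


def capitalised_names_older_than(conn, age):
    """Run the embedded SQL, then filter the rows with the embedded regex."""
    rows = conn.execute(QUERY, (age,)).fetchall()
    return [name for (name,) in rows if re.match(PATTERN, name)]


# Hypothetical data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", 36), ("bob", 52), ("Grace", 17)],
)

print(capitalised_names_older_than(conn, 30))
```

The host parser accepts both strings without looking inside them; any error in the SQL or the pattern only surfaces when the library that understands that notation is called.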

In fact nowadays languages, as used, are *less* standard than ever: if you use C or Java or most general-purpose languages, you may well be using annotations or other extensions and preprocessors. And you may be generating JavaScript, CSS, JSON, HTML and so on, but be developing in TypeScript, Less, ExtJSON, templated HTML. Angular is a good example of this.

In fact, all the little standard languages that prided themselves on idealistic "simplicity" almost immediately degenerated into, in effect, mutually incompatible dialects as different pre-processors, mostly doing the same things, proliferated.  Why don't we just call a spade a spade: omitting support for the things needed for basic software engineering does not make a little language "minimalist", it makes it "fatuous". For all our YAintGNI bravado, it always turned out YAreGNI.

The exception? XML, where there was less need to invent annotations (attributes or PIs), server side includes (general entities), simple macros, templates (attributes or PIs), let alone comments (comments). And where much of the more complex preprocessing could be done in ubiquitous XSLT.

In traditional SGML terminology, I think Roger is saying that because XML schema languages have abandoned any idea of "NOTATION" for attribute values or data content, there is no standard way either in markup or in schemas to label strings according to their little language. So there is no standard "seamless" way for an IDE to be able to colour that text or specially validate it. (This is the same issue as not knowing if a QName in content is content, b.t.w.)

So Roger cannot create a robust parser without some non-standard information. Not just to parse some embedded notation, but even to know which data is of that notation. 

(IDEs solve this in two ways: first, standard libraries such as string formatting libraries are treated as special cases and built in; second, annotation processors allow extra information that plugins can use.)

So to Roger's questions: no, embedding little languages is ubiquitous; yes, XML does not have good support for a parser to know what embedded notations are used and where; and I think this was an own-goal that unnecessarily reduced the utility of XML in favour of the things it is not particularly good for: sending labelled simple strings around. 

<Digging up the dead>
I recall raising this issue of embedded notations in data at the W3C XML Schema discussions. I recall two answers: "whatever the answer is, it isn't notations" and "you cannot do anything without types" (i.e. data types plus lists, unions, etc. exhausted the useful things an XML document needed, because an XML document was only a serialization of a database or object). As I understand it, most XSLT is still done without types (if we ignore the extra @as attributes that XSLT 2+ demands we put in), which puts paid to the second comment?

I do have some sympathy for the view that syntax is not a schema issue (though wouldn't that kinda undercut all lexical datatyping? ... never mind) and so notation identification is some other kind of declaration: e.g.
    <query XML:NOTATION="SQL">SELECT x FROM y</query>
or
    <query><?XML:NOTATION name="SQL"?>SELECT x FROM y</query>
or
    <thing query="&lt;?XML:NOTATION name='SQL'?>SELECT x FROM y" />
or
    <?XML:NOTATION path="query" name="SQL" ?>
    ...
    <query>SELECT x FROM y</query>
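
To make the payoff concrete, here is a rough sketch of what a tool could do once such a label exists. Everything here is an assumption for illustration: it uses a plain `notation` attribute as a stand-in (an undeclared `XML:` prefix would be rejected by a namespace-aware parser), and the checker registry and function names are hypothetical. Only the regex notation gets a real checker, using Python's own `re` module.

```python
import re
import xml.etree.ElementTree as ET


def check_regex(text):
    """Return True if text compiles as a Python regular expression."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False


# Hypothetical registry: notation name -> checker for that little language.
CHECKERS = {"regex": check_regex}


def check_notations(xml_text, attr="notation"):
    """Report (element name, notation, status) for every labelled element."""
    report = []
    for elem in ET.fromstring(xml_text).iter():
        notation = elem.get(attr)
        if notation is None:
            continue  # unlabelled content stays plain data
        checker = CHECKERS.get(notation.lower())
        if checker is None:
            status = "unchecked"  # label seen, but no checker installed
        else:
            status = "ok" if checker(elem.text or "") else "invalid"
        report.append((elem.tag, notation, status))
    return report


DOC = """<rules>
  <pattern notation="regex">[a-z]+</pattern>
  <pattern notation="regex">[a-z</pattern>
  <query notation="SQL">SELECT x FROM y</query>
  <note>SELECT is just a word here</note>
</rules>"""

for row in check_notations(DOC):
    print(row)
```

The point is only that the label tells the tool which strings to hand to which checker; without it, the `note` and the `query` are indistinguishable.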

One thing XML DTDs ditched was the ability to declare that an attribute value is to be treated not as data but as a PI. XML Schemas followed this. So there was no standard way for a parser to know this was some special information (let alone resolve the target). So XML is in this situation where the decision not to treat PIs and NOTATION as first-class citizens has reduced its applicability, or rather, provides no help for parsing embedded little languages using standard declarations.

I would have liked something like this, where the first token is some simple target like other PIs:
    <!ATTLIST thing query PI #REQUIRED>
    ...
    <thing query="sql SELECT x FROM y" />
Just enough to know the value is special, and to hint what it is. But many other approaches are fine.
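
A rough sketch of how a consumer might use such a declaration, under stated assumptions: since no real DTD keyword `PI` exists, the set of PI-typed attributes is supplied by hand as a hypothetical `PI_ATTRIBUTES` table, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for <!ATTLIST thing query PI #REQUIRED>:
# (element name, attribute name) pairs whose values are PI-like.
PI_ATTRIBUTES = {("thing", "query")}


def split_pi_value(value):
    """Split 'target data...' into (target, data), like a PI's target and content."""
    target, _, data = value.strip().partition(" ")
    return target, data.strip()


def embedded_notations(xml_text):
    """List (element, attribute, target, data) for each declared PI-typed attribute."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in PI_ATTRIBUTES:
                found.append((elem.tag, name) + split_pi_value(value))
    return found


DOC = '<doc><thing query="sql SELECT x FROM y" title="sql is fun"/></doc>'
print(embedded_notations(DOC))
```

Note that the `title` attribute also begins with "sql" but is left alone, because only the declaration says which values are special.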

Regards
Rick

