

Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software

From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: Dimitre Novatchev <dnovatchev@gmail.com>,Michael Kay <mike@saxonica.com>
Date: Sun, 04 Jun 2023 08:01:57 -0400

At 2023-06-03 18:52 -0700, Dimitre Novatchev wrote:

> For example, the XPath function library is defined in an XML document
> that contains all the function signatures in a custom vocabulary
> reflecting the object model for XPath functions, and that data is extremely useful;
> it can be used for example to create the data used by a type-checker.
> I'm sure there are cases where an XML format can be standardised
> across a wide range of specifications (for example, a format for defining BNF grammars)
> but I'm sure that highly specialised custom formats also have a role to play.
>
> Of course in our own community we're very prepared to eat our own dogfood in this way. Getting people to use a similar approach when they're writing safety standards for industrial
> washing machines is a different kettle of fish. Those guys just click on the word processor icon and start typing.

To this day I have often wondered where to find the XML Schema for this type of document. Or is it a secret?

"XML Schema? I don't need no stinking XML Schema!"

Isn't that the big draw of XML over SGML?

Mind you, I was only a user of the W3C documents expressing the XML standards in XML, so I didn't have to worry about being constrained to create that content.

But certainly writing the stylesheets that converted the specifications of XSLT and XSL-FO and their content models into my own book's document model (which *did* have its own DTD, since I was creating content) didn't rely on having a document model for the specifications.

Empirical examination of content was all I needed:

<proto role="example" name="eg:if-empty" return-type="xs:anyAtomicType*"
returnEmptyOk="no" isSpecial="yes" returnSeq="no" returnVaries="no"
isSchema="no" isDatatype="no" isOp="no">
<arg name="node" type="node()" emptyOk="yes"/>
<arg name="value" type="xs:anyAtomicType"/>
</proto>

<prod num="64" id="prod-xpath-ElementTest">
<lhs>ElementTest</lhs>
<rhs>"element" "(" (<nt xmlns:xlink="http://www.w3.org/1999/xlink"
def="prod-xpath-ElementNameOrWildcard"
xlink:type="simple">ElementNameOrWildcard</nt> ("," <nt
xmlns:xlink="http://www.w3.org/1999/xlink" def="prod-xpath-TypeName"
xlink:type="simple">TypeName</nt> "?"?)?)? ")"</rhs>
</prod>
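
As a rough illustration of that kind of empirical use, here is a minimal Python sketch that reads the <proto> fragment above with no schema at all and turns it into a signature string. The function name signature_from_proto, the "item()*" fallback, and the reading of emptyOk="yes" as an optional ("?") argument are assumptions made for this example; none of this is the tooling used by the specification editors.

```python
import xml.etree.ElementTree as ET

# The <proto> fragment quoted above, copied verbatim.
PROTO_XML = """\
<proto role="example" name="eg:if-empty" return-type="xs:anyAtomicType*"
returnEmptyOk="no" isSpecial="yes" returnSeq="no" returnVaries="no"
isSchema="no" isDatatype="no" isOp="no">
<arg name="node" type="node()" emptyOk="yes"/>
<arg name="value" type="xs:anyAtomicType"/>
</proto>
"""


def signature_from_proto(proto):
    """Build a readable signature string from a <proto> element."""
    args = []
    for arg in proto.findall("arg"):
        # Assumption: an argument with no type attribute accepts anything.
        arg_type = arg.get("type", "item()*")
        # Assumption: emptyOk="yes" means the empty sequence is allowed,
        # rendered here with the "?" occurrence indicator.
        if arg.get("emptyOk") == "yes":
            arg_type += "?"
        args.append(f"${arg.get('name')} as {arg_type}")
    return f"{proto.get('name')}({', '.join(args)}) as {proto.get('return-type')}"


proto = ET.fromstring(PROTO_XML)
signature = signature_from_proto(proto)
print(signature)
```

The attribute names come straight from the markup; everything else was inferred by comparing the markup with the published rendering, which is the point being made here.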

A reference to this and a few examples will be greatly appreciated.

I never felt the need to go looking for a document model when I could correlate what I saw in the markup with what I saw on the formatted results published in HTML.

I acknowledge that the NISO-STS document model doesn't (yet!?) have models for BNF or other formal specification grammars, unless one shoehorns such using generic named-content semantic constructs. But that is because it is leveraging JATS, which itself doesn't have such.

For me, using such "hi-tech" language in order to specify what you want to say and be understood, has always seemed an unwanted and unnecessary obstacle in the specification-creation process -- one that stifles the author and digresses him elsewhere -- not where the focus of the main topic is.

I envy GitHub authors who only have to use MD, and can easily produce stunning documents.

"Stunning" to the eye, I agree. And that is what one gets with NISO-STS off-the-shelf: stunning to the eye.

But you get out what you put in, and the XML specification writers put in a helluva lot of effort into marking up the documents they were responsible for, and users like myself could leverage such quickly and effectively without the need for a document model.

And so with NISO-STS where some of my clients are leveraging the semantic constructs for requirements markup that is then handled downstream well after the publication process.

If someone needs so much strict structure, please use ChatGPT or iXML --

(there are people working on it... stay tuned)

I hope this is considered helpful.

. . . . . Ken

but please, behind the scenes, where these do belong.

Thanks,
Dimitre

On Thu, May 25, 2023 at 5:03 PM Michael Kay <mike@saxonica.com> wrote:
>no-one has to invent something new to get what you are asking for

But if you're prepared to invent something new then you can probably do better...

For example, the XPath function library is defined in an XML document that contains all the function signatures in a custom vocabulary reflecting the object model for XPath functions, and that data is extremely useful; it can be used for example to create the data used by a type-checker. I'm sure there are cases where an XML format can be standardised across a wide range of specifications (for example, a format for defining BNF grammars) but I'm sure that highly specialised custom formats also have a role to play.

Of course in our own community we're very prepared to eat our own dogfood in this way. Getting people to use a similar approach when they're writing safety standards for industrial washing machines is a different kettle of fish. Those guys just click on the word processor icon and start typing.

Michael Kay
Saxonica

> On 26 May 2023, at 00:20, G. Ken Holman <gkholman@CraneSoftwrights.com> wrote:
>
> Roger, already standards from ISO and CEN are being published in NISO STS XML:
>
> <https://www.niso-sts.org/>
>
> And there are some SDOs (Standards Development Organizations) that are building requirements into their STS XML so they can be harvested downstream after publishing by requirements management software tracking, for example, "may", "shall", "should", etc.:
>
> <https://www.ncbi.nlm.nih.gov/books/NBK556169/#holman-semantics2>
>
> I commend that paper I wrote regarding the identification of semantics (say, of requirements) in standards content.
>
> I've co-founded a company in Ireland that is servicing the standards development community of SDOs with software that is publishing these richly-encoded XML documents into PDF, HTML, and DOCX:
>
> <https://RealtaOnline.com>
>
> Moreover, SDOs are looking to us to enrich their XML and we are experimenting with AI in this regard. Exciting stuff.
>
> I'm delivering a presentation at JATS-Con 2023 you may wish to attend to learn more about how Réalta Online is using standards such as XSLT and XSL-FO to enrich and publish standards with fidelity across output products:
>
> <https://jats.nlm.nih.gov/jats-con/2023/schedule2023a.html#1-1145>
>
> So I think all that is needed is an awareness campaign to make standards writers and SDOs aware that the technology exists already. We don't have to wait to be able to do what it is you are asking. It just has to be done with the tools at hand.
>
> And not just for ISO and CEN standards. Hundreds of SDOs exist out there publishing thousands of standards documents. Please spread the word about NISO STS XML and the leverage they can get by adopting something that exists ... no-one has to invent something new to get what you are asking for.
>
> I hope this is helpful.
>
> . . . . . . . Ken
>
> At 2023-05-25 19:57 +0000, Roger L Costello wrote:
>> Dear Specification Writer,
>>
>> Please stop writing specifications that cannot be parsed/processed by software. Please stop formatting your specifications as Word and PDF. Instead, use a format that is amenable to machine processing. The XML format is ideal. We want to analyze your specifications. We don't want to spend dozens of hours screen-scraping your Word/PDF documents.
>>
>> If you simply must persist in writing Word/PDF documents, then please write in a consistent way so that we can screen-scrape without having to write special case code. To illustrate, in one of your specifications you provide a bunch of tables with data; each table has many rows. In some tables you reference a note. Here's a row with a note reference:
>>
>> 119 Approach Route (1) Note 1 5.7
>>
>> Here's another row with a note reference:
>>
>> 52 SID Ident (1) (Note 1) 5.78
>>
>> Why did you embed Note 1 within parentheses in the second case but not the first? That's an example of not being consistent. Such inconsistencies make it difficult to do screen-scraping. Please be consistent. If at all possible, write a parser to parse the data that you embed in your specification. This will immediately inform you of any inconsistencies.
>>
>> Thank you,
>> From the people who must read, understand, and analyze your specifications
>>
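
To make the screen-scraping complaint concrete, here is a minimal Python sketch of a row parser that accepts both of the note styles quoted above and reports which style each row used. The field names (num, name, count, note, section) are guesses made for this example, since the original specification's column headings are not given in the thread.

```python
import re

# One table row: number, name, "(n)", a note reference with or without
# parentheses, and a dotted section number. Field names are assumptions.
ROW = re.compile(
    r"^(?P<num>\d+)\s+"
    r"(?P<name>.+?)\s+"
    r"\((?P<count>\d+)\)\s+"
    r"(?P<open>\()?Note\s+(?P<note>\d+)\)?\s+"
    r"(?P<section>\d+(?:\.\d+)*)$"
)

LINES = [
    "119 Approach Route (1) Note 1 5.7",
    "52 SID Ident (1) (Note 1) 5.78",
]


def parse_row(line):
    """Parse one row, recording whether the note was parenthesised."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return {
        "num": int(m["num"]),
        "name": m["name"],
        "count": int(m["count"]),
        "note": int(m["note"]),
        "section": m["section"],
        "note_in_parens": m["open"] is not None,
    }


rows = [parse_row(line) for line in LINES]
for row in rows:
    print(row)

# Rows that disagree on note style are exactly the inconsistency described.
styles = {row["note_in_parens"] for row in rows}
print("consistent note style:", len(styles) == 1)
```

Every such special case has to be discovered by hand and encoded in the pattern, which is the cost the original post is objecting to.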


--
Contact info, blog, articles, etc. http://www.CraneSoftwrights.com/x/ |
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training class @US$125 (5 hours free) |
Essays (UBL, XML, etc.) http://www.linkedin.com/today/author/gkholman |

Follow-Ups:
- Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>

References:
- Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software
  - From: Dimitre Novatchev <dnovatchev@gmail.com>
