Re: [xml-dev] Brain teaser: how to use XML to describe a data format that contains illegal XML characters?

Hello, Roger:

I perk up when I see your posts on this list, because it usually means that you are about to nudge me to think.

On 2021-11-09 08:09, Roger L Costello wrote:

Hi Folks,

This is kind of a neat problem.

The technology called DFDL (Data Format Description Language) is used to describe data formats, both text and binary data formats. DFDL builds on top of XML Schema: XML Schema "hosts" DFDL in a way similar to how XSLT "hosts" XPath.

Some data formats that we want to describe contain characters that are not allowed in XML, which means they are not allowed in XML Schema since XML Schema is XML. For example, we might want to use DFDL to describe a binary data format that contains null-terminated strings. The null symbol (hex 0) is not allowed in XML, so how do we describe the data format?

It turns out that this situation is pretty routine in designing data formats and programming languages. You often need a way to represent something in a format or language that does not allow that thing among its literal characters or components.
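To make the restriction concrete, here is a minimal sketch that uses Python's standard-library XML parser. The helper name is_well_formed and the sample documents are illustrative assumptions, not part of DFDL or of Roger's message. It shows that even a numeric character reference to NUL is rejected, while a printable stand-in string passes through untouched.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# A character reference to NUL is not allowed in XML.
print(is_well_formed("<s>&#0;</s>"))

# A printable stand-in is ordinary character data as far as XML is concerned.
print(is_well_formed("<s>%NUL;</s>"))
```

The parser has no idea that %NUL; means anything; giving it meaning is left to the application that reads the schema.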


One way to resolve this problem is to use a special string (with printable characters) that denotes the forbidden character. For the NUL character, the special string is NUL. To identify it as a "special string" we precede it with a percent symbol and follow it with a semicolon:

%NUL;

Applications written to process DFDL schemas are expected to recognize that %NUL; denotes the NUL character and replace it with the actual NUL character.

Precisely. One name for this is a "representation", in contrast to "literal data".

But you sometimes need to be able to represent the text which makes up the representation of the special thing, without it being treated as the special thing.  Suppose I want to write a sentence, "Roger wrote a post about %NUL; today". How is software ingesting this text able to recognise that %NUL; does not denote the NUL character here?

The answer is a representation for the representation marker. This is sometimes referred to as "escaping the escape character".

Suppose we rephrase your rule above to be: to identify a "special string", we precede it with a single percent symbol and follow it with a semicolon. The special string is forbidden to begin with a percent symbol or to contain a semicolon. And, to identify a literal percent symbol, we precede it with a percent symbol. (Notice how the rules are a bit more elaborate? But the elaborations make up an often-used design pattern.)

We can now write my sentence as, "Roger wrote a post about %%NUL; today". Software ingesting this text is to recognise the "%%" as a literal percent symbol. Then the "NUL;" is not preceded by the special sequence, so it is just literal data instead of a special string.
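The rephrased rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this message, not the behaviour of any real DFDL processor. The names SPECIAL, decode and encode are hypothetical, and the table holds only the one special string mentioned so far.

```python
# Hypothetical table of special strings; only NUL comes from this thread.
SPECIAL = {"NUL": "\x00"}


def decode(text: str) -> str:
    """Turn a representation into literal data.

    "%%" becomes a literal percent symbol; "%NAME;" becomes the character
    registered under NAME; everything else is copied unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] != "%":
            out.append(text[i])
            i += 1
        elif text.startswith("%%", i):
            out.append("%")          # escaped escape character
            i += 2
        else:
            end = text.find(";", i + 1)
            if end == -1:
                raise ValueError(f"unterminated special string at offset {i}")
            name = text[i + 1:end]
            if name not in SPECIAL:
                raise ValueError(f"unknown special string {name!r}")
            out.append(SPECIAL[name])
            i = end + 1
    return "".join(out)


def encode(data: str) -> str:
    """Inverse of decode: turn literal data into a representation."""
    reverse = {char: name for name, char in SPECIAL.items()}
    out = []
    for ch in data:
        if ch == "%":
            out.append("%%")
        elif ch in reverse:
            out.append(f"%{reverse[ch]};")
        else:
            out.append(ch)
    return "".join(out)


print(decode("Roger wrote a post about %%NUL; today"))
```

Decoding my sentence yields the text with a literal %NUL; in it, while decoding "abc%NUL;" yields "abc" followed by an actual NUL character. Because every literal percent symbol is doubled on the way out, encode and decode undo each other.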

One extension of this concept is the Quine, or self-reproducing program. See the Wikipedia article at <https://en.wikipedia.org/wiki/Quine_(computing)>. The key insight is that a string of text in a computer program can be interpreted both as literal text and as a representation of software code. Writing and reading quines can be great fun.
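For a taste, here is one well-known two-line Python quine, offered as a sketch and not as the only way to write one. The string s serves both as data and as a template for the program's own source. Note that it relies on the same trick discussed above: inside a Python format string, %% stands for a literal percent symbol, and %r inserts the quoted representation of a value.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly those two lines. The block carries no comments on purpose, because any comment would also have to be reproduced in the output.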

One very good, and famous, essay on the implications of this concept is Ken Thompson's "Reflections on Trusting Trust", his Turing Award lecture, published in 1984. You can find it at several places on the web, including at <https://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf>. The message is that your compilers contain logic to turn representations into code, and some of that logic may be invisible to you even when reading the source code of the tool. Thus you have to trust the people who made the tools. This is profound.

I know that for many subscribers to this list, everything I wrote here is very old hat. Many of you can explain it much better than I. But Roger, from your message I guessed that maybe you are not familiar with this. I invite you, and anyone else who is not familiar: explore quines! Read "Reflections on Trusting Trust"! They are treasures. Your mind may be blown.

Enjoy,

    —Jim DeLaHunt, software engineer, Vancouver, Canada





Copyright 1993-2007 XML.org. This site is hosted by OASIS