Re: [xml-dev] {mark} - a new simple notation that unifies JSON and XML


From: Henry Luo <henry@perpetuatech.net>
To: Michael Kay <mike@saxonica.com>
Date: Fri, 26 Jan 2018 14:37:21 +0800

Just to elaborate a bit more on the issue:

Imagine that the element <price currency="USD">5.00</price> goes through a transformation process to give some output:

Should it be {span 'Price' ':' ' ' 'USD' ' ' '$' 5.00 null ''}, i.e. keep the items as they are, without normalization?
Or should it be {span 'Price: USD $5.00'}, i.e. normalized as in XML/HTML/Mark (with numbers and booleans converted into strings, null values removed, consecutive strings merged, ...)?
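
To make the second alternative concrete, here is a minimal sketch in Python of what such a normalization could look like. It is not Mark's actual implementation; the function name normalize_content is hypothetical, the item list mirrors the span example with an explicit space item before '$', and Decimal is used only so that the number 5.00 keeps its trailing zero when converted to a string.

```python
from decimal import Decimal


def normalize_content(items):
    """Hypothetical sketch: normalize content items as mixed content."""
    out = []
    for item in items:
        if item is None:
            continue  # null values are removed
        if isinstance(item, bool):
            item = "true" if item else "false"  # booleans become strings
        elif isinstance(item, (int, float, Decimal)):
            item = str(item)  # numbers become strings
        if isinstance(item, str):
            if item == "":
                continue  # an empty string contributes nothing
            if out and isinstance(out[-1], str):
                out[-1] += item  # merge with the preceding string
                continue
        out.append(item)
    return out


raw = ["Price", ":", " ", "USD", " ", "$", Decimal("5.00"), None, ""]
normalized = normalize_content(raw)
print(normalized)
```

Without normalization the content would simply stay as the nine-item list in raw; with it, the content collapses to a single string.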

Content normalization is a common process in HTML/XML, and developers are probably already used to it, e.g. when using XSLT/XQuery.
If we throw away content normalization, developers might find it awkward and error-prone, since they may already be assuming that content is normalized.

So the question is: should we stay with XML/HTML's content normalization process, or should we go with the more orthogonal approach?

One more possibility is to leave it to the user, i.e. take a parameter during parsing/content construction that says whether to normalize the content or not.
But again, giving such an option without enforcing a convention could lead to confusion.

Mark's current thinking is: if your use case needs a generic array, put it in a property; if your use case needs a mixed content model, put it in the object content.
It might not be ideal, but probably more practical.

Anyone, any further thoughts?

Regards

Henry

On Fri 26/1 1:37 PM, Henry Luo wrote:

Hi Michael,

Thanks for sharing your thoughts on the specific issue of content model for element/object.

Here are my thoughts on this specific issue.

There are two options here, each with its pros and cons.

Option 1: Orthogonal approach, allowing any value in element/object content, just like for a property value

Very orthogonal design, simpler in syntax and data model;

Keeps the door open for future 'innovative' use cases;

Option 2: Non-orthogonal approach, as currently defined in Mark (or similarly in XML, HTML)

Mark's current content model for element/object is primarily designed and optimized for the mixed-content use case;

There are a few important differences between the element content model and a generic array:

null values are stripped;

no direct nesting of arrays; (Mark flattens an array when constructing object content)

consecutive strings are merged into one;

number and boolean primitive values are not allowed;
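
As a rough illustration of these four rules, the sketch below builds element content from a plain Python list. It is an assumption-laden sketch, not Mark's API: the names build_content and _append are hypothetical, child elements are stood in for by dicts, and rejecting numbers/booleans with a TypeError is just one possible way to express "not allowed".

```python
def _append(out, item):
    """Append item, merging it into the previous item if both are strings."""
    if isinstance(item, str) and out and isinstance(out[-1], str):
        out[-1] += item
    else:
        out.append(item)


def build_content(items):
    """Hypothetical sketch of the four content-model rules listed above."""
    out = []
    for item in items:
        if item is None:
            continue  # rule 1: null values are stripped
        if isinstance(item, (list, tuple)):
            for sub in build_content(item):  # rule 2: nested arrays are flattened
                _append(out, sub)
            continue
        if isinstance(item, (bool, int, float)):
            # rule 4: number and boolean primitives are not allowed
            raise TypeError(f"primitive value not allowed in content: {item!r}")
        _append(out, item)  # rule 3: consecutive strings are merged
    return out


as_array = ["a", None, ["b", [{"tag": "br"}, "c"]], "d"]
as_content = build_content(as_array)
print(as_content)
```

The same input kept as a generic array (as_array) retains the null and the nesting, which is why the two shapes may need different tools to work with.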

So although on the surface the element content might look like an array, there are subtle differences, and the two may require very different tools/APIs to work with;

e.g. we are already used to using CSS selectors to query the content model, and that requires content normalization that is not usually done on array data.

Among the 4 differences, I think the last one is open to discussion. But if we also give up the first 3 restrictions, it could make the primary mixed-content use case awkward, and people might as well stay with their current non-orthogonal HTML or XML format.

Mark's current design takes a more conservative approach to what is allowed in the content model.

Anyone, any further thoughts?

Regards

Henry

On Fri 26/1 2:39 AM, Michael Kay wrote:

> That is also a concern that I have when designing Mark. Should it be generalized to allow any value in content? Do we have any solid use case for storing numbers and booleans in content? They are probably not needed for normal mixed content usage.

I don't think the absence of a use case should ever be used to justify lack of orthogonality. If design were driven solely by use cases, orthogonality would go out of the window. The design process should aim to produce the most elegant/orthogonal design that satisfies all the use cases; it should not aim to satisfy the known use cases and nothing else.

Or to put it another way: find the space with the smallest boundary that has all the use cases within the space, where the "boundary" is the boundary between things that the system handles and things that it doesn't.

(An extreme example I often use to illustrate this principle: arithmetic expressions X+Y should allow either operand to be a literal zero, even though there is no use case for adding zero to a number, because a specification that allows literal zero is smaller than one that doesn't).

The lack of orthogonality between attribute values and element content was in fact one of the design mistakes in XML that I was trying to correct.

> I know XML Schema allows primitive values as element content. But personally, I don't like that, and prefer to use attributes for those use cases.

Why should personal preferences come into it?

You might prefer <price currency="USD" amount="5.00"/> to <price currency="USD">5.00</price>, but others don't.
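
For readers comparing the two forms, a small sketch using Python's standard xml.etree.ElementTree shows that both carry the same amount, but a consumer reads it through a different part of the API in each case. The variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Amount stored as an attribute value.
as_attr = ET.fromstring('<price currency="USD" amount="5.00"/>')
# Amount stored as element content.
as_text = ET.fromstring('<price currency="USD">5.00</price>')

amount_from_attr = as_attr.get("amount")  # read via the attribute API
amount_from_text = as_text.text           # read via the element's text content

print(amount_from_attr, amount_from_text)
```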

Michael Kay

Saxonica



Copyright 1993-2007 XML.org. This site is hosted by OASIS