Just to elaborate a bit more on the issue:
Imagine the element <price currency="USD">5.00</price>
goes through a transformation that produces some output:
- should it be {span 'Price' ':' ' ' 'USD' ' ' '$' 5.00 null ''},
i.e. keep the items as they are, without normalization?
- or should it be {span 'Price: USD $5.00'}, i.e. normalized as
in XML/HTML/Mark (numbers and booleans converted to strings,
null values removed, adjacent strings merged, ...)?
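To make the second alternative concrete, here is a rough Python sketch of what such a normalization pass could look like. It is only an illustration, not Mark's actual implementation: the function name normalize_content is made up, objects are modelled as plain dicts, and dropping empty strings is my own assumption. The item list includes a ' ' between 'USD' and '$' so that the merged string reads 'Price: USD $5.00'.

```python
from decimal import Decimal
from typing import Any, List


def _append(out: List[Any], item: Any) -> None:
    """Append item, merging it into the previous item if both are strings."""
    if isinstance(item, str) and out and isinstance(out[-1], str):
        out[-1] += item
    else:
        out.append(item)


def normalize_content(items: List[Any]) -> List[Any]:
    """Hypothetical content normalizer (illustration only, not Mark's API)."""
    out: List[Any] = []
    for item in items:
        if item is None:
            continue  # null values are removed
        if isinstance(item, list):
            # nested arrays are flattened into the parent content
            for sub in normalize_content(item):
                _append(out, sub)
            continue
        if isinstance(item, bool):
            item = "true" if item else "false"  # booleans become strings
        elif isinstance(item, (int, float, Decimal)):
            item = str(item)  # numbers become strings
        if item == "":
            continue  # assumption: empty strings contribute nothing
        _append(out, item)
    return out


# Decimal is used so that 5.00 keeps its trailing zeros when stringified.
items = ["Price", ":", " ", "USD", " ", "$", Decimal("5.00"), None, ""]
print(normalize_content(items))  # ['Price: USD $5.00']
```

Without normalization, the content would simply be the items list as given.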
Content normalization is common in HTML/XML, and
developers are probably already used to it, e.g. from using
XSLT/XQuery.
If we throw away content normalization, developers might
find it awkward and error-prone, as they may
already be assuming that content is normalized.
So the question is: should we stay with XML/HTML's content
normalization process, or should we go with the more orthogonal
approach?
One more possibility is to leave it to the user, i.e. take a
parameter during parsing/content construction that says whether to
normalize the content or not.
But again, giving such an option without enforcing a convention
could lead to confusion.
Mark's current thinking is: if your use case needs a generic
array, put it in a property; if your use case needs a mixed content
model, put it in the object content.
It might not be ideal, but it is probably more practical.
Anyone, any further thoughts?
Regards
Henry
On Fri 26/1 1:37 PM, Henry Luo wrote:
Hi Michael,
Thanks for sharing your thoughts on the specific issue of
content model for element/object.
Here are my thoughts on this specific issue.
There are two options here, each with its pros and cons.
Option 1: Orthogonal approach, which allows any value in
element/object content, just as for property values
- Very orthogonal design, simpler in syntax and data model;
- Keeps the door open for future 'innovative' use cases;
Option 2: Non-orthogonal approach, as currently defined in Mark
(or similarly in XML, HTML)
- Mark's current content model for element/object is primarily
designed and optimized for the mixed-content use case;
- There are a few important differences between the element content
model and a generic array:
  - null values are stripped;
  - arrays cannot be nested directly (Mark flattens an array when
  constructing object content);
  - consecutive strings are merged into one;
  - number and boolean primitive values are not allowed;
- So although the element content might look like an array on
the surface, there are subtle differences, and the two may
require very different tools/APIs to work with;
  - e.g. we are already used to using CSS selectors to query the
  content model, and that requires content normalization that is
  not usually done on array data;
- Among the 4 differences, I think the last one is open to
discussion. But if we also give up the first 3 restrictions,
it could make the primary mixed-content use case awkward,
and people might as well stay with their current non-orthogonal
HTML or XML format.
- Mark's current design takes a more conservative approach to
what is allowed in the content model.
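To illustrate how the four restrictions separate element content from a generic array, here is a small Python sketch that checks an arbitrary list against them. It is only a sketch under my own assumptions: the name content_violations is hypothetical, objects are modelled as plain dicts, and this is a strict check rather than what Mark does when it constructs content.

```python
from typing import Any, List


def content_violations(items: List[Any]) -> List[str]:
    """List the ways a generic array breaks the four content-model rules.

    Hypothetical helper for illustration only, not part of Mark.
    """
    problems: List[str] = []
    prev_was_str = False
    for i, item in enumerate(items):
        if item is None:
            problems.append(f"item {i}: null value")
        elif isinstance(item, list):
            problems.append(f"item {i}: directly nested array")
        elif isinstance(item, (bool, int, float)):
            problems.append(f"item {i}: number or boolean primitive")
        elif isinstance(item, str) and prev_was_str:
            problems.append(f"item {i}: consecutive string")
        prev_was_str = isinstance(item, str)
    return problems


# A perfectly good generic array that is not valid mixed content:
print(content_violations(["a", "b", None, [1], 5, {"tag": "b"}]))

# Typical mixed content (text, an element, more text) passes:
print(content_violations(["text", {"tag": "b"}, "more"]))  # []
```

Any array that produces no violations here can be treated as element content as-is; anything else would need normalizing first, which is where tools written for the two models start to diverge.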
Anyone, any further thoughts?
Regards
Henry
On Fri 26/1 2:39 AM, Michael Kay wrote:
That is also a concern that I have when
designing Mark. Should it be generalized to allow any
value in content? Do we have any solid usecase for
storing number and boolean in content? They are
probably not needed for normal mixed content usage.
I don't think the absence of a use case should ever be
used to justify lack of orthogonality. If design were driven
solely by use cases, orthogonality would go out of the
window. The design process should aim to produce the most
elegant/orthogonal design that satisfies all the use cases,
it should not aim to satisfy the known use cases and nothing
else.
Or to put it another way: find the space with the
smallest boundary that has all the use cases within the
space, where the "boundary" is the boundary between things
that the system handles and things that it doesn't.
(An extreme example I often use to illustrate this
principle: arithmetic expressions X+Y should allow either
operand to be a literal zero, even though there is no use
case for adding zero to a number, because a specification
that allows literal zero is smaller than one that doesn't).
The lack of orthogonality between attribute values and
element content was in fact one of the design mistakes in
XML that I was trying to correct.
I know XML Schema allows primitive values as
element content. But personally, I don't like that,
and prefer to use attribute for those usecases.
Why should personal preferences come into it?
You might prefer <price currency="USD" amount="5.00"/>
to <price currency="USD">5.00</price>, but others don't.
Michael Kay
Saxonica