OASIS Mailing List Archives



Re: [xml-dev] Developing open business information exchange documents

At 2017-02-23 17:57 +0100, u123724 wrote:
>> So ... the model is separate from the syntax. CCTS is standalone as a modeling tool.
>> A user community chooses which NDRs to use to go from the model to the syntax.

> That makes sense; however, while I don't know the situation with UBL
> specifically, in said address consolidation project I came across
> established postal address representation schemes for XML making heavy
> use of their own ad-hoc "meta" systems in XML (eg. as in <field
> role="this">that</field>) which I always found to have a smell.

As a developer, I would have problems with that as well, if only because a misspelled role attribute would go unnoticed unless I were using a constraint language such as RELAX NG to constrain its spelling.
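A minimal Python sketch of the misspelling problem. The `<field role="...">` vocabulary and the agreed list of role values are hypothetical, made up for illustration: a naive reader silently loses the misspelled field, and the check a schema would otherwise perform has to be written by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical ad-hoc "meta" markup: every value is a generic <field>
# whose meaning lives in a role attribute rather than in the element name.
DOC = """<address>
  <field role="street">Bahnhofstrasse 1</field>
  <field role="citty">Zurich</field>
  <field role="country">CH</field>
</address>"""

# Hypothetical list of role values the exchange partners agreed on.
ALLOWED_ROLES = {"street", "city", "postcode", "country"}


def fields_by_role(xml_text):
    """Naive reader: map role -> text, keeping whatever role value appears."""
    root = ET.fromstring(xml_text)
    return {f.get("role"): f.text for f in root.iter("field")}


def unknown_roles(xml_text):
    """Hand-written check: report role values that are not in the agreed list."""
    root = ET.fromstring(xml_text)
    return sorted({f.get("role") for f in root.iter("field")} - ALLOWED_ROLES)


print(fields_by_role(DOC).get("city"))  # the city is silently missing
print(unknown_roles(DOC))
```

In RELAX NG the agreed list would instead be an enumeration of values on the `role` attribute, so the typo would be caught at validation time rather than in application code.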

The goal our group is trying to achieve is the unambiguous representation of semantic information in a document syntax transmitted between different parties with different internal data models.

>> Personally, I'm growing to accept that an XML schema expressed entirely
>> of element content and no mixed content doesn't have benefits over a JSON schema.

> That is along the lines of what was discussed here last week. My point was to
> get an opinion on whether it's accepted that there is
>
> - intrinsic semantic information in the formatting/presentation of
>   text getting lost in component representation ("fielded addresses")
>   unless going to extremes in preserving sequence of eg. lines and
>   individual tokens or vertical alignment information

Personally, I would worry about whether a program can properly interpret intrinsic information from presentation (humans have fewer problems with this). I think "hidden" can be another word for some "intrinsic" information.

One of the UBL content rules (layered on top of the syntax rules) is that no element is allowed to be empty, and the absence of an element cannot be considered a semantic signal of any kind. This would hopefully suppress any desire for intrinsic content:
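As a rough illustration of checking the "no empty elements" rule in application code, here is a minimal Python sketch. The element names are loosely modelled on UBL with namespaces omitted, and treating whitespace-only text as empty is an assumption of this sketch, not a quote of the UBL rule.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative instance; namespaces omitted.
DOC = """<Party>
  <PartyName><Name>ACME Ltd</Name></PartyName>
  <Contact><Telephone/></Contact>
</Party>"""


def empty_elements(xml_text):
    """Return the tags of elements with neither text nor child elements."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Assumption: whitespace-only text counts as empty.
        has_text = elem.text is not None and elem.text.strip() != ""
        if not has_text and len(elem) == 0:
            found.append(elem.tag)
    return found


print(empty_elements(DOC))
```

A sender would drop the empty `<Telephone/>` entirely; a receiver, in turn, should not read anything into its absence.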


> - a benefit in having a document as a palpable/material manifest for a
>   business transaction in the digital age

Absolutely! A principle (as opposed to an explicit content rule) in UBL is that all values must be manifest, and it is cited from the content rule above:


> These points don't focus so much on formal, grammatical, or logical
> arguments as they accept a cultural "humanist/anthropocentric" view
> about technology and information modeling.

Agreed. Which is why we have these explicit considerations in UBL regarding intrinsic information so that feelings don't get involved. :{)}

> Also, a theoretical point could be made that, since almost any
> business process requires the identification of parties/persons
> (natural or otherwise), and hence address data, a data serialization
> format should be equipped with semistructured data modelling.

Hmmmmmmm ... I don't agree. I mentioned the two extremes UBL offers for addresses: a fully structured representation and a fully unstructured one. The unstructured representation, meant for human eyes, can read "Switzerland", while the structured representation, meant for applications, can read "CH", which can be validated against the ISO code list for country codes. Using semi-structured content to mark up the country can introduce challenges for the programmer if they have to deal with, say, spelling mistakes that cannot be pre-validated. There is also the challenge of expressing the constraint that the country code is mandatory in a semi-structured address (fine for RELAX NG, but not for XSD without assertions).
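A minimal Python sketch of the two extremes side by side. The element names are simplified and hypothetical (only loosely echoing UBL), and the code list is a tiny excerpt of ISO 3166-1 alpha-2 standing in for the full list a real validator would load.

```python
import xml.etree.ElementTree as ET

# Tiny excerpt of ISO 3166-1 alpha-2 codes (assumption: a real validator
# would load the complete, current code list).
COUNTRY_CODES = {"CA", "CH", "DE", "FR", "US"}

# Simplified, hypothetical element names.
DOC = """<Address>
  <AddressLine>Switzerland</AddressLine>
  <Country><IdentificationCode>CH</IdentificationCode></Country>
</Address>"""


def check_country(xml_text):
    """Return (human-readable line, status of the machine-readable code)."""
    root = ET.fromstring(xml_text)
    # Unstructured: free text for human eyes; nothing to validate it against.
    line = root.findtext("AddressLine")
    # Structured: a coded value an application can check against a code list.
    code = root.findtext("Country/IdentificationCode")
    if not code:
        return line, "missing country code"
    if code not in COUNTRY_CODES:
        return line, f"unknown country code {code!r}"
    return line, "ok"


print(check_country(DOC))
print(check_country(DOC.replace(">CH<", ">SWI<")))
```

Only the coded value can be rejected mechanically; a misspelled "Switzerland" in the address line would pass through untouched, which is acceptable because that line is only meant to be read by people.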

NIEM is similar to CCTS in that it uses only element content and no mixed content, and the arguments are similar:


In UBL there is no such thing as a paragraph. There are description fields, but repeating them serves to provide different languages, not paragraphs. Line breaks within a single-language description field have no intrinsic interpretation as paragraph breaks. If information needs to be formatted in paragraphs, say in a tendering document, then the user chooses some other suitable representation and embeds it, or refers to it, as a blob, as in:
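A rough Python sketch of both ideas: picking a description by language, and carrying paragraph-formatted content as an opaque base64 blob. The markup is simplified and only loosely modelled on UBL (namespaces omitted); treat the element and attribute names as illustrative rather than as a quote of the UBL schemas.

```python
import base64
import xml.etree.ElementTree as ET

# Simplified, illustrative markup; namespaces omitted.
DOC = """<Item>
  <Description languageID="en">Steel bolt, M8 x 40</Description>
  <Description languageID="de">Stahlschraube, M8 x 40</Description>
</Item>"""


def description_for(xml_text, lang):
    """Repeated Description elements differ by language, not by paragraph."""
    root = ET.fromstring(xml_text)
    for desc in root.findall("Description"):
        if desc.get("languageID") == lang:
            return desc.text
    return None


def embed_blob(data, mime_code, filename):
    """Carry paragraph-formatted content as an opaque base64 blob instead of
    marking up paragraphs inline."""
    elem = ET.Element("EmbeddedDocumentBinaryObject",
                      {"mimeCode": mime_code, "filename": filename})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


print(description_for(DOC, "de"))
tender = b"<html><p>Scope of work.</p><p>Deadlines.</p></html>"
print(ET.tostring(embed_blob(tender, "text/html", "tender.html"), encoding="unicode"))
```

The receiving application never interprets the paragraphs itself; it hands the decoded blob to whatever tool understands its MIME type.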


If a fully structured representation is not available, then certainly semi-structured is better than no structure at all. It provides some semantic identification of the components, without validation, but that's okay because there is no validation of unstructured addresses either.
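A minimal Python sketch of what semi-structured markup buys: the full text survives for people, and the tagged components can be picked out by name, but their values are not validated. The address and its `<city>`/`<country>` tags are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured (mixed content) address: the text reads
# naturally and some components are tagged where they occur.
ADDR = ("<address>Bahnhofstrasse 1, 8001 <city>Zurich</city>, "
        "<country>Switzerland</country></address>")


def tagged_components(xml_text):
    """Return {tag: text} for the tagged components; values are not checked."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def full_text(xml_text):
    """Recover the human-readable address with the markup removed."""
    return "".join(ET.fromstring(xml_text).itertext())


print(tagged_components(ADDR))
print(full_text(ADDR))
```

Note that the country comes out as the spelled-out name "Switzerland", not a code that could be checked against a code list.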

I think it is a bit strong to say a serialization format *should* be equipped with semi-structured capabilities. It depends on the data. Absolutely for a repair manual, but I think not for an address if fully-structured addresses are available. Mixed content adds a layer of complexity ... and how would I express mixed content in JSON?
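One possible answer to the JSON question, sketched in Python under an assumed JsonML-like convention (each element becomes `[tag, {attributes}, child, ...]`, with text as plain strings; unlike JsonML proper, this sketch always includes the attribute object). Because the order of text and elements matters, the content has to go into arrays rather than keyed objects, which is exactly the extra layer of complexity mentioned above.

```python
import json
import xml.etree.ElementTree as ET


def to_jsonml(elem):
    """Encode an element as [tag, {attributes}, child, ...], where each child
    is a string (text) or a nested list (element), in document order."""
    node = [elem.tag, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_jsonml(child))
        # Text following a child element belongs to the parent's content.
        if child.tail:
            node.append(child.tail)
    return node


PARA = '<para>Tighten the <part id="b7">bolt</part> to <em>25 Nm</em>.</para>'
print(json.dumps(to_jsonml(ET.fromstring(PARA))))
```

Every consumer of such JSON has to know this convention, whereas element-only content maps directly onto ordinary JSON objects.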

Horses for courses. Use the fully structured address for machine processing and the fully unstructured address for human consumption. Leave semi-structured mixed content for presentation-oriented information.

. . . . . . . Ken

UBL introduction lecture - Exchange Summit - Orlando, FL - 2017-04-24 |
Contact info, blog, articles, etc. http://www.CraneSoftwrights.com/x/ |
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training class @ US$45 (5 hours free) |



Copyright 1993-2007 XML.org. This site is hosted by OASIS