Re: [xml-dev] Developing open business information exchange documents
- From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
- To: u123724 <u123724@gmail.com>
- Date: Thu, 23 Feb 2017 13:23:10 -0500
At 2017-02-23 17:57 +0100, u123724 wrote:
> So ... the model is separate from the syntax. CCTS is standalone
> as a modeling tool.
> A user community chooses which NDRs to use to go from the model
> to the syntax.
That makes sense; however, while I don't know the situation with UBL
specifically, in said address consolidation project I came across
established postal address representation schemes for XML making heavy
use of their own ad-hoc "meta" systems in XML (eg. as in <field
role="this">that</field>) which I always found to have a smell.
As a developer, I would have problems with that as well, if for no
other reason than that a misspelled role attribute would go unnoticed
unless I were using a constraint language such as RELAX-NG to
constrain the attribute values.
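To illustrate the risk, here is a minimal Python sketch. The role names (`street`, `city`, `country`) are hypothetical and not taken from any real address scheme. Without a schema constraining the attribute, a check along these lines would have to live in application code:

```python
import xml.etree.ElementTree as ET

# Hypothetical role vocabulary, for illustration only.
ALLOWED_ROLES = {"street", "city", "country"}

def misspelled_roles(xml_text):
    """Return role attribute values that are not in the allowed vocabulary."""
    root = ET.fromstring(xml_text)
    return [
        field.get("role")
        for field in root.iter("field")
        if field.get("role") not in ALLOWED_ROLES
    ]

sample = """<address>
  <field role="street">1 Main St</field>
  <field role="contry">CH</field>
</address>"""

print(misspelled_roles(sample))  # ['contry']
```

A parser accepts the misspelled `contry` without complaint; only the explicit vocabulary check reveals it.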
The goal our group is trying to achieve is the unambiguous
representation of semantic information in a document syntax
transmitted between different parties with different internal data models.
> Personally, I'm growing to accept that an XML schema expressed entirely
> of element content and no mixed content doesn't have benefits
> over a JSON schema.
That is along the lines as discussed last week here. My point was to
get an opinion of whether it's accepted that there is
- intrinsic semantic information in the formatting/presentation of
text getting lost in component representation ("fielded addresses")
unless going to extremes in preserving sequence of eg. lines and
individual tokens or vertical alignment information
Personally, I would worry about intrinsic information being properly
interpreted from presentation by a program (fewer problems for
humans). I think "hidden" can be another word for some "intrinsic"
information.
One of the UBL content rules (layered on top of the syntax rules) is
that no element is allowed to be empty, and the absence of an element
cannot be considered a semantic signal of any kind. This would
hopefully suppress any desire for intrinsic content:
http://docs.oasis-open.org/ubl/UBL-2.2.html#S-EMPTY-ELEMENTS
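As a rough sketch of how such a rule might be checked in application code, the following Python treats an element as empty when it has no child elements and no non-whitespace text. That reading of "empty" is an assumption made for this example (the UBL specification linked above is the authority), and the element names are un-namespaced placeholders rather than real UBL markup:

```python
import xml.etree.ElementTree as ET

def empty_elements(xml_text):
    """Return tags of elements with no child elements and no non-whitespace text."""
    root = ET.fromstring(xml_text)
    return [
        el.tag
        for el in root.iter()
        if len(el) == 0 and not (el.text or "").strip()
    ]

# Placeholder element names, loosely address-like, for illustration only.
sample = (
    "<Address><StreetName>Main St</StreetName>"
    "<CityName/><Region>  </Region></Address>"
)

print(empty_elements(sample))  # ['CityName', 'Region']
```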
- a benefit in having a document as a palpable/material manifest for a
business transaction in the digital age
Absolutely! A principle (as opposed to an explicit content rule) in
UBL is that all values must be manifest, and it is cited from the
content rule above:
http://docs.oasis-open.org/ubl/UBL-2.2.html#S-MANIFEST-VALUES
These points don't focus so much on formal, grammatical, or logical
arguments as they accept a cultural "humanist/anthropocentric" view
about technology and information modeling.
Agreed. Which is why we have these explicit considerations in UBL
regarding intrinsic information so that feelings don't get involved. :{)}
Also, a theoretical point could be made that, since almost any
business process requires the identification of parties/persons
(natural or otherwise), and hence address data, a data serialization
format should be equipped with semistructured data modelling
capabilities.
Hmmmmmmm ... I don't agree. I mentioned the two extremes offered in
UBL addresses of fully structured or fully unstructured
representations. The unstructured representation for human eyes can
read "Switzerland" and the structured representation for application
use can read "CH" which can be validated against the ISO code list
for country codes. Using semi-structured content to mark up the
country can introduce challenges for programmers if they have to
deal with, say, spelling mistakes that cannot be pre-validated.
Another challenge is expressing the constraint that the country code
is mandatory in a semi-structured address (fine for RELAX-NG but not
for XSD without assertions).
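The difference between the two representations can be sketched in Python. The element names below are loosely modeled on a structured address but are un-namespaced assumptions, not actual UBL markup, and the code list is a tiny illustrative subset of ISO 3166-1 alpha-2, not the full list:

```python
import xml.etree.ElementTree as ET

# Small illustrative subset of ISO 3166-1 alpha-2 codes, not the full list.
ISO_COUNTRY_CODES = {"CA", "CH", "DE", "US"}

def country_problem(address_xml):
    """Return None if the address carries a known country code, else a message."""
    address = ET.fromstring(address_xml)
    code = address.findtext("Country/IdentificationCode")
    if code is None:
        return "country code is missing"
    if code.strip() not in ISO_COUNTRY_CODES:
        return f"unknown country code: {code.strip()!r}"
    return None

# Hypothetical structured address: the country is a code an application can check.
structured = (
    "<Address><CityName>Bern</CityName>"
    "<Country><IdentificationCode>CH</IdentificationCode></Country></Address>"
)
# Hypothetical unstructured address: the country is only text for human eyes.
unstructured = (
    "<Address><AddressLine><Line>Bern, Switzerland</Line></AddressLine></Address>"
)

print(country_problem(structured))    # None
print(country_problem(unstructured))  # country code is missing
```

The structured form gives the program something to validate; in the unstructured form a misspelling such as "Switzerlnd" would pass through unnoticed.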
NIEM is similar to CCTS in that it has element-only content and
no mixed content, and the arguments are similar:
https://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/NIEM-NDR-3.0-2014-07-31.html#section_6.2.7
https://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/NIEM-NDR-3.0-2014-07-31.html#section_9.1.3.1
In UBL there is no such thing as a paragraph. There are description
fields, but the cardinality is for language, not for the concept of
paragraphs. There is no intrinsic interpretation of line-breaks as
paragraphs in the one language description field. If information
needs to be formatted in paragraphs, say a tendering document, then
the user uses some other suitable representation and embeds it or
refers to it as a blob as in:
http://docs.oasis-open.org/ubl/os-UBL-2.1/mod/summary/reports/UBL-AllDocuments-2.1.html#t-CommonLibrary-2040
If a fully structured representation is not available, then certainly
semi-structured is better than no structure at all. It provides some
semantic identification of the components, without validation, but
that's okay because there is no validation of unstructured addresses either.
I think it is a bit strong to say a serialization format *should* be
equipped with semi-structured capabilities. It depends on the
data. Absolutely for a repair manual, but I think not for an address
if fully-structured addresses are available. Mixed content adds a
layer of complexity ... and how would I express mixed content in JSON?
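JSON has no native notion of mixed content, so any mapping is an ad hoc convention. As one possible sketch (the `tag`/`content` object shape and the `p`/`emph` element names are invented for this example, not an established standard), text runs and child elements can be kept in order in an array:

```python
import json
import xml.etree.ElementTree as ET

def mixed_to_list(element):
    """Flatten mixed content into an ordered list of strings and child objects."""
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        # Invented convention: each child element becomes a tag/content object.
        parts.append({"tag": child.tag, "content": mixed_to_list(child)})
        if child.tail:
            parts.append(child.tail)
    return parts

para = ET.fromstring("<p>Deliver to the <emph>rear</emph> entrance.</p>")
print(json.dumps(mixed_to_list(para)))
# ["Deliver to the ", {"tag": "emph", "content": ["rear"]}, " entrance."]
```

Every consumer has to know this convention in order to reassemble the text, which is the extra layer of complexity in question.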
Horses for courses. Use the fully structured address for machine
processing and the fully unstructured address for human
consumption. Leave semi-structured mixed content for
presentation-oriented information.
. . . . . . . Ken
--
UBL introduction lecture - Exchange Summit - Orlando, FL - 2017-04-24 |
Contact info, blog, articles, etc. http://www.CraneSoftwrights.com/x/ |
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training class @ US$45 (5 hours free) |