RE: [xml-dev] What is this principle called: "I'll send data in my UOM and you convert it, as needed, to your UOM"?
- From: "Cox, Bruce" <Bruce.Cox@USPTO.GOV>
- To: "W. E. Perry" <firstname.lastname@example.org>, XML DEV <email@example.com>
- Date: Mon, 19 Sep 2011 17:53:54 -0400
I'm replying to Walter's comment because he addresses a timely issue for the industrial property information community. Specifically, is it reasonable for a patent office to alter the numerical values in a patent application to conform to a particular standard for scientific notation (most likely SI)?
Which brings me to suggest an answer for your question, Roger. I think the use case you describe (RMIR as Mr. Salz put it) is an instance under Interoperability generally. Banks have long since solved it (at least, it appears to me that they have), but IP offices haven't needed to. Now that patent applications are increasingly electronic, and being converted to XML or (rarely) submitted in XML, and examiners are given more and more sophisticated tools that can read an equation or formula and actually operate on it, the issue comes to the surface. The domain is far broader than what banks have to deal with, and recovery from any failure could be challenging.
However, the danger that I see is to focus too much on the immediate consumption of the information. Interoperability among banks for currency transactions might be limited to a fairly short period of time (I suppose), say, the duration of a particular exchange rate value. Interoperability for the numerical values in a patent application, possibly useful for specialized tools in examination, must also support uses throughout the life of the right (20 years) and beyond (the concept of prior art has no time horizon, as far as I know).
While we have attempted over the past ten years to assure the immediate utility of tagged content in automated systems, we try to look beyond that to long-term future uses as well. When deciding what to tag, and how to represent the values, we try to let the content speak for itself, rather than take too much account of the next step in the pipeline. This is clearly an act of faith, in that we hope future users will be better served by interfering as little as possible with what applicants said. As the hierarchy of the electronic representation of the written word (glyphs, characters, code points, encodings, word sizes, big-little ended-ness, etc.) evolves, something must remain unaltered to ensure that the message gets through.
A perhaps trivial example: shall we encode numerical dates as YYYYMMDD or as YYYY-MM-DD? If the former, then the content must be processed before any human could be expected to read it. If the latter, then no processing is needed for most folks these days to read it and understand it. In either case, ASCII sort order coincides with chronological order, a non-trivial side effect. When it comes to representing numerical values in the context of a purportedly novel physical process, the risk of turning content into garbage increases substantially.
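That side effect is easy to demonstrate. A minimal Python sketch (the date values are illustrative, not from the thread): because both encodings order their fields from most to least significant, plain lexicographic sorting of the strings yields chronological order without any parsing.

```python
# Both the compact and the ISO 8601 form sort chronologically under
# plain lexicographic (ASCII) string comparison, because year, month,
# and day appear from most significant to least significant.
compact = ["20110919", "20101231", "20110101"]
iso = ["2011-09-19", "2010-12-31", "2011-01-01"]

assert sorted(compact) == ["20101231", "20110101", "20110919"]
assert sorted(iso) == ["2010-12-31", "2011-01-01", "2011-09-19"]
print("both encodings sort chronologically")
```

Note this would break for a field order like DDMMYYYY, which is one reason the field order matters as much as the delimiter.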
This is somewhat like the debate between controlled vocabularies and keyword indexes. What gives a better picture of the content, the value assigned by an indexer from a pre-defined and negotiated vocabulary, that might never have been uttered by the author, or the words the author actually used and perhaps invented? Is the answer different for science than it is for poetry? Why?
Interoperability is, possibly, shorthand for effective communication among machines and the entities that they model. We can negotiate contracts for short-term interoperability, and their provisions should be as detailed as the parties are willing to fund and implement. For the long term, however, it appears to me that an effective answer is an institutional answer, a role embodied in a culture; a charter, if you will, committed to certain principles that will vary between commercial and social institutions. Someone has to stand guard over the interests of the as yet unborn, undiscovered, unexploited, or unconstructed consumers, a role that is possibly not unique to IP information.
Bruce B Cox
Director, Policy and Standards Division, OCIO, USPTO
This message represents my personal views only and does not represent the official policy or views of the USPTO or any other IP office.
From: W. E. Perry [mailto:firstname.lastname@example.org]
Sent: 2011 September 9, Friday 11:32
To: XML DEV
Subject: Re: [xml-dev] What is this principle called: "I'll send data in my UOM and you convert it, as needed, to your UOM"?
I'm here, reading xml-dev as I have pretty regularly these 13+ years.
I'm not sure that there is an established or accepted name for the principle which you describe. I refer to it as "In markup, you are what you produce, not what you consume." In the example which you give, the system producing an output of "1/3 meter" cannot know what other systems might consume that data, nor for what purposes, and certainly cannot be responsible for unknown systems getting data in a form or to a precision which they require. The 'contract' in markup is that each system produce--and in the usual case, publish--output in a form which is knowable and identifiable with that producing system, however idiosyncratic.
Other systems which need to consume that data, for whatever purpose, are responsible for fetching it and then instantiating it in whatever form they require internally. This is the converse of the object-oriented interface, where the 'contract' is "if you pass me the precise data structure which I require, I promise to execute my particular operations upon it". The principle in markup is that the consumer is responsible for the instantiation of the particular data which it requires (and in markup there is *always* a parse and a particular instantiation required--no non-trivial system consumes markup directly). It is not the input interface which is public, but rather the output.
Thinking this way can be difficult, and particularly so if you hope to find interoperability inhering in the object, or in whatever is the text or data exchanged between systems. In markup the particular instantiation of data is a local and a private matter, precisely because such interoperability as we might achieve is based on parsing a general form into a specific one. The particular, fully-specified form is local to each system; what systems exchange is a more general form which may be locally instantiated into widely differing particulars.
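Walter's contract can be sketched in a few lines. In this hypothetical exchange (the element name, unit attribute, and chosen precision are all my illustrative assumptions, not anything from the thread), the producer publishes its own idiosyncratic form--an exact rational in metres--and the consumer parses it and instantiates the value locally, to the precision the consumer alone decides it needs:

```python
from fractions import Fraction
from xml.etree import ElementTree as ET

# Hypothetical producer output: the producing system publishes its own
# form -- an exact rational value, in its own unit of measure -- without
# guessing what any consumer might need.
published = '<length unit="m">1/3</length>'

# The consumer fetches that output, parses it, and instantiates the
# value in whatever internal form *it* requires: here, centimetres
# rounded to four places, a precision this consumer chose for itself.
elem = ET.fromstring(published)
assert elem.get("unit") == "m"
value_m = Fraction(elem.text)      # exact: 1/3
value_cm = float(value_m * 100)    # local, private instantiation in cm
print(round(value_cm, 4))          # 33.3333
```

The published form stays exact and producer-identified; every lossy decision (unit conversion, rounding) happens on the consuming side, which is the point of the principle.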
"Costello, Roger L." wrote:
> Hi Folks,
> Suppose that every member in a community is required to exchange data using one unit of measure. For example, suppose length is required to be expressed as a decimal value in centimeters.
> Consider a system that internally deals with length measurements in meters. Suppose the system wishes to exchange the value "1/3 meter." Recall that the system is required to exchange values in decimal centimeters. However, there is no exact conversion of "1/3 meter" to centimeters. Should the system send the value 33 cm, 33.3 cm, 33.33333 cm? In the general case, the sending system has no insight into the precision required by recipients. Exchanging the wrong amount of precision could be catastrophic.
> There is a principle that a sender should transmit its original data (e.g., 1/3 meter); it is up to recipients to convert the data, if necessary, to the precision they require.
> What's the name of the principle? I remember someone on this list
> talking about it long ago. Perhaps Len? Perhaps Walter Perry? (BTW,
> anyone know the whereabouts of Walter?)
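The dilemma in Roger's 1/3-metre example can be made concrete. A minimal Python sketch (the particular precision choices are illustrative): whatever decimal rendering the sender picks bakes a precision decision into the exchanged value, and each choice carries a different residual error that only the recipient could judge acceptable or catastrophic.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

third_m = Fraction(1, 3)    # the sender's exact value, in metres
exact_cm = third_m * 100    # 100/3 cm -- no finite decimal exists

# Each candidate rendering the sender might transmit commits to a
# precision, with a different error against the exact value:
for places in (Decimal("1"), Decimal("0.1"), Decimal("0.00001")):
    sent = Decimal(exact_cm.numerator) / Decimal(exact_cm.denominator)
    sent = sent.quantize(places, rounding=ROUND_HALF_UP)
    error_cm = abs(exact_cm - Fraction(sent))
    print(f"{sent} cm  (error: {float(error_cm):.2e} cm)")
```

Transmitting the original "1/3 meter" defers that commitment entirely, which is the principle the thread is circling: only the recipient knows which error is tolerable.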