Re: [xml-dev] Protocol Buffers

A few things should be noted.

The .proto format is not the format used to exchange or persist data; it is the source format from which the (binary) format used for exchange and storage is generated. This is of course an arbitrary choice, which has nothing whatsoever to do with the actual advantages of using the binary format. One could just as easily define an XML format equivalent to the .proto format and compile the XML, rather than .proto, into the binary format. Hence the comparison between XML and .proto (in its compiled form) is inadequate, because it compares the output of a transformation with an alternative input to that transformation. It's like comparing a cupboard with elm trees and emphasizing the practical advantages of cupboards over elm trees.

We are informed that currently there are 48,162 message types defined across 12,183 .proto files. How does one manage and monitor a data model repository of that size? If the .proto files were in fact XML files, all kinds of analysis and consistency control would be very easy. For example, assuming the 12,183 .proto files were in fact XML files located in a directory tree rooted at /proto, a glossary of all .proto-defined item names could be obtained by the following XQuery expression:

sort(distinct-values(file:list('/proto', true(), '*.proto') ! concat('/proto/', .) ! doc(.)//*/local-name(.)))

Are there any tools that allow the same analysis in as terse a form using the actual .proto format?
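
For readers without an XQuery processor at hand, roughly the same glossary can be sketched in Python with the standard library only. This rests on the same assumption as the expression above (every *.proto file under the root directory is in fact well-formed XML); the function names glossary and local_name are made up for illustration.

```python
import fnmatch
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}local"; keep the local part only.
    return tag.rsplit("}", 1)[-1]


def glossary(root_dir, pattern="*.proto"):
    """Return the sorted, distinct element names found in all matching XML files."""
    names = set()
    for dirpath, _dirnames, filenames in os.walk(root_dir):  # recursive, like true()
        for filename in fnmatch.filter(filenames, pattern):
            tree = ET.parse(os.path.join(dirpath, filename))
            for elem in tree.iter():  # root element and all descendants
                names.add(local_name(elem.tag))
    return sorted(names)
```

Calling glossary('/proto') would then correspond to the expression above. It takes a dozen lines where XQuery takes one, which is rather the point.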

So XML-structured data are open to a truly remarkable power of expressing, evaluating and transforming information. The data are merged, at no extra cost, into a single space of interconnected information. Recognizing this peculiar effect presupposes a readiness to view XML not only as a data format, but as several things: an information model, an information processing model (consisting of the definition of expression kinds), technologies implementing the processing model, and, last and least, a syntax.

But in fact I find that .proto *is* XML. Or perhaps I should say tree-structured information is tree-structured information, the same letter in two envelopes. Tree-structured information is what XML and its stack of technologies are about. All it takes to make the equation .proto=XML true (in an operational sense) is to define an equivalent XML format (which is easy) and to write a parser that parses .proto into XML. The XML power of expression and operation then becomes immediately applicable to .proto. For example, the glossary mentioned above is obtained, again, by an XQuery one-liner (assuming the availability of an XQuery function proto:parse, which parses .proto data into XML):

sort(distinct-values(file:list('/proto', true(), '*.proto') ! concat('/proto/', .) ! proto:parse(.)//*/local-name(.)))

(Note I just replaced the function call doc(.) by the call proto:parse(.).)
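
To give an idea of how little such a parser needs, here is a rough Python sketch. Everything in it is an assumption made for illustration: the XML format (one element per message, enum, field or enum value, named after the item, with the details in attributes), the wrapper element "proto", and the function names. It covers only messages, enums, oneofs and plain fields; services, groups and extension blocks are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Comments, string literals, identifiers, integers and punctuation.
TOKEN = re.compile(
    r'//[^\n]*|/\*.*?\*/|"[^"]*"|[A-Za-z_][\w.]*|-?\d+|[{}=;\[\],<>()]', re.S)
BLOCKS = {"message", "enum", "oneof"}
LABELS = {"optional", "required", "repeated"}
SKIPPED = {"syntax", "package", "import", "option", "reserved", "extensions"}


def tokenize(text):
    # Comments are matched as tokens so that "//" inside a string is left alone.
    return [t for t in TOKEN.findall(text) if not t.startswith(("//", "/*"))]


def parse_body(tokens, pos, parent):
    """Parse statements up to the closing '}' (or end of input); return its position."""
    while pos < len(tokens) and tokens[pos] != "}":
        if tokens[pos] in BLOCKS:
            # Layout: keyword, name, "{", body, "}"
            child = ET.SubElement(parent, tokens[pos + 1], kind=tokens[pos])
            pos = parse_body(tokens, pos + 3, child) + 1
            continue
        end = tokens.index(";", pos)
        stmt, pos = tokens[pos:end], end + 1
        if not stmt or stmt[0] in SKIPPED or "=" not in stmt:
            continue
        eq = stmt.index("=")  # the first "=" precedes the field number
        head = stmt[:eq]      # [label] type name, or just the enum value name
        kind = "value" if parent.get("kind") == "enum" else "field"
        item = ET.SubElement(parent, head[-1], kind=kind, number=stmt[eq + 1])
        type_tokens = head[:-1]
        if type_tokens and type_tokens[0] in LABELS:
            item.set("label", type_tokens.pop(0))
        if type_tokens:
            item.set("type", "".join(type_tokens))
    return pos


def parse_proto(text):
    """Hypothetical stand-in for proto:parse: .proto source text in, XML tree out."""
    root = ET.Element("proto")
    parse_body(tokenize(text), 0, root)
    return root


def item_names(root):
    """Sorted distinct names of all items, leaving out the wrapper element itself."""
    return sorted({elem.tag for elem in root.iter() if elem is not root})
```

With this in place, item_names(parse_proto(text)) yields the glossary for a single file, and ET.tostring(parse_proto(text)) shows the XML view to which any XML tool can then be applied.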

The most fruitful perspective I can imagine in this context is to shift the focus from syntax to information content, to recognize that equivalent information content can be expressed in alternative syntax formats, and to lose the best part of one's former interest in the relative merits of those syntax formats, because letters are more important than envelopes. You can start with the syntax you prefer for the particular use case, and enter the XML model only when it is advantageous, namely when you would like to apply the power of XML technologies to structured information, be it XML, .proto, JSON, CSV, or whatnot.

Hans-Jürgen Rennau

Arjun Ray <arjun.ray@verizon.net> wrote on Saturday, 13 February 2016, at 23:21:

On Sat, 13 Feb 2016 21:58:03 +0000, Peter Flynn <peter@silmaril.ie>
wrote:

| We have just not been very good about making that clear; in my field,
| largely because programmers glaze over when you talk about XML and
| documents.

Mainly because they had already been sold on the idea that XML was a
serialization format. The problem was to unsell them, and that would
have taken uncommon persuasion skills.

| On the other hand, I have documents exactly like this, with a markup
| payload of 500% of the document text and more, precisely because it adds
| value to the documents for their users far exceeding any minor
| inconveniences of pointy brackets getting in the way.

Yes, but compare that to the overhead in the example presented here:

http://blog.codinghorror.com/xml-the-angle-bracket-tax/

where the "value added", such as it could be, is risibly minimal. (For
a very good reason: the base information payload isn't a "document"
except by definitional legerdemain only.)

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
