TEI hero Lou Burnard from Oxford Uni said something like "every DTD represents a theory about the document". (Credible schema developers/content architects need a good knowledge of the major different approaches IMHO, otherwise the implicit theory will be "whatever comes to mind first is good enough".) See Lou's ppt Myths and Realities.
Some domains do have quite good theories: for example, that a complete news story needs to have a who, what, when, where and how or why. For instance: Yesterday at the White House, Mr Obama eased relations with Cuba by executive order, because people need to smoke more cigars. But that kind of theory does not lend itself well to regular grammars: every document needs at least one "who" element, but it could be anywhere in the document... Where you have domain-specific elements embedded in general paragraph-oriented text markup, you need a second-level schema for the domain requirements.
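To make the "second-level schema" idea concrete, here is a minimal sketch in Python of a check that runs after ordinary grammar validation. The element names (story, p, who, what, when, where, how, why) and the REQUIRED table are assumptions made up for illustration; they are not taken from any real news vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain rule: each entry is a set of alternative element names,
# at least one of which must occur somewhere in the story.
REQUIRED = [{"who"}, {"what"}, {"when"}, {"where"}, {"how", "why"}]


def missing_facets(xml_text):
    """Return the required facets that occur nowhere in the document."""
    root = ET.fromstring(xml_text)
    # Collect every element name, at any depth, so position does not matter.
    present = {el.tag for el in root.iter()}
    return [sorted(alts) for alts in REQUIRED if not (alts & present)]


STORY = (
    "<story><p><when>Yesterday</when> at <where>the White House</where>, "
    "<who>Mr Obama</who> <what>eased relations with Cuba</what> "
    "by executive order, "
    "<why>because people need to smoke more cigars</why>.</p></story>"
)

print(missing_facets(STORY))
print(missing_facets("<story><p><who>Mr Obama</who> spoke.</p></story>"))
```

The first call reports nothing missing; the second reports every facet except "who". The point is only that the constraint is stated over the whole document rather than over a content model, which is the kind of rule an assertion-based schema language handles more naturally than a grammar.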
Another source of theory is standards. For example, ISO 9126 gives abstract software quality characteristics: you could use those in, say, a schema for software requirements.
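As a rough sketch of what that could look like, the six top-level ISO 9126 characteristics can serve as the allowed values of a classification attribute. The requirements/requirement elements and the id and quality attributes below are hypothetical names chosen for this example only.

```python
import xml.etree.ElementTree as ET

# The six top-level quality characteristics named in ISO/IEC 9126.
ISO_9126_CHARACTERISTICS = {
    "functionality", "reliability", "usability",
    "efficiency", "maintainability", "portability",
}


def unknown_qualities(xml_text):
    """List (id, quality) pairs whose quality is not an ISO 9126 characteristic."""
    root = ET.fromstring(xml_text)
    return [
        (req.get("id"), req.get("quality"))
        for req in root.iter("requirement")  # hypothetical element name
        if req.get("quality") not in ISO_9126_CHARACTERISTICS
    ]


DOC = """
<requirements>
  <requirement id="R1" quality="reliability">Recover from power loss.</requirement>
  <requirement id="R2" quality="speed">Respond quickly.</requirement>
</requirements>
"""

print(unknown_qualities(DOC))
```

Here R2 is flagged because "speed" is not one of the standard's characteristics; the nearest one would be "efficiency".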
Of course, there are various published generic approaches to data and document analysis: Maler and El Andaloussi's Information Units, for example, or my cookbook/reuse approach.
But are you asking the wrong question? Do we need theories of data domains, or do we need theories of developers more? Should we be asking: how should documents be marked up so that developers won't make mistakes, or can quickly understand what is meant, or can quickly drill into the data, or don't need to do lots of transformations/joins in the typical or worst case? You only have data because you have software.
Cheers
Rick
Hi Folks,
Is there a theory of data? I don't mean things like relational database theory. I mean theories for domains of data.
For example, is there a theory for Book data? Is there a theory for Cellphone data? Is there a theory for Location data? Is there a theory for Aircraft Flight Procedures data?
Assertion: in any area of endeavor, having an underlying theory is important. Without an underlying theory there are no guides to implementations and the developers are left in charge, making stuff up.
Example of an area of endeavor that has benefitted enormously from a rich body of theory: the benefit of using a (context-free) grammar to define a language is that there is a slew of grammar checkers/parser generators (LL(1), LALR(1), GLR, etc.) that will tell you whether there is an analysis algorithm and, if not, why not. If you step outside the Formal Language grammars you're basically on your own (flailing in the wind, making stuff up).
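To illustrate the kind of feedback such a checker gives, here is a much-simplified sketch of an LL(1)-style conflict test. It assumes a grammar with no empty productions, so only FIRST sets are needed; real parser generators also handle empty productions and FOLLOW sets. The toy grammars and their symbol names are invented for the example.

```python
def first_sets(grammar):
    """Compute FIRST sets for a grammar with no empty productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar):
    """Report alternatives of one nonterminal that can start with the same terminal."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alternatives in grammar.items():
        seen = {}  # terminal -> first alternative that can start with it
        for alt in alternatives:
            head = alt[0]
            starts = first[head] if head in grammar else {head}
            for terminal in sorted(starts):
                if terminal in seen:
                    conflicts.append((nt, terminal, seen[terminal], alt))
                else:
                    seen[terminal] = alt
    return conflicts


# Hypothetical toy grammars: nonterminals are the dict keys, everything else is a terminal.
AMBIGUOUS_START = {
    "procedure": [("name", "step"), ("name", "note")],
    "step": [("climb",), ("turn",)],
    "note": [("text",)],
}
FACTORED = {
    "procedure": [("name", "body")],
    "body": [("step",), ("note",)],
    "step": [("climb",), ("turn",)],
    "note": [("text",)],
}

print(ll1_conflicts(AMBIGUOUS_START))
print(ll1_conflicts(FACTORED))
```

The first grammar is reported as conflicting because both alternatives of "procedure" begin with "name", so one token of lookahead cannot choose between them; the left-factored version passes. That is the "if not, why not" part in miniature.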
Suppose that I want to create an XML vocabulary for, say, Aircraft Flight Procedures. As just noted, regardless of what XML vocabulary I might choose, I can express the structure using a (context-free) grammar and thereby benefit from the theory of grammars to implement a parser for the Aircraft Flight Procedures data. But what about the data itself, is there theory to guide me in the development of a consistent and complete set of data for Aircraft Flight Procedures?
What would a theory of Aircraft Flight Procedures look like? How would such a theory guide in the development of a set of data and a data model for Aircraft Flight Procedures? Should a theory of Aircraft Flight Procedures data be developed before embarking on the creation of a data model (an XML vocabulary)?
Can we turn the creation of data models and XML vocabularies into a science?
/Roger
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development.