I'm writing a schema to represent documents which are programs in a
particular programming language. The most useful part of the schema would
represent the "abstract syntax" of the program, essentially a canonical
form of the program stripped of concrete syntax and sugar. But can anyone
give guidance on accepted or conventional ways to include the specific
syntax/sugar in the schema in a way that keeps it separate somehow?
For example, consider the code fragment:
    if (x<0)
        ++x;
    else
        --x;
Something like
    <if-then-else-statement>
      <block>
        <unary-op type="increment" arg="x"/>
      </block>
      <block>
        <unary-op type="decrement" arg="x"/>
      </block>
    </if-then-else-statement>
could represent the abstract syntax of the fragment, but doesn't capture
the actual syntax with the programmer's choices for indenting, etc. The
specific operator syntax such as "++" can be generated with XSLT because it
is part of the language, but what about the whitespace, indenting and line
breaks? Where should they appear in the XML of the fragment? Having two
separate schemata, one echoing the program and one its parse tree, seems
wrong. XML should be able to annotate the parse tree with the syntactic
specifics somehow.
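To make the gap concrete, here is a minimal sketch of the regeneration step, written in Python only as a stand-in for the XSLT I have in mind. The `condition` attribute, the `OPERATORS` table and the function names are assumptions made up for this illustration; they are not part of any existing schema.

```python
import xml.etree.ElementTree as ET

# Abstract syntax of the fragment. The "condition" attribute is a
# hypothetical addition so the test (x<0) has somewhere to live.
ABSTRACT = """
<if-then-else-statement condition="x&lt;0">
  <block><unary-op type="increment" arg="x"/></block>
  <block><unary-op type="decrement" arg="x"/></block>
</if-then-else-statement>
"""

# Operator spelling is fixed by the language, so it can be looked up.
OPERATORS = {"increment": "++", "decrement": "--"}


def render_block(block, indent):
    """Render each unary-op in a block as one indented statement."""
    return [
        f"{indent}{OPERATORS[op.get('type')]}{op.get('arg')};"
        for op in block.findall("unary-op")
    ]


def render(xml_text, indent="    "):
    """Regenerate concrete syntax; the layout is chosen here, not read from the XML."""
    node = ET.fromstring(xml_text)
    then_block, else_block = node.findall("block")
    lines = [f"if ({node.get('condition')})"]
    lines += render_block(then_block, indent)
    lines.append("else")
    lines += render_block(else_block, indent)
    return "\n".join(lines)


print(render(ABSTRACT))               # four-space indent
print(render(ABSTRACT, indent="\t"))  # tab indent, same abstract syntax
```

Both calls produce a valid rendering of the same tree, and nothing in the XML says which one the programmer actually typed. That missing information is what I would like the schema to carry.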
Any advice would be appreciated.
-Allen