Resolving XML's Unicode Incompatibility Problem Using "Positional Tagging"

From: Stephen Beller <sbeller@nhds.com>
To: noah_mendelsohn@us.ibm.com, 'Xml-Dev Listserv' <xml-dev@lists.xml.org>
Date: Tue, 26 Feb 2008 10:16:47 -0500
I've simplified my response to Noah's question about how to resolve the XML
Unicode incompatibility problem by transforming XML docs to grid-based files
using Unicode 5.0 characters via the code2000 font. See
http://nhds.com/org_chart_xml2ss.jpg for a revised spreadsheet containing
the recursive hierarchies he requested. In it, the hierarchies are reflected
in the row position of the "header tags" cells, which appear in rows 1-4.
Thus, for example, cells C1 and C2 indicate that a person whose "Manager"
tag resides in row 1 is the immediate manager of each person whose
"Employee" tag resides in row 2, etc. So, Bob is the manager of Mary and
Frank, while Leroy is the manager of Stan, Wes and Brenden.
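As a rough sketch of the idea (not the layout used in the spreadsheet linked above, which this code does not try to reproduce), the org chart XML from Noah's message quoted below can be flattened into a grid in which an element's nesting depth decides the column it lands in. The function names `flatten` and `to_grid`, and the "tag:name" cell format, are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Noah's org chart example, copied from the quoted message.
ORG_XML = """
<manager name="bob">
  <employee name="mary"/>
  <manager name="sue">
    <employee name="tom"/>
  </manager>
</manager>
""".strip()

def flatten(elem, depth=0, rows=None):
    """Return (depth, tag, name) triples in document order."""
    if rows is None:
        rows = []
    rows.append((depth, elem.tag, elem.get("name")))
    for child in elem:
        flatten(child, depth + 1, rows)
    return rows

def to_grid(rows):
    """One grid row per element; the column index equals the nesting depth."""
    width = max(depth for depth, _, _ in rows) + 1
    grid = []
    for depth, tag, name in rows:
        line = [""] * width
        line[depth] = f"{tag}:{name}"
        grid.append(line)
    return grid

rows = flatten(ET.fromstring(ORG_XML))
grid = to_grid(rows)
for line in grid:
    print(line)
```

Here position carries the hierarchy: mary and sue share a column because they report to the same manager, and tom sits one column further right because he reports to sue.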

Note that any Unicode characters can be used in header tags and in any
element and attribute value. 
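A minimal sketch of that point, assuming the grid is saved as UTF-8 delimited text: header tags and values containing non-ASCII characters survive a round trip through Python's standard `csv` module. The non-English header tags and the grid layout are hypothetical, and the sketch says nothing about fonts or about how a particular spreadsheet program displays the characters.

```python
import csv
import io

# Hypothetical grid: the first row holds header tags, the second holds values.
grid = [
    ["Manager", "Employé", "従業員"],
    ["Bob", "Mary", "Frank"],
]

# Write the grid as CSV text, then encode it as UTF-8 bytes (as a file would be).
buf = io.StringIO(newline="")
csv.writer(buf).writerows(grid)
encoded = buf.getvalue().encode("utf-8")

# Decode the bytes and parse the CSV text back into rows.
decoded = io.StringIO(encoded.decode("utf-8"), newline="")
restored = list(csv.reader(decoded))

print(restored == grid)
```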

Also note that the example above contains hyperlinks to each employee's
resume, even though I could have embedded them in the cells (since each cell
can hold over 32K characters, which is more than 10 typewritten pages).

Other advantages of the data organization model I'm proposing include:
* Each tag appears only once, no matter how many rows of values are under it,
which reduces overhead and compression/decompression time. 
* The tags and values can be stored easily in any delimited text file (such
as CSV), which is parsed in an instant using native spreadsheet
functionality, thereby reducing processing time.
* The spreadsheet document is very human readable.
* It accommodates all types of elements, attributes and extensions.
* Mapping different tag names can be as simple as predefining header cell
locations into which equivalent tags are to be placed. Thus, while one file
uses "Employee" as the tag in cell A1, another may use, say, "FirstName" in
A1, and the values are inherently associated by their column position.
* Querying the spreadsheet is easy and requires no additional metadata. For
example, compiling a list of all the employees under Bob can be done
automatically via a recursive macro that builds a collection by finding all
populated cells in each column with an "Employee" tag to the right of Bob's
(i.e., columns B through H), which end at row 10 since row 11 "belongs" to
the next employee at Bob's same hierarchical level. 
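The query described in the last bullet might be sketched as follows. This is a simplified, non-recursive scan rather than the recursive macro mentioned above, and the two-column grid is invented for illustration: it assumes Bob and Leroy are peers and that a manager's block of rows runs until the next populated cell in the "Manager" column. The function name `employees_under` is likewise hypothetical.

```python
# Hypothetical grid: row 0 holds the header tags; a manager's block of rows
# ends where the next populated cell in the Manager column begins.
GRID = [
    ["Manager", "Employee"],
    ["Bob",     "Mary"],
    ["",        "Frank"],
    ["Leroy",   "Stan"],
    ["",        "Wes"],
    ["",        "Brenden"],
]

def employees_under(grid, manager):
    """Collect populated Employee cells in the rows belonging to `manager`."""
    header, body = grid[0], grid[1:]
    mgr_col = header.index("Manager")
    # Only Employee columns to the right of the Manager column are searched.
    emp_cols = [i for i, tag in enumerate(header)
                if tag == "Employee" and i > mgr_col]
    found, inside = [], False
    for row in body:
        if row[mgr_col]:                  # a new manager's block starts here
            inside = (row[mgr_col] == manager)
        if inside:
            found.extend(row[c] for c in emp_cols if row[c])
    return found

print(employees_under(GRID, "Bob"))
print(employees_under(GRID, "Leroy"))
```

No metadata beyond the header row is consulted; the block boundary is inferred purely from which cells are populated.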

I realize this is a major paradigm shift -- from tagged lists to cellular
locations. I'd like to know what others think about this model.

Steve

-----Original Message-----
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com] 
Sent: Saturday, February 23, 2008 6:49 PM
To: Stephen-NHDS
Cc: Xml-Dev Listserv
Subject: Re: [xml-dev] Re: Ten Years Later - XML 1.0 Fifth Edition?

SBeller writes:

> An elegant solution for many situations is available if we shift 
> from a string-based language to a "positional-based" method. This 
> solution involves transforming XML documents into grid-based files 
> (such as spreadsheets), in which (a) the cells of each column are 
> populated with the element or attribute values sharing the same XML 
> name, (b) the columns are arranged in a manner that maintains 
> hierarchies, and (c) the values in each cell in a row are associated.

I'm a bit confused about this proposal.  One of XML's most valuable 
features is its ability to unify documents and data in the same framework. 
 I can see how to translate a list of potential hires into a spreadsheet, 
as you suggest.  How would I handle the XML documents that are, for 
example, their résumés?  The use of XML for structured documents is at 
least as important as for data;  indeed it's the combination that I think 
makes XML uniquely interesting.  I've never seen a spreadsheet that could 
do much more with documents than either extracting bits out of them, or 
maybe storing the text as blobs in cells.  How would a collection of 
resumes look in this form, presuming that the resumes had variable 
structure and lots of text? 

Certainly, spreadsheets are also a stretch for recursive hierarchies, even 
of data, and likewise I'm not sure how you represent the content 
corresponding to <xsd:choice>.  If I had an XML organization chart for my 
company, with a format like:

<manager name="bob">
  <employee name="mary"/>
  <manager name="sue">
    <employee name="tom"/>
  </manager>
</manager>

how would that map to your spreadsheet?  (Note that managers have a mix of 
employees and managers reporting to them at each level.)  Thank you.

Noah


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------


Stephen-NHDS <sbeller@nhds.com>
02/23/2008 11:40 AM
 
        To:     Xml-Dev Listserv <xml-dev@lists.xml.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        [xml-dev] Re: Ten Years Later - XML 1.0 Fifth Edition?


It seems to me that the very nature of the text-based markup method is
causing this Unicode incompatibility problem. An elegant solution for many
situations is available if we shift from a string-based language to a
"positional-based" method. This solution involves transforming XML documents
into grid-based files (such as spreadsheets), in which (a) the cells of each
column are populated with the element or attribute values sharing the same
XML name, (b) the columns are arranged in a manner that maintains
hierarchies, and (c) the values in each cell in a row are associated.

The resulting grid could then be queried easily and its contents formatted
based on their cellular positions. Any Unicode characters can be used in the
names and values, e.g., Excel can accommodate all Unicode 5.0 characters via
the code2000 font, as well as using a character's decimal code value in its
macros. And the grid could be saved as a delimited text file, without the
overhead of tags and tag-based parsing.

I realize this paradigm shift isn't easy for many to comprehend, but it can
be done and is worth exploration, imo.

Steve


