   Re: XML-DEV JEWELS (was : XML-DEV on Groves)

  • From: Arjun Ray <aray@q2.net>
  • To: xml-dev@xml.org
  • Date: Sat, 12 Feb 2000 15:45:31 -0500 (EST)



On 12 Feb 2000, Thierry Bezecourt wrote:

> From what I have learned [...] groves are about hierarchical data
> structures and addressing nodes in these structures, so they seem
> ideal for a mailing list archive.

Having tackled the problem before in various lo-tech ways, I haven't
found much hierarchic structure that was useful, as opposed to merely
organizational.  Unlike usenet posts, mail messages lack a References:
header (the In-Reply-To:, when it's there, is much too bogotically
variable) so you don't get the benefit of threading.
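
To make the threading problem concrete, here is a minimal sketch, assuming Python's standard `email` module. The function name `parent_id` and the "take the first <...> token" rule for In-Reply-To: are illustrative assumptions, not anything the list software does.

```python
import re
from email import message_from_string

# A Message-ID looks like <local@domain>; grab anything of that shape.
MSGID = re.compile(r"<[^<>\s]+>")

def parent_id(raw_message):
    """Return a best guess at the parent Message-ID, or None."""
    msg = message_from_string(raw_message)
    # References: lists ancestors oldest-first, so the last entry is the parent.
    refs = MSGID.findall(str(msg.get("References", "")))
    if refs:
        return refs[-1]
    # In-Reply-To: is free-form in practice ("Your message of ... <id>"),
    # so this heuristic just takes the first <...> token it can find.
    in_reply_to = MSGID.findall(str(msg.get("In-Reply-To", "")))
    return in_reply_to[0] if in_reply_to else None
```

When neither header yields an id the function returns None, which is exactly the case where a thread link would have to come from somewhere else (subject matching, or human feedback).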

[However, it might be a basis for a collaborative effort, where the
grove is "grown" over time with feedback on threading links - say a
forms-based adjunct to a Hypermail/MHonArc-style interface, driven by
a grove-aware engine doing smarter things than just spitting out the 
contents of an overview.fmt database.]

> To do that, if I'm correct, we would have to define a property set
> for mailing lists, where articles would be nodes, header fields
> would be properties of these nodes, the "References" header field
> would be used for links to other articles, and the "contents"
> would be the body of the article, which could contain links.

My limited understanding of groves tells me that the key is the grove
plan - which basically determines the amount of analytic granularity
one wants or needs to work with.  (E.g. an article would be a node,
but how "high" or "low" in the hierarchy?)  Maximum flexibility needs
an exhaustive/detailed property set as the basis.
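
As a rough illustration of how the plan changes what you get, here is a sketch in Python. It is not grove or property-set syntax; the `Node` class, the `plan` argument and its "coarse"/"fine" values are all hypothetical names, and it assumes single-part plain-text messages.

```python
from dataclasses import dataclass, field
from email import message_from_string

@dataclass
class Node:
    kind: str                                   # e.g. "article", "paragraph"
    props: dict = field(default_factory=dict)   # property name -> value
    children: list = field(default_factory=list)

def build_article(raw, plan="coarse"):
    """Build an article node; `plan` picks the analytic granularity."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        raise ValueError("sketch handles single-part messages only")
    # Header fields become properties (repeated headers overwrite each other here).
    node = Node("article", {name.lower(): value for name, value in msg.items()})
    body = msg.get_payload()
    if plan == "coarse":
        # Coarse plan: the whole body is one opaque "contents" property.
        node.props["contents"] = body
    else:
        # Finer plan: each blank-line-separated paragraph is its own child node.
        for para in body.split("\n\n"):
            if para.strip():
                node.children.append(Node("paragraph", {"text": para.strip()}))
    return node
```

The same message yields a single leaf under the coarse plan and a small subtree under the fine one, which is the "how high or low" question in miniature.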

> It does not seem very difficult.

Well, it has been my experience that reliably extracting fine-grained
material from mail messages is very difficult.  (Just think of the
variety of quoting habits/conventions.)  For comparison, look at:

 (1) the monthly aggregations of messages, in UNIX mbox format, from
     the majordomo bot at IC where this list was.
 (2) Erik Naggum's old archive of usenet posts to comp.text.sgml,
     already preprocessed into an SGML format at
        ftp://ftp.ifiuio.no/pub/SGML/comp.text.sgml
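
For the mbox aggregations in (1), the coarse level is at least easy to get at. A minimal sketch, assuming Python's standard `mailbox` module; the function name `list_articles` and the choice of fields are illustrative only.

```python
import mailbox

def list_articles(mbox_path):
    """Return (Message-ID, Subject) pairs for every message in an mbox file."""
    # create=False: fail on a missing file instead of silently creating one.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(m.get("Message-ID"), m.get("Subject")) for m in box]
    finally:
        box.close()
```

Header-level properties fall out almost for free this way; it is everything below the header (quoting, signatures, inline replies) that resists reliable extraction.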

Care to develop a good property set? :)


Arjun






 
