   Re: Data warehousing and XML

  • From: len bullard <cbullard@hiwaay.net>
  • To: Paul Prescod <papresco@technologist.com>
  • Date: Tue, 02 Dec 1997 18:28:05 -0600

Paul Prescod wrote:

> I don't doubt that there are some people in the world who want to "mine"
> documents, but I think that they are in the minority, and will be for a
> long time. But more important, it makes little sense to me to "mine" XML
> data. Even if you wanted to mine your structured document data it will
> almost always make sense to load that into the mining tool's internal
> data structures.

Umm... that was actually one of the most often requested 
capabilities when I was still working on SGML systems.  The 
problem was precisely that a great deal of the *interesting* 
information was not in relational databases.  Comparative 
policy analysis is one example.
 
> Once again, XML is great as the transfer format, but when you get down
> to doing your queries, your data mining software should not be parsing
> the XML syntax.

OK.  But then what were the various proposals over the 
years for SGML querying systems meant to do?

> > However, let me ask a technical
> > question that you can probably answer with a deeper
> > technical perspective than mine?  How well can one query
> > data (or convert it for that matter) for which one
> > has no rigorous schema (of some kind)?
> 
> In some cases you can do sophisticated queries on data without a schema,
> but you would have to jump through AI hoops. It's not a job I would
> apply for, but neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.

That is what I thought.  When we were building the GE CASS 
system, we bounced around the idea of using DTDs as a sort 
of reversed query: the DTD gave us a way to figure out 
what kinds of queries would be interesting.  We never 
pursued the idea because the SGML systems of that time 
were fairly primitive.
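
To make the reversed-query idea concrete, here is a minimal sketch in 
Python.  It is not what we built on CASS (we never built it); the 
sample DTD, the function names, and the regex-based parsing are all 
assumptions for illustration, and a real DTD with parameter entities 
or attribute lists would need a proper parser.  It reads the element 
declarations and lists the parent/child paths the DTD permits, each of 
which is a candidate query.

```python
import re

# Hypothetical DTD, invented for this sketch.
SAMPLE_DTD = """
<!ELEMENT policy (title, section+)>
<!ELEMENT section (title, para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""

# Matches <!ELEMENT name content-model>; assumes no '>' inside the model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+(.*?)>", re.DOTALL)
NAME = re.compile(r"[A-Za-z_][\w.-]*")


def child_map(dtd):
    """Map each declared element to the child elements its model names."""
    children = {}
    for name, model in ELEMENT_DECL.findall(dtd):
        model = re.sub(r"#\w+", "", model)  # drop keywords such as #PCDATA
        names = [n for n in NAME.findall(model) if n not in ("EMPTY", "ANY")]
        children[name] = list(dict.fromkeys(names))  # dedupe, keep order
    return children


def candidate_paths(children, root, prefix=()):
    """Yield every path from root that the content models allow."""
    path = prefix + (root,)
    yield "/".join(path)
    for child in children.get(root, []):
        if child not in path:  # skip recursive models to avoid looping
            yield from candidate_paths(children, child, path)


if __name__ == "__main__":
    for p in candidate_paths(child_map(SAMPLE_DTD), "policy"):
        print(p)
```

Each printed path (policy/section/para, for instance) is a place a 
user could sensibly ask a question, which is all "reversed query" 
meant: the schema tells you what is askable before you look at any 
documents.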

len

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
