OASIS Mailing List Archives


Re: [xml-dev] MarkMail: now archiving xml-dev

Quoting Elliotte Rusty Harold <elharo@metalab.unc.edu>:

> Jason Hunter wrote:
> > I think the reason you *don't* see that is the inherent risk of letting 
> > someone else run arbitrary code on your server.  What if the user starts 
> > calculating Pi to 1,000,000,000 digits?

You don't need to let outsiders run "arbitrary" code.

> Perhaps we shouldn't have made XQuery Turing complete? (Side note: I'm 
> pretty sure XQuery is Turing complete. Has anyone proved it yet?)

Let's not even talk about XQuery. Do we talk about SQL in systems that have
SQL back ends? Normally the functionality is wrapped in other functions and
interfaces--- heck, these days, it seems most Java "programmers" could not
even write a line of SQL if they had to (they'd argue, of course, that they
don't have to). We must also be clear that XQuery is not "XQuery", especially
in the context of information retrieval (e.g. Full-Text). Beyond the observation
that SQL too is not "SQL", it would be foolish to promiscuously expose, despite
all the user controls, one's RDBMS to every Tom, Dick and Harry. One could
design an XQuery scripting extension that would be "safer" for anonymous
use (keep in mind that what looks "safe" is not always "safe" from malicious
users and bug exploits), but why bother? What's the benefit?
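
To make the "wrapped in other functions and interfaces" point concrete, here
is a minimal sketch of a fixed-shape search function over an SQL back end.
The table name (messages), its columns and the function name are hypothetical,
chosen only for illustration; the caller supplies values and never query text.

```python
import sqlite3

def search_messages(conn, sender=None, subject_contains=None, limit=20):
    """Fixed-shape search: callers supply values, never SQL text.

    The 'messages' table and its columns are assumptions for this sketch.
    """
    clauses, params = [], []
    if sender is not None:
        clauses.append("sender = ?")
        params.append(sender)
    if subject_contains is not None:
        clauses.append("subject LIKE ?")
        params.append(f"%{subject_contains}%")
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    # Cap the result size so a single request cannot ask for everything.
    params.append(max(1, min(int(limit), 100)))
    sql = f"SELECT id, sender, subject FROM messages{where} ORDER BY id LIMIT ?"
    # Values are bound as parameters, so they are never parsed as SQL.
    return conn.execute(sql, params).fetchall()
```

The only SQL that ever runs is the one template above; what looks "safe" here
still depends on the driver and the database behaving as documented.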

Functionality? This, I suggest, could be exposed via other means. One of my
own personal interests is to explore how one can expose the information
functionality (the will to retrieve "relevant" bits of information) in the
most naive and transparent manner. Since we have a completely flexible
unit of retrieval (not bound by "record" or any other unit defined at
index time) and the user might not understand or know the details of the
structural mark-up used to encode the information, we need to figure out
interfaces that get the user the information that's relevant to them.
Since the problem is not typically "individualistic" (there are classes of
common responses) one should be able to make do without user scripting.
The email archive case is really much, much easier since the user knows not
only much of the structure (subject, sender, etc. in the header; lines,
sentences, paragraphs and perhaps some attachments in the message body) but
also the semantic rules for content. "Relevant" retrieval
objects are nearly always the message in the context of the thread in its
temporal context (other messages that appeared in the list). The only hard bit
is to figure out what belongs in a thread--- we have Message-ids, but not
always, and we have changing subject lines.
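
A rough sketch of that thread-assignment problem, under stated assumptions:
messages arrive in date order as dicts with hypothetical keys "id",
"in_reply_to" and "subject". The sketch follows In-Reply-To when it points at
a known message and otherwise falls back to a normalized subject line. It is
a simplification for illustration, not how any particular archive does it.

```python
import re
from collections import defaultdict

# Reply/forward prefixes and bracketed list tags such as "[xml-dev]".
_PREFIX = re.compile(r"^\s*((re|fwd?|aw)\s*:\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    """Strip leading reply prefixes and list tags, then lowercase."""
    return _PREFIX.sub("", subject).strip().lower()

def group_threads(messages):
    """Return {thread root id: [message ids]} for messages in date order."""
    root_of = {}           # message id -> id of its thread root
    first_by_subject = {}  # normalized subject -> thread root that first used it
    for msg in messages:
        parent = msg.get("in_reply_to")
        key = normalize_subject(msg.get("subject", ""))
        if parent in root_of:
            # Explicit link to a message we have already seen.
            root = root_of[parent]
        elif key and key in first_by_subject:
            # No usable Message-id link: fall back to the subject line.
            root = first_by_subject[key]
        else:
            root = msg["id"]
        root_of[msg["id"]] = root
        first_by_subject.setdefault(key, root)
    threads = defaultdict(list)
    for message_id, root in root_of.items():
        threads[root].append(message_id)
    return dict(threads)
```

A reply whose subject line has changed stays in its thread as long as the
In-Reply-To link is present; the subject fallback only covers messages where
that link is missing, and it will wrongly merge unrelated messages that
happen to share a subject.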

> > What if they start consuming 
> > disk or thrashing the disk IO?  When you query against hundreds of gigs 
> > of content, you don't have to be malicious to mess things up.

It's not 100s of GB. Mailing lists are not that large.

> > 

> Or for a less constrained approach, try Amazon EC2. Run any code you 
> like on their servers.

That's what virtual machines, zones and some other bits and concepts are about.
It's not, however, needed, I think, for doing IR on XML. A lot of the
functionality of XQuery--- holding back from talking about XQuery--- is not
about the act of searching or retrieving information but about doing things
to it. A lot of this "functionality" need not be performed by the "in-the-know"

> Yes, it's challenging; but I suspect there's a real business model in 
> there somewhere. :-)

  E. Zimmermann, BSn/Munich R&D Unit
  Leopoldstrasse 53-55, D-80802 Munich,
  Federal Republic of Germany


Copyright 1993-2007 XML.org. This site is hosted by OASIS