OASIS Mailing List Archives

 



 


 

   Re: [xml-dev] HGRAB. Syndication. Google. Grey area.


 
 
Meerkat is cool. Really. I'm thinking
about a Meerkat - HGRAB gateway.
 
The few problems (two minor and
one big) I have with Meerkat are:
 
1. Not all RSS channels have a brief
description.
 
2. Many interesting websites don't
bother to provide RSS ( or robots.txt for
that matter ;-)
 
3. I want to syndicate what I want,
with the GUI that I want, and
not what 'they' provide me with.
 
To the best of my knowledge, for example,
W3C.org does not provide an RSS feed for its
website.
 
Actually, I can write a HGRAB -> Meerkat
gateway (so that W3C.org would become
a channel on Meerkat).
 
If somebody from Meerkat is
interested, I'd be glad to do that.
 
Again - you are right, that Meerkat
is a very nice thing. At the beginning
I was considering using it instead of
writing HGRAB. But now I think
that it is good to have both.
 
Rgds.Paul.
 
Sent: Wednesday, January 09, 2002 5:15 AM
Subject: RE: [xml-dev] HGRAB. Syndication. Google. Grey area.

Why not just use Meerkat? It's got a good selection
of XML channels, and you can customise it using
the 'Mobs' feature.
 
I assume it also builds a local index because you can
perform a search amongst the feeds.
 
RSS channels usually have a brief description of
an article as well as a title and channel. Isn't this
enough?
 
Cheers,
 
L.
 
-----Original Message-----
From: Paul T [mailto:pault12@pacbell.net]
Sent: 08 January 2002 21:15
To: xml-dev@lists.xml.org
Subject: [xml-dev] HGRAB. Syndication. Google. Grey area.

 
 
Some time ago I proposed that some knowledge
base about XML-related words could be created
so that people can submit/organize XML-related
words into some 'clusters'.
 
I decided that before writing such a system,
I should write some other system, that would
allow me to "automatically get all the XML-related
news".
 
So I've written a simple 'news feed', which
polls some news sources and syndicates
the 'news feed' for me.
 
The alpha  version of  this syndicator is
 
It is still incomplete and not all XML-related
sources are syndicated (I would greatly appreciate
any URLs of XML-news websites), but it should
give the idea.
 
I believe that in some cases it is
*very* convenient to get such a news feed,
rather than browse each website
or use Google. The problem with RSS / RDF
is that none of the RSS / RDF sources
that I've seen provides any information
other than the title and URL. That is just
'not enough'.
 
However, there is one fundamental
legal problem with current HGRAB 
design.
 
The first impression is: "it is suspicious,
because it is not you who is creating the
content" (that's why HGRAB strips
the original markup, so that the user is
forced to go to the original news source).
 
However, the more I think about this, the
more interesting it gets.
 
So, what does HGRAB actually do? It polls
the HTML pages (once in a while, so it adds
no real load to the original website). Then
it places some part of the content into the
HGRAB database (for future searching). Then
it provides the end-user with some 'part' of
the original news item and with the URL of
the original news source.
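
The steps just described (take a polled page, keep a
markup-free part of it in a database, hand back the
excerpt plus the source URL) can be sketched roughly
as below. This is only an illustration in Python, not
HGRAB's actual code (HGRAB is XSLT / SQL / Perl). The
table layout and the names strip_markup, store_item,
search and open_db are assumptions made up for the
example. The page HTML is passed in as a string, so
the polling step itself is left out.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content and drops all markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the text of an HTML fragment with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs left behind by the removed tags.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def open_db(path=":memory:"):
    """Open the (hypothetical) item store; in-memory by default."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, excerpt TEXT)"
    )
    return db


def store_item(db, url, html, max_chars=200):
    """Keep only a plain-text excerpt of the page, keyed by source URL."""
    excerpt = strip_markup(html)[:max_chars]
    db.execute(
        "INSERT OR REPLACE INTO items (url, excerpt) VALUES (?, ?)",
        (url, excerpt),
    )
    return excerpt


def search(db, word):
    """Return (url, excerpt) pairs whose excerpt contains the word."""
    rows = db.execute(
        "SELECT url, excerpt FROM items WHERE excerpt LIKE ? ORDER BY url",
        ("%" + word + "%",),
    )
    return rows.fetchall()
```

Every search hit carries the source URL, so the reader
still has to visit the original site for the full item.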
 
Google does *exactly* this (and Google also
provides a cached copy of the original content).
 
That means:
 
Either both HGRAB and Google should be sued,
because they both sell the content
*which does not belong to them*, or both
HGRAB and Google should be considered
'just a service'.
 
Add the (similar) legal problems that Napster
( and many other P2P networks ) ...
 
"What can you do with content not created by you"
is *really* a tricky question.
 
Is Google illegal ? I'm not a lawyer - I don't know.
 
My conclusion is that the Internet is a legal mess
and in the coming years there will be more work for
lawyers than for developers.
 
Whatever. I'm not a lawyer, so I will continue to
improve the syndicator (anyway, my goal is still the
'XML-words' knowledge base; HGRAB
was a side effect), so:
 
1. Do you know about some nice XML-news
( or 'general IT' ) websites other than:
 
Could you please drop me the URL?
 
2. If HGRAB looks interesting to you (in any way),
you're welcome to write to me. I'm thinking of making
it into a standalone product, but ... Thinking ...
Even in alpha, it already appears to work, and
adding a new 'news source' is usually a matter
of 15 minutes (that was the goal).
 
Rgds.Paul.
 
PS.
 
HGRAB is written in XSLT, XML Chunks,
SQL, Perl. When writing it, I found that
XSLT is *not* actually a good tool for
*processing* mixed content (XSLT is
good for *rendering* mixed content, which
is a different task). XPath axes are kinda
'orthogonal' to the 'good-old regular
expressions' machinery, and good-old regular
expressions work 'better'. (I can elaborate,
if somebody is interested.)
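
One possible reading of the remark about regular
expressions and mixed content, as a small Python
sketch (not HGRAB code; flatten and first_sentence
are names made up here). In mixed content a phrase
or sentence can run across element boundaries, so
the text is first flattened with a regex and then
matched as one string. The tag pattern is naive and
assumes no '>' inside attribute values.

```python
import re

# Naive tag matcher: anything between '<' and the next '>'.
TAG = re.compile(r"<[^>]+>")


def flatten(fragment):
    """Remove tags and normalise whitespace in a mixed-content fragment."""
    return re.sub(r"\s+", " ", TAG.sub("", fragment)).strip()


def first_sentence(fragment):
    """Return the first sentence of the flattened text."""
    text = flatten(fragment)
    # Shortest prefix ending in . ! or ? followed by space or end of text.
    match = re.match(r"(.+?[.!?])(?:\s|$)", text)
    return match.group(1) if match else text


fragment = (
    'W3C publishes <b>XML</b> <i>Schema</i> as a '
    '<a href="/TR/">Recommendation</a>. More follows.'
)
print(first_sentence(fragment))
```

Here the phrase "XML Schema" sits in two different
elements, yet a plain search on the flattened string
finds it without walking any axes.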
 
Smart SPAM Filter
http://www.spafi.com
 
 




 


Copyright 2001 XML.org. This site is hosted by OASIS