RE: [xml-dev] ANN: the first million prime numbers in XML format


From: "Rushforth, Peter" <Peter.Rushforth@NRCan-RNCan.gc.ca>
To: "Costello, Roger L." <costello@mitre.org>
Date: Sun, 15 Jul 2012 09:26:01 +0000

Roger,

Just browsing some old xml-dev archive footage :-), and I came across a nice post on a similar topic by Steve DeRose circa 2004:

http://lists.xml.org/archives/xml-dev/200412/msg00205.html

He makes three good points.

First, you should reduce the size of the pages of prime numbers so that they can be consumed on small devices, like cell phones. IOW use REST.

Second:

> 2: If you want the data formatted by CSS or XSL-FO, or transformed by
> XSLT, or whatever, having all the data in one syntax that the
> applications *already* know about is much easier than rewriting the
> applications or working around them to add some syntax (like commas)
> that they *don't* know about. You'll never have to debug the XML
> parser you use to parse all those "<data>" tags, but you will spend a
> lot of time if you try to introduce a new syntax in your process.

So the advantage goes to the existing parser. I believe this also applies to the hypermedia affordances used to drive the interactions, so perhaps I was wrong about réfh etc.; I should have used href :-).

Finally, regarding the use of xml, json or plain text:
> 3: Any text file that contains zillions of instances of a certain
> string, is necessarily very compressible. The first thing a
> compression program will do is discover that "<data>" is real common,
> and assign it a really short code. A comma-delimited file is
> inherently less compressible.

He even goes on to demonstrate that the XML is highly compressible. In short, the advantage lies in having a parser already built for XML. Now if only that parser would provide the application/user with the affordances it needs to change state.
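
For anyone who wants to check the compressibility point themselves, here is a minimal sketch in Python. The function names (primes_below, compression_report), the <primes>/<prime> element names and the choice of zlib at level 9 are my own assumptions for illustration; this is not the measurement from the 2004 post.

```python
import zlib


def primes_below(limit):
    """Return all primes below `limit` using a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def compression_report(primes):
    """Map each format name to (raw bytes, compressed bytes, compression ratio)."""
    # Hypothetical element names, chosen only for this comparison.
    xml = "<primes>\n" + "".join(f"<prime>{p}</prime>\n" for p in primes) + "</primes>\n"
    csv = ",".join(str(p) for p in primes) + "\n"
    report = {}
    for name, text in (("xml", xml), ("csv", csv)):
        raw = text.encode("ascii")
        packed = zlib.compress(raw, 9)
        report[name] = (len(raw), len(packed), len(raw) / len(packed))
    return report


if __name__ == "__main__":
    for name, (raw, packed, ratio) in compression_report(primes_below(100000)).items():
        print(f"{name}: {raw} bytes raw, {packed} bytes compressed, ratio {ratio:.1f}")
```

Note that the ratio only says how much of the markup overhead the compressor removes; it does not by itself say which compressed file ends up smaller, so it is worth looking at the absolute sizes the script prints as well.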

Peter
________________________________________
From: Rushforth, Peter [Peter.Rushforth@NRCan-RNCan.gc.ca]
Sent: July 10, 2012 4:33 PM
To: Costello, Roger L.
Cc: xml-dev@lists.xml.org
Subject: RE: [xml-dev] ANN: the first million prime numbers in XML format

Hi Roger,

Interesting thread, with lots of opportunities ;-).  Unfortunately, no luck with the lottery :-<

Petite Abeille wrote:
> Roger Costello wrote:
> > So I created two XML documents, collectively containing the
> > first million prime numbers.
> > ...
> > Be patient. They are large files (10 MB and 11 MB, respectively)
>
> Nice… but… XML?!?
>
> <prime>189149</prime>?!? Seriously?!?
>
> Wouldn't a simple text file be more than sufficient?!?
>
> E.g.:
>
> http://www.mathsisfun.com/numbers/prime-number-lists.html
>
> If you only have a hammer...

> David Lee wrote:
> > I personally don't see anything wrong with putting lists of
> > numbers in XML format.
>
> Fair enough… I suspect though this is going to turn into the
> poster boy of all what's wrong with XML… but as you said, to
> each his own :))

C:\>curl http://www.xfront.com/second-500000-primes.xml > second-500000-primes.xml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 93 10.8M   93 10.1M    0     0  32034      0  0:05:53  0:05:30  0:00:23 50808

phew.  *That* was painful.

Why not create a "web service" which calculates the "next" page of prime numbers?
Then you could use content negotiation to specify the format (XML, JSON, text, HTML, other), the language (en, fr, other) and the compression (gzip, other). And if you used a hypermedia format, say "application/vnd.xfront.com.prime-numbers", you could use REST to keep retrieving prime numbers forever, if you wanted.
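
To make the content negotiation idea a little more concrete, here is a rough Python sketch of how a server might choose a representation from the Accept header. The names parse_accept, matches, specificity, negotiate and the SUPPORTED list are hypothetical, and the matching is deliberately simplified (it honours q-values and wildcards, but ignores media type parameters other than q).

```python
# Representations this hypothetical service can produce, in order of preference.
SUPPORTED = ["application/xml", "application/json", "text/plain", "text/html"]


def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header value."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    """True if a range such as 'text/*' or '*/*' covers the given media type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def specificity(media_range):
    """Rank ranges: exact type (2) beats 'type/*' (1) beats '*/*' (0)."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def negotiate(header, supported=SUPPORTED):
    """Return the supported type with the highest q, or None if none is acceptable."""
    ranges = parse_accept(header)
    best_type, best_q = None, 0.0
    for media_type in supported:
        candidates = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not candidates:
            continue
        _, q = max(candidates)  # the most specific matching range decides q
        if q > best_q:  # ties keep the earlier (preferred) supported type
            best_type, best_q = media_type, q
    return best_type
```

With an Accept value of "application/xml;q=0.9,*/*;q=0.8" this sketch picks application/xml; a real service would do the same kind of selection for Accept-Language and Accept-Encoding.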

I suggest interactions might look like:

GET http://www.xfront.com/numbers/prime?last=7368743 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11
Accept: application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

HTTP/1.1 200 OK
Date: Tue, 10 Jul 2012 20:22:59 GMT
Server: Apache
Connection: Keep-Alive
Content-Type: application/vnd.xfront.com.prime-numbers+xml; charset=UTF-8

<?xml version="1.0" encoding="UTF-8"?>
<prime-numbers>
  <link rélation="next" réfh="http://www.xfront.com/numbers/prime?last=15485863" genremime="application/vnd.xfront.com.prime-numbers" langueh="en"/>
  <link rélation="next" réfh="http://www.xfront.com/numbers/prime?last=15485863" genremime="application/vnd.xfront.com.prime-numbers+json" langueh="en"/>
  <link rélation="next" réfh="http://www.xfront.com/numbers/prime?last=15485863" genremime="text/html" langueh="en"/>
  <!-- if you prefer, and understand the mime type, use text. Is there just a CR, a CRLF or only a LF, or nothing(?) separating those numbers? -->
  <link rélation="prochaine" réfh="http://www.xfront.com/numbers/prime?last=15485863" genremime="text/plain"/>
  <prime>7368791</prime>
    ...
  <prime>15485863</prime>
</prime-numbers>

GET http://www.xfront.com/numbers/prime?last=15485863

etc.
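
A server-side sketch of the same idea, in Python, might look like the following. Everything here is an assumption for illustration: the function names (is_prime, primes_after, render_page), the default page size of 10, and the plain rel/href attribute names on the link element. The endpoint is only the hypothetical URL from the interaction above, and trial division is used purely to keep the example short.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical endpoint, taken from the example interaction above.
BASE_URL = "http://www.xfront.com/numbers/prime"


def is_prime(n):
    """Trial division; adequate for a sketch, too slow for a real service."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def primes_after(last, count):
    """Return the next `count` primes strictly greater than `last`."""
    found = []
    candidate = last + 1
    while len(found) < count:
        if is_prime(candidate):
            found.append(candidate)
        candidate += 1
    return found


def render_page(last, count=10):
    """Render one page of primes plus a 'next' link (count must be at least 1)."""
    primes = primes_after(last, count)
    # The next page starts after the last prime on this page.
    next_href = f"{BASE_URL}?last={primes[-1]}"
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<prime-numbers>",
        f'  <link rel="next" href={quoteattr(next_href)}/>',
    ]
    lines += [f"  <prime>{p}</prime>" for p in primes]
    lines.append("</prime-numbers>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_page(7, count=4))
```

The client never has to compute anything: it just follows the "next" link from each page to get the following one, and the page size stays a server-side decision.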

Maybe you should cut down the size of the pages a bit though.  Your thoughts?

Regards,
Peter


References:
- ANN: the first million prime numbers in XML format
  - From: "Costello, Roger L." <costello@mitre.org>
- Re: [xml-dev] ANN: the first million prime numbers in XML format
  - From: Petite Abeille <petite.abeille@gmail.com>
- RE: [xml-dev] ANN: the first million prime numbers in XML format
  - From: "Rushforth, Peter" <Peter.Rushforth@NRCan-RNCan.gc.ca>
