   Re: Whitespace

  • From: agreene@bitstream.com (Andrew Greene)
  • To: xml-dev@ic.ac.uk
  • Date: Thu, 28 Aug 1997 12:52:15 -0400

But perl doesn't have to break $_ on newlines. Whenever I do SGML
"parsing" with perl, I start off with

    $/ = "<";

which says "the input record separator is '<' instead of newline."
Then, within my while (<>) loop, each $_ contains a single tag and
the content that follows it, roughly matching the regexp:

    ($etagP, $gi, $attlist, $content) =
       /(\/?)(\w+)\s*([^>]*)\>(.*)/;

[For purists only: Yes, GIs can contain a different set of characters
than \w+, and attributes can contain > if it's enclosed in quotation
marks, and this doesn't chop off the '<' left at the end of every
record except the last one, and so on and so forth.... For SGML, it
assumes that the first character of ETAGO is the same as STAGO; for
XML, it doesn't handle the /> syntax... but it's simplified to make a
point.]
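
For anyone who wants to run it, here is a slightly fuller sketch of that loop. It reads from an in-memory string instead of <> (which assumes Perl 5.8 or later), and the name parse_records and the hash keys are made up for illustration. It departs from the one-liner in two ways: the regexp is anchored with ^ and given the /s flag so that content containing newlines is not cut short, and chomp strips the trailing '<' (chomp removes whatever $/ is set to). All the other caveats in the note above still apply.

```perl
use strict;
use warnings;

# Hypothetical helper: read $text in records that end at '<' and pull
# (end-tag flag, GI, attribute list, content) out of each record.
sub parse_records {
    my ($text) = @_;
    my @records;

    # In-memory filehandle; assumes Perl 5.8 or later.
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

    local $/ = "<";                 # record separator is '<', not newline
    while (my $rec = <$fh>) {
        chomp $rec;                 # strips the trailing '<', if present
        next unless $rec =~ /\S/;   # skip the empty record before the first '<'

        # /s lets (.*) run across newlines inside the content.
        if ($rec =~ /^(\/?)(\w+)\s*([^>]*)>(.*)/s) {
            push @records, {
                etag    => $1,
                gi      => $2,
                attlist => $3,
                content => $4,
            };
        }
    }
    close $fh;
    return @records;
}

my @r = parse_records("<p class=x>Hello</p>");
printf "%s%s [%s] '%s'\n", @{$_}{qw(etag gi attlist content)} for @r;
```

Records that don't start with something tag-like (a DOCTYPE declaration, a comment) simply fail the match and are dropped, which is crude but in keeping with the simplification.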

The point is that perl doesn't care whether you have whitespace or
not, and if your perl script is splitting on newlines then you're
probably not going to correctly handle tags that contain newlines,
such as

    <book
        id=TWENTYKDOWN
        authorid=VERNEJ
        pubid=PENGUIN
    ><title>20,000 Leagues Under the Sea</title
    ></book
    >
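
To make the difference concrete, here is a small sketch that runs both approaches over that fragment. The function names tags_by_line and tags_by_lt are hypothetical, and both use the same simplified idea of a tag as the regexp earlier, so treat it as an illustration rather than a parser.

```perl
use strict;
use warnings;

my $doc = <<'END';
<book
    id=TWENTYKDOWN
    authorid=VERNEJ
    pubid=PENGUIN
><title>20,000 Leagues Under the Sea</title
></book
>
END

# Line-oriented: only sees tags that open and close on the same line.
sub tags_by_line {
    my ($text) = @_;
    my @tags;
    for my $line (split /\n/, $text) {
        push @tags, $1 while $line =~ /<(\/?\w+)[^>]*>/g;
    }
    return @tags;
}

# '<'-oriented: each chunk starts right after a '<', so newlines
# inside the tag are just more characters before the '>'.
sub tags_by_lt {
    my ($text) = @_;
    my @tags;
    for my $chunk (split /</, $text) {
        push @tags, $1 if $chunk =~ /^(\/?\w+)[^>]*>/;
    }
    return @tags;
}

print "by line: ", join(" ", tags_by_line($doc)), "\n";
print "by '<':  ", join(" ", tags_by_lt($doc)), "\n";
```

The line-oriented version finds only the title start tag, since that is the one tag whose '<' and '>' share a line; splitting on '<' finds all four.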

- Andrew Greene

        
        
        


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)





 
