   Re: Whitespace

  • From: matt@wdi.disney.com (Matthew Fuchs)
  • To: digitome@iol.ie
  • Date: Thu, 28 Aug 97 9:21:33 PDT

> 
> 
> >>Won't our desperate Perl hackers' beloved $_ variable be significantly less
> >>useful if it contains the *entire* document?
> >
> >Just as it's not useful in processing HTML. Regexps that don't match across
> >line boundaries are the most common problem I've seen in HTML-processing
> >Perl scripts. Looks like that will continue until people figure out that
> >Perl's line "Feature" is just a bug when used with XML/HTML.
> >
> 
> Bang goes the notion of a lightweight XML app, then! Thou shalt always
> parse!
> 
Nonsense! Regexps that fail across line boundaries are only due to
lazy DPHs.  The "s" modifier to a regex makes "." match newlines too,
so the entire string (i.e., the document) is treated as a single
target.  The real problem is that _insignificant_ whitespace (a
newline) is treated as significant.  A regex modifier which treated
newlines, tabs, etc., as spaces would really help reduce this
problem. (Larry Wall doesn't follow this mailing list, does he?)
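
To make the point concrete, here is a rough sketch. It uses Python's
re module rather than Perl, on the assumption that re.DOTALL is a fair
stand-in for Perl's "s" modifier; the sample markup and variable names
are made up for illustration.

```python
import re

# A made-up fragment in which an element's content spans a line break.
doc = "<p>Some <em>emphasised\ntext</em> here.</p>"

# Without DOTALL, "." stops at the newline, so the element is missed.
line_bound = re.search(r"<em>(.*?)</em>", doc)

# With DOTALL (the counterpart of Perl's "s" modifier), "." also
# matches "\n", so the whole document behaves as a single target.
whole_doc = re.search(r"<em>(.*?)</em>", doc, re.DOTALL)

# Lacking a "treat all whitespace as spaces" modifier, writing \s+
# wherever a space is expected gives much the same effect by hand.
phrase = re.search(r"emphasised\s+text", doc)
```

The \s+ idiom is only a workaround: it has to be written at every
place a space may occur, which is exactly the tedium a dedicated
modifier would remove.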


> XML as a friendly format to, say, DPH needs some explaining. To use Perl to
> read/write XML you *must* use an XML parser. Indeed any tool intending to
> read/write XML needs to use a *fully blown parser* to get at the document.
> Bye bye the entire Unix family of line oriented text processing apps:-(
> 

Maybe you just need to put a filter at the beginning of your pipeline
to normalize whitespace to whatever you need.
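
Such a filter could be as small as the sketch below. It is written in
Python for illustration, and the function names are hypothetical. It
assumes that every run of whitespace in the input is insignificant and
may be collapsed to a single space, which will not hold for content
where whitespace matters.

```python
import re
import sys


def normalize_whitespace(text: str) -> str:
    """Collapse each run of spaces, tabs, and newlines into one space."""
    return re.sub(r"\s+", " ", text).strip()


def run_filter(src, dst) -> None:
    """Read the whole input, normalize it, and write it out as one line."""
    dst.write(normalize_whitespace(src.read()) + "\n")


# To use this as the first stage of a pipeline, call:
#     run_filter(sys.stdin, sys.stdout)
```

With the document reduced to a single line, the line-oriented tools
further down the pipeline see each document as one record.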

Matthew

-----------------------------------------------------
Matthew Fuchs
matt@wdi.disney.com
http://cs.nyu.edu/phd_students/fuchs
-----------------------------------------------------

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)

