XML.org: OASIS Mailing List Archives

 


Re: [xml-dev] sets of parsing rules

Linux? (bash, Tcl, Perl, etc.)
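
To make that suggestion concrete, here is a minimal sketch of a pipeline that mixes non-XML (regexp) steps with XML steps and picks a different list of steps per page type. It is written in Python rather than shell so that it stays self-contained. The page type names ("legacy", "xhtml"), the helper names, and the keep_paragraphs transform are all hypothetical. The regexp that closes <br> tags is only a crude stand-in for a real tidy run, and the tree transform is a stand-in for an XSLT stylesheet.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Every step maps text to text, so XML and non-XML steps chain freely.
Step = Callable[[str], str]


def regex_step(pattern: str, repl: str) -> Step:
    """Non-XML step: a plain regular-expression substitution."""
    compiled = re.compile(pattern)
    return lambda text: compiled.sub(repl, text)


def xml_step(transform: Callable[[ET.Element], ET.Element]) -> Step:
    """XML step: parse the text, transform the tree, serialize it again."""
    def run(text: str) -> str:
        root = ET.fromstring(text)  # raises ET.ParseError if not well-formed
        return ET.tostring(transform(root), encoding="unicode")
    return run


def keep_paragraphs(root: ET.Element) -> ET.Element:
    """Hypothetical stand-in for a stylesheet: copy each <p> into a <doc>."""
    doc = ET.Element("doc")
    for p in root.iter("p"):
        para = ET.SubElement(doc, "para")
        para.text = "".join(p.itertext()).strip()
    return doc


# Hypothetical page types, each with its own ordered list of steps.
PIPELINES: Dict[str, List[Step]] = {
    "legacy": [
        regex_step(r"<br>", "<br/>"),  # crude stand-in for tidying
        xml_step(keep_paragraphs),
    ],
    "xhtml": [xml_step(keep_paragraphs)],
}


def run_pipeline(page_type: str, text: str) -> str:
    """Run every step registered for page_type, in order."""
    for step in PIPELINES[page_type]:
        text = step(text)
    return text
```

Calling run_pipeline("legacy", page_text) applies the regexp fix first and the XML step second. Supporting a new page type means adding one entry to PIPELINES, which keeps the per-type differences out of the rest of the application.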

Nathan Young -X (natyoung - Artizen at Cisco) wrote:
> Hi.
>
> I have seen parts of this question addressed, but I think it's worth
> asking the whole question anyway. I'm sure others have run into
> this problem, but I haven't been able to dig up any best practices in my
> searching so far.  I may just need to search with the right terminology,
> in which case this should be an easy one for someone who already
> knows...
>
> I have an application that parses a large number of HTML pages.  A few
> of them are well-formed XHTML, but that's the exception rather than the
> rule.  By grabbing pages, manipulating them a bit (regexps have been
> sufficient here so far), then tidying them, I can get them to a state
> where they are parsable XML.  From there I can use XSL to get them the
> rest of the way (although I have a process that allows me to run regexps
> here too, supplementing XSLT 1.0).
>
> The wrinkle is that I have several kinds of pages, each one requiring a
> distinct set of steps in order to parse it.  I'm starting down the road
> of modularizing the transforms so that I can handle more page types over
> time in a way that's transparent to the rest of my application.
>
> I've been exposed to XML-only pipelines. Are there pipeline tools that
> allow for non-XML steps?
>
> ------------>Nathan
>
> Nathan Young
> Cisco.com->Interface Development
> A: ncy1717
> E: natyoung@cisco.com 
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.

