OASIS Mailing List Archives


Re: [xml-dev] [OT] bugs in JDK regex engine ?

Thanks Mike, for your comments.

Below is a simple example I tried with JDK 1.6.0.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

// anything from '<' to '>', and not having '/'
Pattern pattern = Pattern.compile("<[^/]+>");
Matcher matcher = pattern.matcher(str);

while (matcher.find()) {
   String group = matcher.group();
   System.out.println(group);  // print each match
}

'str' is a String representation of an XML fragment.

I want to extract all pieces from the string (the tokens), which form
a start tag (including attribute parts).

I am expecting output:
<abc x='1'>
<pqr y='1'>

But the output produced by the above program is:
<root><abc x='1'>
<pqr y='1'>

As you can see, the first token is larger than expected: it also
includes <root>.

Can you, or anybody else, please help?
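
A possible explanation, offered as a sketch rather than a definitive
diagnosis: the class [^/] excludes only '/', so it also matches '<' and
'>'. The greedy '+' therefore runs past the '>' of <root> and backtracks
only as far as the last '>' before the first '/'. Excluding '<' and '>'
from the class as well keeps each match inside a single tag. The class
name StartTagDemo, the helper findAll, and the tightened pattern are
illustrative assumptions and not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; names here are hypothetical, not from the original post.
class StartTagDemo {

    static final String SAMPLE =
        "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

    // Collects every substring of 'input' matched by 'regex', in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Original pattern: [^/] still allows '<' and '>' inside a match.
        System.out.println(findAll("<[^/]+>", SAMPLE));
        // Tightened pattern (an assumption about the intent): also exclude '<' and '>'.
        System.out.println(findAll("<[^/<>]+>", SAMPLE));
    }
}
```

With the sample string, the first call reproduces the two tokens shown
above, and the second yields <root>, <abc x='1'> and <pqr y='1'>. The
tightened pattern still returns <root>, and it would miss any start tag
whose attribute value contains '/', so it is only a partial answer.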

On Feb 3, 2008 10:52 PM, Michael Kay <mike@saxonica.com> wrote:
> Saxon translates XML Schema and XPath regexes into JDK regexes, so it's
> pretty heavily dependent on the underlying regex engine. There are some
> cases where the behaviour is very incompletely specified, for example the
> effect of the "i" (case-blind) flag, but I've found very few cases where the
> expected behaviour is clear and the actual behaviour differs. In my
> experience, it's much more likely to be a user error.
> However, I think it might be stretching the (highly elastic) patience of
> this list to hold a discussion of JDK regex behaviour here.
> In any case, I think the whole concept of checking XML well-formedness using
> regular expressions is misguided, for the simple reason that (on theoretical
> grounds) regular expressions aren't up to the job.
> Michael Kay
> http://www.saxonica.com/

Mukul Gandhi


Copyright 1993-2007 XML.org. This site is hosted by OASIS