Re: [xml-dev] [OT] bugs in JDK regex engine ?


From: "Mukul Gandhi" <gandhi.mukul@gmail.com>
To: xml-dev@lists.xml.org
Date: Mon, 4 Feb 2008 09:27:58 +0530

Thanks Mike, for your comments.

Below is a simple example I tried with JDK 1.6.0.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

// intended: anything from '<' to '>' that does not contain '/'
Pattern pattern = Pattern.compile("<[^/]+>");
Matcher matcher = pattern.matcher(str);

while (matcher.find()) {
   String group = matcher.group();
   System.out.println(group);
}

'str' is a String representation of an XML fragment.

I want to extract from the string every token that forms a start tag
(including its attributes).

I expect this output:
<root>
<abc x='1'>
<pqr y='1'>

But the output produced by the above program is:
<root><abc x='1'>
<pqr y='1'>

Notice that the first token is larger than expected: it contains two
start tags instead of one.

Can you or anybody else please help?
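
One possible reading of this result, offered as a sketch rather than a confirmed diagnosis: the class [^/] excludes only '/', so it is also free to consume '>' and '<'. The greedy '+' therefore runs on from "<root" past the first '>' and stops only at the '/' of "</abc>", then backtracks to the last '>' it passed. Excluding '<' and '>' from the class as well keeps each match inside a single tag. The class name StartTagTokens and the helper startTags below are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartTagTokens {
    // Assumption: excluding '<' and '>' in addition to '/' is enough for this
    // sample input; this is not a general XML tokenizer.
    private static final Pattern START_TAG = Pattern.compile("<[^/<>]+>");

    // Hypothetical helper: collects every substring matched by START_TAG.
    public static List<String> startTags(String xml) {
        List<String> tags = new ArrayList<String>();
        Matcher matcher = START_TAG.matcher(xml);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
        for (String tag : startTags(str)) {
            System.out.println(tag);
        }
    }
}
```

For the sample string this prints <root>, <abc x='1'> and <pqr y='1'> on separate lines. The pattern still rejects any tag containing '/', so empty-element tags such as <a/> and attribute values such as href='a/b' are skipped, and an attribute value containing '>' would be cut short.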

On Feb 3, 2008 10:52 PM, Michael Kay <mike@saxonica.com> wrote:
> Saxon translates XML Schema and XPath regexes into JDK regexes, so it's
> pretty heavily dependent on the underlying regex engine. There are some
> cases where the behaviour is very incompletely specified, for example the
> effect of the "i" (case-blind) flag, but I've found very few cases where the
> expected behaviour is clear and the actual behaviour differs. In my
> experience, it's much more likely to be a user error.
>
> However, I think it might be stretching the (highly elastic) patience of
> this list to hold a discussion of JDK regex behaviour here.
>
> In any case, I think the whole concept of checking XML well-formedness using
> regular expressions is misguided, for the simple reason that (on theoretical
> grounds) regular expressions aren't up to the job.
>
> Michael Kay
> http://www.saxonica.com/


-- 
Regards,
Mukul Gandhi

