Re: [xml-dev] [OT] bugs in JDK regex engine ?
- From: "Mukul Gandhi" <gandhi.mukul@gmail.com>
- To: xml-dev@lists.xml.org
- Date: Mon, 4 Feb 2008 09:27:58 +0530
Thanks Mike, for your comments.
Below is a simple example I tried with JDK 1.6.0.
String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
// anything from '<' to '>', and not having '/'
Pattern pattern = Pattern.compile("<[^/]+>");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
String group = matcher.group();
System.out.println(group);
}
'str' is a String representation of an XML fragment.
I want to extract every token in the string that forms a start tag
(including its attributes).
I am expecting this output:
<root>
<abc x='1'>
<pqr y='1'>
But the output produced by the above program is:
<root><abc x='1'>
<pqr y='1'>
Notice that the 1st token is longer than expected: it spans two start tags.
Can you or anybody else please help?
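
A likely explanation, offered as a sketch rather than a definitive fix: the class [^/] excludes only '/', so it also matches '>' and '<'. The greedy '+' therefore runs from "<root" up to the first '/' and then backtracks to the last '>' before it, which gives "<root><abc x='1'>". Excluding '<' and '>' from the class as well should produce the expected tokens. The class name StartTagExtractor and the method extractStartTags below are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartTagExtractor {

    // Assumption: start tags contain no '/', '<' or '>' anywhere,
    // including inside attribute values. [^/<>] excludes all three,
    // so a match can no longer run across a tag boundary.
    private static final Pattern START_TAG = Pattern.compile("<[^/<>]+>");

    // Hypothetical helper: collects every match of START_TAG in order.
    public static List<String> extractStartTags(String xml) {
        List<String> tags = new ArrayList<String>();
        Matcher matcher = START_TAG.matcher(xml);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
        for (String tag : extractStartTags(str)) {
            System.out.println(tag);
        }
    }
}
```

With the sample string this prints <root>, <abc x='1'> and <pqr y='1'> on separate lines. It will still miss start tags whose attribute values contain '/' or '>' (a URL, for instance) and it skips empty-element tags such as <br/>, so it is a tokenizing shortcut for simple input, not a substitute for an XML parser.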
On Feb 3, 2008 10:52 PM, Michael Kay <mike@saxonica.com> wrote:
> Saxon translates XML Schema and XPath regexes into JDK regexes, so it's
> pretty heavily dependent on the underlying regex engine. There are some
> cases where the behaviour is very incompletely specified, for example the
> effect of the "i" (case-blind) flag, but I've found very few cases where the
> expected behaviour is clear and the actual behaviour differs. In my
> experience, it's much more likely to be a user error.
>
> However, I think it might be stretching the (highly elastic) patience of
> this list to hold a discussion of JDK regex behaviour here.
>
> In any case, I think the whole concept of checking XML well-formedness using
> regular expressions is misguided, for the simple reason that (on theoretical
> grounds) regular expressions aren't up to the job.
>
> Michael Kay
> http://www.saxonica.com/
--
Regards,
Mukul Gandhi