Why does validation fail with a named ENTITY for carriage return and line feed?

Hi Folks,

I want to specify the format of a "From:" field for email messages. The requirement is:

1. It starts with the literal "From:"
2. Then there are one or more characters, a - z
3. Then the @ symbol
4. Then there are one or more characters, a - z or a period
5. Then there is a carriage return (decimal 13) followed by a line feed (decimal 10)

A regular expression in the XML Schema pattern facet is well-suited for expressing that requirement:

    <xs:element name="from">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="From:[a-z]+@[a-z\.]+&#13;&#10;"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

Great.

Here is a sample instance document:

<from>From:jdoe@machine.example&#13;&#10;</from>

That validates beautifully against the XML Schema.
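
The same check can be approximated outside a schema validator. Here is a minimal sketch using Python's re module, which is not the XML Schema regex dialect but should agree with it for a pattern this simple. The names FROM_PATTERN and is_valid_from are made up for illustration. re.fullmatch is used because a pattern facet has to match the whole value.

```python
import re

# The pattern facet, with the character references resolved to CR and LF.
FROM_PATTERN = re.compile(r"From:[a-z]+@[a-z\.]+\r\n")


def is_valid_from(value: str) -> bool:
    """Return True if the whole value matches the From: pattern."""
    return FROM_PATTERN.fullmatch(value) is not None


print(is_valid_from("From:jdoe@machine.example\r\n"))  # True
print(is_valid_from("From:jdoe@machine.example"))      # False (no CRLF)
```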

Now, many email fields must end with CRLF so I declared an XML ENTITY that I can reuse:

<!ENTITY CRLF "&#13;&#10;">

I then changed the pattern facet to reference the named ENTITY:

    <xs:element name="from">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="From:[a-z]+@[a-z\.]+&CRLF;"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

When I validate the above instance document I get this error:

    The content "From:jdoe@machine.example\r\n" 
    of element <from> does not match the required 
    simple type. Value "From:jdoe@machine.example\r\n" 
    contravenes the pattern facet "From:[a-z]+@[a-z\.]+  " 
    of the type of element <from>.

Huh? 

What's going on? 

Why does the instance document validate when the character references are written directly in the pattern facet, but fail validation when a named ENTITY is used in the pattern facet?

The problem is not with the XML Schema validator. The problem is at a lower level. The problem is with the XML Parser.

Look again at the pattern facet:

<xs:pattern value="From:[a-z]+@[a-z\.]+&CRLF;"/>

Ignore the fact that it is XML Schema stuff. It is XML. We have an element <xs:pattern> and it has one attribute, value, which has this value: From:[a-z]+@[a-z\.]+&CRLF;

What does an XML parser do to attribute values? Answer: it normalizes attribute values. (http://www.w3.org/TR/REC-xml/#AVNormalize) 

The XML normalization algorithm says this:

    For an entity reference, recursively apply step 3 
    of this algorithm to the replacement text of the entity.

Okay, let's replace &CRLF; with its replacement text:

<xs:pattern value="From:[a-z]+@[a-z\.]+&#13;&#10;"/>

The normalization algorithm then says:

    For a white space character (#x20, #xD, #xA, #x9), 
    append a space character (#x20) to the normalized value.

Okay, that yields:

<xs:pattern value="From:[a-z]+@[a-z\.]+  "/>

Note the two spaces at the end of the regular expression.
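
To see why those two spaces matter, here is a small sketch, again using Python's re module as a stand-in for the schema regex engine (an assumption; the variable names are illustrative only). A pattern that ends in two spaces cannot match a value that ends in CR LF.

```python
import re

value = "From:jdoe@machine.example\r\n"

# Pattern with CR and LF at the end, as intended.
with_crlf = re.compile(r"From:[a-z]+@[a-z\.]+\r\n")
# Pattern with two spaces at the end, as shown in the error message.
with_spaces = re.compile(r"From:[a-z]+@[a-z\.]+  ")

print(with_crlf.fullmatch(value) is not None)    # True
print(with_spaces.fullmatch(value) is not None)  # False
```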

So normalization of this:

<xs:pattern value="From:[a-z]+@[a-z\.]+&CRLF;"/>

produces this:

<xs:pattern value="From:[a-z]+@[a-z\.]+  "/>

Hold on! 

Why doesn't this:

<xs:pattern value="From:[a-z]+@[a-z\.]+&#13;&#10;"/>

also normalize to this:

<xs:pattern value="From:[a-z]+@[a-z\.]+  "/>

I'm confused. Why does validation fail with a named ENTITY and succeed with character references?
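
The asymmetry can be reproduced with an XML parser alone, with no schema validator involved. The sketch below uses Python's xml.etree.ElementTree (an assumption, since it is not the validator that produced the error above) on a made-up one-element document with two attributes: one ends with the named ENTITY and the other with the character references. It only shows whether CR and LF survive in each case; the exact number of spaces that replace them may depend on the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: the internal DTD subset declares CRLF,
# and the two attributes end with the two spellings of the line ending.
DOC = (
    '<!DOCTYPE pattern [<!ENTITY CRLF "&#13;&#10;">]>'
    '<pattern named="x&CRLF;" numeric="x&#13;&#10;"/>'
)

root = ET.fromstring(DOC)
named = root.get("named")      # attribute written with &CRLF;
numeric = root.get("numeric")  # attribute written with &#13;&#10;

print(repr(named))
print(repr(numeric))
print("CR/LF kept with named entity:", "\r" in named or "\n" in named)
print("CR/LF kept with char references:", numeric.endswith("\r\n"))
```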

/Roger



Copyright 1993-2007 XML.org. This site is hosted by OASIS