Re: [xml-dev] Schematron: Categories of Usage?

From: "bryan rasmussen" <rasmussen.bryan@gmail.com>
To: "Costello, Roger L." <costello@mitre.org>
Date: Mon, 22 Jan 2007 13:14:15 +0100
Algorithmic checking:

The following validates EAN location numbers, following the
algorithm described here
http://www.ean.dk/EAN_Sys/helpdesk/faq/kntrlcif.htm#EAN%20Lokationsnummer
(sorry, it's in Danish):

<sch:rule context="*[@schemeID]">
    <sch:report test="@schemeID='EAN' and string-length(.) != 13">
WARNING: EAN numbers are 13 digits in length
</sch:report>
<sch:report test="@schemeID='EAN' and . != (. + 1) - 1">
WARNING: EAN numbers must be numeric</sch:report>
<sch:report test="@schemeID='EAN' and substring(.,13,1)!=0 and ((((10
- substring((substring(.,1,1) * 1 + substring(.,2,1) * 3) +
(substring(.,3,1) * 1 + substring(.,4,1) * 3) + (substring(.,5,1) * 1
+ substring(.,6,1) * 3) + (substring(.,7,1) * 1 + substring(.,8,1) *
3) + (substring(.,9,1) * 1 + substring(.,10,1) * 3) +
(substring(.,11,1) * 1 + substring(.,12,1) *
3),string-length((substring(.,1,1) * 1 + substring(.,2,1) * 3) +
(substring(.,3,1) * 1 + substring(.,4,1) * 3) + (substring(.,5,1) * 1
+ substring(.,6,1) * 3) + (substring(.,7,1) * 1 + substring(.,8,1) *
3) + (substring(.,9,1) * 1 + substring(.,10,1) * 3) +
(substring(.,11,1) * 1 + substring(.,12,1) * 3)),1)) +
((substring(.,1,1) * 1 + substring(.,2,1) * 3) + (substring(.,3,1) * 1
+ substring(.,4,1) * 3) + (substring(.,5,1) * 1 + substring(.,6,1) *
3) + (substring(.,7,1) * 1 + substring(.,8,1) * 3) + (substring(.,9,1)
* 1 + substring(.,10,1) * 3) + (substring(.,11,1) * 1 +
substring(.,12,1) * 3))) - ((substring(.,1,1) * 1 + substring(.,2,1) *
3) + (substring(.,3,1) * 1 + substring(.,4,1) * 3) + (substring(.,5,1)
* 1 + substring(.,6,1) * 3) + (substring(.,7,1) * 1 + substring(.,8,1)
* 3) + (substring(.,9,1) * 1 + substring(.,10,1) * 3) +
(substring(.,11,1) * 1 + substring(.,12,1) * 3))) != substring(.,13,1)
)">
there is an improperly formatted EAN number.
</sch:report>
<sch:report test="@schemeID='EAN' and substring(.,13,1) =0 and
substring((substring(.,1,1) * 1 + substring(.,2,1) * 3) +
(substring(.,3,1) * 1 + substring(.,4,1) * 3) + (substring(.,5,1) * 1
+ substring(.,6,1) * 3) + (substring(.,7,1) * 1 + substring(.,8,1) *
3) + (substring(.,9,1) * 1 + substring(.,10,1) * 3) +
(substring(.,11,1) * 1 + substring(.,12,1) *
3),string-length((substring(.,1,1) * 1 + substring(.,2,1) * 3) +
(substring(.,3,1) * 1 + substring(.,4,1) * 3) + (substring(.,5,1) * 1
+ substring(.,6,1) * 3) + (substring(.,7,1) * 1 + substring(.,8,1) *
3) + (substring(.,9,1) * 1 + substring(.,10,1) * 3) +
(substring(.,11,1) * 1 + substring(.,12,1) * 3)),1) != 0">
there is an improperly formatted EAN number.
</sch:report>
</sch:rule>

Don't worry, verbosity isn't a concern in XML.  :)
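
For readers who would rather not unpick the XPath, here is a minimal
Python sketch of the same arithmetic: weights of 1 and 3 alternate over
the first 12 digits, and the 13th digit must bring the total up to a
multiple of 10. The function name ean13_is_valid is made up for this
illustration, and the digits-only test is an assumption that is slightly
stricter than the numeric test in the rule above.

```python
def ean13_is_valid(value: str) -> bool:
    """Return True if value is 13 digits with a correct check digit."""
    # Mirror the first two reports: exactly 13 characters, digits only.
    if len(value) != 13 or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(ch) for ch in value]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    # The check digit is whatever brings the total to a multiple of 10;
    # the final "% 10" covers the case where the total already ends in 0.
    expected = (10 - total % 10) % 10
    return digits[12] == expected
```

The two long reports in the rule correspond to the two cases folded into
the "expected" line: a non-zero check digit must equal 10 minus the last
digit of the total, and a zero check digit requires the total to end in 0.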

The same principles can be used to implement a great number of
algorithms where the bounds of the problem are known; in this case I
know that the sequence is exactly 13 characters long, no less and no
more.

Actually, because of the way Schematron's assert works, one can also
check sequences where the possible upper bound is known, even if that
upper bound is not actually reached.

I did a proof of concept of this recently (generated the code, of
course; it took 86 assertions to implement the check). The requirement
was that, in a text string, the text between consecutive line feeds be
no longer than 37 characters, and that there be no more than 45 line
feeds.
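
Stated outside Schematron, the requirement itself is small. A minimal
Python sketch, assuming "\n" is the line feed in question; the names
text_block_is_valid, MAX_LINE_LENGTH and MAX_LINE_FEEDS are made up for
this illustration:

```python
MAX_LINE_LENGTH = 37  # longest run of characters allowed between line feeds
MAX_LINE_FEEDS = 45   # most line feeds allowed in the whole string


def text_block_is_valid(text: str) -> bool:
    """Check the two limits: line feed count and per-line length."""
    if text.count("\n") > MAX_LINE_FEEDS:
        return False
    # Splitting on the line feed yields every run of text between feeds,
    # including the runs before the first and after the last one.
    return all(len(line) <= MAX_LINE_LENGTH for line in text.split("\n"))
```

The loop over an unknown number of lines is exactly what XPath 1.0 lacks,
which is why the Schematron version needs one assertion per possible line.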

The generated assertions were, of course, that the string-length of the
string between line feeds 1 and 2 was less than 38, that the
string-length of the string between line feeds 2 and 3 was less than
38, and so forth.

If there were only two line feeds, the remaining assertions did not
fail, because of the way they were worded.
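
One possible way to generate assertions with that property is sketched
below in Python. It is not a reconstruction of the lost 86-assertion
version: the helper names line_expr and generate_asserts, the message
text, and the choice of 46 lines (45 line feeds delimit at most 46 lines)
are all assumptions made for this illustration.

```python
LF = "'&#10;'"  # a line feed as an XPath string literal inside an XML attribute


def line_expr(n: int) -> str:
    """Build an XPath 1.0 expression for the n-th line (1-based) of '.'."""
    rest = "."
    # Skip past the first n-1 line feeds.
    for _ in range(n - 1):
        rest = f"substring-after({rest}, {LF})"
    # Appending a line feed lets substring-before cut the last line too.
    return f"substring-before(concat({rest}, {LF}), {LF})"


def generate_asserts(max_lines: int = 46, max_length: int = 37) -> list[str]:
    """Return one sch:assert element per possible line."""
    return [
        f'<sch:assert test="string-length({line_expr(n)}) &lt; {max_length + 1}">'
        f"Line {n} is longer than {max_length} characters.</sch:assert>"
        for n in range(1, max_lines + 1)
    ]
```

With this wording, substring-after returns the empty string when there is
no further line feed, so the expression for a line that does not exist
has string-length 0 and its assertion passes.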

It took 86 assertions because I split on whether the final line had to
end with a line feed. Unfortunately my laptop burnt out (nothing to do
with this example) and I hadn't backed it up, because it was just a fun
experiment, not meant for actual use.

This was in Schematron 1.5, not ISO Schematron; it would be a lot
easier to write this stuff in ISO Schematron. Of course others out
there could probably optimize the code, but it has been checking EAN
numbers for a year and a half now and nobody has reported an error yet.
(fingers crossed)

Cheers,
Bryan Rasmussen





On 1/22/07, Costello, Roger L. <costello@mitre.org> wrote:
> Hi Folks,
>
> I am putting together a list of ways that Schematron is being used.  I
> seek your help in ensuring that the list is complete. (I will post the
> final list)
>
> Let me give an example to show what I mean by "ways that Schematron is
> being used".
>
> Consider this simple XML instance document:
>
> <?xml version="1.0"?>
> <Document>
>      <Classification>unclassified</Classification>
>      <Para>
>           Lorem ipsum dolor sit amet,
>           laoreet ac convallis dictumst
>      </Para>
>      <Classification>unclassified</Classification>
> </Document>
>
> Schematron can be used to specify, "The Classification value at the top
> and bottom of the document must match; the Para element must not
> contain any restricted keywords."
>
> Thus, we see Schematron being used to express these two types of data
> constraints:
>
> 1. Co-constraints: in the example the co-constraint is between the two
> Classification values; namely, the two values must be identical.  In
> general, co-constraints are constraints that exist between data
> (element-to-element co-constraints, element-to-attribute,
> attribute-attribute).  The co-constraints may be "within" an XML
> document, or "across" XML documents.
>
> Schematron is very well-suited to expressing co-constraints.
>
> 2. Existence: in the example the existence constraint is that the Para
> element must not contain any restricted keywords.  The keywords may be
> obtained dynamically from another file. In general, existence
> constraints are constraints on the presence or absence of data.  The
> existence constraints may apply over the entire document, or to just
> portions of the document.
>
> Schematron is very well-suited to expressing existence constraints.
>
> Categories of Schematron Usage
>
> Here are the ways that Schematron is being used today:
>
> 1. Co-constraint checking
> 2. Existence checking
>
> Are you using Schematron in ways not represented by these two
> categories?  I am particularly interested in identifying ways
> Schematron is being used which cannot be expressed by other schema
> languages - XML Schemas, Relax NG.
>
> /Roger