Hi Folks,

A design decision can have a domino effect, affecting subsequent activities and implementations. Below is a concrete example.

Scenario

Convert a tab-delimited file to XML. Then create a Schematron schema to check the XML. The tab-delimited file consists of person names: first name, middle initial (optional), and last name. Here is a sample tab-delimited file:

FirstName MI LastName

Tab-Delimited to XML Conversion

The above tab-delimited file is converted to this XML:

<Persons>

Design Decision

If a field in the tab-delimited file is empty, express that in XML by omitting the XML element. For example, Sally does not have a middle initial, so there is no <MI> element in the XML.
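
To make the design decision concrete, here is a minimal Python sketch of the conversion with the decision exposed as a switch. The sample rows, the <Person> wrapper element, the function name to_xml, and the empty_as parameter are all assumptions made for illustration; only <Persons>, <MI>, and the column names come from this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the rows are illustrative, not the original file.
SAMPLE = "FirstName\tMI\tLastName\nJohn\tH.\tDoe\nSally\t\tSmith\n"


def to_xml(text, empty_as="omit"):
    """Convert tab-delimited person records to a <Persons> element tree.

    empty_as="omit"    -> an empty field produces no element (the decision above)
    empty_as="element" -> an empty field produces an empty element (the alternative)
    """
    lines = text.splitlines()
    header = lines[0].split("\t")
    root = ET.Element("Persons")
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        # <Person> is an assumed wrapper element name.
        person = ET.SubElement(root, "Person")
        for name, value in zip(header, line.split("\t")):
            if value == "" and empty_as == "omit":
                continue  # design decision: leave the element out entirely
            ET.SubElement(person, name).text = value
    return root


print(ET.tostring(to_xml(SAMPLE), encoding="unicode"))
```

Calling to_xml(SAMPLE, empty_as="element") instead yields an <MI> element with no content for Sally, which is the alternative design discussed later in this post.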

Schematron Implementation

Implement Schematron to express this rule [1]:

If the person has a middle initial, then the middle initial must be an uppercase letter followed by a period.

The following Schematron faithfully implements the rule:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"

Testing XML Documents Against the Schematron Schema

Here is an XML document that is Schematron-valid:

<Persons>

The following XML document is not valid because the middle initial in John Doe does not begin with an uppercase letter of the English alphabet:

<Persons>

Likewise, all of these values would be flagged as invalid:

H.xxx
HA
6.
H

So What’s the Problem?

Recall that we made a design decision to represent empty fields in the tab-delimited file by omitting the element in the XML. Suppose that at a later date we change our decision and represent empty fields in the tab-delimited file with an empty element in the XML. For example, under the new design decision this would be the XML for the above tab-delimited file:

<Persons>

Notice the empty <MI> element for Sally Smith. That XML document conforms to the middle initial rule, but it fails Schematron validation.

Change Decision, Things Break

The decision we made at the start influenced the Schematron implementation, and when the decision changed, the Schematron implementation broke. The culprit is this XPath expression in the Schematron implementation:

if (exists(MI)) then matches(…)

That XPath expression applies the matches() function whenever the <MI> element exists, and an empty value does not match the middle-initial pattern. The expression should apply the matches() function only to non-empty <MI> elements.
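
The breakage can be reproduced outside Schematron. The sketch below is a Python approximation of the XPath test, not the Schematron itself. It assumes the elided pattern is the '^[A-Z]\.$' shown elsewhere in this post and that a missing <MI> passes; the <Person> wrapper and the function name mi_ok_brittle are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def mi_ok_brittle(person):
    # Approximates: if (exists(MI)) then matches(MI, '^[A-Z]\.$') else true()
    # (the pattern and the else branch are assumptions; see the lead-in).
    mi = person.find("MI")
    if mi is not None:  # exists(MI)
        # re.fullmatch stands in for the anchored XPath matches() call.
        return re.fullmatch(r"[A-Z]\.", mi.text or "") is not None
    return True


# Sally under the original decision (element omitted) ...
omitted = ET.fromstring(
    "<Person><FirstName>Sally</FirstName><LastName>Smith</LastName></Person>"
)
# ... and under the changed decision (empty element).
empty = ET.fromstring(
    "<Person><FirstName>Sally</FirstName><MI></MI><LastName>Smith</LastName></Person>"
)

print(mi_ok_brittle(omitted))  # True
print(mi_ok_brittle(empty))    # False
```

The same person passes or fails depending only on how the upstream conversion chose to represent an empty field.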

Can Subsequent Activities/Implementations Shield Themselves From Upstream Decisions?

Can we implement the Schematron rule in a way that is agnostic to how empty fields in the tab-delimited file are expressed in XML? Yes. In the Schematron rule, use this XPath expression [2]:

if (MI eq '') then true()
else if (exists(MI)) then matches(MI, '^[A-Z]\.$')
else true()

If the value of the <MI> element is empty, then return true. Otherwise, if the <MI> element exists, apply the matches() function. If there is no <MI> element at all, return true. With that XPath expression, both of these <MI> elements are Schematron-valid:

<MI></MI>
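
As a cross-check of the agnostic expression, here is a Python approximation that follows its three branches in order. It is a sketch only: the <Person> wrapper, the helper names, and the sample values other than those listed in this post are assumptions.

```python
import re
import xml.etree.ElementTree as ET


def mi_ok_agnostic(person):
    mi = person.find("MI")
    if mi is not None and (mi.text or "") == "":  # MI eq ''
        return True
    if mi is not None:  # exists(MI)
        # re.fullmatch stands in for matches(MI, '^[A-Z]\.$')
        return re.fullmatch(r"[A-Z]\.", mi.text) is not None
    return True  # no <MI> element at all


def person_with(mi_markup):
    # Hypothetical helper: builds a <Person> around the given MI markup.
    return ET.fromstring(
        "<Person><FirstName>X</FirstName>" + mi_markup + "<LastName>Y</LastName></Person>"
    )


for markup in ["", "<MI></MI>", "<MI/>", "<MI>H.</MI>", "<MI>HA</MI>", "<MI>6.</MI>"]:
    print(repr(markup), mi_ok_agnostic(person_with(markup)))
```

An omitted element and an empty element now give the same answer, so this check no longer depends on which representation the conversion step chose.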

Lessons Learned

Recognize the design decisions you are making. Be aware that those design decisions may influence subsequent activities.

Document your design decisions. Put the design decision on big banners (figuratively) for all subsequent activities and implementers to see.

Think of alternate designs. Check the subsequent activities to see that they will still work properly if an alternate design is later adopted.

Notes

[1] The rule for middle initials is intentionally simplistic. The rule would need to be expanded to make it useful in a worldwide setting.

[2] This XPath expression is perhaps not the best (simplest, shortest, fastest), but it is simple and easy to understand.