The Ripple Effect of Design Decisions

Hi Folks,

A design decision can have a domino effect, affecting subsequent activities and implementations. Below is a concrete example.

Scenario

Convert a tab-delimited file to XML. Then create a Schematron schema to check the XML. The tab-delimited file consists of person names: first name, middle initial (optional), and last name. Here is a sample tab-delimited file:

FirstName    MI    LastName
John         H.    Doe
Sally              Smith

Tab-Delimited to XML Conversion

The above tab-delimited file is converted to this XML:

<Persons>
    <Person>
        <FirstName>John</FirstName>
        <MI>H.</MI>
        <LastName>Doe</LastName>
    </Person>
    <Person>
        <FirstName>Sally</FirstName>
        <LastName>Smith</LastName>
    </Person>
</Persons>

Design Decision

If a field in the tab-delimited file is empty, express that in XML by omitting the XML element.

For example, Sally does not have a middle initial, so there is no <MI> element in the XML.
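
As a rough sketch of where this decision lives in the conversion step, here is one way the conversion might be written in XSLT 2.0. The post does not say how the conversion is actually performed, so the stylesheet itself, the file name persons.tsv, the input-uri parameter, and the initial template name main are all assumptions made for illustration.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="yes"/>

    <!-- Hypothetical input location; a relative URI is resolved against the stylesheet -->
    <xsl:param name="input-uri" select="'persons.tsv'"/>

    <xsl:template name="main">
        <Persons>
            <!-- Split the file into lines, skip the header row, ignore blank lines -->
            <xsl:for-each select="tokenize(unparsed-text($input-uri), '\r?\n')
                                  [position() gt 1][normalize-space(.)]">
                <!-- Adjacent tabs yield a zero-length string for the empty field -->
                <xsl:variable name="fields" select="tokenize(., '\t')"/>
                <Person>
                    <FirstName><xsl:value-of select="$fields[1]"/></FirstName>
                    <!-- The design decision: an empty field produces no element -->
                    <xsl:if test="normalize-space($fields[2])">
                        <MI><xsl:value-of select="$fields[2]"/></MI>
                    </xsl:if>
                    <LastName><xsl:value-of select="$fields[3]"/></LastName>
                </Person>
            </xsl:for-each>
        </Persons>
    </xsl:template>

</xsl:stylesheet>

In this sketch the whole design decision is the single xsl:if wrapper around the MI element. Removing that wrapper would emit an empty MI element for an empty field instead.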

Schematron Implementation

Implement Schematron to express this rule [1]:

If the person has a middle initial, then the middle initial must be an uppercase letter
of the English alphabet followed by a period symbol and nothing else.

The following Schematron faithfully implements the rule:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">

    <sch:pattern id="CheckMiddleInitial">
        <sch:rule context="Person">
            <sch:assert test="if (exists(MI)) then matches(MI, '^[A-Z]\.$')
                              else true()">
                The Middle Initial must be an uppercase letter of the English alphabet followed by a period, and nothing else.
            </sch:assert>
        </sch:rule>
    </sch:pattern>

</sch:schema>

Testing XML Documents Against the Schematron Schema

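The XML document shown in the conversion section above is Schematron-valid: John's middle initial, H., matches the pattern, and Sally has no <MI> element to check.

The following XML document is not valid because John Doe's middle initial does not begin with an uppercase letter of the English alphabet: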

<Persons>
    <Person>
        <FirstName>John</FirstName>
        <MI>*.</MI>
        <LastName>Doe</LastName>
    </Person>
    <Person>
        <FirstName>Sally</FirstName>
        <LastName>Smith</LastName>
    </Person>
</Persons>

Likewise, all of these values would be flagged as invalid: H.xxx (extra characters after the period), HA (no period), 6. (a digit rather than a letter), and H (no period).

So What’s the Problem?

Recall that we made a design decision to implement empty fields in the tab-delimited file by omitting the element in the XML. Suppose that at a later date we change our decision and implement empty fields in the tab-delimited file by an empty element in the XML. For example, under the new design decision this would be the XML for the above tab-delimited file:

<Persons>
    <Person>
        <FirstName>John</FirstName>
        <MI>H.</MI>
        <LastName>Doe</LastName>
    </Person>
    <Person>
        <FirstName>Sally</FirstName>
        <MI></MI>
        <LastName>Smith</LastName>
    </Person>
</Persons>

Notice the empty <MI> element for Sally Smith.

That XML document conforms to the middle initial rule, but it fails Schematron validation.

Change Decision, Things Break

The decision we made at the start influenced the Schematron implementation, and when the decision changed, the Schematron implementation broke.

The culprit is this XPath expression in the Schematron implementation:

if (exists(MI)) then matches(…)

That XPath expression applies the matches() function whenever the <MI> element exists, even when the element is empty. An empty string does not match the pattern, so the assertion fails. The matches() function should be applied only to non-empty <MI> elements.

Can Subsequent Activities/Implementations Shield Themselves From Upstream Decisions?

Can we implement the Schematron rule in a way that is agnostic to how empty fields in the tab-delimited file are expressed in XML? Yes, in the Schematron rule, use this XPath expression [2]:

if (MI eq '') then true()
else if (exists(MI)) then matches(MI, '^[A-Z]\.$')
else true()

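If the value of the <MI> element is empty, then return true. If there is no <MI> element at all, the first test is not satisfied and exists(MI) is false, so the final else branch also returns true. Otherwise, apply the matches() function.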
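
For reference, here is a sketch of how the complete schema might look with that expression in place. It is the earlier schema with only the test attribute changed, and the comment is added for illustration.

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">

    <sch:pattern id="CheckMiddleInitial">
        <sch:rule context="Person">
            <!-- An empty or absent MI passes; any other value must match the pattern -->
            <sch:assert test="if (MI eq '') then true()
                              else if (exists(MI)) then matches(MI, '^[A-Z]\.$')
                              else true()">
                The Middle Initial must be an uppercase letter of the English alphabet followed by a period, and nothing else.
            </sch:assert>
        </sch:rule>
    </sch:pattern>

</sch:schema>

One caveat with this sketch: an <MI> element containing only whitespace is not equal to the empty string, so it would still be flagged. If such values should count as empty (an assumption about the requirements, not something the rule states), the first test could compare normalize-space(MI) to '' instead.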

With that XPath expression, a <Person> whose <MI> element is omitted and a <Person> whose <MI> element is empty are both Schematron-valid.
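
A small test document along these lines could be used to check that claim. It is only a sketch, assembled from the examples above: Sally appears twice, once under each design decision, and the comments are added for illustration.

<Persons>
    <!-- Populated field: H. matches the pattern -->
    <Person>
        <FirstName>John</FirstName>
        <MI>H.</MI>
        <LastName>Doe</LastName>
    </Person>
    <!-- Empty field expressed as an empty element -->
    <Person>
        <FirstName>Sally</FirstName>
        <MI></MI>
        <LastName>Smith</LastName>
    </Person>
    <!-- Empty field expressed by omitting the element -->
    <Person>
        <FirstName>Sally</FirstName>
        <LastName>Smith</LastName>
    </Person>
</Persons>

All three <Person> elements should pass the revised assertion, while the second one would fail the original exists(MI) version.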

Lessons Learned

Recognize the design decisions you are making. Be aware that those design decisions may influence subsequent activities. Document your design decisions. Put the design decision on big banners (figuratively) for all subsequent activities and implementers to see. Think of alternate designs. Check the subsequent activities to see that they will still work properly if an alternate design is later adopted.

Notes

[1] The rule for middle initials is intentionally simplistic. The rule would need to be expanded to make it useful in a world-wide setting.

[2] This XPath expression is perhaps not the best (simplest, shortest, fastest) but it is simple and easy to understand.