On Wed, 2004-04-07 at 00:46, Pascarella, Randy wrote:
<snip>
>
> The point is that the two files describe the same thing from the same
> schema in different ways via the values of the elements. So if I,
> being an intercessor, want to link the two together in a common way,
> how would I do this? I am currently thinking of something like this.
> Let's say I'm a government employee wanting to provide common tags for
> Bob and Tom to use in their XML files for consistency. I define a tag
> called <gov.illinois.chicago.street.name> as a "reference point". Bob
The problem with this approach is that you would have to define a new
element for every city in the U.S., or in the entire world if you want
to record addresses elsewhere.
Your schema (DTD, RELAX NG, or whatever you use) would bloat
enormously, so much that using it would not be practical.
You can avoid this in a couple of ways. Here are two:
* Don't allow Tom and Bob to store inconsistent data in the files.
If they _enter_ data the wrong way, your application could guide them,
perhaps by verifying their entries against a database of street names.
* Use an attribute to store the correct name. Instead of this:
> <city>
> <name>Chicago</name>
> <streets>
> <street>
> <name>Main Street</name>
> <gov.illinois.chicago.street.name>Main
> St.</gov.illinois.chicago.street.name>
> <zipcode>60609</zipcode>
> </street>
> <street>
> <name>West Street</name>
> <gov.illinois.chicago.street.name>West
> St.</gov.illinois.chicago.street.name>
> <zipcode>60603</zipcode>
> </street>
> ...
> </streets>
> </city>
You would get this:
<city>
<name>Chicago</name>
<streets>
<street name="Main St.">
<name>Main Street</name>
<zipcode>60609</zipcode>
</street>
<street name="West St.">
<name>West Street</name>
<zipcode>60603</zipcode>
</street>
...
</streets>
</city>
Of course, you could also do this:
<city>
<name>Chicago</name>
<streets>
<street>
<entered-name>Main Street</entered-name>
<real-name>Main St.</real-name>
<zipcode>60609</zipcode>
</street>
<street>
<entered-name>West Street</entered-name>
<real-name>West St.</real-name>
<zipcode>60603</zipcode>
</street>
...
</streets>
</city>
Please note that we know the real-name denotes a street in Chicago
because of the context. Providing the context information in an element
name is both redundant and impractical.
Both approaches above would give Tom and Bob the flexibility you want.
However, I would think twice, or thrice, before allowing it. Sooner or
later you will end up with:
<street name="Main St.">
<name>Minor Street</name>
<zipcode>60709</zipcode>
</street>
and now you will have a bit of a problem figuring out which street this
really is, especially if 60709 is an area where there is neither a Main
nor a Minor street.
Allowing duplication of data is rarely a good thing.
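If you do keep both names, you would at least want to detect when they disagree. Here is a minimal sketch in Python, assuming the attribute layout shown above. The CANONICAL table and the find_mismatches function are hypothetical names made up for illustration; in practice the table would come from a database of street names.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: entered spelling (lowercased) -> normalized name.
CANONICAL = {
    "main street": "Main St.",
    "main st.": "Main St.",
    "west street": "West St.",
    "west st.": "West St.",
}

def find_mismatches(xml_text):
    """Return (entered, normalized) pairs whose two names disagree."""
    root = ET.fromstring(xml_text)
    mismatches = []
    for street in root.iter("street"):
        normalized = street.get("name")  # the name attribute
        entered = street.findtext("name", default="").strip()  # the <name> child
        expected = CANONICAL.get(entered.lower())
        if expected != normalized:
            mismatches.append((entered, normalized))
    return mismatches

SAMPLE = """
<city>
  <name>Chicago</name>
  <streets>
    <street name="Main St.">
      <name>Main Street</name>
      <zipcode>60609</zipcode>
    </street>
    <street name="Main St.">
      <name>Minor Street</name>
      <zipcode>60709</zipcode>
    </street>
  </streets>
</city>
"""

print(find_mismatches(SAMPLE))
```

Note that a check like this only tells you that the two values disagree. It cannot tell you which of them is right, which is the real cost of storing the same fact twice.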
>
> The whole point is that I want to give Tom and Bob the flexibility to
> use whatever values they want for the names, but have an "internal"
> way to map those elements into standardized elements that anyone can
What is the purpose of allowing this flexibility in entering what is in
practice data in a database?
Will Tom and Bob be the sole consumers of their own data? Apparently
not, or there would be no reason to store a normalized version of the
street names. Are other consumers interested in Tom's and Bob's writing
quirks? Probably not. Are they interested in getting correct and
unambiguous information? Probably yes.
Building a strong case for allowing ambiguous data to be stored seems
hard to do. (In this particular case. Not necessarily always.)
You have to consider all of this before making a decision on
implementing the schema you are considering.
>
> Does this make sense? Is there a better way to do this? Am I off in
> the weeds?
Well, I would probably go with trying to get Tom and Bob to enter
correct data in the first place. A system that helps them do this does
not have to be obnoxious or intrusive.
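As a rough illustration of unobtrusive guidance, the entry form could suggest known street names that resemble what was typed. This is only a sketch in Python using the standard difflib module. KNOWN_STREETS and suggest_street are hypothetical names, and the list stands in for a real database of street names.

```python
import difflib

# Hypothetical list of valid names; a real system would query a database.
KNOWN_STREETS = ["Main St.", "West St.", "Michigan Ave."]

def suggest_street(entered, known=KNOWN_STREETS, cutoff=0.6):
    """Return known street names that resemble the entry, best match first."""
    by_lower = {name.lower(): name for name in known}
    hits = difflib.get_close_matches(
        entered.strip().lower(), list(by_lower), n=3, cutoff=cutoff
    )
    return [by_lower[hit] for hit in hits]

print(suggest_street("Main Street"))
```

An empty result means nothing in the list is similar enough, and the application can then ask the user to pick a street from the list instead of storing free text.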
Of course, one could also ask: if Tom has gone through the trouble of
entering the data, why should Bob have to do the same thing? Wouldn't it
be better if Bob could just reuse Tom's entry, preferably by linking to
it?
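One possible shape for such linking, sketched in Python: Tom's entry carries an identifier and Bob's file only refers to it. The id and ref attributes, the owner attribute, and the resolve function are all assumptions made for this example, not part of the schema discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Tom enters the street once and gives it an identifier (hypothetical "id").
TOM = """<streets owner="tom">
  <street id="chi-main"><name>Main St.</name><zipcode>60609</zipcode></street>
</streets>"""

# Bob points at Tom's entry (hypothetical "ref") instead of retyping it.
BOB = """<streets owner="bob"><street ref="chi-main"/></streets>"""

def resolve(refs_xml, source_xml):
    """Return (name, zipcode) for each street, following ref -> id links."""
    index = {
        s.get("id"): s
        for s in ET.fromstring(source_xml).iter("street")
        if s.get("id")
    }
    resolved = []
    for street in ET.fromstring(refs_xml).iter("street"):
        # Fall back to the element itself when there is no matching id.
        target = index.get(street.get("ref"), street)
        resolved.append((target.findtext("name"), target.findtext("zipcode")))
    return resolved

print(resolve(BOB, TOM))
```

With this layout the street name is stored in exactly one place, so there is nothing to keep consistent.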
/Henrik