RE: The String Datatype is the Worst Datatype Ever Created
- From: "Renner, Scott A." <sar@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Fri, 25 Sep 2015 16:51:17 +0000
I see that Roger has started another interesting discussion. Let me throw in a thought or two.
It is impossible to completely separate the data from the people. People supply the meaning of all data, which is always relative to the purpose of at least one person. Machines don't think. This is Dr. Scott's Rule of Data #1.
It is difficult but useful to distinguish between data for machine processing and data for user presentation. The key factor is the situation of the person who understands the data and provides the meaning relative to the purpose. For machine processing, that person is the programmer, who understands the data specification but never sees the actual data at runtime. I have slides for this concept in various NIEM briefings. [http://dodccrp.org/events/19th_iccrts_2014/post_conference/presentations/129.pdf]
Some data is for both machine processing and user presentation. Consider the street address. One might think this is always for user presentation, but not so. Machine processing reads that address and calculates carrier routes for the post office, allocates packages and does route planning for trucks, etc.
For some kinds of machine processing it is usually better to use enumerations instead of strings when feasible. For example, if the data drives a case statement, like this:

    switch (country_code) {
    case CAN:
        printf("I think you mean North Montana :-)\n");
        break;
    default:
        break;
    }

Database index fields are another place where you would rather have enumerations than strings. Short messages over a bandwidth-constrained network are yet another. But there are always exceptions. Difficulties in configuration management sometimes mean you must allow a string value in addition to your enumeration / controlled vocabulary. Some data elements cannot feasibly be enumerated, like National Stock Number. And so forth...
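To make the "string value in addition to your enumeration" case concrete, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the three-entry vocabulary, the names country_code_t, country_t and parse_country, and the 16-byte limit on the raw text. Known values map to an enumeration member; anything else lands in COUNTRY_OTHER with the original string kept alongside.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical controlled vocabulary; a real one would be much larger. */
typedef enum {
    COUNTRY_CAN,
    COUNTRY_USA,
    COUNTRY_MEX,
    COUNTRY_OTHER   /* escape hatch: value is not in the vocabulary */
} country_code_t;

/* A parsed value: the enumeration member, plus the raw text when the
   value falls outside the vocabulary. */
typedef struct {
    country_code_t code;
    char raw[16];   /* only meaningful when code == COUNTRY_OTHER */
} country_t;

/* Lookup table from wire text to enumeration member (illustrative). */
static const struct {
    const char *text;
    country_code_t code;
} known_codes[] = {
    { "CAN", COUNTRY_CAN },
    { "USA", COUNTRY_USA },
    { "MEX", COUNTRY_MEX },
};

country_t parse_country(const char *text)
{
    country_t result = { COUNTRY_OTHER, "" };
    size_t i;

    assert(text != NULL);
    for (i = 0; i < sizeof known_codes / sizeof known_codes[0]; i++) {
        if (strcmp(text, known_codes[i].text) == 0) {
            result.code = known_codes[i].code;
            return result;
        }
    }
    /* Unknown value: keep the string (truncated to fit) rather than
       rejecting the record outright. */
    strncpy(result.raw, text, sizeof result.raw - 1);
    result.raw[sizeof result.raw - 1] = '\0';
    return result;
}
```

With this shape, a switch on the code field still covers the known cases, and only the COUNTRY_OTHER branch ever has to look at a string.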
So I think that properly qualified, a variant of Roger's rule might make sense, something like: For machine-processed data, an enumeration / controlled vocabulary is usually preferable to an unconstrained string, when feasible. Feasibility is high when the set of terms is small and relatively stable. Feasibility is low when the set is large and fast-changing. An enumeration of state codes is feasible; an enumeration of book titles, um... not so much.
cheers,
-- scott
--
Dr. Scott Renner, The MITRE Corporation
+1 703-983-1206 (office); +1 978-831-2598 (cell)