> -----Original Message-----
> From: Jim Melton [mailto:jim.melton@acm.org]
> Sent: Tuesday, August 09, 2005 12:10 PM
> To: ian.graham@utoronto.ca
> Cc: DuCharme, Bob (LNG-CHO); 'Bullard, Claude L (Len)';
> xml-dev@lists.xml.org
> Subject: RE: [xml-dev] Is HTML structured or unstructured information?
>
> I have a slightly different take on the distinction between "structured"
> and "unstructured" (and the less well understood "semi-structured").
>
> I agree that SQL data is well structured, not because its
> intended meaning is unambiguous (hah! you should see some of
> the databases...but that's another rant), but because every
> piece of information is "there". SQL, of course, represents
> data as rectangular structures called tables. A table is a
> structure, having a particular number of columns, in which
> there are rows of data, each having exactly one value
> corresponding to each column of the table. SQL doesn't use
> the word "cell", but it's convenient to use in this
> discussion. Every cell in every SQL table has a value. That
> value might be SQL's "null value", but the cell is always "there".
>
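A minimal sketch of the "every cell is always there" point, assuming Python's built-in sqlite3 module as a stand-in for any SQL engine. The table and column names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, email TEXT, phone TEXT)")

# Supply a value for only one of the three columns.
conn.execute("INSERT INTO person (name) VALUES (?)", ("Jim",))

row = conn.execute("SELECT name, email, phone FROM person").fetchone()
conn.close()

# The row still has one cell per column; the unsupplied cells hold
# SQL's null value, which sqlite3 surfaces as None.
print(row)
```

The row comes back with three positions even though only one value was inserted, which is the sense in which a null cell is still "there".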
> Unstructured data is...well, unstructured. A decent example
> is the text of this email message. You might perceive
> structure, such as paragraphs and sentences, but those are
> artifacts of my use of common English/Western conventions,
> not actual structure. And, most importantly, there is no
> single "thing" that you can identify that is required,
> optional, or prohibited in this message. There is no
> structure at all.
That's true for the body text, but the e-mail message as a whole may be
considered semi-structured, since it includes fields such as sender,
receiver, and subject.
Joe
Joseph Chiusano
Booz Allen Hamilton
O: 703-902-6923
C: 202-251-0731
Visit us online@ http://www.boozallen.com
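The header/body split described in the reply above can be sketched with Python's standard email package. The addresses and message text are made up for illustration; this is only one way to show the idea, not a claim about how any mail system models it.

```python
from email import message_from_string

# A made-up message: named header fields, then free-form body text.
raw = (
    "From: sender@example.org\n"
    "To: receiver@example.org\n"
    "Subject: Is HTML structured?\n"
    "\n"
    "Free text goes here.\n"
)

msg = message_from_string(raw)

sender = msg["From"]      # a named field that is present
missing = msg["Cc"]       # a named field that is simply absent
body = msg.get_payload()  # the unstructured part: just a string

print(sender, missing, repr(body))
```

The headers can be looked up by name, and an optional one such as Cc may be absent altogether, while the body carries no fields at all.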
> HTML, and (more importantly to many) XML, are semi-structured
> by nature, although it is certainly possible to force
> specific scenarios using those markup languages to be fully
> structured (by requiring validation against a DTD or Schema
> that makes everything mandatory, for example). To me,
> "semi-structured" means that there is structure there, but it
> is not completely reliable. Information may be missing
> entirely: not present and marked as "unknown", "missing", or
> "irrelevant" (analogous to some meanings of SQL's null
> value), but completely absent.
>
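A small sketch of the difference between "present but empty" and "completely absent" in XML, using Python's xml.etree.ElementTree. The element names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: one with a phone, one with an empty phone
# element, and one with no phone element at all.
doc = ET.fromstring(
    "<people>"
    "<person><name>Ann</name><phone>555-0100</phone></person>"
    "<person><name>Bob</name><phone/></person>"
    "<person><name>Cy</name></person>"
    "</people>"
)

states = []
for person in doc.findall("person"):
    node = person.find("phone")
    if node is None:
        states.append("absent")   # no such element in this record
    elif node.text is None:
        states.append("empty")    # element exists but has no content
    else:
        states.append(node.text)

print(states)
```

Without a DTD or schema that makes the phone element mandatory, nothing guarantees the third case cannot occur, so code reading the document has to allow for it.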
> I could not, in good conscience, call HTML "structured" by
> any stretch of the meaning. But it is certainly not
> unstructured, either. I must fall back on that hybrid
> concept with the name "semi-structured".
>
> Hope this helps,
> Jim
>
>
>
> At 8/9/2005 09:35 AM, ian.graham@utoronto.ca wrote:
> >Quoting "DuCharme, Bob (LNG-CHO)" <bob.ducharme@lexisnexis.com>:
> >
> >Yes +1
> >
> >OTOH, I've seen stuff so horrible on both counts it arguably should be "No"
> >
> > > >Is HTML structured or unstructured information?
> > >
> > > Yes!
> > >
> > > But seriously... if "Structured information may be characterized as
> > > information whose intended meaning is unambiguous" and "The
> > > canonical example of structured information is a relational database
> > > table" then the article is building from a shaky premise, because
> > > the intended meaning of the data in a relational database table can
> > > easily be ambiguous.
> > >
> > > If it means that a relational table is structured because the
> > > individual pieces of information in it are clearly delineated and
> > > their structural relation is unambiguous, which makes sense to me,
> > > then I would consider HTML structured, especially when compared to
> > > the article's examples of unstructured information.
> > >
> > > Bob
> > > weblog: http://www.oreillynet.com/pub/au/1191
> > > homepage: http://www.snee.com/bob
> >
> >-----------------------------------------------------------------
> >The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> >initiative of OASIS <http://www.oasis-open.org>
> >
> >The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> >To subscribe or unsubscribe from this list use the subscription
> >manager: <http://www.oasis-open.org/mlmanage/index.php>
>
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
>   Co-Chair, W3C XML Query WG; F&O (etc.) editor   Fax  : +1.801.942.3345
> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
> 1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
> Sandy, UT 84093-1063 USA  Personal email: jim at melton dot name
> ========================================================================
> =  Facts are facts.  But any opinions expressed are the opinions      =
> =  only of myself and may or may not reflect the opinions of anybody  =
> =  else with whom I may or may not have discussed the issues at hand. =
> ========================================================================
>
>
>