
   RE: [xml-dev] Is HTML structured or unstructured information?


> -----Original Message-----
> From: Jim Melton [mailto:jim.melton@acm.org] 
> Sent: Tuesday, August 09, 2005 12:10 PM
> To: ian.graham@utoronto.ca
> Cc: DuCharme, Bob (LNG-CHO); 'Bullard, Claude L (Len)'; 
> xml-dev@lists.xml.org
> Subject: RE: [xml-dev] Is HTML structured or unstructured information?
> 
> I have a slightly different take on the distinction between "structured"
> and "unstructured" (and the less-well understood "semi-structured").
> 
> I agree that SQL data is well structured, not because its 
> intended meaning is unambiguous (hah! you should see some of 
> the databases...but that's another rant), but because every 
> piece of information is "there".  SQL, of course, represents 
> data as rectangular structures called tables.  A table is a 
> structure, having a particular number of columns, in which 
> there are rows of data, each having exactly one value 
> corresponding to each column of the table.  SQL doesn't use 
> the word "cell", but it's convenient to use in this 
> discussion.  Every cell in every SQL table has a value.  That 
> value might be SQL's "null value", but the cell is always "there".
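
[Editorial sketch of the point above, using Python's standard sqlite3
module. The table and column names are hypothetical, chosen only for
illustration. It shows that a row inserted with values omitted still has
one cell per column, with the gaps filled by the null value.]

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT, fax TEXT)")
conn.execute("INSERT INTO person (name, phone) VALUES ('Ann', '555-0100')")
conn.execute("INSERT INTO person (name) VALUES ('Raj')")

rows = conn.execute(
    "SELECT name, phone, fax FROM person ORDER BY name"
).fetchall()

# Every row has exactly one value per column. Values that were never
# supplied come back as None (SQL's null value), not as a missing cell.
cell_counts = [len(row) for row in rows]
conn.close()
```
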
> 
> Unstructured data is...well, unstructured.  A decent example 
> is the text of this email message.  You might perceive 
> structure, such as paragraphs and sentences, but those are 
> artifacts of my use of common English/Western conventions, 
> not actual structure.  And, most importantly, there is no 
> single "thing" that you can identify that is required, 
> optional, or prohibited in this message.  There is no 
> structure at all.

That's true for the text - but the e-mail message as a whole may be
considered semi-structured regarding its inclusion of sender, receiver,
subject, etc.
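
A rough sketch of what I mean, using Python's standard email package. The
addresses and message text here are made up for illustration. The headers
can be looked up by name, an optional header such as Cc may simply be
absent, and the body is just a string with no addressable parts:

```python
from email import message_from_string

# Made-up message; the addresses are placeholders, not real ones.
raw = (
    "From: jim@example.org\n"
    "To: xml-dev@example.org\n"
    "Subject: Is HTML structured or unstructured information?\n"
    "\n"
    "Unstructured data is...well, unstructured.\n"
)

msg = message_from_string(raw)

sender = msg["From"]       # a named header field that is present
subject = msg["Subject"]   # another named header field
cc = msg["Cc"]             # optional header, absent here, so None
body = msg.get_payload()   # the text itself: one undifferentiated string
```
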

Joe

Joseph Chiusano
Booz Allen Hamilton
O: 703-902-6923
C: 202-251-0731
Visit us online@ http://www.boozallen.com
 
> HTML, and (more importantly to many) XML, are semi-structured 
> by nature, although it is certainly possible to force 
> specific scenarios using those markup languages to be fully 
> structured (by requiring validation against a DTD or Schema 
> that makes everything mandatory, for example).  To me, 
> "semi-structured" means that there is structure there, but it 
> is not completely reliable.  Information may be missing
> entirely: not present and marked as "unknown" or "missing"
> or "irrelevant" (analogous to some meanings of SQL's null
> value), but completely absent.
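
[Editorial sketch of the distinction above, using Python's standard
xml.etree.ElementTree module. The element names are hypothetical. One
person has an empty fax element (present, but with no value, loosely like
a null), while the other has no fax element at all.]

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, used only for illustration.
doc = """
<people>
  <person><name>Ann</name><phone>555-0100</phone><fax/></person>
  <person><name>Raj</name></person>
</people>
"""

root = ET.fromstring(doc)
people = root.findall("person")

# Ann's fax element exists but carries no text.
ann_fax = people[0].find("fax")

# Raj's fax element is completely absent, so find() returns None.
raj_fax = people[1].find("fax")

# The number of children differs per person; nothing forces a fixed shape.
child_counts = [len(person) for person in people]
```
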
> 
> I could not, in good conscience, call HTML "structured" by 
> any stretch of the meaning.  But it is certainly not 
> unstructured, either.  I must fall back on that hybrid 
> concept with the name "semi-structured".
> 
> Hope this helps,
>     Jim
> 
> 
> 
> At 8/9/2005 09:35 AM, ian.graham@utoronto.ca wrote:
> >Quoting "DuCharme, Bob (LNG-CHO)" <bob.ducharme@lexisnexis.com>:
> >
> >Yes +1
> >
> >OTOH, I've seen stuff so horrible on both counts it arguably should be "No"
> >
> > > >Is HTML structured or unstructured information?
> > >
> > > Yes!
> > >
> > > But seriously... if "Structured information may be characterized as
> > > information whose intended meaning is unambiguous" and "The
> > > canonical example of structured information is a relational database
> > > table" then the article is building from a shaky premise, because
> > > the intended meaning of the data in a relational database table can
> > > easily be ambiguous.
> > >
> > > If it means that a relational table is structured because the
> > > individual pieces of information in it are clearly delineated and
> > > their structural relation is unambiguous, which makes sense to me,
> > > then I would consider HTML structured, especially when compared to
> > > the article's examples of unstructured information.
> > >
> > > Bob
> > > weblog: http://www.oreillynet.com/pub/au/1191
> > > homepage: http://www.snee.com/bob
> >
> >-----------------------------------------------------------------
> >The xml-dev list is sponsored by XML.org <http://www.xml.org>, an 
> >initiative of OASIS <http://www.oasis-open.org>
> >
> >The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> >To subscribe or unsubscribe from this list use the subscription
> >manager: <http://www.oasis-open.org/mlmanage/index.php>
> 
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
>    Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : +1.801.942.3345
> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
> 1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
> Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
> ========================================================================
> =  Facts are facts.   But any opinions expressed are the opinions      =
> =  only of myself and may or may not reflect the opinions of anybody   =
> =  else with whom I may or may not have discussed the issues at hand.  =
> ========================================================================
> 
> 
> 