> > When you store a document in a text column, it's searchable using SQL
> > queries.
>
> XML unaware full text search would not normally find "<strong>Now</strong>
Even before products such as Oracle Text and DB2 Text Extender became XML-aware,
you could store a document in a CLOB and match occurrences of
"<runtime>123</runtime>". Even the older Oracle ConText (vintage 1997) had the
ability to search HTML, Lotus 1-2-3, Microsoft Word, WordPerfect and other text
with theme matching and the ability to invoke user-defined filters.
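To make the CLOB case concrete, here is a minimal sketch of that kind of
plain substring match. It is only an illustration: SQLite (via Python's
sqlite3 module) stands in for a real DBMS, a TEXT column stands in for a
CLOB, and the table name "docs" and column name "body" are made up.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real DBMS,
# and a TEXT column stands in for a CLOB. Table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("<VHS><title>Sideways</title><runtime>123</runtime></VHS>",),
        ("<VHS><title>Titanic</title><runtime>194</runtime></VHS>",),
        ("<DVD><title>Sideways</title><runtime>123</runtime></DVD>",),
    ],
)

# Plain substring match: to the database the markup is just characters.
rows = conn.execute(
    "SELECT id FROM docs WHERE body LIKE ? ORDER BY id",
    ("%<runtime>123</runtime>%",),
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

The query finds both documents with a 123-minute runtime, but it has no way
to tell the VHS row from the DVD row other than by adding more string
patterns.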
For something like this,
<videos>
<VHS>
<title>Sideways</title>
<runtime>123</runtime>
<rating>R</rating>
</VHS>
<VHS>
<title>Titanic</title>
<runtime>194</runtime>
<rating>PG13</rating>
</VHS>
<DVD>
<title>Sideways</title>
<runtime>123</runtime>
<rating>R</rating>
</DVD>
<DVD>
...
</DVD>
</videos>
we want to exploit structure if we're searching for a DVD with a runtime of 123
minutes.
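As a rough sketch of what "exploiting structure" means here, the same
question can be asked of a parsed document instead of a string. This uses
Python's standard xml.etree.ElementTree purely for illustration; it is not
how any of the database products implement it, and the sample includes only
the three complete entries shown above.

```python
import xml.etree.ElementTree as ET

# The three complete entries from the sample; the elided <DVD> is left out.
xml_text = """
<videos>
  <VHS><title>Sideways</title><runtime>123</runtime><rating>R</rating></VHS>
  <VHS><title>Titanic</title><runtime>194</runtime><rating>PG13</rating></VHS>
  <DVD><title>Sideways</title><runtime>123</runtime><rating>R</rating></DVD>
</videos>
"""

root = ET.fromstring(xml_text)

# Restrict the search to <DVD> children, then test the <runtime> child.
dvd_titles = [
    dvd.findtext("title")
    for dvd in root.findall("DVD")
    if dvd.findtext("runtime") == "123"
]
print(dvd_titles)
```

Because the search is scoped to DVD elements, the VHS copy of the same title
is not returned, which is the distinction a plain text match cannot make.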
With Oracle Text, for example, you can use the WITHIN operator to match text within
tags. With DB2 Text Extender, you can define a document model that enables you
to use indexes for section searches of HTML, XML or other tagged text. You can
enable text searching of CLOBs, CHARs, VARCHARs, and other text types.
> More importantly, if I have XML data then I normally want to use the
> markup to delimit and index the searches.
> The markup has meaning. It should be used to inform and optimize the queries.
Agreed. That was my point in this sentence:
"Developers wanted more, of course, such as being able to describe document
structure"
If you describe the document structure to a database manager, you can do more
precise searching. SQL platforms have supported techniques for defining document
structure since the late 90s (e.g., the DAD used by DB2 XML Extender).
The techniques I mentioned were presented in an order that's basically moving up
the food chain: CLOBs -> storing structure information -> XMLType.
Even if you're storing XML in a CLOB, there's added value in providing to the
DBMS a definition of the document structure.