OASIS Mailing List Archives

Re: [xml-dev] storing XML files

Hi Albena:

Looks like you are using the NITF DTD.  If your requirement is to
search on the keywords and the date issued, you can create a simple
schema that stores the XML document as a blob or a clob, with the date
issued as a separate column.  Use the document ID as the key and do a
one-to-many mapping into a keyword table.  As long as you do not need to
access fragments of each document (e.g. {RETURN the headline WHERE
keyword = "foo" AND date issued = "yy:mm:dd"}), this will work fine.
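A minimal sketch of that layout, written in Python with the standard-library sqlite3 and xml.etree.ElementTree modules. SQLite stands in for your RDBMS, and the table and column names (document, keyword, date_issued, xml_body) are hypothetical. The whole document goes into a text (clob-style) column, only the date part of the norm attribute is kept, and one keyword row is written per <keyword> element.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed copy of the NITF example in this thread, as bytes so that the
# parser honours the ISO-8859-1 declaration.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE nitf SYSTEM "nitf-adjusted-13c.dtd">
<nitf><head><docdata>
<date.issue norm="20010302 150629+0100"></date.issue>
<key-list><keyword key="Maxima"></keyword><keyword key="Princess"></keyword></key-list>
</docdata></head>
<body><body.head><hedline><hl1>Maxima speaks perfect dutch</hl1></hedline></body.head></body>
</nitf>"""

SCHEMA = """
CREATE TABLE document (            -- one row per feed item
    doc_id      INTEGER PRIMARY KEY,
    date_issued TEXT NOT NULL,     -- ISO date pulled out for searching
    xml_body    TEXT NOT NULL      -- whole document kept as a clob
);
CREATE TABLE keyword (             -- one-to-many: document -> keywords
    doc_id  INTEGER NOT NULL REFERENCES document(doc_id),
    keyword TEXT NOT NULL
);
"""

def store(conn, raw):
    """Parse one feed item, store it and its keywords; return the new doc_id."""
    root = ET.fromstring(raw)
    norm = next(root.iter("date.issue")).get("norm")      # "20010302 150629+0100"
    # Keep only the date part; the time and the +0100 offset are dropped.
    date_issued = f"{norm[0:4]}-{norm[4:6]}-{norm[6:8]}"  # "2001-03-02"
    with conn:  # both inserts are committed together
        cur = conn.execute(
            "INSERT INTO document (date_issued, xml_body) VALUES (?, ?)",
            (date_issued, raw.decode("iso-8859-1")))
        conn.executemany(
            "INSERT INTO keyword (doc_id, keyword) VALUES (?, ?)",
            [(cur.lastrowid, kw.get("key")) for kw in root.iter("keyword")])
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store(conn, SAMPLE)
print(conn.execute("SELECT doc_id, date_issued FROM document").fetchall())
# [(1, '2001-03-02')]
```

Because the XML is kept whole, a headline or any other fragment can only be recovered by re-parsing xml_body, which is exactly the limitation described above.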

However, you will have to create a schema for each DTD or document type,
such as stock quotes, weather reports, horoscopes, etc.
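One way to organise those per-DTD mappings is a small registry keyed on the root element, with one extractor function (and therefore one table layout) per feed type. This is a hypothetical Python sketch: the stock-quote feed structure (<quote symbol="..."><price>...</price></quote>) and the returned field names are invented for illustration, and a feed with no mapping yet is rejected rather than guessed at.

```python
import xml.etree.ElementTree as ET

def nitf_row(root):
    # Fields for the NITF feed in this thread (hypothetical "news" table).
    norm = next(root.iter("date.issue")).get("norm")
    return {"table": "news", "date_issued": norm[:8],
            "keywords": [k.get("key") for k in root.iter("keyword")]}

def quote_row(root):
    # Hypothetical stock-quote feed: <quote symbol="..."><price>...</price></quote>
    return {"table": "stock_quote", "symbol": root.get("symbol"),
            "price": float(root.findtext("price"))}

# One extractor, and therefore one table layout, per DTD / feed type.
EXTRACTORS = {"nitf": nitf_row, "quote": quote_row}

def extract(raw):
    root = ET.fromstring(raw)
    extractor = EXTRACTORS.get(root.tag)
    if extractor is None:  # a new feed arrived: its mapping must be written first
        raise ValueError(f"no mapping defined for <{root.tag}> feeds")
    return extractor(root)

print(extract(b'<quote symbol="ABC"><price>12.5</price></quote>'))
# {'table': 'stock_quote', 'symbol': 'ABC', 'price': 12.5}
```

Adding a future data feed then means writing one more extractor and its table, without touching the existing ones.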

There are many ways to describe relational tables in XML.  Let me see
if I can send you some URLs.

See my other answers inline below.



Albena Georgieva wrote:

> Hi nice people,
> Many thanx to all who took the time to answer my questions and think
> along with me about the right approach and tools for my project.
> Some of you asked me for additional or more detailed information to be able
> to extend the discussion.
> The data I am dealing with:
> ----------------------------------------
> 5 or 6 different XML data feeds
> possibly several new data feeds in the near future
> they are usually very small files
> their DTDs are well defined and don't change over time, but I don't know
> in advance the DTDs of future data feeds
> nesting depth is very small
> Ronald: What do you mean by the "semi-structuredness" of the data?

Your key-list element is semi-structured because you do not know a priori
how many keyword elements it will contain.
The other kind of semi-structure is mixed content.

For example, you may have a <p> element with a <ticker> element embedded
in it.  This would actually create a problem for you if you define a
relational schema with the document stored as a blob or a clob, because
the embedded <ticker> values are not available as separate columns.
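The mixed-content issue can be seen with Python's ElementTree: a <ticker> element inside <p> splits the paragraph text, so the ticker values have to be extracted explicitly if they are to become searchable. The paragraph and the ticker symbols ABC and XYZ are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content paragraph: <ticker> is embedded in running text.
P = ET.fromstring("<p>Shares of <ticker>ABC</ticker> rose while "
                  "<ticker>XYZ</ticker> fell.</p>")

# .text holds only the text before the first child element ...
print(repr(P.text))                        # 'Shares of '
# ... so the full sentence has to be reassembled from all text nodes,
print("".join(P.itertext()))               # Shares of ABC rose while XYZ fell.
# ... and the embedded values must be pulled out explicitly if they are to
# become searchable columns instead of staying buried in a clob.
print([t.text for t in P.iter("ticker")])  # ['ABC', 'XYZ']
```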

> <!--     EXAMPLE -->
> <?xml version="1.0" encoding="ISO-8859-1" ?>
> <!DOCTYPE nitf SYSTEM "nitf-adjusted-13c.dtd">
> <nitf>
> <head>
> <meta name="onlinefolder" content="Entertainment"></meta>
> <docdata>
> <date.issue norm="20010302 150629+0100"></date.issue>
> <key-list>
>         	<keyword key="Maxima"></keyword>
>         	<keyword key="Princess"></keyword>
>   	</key-list>
> </docdata>
> </head>
> <body>
> <body.head>
> <hedline><hl1>Maxima speaks perfect dutch</hl1></hedline>
> </body.head>
> <body.content>
> <p>Princess Maxima presented the Princess Collection 2000 in an estate with
> a very royal ambiance. The dazzling collection is also presented in the new
> catalogue full of show, glitter and glamour.</p>
> </body.content>
> </body>
> </nitf>
> <!--     /EXAMPLE -->
> For now, all I want to do with them is save them in an RDBMS. The rest of
> the applications (ASPs or servlets) will access that data through ODBC or
> JDBC.
> They will need queries such as:
> Find and display all data feeds containing the keywords "this" and "that" ...
> <key-list>
> 	<keyword key="Maxima"></keyword>
> 	<keyword key="Princess"></keyword>
> </key-list>
> Find all the data feeds issued on that date
> <date.issue norm="20010302 150629+0100"></date.issue>
> If the applications access the data feeds through ODBC or JDBC and don't
> ask for XML output (no need for XML retrieval), I see no reason to
> introduce a native XML database at this point, but please correct me if I
> am wrong ...

You are correct, assuming that when you search across data feeds you are
only interested in searching on keywords and the date issued.  Again, this
assumes that every data feed has those elements.
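Under those assumptions, both of your queries reduce to plain SQL over the keyword and document tables. A minimal sketch in Python with sqlite3; the table names and sample rows are hypothetical. The "this AND that" query uses GROUP BY with HAVING COUNT(DISTINCT keyword) so that a document must carry every requested keyword, not just one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (doc_id INTEGER PRIMARY KEY, date_issued TEXT NOT NULL);
CREATE TABLE keyword  (doc_id INTEGER NOT NULL, keyword TEXT NOT NULL);
-- Hypothetical rows: doc 1 has both keywords, doc 2 has only one.
INSERT INTO document VALUES (1, '2001-03-02'), (2, '2001-03-05');
INSERT INTO keyword  VALUES (1, 'Maxima'), (1, 'Princess'), (2, 'Princess');
""")

def docs_with_all_keywords(conn, keywords):
    """Documents carrying every keyword in the list ("this" AND "that")."""
    wanted = sorted(set(keywords))           # duplicates would skew the count
    marks = ",".join("?" * len(wanted))
    sql = (f"SELECT doc_id FROM keyword WHERE keyword IN ({marks}) "
           "GROUP BY doc_id HAVING COUNT(DISTINCT keyword) = ? ORDER BY doc_id")
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]

def docs_issued_on(conn, day):
    """Documents whose date.issue falls on the given ISO date."""
    sql = "SELECT doc_id FROM document WHERE date_issued = ? ORDER BY doc_id"
    return [row[0] for row in conn.execute(sql, (day,))]

print(docs_with_all_keywords(conn, ["Maxima", "Princess"]))  # [1]
print(docs_issued_on(conn, "2001-03-05"))                    # [2]
```

The same statements can be issued unchanged from the ASPs or servlets through ODBC or JDBC.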

> I just need a good way to transform XML data feeds into a relational
> database model. That is why I asked for something like a DTD2SQL or XML2SQL
> tool. I imagined applying an XSLT transformation to each data feed to turn
> it into a document conforming to some DTD for SQL, and then just running
> that SQL. My idea was to generate some intermediate XML that describes
> RDBMS transactions.

The DB extenders generate this kind of mapping automatically, although
each vendor does it a bit differently.  Alternatively, you can design your
own mapping based on your application requirements, as I indicated at the
beginning of this response.  Then you can write a stylesheet to transform
each XML document into your particular RDBMS representation, and write the
INSERT code so that each document is stored with transactional integrity.
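For the INSERT code, the key point is that a document row and its keyword rows are committed together or not at all. A minimal sketch using Python's sqlite3, whose connection context manager commits on success and rolls back when an exception escapes the block; the tables are hypothetical, and the second call deliberately violates a NOT NULL constraint to show the rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (doc_id INTEGER PRIMARY KEY, date_issued TEXT NOT NULL);
CREATE TABLE keyword  (doc_id INTEGER NOT NULL, keyword TEXT NOT NULL);
""")

def insert_feed(conn, date_issued, keywords):
    """Insert one document and its keywords as a single unit of work."""
    with conn:  # COMMIT if the block succeeds, ROLLBACK if it raises
        cur = conn.execute(
            "INSERT INTO document (date_issued) VALUES (?)", (date_issued,))
        conn.executemany(
            "INSERT INTO keyword (doc_id, keyword) VALUES (?, ?)",
            [(cur.lastrowid, kw) for kw in keywords])
        return cur.lastrowid

insert_feed(conn, "2001-03-02", ["Maxima", "Princess"])
try:
    # A None keyword violates NOT NULL, so this document row is rolled back too.
    insert_feed(conn, "2001-03-05", ["Maxima", None])
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM document").fetchone()[0])  # 1
print(conn.execute("SELECT COUNT(*) FROM keyword").fetchone()[0])   # 2
```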

Or you can use Xfinity, call InsertXML, and forget about it.  You can, of
course, search on keywords and date issued, as you mention above.  Sorry
for the plug.

> Something maybe like this, but more sophisticated, I guess ... and simple
> enough :)
> <!--    example of standard DTD for SQL output -->
> <?xml version="1.0" encoding="ISO-8859-1" ?>
> <!DOCTYPE sql SYSTEM "sql.dtd">
> <sql>
> <record table="anp" type="master">
>    <column name="issued" type="date">1.2.2221</column>
>    <column name="description"  type="text">
>           Princess presented the Princess Collection 2000 in an estate with
> a very royal ambiance. The dazzling collection is also presented in the new
> catalogue full of show, glitter and glamour. 
>    </column>
>    <record table="keyword" type="detail">
>        	<column name="keyword"  type="text">Maxima</column>
>    </record>
>    <record table="keyword" type="detail">
>        	<column name="keyword"  type="text">Princess</column>
>    </record>
>    <record table="category" type="lookup">
>        	<column name="category"  type="text">Entertainment</column>
>    </record>
> </record>
> </sql>
> <!--    /example of standard DTD for SQL output -->
> So I am searching for a standard DTD to describe the records to be
> inserted and the relationships between them, and in general to make
> generating SQL from it fairly straightforward.

I do not think there is a single standard DTD for this.
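Without a standard DTD, a small loader for your own intermediate format is not much code. The Python sketch below walks a document in the style of your <sql> example (values shortened) and issues parameterised INSERTs inside one transaction. The rule that a nested record receives its parent's generated id in a column named parent_id is an assumption of this sketch, since your format does not say how detail and lookup records link to their master; the type attributes are ignored; and table and column names are interpolated directly into the SQL, so it should only be fed trusted output from your own stylesheets.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Modelled on the <sql> example in this thread (values shortened).
DOC = """<sql>
<record table="anp" type="master">
  <column name="issued" type="date">2001-03-02</column>
  <record table="keyword" type="detail">
    <column name="keyword" type="text">Maxima</column>
  </record>
  <record table="keyword" type="detail">
    <column name="keyword" type="text">Princess</column>
  </record>
</record>
</sql>"""

def load(conn, elem, parent_id=None):
    """Insert one <record>, then recursively the records nested inside it.

    Assumption (not part of any standard): a nested record gets its
    parent's generated row id in a column named parent_id.
    """
    cols = {c.get("name"): (c.text or "").strip() for c in elem.findall("column")}
    if parent_id is not None:
        cols["parent_id"] = parent_id
    names = ", ".join(cols)                # trusted names only: not escaped
    marks = ", ".join("?" * len(cols))
    cur = conn.execute(
        f"INSERT INTO {elem.get('table')} ({names}) VALUES ({marks})",
        list(cols.values()))
    for child in elem.findall("record"):
        load(conn, child, cur.lastrowid)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anp (id INTEGER PRIMARY KEY, issued TEXT);
CREATE TABLE keyword (id INTEGER PRIMARY KEY, parent_id INTEGER, keyword TEXT);
""")
with conn:  # the whole <sql> document is one transaction
    for record in ET.fromstring(DOC).findall("record"):
        load(conn, record)
print(conn.execute("SELECT parent_id, keyword FROM keyword ORDER BY id").fetchall())
# [(1, 'Maxima'), (1, 'Princess')]
```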

> Thanx again for your time and friendly support.
> Nice day!
> Albena
> _________________
> Albena Georgieva
> Internet Software Developer
> Info.nl
> Sint Antoniesbreestraat 16
> 1011 HB Amsterdam
> t. +31 (0) 20 530 9100
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>