OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

 


 

   Re: [xml-dev] Large xml processing into SQL Server

[ Lists Home | Date Index | Thread Index ]

while i'm not familiar with bea stuff, my initial reaction (particularly 
given the requirement to turn the hierarchical xml into "flat" records) 
would be to use a sax based parser so that the whole document doesn't 
have to be in memory at once.

as this is a new person to the list i hope you'll forgive my xml 101 
approach - but it it helps :)

you can parse this document two ways - document at once with a "dom" 
parser such as xslt - although i don't quite understand why 72mb would 
break such a beast - that's not an overly large file by today's 
standards - or you can use a streaming parser ("sax") which only holds 
the unprocessed content passing details of your document back as it 
breaks it up into xml elements - and for your requirements this is most 
likely the best approach.

i'm afraid it's up to you (or others on the list) to find out how bea 
provides these functions

rick

Christopher Milton wrote:

>These links may help:
>
>"Using the WebLogic XML Streaming API"
>http://e-docs.bea.com/wls/docs81/xml/xml_stream.html#1092257
>"BEA WebLogic XML/Non-XML Translator"
>http://e-docs.bea.com/wlxt/10/index.htm
>
>>From experience, I set my java -Xmx# to at least 4 times the
>size of the file I'm parsing for in-memory processing like DOM
>and at least twice the size for stream processing like SAX.
>YMMV
>
>--- william.j.boxall@accenture.com wrote:
>  
>
>>Thank you all who have offered to help!
>> 
>>I am home sick right now, so don't have access to my work computer so I can't
>>check all the settings.. The driver we are using, though, is
>>weblogic.jdbc.sqlserver.SQLServerDriver.
>>The temp table I am attempting to write the records to is:
>> 
>>CREATE TABLE [TMP_TAN_TAB] (
>> [TAN_PREFIX] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
>> [TAN_NUMBER] [char] (9) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
>> [CHANNEL_TYPE] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
>> [STATUS] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
>> [TIN] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
>> [NON_GDS_IND] [bit] NULL ,
>> [ACTIVE_IND] [bit] NOT NULL ,
>> [LAST_UPDATE] [smalldatetime] NOT NULL ,
>> [INTERFACE_NAME] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL 
>>) ON [PRIMARY]
>>GO
>>
>>And a sample 3 record xml file is:
>>
>><?xml version="1.0"?> 
>><AMEXTANToACNTAN DateTimeCreated="20040127165815" RecordCount="3"
>>xmlns="stsi/amextan" SerialNum="1">
>>    <BusinessUnit Code="AAAA" TANCount="2">
>>        <ITEM TAN="AAAA00324">
>>            <ChannelType>P</ChannelType>
>>            <TANStatus>A</TANStatus>
>>            <TIN>       </TIN>
>>        </ITEM>
>>        <ITEM TAN="AAAA98422">
>>            <ChannelType>R</ChannelType>
>>            <TANStatus>U</TANStatus>
>>            <TIN>1234567</TIN>
>>        </ITEM>
>>    </BusinessUnit>
>>    <BusinessUnit Code="BBBB" TANCount="1">
>>        <ITEM TAN="BBBB00420">
>>            <ChannelType>P</ChannelType>
>>            <TANStatus>C</TANStatus>
>>            <TIN>       </TIN>
>>        </ITEM>
>>    </BusinessUnit>
>></AMEXTANToACNTAN>
>>
>>
>>But the size of the file IS big.. The sample file we received was 72 meg. 
>>That equates into roughly 819,000 records.
>> 
>>This huge a file won't be an every day occurance, though.  It's a 1 time
>>initial load, and thereafter the daily files we receive will be smaller,
>>though we do not know exactly what size they will be.
>> 
>>Currently, I'm attempting to work with a smaller version of the file, which
>>is only 3 meg, or about 33,000 records.
>> 
>>What we're using to process the xml is whatever came packaged with BEA
>>Weblogic.. I'll admit I didn't dig too deeply into that.  There is a file
>>event generator that picks up the physical file and passes it to my process
>>as an xml document.  I then store it in a document type, and do my iteration
>>and validation checks on individual fields, then as I iterate through and the
>>individual record passes the validation routine, I add it to a batch sql
>>statement, which I pass to the database at the end.  
>> 
>>We do not use DTDs or XSLT, currently, though we may have to.  We do have a
>>schema, though, which is in hierarchical format.  It may need to be flattened
>>if we are to do a bulk load.
>> 
>>Once again, thank you for any help you can give!
>>Bill
>> 
>>
>>
>>This message is for the designated recipient only and may contain privileged,
>>proprietary, or otherwise private information.  If you have received it in
>>error, please notify the sender immediately and delete the original.  Any
>>other use of the email by you is prohibited.
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://www.oasis-open.org/mlmanage/index.php>
>>
>>
>>    
>>
>
>
>-----------------------------------------------------------------
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>
>The list archives are at http://lists.xml.org/archives/xml-dev/
>
>To subscribe or unsubscribe from this list use the subscription
>manager: <http://www.oasis-open.org/mlmanage/index.php>
>
>  
>

begin:vcard
fn:Rick  Marshall
n:Marshall;Rick 
email;internet:rjm@zenucom.com
tel;cell:+61 411 287 530
x-mozilla-html:TRUE
version:2.1
end:vcard





 

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS