FYI, a preliminary discussion related to a subset of the pipeline can
be found at
http://dsd.lbl.gov/nux/api/nux/xom/binary/BinaryXMLCodec.html
Wolfgang.
On Nov 11, 2004, at 1:44 PM, Roger L. Costello wrote:
> Hi Folks,
>
> I am interested in knowing the state-of-the-art practice
> for enhancing the performance of XML-based client-server interactions.
>
> Let us consider the process of a client sending XML to a server.
> Below I identify 3 "parts" to this process:
>
> Part 1: Client prepares the XML
>
> Part 2: Transmittal of the XML
>
> Part 3: Server processes the XML
>
> Now let us consider each part in turn, with the goal of determining
> the state-of-the-art practice for enhancing the performance of each
> part.
>
> Part 1: Client prepares the XML
>
> At some point the client decides to compose and prepare XML for
> transmittal to the server.
>
> Compose the XML
>
> The method employed to compose XML is highly variable. For example,
> XML could be composed from a Java program, or from a database query.
> I will restrict this investigation just to considering XML
> composition from a database query.
>
> The time required to compose XML from a database query will vary
> depending on which database is used: Oracle, SQL Server, MySQL,
> native XML versus relational, etc.
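
To make T1 concrete, here is a minimal sketch of composing XML from a query and timing it. It assumes SQLite (via Python's sqlite3 module), which is not one of the databases named above; the orders table, its columns, and the rows are all made up for illustration.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET


def compose_xml(conn):
    """Build an XML document from the rows of a hypothetical orders table."""
    root = ET.Element("orders")
    query = "SELECT id, customer, total FROM orders ORDER BY id"
    for order_id, customer, total in conn.execute(query):
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "customer").text = customer
        ET.SubElement(order, "total").text = str(total)
    return ET.tostring(root, encoding="utf-8")


# In-memory database with made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 19.5), ("bob", 7.25)],
)

start = time.perf_counter()
xml_bytes = compose_xml(conn)
elapsed = time.perf_counter() - start  # T1 for this database and this query
print(len(xml_bytes), "bytes composed in", elapsed, "seconds")
```

Running the same compose step against each candidate database, with realistic data volumes, would give comparable T1 figures.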
>
> Question: has anyone done a study comparing the time required to
> compose XML by the different databases?
>
> Prepare the XML
>
> Oftentimes the client will choose to validate the XML prior to
> transmittal. Validating XML could potentially take a significant
> amount of time. The time required will vary depending upon these
> factors:
>
> - Validation language: which language you use (DTD, XML Schemas,
>   RelaxNG, Schematron, OASIS CAM) can determine how long the
>   validation will take.
>
> - Parser: which parser you use (e.g., Apache Xerces, XML Spy, etc.)
>   can also impact the time required to validate.
>
> Question: has anyone done a study comparing validation times across
> validation languages and validation times across parsers?
>
> Part 2: Transmittal of the XML
>
> There is a delay between the moment the client sends the XML and
> the moment the server receives the XML.
>
> Assertion: the dominating factor in determining the length of the
> delay is the size of the XML[1]. Small XML chunks get from client
> to server quicker than large XML chunks.
>
> What are the options for reducing the delay? I am aware of 4
> techniques:
>
> 1. Compression
> 2. Binary encoding
> 3. Streaming
> 4. Minimize markup
>
> Technique 1: Compression
>
> There are numerous XML compression tools. I will list 2 such
> tools here:
>
> - XMill
> - Bzip
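
A minimal sketch of measuring what compression buys on a given payload, using Python's standard bz2 module for the bzip2 format. XMill has no standard-library binding, so it is not shown; gzip is included only as a second point of comparison and is my addition, not part of the list above. The sample document is made up and highly repetitive, so real payloads will compress differently.

```python
import bz2
import gzip

# Made-up, repetitive sample document; real payloads will differ.
order = b'<order id="%d"><customer>alice</customer><total>19.50</total></order>'
xml_bytes = b"<orders>" + b"".join(order % i for i in range(1000)) + b"</orders>"

gzip_bytes = gzip.compress(xml_bytes)
bzip2_bytes = bz2.compress(xml_bytes)

print("original:", len(xml_bytes))
print("gzip:    ", len(gzip_bytes))
print("bzip2:   ", len(bzip2_bytes))

# Both codecs are lossless: the receiver recovers the exact bytes.
assert gzip.decompress(gzip_bytes) == xml_bytes
assert bz2.decompress(bzip2_bytes) == xml_bytes
```

Note that the time spent compressing and decompressing adds to the delay on both ends, so the saving in transmitted bytes has to outweigh it.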
>
> Technique 2: Binary encoding
>
> The W3C has an XML Binary Characterization (XBC) Working Group that is
> actively working to define a standard binary encoding for XML. I
> believe that the fruits of their labor will not be usable for several
> years.
>
> Technique 3: Streaming
>
> The idea of both HTML streaming and XML streaming is to break the
> data to be transmitted into small chunks and then transmit one chunk
> at a time.
>
> The SAX event-based model is a form of streaming.
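
As a rough illustration of SAX used as a streaming consumer, the sketch below feeds a document to Python's xml.sax parser in pieces, the way chunks might arrive from a socket. The chunk contents and the ElementCollector handler are made up; the incremental feed()/close() calls are those of the standard library's expat-based reader.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Records element names in the order their start tags are seen."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Pretend these chunks arrived one at a time over the network (made-up data).
# Note that the second chunk ends in the middle of a tag.
chunks = [
    b"<orders><order id='1'>",
    b"<customer>alice</custo",
    b"mer></order></orders>",
]

handler = ElementCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
for chunk in chunks:
    parser.feed(chunk)  # events fire as soon as enough bytes are available
parser.close()          # signals end of input; raises if not well-formed

print(handler.names)
```

The receiver can start acting on early elements before the last chunk has arrived, which is the point of streaming; the total number of bytes sent is unchanged.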
>
> Question: is it viable to use SAX in a client-server interaction? For
> example, if you are transmitting a SOAP message would it be reasonable
> to stream the SOAP? Is there such a thing as "SOAP Streaming"?
>
> Question: is the streaming technique viable for Web Services?
>
> Technique 4: Minimize markup
>
> Assertion: XML tags are the main cause of the increase in size of
> the XML data.
>
> In recognition of this, one solution is to design your XML to minimize
> the number of tags used. One approach for doing this is to maximize
> the use of attributes[2].
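
A quick way to check the size effect of the attribute-heavy design is to serialize the same record both ways and compare byte counts. This is only a sketch: the record and its field names are made up, and the outcome depends on how long the tag names and values are.

```python
import xml.etree.ElementTree as ET

# Made-up record used only for the comparison.
record = {"id": "42", "customer": "alice", "total": "19.50"}

# Element-heavy form: one child element per field.
elem_root = ET.Element("order")
for key, value in record.items():
    ET.SubElement(elem_root, key).text = value
element_form = ET.tostring(elem_root)

# Attribute-heavy form: every field becomes an attribute of <order>.
attribute_form = ET.tostring(ET.Element("order", record))

print(len(element_form), element_form)
print(len(attribute_form), attribute_form)
```

For this record the attribute form is the smaller of the two, because each field name is written once instead of twice.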
>
> Question: is the "attribute heavy" approach an effective approach for
> reducing delay? Is it a good approach?
>
> Question: all 4 techniques above attempt to reduce the delay via
> reducing the "size" of the data. Are there other things that can be
> done to the data that would reduce the delay?
>
> Part 3: Server processes the XML
>
> The server has now received the XML. The server may choose to
> validate it. In Part 1 above we discussed the impact on time due to
> validation.
>
> After validating, the server "processes" the XML. Clearly, what it
> means to "process" XML is highly variable. I shall restrict the
> discussion just to storing the XML into a database. This is the
> mirror of that considered in Part 1, where we were interested in the
> time required to construct XML from a database query. The same issues
> arise: what database is being used? Is the database a native XML
> database or a relational database?
>
> Question: has anyone done a performance analysis of storing XML into a
> database?
>
> Summary
>
> Above I discussed the delays introduced when a client sends XML to a
> server.
>
> Below is a summary of all the delays:
>
> database ---> XML ---> validate ---> transmit ---> validate ---> database
>           T1       T2            T3            T2            T4
>
> The total time for all the delays is: T1 + 2 * T2 + T3 + T4
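
The total above can be written as a small function. The numbers passed in below are placeholders in milliseconds, not measurements; the formula also assumes, as the summary does, that client-side and server-side validation take the same time T2.

```python
def total_delay_ms(t1, t2, t3, t4):
    """Return T1 + 2*T2 + T3 + T4, counting T2 once per validation pass."""
    return t1 + 2 * t2 + t3 + t4


# Placeholder values in milliseconds: compose, validate, transmit, store.
print(total_delay_ms(t1=40, t2=15, t3=120, t4=60))
```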
>
> Have I missed any steps/delays? /Roger
>
> [1] Obviously there are many factors other than the size of the data
> which affect the delay, such as network problems. Those are problems
> that the client has no control over. I am focused on the delays due
> to the information itself (which the client does have control over).
>
> [2] Whereas elements have a start-tag/end-tag pair, attributes don't
> have the concept of an "end attribute tag". Thus, by using attributes
> you can effectively reduce by half the amount of markup.
>
>
>
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>