OASIS Mailing List Archives

RE: XML Integration

I don't recommend parsing the stream yourself looking for tags. Although XML
is a very simple syntax, I've seen many XML messaging implementations that
are buggy because they parsed XML incorrectly. There are plenty of fine
parsers out there that work. Don't reinvent the wheel; it's merely an
invitation to bugs.

The manner in which you do this and the tools you use will be dictated first
and foremost by your choice of a development platform. Will you be using
Java? Microsoft technologies? Perl? Python?

I don't know much about the latter two, but I have plenty of experience with
the Java and Microsoft tools and would be happy to share my experiences.

If using Microsoft tools, check out their XML SDK. It has everything you
need. It includes an XMLHTTPRequest object for sending XML via an HTTP POST
and reading the XML response. Their DOM implementations can read/write
directly to ASP Request/Response objects to simplify the server-side
implementation, as well.

If using Java tools, you'll need to do a bit more work, but it is still
pretty easy to do if you know the pitfalls to avoid. You can check out
Apache SOAP to see if it suits your needs. If not, you can roll your own
solution using any decent XML parser (there are plenty) and either the
HttpURLConnection class in the JDK (which has problems, but can be used for
this) or an alternative HTTP implementation. Unfortunately, almost all of
the HTTP implementations in the Java world are a bit problematic for XML
integration. They are all written with too many assumptions about how they
are used and are all written to support end-user tools, not integrations.
(I've seen code in such toolkits, for instance, that intercepts an HTTP
error status code, and returns to the client an HTTP success status code
with a hard-coded HTML web page containing an error message rather than
conveying the error message returned from the server. The JDK
HttpURLConnection class throws a FileNotFoundException with no error message
if it receives an HTTP error status code.) If you seek an alternative HTTP
implementation rather than relying upon HttpURLConnection, I recommend using
the HTTPRequest class included in Sun's Open Source Brazil framework. It's
the only one I've found that doesn't make any of the sort of flawed
assumptions about usage that the others all make. However, many implementors
simply use the JDK HttpURLConnection class. In spite of its problems, it can
be used for most XML/HTTP integrations. Don't use the JDK class, though, in
any instance where an integration might explicitly set an HTTP status code
to indicate an error (which SOAP implementations do, but which is otherwise

Any of the Java XML parsers out there will permit you to read XML directly
from an HTTP stream or write to an HTTP stream. You can load the XML into a
DOM if you wish, use JDOM as an alternative (www.jdom.org), or use the
low-level SAX API. Be attentive to character encoding issues, though. Never
create an OutputStreamWriter without an explicit encoding, and never create
an InputStreamReader without an explicit encoding. If you do so, the reader
or writer is created with a platform-dependent encoding; this can cause
interoperability problems, and won't support internationalized text. If you
know who you are going to be integrating with and can specify which
character encodings they must use, stick with UTF-8 and UTF-16. They can
handle any Unicode characters. For simplicity, you could even specify that only
UTF-8 be used. Character encoding is indicated in the "Content-Type" header as a
"charset" parameter. For example:
     Content-Type: text/xml; charset=utf-8
or   Content-Type: text/xml; charset="utf-8"
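
To illustrate the points above about explicit encodings and letting a real
parser read the stream, here is a minimal sketch. It is not from the original
message. It assumes the JAXP DOM parser bundled with current JDKs and the
java.nio.charset.StandardCharsets constants (Java 7 and later); the class and
method names (XmlStreams, writeUtf8, parse) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlStreams {

    /** Writes XML text as UTF-8 bytes; the encoding is always named explicitly. */
    static void writeUtf8(String xml, OutputStream out) throws Exception {
        // Never rely on the platform default: pass the charset to the writer.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(xml);
        writer.flush(); // flush only; the caller still owns the stream
    }

    /** Parses XML from a byte stream into a DOM instead of scanning for tags. */
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Given raw bytes, the parser works out the encoding itself
        // (from the XML declaration, or the XML defaults when there is none).
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                   + "<order><customer>Zo\u00eb</customer></order>";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeUtf8(xml, buffer);
        Document doc = parse(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

In a real integration the streams would come from the HTTP connection rather
than from in-memory buffers; the buffers only keep the sketch self-contained.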

HTTP is case-insensitive with the charset parameter. Java, however, is very
picky with some of the encoding names, and does not always use the same name
as the official IANA name (which is what you must use in the header). For
utf-8, it's best to specify "UTF8" as the encoding with Java (although it
will accept some variations). For utf-16, use "Unicode" as the Java encoding
name (in the same case as I've shown; Java's picky with this one).
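
As a rough sketch of mapping the header's charset parameter to a Java
encoding: on Java 1.4 and later (which postdates this message), the
java.nio.charset.Charset class looks names up case-insensitively and accepts
the IANA name "utf-8" directly. The class and method names below
(ContentTypeCharset, charsetOf) are hypothetical, the fallback is whatever the
caller chooses, and the parsing is a simplification that does not handle
semicolons inside quoted parameter values.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type header value, or the
     * supplied fallback when the header carries no charset parameter.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the rest are name=value parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // The value may be quoted, as in charset="utf-8".
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            // Throws an unchecked exception if the name is illegal or unsupported.
            return Charset.forName(value);
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // arbitrary choice for the demo
        System.out.println(charsetOf("text/xml; charset=utf-8", fallback));
        System.out.println(charsetOf("text/xml; charset=\"UTF-8\"", fallback));
        System.out.println(charsetOf("text/xml", fallback));
    }
}
```

The returned Charset can be passed straight to an InputStreamReader or
OutputStreamWriter constructor, which avoids juggling Java-specific encoding
names by hand.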

I've made a point of mentioning the character encoding issue because this is
an issue that relatively few developers are familiar with and which
frequently leads to interoperability problems or problems with
internationalization support. Within a document, the encoding can be
indicated via the XML declaration.
For example:
<?xml version="1.0" encoding="utf-8"?>

But when transmitting XML over HTTP, the proper way to indicate encoding is
via the charset parameter in the Content-Type header.
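
Putting the pieces together, a minimal sketch of an XML POST with the JDK's
HttpURLConnection might look like the following. It is an illustration under
stated assumptions, not a definitive implementation: the class, method, and
field names (XmlHttpPost, post, Response) are hypothetical, the body is always
sent as UTF-8, and the reply is assumed to be UTF-8 as well rather than decoded
according to the response's own Content-Type. It reads the error stream when
the server returns an error status, so the server's error message is not lost.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class XmlHttpPost {

    /** Status code and body of the reply (hypothetical helper type). */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** POSTs the XML as UTF-8 and returns the reply, even for error statuses. */
    static Response post(URL endpoint, String xml) throws IOException {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // The header names the same encoding that produced the bytes.
            conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            int status = conn.getResponseCode();
            // getInputStream() throws on error statuses; the body of an
            // error reply is available from getErrorStream() instead.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in == null) {
                return new Response(status, "");
            }
            try (InputStream reply = in) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = reply.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // Simplification: assumes the reply is UTF-8.
                return new Response(status, buffer.toString("UTF-8"));
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: XmlHttpPost <endpoint-url>");
            return;
        }
        Response r = post(new URL(args[0]),
                "<?xml version=\"1.0\" encoding=\"utf-8\"?><ping/>");
        System.out.println(r.status + " " + r.body);
    }
}
```

In practice the reply bytes would be handed to an XML parser rather than
decoded into a String; the String return only keeps the sketch short.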

I hope this helps (or at least that it doesn't simply make matters more

> -----Original Message-----
> From: Jon Ceanfaglione [mailto:Jcean@cisglobal.com]
> Sent: Monday, January 29, 2001 8:14 AM
> To: xml-dev@lists.xml.org
> Subject: XML Integration
> Hello
> I'm getting ready to begin an integration project using HTTP to transmit and
> receive XML formatted data.  What I'm interested in is getting some
> suggestions about how you might go around transmitting and receiving the XML.
> I have my own ideas, like doing a read on the entire HTTP stream and looking
> for the beginning xml tag and the ending xml tag and then load it into a
> DOM.  What are some other ideas out there?