
   Sax filter encoding problem


I am trying to write a simple SAX filter in Java to
experiment with splitting a large XML file into smaller
ones, but the output does not have the same content as
the original XML file.

For example, I have the following XML file, which is
encoded as UTF-8:

<?xml version="1.0" encoding="utf-8"?>
<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oil” test</Record>
</Root>

It looks fine when I view it in IE, but after I use it
as the input file and run my SAX program (which just
uses the SAX API to write the same content to an output
file), the content changes to the following:

<Root>
<Record>Multipurpose abcd</Record>
<Record>the banana oil test</Record>
</Root>

I compared the bytes of the two files: "e2 80 9c" was
changed to "93" and "e2 80 9d" was changed to "94"!
That is not what I wanted, and I also get an error when
I try to view the output in IE.
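
For reference, these byte values can be reproduced in isolation. The following is only a minimal sketch, assuming the output was written in the windows-1252 code page rather than UTF-8 (the class name QuoteBytes and the helper hex are made up for illustration, and StandardCharsets needs Java 7 or later):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuoteBytes {
    // Render bytes as space-separated lowercase hex, e.g. "e2 80 9c".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the platform default encoding was windows-1252.
        Charset cp1252 = Charset.forName("windows-1252");
        String open = "\u201c";  // left double quotation mark
        String close = "\u201d"; // right double quotation mark

        System.out.println(hex(open.getBytes(StandardCharsets.UTF_8)));  // e2 80 9c
        System.out.println(hex(close.getBytes(StandardCharsets.UTF_8))); // e2 80 9d
        System.out.println(hex(open.getBytes(cp1252)));                  // 93
        System.out.println(hex(close.getBytes(cp1252)));                 // 94
    }
}
```

If that assumption holds, the characters themselves survived parsing and only the encoding used on output differs from the input.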

In this first attempt, I used

parser.parse(new InputSource(
    new File(input_file_name).toURL().toString()));

in my program.

Then I tried another way:

FileReader in_file = new FileReader(input_file_name);
parser.parse(new InputSource(
    new File(args[0]).toURL().toString()));

After running my program, the output looks like this:

<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oil? test</Record>
</Root>

At the byte level, "e2 80 9c" is now preserved, but
"e2 80 9d" was still changed to "94". IE also reports
an illegal character when I view this output.

Can anybody tell me how to handle this problem? And
what is the best way to wrap the input file in an
InputSource? Any reply would be appreciated!

My test program looks like this:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class parseXML extends DefaultHandler {

    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts)
    {
        ......
    }

    public void characters(char[] ch, int start, int length)
    {
        // Print each character of the text node to standard output.
        for (int i = 0; i < length; i++) {
            System.out.print(ch[start + i]);
        }
        .......
    }
    ......
}
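
One variation that might be worth trying is to hand the parser a byte stream, so it can read the encoding from the XML declaration, and to write the text through a Writer with an explicit encoding instead of System.out.print, which uses the platform default encoding. This is only a rough sketch under those assumptions, not a verified fix for the program above: the class name Utf8TextCopier and the method copyText are made up for illustration, it uses SAXParserFactory rather than the parser object from my program, and it needs Java 7 or later for StandardCharsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: copies character data to a Writer instead of System.out.
class Utf8TextCopier extends DefaultHandler {
    private final Writer out;

    Utf8TextCopier(Writer out) {
        this.out = out;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            // The Writer, not the platform default, decides the output encoding.
            out.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parse XML bytes from 'in' and write the element text to 'dest' as UTF-8.
    static void copyText(InputStream in, OutputStream dest) throws Exception {
        Writer w = new OutputStreamWriter(dest, StandardCharsets.UTF_8);
        // A byte stream lets the parser detect the encoding from the XML declaration.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(in), new Utf8TextCopier(w));
        w.flush();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<Record>the \u201cbanana oil\u201d test</Record>";
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        copyText(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)), result);
        System.out.println(result.size() + " bytes written");
    }
}
```

With a real file, a FileInputStream and a FileOutputStream would take the place of the in-memory streams used here.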


Thanks.

Leo


