Re: [xml-dev] Parse binary files into XML, sanitize the XML, reconstitute the binary

Hi Roger,

This is interesting, but it would be more compelling to see an image-editing tool, something like Picasa, written using these technologies. Then we could compare it with the conventional way of doing this, including the quality of the code.

Cheers,
Dimitre 


On Dec 13, 2016, at 6:19 AM, Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,

This is so cool.

Here is a picture of a place in Norway:

<image001.jpg>

That’s a JPEG file. It’s a binary file.

Open it in a hex editor and here’s (a portion of) what you will see:

<image002.png>

Fun, eh?
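For readers without a hex editor at hand, the same kind of view can be sketched in a few lines of Python. This is only an illustration: `hex_dump` is a made-up helper name, and the sample bytes are a hypothetical JFIF header chosen to match the field values in the XML further down, not bytes taken from the actual picture.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and printable ASCII, like a hex editor."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as "." in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: SOI marker (FF D8) followed by a JFIF APP0 segment.
sample = bytes.fromhex("FFD8FFE000104A46494600010101004800480000")
print(hex_dump(sample))
```

Every JPEG file starts with the two bytes FF D8 (the SOI marker), which is why the dump begins that way.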

Now, did you know that there is a standard technology for parsing binary files, where the output of parsing is XML? There is, and that technology is called Data Format Description Language (DFDL). It’s awesome.

Using DFDL, I parsed the JPEG (binary) file. That resulted in this XML:

<JPEG>
  <Segment>
    <SOI> </SOI>
  </Segment>
  <Segment>
    <APP0>
      <Length>16</Length>
      <Identifier>JFIF-APP0</Identifier>
      <version>
        <major>1</major>
        <minor>1</minor>
      </version>
      <Units>Dots per inch</Units>
      <Xdensity>72</Xdensity>
      <Ydensity>72</Ydensity>
      <Xthumbnail>0</Xthumbnail>
      <Ythumbnail>0</Ythumbnail>
      <RGB/>
    </APP0>
  </Segment>
  …
  </Segment>
  <Segment>
    <EOI> </EOI>
  </Segment>
</JPEG>

 

Wow!
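To make the mapping from bytes to XML concrete, here is a rough sketch of how the APP0 fields above line up with the binary layout. It is hand-written Python, not DFDL, and it is not the schema used for the post. The function name `parse_app0` and the unit labels other than "Dots per inch" are assumptions, as is the idea that the "JFIF-APP0" label corresponds to the identifier bytes `JFIF\0`. It also assumes there is no embedded thumbnail.

```python
import struct
import xml.etree.ElementTree as ET

# Only "Dots per inch" appears in the post; the other labels are assumed.
UNITS = {0: "No units", 1: "Dots per inch", 2: "Dots per centimeter"}


def parse_app0(payload: bytes) -> ET.Element:
    """Map a JFIF APP0 segment (the bytes after the FF E0 marker) to XML."""
    # Big-endian: length, 5-byte identifier, version, units, densities, thumbnail size.
    (length, ident, major, minor, units,
     xdensity, ydensity, xthumb, ythumb) = struct.unpack(">H5sBBBHHBB", payload[:16])
    if ident != b"JFIF\x00":
        raise ValueError("not a JFIF APP0 segment")

    app0 = ET.Element("APP0")
    ET.SubElement(app0, "Length").text = str(length)
    ET.SubElement(app0, "Identifier").text = "JFIF-APP0"
    version = ET.SubElement(app0, "version")
    ET.SubElement(version, "major").text = str(major)
    ET.SubElement(version, "minor").text = str(minor)
    ET.SubElement(app0, "Units").text = UNITS.get(units, str(units))
    ET.SubElement(app0, "Xdensity").text = str(xdensity)
    ET.SubElement(app0, "Ydensity").text = str(ydensity)
    ET.SubElement(app0, "Xthumbnail").text = str(xthumb)
    ET.SubElement(app0, "Ythumbnail").text = str(ythumb)
    ET.SubElement(app0, "RGB")  # left empty: this sketch assumes a 0x0 thumbnail
    return app0


# Hypothetical payload chosen to reproduce the values shown in the XML above.
sample = bytes.fromhex("00104A46494600010101004800480000")
app0 = parse_app0(sample)
print(ET.tostring(app0, encoding="unicode"))
```

The point of a DFDL schema is that this layout is declared once, as data, instead of being coded by hand for each format.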

 

Next, I wrote an XSLT program that scoured the XML, removing evil stuff like comments. (Did you know that comments can be inserted into images and you will never see them with your eyes? Yikes!) The output of my XSLT program was a sanitized XML-JPEG file.
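The XSLT itself is not included in the post. As a stand-in, here is a small Python sketch of the same idea: drop every segment that holds a comment. The element name `COM` is an assumption (it is the JPEG marker name for comment segments, but the post does not show what the DFDL schema calls them), and `strip_comment_segments` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def strip_comment_segments(jpeg: ET.Element, comment_tag: str = "COM") -> int:
    """Remove every <Segment> that contains a comment element; return the count."""
    removed = 0
    # Iterate over a copy of the list so removal does not disturb the loop.
    for segment in list(jpeg.findall("Segment")):
        if segment.find(comment_tag) is not None:
            jpeg.remove(segment)
            removed += 1
    return removed


# Hypothetical XML-JPEG with one comment segment between SOI and EOI.
doc = ET.fromstring(
    "<JPEG>"
    "<Segment><SOI/></Segment>"
    "<Segment><COM><Length>9</Length><Text>hidden!</Text></COM></Segment>"
    "<Segment><EOI/></Segment>"
    "</JPEG>"
)
removed = strip_comment_segments(doc)
print(removed, [segment[0].tag for segment in doc])
```

In XSLT the usual shape for this is an identity transform plus one empty template that matches the unwanted segments.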

 

Lastly, I used DFDL to “unparse” the sanitized XML-JPEG file. DFDL reconstituted the JPEG (binary). Here it is:

<image003.png>

That JPEG file doesn’t have any evil stuff in it. (Well, less evil stuff, anyway.)
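For a feel of what "unparsing" involves, here is a hedged Python sketch that turns the APP0 element shown earlier back into bytes. Again, this is hand-written code rather than DFDL; `unparse_app0` is a hypothetical name, the unit labels other than "Dots per inch" are assumed, and the sketch assumes a 0x0 thumbnail, so the RGB element is ignored.

```python
import struct
import xml.etree.ElementTree as ET

# Only "Dots per inch" appears in the post; the other labels are assumed.
UNIT_CODES = {"No units": 0, "Dots per inch": 1, "Dots per centimeter": 2}


def unparse_app0(app0: ET.Element) -> bytes:
    """Serialize an <APP0> element back to a JFIF APP0 segment, marker included."""
    body = struct.pack(
        ">5sBBBHHBB",
        b"JFIF\x00",
        int(app0.findtext("version/major")),
        int(app0.findtext("version/minor")),
        UNIT_CODES[app0.findtext("Units")],
        int(app0.findtext("Xdensity")),
        int(app0.findtext("Ydensity")),
        int(app0.findtext("Xthumbnail")),
        int(app0.findtext("Ythumbnail")),
    )
    # The length field counts its own two bytes plus the body, not the marker.
    # It is recomputed here rather than copied from <Length>.
    return b"\xFF\xE0" + struct.pack(">H", len(body) + 2) + body


app0 = ET.fromstring(
    "<APP0><Length>16</Length><Identifier>JFIF-APP0</Identifier>"
    "<version><major>1</major><minor>1</minor></version>"
    "<Units>Dots per inch</Units><Xdensity>72</Xdensity><Ydensity>72</Ydensity>"
    "<Xthumbnail>0</Xthumbnail><Ythumbnail>0</Ythumbnail><RGB/></APP0>"
)
segment = unparse_app0(app0)
print(segment.hex(" ").upper())
```

Recomputing the length instead of trusting the XML value is a design choice: once a sanitizer has edited the XML, stored lengths may no longer match the content.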

Let’s recap: using a standard technology (DFDL) I parsed a binary file (JPEG). The output is XML. I then processed the XML using the full suite of XML technologies (XSLT, Schematron, XML Schema, etc.). Lastly, I unparsed the XML to reconstitute the binary.

That is totally amazing!

/Roger

 


