RE: [xml-dev] Parse binary files into XML, sanitize the XML, reconstitute the binary

Ahh, but did you clean out steganography…

 

Nice use of an existing tool. Roundabout, but full points for the tool reuse.

 

From: Costello, Roger L. [mailto:costello@mitre.org]
Sent: Tuesday, December 13, 2016 9:19 AM
To: xml-dev@lists.xml.org
Subject: [xml-dev] Parse binary files into XML, sanitize the XML, reconstitute the binary

 

Hi Folks,

This is so cool.

Here is a picture of a place in Norway:

That’s a JPEG file. It’s a binary file.

Open it in a hex editor and here’s (a portion of) what you will see:

Fun, aye?

Now, did you know that there is a standard technology for parsing binary files, where the output of parsing is XML? There is, and that technology is called Data Format Description Language (DFDL). It’s awesome.

Using DFDL, I parsed the JPEG (binary) file. That resulted in this XML:

<JPEG>
  <Segment>
    <SOI> </SOI>
  </Segment>
  <Segment>
    <APP0>
      <Length>16</Length>
      <Identifier>JFIF-APP0</Identifier>
      <version>
        <major>1</major>
        <minor>1</minor>
      </version>
      <Units>Dots per inch</Units>
      <Xdensity>72</Xdensity>
      <Ydensity>72</Ydensity>
      <Xthumbnail>0</Xthumbnail>
      <Ythumbnail>0</Ythumbnail>
      <RGB/>
    </APP0>
  </Segment>
  …
  </Segment>
  <Segment>
    <EOI> </EOI>
  </Segment>
</JPEG>
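
To get a feel for what the parse step is doing, here is a minimal Python sketch that walks the marker segments of a JPEG byte stream and emits XML shaped loosely like the output above. It is not DFDL and not the schema used for the output above: the function name `parse_segments`, the `<Data>` element holding a hex payload, and the synthetic four-segment sample are all assumptions made for illustration. It also ignores the entropy-coded image data that follows a start-of-scan marker, so it would not get through a real photograph.

```python
import struct
import xml.etree.ElementTree as ET

# Marker names for the handful of segments this sketch recognises.
MARKERS = {0xD8: "SOI", 0xE0: "APP0", 0xFE: "COM", 0xD9: "EOI"}
STANDALONE = {0xD8, 0xD9}  # markers with no length field or payload


def parse_segments(data: bytes) -> ET.Element:
    """Walk JPEG marker segments and return them as an XML tree (hypothetical helper)."""
    root = ET.Element("JPEG")
    pos = 0
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        code = data[pos + 1]
        name = MARKERS.get(code, f"MARKER_{code:02X}")
        seg = ET.SubElement(ET.SubElement(root, "Segment"), name)
        pos += 2
        if code in STANDALONE:
            continue
        # The length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        ET.SubElement(seg, "Length").text = str(length)
        ET.SubElement(seg, "Data").text = data[pos + 2:pos + length].hex()
        pos += length
    return root


# Synthetic stream: SOI, a 16-byte JFIF APP0, a comment, EOI (no image data).
app0 = b"JFIF\x00" + bytes([1, 1, 1]) + struct.pack(">HH", 72, 72) + bytes([0, 0])
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", len(app0) + 2) + app0
    + b"\xff\xfe" + struct.pack(">H", 2 + 6) + b"secret"
    + b"\xff\xd9"
)
tree = parse_segments(sample)
print(ET.tostring(tree, encoding="unicode"))
```

A real DFDL schema goes further and names each field inside the payload (version, units, densities) instead of leaving it as a hex blob.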

 

Wow!

 

Next, I wrote an XSLT program that scoured the XML, removing evil stuff like comments. (Did you know that comments can be inserted into images and you will never see them when you look at the picture? Yikes!) The output of my XSLT program was a sanitized XML-JPEG file.
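
The stylesheet itself is not shown here, so as a rough stand-in, this is what the same filtering step might look like in Python with ElementTree instead of XSLT. The element name `COM` for a comment segment is an assumption (it is the JPEG marker name, but the schema may call it something else), and `remove_comment_segments` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def remove_comment_segments(root: ET.Element, comment_tag: str = "COM") -> int:
    """Drop every <Segment> that wraps a comment element; return how many were removed."""
    removed = 0
    # Iterate over a copy so that removing children does not skip siblings.
    for seg in list(root.findall("Segment")):
        if seg.find(comment_tag) is not None:
            root.remove(seg)
            removed += 1
    return removed


# Hypothetical XML-JPEG document with one comment segment (element names assumed).
doc = ET.fromstring(
    "<JPEG>"
    "<Segment><SOI/></Segment>"
    "<Segment><COM><Length>8</Length><Data>hidden</Data></COM></Segment>"
    "<Segment><EOI/></Segment>"
    "</JPEG>"
)
count = remove_comment_segments(doc)
print(count, ET.tostring(doc, encoding="unicode"))
```

The same idea extends to any segment type you decide is unwanted: pass a different tag name, or call the function once per tag.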

 

Lastly, I used DFDL to “unparse” the sanitized XML-JPEG file. DFDL reconstituted the JPEG (binary). Here it is:
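
For intuition only, the unparse direction can be sketched in Python as well. This is not how a DFDL processor does it; it assumes a simplified XML shape in which each segment carries its payload as hex in a `<Data>` element, and `unparse_segments` and the `CODES` table are names made up for the sketch.

```python
import struct
import xml.etree.ElementTree as ET

# Marker codes for the segment names this sketch knows about.
CODES = {"SOI": 0xD8, "APP0": 0xE0, "COM": 0xFE, "EOI": 0xD9}


def unparse_segments(root: ET.Element) -> bytes:
    """Serialise the simplified XML segments back into a byte stream (hypothetical helper)."""
    out = bytearray()
    for seg in root.findall("Segment"):
        body = seg[0]
        out += bytes([0xFF, CODES[body.tag]])
        data = body.find("Data")
        if data is not None:
            payload = bytes.fromhex(data.text or "")
            # Recompute the length from the payload instead of trusting <Length>,
            # so an edited payload still yields a consistent segment.
            out += struct.pack(">H", len(payload) + 2) + payload
    return bytes(out)


xml_jpeg = ET.fromstring(
    "<JPEG>"
    "<Segment><SOI/></Segment>"
    "<Segment><APP0><Length>16</Length>"
    "<Data>4a46494600010101004800480000</Data></APP0></Segment>"
    "<Segment><EOI/></Segment>"
    "</JPEG>"
)
binary = unparse_segments(xml_jpeg)
print(binary.hex())
```

Recomputing the length on the way out is a design choice in this sketch: it keeps the binary self-consistent after sanitizing, at the cost of silently ignoring whatever the `<Length>` element says.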

That JPEG file doesn’t have any evil stuff in it. (Well, less evil stuff, anyway.)

Let’s recap: using a standard technology (DFDL) I parsed a binary file (JPEG). The output is XML. I then processed the XML using the full suite of XML technologies (XSLT, Schematron, XML Schema, etc.). Lastly, I unparsed the XML to reconstitute the binary.

That is totally amazing!

/Roger

 




Copyright 1993-2007 XML.org. This site is hosted by OASIS