Hi Folks,

This is so cool. Here is a picture of a place in Norway:

That’s a JPEG file. It’s a binary file. Open it in a hex editor and here’s (a portion of) what you will see:

Fun, aye? Now, did you know that there is a standard technology for parsing binary files, where the output of parsing is XML? There is, and that technology is called Data Format Description Language (DFDL).
It’s awesome. Using DFDL, I parsed the JPEG (binary) file. That resulted in this XML:

<JPEG>

Wow! Next, I wrote an XSLT program that scoured the XML, removing evil stuff like comments. (Did you know that comments can be inserted into images and you will never see them with your eyes? Yikes!)
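
To make the sanitizing step concrete, here is a rough sketch of the same idea in Python rather than XSLT. It is not the XSLT program from this post. Only the <JPEG> root element comes from the XML above. The child element names (SOI, COM, SOS, EOI) are assumptions borrowed from the JPEG marker names, and a real DFDL schema for JPEG may name things differently. The function name strip_elements is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-JPEG document. Only the <JPEG> root appears in the post;
# the child element names are assumptions based on JPEG marker names
# (COM is the JPEG comment marker).
SAMPLE = """<JPEG>
  <SOI/>
  <COM>text hidden inside the image</COM>
  <SOS/>
  <EOI/>
</JPEG>"""


def strip_elements(xml_text, unwanted=("COM",)):
    """Return xml_text with every element whose tag is in `unwanted` removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in unwanted:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


sanitized = strip_elements(SAMPLE)
print(sanitized)
```

Everything except the unwanted elements passes through untouched, which is what lets the XML be unparsed back into a working JPEG afterwards.
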
The output of my XSLT program was a sanitized XML-JPEG file. Lastly, I used DFDL to “unparse” the sanitized XML-JPEG file. DFDL reconstituted the JPEG (binary). Here it is:

That JPEG file doesn’t have any evil stuff in it. (Well, less evil stuff, anyway.)

Let’s recap: using a standard technology (DFDL), I parsed a binary file (JPEG). The output is XML. I then processed the XML using the full suite of XML technologies (XSLT, Schematron, XML Schema, etc.). Lastly, I unparsed the XML to reconstitute the binary. That is totally amazing!

/Roger