Re: [xml-dev] RE: Seek your help in compiling a list of "facts about base64-encoded data"

As regards the need for more decoding for base64 data:

I think the point Roger’s trying to make is that from the standpoint of communicating data to a system that is expected to be able to process it in some way, if you provide a stream of bytes that is a JPEG image then the consumer of that stream will be able to process it directly, assuming it is otherwise able to recognize JPEG byte sequences and decode them (e.g., a browser with a built-in JPEG renderer). If you provide a sequence of characters representing the base64 encoding of the same image, then another decoding step is first required.

It’s also important that the JPEG byte sequence is self-identifying as being JPEG data (using the magic numbers associated with well-known media types)—a receiver of arbitrary byte sequences can at least distinguish those that claim to be of a particular type by use of magic numbers from those that don’t. 

By contrast, a sequence of characters that might be a base64 encoding of something is not self-describing as base64, nor does it describe the nature of the data it encodes. Thinking about it now, it would definitely be convenient if base64 encoding had something that identified it as base64-encoded data and further indicated what the original data was, where that can be meaningfully identified. For example, it might be useful to know that the base64 data encodes a JPEG and not a PNG.

Of course, once the base64 encoded data is decoded it is no different from any other byte sequence that might be received. The fact that it had been base64 encoded is not really interesting.
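
To make the magic-number point concrete, here is a minimal Python sketch, not a robust detector. It decodes base64 text and then checks the leading bytes against two well-known signatures (JPEG data starts with FF D8 FF; PNG data starts with the eight bytes 89 50 4E 47 0D 0A 1A 0A). The function name `sniff_media_type` and the two-entry signature table are assumptions made for illustration.

```python
import base64

# Leading-byte signatures ("magic numbers") for two common image formats.
# This table is illustrative only; real detectors know many more types.
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}

def sniff_media_type(b64_text):
    """Decode base64 text and guess the media type from its leading bytes.

    Returns None when no known signature matches.
    """
    raw = base64.b64decode(b64_text)
    for magic, media_type in SIGNATURES.items():
        if raw.startswith(magic):
            return media_type
    return None

# Stand-in payloads: only the signature bytes are real, the rest is filler.
jpeg_like = base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 8).decode("ascii")
png_like = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
```

The base64 text itself carries no such marker. The signature only becomes visible after decoding, and even then a matching signature is just a claim about the type, not a guarantee.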

I do wonder what is intended by “causing harm”? 

The implication seems to be some sort of malicious attack through data injection by disguising the data as “plain text”. But if that’s a possibility, the fault lies not with base64 in particular but with the use of any data format that requires decoding. I’ve often circumvented mail system restrictions on specific file types by zipping them or double zipping them. But even then it’s ultimately the responsibility of the receiving system not to trust any data it gets from any source.

That is, if a malicious agent were to send you base64 data you could decode it to a new byte sequence without danger—the mere presence of the bytes in some storage context cannot, by itself, cause harm. The potential harm would be if you then gave those bytes to some process that, in the act of consuming them, caused harm, e.g., treating the decoded bytes as an executable and running it.

If the data that was encoded could cause harm, for example by exploiting a vulnerability in a graphic renderer or media player, the fact that it was base64 encoded cannot be blamed—the fault is with the creator of the original data and the implementor of the vulnerable consumer of the data and maybe with the user’s lack of caution or misplaced trust in the data’s source.

But it’s impossible to see how the use of base64, or any other encoding, could, by itself, be held to blame for any harm caused by the decoded data.

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com
 


On 9/16/17, 1:09 PM, "David Carlisle" <d.p.carlisle@gmail.com> wrote:

    > I would also be very interested to hear your thoughts on 17. If you don't mind, would you post your thoughts to the list, please?
    
    well, as you ask...
    
    On 16 September 2017 at 14:09, Costello, Roger L. <costello@mitre.org> wrote:
    > Hi Folks,
    >
    >
    >
    > Thank you for your excellent comments. I added them to my list. Below is the
    > updated list. Is it missing anything?  /Roger
    >
    >
    >
    > Here are some things to know about base64:
    >
    >
    >
    > 1.       Base64 encoding is specified in RFC 4648.
    >
    > 2.       Base64-encoded data is plain text.
    >
    > 3.       There are several base64 alphabets:
    >
    > a.       The standard base64 alphabet consists of these 64 ASCII characters:
    > a-z, A-Z, 0-9, +, / and the equals symbol ( = ).
    >
    > b.       In the URL and Filename safe base64 alphabet, the plus symbol ( + )
    > is replaced with a minus sign ( - ) and the forward slash symbol ( / ) is
    > replaced with an underscore symbol ( _ ).
    >
    > c.       Other base64 alphabets are called non-standard, or custom, base64
    > alphabets.
    >
    > d.       As a common extension, base64 can also contain arbitrary
    > whitespace, which is ignored.
    >
    > 4.       Any type of file, from plain text to binary executable, can be
    > base64-encoded.
    
    for some definition of "file" and "executable".
    You are, I think, assuming a filesystem that stores files as streams
    with a character as end-of-line marker, not a system using fixed-length
    records. Also, you don't normally encode the executable or other
    status in the base64-encoded string.
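
As an illustration of the alphabets in item 3, here is a small sketch using Python's standard `base64` module. The input bytes are chosen only because their encoding happens to use both `+` and `/`; the variable names are made up for this example.

```python
import base64

raw = b"\xfb\xff"  # chosen so the encoding uses both '+' and '/'

standard = base64.b64encode(raw)          # standard alphabet
url_safe = base64.urlsafe_b64encode(raw)  # '-' and '_' replace '+' and '/'

# Without validate=True, b64decode discards characters outside the
# alphabet, so line breaks inside the text are ignored.
wrapped = b"aGVs\nbG8g\nd29y\nbGQ="
unwrapped = base64.b64decode(wrapped)
```

Here `standard` is `b"+/8="`, `url_safe` is `b"-_8="`, and `unwrapped` is `b"hello world"`. Nothing about the file's name, permissions, or executable status appears in any of these strings.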
    
    >
    > 5.       Base64-encoding enables binary objects to be transported using
    > text-based protocols, such as SMTP.
    >
    > 6.       People have created regular expressions that specify the pattern of
    > base64 text. It is possible for text to match the regular expression and yet
    > not be base64. That is, text that appears to be base64, may not be.
    
    It isn't clear what you mean by "not base64". You can write a regex
    that matches text exactly when it can be decoded, but of course you can't
    tell whether any given string was generated by base64-encoding another
    string. I might have just typed aGVsbG8gd29ybGQ= as a random string of
    characters, or I may have base64-encoded some other string; you can't
    tell unless I tell you (and even then you can't tell if I'm telling
    the truth).
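
A rough sketch of that distinction in Python follows. The regular expression is one possible pattern for the standard alphabet with padding, not a pattern taken from this thread, and `looks_like_base64` is a hypothetical helper name. It answers "can this be decoded?", not "was this produced by encoding something?".

```python
import base64
import re

# Pattern for the standard alphabet with correct padding:
# groups of four characters, optionally ending in a padded group.
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)

def looks_like_base64(text):
    """True if text is syntactically decodable as standard base64."""
    return BASE64_RE.fullmatch(text) is not None

candidates = ["aGVsbG8gd29ybGQ=", "word", "hello world"]
results = {text: looks_like_base64(text) for text in candidates}

# Both matching strings decode without error, whatever their origin.
decoded_first = base64.b64decode("aGVsbG8gd29ybGQ=")
decoded_word = base64.b64decode("word")
```

The ordinary English string "word" matches the pattern and decodes to three bytes, even though nobody encoded anything to produce it, while "aGVsbG8gd29ybGQ=" decodes to `b"hello world"`.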
    
    >
    > 7.       External information must be provided to tell whether text is
    > base64.
    
    as above, but nothing about base64 particularly, the same is true of any data.
    
    >
    > 8.       If external information says that text is base64, the external
    > information might be incorrect, either by accident or by intent.
    
    as above, but nothing about base64 particularly, the same is true of any data.
    
    >
    > 9.       Performing base64 decoding on data that is not base64 might cause
    > harm.
    
    I think "causes harm" is the wrong thing to say. If the data isn't
    base64-encoded data then it might not decode at all; if it does decode
    then it may or may not decode to the intended data, but that in itself
    doesn't cause harm. If you misuse any data then something may go wrong,
    but that's not really related to base64.
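
A minimal sketch of "might not decode at all", assuming Python's standard `base64` module. The helper name `try_decode` is made up for this example. With `validate=True` the decoder rejects characters outside the standard alphabet instead of silently discarding them, and the only outcome of a failed decode is an exception.

```python
import base64
import binascii

def try_decode(text):
    """Return the decoded bytes, or None if text is not valid base64.

    validate=True makes b64decode reject characters outside the standard
    alphabet instead of silently discarding them.
    """
    try:
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None

ok = try_decode("aGVsbG8gd29ybGQ=")   # well-formed input
bad_chars = try_decode("not base64!")  # space and '!' are not in the alphabet
bad_padding = try_decode("abc")        # length is not a multiple of four
```

In both failure cases the result is simply `None`. Whatever happens next depends on what the caller does with the bytes, or with their absence.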
    
    >
    > a.       Is there a way to perform base64 decoding that is guaranteed to
    > never cause harm?
    
    No. You could decode it and pipe it to /dev/null, but if the
    information you discarded was "run now", ignoring that may be harmful.
    Using, misusing, or ignoring data can cause harm, whether or not
    base64 encoding is involved.
    
    >
    > 10.   There is nothing in base64-encoded data which identifies the media
    > type of the data. External information must be provided to tell the media
    > type. Without external information, the media type must be discovered (if
    > possible).
    
    The same is true of most plain text files, base64-encoded or not.
    
    >
    > 11.   If there is external information about the media type of the data, the
    > external information might be incorrect, either by accident or by intent.
    
    as above, but nothing about base64 particularly, the same is true of any data.
    
    >
    > 12.   Processing an object, assuming it is of media type A when it is
    > actually of media type B, might cause harm.
    
    as above, but nothing about base64 particularly, the same is true of any data.
    
    >
    > 13.   Decoding base64 text is a trivial task.
    >
    > 14.   Data that is base64-encoded cannot be directly viewed, used, or
    > inspected.
    
    any data on a computer needs to be decoded.
    If you have a jpeg image or a base64 encoded jpeg image then in either case
    you need to interpret the bytes in the data as an image, the software needed
    is slightly different but I don't think there's any conceptual difference.
    
    >
    > 15.   Compared to data that is not encoded, viewing/using/inspecting
    > base64-encoded data requires an additional step: decode and then
    > view/use/inspect.
    
    shrug. getting from the bytes in a file to some human understandable information
    is a black box in most cases, if the box is a bit longer so be it.
    
    
    > 16.   Without external information about the media type of the data that is
    > base64-encoded, there are two additional steps to viewing/using/inspecting
    > base64-encoded data: decode, determine the media type (if possible), and
    > then view/use/inspect.
    
    this is just restating the above and I don't think being base64 encoded changes
    much here other than the details of the software needed.
    
    >
    > 17.   Text formats such as XML and JSON cannot carry binary data. If binary
    > data must be carried by a text format, the binary data can be
    > base64-encoded, thus generating plain text, and then the base64-encoded
    > plain text can be carried by the text format.
    
    true, although it's not saying much. It can be base64 encoded. There
    are lots of other ways as well: either other text encodings, or
    non-text encodings such as using elements to replace control
    characters not allowed in XML.
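
For the base64 route in item 17, a short sketch using only the Python standard library. The `blob` element, the `encoding` attribute, and the JSON keys are invented for this example; they show one way a sender might label the content as base64, since the encoded text does not label itself.

```python
import base64
import json
import xml.etree.ElementTree as ET

payload = bytes([0x00, 0x01, 0xFF, 0x1B])  # includes bytes not usable as XML text

encoded = base64.b64encode(payload).decode("ascii")

# JSON: carry the base64 text as an ordinary string value.
json_doc = json.dumps({"encoding": "base64", "data": encoded})

# XML: carry the base64 text as element content.
# The element and attribute names are made up for this example.
elem = ET.Element("blob", {"encoding": "base64"})
elem.text = encoded
xml_doc = ET.tostring(elem, encoding="unicode")

# The receiver reverses the steps to recover the original bytes.
from_json = base64.b64decode(json.loads(json_doc)["data"])
from_xml = base64.b64decode(ET.fromstring(xml_doc).text)
```

Both round trips return the original four bytes. The label is external information in the sense of items 7 and 8: the receiver can act on it but cannot verify it from the text alone.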
    
    >
    > 18.   The size of base64-encoded data is roughly 4/3 the size of the
    > original data.
    >
    depending on the size of the character encoding used, as noted in the
    previous message.
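
The 4/3 figure and its dependence on character encoding can be checked with a few lines of Python. This is a sketch assuming padded output with no line breaks; the helper name `encoded_length` is made up for this example.

```python
import base64
import math

def encoded_length(n_bytes):
    """Length in characters of padded base64 for n_bytes of input.

    Every 3 input bytes become 4 output characters; a final partial
    group is padded to a full 4 characters.
    """
    return 4 * math.ceil(n_bytes / 3)

data = bytes(1000)  # 1000 zero bytes as sample input
b64_text = base64.b64encode(data).decode("ascii")

# Stored as ASCII or UTF-8, each base64 character is one byte;
# stored as UTF-16, each is two bytes (plus any byte-order mark).
size_utf8 = len(b64_text.encode("utf-8"))
size_utf16 = len(b64_text.encode("utf-16-le"))
```

For 1000 input bytes this gives 1336 characters, which occupy 1336 bytes in UTF-8 but 2672 bytes in UTF-16, so the overhead in bytes is 4/3 only when each character is stored in one byte.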
    
    
    David
    
    



