Re: [xml-dev] RE: Seek your help in compiling a list of "facts about base64-encoded data"

> I would also be very interested to hear your thoughts on 17. If you don't mind, would you post your thoughts to the list, please?

well, as you ask...

On 16 September 2017 at 14:09, Costello, Roger L. <costello@mitre.org> wrote:
> Hi Folks,
> Thank you for your excellent comments. I added them to my list. Below is the
> updated list. Is it missing anything?  /Roger
> Here are some things to know about base64:
> 1.       Base64 encoding is specified in RFC 4648.
> 2.       Base64-encoded data is plain text.
> 3.       There are several base64 alphabets:
> a.       The standard base64 alphabet consists of these 64 ASCII characters:
> a-z, A-Z, 0-9, +, and /. The equals symbol ( = ) is used only for padding.
> b.       In the URL and Filename safe base64 alphabet, the plus symbol ( + )
> is replaced with a minus sign ( - ) and the forward slash symbol ( / ) is
> replaced with an underscore symbol ( _ ).
> c.       Other base64 alphabets are called non-standard, or custom, base64
> alphabets.
> d.       As a common extension, base64 can also contain arbitrary
> whitespace, which is ignored.
> 4.       Any type of file, from plain text to binary executable, can be
> base64-encoded.

For some definition of "file" and "executable".
I think you are assuming a filesystem that stores files as streams
with a character as the end-of-line marker, not a system using
fixed-length records. Also, you don't normally encode the executable flag
or other file status in the base64-encoded string.

> 5.       Base64-encoding enables binary objects to be transported using
> text-based protocols, such as SMTP.
> 6.       People have created regular expressions that specify the pattern of
> base64 text. It is possible for text to match the regular expression and yet
> not be base64. That is, text that appears to be base64, may not be.

It isn't clear what you mean by "not base64". You can write a regex
that matches text exactly when it can be decoded, but of course you can't
tell whether any given string was generated by base64-encoding another
string. I might have just typed aGVsbG8gd29ybGQ= as a random string of
characters, or I may have base64-encoded some other string. You can't
tell unless I tell you (and even then you can't tell whether I'm telling
the truth).
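
A small Python sketch of that point. The pattern below is one commonly used
form for padded, standard-alphabet base64; it is an assumption for
illustration, not a regex taken from this thread, and looks_like_base64 is a
hypothetical helper name.

```python
import base64
import re

# Assumed pattern: groups of four alphabet characters, with optional
# "==" or "=" padding in the final group.
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)

def looks_like_base64(text: str) -> bool:
    """True if the whole string fits the pattern, i.e. it can be decoded."""
    return BASE64_RE.fullmatch(text) is not None

candidate = "aGVsbG8gd29ybGQ="
print(looks_like_base64(candidate))   # True
print(base64.b64decode(candidate))    # b'hello world'

# An ordinary four-letter English word fits the pattern and decodes too,
# although nobody produced it by encoding anything.
print(looks_like_base64("word"))      # True
print(len(base64.b64decode("word")))  # 3
```

Matching the pattern says only that decoding will succeed; it says nothing
about where the string came from.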

> 7.       External information must be provided to tell whether text is
> base64.

As above, but this is nothing particular to base64; the same is true of any data.

> 8.       If external information says that text is base64, the external
> information might be incorrect, either by accident or by intent.

As above, but this is nothing particular to base64; the same is true of any data.

> 9.       Performing base64 decoding on data that is not base64 might cause
> harm.

I think "causes harm" is the wrong thing to say. If the data isn't
base64-encoded data then it might not decode at all. If it does decode, it
may or may not decode to the intended data, but that in itself doesn't
cause harm. If you misuse any data then something may go wrong, but that's
not really related to base64.
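
To illustrate the two outcomes, here is a minimal Python sketch using the
standard library's strict decoding mode. The helper name try_decode is made
up for this example.

```python
import base64
import binascii

def try_decode(text: str):
    """Return the decoded bytes, or None if text is not valid standard base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None

print(try_decode("not base64!"))  # None: it does not decode at all
print(try_decode("abcd"))         # b'i\xb7\x1d': decodes, but to nothing intended
```

Neither result does anything by itself; what matters is what the caller then
does with the bytes.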

> a.       Is there a way to perform base64 decoding that is guaranteed to
> never cause harm?

No. You could decode it and pipe it to /dev/null, but if the information
you discarded was "run now", ignoring that may be harmful. Using, misusing,
or ignoring data can cause harm, whether or not base64 encoding is involved.

> 10.   There is nothing in base64-encoded data which identifies the media
> type of the data. External information must be provided to tell the media
> type. Without external information, the media type must be discovered (if
> possible).

The same is true of most plain text files, base64-encoded or not.
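
When no external information is available, "discovering" the media type
usually means looking at the leading bytes after decoding. A rough Python
sketch, assuming a tiny table of well-known file signatures (the table and
the guess_media_type helper are illustrative, nowhere near exhaustive):

```python
import base64

# A few well-known file signatures ("magic numbers").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

def guess_media_type(b64_text: str):
    """Decode base64 text and guess a media type from its leading bytes."""
    data = base64.b64decode(b64_text)
    for magic, media_type in SIGNATURES.items():
        if data.startswith(magic):
            return media_type
    return None  # unknown: the guess is only as good as the table

png_like = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
print(guess_media_type(png_like))            # image/png
print(guess_media_type("aGVsbG8gd29ybGQ="))  # None
```

The same sniffing would be needed for the raw bytes; the base64 layer only
adds the decode call in front.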

> 11.   If there is external information about the media type of the data, the
> external information might be incorrect, either by accident or by intent.

As above, but this is nothing particular to base64; the same is true of any data.

> 12.   Processing an object, assuming it is of media type A when it is
> actually of media type B, might cause harm.

As above, but this is nothing particular to base64; the same is true of any data.

> 13.   Decoding base64 text is a trivial task.
> 14.   Data that is base64-encoded cannot be directly viewed, used, or
> inspected.

Any data on a computer needs to be decoded.
If you have a JPEG image or a base64-encoded JPEG image, then in either case
you need to interpret the bytes in the data as an image. The software needed
is slightly different, but I don't think there's any conceptual difference.

> 15.   Compared to data that is not encoded, viewing/using/inspecting
> base64-encoded data requires an additional step: decode and then
> view/use/inspect.

Shrug. Getting from the bytes in a file to some human-understandable
information is a black box in most cases; if the box is a bit longer, so be it.

> 16.   Without external information about the media type of the data that is
> base64-encoded, there are two additional steps to viewing/using/inspecting
> base64-encoded data: decode, determine the media type (if possible), and
> then view/use/inspect.

This is just restating the above, and I don't think being base64-encoded
changes much here other than the details of the software needed.

> 17.   Text formats such as XML and JSON cannot carry binary data. If binary
> data must be carried by a text format, the binary data can be
> base64-encoded, thus generating plain text, and then the base64-encoded
> plain text can be carried by the text format.

True, although it's not saying much. It can be base64-encoded. There
are lots of other ways as well: either other text encodings, or non-text
encodings such as using elements to replace control characters that are
not allowed in XML.
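
As one example of "other text encodings", the Python sketch below carries the
same bytes in a JSON document twice, once as base64 and once as hexadecimal.
The field names "b64" and "hex" are made up for this example.

```python
import base64
import json

# Four bytes that are not valid UTF-8 text, so they cannot be placed in a
# JSON string as they are.
payload = bytes([0, 1, 2, 255])

doc = json.dumps({
    "b64": base64.b64encode(payload).decode("ascii"),  # "AAEC/w=="
    "hex": payload.hex(),                              # "000102ff"
})
print(doc)

# Both encodings round-trip to the original bytes.
decoded = json.loads(doc)
assert base64.b64decode(decoded["b64"]) == payload
assert bytes.fromhex(decoded["hex"]) == payload
```

Hex is simpler to read but doubles the size, where base64 adds about a third.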

> 18.   The size of base64-encoded data is roughly 4/3 the size of the
> original data.
Depending on the size of the character encoding used, as noted in the
previous message.
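
A quick Python check of the arithmetic. It assumes the point about character
encodings is that a wider encoding such as UTF-16 spends two bytes on each
base64 character; that reading is an interpretation, not something spelled
out in this message.

```python
import base64

raw = bytes(3000)                     # 3000 arbitrary bytes (all zero here)
text = base64.b64encode(raw).decode("ascii")

print(len(text))                      # 4000 characters: 4 for every 3 input bytes

# Bytes on disk or on the wire depend on how those characters are stored.
print(len(text.encode("utf-8")))      # 4000 bytes, ratio 4/3
print(len(text.encode("utf-16-le")))  # 8000 bytes, ratio 8/3
```

Line breaks inserted by some encoders, and padding on short inputs, push the
ratio slightly above these figures.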

