Re: [xml-dev] Speed in Languages and Browser Architectures

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
From: noah_mendelsohn@us.ibm.com
To: Rick Marshall <rjm@zenucom.com>
Date: Thu, 1 Mar 2007 18:35:51 -0500
Rick Marshall writes:

> If I write a Java VM in C why can I write a program in Java that can run 

> faster that a functionally equivalent program in C? (or assembler for 
> that matter).
> 
> Is there a permathread for this?
> 
> The implication is that Java can create a sequence of instructions that 
> C can't.

Great question (though I'm still waiting for someone to say that this 
discussion has strayed too far from XML and so should go elsewhere.  In 
the meantime, this stuff happens to interest me, so):

To understand why there's no contradiction in Java sometimes being faster 
than C, even if the runtime is written in C, you need to explore some 
aspects of how optimizers work.  I can think of at least two language 
characteristics that sometimes (not always) work in favor of Java for 
optimization:

The Java language is in some ways more constrained
---------------------------------------------------

First of all, optimizers need to prove theorems about your program, >> and 
that's often easier to do if your language is less expressive, not more<<. 
 One classic barrier to the optimization of C programs is the relatively 
unconstrained semantics and of pointers.  So, consider (my C is a bit 
rusty, so please excuse minor syntax errors):

        float *p;  // claimed to point to a float
        int a;
        int b;

        ...random code here, including some that might set p...

        a=1;
        somefunc(p);
        b = a+a;

You'd like to believe that a good optimizer would notice that a=1, and so 
be able to determine at compile time that b will always be 2.  The problem 
is that in the absence of other information, we really don't know what p 
points to.  It's just possible that in that random code, someone did:

        p = &a; // p points to a

and that in somefunc the pointer was used to overwrite a:

        *((int *)p = 6;

Even though this is highly unlikely, the optimizer can't prove it isn't 
happening.  Ask someone who writes optimizers for C and they'll tell you 
that one of the toughest problems is keeping a seemingly innocent 
reference to a pointer (in this case in the call to somefunc) from 
defeating optimizations all over the program.

Java is strongly typed, and uses references instead of pointers.  The 
equivalent Java might be:

        java.lang.Float floatRef;  // definitely references a Float
        int a;
        int b;

        ...random code here, including some that might set floatRef...

        a = 1;
        somefunc(floatRef);
        b = a+a;

Note that the C and Java are very, very similar programs in terms of the 
data structures and code you'd want to generate, but in Java, an optimizer 
can >prove< that somefunc can't affect the value of a.  Bingo, you can 
compute the value of both a and b at compile time. 

These same features may cause C to be much faster in other cases.  If 
you're decoding a network packet, and the same bytes may contain either an 
int or a float, that's a trival cast in C.  It's at best tricky in Java. 
Part of what you gain is that the Java optimizer can prove some things 
that the C optimizer can't.  So, it's a tradeoff, but you can show why the 
Java can be faster.

Committing to optimizations early
---------------------------------

This is not inherent in the languages, but it's inherent in how they're 
typically deployed.  C compilers are typically run in advance of program 
execution.  They must therefore commit to optimizations in advance, and 
typically without knowledge of program execution statistics (which 
branches are taken often, which loops run long or short, which virtual 
function calls mostly resolve to the same method repeatedly.)

Because Java JITs are executed at runtime, they can gamble on high risk 
optimizations, and can watch the program's execution.   One common trick 
is to do a very aggressive optimization that only works if your gamble is 
correct, but to guard it with a check.  So, rather than generating code 
that will run fast for any value of variable "i", for example, you can 
generate code that won't work at all if i<0, but guard it with an:

        if (i<0) reoptimize();

check.  A C optimizer would have to statically generate code that work 
work for all values of a variable. 

Now, when all is said and done, C is often faster than Java.   Type safety 
can cost you.  Code that makes numerous array accesses implies a very 
large number of bounds checks, and you are relying on the optimizer to 
find ways to skip most of them.  Bringing this discussion back to XML: 
that's one of the reasons it can be hard to write a fast XML parser in 
Java.  The inner loops tend to be working their way through arrays of 
input bytes, and depending on how exactly you write the code, the 
optimizer may or may not be smart enough to avoid adding an array bounds 
check to each one.  Furthermore, code that optimizes well in one VM may 
not do well in another.  In C, you lose type safety, but you put the 
bounds checks exactly where you need them yourself (and if you're lucky, 
the optimizer will find a way to skip optimize even some of the ones you 
put in.)

So, yes:  Java runtimes and compilers can be written in C, and in certain 
cases they will execute a given Java program faster than its C equivalent.

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
References:
- Re: [xml-dev] Speed in Languages and Browser Architectures
  - From: Rick Marshall <rjm@zenucom.com>
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]