[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]
Re: [xml-dev] Speed in Languages and Browser Architectures
- From: noah_mendelsohn@us.ibm.com
- To: Rick Marshall <rjm@zenucom.com>
- Date: Thu, 1 Mar 2007 18:35:51 -0500
Rick Marshall writes:
> If I write a Java VM in C why can I write a program in Java that can run
> faster that a functionally equivalent program in C? (or assembler for
> that matter).
>
> Is there a permathread for this?
>
> The implication is that Java can create a sequence of instructions that
> C can't.
Great question (though I'm still waiting for someone to say that this
discussion has strayed too far from XML and so should go elsewhere. In
the meantime, this stuff happens to interest me, so):
To understand why there's no contradiction in Java sometimes being faster
than C, even if the runtime is written in C, you need to explore some
aspects of how optimizers work. I can think of at least two language
characteristics that sometimes (not always) work in favor of Java for
optimization:
The Java language is in some ways more constrained
---------------------------------------------------
First of all, optimizers need to prove theorems about your program, >> and
that's often easier to do if your language is less expressive, not more<<.
One classic barrier to the optimization of C programs is the relatively
unconstrained semantics and of pointers. So, consider (my C is a bit
rusty, so please excuse minor syntax errors):
float *p; // claimed to point to a float
int a;
int b;
...random code here, including some that might set p...
a=1;
somefunc(p);
b = a+a;
You'd like to believe that a good optimizer would notice that a=1, and so
be able to determine at compile time that b will always be 2. The problem
is that in the absence of other information, we really don't know what p
points to. It's just possible that in that random code, someone did:
p = &a; // p points to a
and that in somefunc the pointer was used to overwrite a:
*((int *)p = 6;
Even though this is highly unlikely, the optimizer can't prove it isn't
happening. Ask someone who writes optimizers for C and they'll tell you
that one of the toughest problems is keeping a seemingly innocent
reference to a pointer (in this case in the call to somefunc) from
defeating optimizations all over the program.
Java is strongly typed, and uses references instead of pointers. The
equivalent Java might be:
java.lang.Float floatRef; // definitely references a Float
int a;
int b;
...random code here, including some that might set floatRef...
a = 1;
somefunc(floatRef);
b = a+a;
Note that the C and Java are very, very similar programs in terms of the
data structures and code you'd want to generate, but in Java, an optimizer
can >prove< that somefunc can't affect the value of a. Bingo, you can
compute the value of both a and b at compile time.
These same features may cause C to be much faster in other cases. If
you're decoding a network packet, and the same bytes may contain either an
int or a float, that's a trival cast in C. It's at best tricky in Java.
Part of what you gain is that the Java optimizer can prove some things
that the C optimizer can't. So, it's a tradeoff, but you can show why the
Java can be faster.
Committing to optimizations early
---------------------------------
This is not inherent in the languages, but it's inherent in how they're
typically deployed. C compilers are typically run in advance of program
execution. They must therefore commit to optimizations in advance, and
typically without knowledge of program execution statistics (which
branches are taken often, which loops run long or short, which virtual
function calls mostly resolve to the same method repeatedly.)
Because Java JITs are executed at runtime, they can gamble on high risk
optimizations, and can watch the program's execution. One common trick
is to do a very aggressive optimization that only works if your gamble is
correct, but to guard it with a check. So, rather than generating code
that will run fast for any value of variable "i", for example, you can
generate code that won't work at all if i<0, but guard it with an:
if (i<0) reoptimize();
check. A C optimizer would have to statically generate code that work
work for all values of a variable.
Now, when all is said and done, C is often faster than Java. Type safety
can cost you. Code that makes numerous array accesses implies a very
large number of bounds checks, and you are relying on the optimizer to
find ways to skip most of them. Bringing this discussion back to XML:
that's one of the reasons it can be hard to write a fast XML parser in
Java. The inner loops tend to be working their way through arrays of
input bytes, and depending on how exactly you write the code, the
optimizer may or may not be smart enough to avoid adding an array bounds
check to each one. Furthermore, code that optimizes well in one VM may
not do well in another. In C, you lose type safety, but you put the
bounds checks exactly where you need them yourself (and if you're lucky,
the optimizer will find a way to skip optimize even some of the ones you
put in.)
So, yes: Java runtimes and compilers can be written in C, and in certain
cases they will execute a given Java program faster than its C equivalent.
Noah
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]