20

I have a program I ported from C to Java. Both apps use quicksort to order some partitioned data (genomic coordinates).

The Java version runs fast, but I'd like to get it closer to the C version. I am using the Sun JDK v6u14.

Obviously I can't get parity with the C application, but I'd like to learn what I can do to eke out as much performance as reasonably possible (within the limits of the environment).

What sorts of things can I do to test performance of different parts of the application, memory usage, etc.? What would I do, specifically?

Also, what tricks can I implement (in general) to change the properties and organization of my classes and variables, reducing memory usage and improving speed?

EDIT: I am using Eclipse and would obviously prefer free options for any third-party tools. Thanks!

4
  • Have you benchmarked the C version against the Java version? Using JIT techniques, the JVM can actually outperform native code in certain circumstances, depending on the machine code actually generated by the C compiler. Commented Jun 2, 2009 at 11:27
  • Depending on the nature of the data and how you choose your pivot in your quicksort algorithm, you may want to consider merge sort, as its worst case is O(n log n). Commented Jun 2, 2009 at 13:30
  • Try using the built-in quicksort to see how it compares with your implementation. You might be surprised :) Commented Jun 2, 2009 at 22:25
  • Some performance improvement tips that may also be relevant for Java here: stackoverflow.com/questions/32581644 Commented Jul 3, 2019 at 10:10

14 Answers

35

Do not try to outsmart the JVM.

In particular:

  • don't try to avoid object creation for the sake of performance

  • use immutable objects where applicable

  • use the scope of your objects correctly, so that the GC can do its job

  • use primitives where you mean primitives (e.g. non-nullable int compared to nullable Integer)

  • use the built-in algorithms and data structures

  • when handling concurrency, use the java.util.concurrent package

  • correctness over performance: first get it right, then measure, then measure with a profiler, then optimize
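A minimal sketch of the "use primitives" and "use the built-in algorithms" advice, assuming the coordinates fit in an int (the helper name is illustrative, not from the question): keep the data in an int[] and let the JDK's tuned primitive sort do the work, rather than boxing into Integer objects and hand-rolling a quicksort.

```java
import java.util.Arrays;

// Sketch: prefer primitive arrays plus the built-in sort over boxed
// Integer collections and a hand-written quicksort.
class PrimitiveSortDemo {
    // Hypothetical helper: sorts coordinates held as plain ints.
    static int[] sortCoordinates(int[] coords) {
        int[] copy = coords.clone();   // leave the caller's data intact
        Arrays.sort(copy);             // the JDK's tuned sort for primitives
        return copy;
    }

    public static void main(String[] args) {
        int[] sorted = sortCoordinates(new int[] {42, 7, 19});
        System.out.println(Arrays.toString(sorted)); // [7, 19, 42]
    }
}
```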


10 Comments

While much of your advice may be true, I'd boldly disagree with the first point. Reducing object creation was the most crucial part of optimization in many of my projects. Of course, it's no use to save 10 big objects, but often you end up with millions of small objects, and not creating them is important.
Andreas is largely correct; object allocation is extremely quick in modern VMs (a lot faster than in C++, for example), and assuming the objects don't live long they'll be cleaned up in a minor collection. Having said that: profile, profile some more, and only do things that you have evidence for, not just because somebody said so.
Outsmarting the VM means trying to improve performance by, for example, avoiding object creation. Leave these kinds of optimisations to the compiler and VM. Afterwards, perhaps connect a profiler such as VisualVM/JProfiler to see which parts of your code are most affecting your program's performance, then concentrate on improving them.
@Brian: In Java 1.4 memory allocation for an object took less than 10 processor instructions. The magic with movable generational GC is that free memory is always contiguous, allocating 10 bytes is just returning the current free pointer and incrementing it by 10. Conversely, reusing old objects holding references to possibly newer objects will impact performance (possibly forcing memory moves). That is 'trying to outsmart the VM'. Depending on your object definitions and the code you use, you can improve or worsen performance.
11

Obviously, profile profile profile. For Eclipse there's TPTP. Here's an article on the TPTP plugin for Eclipse. Netbeans has its own profiler. jvisualvm is nice as a standalone tool. (The entire dev.java.net server seems to be down at the moment, but it is very much an active project.)

The first thing to do is use the library sorting routine, Collections.sort; this will require your data objects to be Comparable. This might be fast enough and will definitely provide a good baseline.
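The Comparable baseline might look like the following sketch. The class and its fields are illustrative guesses at a genomic coordinate type, not the asker's actual data structure; the natural order here is chromosome first, then position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable coordinate type; field names are illustrative.
class GenomicCoordinate implements Comparable<GenomicCoordinate> {
    final int chromosome;
    final int position;

    GenomicCoordinate(int chromosome, int position) {
        this.chromosome = chromosome;
        this.position = position;
    }

    // Natural order: by chromosome, then by position.
    public int compareTo(GenomicCoordinate other) {
        if (chromosome != other.chromosome) {
            return chromosome < other.chromosome ? -1 : 1;
        }
        if (position != other.position) {
            return position < other.position ? -1 : 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        List<GenomicCoordinate> coords = new ArrayList<GenomicCoordinate>();
        coords.add(new GenomicCoordinate(2, 100));
        coords.add(new GenomicCoordinate(1, 500));
        Collections.sort(coords);                     // library sort as the baseline
        System.out.println(coords.get(0).chromosome); // 1
    }
}
```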

General tips:

  • Avoid locks you don't need (your JVM may have already optimized these away)
  • Use StringBuilder (not StringBuffer because of that lock thing I just mentioned) instead of concatenating String objects
  • Make anything you can final; if possible, make your classes completely immutable
  • If you aren't changing the value of a variable in a loop, try hoisting it out and see if it makes a difference (the JVM may have already done this for you)
  • Try to work on an ArrayList (or even an array) so the memory you're accessing is contiguous instead of potentially fragmented the way it might be with a LinkedList
  • Quicksort can be parallelized; consider doing that (see quicksort parallelization)
  • Reduce the visibility and live time of your data as much as possible (but don't contort your algorithm to do it unless profiling shows it is a big win)
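The parallelisation tip can be sketched without any extra libraries (this predates fork/join, which fits the Java 6 setting): sort the two halves on separate threads with the built-in sort, then merge. This is sort-and-merge rather than a true parallel quicksort, and a real implementation would want a thread pool and a threshold below which it sorts sequentially.

```java
import java.util.Arrays;

// Sketch: sort two halves concurrently, then do a standard two-way merge.
class TwoThreadSort {
    static int[] sort(int[] data) {
        final int mid = data.length / 2;
        final int[] left = Arrays.copyOfRange(data, 0, mid);
        final int[] right = Arrays.copyOfRange(data, mid, data.length);

        Thread worker = new Thread(new Runnable() {
            public void run() { Arrays.sort(left); }
        });
        worker.start();
        Arrays.sort(right);          // sort the other half on the current thread
        try {
            worker.join();           // wait until both halves are sorted
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        // Standard two-way merge of the sorted halves.
        int[] merged = new int[data.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            merged[k++] = left[i] <= right[j] ? left[i++] : right[j++];
        }
        while (i < left.length) merged[k++] = left[i++];
        while (j < right.length) merged[k++] = right[j++];
        return merged;
    }
}
```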

4 Comments

When escape analysis is in place and working 100% correctly, StringBuilder has the same performance characteristics as StringBuffer. I haven't tried it under Java 6u14 yet, so you should not worry about this too heavily.
Concatenation of plain Strings is implemented by means of StringBuilder and optimized in many cases. That kind of micro-optimization is what made some of us use StringBuffers in the past, just to realize that with modern VMs plain String concatenation is faster than our hand-tailored... who will refactor those StringBuffers to StringBuilder now? That is one of the examples of trying to outsmart the compiler/VM.
Yes, switching from String concatenation to using a StringBuilder brought a huge performance increase in a few JSPs I was rendering. Good Point.
@Andreas: No use in hoping the JVM can figure out what you meant (I don't need "thread-safe" appends) when there is already a class that makes that intent explicit. StringBuffer won't handle a fair amount of the locking that a truly thread-safe program is going to need anyway (ordering issues, mostly). @dribeas: I realize that is the theory, but any number of things can cause the JVM to not perform that optimization. Also, StringBuffer -> StringBuilder is a trivial refactoring in any case where it is valid.
5

Use a profiler, such as VisualVM.

Use the latest version of the JVM from your provider. Incidentally, Sun's Java 6 update 14 does bring performance improvements.

Measure your GC throughput and pick the best garbage collector for your workload.

3 Comments

The link to visualvm seems to be dead. Is it still maintained?
Yes, and the last Java update - Java 6 update 14 - brought in improvements to it. It's very much alive.
All of dev.java.net seems to be down at the moment, so this is why the link is down.
4

Don't optimize prematurely.

Measure performance, then optimize.

Use final variables whenever possible. It will not only allow the JVM to optimize more, but also make your code easier to read and maintain.

If you make your objects immutable, you don't have to clone them.

Optimize by changing the algorithm first, then by changing the implementation.

Sometimes you need to resort to old-style techniques, like loop unrolling or caching precalculated values. Remember them; even if they don't look nice, they can be useful.

Comments

2

jvisualvm ships with JDK 6 now - that's the reason the link cited above doesn't work. Just type "jvisualvm <pid>", where <pid> is the ID of the process you want to track. You'll get to see how the heap is being used, but you won't see what's filling it up.

If it's a long-running process, you can turn on the -server option when you run. There are a lot of tuning options available to you; that's just one.

Comments

2

Also try tweaking the runtime arguments of the VM. The latest release of the VM, for example, includes the following flag, which can improve performance in certain scenarios.

-XX:+DoEscapeAnalysis 

Comments

2

First caveat - make sure you have done appropriate profiling or benchmarking before embarking on any optimisation work. The results will often enlighten you, and nearly always save you a lot of wasted effort in optimising something that doesn't matter.

Assuming that you do need it, then you can get performance comparable to C in Java, but it takes some effort. You need to know where the JVM is doing "extra work" and avoid these.

In particular:

  • Avoid unnecessary object creation. While the JVM heap and GC are extremely fast and efficient (probably the best in the world, and almost certainly better than anything you could roll yourself in C), it is still heap allocation, and that will be beaten by avoiding the heap in the first place (stack or register allocation)
  • Avoid boxed primitives. You want to be using double and not Double.
  • Use primitive arrays for any big chunks of data. Java primitive arrays are basically as fast as C/C++ arrays (they do have an additional bounds check, but that is usually insignificant)
  • Avoid anything synchronized - Java threading is pretty decent, but it is still overhead that you may not need. Give each thread its own data to work on.
  • Exploit concurrency - Java's concurrency support is very good. You might as well use all your cores! This is a big topic, but there are plenty of good books / tutorials available.
  • Use specialised collection classes for certain types of data if you have some very specific requirements, e.g. supporting some specialised sorting/search algorithms. You may need to roll your own, but there are also some good libraries with high-performance collection classes available that may fit your needs - see e.g. Javolution
  • Avoid big class hierarchies - this is a design smell in performance code. Every layer of abstraction is costing you overhead. Very fast Java code will often end up looking rather like C....
  • Use static methods - the JIT can optimise these extremely well. It will usually inline them.
  • Use final concrete classes - again, the JIT can optimise these very well by avoiding virtual function calls.
  • Generate your own bytecode - if all else fails, this can be a viable option if you want the absolute maximum performance out of the JVM. Particularly useful if you need to compile your own DSL. Use something like ASM.
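Several of these tips can be combined in one sketch: a final class with static methods that keeps a whole dataset in a single primitive long[] instead of millions of small objects. The packing scheme below (chromosome in the high 32 bits, position in the low 32 bits) is purely illustrative and assumes non-negative values, so that the natural long ordering matches (chromosome, position) order.

```java
import java.util.Arrays;

// Sketch: final class, static methods, and one primitive array instead
// of one object per coordinate. Assumes chromosome and position >= 0.
final class PackedCoordinates {
    // Pack chromosome into the high 32 bits, position into the low 32 bits.
    static long pack(int chromosome, int position) {
        return ((long) chromosome << 32) | (position & 0xFFFFFFFFL);
    }

    static int chromosome(long packed) { return (int) (packed >>> 32); }
    static int position(long packed)   { return (int) packed; }

    // Sorting the packed array sorts by chromosome, then by position,
    // with no object allocation and no comparator calls.
    static void sort(long[] packed) {
        Arrays.sort(packed);
    }
}
```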

Comments

1

If your algorithm is CPU-heavy, you may want to consider taking advantage of parallelisation. You may be able to sort in multiple threads and merge the results back later.

This is however not a decision to be taken lightly, as writing concurrent code is hard.

Comments

1

Can't you use the sort functions that are included in the Java library?

You could at least look at the speed difference between the two sorting functions.

4 Comments

The comparator I am using is customized to deal with ordering a genomic and positional data structure.
@Alex If you make your data objects implement Comparable you can still use the library sort.
@Hank: why can't @Alex use the overloaded sort method that takes a Comparator?
@Hemal The code is cleaner when you use the natural sort: no need to create a comparator and fewer arguments to pass. Of course, if Alex's sort criteria don't make sense as the natural order, the Comparator version is the way to go.
0

Methodologically, you have to profile the application and then get an idea of which components of your program are time- and memory-intensive; then take a closer look at those components in order to improve their performance (see Amdahl's law).

From a pure technological POV, you can use some Java-to-native-code compilers, like Excelsior's JET, but I have to note that recent JVMs are really fast, so the VM should not have a significant impact.

4 Comments

Okay, but what tools would I actually use to this end?
Pick a profiler and use it. I've used jprofiler: it is good, but it costs money.
I've also used eclipse's TPTP, but its capabilities are poor compared to what jprofiler has to offer to you.
TPTP is very hard to set up correctly. I've tried more than once to use it with Eclipse 3.4.x and failed. JProfiler "just works".
0

Is your sorting code executing only once, e.g. in a commandline utility that just sorts, or multiple times, e.g. a webapp that sorts in response to some user input?

Chances are that performance would increase significantly after the code has been executed a few times because the HotSpot VM may optimize aggressively if it decides your code is a hotspot.

This is a big advantage compared to C/C++.

The VM, at runtime, optimizes code that is used often, and it does that quite well. Performance can actually rise beyond that of C/C++ because of this. Really. ;)

Your custom Comparator could be a place for optimization, though.

Try to check inexpensive stuff first (e.g. int comparison) before more expensive stuff (e.g. String comparison). I'm not sure if those tips apply because I don't know your Comparator.

Use either Collections.sort(list, comparator) or Arrays.sort(array, comparator). The array variant will be a bit faster; see the respective documentation.
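The "inexpensive stuff first" tip might look like this sketch. The Site type and its fields are invented for illustration, since the asker's actual data structure isn't shown; the point is that the cheap int comparison decides most calls, and the expensive String comparison runs only on ties.

```java
import java.util.Comparator;

// Illustrative record type; not the asker's actual data structure.
class Site {
    final int position;
    final String label;

    Site(int position, String label) {
        this.position = position;
        this.label = label;
    }
}

// Cheap int comparison first; fall back to the String compare only on ties.
class SiteComparator implements Comparator<Site> {
    public int compare(Site a, Site b) {
        if (a.position != b.position) {
            return a.position < b.position ? -1 : 1; // cheap path, taken most often
        }
        return a.label.compareTo(b.label);           // expensive path, rare
    }
}
```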

As Andreas said before: don't try to outsmart the VM.

Comments

0

Perhaps there are other routes to performance enhancement than micro-optimization of code. How about a different algorithm to achieve what you want your program to do? Maybe a different data structure?

Or trade some disk/RAM space for speed. Or, if you can give up some time upfront during the loading of your program, you can precompute lookup tables instead of doing calculations - that way, the processing is fast. That is, make trade-offs with the other resources available.
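The lookup-table trade-off can be sketched with a classic example (bit counting is just an illustration, not something from the question): spend a little memory and load time filling a 256-entry table once, then answer every query with a single array read.

```java
// Sketch of the time/space trade-off: precompute a small table at class
// load time instead of recomputing per call.
class ByteBitCounts {
    // One entry per byte value, filled once in a static initializer.
    static final int[] BITS = new int[256];
    static {
        for (int i = 1; i < 256; i++) {
            BITS[i] = (i & 1) + BITS[i >> 1]; // reuse the already-filled half
        }
    }

    // O(1) table lookup instead of a loop over the bits.
    static int bitCount(int b) {
        return BITS[b & 0xFF];
    }
}
```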

Comments

0

Here's what I would do, in any language. If samples show that your sort-comparison routine is active a large percentage of the time, you might find a way to simplify it. But maybe the time is going elsewhere. Diagnose first to see what's broken before you fix anything. Chances are that if you fix the biggest thing, something else will become the biggest thing, and so on, until you've really gotten a pretty good speedup.

Comments

0

Profile and tune your Java program and host machine. Most code follows the 80/20 rule: 20% of the code runs 80% of the time, so find that 20% and make it as fast as possible. For example, the article Tuning Java Servers (http://www.infoq.com/articles/Tuning-Java-Servers) describes drilling down from the command line and then isolating the problem using tools like Java Flight Recorder, Eclipse Memory Analyzer, and JProfiler.

Comments
