My Java application essentially does this:

  • read nThread buffers from a file (1 MB byte arrays)
  • create nThread threads that process the buffers
  • wait for threads to process data
  • write processed data to another file

It is one of those applications that should, in theory, get close to a 100% speed boost for each additional core, but instead, the more threads process the data, the SLOWER it gets!

Example timings:

  • 1 thread: 4800 ms
  • 2 threads: 10200 ms
  • 3 threads: 13400 ms
  • 4 threads: 18560 ms
  • and so on
  • Are you reading the file with multiple threads, by chance? Commented May 14, 2012 at 16:10
  • Can you show us a small code sample of how you are creating your threads and sending the lines to them? Commented May 14, 2012 at 16:13
  • I bet he is using the run method instead of start :D Commented May 14, 2012 at 16:22
  • This is expected behaviour. Your bottleneck is your hard drive and nothing will make that faster. Some simple pre-optimisation testing would have confirmed that. Commented May 14, 2012 at 16:31
  • @OldCurmudgeon This is not expected behavior. If he is IO bound then adding threads won't help, but they shouldn't hurt, and certainly not that much. Commented May 14, 2012 at 16:48

2 Answers

Getting that sort of performance as you add threads means that you are doing something really wrong. Often adding threads will not provide any speed improvement, and sometimes it can penalize you a bit, but adding another thread and doubling the program run time is highly unusual.

Here are some things to investigate:

  • As @Tudor mentioned in the comments, you should be reading the input file from a single thread and then dispatching the work to the worker threads.
  • You should consider using an ExecutorService instead of managing threads yourself. This usually removes a lot of user code and associated bugs. See Executors.newFixedThreadPool(numThread).
  • Are you sure you are starting your threads correctly? You should be calling new Thread(...).start() and not calling run() directly.
  • Are you calling join() before you start your next thread? You should start() all of your threads, dispatch the lines to them, and then join() on them at the end.
  • Any chance you are sending all of the input lines to all of the threads by accident? I wouldn't expect that alone to produce these performance numbers, however, unless you are increasing your output IO as well.
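
As a rough sketch of the first two points (not your actual code, which we haven't seen): one thread hands each buffer to a fixed-size pool exactly once and then waits for all of the results. The class name `BufferPipeline` and the `process` method are hypothetical placeholders; `process` just inverts each byte to stand in for the real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BufferPipeline {

    // Hypothetical stand-in for the real per-buffer work: inverts every byte.
    static byte[] process(byte[] input) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (byte) ~input[i];
        }
        return out;
    }

    // Submits each buffer to the pool exactly once and returns the
    // processed buffers in the same order as the input list.
    static List<byte[]> processAll(List<byte[]> buffers, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<byte[]>> tasks = new ArrayList<>();
            for (byte[] buffer : buffers) {
                tasks.add(() -> process(buffer)); // one task per buffer
            }
            List<byte[]> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<byte[]> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker task failed", e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }
}
```

Calling `BufferPipeline.processAll(buffers, nThread)` from the thread that reads the file keeps all file IO on a single thread. `invokeAll` returns the futures in the same order as the submitted tasks, so the results can be written out in input order.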

If you show us some of your thread code in your question, we can help more.

10 Comments

I tried both reusing the same threads and creating new ones; the difference was minimal, on the order of <1 ms.
I am using .start() but not .join() because of how I managed the threads. I'll take a look at that thread pool now and see if it works. Thanks a lot.
I just rewrote the application using the thread pool. Still no improvement. I called nanoTime at the beginning and at the end of the thread that processes the data, and I don't know what to say. The time it takes a thread to process the data is proportional to the number of running threads, but there are enough CPU cores to run them all simultaneously, and they are running at about 50-60% with 4 threads, so... what the hell is it scheduling to take that long??
How are you dividing up the lines to be processed? Any chance you are sending all of the lines to all of the threads?
Sorry, too much code for me to process. Can you cut it down to 50-100 lines and post to pastebin.com?

Code that hasn't been adequately optimized will usually use up the entire memory bandwidth all by itself. Add another thread running the same unoptimized code on a multi-core processor and the threads will split the bandwidth between them and, in addition, run into each other fairly often, further slowing things down.

Gray says "... doubling the program run time is highly unusual." I disagree. That's usually what happens in C code before you start optimizing the memory accesses. I'd say it's highly unusual to not see a slowdown right from the start.

I don't know if sampling (profiling) is available for Java, but that is the obvious place to start.
