3

I am trying to sort a file using threading. Here is Sort.java :

This function sorts with help of threading

public static String[] threadedSort(File[] files) throws IOException { String sortedData[] = new String[0]; int counter = 0; boolean allThreadsTerminated = false; SortingThread[] threadList = new SortingThread[files.length]; for (File file : files) { String[] data = getData(file); threadList[counter] = new SortingThread(data); threadList[counter].start(); counter++; } while(!allThreadsTerminated) { allThreadsTerminated = true; for(counter=0; counter<files.length; counter++) { if(threadList[counter].getState() != Thread.State.TERMINATED) { allThreadsTerminated = false; } } } for(counter=0; counter<files.length; counter++) { sortedData = MergeSort.merge(sortedData, threadList[counter].data); } return sortedData; } 

This function sorts just normally

 public static String[] sort(File[] files) throws IOException { String[] sortedData = new String[0]; for (File file : files) { String[] data = getData(file); data = MergeSort.mergeSort(data); sortedData = MergeSort.merge(sortedData, data); } return sortedData; } 

Now when I sort using both ways the normal sorting is faster than threaded version. What can be reason for it ? Had i missed something ?

My SortingThread is something like this :

public class SortingThread extends Thread { String[] data; SortingThread(String[] data) { this.data = data; } public void run() { data = MergeSort.mergeSort(data); } } 

When I analyze my threaded implementation by comparing its performance to the original non-threaded implementation I find second one faster. What can be reason for such behavior ? If we talk of relative performance improvement we expect for threaded implementation to be faster if am not wrong.

EDIT : Assume I have properly functional MergeSort. But its of no use to post its code here. Also getData() function is just to take input from file. I think problem lies with the fact that am taking whole file in array. I think I should provide different lines to different threads :

private static String[] getData(File file) throws IOException { ArrayList<String> data = new ArrayList<String>(); BufferedReader in = new BufferedReader(new FileReader(file)); while (true) { String line = in.readLine(); if (line == null) { break; } else { data.add(line); } } in.close(); return data.toArray(new String[0]); } 
7
  • What's your timing data? How much quicker is it? Or in the words of Art of Noise: "How rapid is rapid?" You seem to be sorting file content. Filesystem access may be the bottleneck. Creating threads is a heavy process, while it perhaps is not giving any benefit. Commented Jun 7, 2015 at 7:15
  • If you require to perform an operation and combine the result at the end, ForkJoinPool would probably be a better choice. Commented Jun 7, 2015 at 7:20
  • @RogerGustavsson Sort.sort took 1.129517647 seconds to read and sort the data. Sort.threadedSort took 3.171421661 seconds to read and sort the data. Commented Jun 7, 2015 at 7:25
  • @RogerGustavsson So is threading expected to be slower in this case ? Commented Jun 7, 2015 at 7:26
  • @xTrollxDudex How to do it in my above code ? Please help Commented Jun 7, 2015 at 7:26

4 Answers 4

1

First of all, how do you measure elapsed time? Do you execute both tests in the same program? If so, keep in mind that mergesort will probably undergo Hotspot compilation while the first test is executed. I suggest you run each method twice, measuring the time on the second run

Sign up to request clarification or add additional context in comments.

7 Comments

You are right !!!! If I call normal sort first pastebin.com/j7nLAKkz its different results than if I call threaded sort first pastebin.com/6BzCmmxn. What can be reason for it ? Please help
Another reason is that files are cached when you run the first sort, so the second sort reads files faster and this might be more important than all the threading you do :)
@Michael How to remove it from cache then ? so that both executions are independent
As I wrote, run each test twice and measure time the second time
I like to interleave code I want to compare a few times - first option, second option, first again, second again, first again, second again. Look at all 6 times to get an idea of how the executions affect each other.
|
0

How many CPU/cores do you have? One problem with this code is that the main thread spends CPU time in "while(!allThreadsTerminated)" loop, actively checking thread state. If you have one CPU - you are wasting it, instead of doing actual sorting.

Replace the while-loop with:

 for(counter=0; counter<files.length; counter++) { threadList[counter].join(); } 

2 Comments

You mean replace while loop with this statement ?
Can we have some faster I/O startergy for this program ?
0

You should use Stream and standard sort:

static String[] sort(File[] files, boolean parallel) { return (parallel ? Stream.of(files).parallel() : Stream.of(files)) .flatMap(f -> { try { return Files.lines(f.toPath()); } catch (Exception e) { e.printStackTrace(); return null; } }) .sorted() .toArray(String[]::new); } static String[] sort(File[] files) { return sort(files, false); } static String[] threadSort(File[] files) { return sort(files, true); } 

In my environmet threadSort is faster.

sort: files=511 sorted lines=104419 elapse=4784ms threadSort: files=511 sorted lines=104419 elapse=3060ms 

3 Comments

I want to use only my two methods.
Can we have some faster I/O startergy for this program ?
It is little bit faster to put I/O outside of parallelization as you did. But it is slower to sort each files.
0

You can use java.util.concurrent.ExecutorService which will run all your tasks in specified number of threads, and once all threads have finished execution you will get a list Future object which will hold the result of each thread execution. List of Future objects will be in same order as you inserted the Callable objects into its list.

For that first thing you need is have your SortingThread implement Callable interface so that you can get the result of each thread execution.
Each Callable object have to implement the call() method and its return type would be your Future object.

 public class SortingThread implements Callable<String[]> { String[] data; SortingThread(String[] data) { this.data = data; } @Override public String[] call() throws Exception { data = MergeSort.mergeSort(data); return data; } } 

Next you need is to use ExecutorSerivce for thread management.

public static String[] sortingExampleWithMultiThreads(File[] files) throws IOException { String sortedData[] = new String[0]; int counter = 0; boolean allThreadsTerminated = false; SortingThread[] threadList = new SortingThread[files.length]; ArrayList<Callable<String[]>> callableList = new ArrayList<Callable<String[]>>(); for (File file : files) { String[] data = getData(file); callableList.add(new SortingThread(data)); //Prepare a Callable list which would be passed to invokeAll() method. counter++; } ExecutorService service = Executors.newFixedThreadPool(counter); // Create a fixed size thread pool, one thread for each file processing... List<Future<String[]>> futureObjects = service.invokeAll(callableList); //List of what call() method of SortingThread is returning... for(counter=0; counter<files.length; counter++) { sortedData = MergeSort.merge(sortedData, futureObjects.get(counter)); } return sortedData; } 

This way you can avoid using WHILE loop which is known to increase CPU utilization (hence decrease in speed), and if you have single core CPU then it can reach 100% of utilization, and if dual core then 50%.
Also, using ExecutorService for thread management is better way when dealing with multi-threading instead of dev starting and monitoring threads for results. So, you can expect performance.

I have not ran it, so you may need to do so change here and there but I have highlighted you approach.

P.S.: When measuring the performance, to get the neat and precise results, always have a new JVM instance created for each run.

2 Comments

Can we have some faster I/O startergy for this program ?
We have to use threads to get the job done, now there are no faster threads and slower threads. So, best things I can suggest is - 1. Use Java's very own ExecutorService API so that you expect best result. 2. All these multi-threading API's like ExecutorService, ThreadPool etc. do not take advantage of all available processors in the CPU, so the top most approach can be to use fork-join framework (docs.oracle.com/javase/tutorial/essential/concurrency/…) which lets you take advantage of all available processor along with multi-threading...

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.