1

I have a Java function that operates on a large amount of data, perhaps 500 MB. I have to pass this 500 MB of data to the function and return the data after processing.

My data is in tabular form, as follows:

col1  col2  col3  col4  col5  col6
3     5     2     5     1     6
7     5     6     8     3     8
5     3     7     9     8     1

I have a few ideas in mind, but I don't know which one is most efficient or how to implement it, for example which Java API I would need.

  1. Convert the data into Java objects (one object per row, all of the same class), then pass the objects to the Java function as an array.
  2. Build an XML document from the tabular data and pass it to the Java function, which then extracts the objects from the XML document.
  3. Save the tabular data to a file and pass the file as an argument to the Java function.

These are the ideas I have in mind. I would be grateful if someone could give the pros and cons of the three methods above or suggest another approach.

6
  • In order to give a good answer much more detail is needed. Where do you get the data from? What is the function supposed to do with it? Commented Jun 25, 2014 at 6:30
  • I think you have to test your ideas. It depends on the processing you want to do. Commented Jun 25, 2014 at 6:31
  • Just to correct you, there are NO FUNCTIONS in Java, it has only Methods. stackoverflow.com/a/16335031/1055241 Commented Jun 25, 2014 at 6:42
  • @GPRathour technically and historically you are correct (if you speak about the Java -language-), although you could argue that static methods serve the purpose of a function. In any case terminology is a big conflicting mess nowadays... Commented Jun 25, 2014 at 8:22
  • @GPRathour Yeah... well, a Java static method is pretty much a function. Let's not get into a nomenclature fight. Commented Jun 25, 2014 at 8:48

4 Answers 4

1

Passing an array passes only a reference, so no data is copied, which makes it as efficient as it can be. Any modification is made to the referenced array itself, so nothing needs to be returned.
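As a minimal sketch of that point, assuming the table fits in memory as an `int[][]` (the class name `InPlaceArrayDemo` and the doubling step are only illustrative, not something from the question):

```java
import java.util.Arrays;

final class InPlaceArrayDemo {

    // Hypothetical processing step: doubles every cell in place.
    // Only the reference to the array is passed; the cells are not copied.
    static void doubleAll(int[][] table) {
        for (int[] row : table) {
            for (int c = 0; c < row.length; c++) {
                row[c] *= 2;
            }
        }
    }

    public static void main(String[] args) {
        int[][] table = {
            {3, 5, 2, 5, 1, 6},
            {7, 5, 6, 8, 3, 8},
            {5, 3, 7, 9, 8, 1}
        };
        doubleAll(table);
        // The caller sees the modified values without any return value.
        System.out.println(Arrays.deepToString(table));
    }
}
```

The same holds for an array of row objects: the method receives a reference, and changes to the elements are visible to the caller.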


7 Comments

Thanks a lot for the suggestion. Could you please let me know whether this method is applicable if the function I call is on another server?
@SurjyaNarayanaPadhi if you need to pass 500 MB to another server for a method call, I would seriously recommend having a hard, long look at your architecture.
No, you do not want to pass 500MB via some RPC mechanism.
@TassosBassoukos this is not necessarily an architectural problem. There are a lot of applications that need to transfer huge amounts of data as part of their requirements (e.g. video streaming services). I would suggest you use input/output streams, which are the intended Java mechanism for data transfer. Using streams gives you the opportunity to control the flow of the data, to manipulate it before sending/after receiving, etc.
@eitanfar I am aware that the need is there (I was transferring that amount in 98 between various stages of a meteorological model that were in different countries - fun times) - It's just that doing it as a function call is... suboptimal, mainly from a memory usage perspective.
1

If you are reading the data from a file or a stream, then you can map the file into memory, so the entire file is not read at once. Look in here
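A rough sketch of memory mapping with `java.nio`, assuming the table is stored as a text file with one row per line. The class name `MappedRowCounter`, the window size, and the row-counting task are illustrative assumptions; the real processing would go where the newline check is.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedRowCounter {

    // A single mapping is limited to Integer.MAX_VALUE bytes,
    // so larger files are mapped one window at a time.
    private static final long WINDOW = 1L << 30;

    // Counts newline bytes in the file, i.e. the number of terminated lines.
    static long countRows(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long rows = 0;
            long position = 0;
            while (position < size) {
                long length = Math.min(WINDOW, size - position);
                // The OS pages data in on demand; nothing is copied to the heap up front.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        rows++;
                    }
                }
                position += length;
            }
            return rows;
        }
    }
}
```

Note that this applies to files only; a plain `InputStream` cannot be mapped this way.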


1

Since you have a large amount of data in tabular format, have you considered using Java DB (database)? Granted, this depends on what kind of processing you're going to do, how long you have to develop, and how well you already know databases/SQL, but it sounds like you're going to read the data in row by row, and databases are a good way to do this, especially with large amounts of data.

There is information about the JDBC API here on the Java Trail, with steps on how to use it: http://docs.oracle.com/javase/tutorial/jdbc/overview/index.html

From the Java Trail:

The JDBC API is a Java API that can access any kind of tabular data, especially data stored in a Relational Database.

Some things to keep in mind:

  • You have to know/learn SQL or other querying language.
  • You'll have to design the structure of the database and build it, although probably you can use a similar structure to what you were planning in your XML file.
  • KEYS! Keys are unique identifiers for each row in your database, like an ID number. I highly recommend you add a separate field/column to use as a key, especially if you're new to databases. They increase the memory overhead of your database a small amount but in return you don't have to worry about identifying unique rows and can quickly go back to a row you've already searched.
  • You can pick and choose what data to bring in - don't bring in more than you need.
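To illustrate the last point, here is a small JDBC sketch that reads only one column, row by row. It assumes a table named `data_rows` (a made-up name) with an integer column `col1`, and that the caller supplies an open `Connection` from whatever driver is in use; summing the column stands in for the real processing.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class RowByRowQuery {

    // Sums col1 over all rows of the hypothetical table data_rows.
    // Only the needed column is selected, and rows are consumed one at a time.
    static long sumCol1(Connection connection) throws SQLException {
        String sql = "SELECT col1 FROM data_rows";
        long sum = 0;
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            // A hint to the driver about how many rows to fetch per round trip;
            // drivers are free to ignore it.
            statement.setFetchSize(1000);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    sum += rows.getInt(1);
                }
            }
        }
        return sum;
    }
}
```

Whether the result set is actually streamed rather than buffered in full depends on the driver and its settings, so this is worth checking for the database you pick.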

1 Comment

I highly support this idea, although I interpret from the question description that the method to do the processing already exists and is thus not designed to work on a database. So that would require redesigning the existing code too.
0

If you are thinking about processing the data with a Java function/method, consider processing it in chunks rather than all at once. You can decide the chunk size by experiment: start with, say, 10 KB, measure the performance, and adjust, since the best size depends on the execution environment. There are several ways to get chunks of data from a file, stream, or database (even on a remote server). You need to post more details about your problem to get better suggestions.
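One possible shape for chunked processing, as a sketch only: it assumes the input is text with one whitespace-separated row of integers per line, measures the chunk in rows rather than bytes, and uses a per-row sum as a placeholder for the real work. The names `ChunkedRowProcessor` and `chunkRows` are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

final class ChunkedRowProcessor {

    // Reads rows from 'in', processes them chunkRows at a time, and writes
    // one result line per row to 'out'. Returns the number of rows processed.
    static long process(Reader in, Writer out, int chunkRows) throws IOException {
        if (chunkRows <= 0) {
            throw new IllegalArgumentException("chunkRows must be positive");
        }
        BufferedReader reader = new BufferedReader(in);
        List<String> chunk = new ArrayList<>(chunkRows);
        long rows = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            chunk.add(line);
            if (chunk.size() == chunkRows) {
                rows += processChunk(chunk, out);
            }
        }
        rows += processChunk(chunk, out); // remaining partial chunk, if any
        out.flush();
        return rows;
    }

    // Placeholder processing: writes the sum of each row, then empties the chunk.
    private static int processChunk(List<String> chunk, Writer out) throws IOException {
        for (String row : chunk) {
            long sum = 0;
            for (String cell : row.split("\\s+")) {
                sum += Long.parseLong(cell);
            }
            out.write(sum + "\n");
        }
        int processed = chunk.size();
        chunk.clear();
        return processed;
    }
}
```

Because it takes a `Reader` and a `Writer`, the same method works for a file, a socket, or an in-memory string, and only one chunk of rows is held in memory at a time.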

