
I have a relatively inefficient CSVReader, see below. It takes more than 30 seconds to read 30,000+ lines. How can I speed up this reading process as much as possible?

public class DataReader {
    private String csvFile;
    private List<String> sub = new ArrayList<String>();
    private List<List> master = new ArrayList<List>();

    public void ReadFromCSV(String csvFile) {
        String line = "";
        String cvsSplitBy = ",";
        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            System.out.println("Header " + br.readLine());
            while ((line = br.readLine()) != null) {
                // use comma as separator
                String[] list = line.split(cvsSplitBy);
                // System.out.println("the size is " + country[1]);
                for (int i = 0; i < list.length; i++) {
                    sub.add(list[i]);
                }
                List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
                // master.add(new ArrayList<String>(sub));
                master.add(temp);
                sub.removeAll(sub);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(master);
    }

    public List<List> getMaster() {
        return master;
    }
}

UPDATE: I have found that my code can actually finish the reading work in less than 1 second if run separately. This DataReader is a part used by my simulation model to initialize the relevant properties. The following part, which uses the imported data, is what actually takes 40 seconds to finish! Could anyone help by looking at this generic part of the code?

// add route network
Network<Object> net = (Network<Object>) context.getProjection("IntraCity Network");
IndexedIterable<Object> local_hubs = context.getObjects(LocalHub.class);
for (int i = 0; i <= CSV_reader_route.getMaster().size() - 1; i++) {
    String source = (String) CSV_reader_route.getMaster().get(i).get(0);
    String target = (String) CSV_reader_route.getMaster().get(i).get(3);
    double dist = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(6));
    double time = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(7));
    Object source_hub = null;
    Object target_hub = null;
    Query<Object> source_query = new PropertyEquals<Object>(context, "hub_code", source);
    for (Object o : source_query.query()) {
        if (o instanceof LocalHub) {
            source_hub = (LocalHub) o;
        }
        if (o instanceof GatewayHub) {
            source_hub = (GatewayHub) o;
        }
    }
    Query<Object> target_query = new PropertyEquals<Object>(context, "hub_code", target);
    for (Object o : target_query.query()) {
        if (o instanceof LocalHub) {
            target_hub = (LocalHub) o;
        }
        if (o instanceof GatewayHub) {
            target_hub = (GatewayHub) o;
        }
    }
    // System.out.println(target_hub.getClass() + " " + time);
    // Route this_route = (Route) net.addEdge(source_hub, target_hub);
    // context.add(this_route);
    // System.out.println(net.getEdge(source_hub, target_hub));
    if (net.getEdge(source, target) == null) {
        Route this_route = (Route) net.addEdge(source, target);
        context.add(this_route);
        // this_route.setDist(dist);
        // this_route.setTime(time);
    }
}
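Those two PropertyEquals queries per CSV row are the likely bottleneck: each query scans the context, so the loop costs O(rows × agents). Building a hub lookup table once before the loop turns each row into a constant-time map hit. A minimal sketch of the idea, using plain-Java stand-ins for the Repast types (the Hub interface and getHubCode method here are placeholders, not the real API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HubIndexSketch {

    // Placeholder stand-ins for the real agent classes
    interface Hub { String getHubCode(); }

    static class LocalHub implements Hub {
        private final String code;
        LocalHub(String code) { this.code = code; }
        public String getHubCode() { return code; }
    }

    static class GatewayHub implements Hub {
        private final String code;
        GatewayHub(String code) { this.code = code; }
        public String getHubCode() { return code; }
    }

    /** Build the hub_code -> hub index once, before the CSV loop. */
    static Map<String, Hub> indexHubs(Iterable<Hub> allHubs) {
        Map<String, Hub> byCode = new HashMap<>();
        for (Hub h : allHubs) {
            byCode.put(h.getHubCode(), h);
        }
        return byCode;
    }

    public static void main(String[] args) {
        Map<String, Hub> byCode = indexHubs(List.of(new LocalHub("SRC"), new GatewayHub("TGT")));
        // Per-row lookups are now O(1) map hits instead of full context scans:
        Hub source_hub = byCode.get("SRC");
        Hub target_hub = byCode.get("TGT");
        System.out.println(source_hub.getHubCode() + " -> " + target_hub.getHubCode());
    }
}
```

In the real model you would fill the map once from the context's agents (e.g. from the objects you already fetch with context.getObjects) and replace both query loops with byCode.get(source) and byCode.get(target).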
  • Have you tried not cloning the sub list, but just creating a new one on every iteration? Also, Arrays.asList() is probably faster than looping yourself. Commented Oct 25, 2019 at 3:26
  • 39 seconds to read 37490 lines. This is too much time for me. Commented Oct 25, 2019 at 3:29
  • You should probably also get a baseline for just reading the file so you know what you're up against. Little point in speeding up processing if IO is the bottleneck. Commented Oct 25, 2019 at 3:31
  • sub.removeAll(sub); - This seems a lot more expensive than sub.clear(); Commented Oct 25, 2019 at 3:31
  • @Jacob G. no significant difference Commented Oct 25, 2019 at 3:36

3 Answers


In your code you are doing many write operations just to add the list of values from the current row to your master list, which is not required. You can replace the existing code with the simpler one given below.

Existing code:

String[] list = line.split(cvsSplitBy);
// System.out.println("the size is " + country[1]);
for (int i = 0; i < list.length; i++) {
    sub.add(list[i]);
}
List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
// master.add(new ArrayList<String>(sub));
master.add(temp);
sub.removeAll(sub);

Suggested code:

master.add(Arrays.asList(line.split(cvsSplitBy))); 
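Put together, the whole read loop shrinks to one statement per row. A sketch of the full class under that change (note that Arrays.asList returns a fixed-size list backed by the split array, so wrap it in new ArrayList<>(...) if you need to modify rows later):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DataReader {
    private final List<List<String>> master = new ArrayList<>();

    public void readFromCSV(String csvFile) {
        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            System.out.println("Header " + br.readLine()); // consume the header line
            String line;
            while ((line = br.readLine()) != null) {
                // one list allocation per row: no per-element copy, no clone, no clear
                master.add(Arrays.asList(line.split(",")));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public List<List<String>> getMaster() {
        return master;
    }
}
```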



I don't have a CSV that big, but you could try the following:

public static void main(String[] args) throws IOException {
    Path csvPath = Paths.get("path/to/file.csv");
    List<List<String>> master = Files.lines(csvPath)
            .skip(1)
            .map(line -> Arrays.asList(line.split(",")))
            .collect(Collectors.toList());
}

EDIT: I tried it with a CSV sample with 50k entries and the code runs in less than one second.


It's not working; it reports: Exception in thread "main" java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1 at java.base/java.nio.file.FileChannelLinesSpliterator.readLine(FileChannelLinesSpliterator.java:173)
@Jack It's probably some problem with the encoding of your file. You can pass a Charset as the second parameter of the Files.lines method. Take a look at this or this
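For example, a sketch of the same read with an explicit charset (ISO-8859-1 here is only a guess at the file's actual encoding; unlike UTF-8 it never throws MalformedInputException, because every byte sequence is valid in it, though non-Latin text may be decoded wrongly):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CharsetCsvRead {
    static List<List<String>> readCsv(Path csvPath) throws IOException {
        // The second argument selects the decoder; Files.lines(Path) alone assumes UTF-8
        return Files.lines(csvPath, StandardCharsets.ISO_8859_1)
                .skip(1)
                .map(line -> Arrays.asList(line.split(",")))
                .collect(Collectors.toList());
    }
}
```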

Extending the answer of @Alex R, you can process it in parallel as well, like this:

public static void main(String[] args) throws IOException {
    Path csvPath = Paths.get("path/to/file.csv");
    List<List<String>> master = Files.lines(csvPath)
            .skip(1)
            .parallel()
            .map(line -> Arrays.asList(line.split(",")))
            .collect(Collectors.toList());
}


Yes, but in case you don't want to keep track of line order and just want to process the data, you can process it in parallel to get the result faster.
Reading in parallel may even slow the whole thing down... There are multiple posts about that, like this or this
Thanks @Alex R for the information. If you are using Java 9 or a later version, it works as expected. But on Java 8, avoid parallel processing.
