
I am writing a graph library that should read the most common graph formats. One format contains information like this:

    e 4 3
    e 2 2
    e 6 2
    e 3 2
    e 1 2
    ....

and I want to parse these lines. I looked around on Stack Overflow and found a neat solution for this, so I currently use an approach like the following (file is an fstream):

    string line;
    while (getline(file, line)) {
        if (line.empty()) continue;    // skip empty lines
        stringstream parseline(line);
        char identifier;
        parseline >> identifier;       // read the first character
        if (identifier == 'e') {
            int n, m;
            parseline >> n;
            parseline >> m;
            foo(n, m);                 // here I handle the input
        }
    }

It works well and as intended, but today, when I tested it with huge graph files (50 MB+), I was shocked to see that this function was by far the worst bottleneck in the whole program:

The stringstream I use to parse each line takes almost 70% of the total runtime, and the getline call another 25%. The rest of the program uses only 5%.

Is there a fast way to read such big files, possibly avoiding the slow stringstreams and the getline function?

  • Have you considered boost::spirit? Commented Mar 9, 2012 at 0:36
  • I want to avoid boost if it is possible. Commented Mar 9, 2012 at 0:36
  • Dollars to doughnuts says your C library's scanf can beat all of these. :) Commented Mar 9, 2012 at 0:44
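To make the scanf suggestion concrete, here is a minimal sketch that reads the records with std::fscanf instead of getline plus stringstream. It assumes every edge record is an 'e' followed by exactly two integers and that any line starting with another character can simply be skipped; read_edges_scanf is a hypothetical name, and no claim is made here about how it compares in speed.

```cpp
#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical helper: reads every "e <n> <m>" record from the file at `path`
// using std::fscanf. Assumes each edge record has exactly two integers;
// lines starting with any other character are skipped to the next newline.
std::vector<std::pair<int, int>> read_edges_scanf(const char* path) {
    std::vector<std::pair<int, int>> edges;
    std::FILE* f = std::fopen(path, "r");
    if (!f) return edges;
    char tag;
    // " %c" skips any whitespace (including blank lines) before the tag.
    while (std::fscanf(f, " %c", &tag) == 1) {
        if (tag == 'e') {
            int n, m;
            if (std::fscanf(f, "%d %d", &n, &m) != 2) break;  // malformed record
            edges.emplace_back(n, m);
        } else {
            // Not an edge record: discard the rest of this line.
            int c;
            while ((c = std::fgetc(f)) != '\n' && c != EOF) {}
        }
    }
    std::fclose(f);
    return edges;
}
```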

2 Answers


You can skip double-buffering your string, skip parsing the single character, and use strtoll to parse integers, like this:

    string line;
    while (getline(file, line)) {
        if (line.empty()) continue;  // skip empty lines
        if (line[0] == 'e') {
            char *ptr;
            // strtoll skips leading whitespace and stops at the first non-digit
            int n = strtoll(line.c_str() + 1, &ptr, 10);
            int m = strtoll(ptr, &ptr, 10);
            foo(n, m);  // here I handle the input
        }
    }

In C++, strtoll is declared in the <cstdlib> header (as std::strtoll, standard since C++11).
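A slightly more defensive variant of the same idea checks that the conversion actually consumed digits, so a truncated line such as "e 7" is rejected instead of producing garbage. This is only a sketch: parse_edge_line is a hypothetical helper, and it uses std::strtol rather than strtoll on the assumption that vertex ids fit in an int.

```cpp
#include <cstdlib>

// Hypothetical helper: parses one "e <n> <m>" line. Returns false if the line
// is not an edge record or if either number is missing.
bool parse_edge_line(const char* s, int& n, int& m) {
    if (s[0] != 'e') return false;                // not an edge record
    char* end;
    const long a = std::strtol(s + 1, &end, 10);  // strtol skips leading blanks itself
    if (end == s + 1) return false;               // no digits after 'e'
    const char* rest = end;
    const long b = std::strtol(rest, &end, 10);
    if (end == rest) return false;                // second number missing
    n = static_cast<int>(a);
    m = static_cast<int>(b);
    return true;
}
```

Inside the getline loop, the body then reduces to: if (parse_edge_line(line.c_str(), n, m)) foo(n, m);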


1 Comment

Nice, I think by combining both answers I can write something really fast.

mmap the file and process it as a single big buffer.
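A minimal sketch of the mmap approach, assuming a POSIX system: with_mapped_file is a hypothetical helper that maps the file read-only and hands the bytes to a callback. One caveat: the mapped bytes are not NUL-terminated, so strtol cannot safely be run on the last record without a bounds check, and the callback has to respect the size it is given.

```cpp
#include <cstddef>     // std::size_t
#include <fcntl.h>     // open, O_RDONLY
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical helper (POSIX only): maps the file at `path` read-only and calls
// visit(data, size) on its bytes. The bytes are NOT NUL-terminated, so the
// visitor must stay within [data, data + size).
template <typename Visitor>
bool with_mapped_file(const char* path, Visitor visit) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return false;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) {  // mmap rejects zero-length mappings
        close(fd);
        visit(static_cast<const char*>(nullptr), size);
        return true;
    }
    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return false;
    visit(static_cast<const char*>(addr), size);
    munmap(addr, size);
    return true;
}
```

The callback is where the parsing would go, for example a loop over data[0..size) that accumulates digits by hand instead of calling strtol.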

If your system lacks mmap, you might try reading the whole file into a buffer that you malloc.

Rationale: most of the time goes into the transitions from user mode to the kernel and back during the calls into the C library. Reading in the whole file at once eliminates almost all of those calls.
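Combining this with the strtol answer, a portable sketch might read the whole file with a single std::ifstream::read into a std::string (standing in for the malloc'ed buffer) and then scan it in one pass. read_whole_file and parse_edges are hypothetical names; the parser assumes well-formed "e <n> <m>" lines and skips every other line.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: loads the whole file into memory with one read() call
// (a portable stand-in for mmap or a malloc'ed buffer).
std::string read_whole_file(const char* path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) return {};
    std::string buf(static_cast<std::string::size_type>(size), '\0');
    in.seekg(0, std::ios::beg);
    in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    return buf;
}

// Hypothetical helper: walks the buffer line by line and parses
// "e <n> <m>" records with strtol; any other line is skipped.
std::vector<std::pair<int, int>> parse_edges(const std::string& buf) {
    std::vector<std::pair<int, int>> edges;
    const char* p = buf.c_str();  // c_str() guarantees a terminating '\0'
    while (*p) {
        const char* eol = p;
        while (*eol && *eol != '\n') ++eol;  // find the end of this line
        if (*p == 'e') {
            char* end;
            const long n = std::strtol(p + 1, &end, 10);
            const long m = std::strtol(end, &end, 10);
            edges.emplace_back(static_cast<int>(n), static_cast<int>(m));
        }
        p = *eol ? eol + 1 : eol;  // step past the newline, if any
    }
    return edges;
}
```

Because the buffer comes from std::string, it is always NUL-terminated, which is what makes calling strtol directly on it safe here.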

1 Comment

Thank you, I will try this and report my results. However, one major bottleneck is the parsing via stringstreams, which will not go away just by reading everything into a huge buffer.
