String search/indexing in a file using C++

Question

I am using the following code, which searches a file and prints each matching line together with its line number. Is this code fast enough for hundreds of thousands of lines? My PC literally froze for a few seconds. I need to look up the first integer of a pair and return its RHS value after the comma (for some statistical work), but with the code below I can only return the whole line.

Is it a good idea, in terms of speed, to parse the returned line with a split function to get the RHS value,

OR

to get the RHS value directly from the LHS argument? (I have not been able to do this.)

Can anyone help me achieve either of these?

Here is my code:

    #include <string>
    #include <iostream>
    #include <fstream>

    int main()
    {
        std::ifstream file( "index_hyper.txt" ) ;
        std::string search_str = "401" ;
        std::string line ;
        int line_number = 0 ;
        while( std::getline( file, line ) )
        {
            ++line_number ;
            if( line.find(search_str) != std::string::npos )
                std::cout << "line " << line_number << ": " << line << '\n' ;
        }
    }

Here is the content of my index_hyper.txt file:

    18,22
    20,37
    151,61
    200,62
    156,63
    158,64
    159,65
    153,66
    156,67
    152,68
    154,69
    155,56
    156,14
    157,13
    160,122
    161,1333
    400,455
    401,779
    402,74
    406,71

What are RHS and LHS? What output do you want for what input?
– Jabberwocky, Commented Oct 3, 2013 at 12:22
This code with this file froze your computer? I'm surprised at that. Sure it wasn't the virus checker?
– john, Commented Oct 3, 2013 at 12:22
@user2754070 ahh, it's not that big, something else is causing the problem. See this
– P0W, Commented Oct 3, 2013 at 12:33
1.7MB? grep would parse that file very fast, no need to write a program to do that
– Slava, Commented Oct 3, 2013 at 12:37

Adam Burry · Accepted Answer · 2013-10-03 14:44:17Z

1

You can do the work of the code above with:

grep -n "^401," index_hyper.txt

If you want to output just the RHS, you can:

grep "^401," index_hyper.txt | sed "s/[^,]*,//"

If you are on a Windows platform without sed, grep, bash, etc., then you can easily access Unix tools by installing Cygwin.
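For readers who have to stay in C++, the same anchored match can be sketched as below. The helper name `rhs_for_prefix` is hypothetical (it is not part of the answer above), and the sketch assumes one `LHS,RHS` pair per line and a key that contains no comma.

```cpp
#include <string>

// Hypothetical helper: mimics grep "^KEY," followed by sed "s/[^,]*,//".
// Returns true and stores everything after the first comma in `rhs`
// when `line` starts with `key` immediately followed by a comma.
// Assumption: `key` itself contains no comma.
bool rhs_for_prefix(const std::string& line, const std::string& key,
                    std::string& rhs)
{
    // The line must be long enough to hold the key plus the comma.
    if (line.size() <= key.size())
        return false;
    // compare(0, n, key) checks the first n characters against key.
    if (line.compare(0, key.size(), key) != 0 || line[key.size()] != ',')
        return false;
    rhs = line.substr(key.size() + 1);
    return true;
}
```

Called once per line inside the question's `std::getline` loop, this would print only the RHS for the matching line and, unlike `line.find`, would not match `1401,5` or `18,4401`.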

edited Oct 3, 2013 at 14:44

answered Oct 3, 2013 at 12:33

Adam Burry


5 Comments

P0W Over a year ago

And how is this related to C++ ?

Adam Burry Over a year ago

@P0W, It is not. It is related to the problem specification.

James Kanze Over a year ago

Why not simply sed -n 's/^401,//p' index_hyper.txt? That will probably be at least as fast as anything he can write. But presumably, in real life, he'd like to detect syntax errors, and output a message to standard error in such cases. (Otherwise, why worry about C++, when there is such a simple solution already available?)

Adam Burry Over a year ago

@JamesKanze, thanks for the simplification. Why C++? Well, the question does not say why C++. Without a reason given, it is reasonable to assume there is none. "If all you have is a hammer, everything looks like a nail."

James Kanze Over a year ago

@AdamBurry :-). Yes. Since he mentions his "PC", it's likely that he's running Windows. In which case, he probably doesn't have very much else but C++. (I'll admit that even with error checking etc., I'd knock off a Python script if I couldn't do it with the shell tools. No point in cranking up the compiler for anything this small.)

James Kanze · Accepted Answer · 2013-10-03 14:40:14Z

As a general rule, don't start breaking the string up into smaller pieces (substrings) until you need to. And start by specifying exactly what is wanted: you speak of RHS and LHS, and talk of "get RHS value based on LHS argument". So: do you want an exact match on the first field, a substring match on the first field, or a substring match on the entire line?

At any rate: once you have the line in line, you can easily separate it into the two fields:

std::string::const_iterator pivot = std::find( line.cbegin(), line.cend(), ',' );

What you do then depends on what your criterion is:

    if ( pivot - line.cbegin() == search_str.size()
            && std::equal( line.cbegin(), pivot, search_str.begin() ) ) {
        //  Exact match on first field...
        std::cout << std::string( std::next( pivot ), line.cend() );
    }

    if ( std::search( line.cbegin(), pivot, search_str.begin(), search_str.end() ) != pivot ) {
        //  Matches substring in first field...
        std::cout << std::string( std::next( pivot ), line.cend() );
    }

    if ( std::search( line.cbegin(), line.cend(), search_str.begin(), search_str.end() ) != line.cend() ) {
        //  Matches substring in complete line...
        std::cout << std::string( std::next( pivot ), line.cend() );
    }
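To make the difference between the three criteria concrete, here is a minimal sketch that wraps them in one predicate. The names `MatchKind` and `line_matches` are hypothetical, introduced only for illustration, and the key is assumed to be non-empty.

```cpp
#include <algorithm>
#include <string>

// Hypothetical names, used only for this illustration.
enum MatchKind { ExactField, SubstringInField, SubstringInLine };

// Returns true when `line` satisfies the chosen criterion for `key`.
// The first field is everything before the first comma (or the whole
// line when there is no comma). Assumption: `key` is not empty.
bool line_matches(const std::string& line, const std::string& key,
                  MatchKind kind)
{
    std::string::const_iterator pivot =
        std::find(line.begin(), line.end(), ',');
    switch (kind) {
    case ExactField:
        return static_cast<std::string::size_type>(pivot - line.begin())
                   == key.size()
            && std::equal(line.begin(), pivot, key.begin());
    case SubstringInField:
        return std::search(line.begin(), pivot,
                           key.begin(), key.end()) != pivot;
    case SubstringInLine:
        return std::search(line.begin(), line.end(),
                           key.begin(), key.end()) != line.end();
    }
    return false;
}
```

With the key `401`, the line `1401,5` fails the exact test but passes the substring-in-field test, while `18,4401` matches only under the whole-line test, which is what `line.find` in the question does.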

Of course, you'll need some additional error checking. What should you do if there isn't a comma in the line (i.e. pivot == line.end()), for example? Or what about extra spaces in the line? (Your example looks like numbers. Should "401" match only "401", or also "+401"?)
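One possible way to settle those questions, offered as a sketch and not as the required behavior, is to treat both fields as decimal integers, so that " +401 " and "401" compare equal. The helpers `parse_field` and `parse_pair` are hypothetical names. Accepting surrounding blanks and a leading sign is an assumption made here, not something the question specifies.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a whole field to a long.
// Leading whitespace and an optional sign are accepted (strtol does
// that itself); trailing blanks are skipped by hand. Anything else,
// including an empty field or an out-of-range value, is rejected.
bool parse_field(const std::string& field, long& value)
{
    const char* begin = field.c_str();
    char* end = 0;
    errno = 0;
    long parsed = std::strtol(begin, &end, 10);
    if (end == begin || errno == ERANGE)
        return false;                     // no digits, or out of range
    while (*end == ' ' || *end == '\t' || *end == '\r')
        ++end;                            // tolerate trailing blanks
    if (*end != '\0')
        return false;                     // trailing garbage such as "40x"
    value = parsed;
    return true;
}

// Splits "LHS,RHS" at the first comma and parses both sides.
// Returns false when the comma is missing or either side is not an
// integer; lhs and rhs are only written on success.
bool parse_pair(const std::string& line, long& lhs, long& rhs)
{
    std::string::size_type comma = line.find(',');
    if (comma == std::string::npos)
        return false;
    long l = 0, r = 0;
    if (!parse_field(line.substr(0, comma), l)
        || !parse_field(line.substr(comma + 1), r))
        return false;
    lhs = l;
    rhs = r;
    return true;
}
```

Comparing the parsed `lhs` against an integer key then sidesteps the substring problem entirely, at the cost of rejecting any line that is not two integers.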

Before going any further, you should very carefully specify exactly what the code should do, for all possible inputs. (For most possible inputs, of course, the answer will probably be: output an error message with the line number to std::cerr and continue. Be sure to return EXIT_FAILURE in such a case.)

Superb! James, it's one value as input and its associated RHS as output. One more thing: all input values are unique. Can you please put it in my code? I am getting lots of errors even though I include algorithm.h
What kind of errors? Except for an obvious typo, the only problem I encounter is that the compiler cannot instantiate std::equal, because pivot is a const_iterator, but line.begin() is non-const. You can correct this in several ways: in C++11, either make pivot auto (so it will be a non-const iterator), or use cbegin() and cend() everywhere on line, once you've read it. In pre-C++11, the simplest is to declare pivot as a non-const iterator (or, since you're going to be using line.begin() several times, you can save it in a const_iterator and use that).
I've edited the answer to reflect the necessary fixes so that it compiles and works (for me).
Thanks! I am on Linux and using the NetBeans IDE. I presume std::next is included in iterator.h, which I've included, but I still get: unable to resolve identifier 'next'
Well, I've got 2 errors: Error-1: main.cpp:25: error: no matching function for call to ‘equal(__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<const char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >)’ Error-2: main.cpp:28: error: ‘next’ is not a member of ‘std’
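Both errors are consistent with compiling in pre-C++11 mode: `std::next` (like `cbegin()`) was added in C++11, and `std::equal` cannot deduce a single iterator type when an `iterator` and a `const_iterator` are mixed. Below is a sketch of the exact-match case that should build in either mode. The name `rhs_exact_cxx03` is hypothetical. Enabling C++11 in the compiler (for GCC, `-std=c++11`, or `-std=c++0x` on older releases) is the other likely way out.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper written to compile as C++03 as well as C++11.
// Because `line` is a const reference, begin() and end() both return
// const_iterator, so every algorithm call sees a single iterator type.
bool rhs_exact_cxx03(const std::string& line, const std::string& key,
                     std::string& rhs)
{
    std::string::const_iterator pivot =
        std::find(line.begin(), line.end(), ',');
    if (pivot == line.end())
        return false;                          // no comma on this line
    if (static_cast<std::string::size_type>(pivot - line.begin())
            != key.size()
        || !std::equal(line.begin(), pivot, key.begin()))
        return false;                          // first field differs
    rhs.assign(pivot + 1, line.end());         // pivot + 1 replaces std::next
    return true;
}
```

Since the input values are said to be unique, the calling loop can stop at the first line for which this returns true.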
