How to extract formatted text in C++?

Question

This might have been asked before, but I couldn't understand how to extract formatted data. Below is my code to extract all text between the strings "[87]" and "[90]" in a text file.

Apparently, the positions reported for [87] and [90] are the same, as shown in the output.

void ExtractWebContent::filterContent(){
    string str, str1;
    string positionOfCurrency1 = "[87]";
    string positionOfCurrency2 = "[90]";
    size_t positionOfText1, positionOfText2;
    ifstream reading;
    reading.open("file_Currency.txt");
    while (!reading.eof()){
        getline (reading, str);
        positionOfText1 = str.find(positionOfCurrency1);
        positionOfText2 = str.find(positionOfCurrency2);
        cout << "positionOfCurrency1 " << positionOfText1 << endl;
        cout << "positionOfCurrency2 " << positionOfText2 << endl;
        //str1= str.substr (positionOfText);
        cout << "String" << str1 << endl;
    }
    reading.close();
}

An update on the currency file:

[79]More »Brent slips to $102 on worries about euro zone economy

Market Data

 * Currencies

CAPTION: Currencies

 Name Price Change % Chg [80]USD/SGD 1.2606 -0.00 -0.13% USD/SGD [81]USDSGD=X [82]EUR/SGD 1.5242 0.00 +0.11% EUR/SGD [83]EURSGD=X

You might like my older answer which used Boost Format for the output.
– Flexo - Save the data dump ♦, Commented Jul 24, 2012 at 16:29
I wrote a very general answer that should send you in the right direction. If you can add the actual file format I can be more specific.
– pmr, Commented Jul 25, 2012 at 0:01
I have updated the text file whose content is to be extracted. It seems getline is a possible solution.
– Bryan Wong, Commented Jul 25, 2012 at 8:15

Community · Accepted Answer · 2017-05-23 11:55:46Z

That really depends on what 'extracting data' means. In simple cases you can just read the file into a string and then use string member functions (especially find and substr) to extract the segment you are interested in. If you are interested in data per line, getline is the way to go for line extraction. Apply find and substr as before to get the segment.
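As a rough sketch of the "read the file into a string" route: the helpers below, slurp and textBetween, are hypothetical names made up for this illustration, not part of any library. The sketch assumes the markers may sit on different lines, which is why it searches the whole content rather than one line at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read everything left in `in` into one string.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: text between the first `open` marker and the next
// `close` marker after it. Returns an empty string if either is missing.
std::string textBetween(const std::string& text,
                        const std::string& open,
                        const std::string& close) {
    assert(!open.empty() && !close.empty());  // empty markers make no sense here
    const std::size_t begin = text.find(open);
    if (begin == std::string::npos) return "";
    const std::size_t contentStart = begin + open.size();
    const std::size_t end = text.find(close, contentStart);
    if (end == std::string::npos) return "";
    // substr takes (start, length), so convert the end position to a length.
    return text.substr(contentStart, end - contentStart);
}
```

With a file, one would open a std::ifstream and pass it to slurp; any std::istream works, so a std::istringstream is handy for trying it out.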

Sometimes a simple find won't get you far and you will need a regular expression to easily get to the parts you are interested in.
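A minimal sketch of the regular-expression route using std::regex (C++11). The function name betweenMarkers is an assumption for illustration, and the pattern assumes both markers are on the same line, since `.` is not expected to match a newline here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text between "[first]" and "[second]" on one line,
// or an empty string when the pattern does not match.
std::string betweenMarkers(const std::string& line, int first, int second) {
    assert(first >= 0 && second >= 0);  // marker numbers are non-negative
    // Brackets are escaped because they are special in a regex;
    // (.*?) lazily captures the text between the two markers.
    const std::regex pattern("\\[" + std::to_string(first) + "\\](.*?)\\[" +
                             std::to_string(second) + "\\]");
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return match[1].str();
    }
    return "";
}
```

For example, betweenMarkers(line, 80, 81) on the currency line from the question would capture the text that follows "[80]" up to "[81]".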

Often simple parsers evolve and soon outgrow even regular expressions. This often signals that it is time for the very large hammer of C++ parsing: Boost.Spirit.

Mike C · Accepted Answer · 2012-07-25 00:06:13Z

Boost.Tokenizer can be helpful for parsing out a string, but it gets a little trickier if those delimiters have to be bracketed numbers like you have them. With the delimiters as described, a regex is probably adequate.

dimatura · Accepted Answer · 2012-07-25 00:01:34Z

All that does is concatenate the output of reading and the strings "[1]" and "[2]". I'm guessing this code resulted from a rather literal extrapolation of similar code using scanf. scanf (as well as the rest of C) still works in C++, so if that works for you I would use it.

That said, there are various levels of sophistication at which you can do this. Using regexes is one of the most powerful/flexible ways, but it might be overkill. The quickest way in my opinion is just to do something like:

Find index of substring "[1]", i1
Find index of substring "[2]", i2
get substring between i1+3 and i2.

In code, supposing std::string line has the text:

size_t i1 = line.find("[1]");
size_t i2 = line.find("[2]");
// substr takes (start, length), not (start, end)
std::string out(line.substr(i1 + 3, i2 - (i1 + 3)));

Warning: no error checking.
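One way the missing checks could look, as a sketch only. find returns std::string::npos when the marker is not in the string, so on a line that contains neither marker both positions come back as the same value, which may be what the identical output positions mean. The function name segmentsPerLine is hypothetical, and the loop tests std::getline directly instead of eof().

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for each line that contains `open` followed by
// `close`, collect the text between them. Other lines are skipped.
std::vector<std::string> segmentsPerLine(std::istream& in,
                                         const std::string& open,
                                         const std::string& close) {
    assert(!open.empty() && !close.empty());
    std::vector<std::string> segments;
    std::string line;
    while (std::getline(in, line)) {  // stops cleanly at end of input
        const std::size_t i1 = line.find(open);
        if (i1 == std::string::npos) continue;  // opening marker not on this line
        const std::size_t start = i1 + open.size();
        const std::size_t i2 = line.find(close, start);
        if (i2 == std::string::npos) continue;  // closing marker not on this line
        segments.push_back(line.substr(start, i2 - start));
    }
    return segments;
}
```

If the two markers are on different lines of the file, a per-line search like this finds nothing, and searching the whole file content is the better fit.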

Right, I have done it as shown above, but the two size_t positions returned are the same! How can we resolve this? Thanks!
