1

This might have been asked before, but I couldn't understand how to extract formatted data. Below is my code to extract all text between the strings "[87]" and "[90]" in a text file.

However, as the output shows, the positions reported for [87] and [90] are the same.

```cpp
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

// ExtractWebContent is my own class; its declaration is not shown here.
void ExtractWebContent::filterContent() {
    string str, str1;
    string positionOfCurrency1 = "[87]";
    string positionOfCurrency2 = "[90]";
    size_t positionOfText1, positionOfText2;

    ifstream reading;
    reading.open("file_Currency.txt");
    // Loop on getline itself: eof() is only set after a read fails,
    // so while (!reading.eof()) would process one extra, failed read.
    while (getline(reading, str)) {
        positionOfText1 = str.find(positionOfCurrency1);
        positionOfText2 = str.find(positionOfCurrency2);
        cout << "positionOfCurrency1 " << positionOfText1 << endl;
        cout << "positionOfCurrency2 " << positionOfText2 << endl;
        //str1 = str.substr(positionOfText);
        cout << "String" << str1 << endl;
    }
    reading.close();
}
```
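One likely explanation for the identical positions: `std::string::find` returns `std::string::npos` (the largest `size_t` value) when the marker is not on the current line, so every line that lacks both `[87]` and `[90]` prints the same huge number twice. A minimal sketch that reports a missing marker explicitly instead of printing the raw `npos` value; `describePosition` is a hypothetical helper, not part of the original code:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: describes where a marker occurs in a line, or reports
// that it is missing instead of printing the raw std::string::npos value.
std::string describePosition(const std::string& line, const std::string& marker) {
    std::size_t pos = line.find(marker);
    if (pos == std::string::npos) {
        return marker + " not found";
    }
    return marker + " at " + std::to_string(pos);
}
```

Calling this for each line inside the loop makes it easy to see which lines actually contain the markers before attempting any `substr`.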

Update: the currency file looks like this:

```
[79]More »Brent slips to $102 on worries about euro zone economy

Market Data

 * Currencies 

CAPTION: Currencies

 Name Price Change % Chg [80]USD/SGD 1.2606 -0.00 -0.13% USD/SGD [81]USDSGD=X [82]EUR/SGD 1.5242 0.00 +0.11% EUR/SGD [83]EURSGD=X 
```
  • You might like my older answer, which used Boost Format for the output. Commented Jul 24, 2012 at 16:29
  • I wrote a very general answer that should send you in the right direction. If you can add the actual file format, I can be more specific. Commented Jul 25, 2012 at 0:01
  • I have updated the question with the text file whose content is to be extracted. It seems getline is a possible solution. Commented Jul 25, 2012 at 8:15

3 Answers

2

That really depends on what 'extracting data' means. In simple cases you can just read the file into a string and then use string member functions (especially find and substr) to extract the segment you are interested in. If you are interested in data per line, getline is the way to go for line extraction; apply find and substr as before to get the segment.
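A minimal sketch of the first approach, assuming C++17 for `std::optional`: read the whole file into one string, then use find and substr. Because it searches the whole text, it also works when the start and end markers sit on different lines. `readFile` and `extractBetween` are hypothetical names, not library functions.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// Hypothetical helper: reads the entire file into one string.
// Returns an empty string if the file cannot be opened.
std::string readFile(const std::string& path) {
    std::ifstream in(path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Returns the text between the first `open` marker and the next `close`
// marker after it, or std::nullopt if either marker is missing.
std::optional<std::string> extractBetween(const std::string& text,
                                          const std::string& open,
                                          const std::string& close) {
    std::size_t start = text.find(open);
    if (start == std::string::npos) return std::nullopt;
    start += open.size();                       // skip past the marker itself
    std::size_t end = text.find(close, start);  // search only after the opener
    if (end == std::string::npos) return std::nullopt;
    return text.substr(start, end - start);     // substr takes (pos, length)
}
```

For example, `extractBetween(readFile("file_Currency.txt"), "[87]", "[90]")` would yield the text between the two markers, or `std::nullopt` if either one is absent.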

Sometimes a simple find won't get you far, and you will need a regular expression to easily get to the parts you are interested in.
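As a sketch of the regex route using the standard `<regex>` header (C++11) plus `std::optional` (C++17), with the question's markers hard-coded: the brackets must be escaped, and a lazy `*?` stops at the first `[90]` after `[87]`. `regexBetween` is a hypothetical name.

```cpp
#include <optional>
#include <regex>
#include <string>

// Sketch: capture everything between "[87]" and the nearest following "[90]".
// The brackets are escaped because '[' starts a character class in a regex;
// "[\s\S]*?" is a lazy match that also crosses newlines, unlike '.'.
std::optional<std::string> regexBetween(const std::string& text) {
    static const std::regex pattern(R"(\[87\]([\s\S]*?)\[90\])");
    std::smatch m;
    if (std::regex_search(text, m, pattern)) {
        return m[1].str();  // group 1 is the text between the markers
    }
    return std::nullopt;
}
```

Using the lazy quantifier matters when the file contains several `[90]`-style markers; a greedy `*` would run to the last one.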

Simple parsers often evolve and soon outgrow even regular expressions. That is usually the signal to reach for the very large hammer of C++ parsing: Boost.Spirit.


1

Boost.Tokenizer can be helpful for parsing out a string, but it gets a little trickier if the delimiters have to be bracketed numbers as in your file. With the delimiters as described, a regex is probably adequate.
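If Boost is not available, the standard library's `std::sregex_token_iterator` can split on bracketed numbers directly. A sketch, assuming the delimiters are always `[`, one or more digits, then `]`; `splitOnBracketedNumbers` is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: split a line on every bracketed number such as "[80]" or "[81]".
// The -1 argument asks for the text *between* matches rather than the matches.
// A line that starts with a delimiter yields an empty first piece.
std::vector<std::string> splitOnBracketedNumbers(const std::string& line) {
    static const std::regex delimiter(R"(\[\d+\])");
    std::vector<std::string> pieces;
    for (std::sregex_token_iterator it(line.begin(), line.end(), delimiter, -1), end;
         it != end; ++it) {
        pieces.push_back(it->str());
    }
    return pieces;
}
```

Applied to a fragment like `USD/SGD [81]USDSGD=X [82]EUR/SGD`, this yields the pieces `"USD/SGD "`, `"USDSGD=X "` and `"EUR/SGD"`.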


0

All that does is concatenate the output of reading with the strings "[1]" and "[2]". I'm guessing this code resulted from a rather literal translation of similar code using scanf. scanf (like the rest of the C standard library) still works in C++, so if that works for you, I would use it.

That said, you can do this at various levels of sophistication. Regexes are one of the most powerful and flexible options, but they might be overkill. The quickest way, in my opinion, is to do something like:

  • Find the index of substring "[1]", i1
  • Find the index of substring "[2]", i2
  • Get the substring from i1+3 (just past "[1]") up to, but not including, i2.

In code, supposing std::string line has the text:

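```cpp
size_t i1 = line.find("[1]");
size_t i2 = line.find("[2]");
// substr takes (position, length), so pass the distance to i2, not i2 itself
std::string out(line.substr(i1 + 3, i2 - (i1 + 3)));
```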

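Warning: no error checking. If either marker is missing, find returns std::string::npos, and the substr call will either return the wrong text or throw std::out_of_range.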

1 Comment

Right, I have done it as shown above, but both returned size_t positions are the same! How can I resolve this? Thanks!
