
I am using the following code to split each line of a file into tokens, printing one token per line. My problem is that the token index restarts at 0 for every line of the file; I want the index to keep counting across the whole file. The contents of the file are:

    Student details:
    Highlander 141A Section-A.
    Single 450988012 SA

Program:

    #include <iostream>
    using std::cout;
    using std::endl;

    #include <fstream>
    using std::ifstream;

    #include <cstring>

    const int MAX_CHARS_PER_LINE = 512;
    const int MAX_TOKENS_PER_LINE = 20;
    const char* const DELIMITER = " ";

    int main()
    {
      // create a file-reading object
      ifstream fin;
      fin.open("data.txt"); // open a file
      if (!fin.good())
        return 1; // exit if file not found

      // read each line of the file
      while (!fin.eof())
      {
        // read an entire line into memory
        char buf[MAX_CHARS_PER_LINE];
        fin.getline(buf, MAX_CHARS_PER_LINE);

        // parse the line into blank-delimited tokens
        int n = 0; // a for-loop index

        // array to store memory addresses of the tokens in buf
        const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0

        // parse the line
        token[0] = strtok(buf, DELIMITER); // first token
        if (token[0]) // zero if line is blank
        {
          for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
          {
            token[n] = strtok(0, DELIMITER); // subsequent tokens
            if (!token[n]) break; // no more tokens
          }
        }

        // process (print) the tokens
        for (int i = 0; i < n; i++) // n = #of tokens
          cout << "Token[" << i << "] = " << token[i] << endl;
        cout << endl;
      }
    }

Output:

    Token[0] = Student
    Token[1] = details:

    Token[0] = Highlander
    Token[1] = 141A
    Token[2] = Section-A.

    Token[0] = Single
    Token[1] = 450988012
    Token[2] = SA

Expected:

    Token[0] = Student
    Token[1] = details:
    Token[2] = Highlander
    Token[3] = 141A
    Token[4] = Section-A.
    Token[5] = Single
    Token[6] = 450988012
    Token[7] = SA

So I want the index to keep incrementing across lines, so that I can easily identify each value by its index. Thanks in advance...

  • I'm just curious, but where are people finding this junk? There's no case (even in C) where strtok is an appropriate solution, and there's almost no case in C++ where you should be using the member getline, rather than reading into an std::string. And of course, !fin.eof() as a loop condition is wrong as well. Commented Sep 30, 2013 at 14:40
  • strtok(0, DELIMITER); is not valid, and should be generating a warning. Strtok's first parameter is a char*, and you have passed an int. Commented Sep 30, 2013 at 14:41
  • boost.org/doc/libs/1_54_0/libs/tokenizer/tokenizer.htm ? Commented Sep 30, 2013 at 14:46
  • @NeilKirk The first thing you need to learn when learning C++ is that nothing is obvious. But why are so many tutorials so bad? You'd think that word would get around after a while, people would stop linking to them, and they'd stop showing up in Google. Commented Sep 30, 2013 at 14:50
  • @andre If by "more effective", you mean correct, or "that actually work", then I agree. The issue isn't effectiveness here, it is correctness. Commented Sep 30, 2013 at 14:51

2 Answers


What's wrong with the standard, idiomatic solution:

    std::string line;
    while ( std::getline( fin, line ) ) {
        std::istringstream parser( line );
        int i = 0;
        std::string token;
        while ( parser >> token ) {
            std::cout << "Token[" << i << "] = " << token << std::endl;
            ++ i;
        }
    }

Obviously, in real life, you'll want to do more than just output each token, and you'll want more complicated parsing. But any time you're doing line-oriented input, the above is the model you should be using (probably keeping track of the line number as well, for error messages).

It's probably worth pointing out that in this case, an even better solution would be to use boost::split in the outer loop, to get a vector of tokens.


4 Comments

You should move int i = 0; before the while loop. Otherwise you won't have the expected output.
@OlafDietsche The int i = 0; is before the while loop. (Look at his sample output to see what he wants.)
Sorry, I meant to move it before the first while loop. The output labeled "Output:" is what he gets and the output "Expected:" is what he wants. At least, that's what I understand.
@OlafDietsche Yes. It was I who misread his question. Yes, the variable (and its initialization) does belong before the first loop. (And in this case, there's no reason to use the nested loops, unless you want to keep track of the line number for error messages. Or use boost::split, which is really more appropriate in this case.)
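
Putting the fix from the comments above into the answer's loop might look like the sketch below. The function name `labelTokens` and the choice to return the labelled strings rather than print them are assumptions made for illustration; they are not part of the original answer. The only substantive change is that the counter is declared before the outer loop, so it keeps counting across lines.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): reads `in` line by line and
// returns one "Token[i] = word" string per token, with `i` counting across
// the whole input instead of restarting on every line.
std::vector<std::string> labelTokens(std::istream& in)
{
    std::vector<std::string> labelled;
    std::size_t index = 0;      // declared before the outer loop, so it never resets
    std::string line;
    while (std::getline(in, line)) {        // loop ends when a read fails
        std::istringstream parser(line);
        std::string token;
        while (parser >> token) {           // whitespace-separated words of this line
            labelled.push_back("Token[" + std::to_string(index) + "] = " + token);
            ++index;
        }
    }
    return labelled;
}
```

Passing an `std::ifstream` opened on `data.txt` and printing each returned string should give the numbering shown under "Expected:" in the question. The outer `std::getline` loop is kept only because it makes it easy to add a line counter for error messages later.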

I would just let iostream do the splitting

    std::vector<std::string> token;
    std::string s;
    while (fin >> s)
      token.push_back(s);

Then you can output the whole array at once with proper indexes.

    for (std::size_t i = 0; i < token.size(); ++i)
      std::cout << "Token[" << i << "] = " << token[i] << std::endl;

Update:

You can even omit the vector altogether and output the tokens as you read them from the input stream

    std::string s;
    for (int i = 0; fin >> s; ++i)
      std::cout << "Token[" << i << "] = " << s << std::endl;
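
If the goal is to look a value up by its index afterwards, the vector version is probably the more useful of the two. A minimal sketch, assuming a hypothetical helper named `readTokens` that takes any `std::istream` (so it can be tried on a string before pointing it at the file):

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical helper: collects every whitespace-separated word of `in`.
// Line breaks count as whitespace, so the index runs across the whole input.
std::vector<std::string> readTokens(std::istream& in)
{
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word)          // extraction fails at end of input, which ends the loop
        tokens.push_back(word);
    return tokens;
}
```

With the file from the question, `readTokens(fin)[5]` would be `"Single"` and `readTokens(fin).size()` would be 8, assuming `fin` is an `std::ifstream` that opened successfully.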

12 Comments

What's with the !fin.eof()? That's never an appropriate loop condition.
See here: stackoverflow.com/questions/5605125/… for a discussion of what's wrong with !fin.eof().
@JamesKanze, us2012 You're both right. But if OP insists on doing it that way, he can achieve his objective with a separate output variable.
@user2754070 What do you mean with it breaks at line[2]?
@OlafDietsche If the OP insists on using fin.eof(), his code will never work. And if he insists on using strtok, it will be excessively fragile, and unmaintainable. Your first solution is fine, at least if he doesn't need to keep the lines separate; there's no point in trying to pretend that the alternatives he seems to favor are acceptable.
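
For anyone wondering what the fuss about `!fin.eof()` is, here is a small sketch that only counts loop iterations. Both function names are made up for this illustration. The end-of-file flag is set only after a read has hit the end of the input, so a loop that tests it before reading runs its body one extra time with a failed read.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts iterations when the loop is controlled by !in.eof(), as in the question.
int countWithEofLoop(std::istream& in)
{
    int iterations = 0;
    std::string line;
    while (!in.eof()) {
        std::getline(in, line);   // this read may fail, but the body still runs
        ++iterations;
    }
    return iterations;
}

// Counts iterations when the read itself is the loop condition.
int countWithGetlineLoop(std::istream& in)
{
    int iterations = 0;
    std::string line;
    while (std::getline(in, line))   // body runs only after a successful read
        ++iterations;
    return iterations;
}
```

Fed an `std::istringstream` holding `"one\ntwo\n"`, the first function reports three iterations and the second reports two. In the question's program that extra pass shows up as a trailing blank group in the output; and if a read fails for some reason other than end of file (for instance a line longer than the buffer given to the member getline), the eof flag is never set and the loop does not terminate.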