I have a data file with an unknown amount of unformatted, unneeded data at the start and end of the file. In the middle, however, the data is precisely formatted, and the first column always starts with one of a couple of keywords. I want to skip to this part and read in that data, assigning each column to a variable. This would be simple if there weren't the start and end "garbage" text.

Here is a simple example problem. In my real code, each variable is part of a structure. I do not think this will matter, but I mention it just in case.

Here is my text file. I want all lines that start with keyword, and I want all columns assigned to variables.


```
REMARK: this should be simpler
REMARK: yes, it should
REMARK: it is simple, you just don't see it yet
Comment that doesn't start with REMARK
keyword aaa 1 bbb 1 1.2555 O
keyword aaa 1 bbb 2 2.2555 H
keyword aaa 1 bbb 3 3.2555 C
keyword aaa 1 bbb 4 4.2555 C
END
Arbitrary garbage texts
```

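If there were no random comments, I could use

```cpp
#include <fstream>
#include <string>
using namespace std;

int main() {
    string filename = "textfile.pdb";
    string keyword, name1, name2, name3;
    int int1, int2;
    double number;

    ifstream inFile;
    inFile.open(filename.c_str());
    // loop on the extraction itself so that a failed read ends the loop
    while (inFile >> keyword >> name1 >> int1 >> name2 >> int2 >> number >> name3) {
        // use the values here
    }
    inFile.close();
}
```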

I tried getting around this by using

while (getline(inFile,line)) 

This method lets me look at the line and check whether it has "keyword" in it, but then I can't use the convenient formatted input of the first method. I need to parse the string, which seems tricky in C++. I tried using sscanf, but it complained about a string-to-char conversion.

The first method is nicer; I just don't know how to implement a check so that a line is read into the variables only if it is a formatted one.
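
On the sscanf attempt: that kind of error typically appears when a std::string is passed where sscanf expects a const char*. A minimal sketch, assuming the column layout of the sample file above; the Row struct and the parse_with_sscanf function are hypothetical names used only for illustration:

```cpp
#include <cstdio>
#include <string>

// Hypothetical record type for one "keyword" line of the sample file.
struct Row {
    char   name1[32];
    int    int1;
    char   name2[32];
    int    int2;
    double number;
    char   name3[32];
};

// Returns true only if the line starts with "keyword" and all six
// remaining fields were converted.
bool parse_with_sscanf(const std::string& line, Row& row) {
    char kw[32];
    // c_str() exposes the std::string as the const char* that sscanf expects.
    // The %31s widths keep each word within its 32-byte buffer.
    int n = std::sscanf(line.c_str(), "%31s %31s %d %31s %d %lf %31s",
                        kw, row.name1, &row.int1, row.name2, &row.int2,
                        &row.number, row.name3);
    return n == 7 && std::string(kw) == "keyword";
}
```

Called inside the getline loop, this would accept the four keyword lines and reject the REMARK, comment, and END lines, because sscanf stops at the first field that does not convert.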

2 Answers

2

You can easily locate only the formatted lines you are interested in by reading each line, creating a stringstream from it, and validating that the line begins with "keyword" and contains each remaining item. Since you are using a stringstream, you need not read every value as a string; you can read each value directly as the desired type. If the line begins with END, you are done reading, so just break; otherwise, if the first word is not "keyword", read the next line from the file and try again.

After opening an ifstream to your data file as f, you could do the following to locate and parse the wanted data:

```cpp
while (getline (f, line)) {         /* read each line */
    int aval, bval;                 /* local vars for parsing line */
    double dblval;
    std::string kw, a, b, ccode;
    std::stringstream s (line);     /* stringstream to parse line */

    /* if line is empty or 1st word not keyword, handle line appropriately */
    if (!(s >> kw) || kw != "keyword") {
        if (kw == "END")            /* done with data */
            break;
        continue;                   /* otherwise get next line */
    }
    /* read/validate all other data values */
    if ((s >> a) && (s >> aval) && (s >> b) && (s >> bval) &&
        (s >> dblval) && (s >> ccode))
        std::cout << kw << " " << a << " " << aval << " " << b << " "
                  << bval << " " << dblval << " " << ccode << '\n';
    else                            /* otherwise invalid data line */
        std::cerr << "error: invalid data: " << line << '\n';
}
```

(This just outputs the wanted values to stdout; you can use them as needed.)
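
Since the question mentions that the real variables are members of a structure, the same checks can be wrapped in a helper that fills one. A minimal sketch under that assumption; the Record struct and the parse_record function are hypothetical names, not part of the original code:

```cpp
#include <sstream>
#include <string>

// Hypothetical structure holding the columns of one formatted line.
struct Record {
    std::string a;
    int         aval;
    std::string b;
    int         bval;
    double      dblval;
    std::string ccode;
};

// Fills rec and returns true if the line begins with "keyword" and every
// column converts to its member's type; returns false otherwise.
bool parse_record(const std::string& line, Record& rec) {
    std::istringstream s(line);
    std::string kw;
    if (!(s >> kw) || kw != "keyword")
        return false;               // blank line, REMARK, END, garbage, ...
    Record tmp;                     // parse into a temporary so rec is only
                                    // overwritten by a fully valid line
    if (!(s >> tmp.a >> tmp.aval >> tmp.b >> tmp.bval >> tmp.dblval >> tmp.ccode))
        return false;               // a column was missing or had the wrong type
    rec = tmp;
    return true;
}
```

Inside the getline loop, a call such as `if (parse_record(line, rec)) records.push_back(rec);` (with `records` a `std::vector<Record>`) would collect the formatted lines, and the members keep their numeric types without any intermediate string fields.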

Putting it all together in a short example to use with your data, you could do something similar to:

```cpp
#include <cstdio>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

int main (int argc, char **argv) {

    std::string line;               /* string to hold each line */

    if (argc < 2) {                 /* validate at least 1 argument given */
        std::cerr << "error: insufficient input.\n"
                     "usage: " << argv[0] << " filename\n";
        return 1;
    }

    std::ifstream f (argv[1]);      /* open file */
    if (!f.is_open()) {             /* validate file open for reading */
        perror (("error while opening file " +
                std::string(argv[1])).c_str());
        return 1;
    }

    while (getline (f, line)) {         /* read each line */
        int aval, bval;                 /* local vars for parsing line */
        double dblval;
        std::string kw, a, b, ccode;
        std::stringstream s (line);     /* stringstream to parse line */

        /* if line is empty or 1st word not keyword, handle line appropriately */
        if (!(s >> kw) || kw != "keyword") {
            if (kw == "END")            /* done with data */
                break;
            continue;                   /* otherwise get next line */
        }
        /* read/validate all other data values */
        if ((s >> a) && (s >> aval) && (s >> b) && (s >> bval) &&
            (s >> dblval) && (s >> ccode))
            std::cout << kw << " " << a << " " << aval << " " << b << " "
                      << bval << " " << dblval << " " << ccode << '\n';
        else                            /* otherwise invalid data line */
            std::cerr << "error: invalid data: " << line << '\n';
    }
    f.close();
}
```

Example Input File

```
$ cat dat/formatted_only.txt
REMARK: this should be simpler
REMARK: yes, it should
REMARK: it is simple, you just don't see it yet
Comment that doesn't start with REMARK
keyword aaa 1 bbb 1 1.2555 O
keyword aaa 1 bbb 2 2.2555 H
keyword aaa 1 bbb 3 3.2555 C
keyword aaa 1 bbb 4 4.2555 C
END
Arbitrary garbage texts
```

Example Use/Output

```
$ ./bin/sstream_formatted_only dat/formatted_only.txt
keyword aaa 1 bbb 1 1.2555 O
keyword aaa 1 bbb 2 2.2555 H
keyword aaa 1 bbb 3 3.2555 C
keyword aaa 1 bbb 4 4.2555 C
```

Look things over and let me know if you have further questions.

2

I'd suggest something like this:

Parsing text file in C++

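```cpp
#include <fstream>
#include <sstream>   // required for stringstream
#include <string>
using namespace std;

string line;
string name, age, salary, hoursWorked, randomText;
ifstream readFile("textfile.txt");
while (getline(readFile, line)) {
    stringstream iss(line);
    // split the line on each delimiter in turn
    getline(iss, name, ':');
    getline(iss, age, '-');
    getline(iss, salary, ',');
    getline(iss, hoursWorked, '[');
    getline(iss, randomText, ']');
}
readFile.close();
```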

4 Comments

Hmm, I get this error: main.cpp: In function ‘int main()’: main.cpp:92:39: error: variable ‘std::stringstream iss’ has initializer but incomplete type std::stringstream iss(line);
that does make a difference
The issue I have now is that the variables in my structure are not strings. This method seems to require them to be strings. For other reasons, I cannot define them as strings in the structure; it is used by too many other things.
Once you've parsed an item, you can easily convert it from "string" into anything your little heart desires ;) For example, std::stod converts a string to a double.
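
To illustrate the conversion mentioned in the last comment: std::stod turns a parsed string field into a double and throws std::invalid_argument or std::out_of_range when it cannot. A minimal sketch; the to_double helper and its return-false-on-failure convention are assumptions made for illustration:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts a parsed text field to double. Returns false (leaving out
// unchanged) instead of throwing if the field is not a valid number.
bool to_double(const std::string& field, double& out) {
    try {
        std::size_t pos = 0;
        double value = std::stod(field, &pos);  // pos = characters consumed
        if (pos != field.size())
            return false;           // reject trailing junk such as "1.5abc"
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;               // no number at the start of the field
    } catch (const std::out_of_range&) {
        return false;               // value does not fit in a double
    }
}
```

The same pattern works with std::stoi for int members, so the structure can keep its numeric types while the delimiter-based getline calls read into temporary strings.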
