
I have a file that I want to process, extracting only some of the information in it. For the sake of speed, I also want to write the file to a separate output file during the same pass.

I could pick out the information I want in one pass and then copy the file to the output file in a second pass. I am doing both in a single pass so that I can avoid the second one.

Below is my code. Don't get distracted by the if conditions; they only select the information I want. The problem is writing to the other file.

    void readPoints(char* filename, std::vector<Point>& v, char* outfilename) {
        std::ifstream infile;
        std::string str;
        infile.open(filename);
        if (!infile)
            std::cout << "File not found!" << std::endl;
        std::ofstream outfile;
        outfile.open(outfilename);
        Point::FT coords[3];
        while(1) {
            infile >> str;
            outfile << str << "\t";
            if(str == "ABET")
                outfile << std::endl;
            if(str == "ATOM") {
                infile >> str;
                outfile << str << "\t";
                if(str == "16" || str == "17" || str == "18" ||
                   str == "20" || str == "21" || str == "22") {
                    for(int j = 0; j < 4; ++j) {
                        infile >> str;
                        outfile << str << "\t";
                    }
                    for (int j = 0; j < 3; ++j) {
                        infile >> str;
                        outfile << str << "\t";
                        coords[j] = std::stod(str);
                    }
                    Point p(3, coords);
                    v.push_back(p);
                }
            }
            if(str == "END")
                break;
        }
        infile.close();
        outfile.close();
    }

The problem is that infile >> str gives me only the words; the whitespace between them is skipped and lost. So I am writing a tab after each word to separate them. However, this is not enough, because the original file does not use tabs but (I think) runs of spaces.
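To make the effect concrete, here is a minimal sketch of what the loop above does to a single line. The helper name retokenize is made up for this illustration and is not part of the question's code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: re-emit a line the way the question's loop does,
// reading tokens with operator>> and writing a tab after each one.
std::string retokenize(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    std::string out;
    // operator>> skips leading whitespace and stops at the next whitespace,
    // so the number of spaces between tokens is never seen by the caller.
    while (in >> token) {
        out += token;
        out += '\t';
    }
    return out;
}
```

Whatever number of spaces separated the tokens in the input, the result has exactly one tab after each token, which is why the two files differ only in spacing.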

Original file:

    ATOM 1 HT1 ASP X 1 9.232 -9.194 6.798 1.00 1.00 ABET
    ATOM 2 HT2 ASP X 1 8.856 -7.726 7.401 1.00 1.00 ABET
    ...
    ATOM 50 HH11 ARG X 5 0.925 -3.001 6.677 1.00 1.00 ABET
    ATOM 51 HH12 ARG X 5 0.285 -4.616 6.734 1.00 1.00 ABET
    ...
    END

Output file:

    ATOM 1 HT1 ASP X 1 9.232 -9.194 6.798 1.00 1.00 ABET
    ATOM 2 HT2 ASP X 1 8.856 -7.726 7.401 1.00 1.00 ABET
    ...
    ATOM 50 HH11 ARG X 5 0.925 -3.001 6.677 1.00 1.00 ABET
    ATOM 51 HH12 ARG X 5 0.285 -4.616 6.734 1.00 1.00 ABET
    ...
    END

Does anyone know a way to fix this? Notice that the information is the same in both files; it is the spacing between the words that is bothering me!

  • Can't you read in the whole line and save it to a buffer, process the buffer, then write the whole line to the file? Commented Aug 21, 2014 at 17:38
  • You may be better off using FILE * and fprintf, so that you can control the spacing in the format string. Commented Aug 21, 2014 at 17:39
  • The buffer seems a good solution. C style would work if I knew the spacing I would need. Commented Aug 21, 2014 at 18:02

3 Answers


It appears you're trying to modify a .pdb file. This file format is very finicky in that it requires the spacing to be exact. The way to get this to work is to study the format and make sure you put the right number of spaces in the right places. For example, you want the atom number to finish in the 11th column to match the other file, so you add 7 - str.length() spaces between ATOM and the atom number (7 because the first four of those 11 columns are already taken up by ATOM). Follow a similar approach for the rest of the file and you should be fine.
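As a rough illustration of that padding rule, the sketch below lets the stream do the counting with std::setw instead of computing 7 - str.length() by hand. The helper name atomPrefix is hypothetical, and it only covers the first 11 columns described above.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: build the first 11 columns of an ATOM line.
// "ATOM" fills columns 1-4; a right-aligned field of width 7 covers
// columns 5-11, so the serial number ends in column 11.
std::string atomPrefix(const std::string& serial) {
    std::ostringstream out;
    out << "ATOM" << std::right << std::setw(7) << serial;
    return out.str();
}
```

std::setw pads on the left with spaces when the field is right-aligned, which is equivalent to inserting 7 - serial.length() spaces. It does not truncate, so a serial number longer than 7 characters would still push the following columns out of place.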


3 Comments

Yes, it's a .pdb file, but I don't like the idea of counting. A buffer makes that easier. +1 for identifying the extension, though!
@G.Samaras: As someone who worked with .pdb files every day for the length of my Ph.D., it's pretty easy to spot. Yeah, a buffer would work better... it's just that whenever I wrote this kind of code, I always used pure C, so the above solution is actually what I've coded and used for ~4 years.
I can imagine why! I posted what I finally did. Thanks for the answer, though. I hope you find my answer useful! (Good luck with the Ph.D.)

The functions you are using are fighting the data format, because token-based stream extraction is not meant for fixed-column data.

Read the file line by line into a string and use memcmp/memcpy instead of string compares to inspect and modify fields in place. It's a fixed format. (Or you could use COBOL to process it easily, j/k!)

    char line[5000];   // not 'inline', which is a reserved word
    // open file
    // loop through the file:
    //   read one line into 'line'
    if (0 == memcmp(line, "ATOM", 4)) {
        // ... yada yada yada
        for (int j = 0; j < 3; ++j) {
            char coord[9];
            memcpy(coord, line + offset + j * 8, 8);   // copy one 8-character field
            coord[8] = 0;                              // null-terminate it
            // do something with it...
            if (iNeedToWriteToOutput) {
                memcpy(line + offset + j * 8, "   0.000", 8);   // overwrite the field in place
                // etc...
            }
        }
    }
    // write string to output

You get the idea, hope that helps.
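For readers who prefer std::string over raw buffers, the same fixed-column idea might look like the sketch below. It assumes the standard PDB ATOM layout, where x, y and z occupy columns 31-38, 39-46 and 47-54; the file in the question may be a variant, so treat the offsets as an assumption to check against your data. All helper names are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Assumption: standard PDB ATOM layout. The three coordinates start at
// 0-based offset 30 and are 8 characters wide each.
const std::size_t kCoordOffset = 30;
const std::size_t kCoordWidth = 8;

// True if the line starts with the record name "ATOM".
bool isAtomRecord(const std::string& line) {
    return line.compare(0, 4, "ATOM") == 0;
}

// Hypothetical helper: parse coordinate j (0 = x, 1 = y, 2 = z).
// The caller must make sure the line is long enough; substr throws
// std::out_of_range if the start offset is past the end of the line.
double readCoord(const std::string& line, int j) {
    return std::stod(line.substr(kCoordOffset + j * kCoordWidth, kCoordWidth));
}

// Hypothetical helper: overwrite coordinate j with "   0.000" in place,
// leaving every other column of the line untouched.
void zeroCoord(std::string& line, int j) {
    line.replace(kCoordOffset + j * kCoordWidth, kCoordWidth, "   0.000");
}
```

Because the replacement text is exactly as wide as the field it replaces, the line keeps its length and all later columns stay aligned.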

3 Comments

I got the idea, but the code you are providing is a bit unclear. However, +1 for the general idea.
It's pre-C++ stuff from the C libs. It's kind of pseudocode and meant as an example. (E.g., I don't see you modifying the output the way I show zeroing out a field, but it appears your logic would exclude lines.) Also, I didn't compile or test my syntax, nor have I written C for 15 years, but the functions are well documented.
I am not hitting on you, I am just saying that the (pseudo)code is not as clear as it could be. :)

The answer is basically what clcto commented under the question.

I use this code to copy the files and process them.

    void readPoints(char* filename, std::vector<Point>& v, char* outfilename) {
        std::ofstream outfile;
        outfile.open(outfilename);
        std::ifstream infile(filename);
        if (!infile) {
            std::cout << "File not found!" << std::endl;
            return;
        }
        std::string line;
        while (std::getline(infile, line)) {
            std::cout << line << std::endl;
            // if line of interest, process it
            // write to the other file
            outfile << line << std::endl;
        }
        infile.close();
        outfile.close();
    }

And then I used this answer for the replacement.
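One way the "if line of interest, process it" step could be filled in is sketched below. It is only an illustration, not the code actually used here: Coords stands in for the Point type, which is not shown in the thread, and copyAndCollect and its parameters are hypothetical names. It works on generic streams so it can be tried without files.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the question's Point type, which is not shown in the thread.
struct Coords { double x, y, z; };

// Copy every line of 'in' to 'out' unchanged. For ATOM records whose
// serial number appears in 'wanted', also parse the three coordinates.
std::vector<Coords> copyAndCollect(std::istream& in, std::ostream& out,
                                   const std::vector<std::string>& wanted) {
    std::vector<Coords> points;
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';   // the original spacing is written back as-is

        // Tokenize a copy of the line; the line itself is never modified.
        std::istringstream fields(line);
        std::string tag, serial, skip;
        if (!(fields >> tag >> serial) || tag != "ATOM") continue;

        bool keep = false;
        for (const std::string& w : wanted) {
            if (w == serial) keep = true;
        }
        if (!keep) continue;

        // Skip the next four tokens, as the question's code does.
        for (int j = 0; j < 4; ++j) fields >> skip;

        Coords c;
        if (fields >> c.x >> c.y >> c.z) points.push_back(c);
    }
    return points;
}
```

Since the output is the untouched line, the token-based parsing only affects what ends up in the vector. Note that whitespace tokenizing relies on the fields being separated by at least one space, as in the sample shown in the question.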
