1

I'm new here. I'm trying to do something that I think should be easy, but I can't get it to work. I have two files that each contain simple data:

FileA

KIC 757137 892010 892107 892738 892760 893214 1026084 1435467 1026180 1026309 1026326 1026473 1027337 1160789 1161447 1161618 1162036 3112152 1163359 1163453 1163621 3123191 1164590 

and File B

KICID 1430163 1435467 1725815 2309595 2450729 2837475 2849125 2852862 2865774 2991448 2998253 3112152 3112889 3115178 3123191 ...

I'd like to read both files and then print out the values that appear in both, ignoring the titles. In this case I'd get that 1435467, 3112152 and 3123191 are in both, and just these would be sent to a new file. So far I have:

    #include <cmath>
    #include <cstdlib>
    #include <string>
    #include <iomanip>
    #include <iostream>
    #include <fstream>
    #include <ctime>
    using namespace std;

    // Globals, to allow being called from several functions

    // main program
    int main()
    {
        float A, B;
        ifstream inA("FileA");      // input stream
        ifstream inB("FileB");      // second instream
        ofstream outA("OutA.txt");  // output stream

        while (inA >> A) {
            while (inB >> B) {
                if (A == B) {
                    outA << A << "\t" << B << endl;
                }
            }
        }

        return 0;
    }

This just produces an empty OutA file. I thought it would read a value from FileA, cycle through FileB until it found a match, send it to OutA, and then move on to the next value of FileA. Any help would be appreciated.

3
  • How large are the files? Would it be an option to read both of them completely into memory? What do you mean by "ignoring titles"? Commented Sep 11, 2014 at 13:25
  • 3
    You need to reset inB to the start of the file for each A. And skip over the titles before you start reading numbers. Commented Sep 11, 2014 at 13:27
  • 1
    Use inB.seekg(0, std::ios_base::beg); to reset the file pointer to the beginning of the file every time you want to match a number. Better still, read the data of one file into a structure (e.g. std::set) and then read the second file, checking whether each value exists in it. That way you only read each file once. Disk access is a really expensive operation. Commented Sep 11, 2014 at 13:32
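The two suggestions in the comments above (skip the titles, and load one file into a std::set so each file is read only once) can be sketched roughly as follows. This is only an illustration: the function name commonIds is made up for the example, and it assumes each input is a single title token followed by whitespace-separated IDs. Note that with a numeric A, the very first inA >> A in the question fails on the title "KIC", so the outer loop never runs at all.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the IDs found in both inputs,
// in the order they appear in `b`.
// Assumes each input is one title token followed by whitespace-separated IDs.
std::vector<std::string> commonIds(std::istream& a, std::istream& b) {
    std::string token;

    a >> token;                          // discard the title of the first input
    std::set<std::string> seen;
    while (a >> token) seen.insert(token);

    std::vector<std::string> common;
    b >> token;                          // discard the title of the second input
    while (b >> token) {
        if (seen.count(token) != 0) common.push_back(token);
    }
    return common;
}

// Convenience overload for trying the function on in-memory text.
std::vector<std::string> commonIds(const std::string& textA,
                                   const std::string& textB) {
    std::istringstream a(textA), b(textB);
    return commonIds(a, b);
}
```

With real files you would pass two std::ifstream objects to the first overload and write the returned values to the output stream.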

4 Answers 4

1

You need to put

inB.seekg(0, inB.beg) 

at the end of the outer while loop. Otherwise you will stay at the end of inB and read nothing after processing the first entry of inA.


3 Comments

OK I tried this and currently still doesn't produce a result.
@ThomasNorth Maybe I worded it unclearly. You should place it inside the outer while loop, after the inner while loop.
:) I guessed that part. I've also changed the type to strings as that's actually what I need for the project.
1

Another problem may be that you are using float for A and B. Try int (or string), as float may not behave as you expect with ==. For details, see the question "What is the most effective way for float and double comparison?".
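To make the float concern concrete, here is a small sketch, assuming the usual IEEE 754 single-precision float with a 24-bit significand. The IDs shown in the question are all below 2^24 = 16777216, so they would happen to be stored exactly, but larger IDs can round to the same float and compare equal. The function names are made up for the example.

```cpp
#include <string>

// Converts both IDs to float before comparing, as `float A, B;` would.
// Above 2^24 neighbouring integers can round to the same float value.
bool equalAsFloat(long a, long b) {
    return static_cast<float>(a) == static_cast<float>(b);
}

// Comparing the text of the IDs involves no rounding step.
bool equalAsString(long a, long b) {
    return std::to_string(a) == std::to_string(b);
}
```

Under that assumption, equalAsFloat(16777216, 16777217) reports a match even though the IDs differ, while equalAsString does not.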

This code worked on my platform:

    ...
    while (inA >> A) {
        inB.clear();
        inB.seekg(0, inB.beg);
        while (inB >> B) {
            if (A == B) {
                outA << A << "\t" << B << endl;
            }
        }
    }

Notice the inB.clear() and inB.seekg(...) calls. A and B are strings.
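Why clear() matters: once the inner loop ends, the stream has failbit set, and seekg does nothing on a stream in that state. Here is a minimal sketch of that behaviour, using a string stream in place of the file. The function name is made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads `in` to the end, tries to rewind it,
// and reports whether a second read pass can get a token.
bool canRereadAfterRewind(std::istream& in, bool callClear) {
    std::string token;
    while (in >> token) {}          // ends when extraction fails: eofbit and failbit are set
    if (callClear) in.clear();      // reset the error flags
    in.seekg(0, std::ios::beg);     // has no effect while failbit is still set
    return static_cast<bool>(in >> token);
}
```

Calling it with callClear set to false shows the situation from the first answer (nothing can be read after the rewind). With true, the second pass works.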

By the way, this method is only good for a quick-and-dirty implementation. It's not optimal for big files, as you get N * M complexity (N is the size of FileA, M the size of FileB). By using a hash set you can get to nearly linear (N + M) complexity.

Example of hash set implementation (C++11):

    #include <string>
    #include <iostream>
    #include <fstream>
    #include <unordered_set>
    using namespace std;

    int main()
    {
        string A, B;
        ifstream inA("FileA");      // input stream
        ifstream inB("FileB");      // second instream
        ofstream outA("OutA.txt");  // output stream

        // read every value of FileA into the hash set
        unordered_set<string> setA;
        while (inA >> A) {
            setA.insert(A);
        }

        // write out each value of FileB that is also in FileA
        while (inB >> B) {
            if (setA.count(B)) {
                outA << B << endl;  // print B: A only holds the last value read from FileA
            }
        }

        return 0;
    }

1 Comment

I've changed the type to string, as that's what I actually need
1

Are both the files small enough to read into memory?

You could try something similar to the following:

    #include <algorithm>
    #include <fstream>
    #include <string>
    #include <vector>

    int main(int argc, char** argv)
    {
        std::vector<std::string> a;
        std::vector<std::string> b;

        std::ofstream outA("OutA.txt"); // output stream
        std::ifstream inA("FileA");     // input stream
        std::ifstream inB("FileB");     // second instream

        std::string value;

        inA >> value;                                // read the header and discard it
        while (inA >> value) { a.push_back(value); } // populate first vector

        inB >> value;                                // read the header and discard it
        while (inB >> value) { b.push_back(value); } // populate second vector

        // std::sort will perform a pretty efficient sort
        std::sort(a.begin(), a.end());
        std::sort(b.begin(), b.end());

        // now that both are sorted, comparing is easier
        std::vector<std::string>::iterator ita = a.begin();
        std::vector<std::string>::iterator itb = b.begin();
        while (ita != a.end() && itb != b.end())
        {
            if (*ita > *itb)
                ++itb;
            else if (*ita < *itb)
                ++ita;
            else
            {
                outA << *ita << '\n'; // value is present in both files
                ++ita;                // advance both, otherwise the loop never ends
                ++itb;
            }
        }

        return 0;
    }

This reads both files into memory, sorts them both, and then compares them. The comparison only has to go through each list once, which is O(a+b) instead of O(a*b). Of course the sorting has an overhead, but this should be more efficient for larger files, and still sufficiently fast for shorter files (unless you are comparing lots and lots of small files). I believe that with std::sort the worst case for the whole thing is O(a log a + b log b), which is better than O(a*b).
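As an aside, the standard library already has this merge-style comparison as std::set_intersection, so the hand-written loop could be replaced by something like the sketch below. The function name sortedIntersection is made up for the example. The values are strings, so the result comes out in lexicographic order rather than numeric order.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: sorts copies of both lists, then collects the
// values present in both with std::set_intersection.
std::vector<std::string> sortedIntersection(std::vector<std::string> a,
                                            std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(common));
    return common;
}
```

The vectors are taken by value so the caller's data stays unsorted. Pass them with std::move if the copy matters.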

1 Comment

@Thomas North Due to your comments in response to this answer I've updated this to handle std::string instead of int
0

In the end I fixed it like so

    #include <cmath>
    #include <cstdlib>
    #include <string>
    #include <iomanip>
    #include <iostream>
    #include <fstream>
    #include <ctime>
    using namespace std;

    //Globals, to allow being called from several functions

    //main program
    int main()
    {
        string A, B;
        ifstream inA("FileA.txt");  //input stream
        ifstream inB("FileB.txt");  //second instream
        ofstream outA("OutA.txt");  //output stream

        while (inA >> A) {          //take in first stream
            while (inB >> B) {      //whilst thats happening take in second stream
                if (A == B) {       //do they match? If so then send out the value
                    outA << A << "\t" << B << endl; //THIS IS JUST SHOW A DOES = B!
                }
            }                       //end of B loop
            inB.clear();            //now clear the second stream (B)
            inB.seekg(0, inB.beg);  //return to start of stream B
        }                           //move onto second input in stream A, and repeat

        return 0;
    }
