0

I am hoping this is an easy question that someone can answer or illustrate with an example. I am working with files that can be upwards of 4 GB, and I foresee memory issues if I read the entire input file, edit it, and then write it out. So I thought it would be easier to rewrite the file as I go (line by line, or more likely in sections of 25 lines at a time). As I looked into it, though, it seems far more complicated than I originally thought, and everywhere (including this site) people recommend storing the data or opening a separate output file. Is it possible to edit a file as you read it in? If so, what is the best way to do so? Should I try to use the file position to go back to what I just read in?

File Format (Without header and extraneous information):

* voxel 0 0 0 1 1 1 3
Res 000000000000 000100000000 2.66668e+06
Cap 000000000000 000100000000 2.19141e-16
Res 000000010000 000100010000 2.66668e+06
Cap 000000010000 000100010000 2.19141e-16
Res 000000000001 000100000001 2.66668e+06
Cap 000000000001 000100000001 2.19141e-16
Res 000000010001 000100010001 2.66668e+06
Cap 000000010001 000100010001 2.19141e-16
Res 000000000000 000000010000 2.66668e+06
Cap 000000000000 000000010000 2.19141e-16
Res 000100000000 000100010000 2.66668e+06
Cap 000100000000 000100010000 2.19141e-16
Res 000000000001 000000010001 2.66668e+06
Cap 000000000001 000000010001 2.19141e-16
Res 000100000001 000100010001 2.66668e+06
Cap 000100000001 000100010001 2.19141e-16
Res 000000000000 000000000001 2.66668e+06
Cap 000000000000 000000000001 2.19141e-16
Res 000100000000 000100000001 2.66668e+06
Cap 000100000000 000100000001 2.19141e-16
Res 000000010000 000000010001 2.66668e+06
Cap 000000010000 000000010001 2.19141e-16
Res 000100010000 000100010001 2.66668e+06
Cap 000100010000 000100010001 2.19141e-16

Information from another file, together with the location (the 0 0 0 1 1 1) at the top of the file, determines which values change and how. But again, with thousands (if not more) of these blocks of data, I am really concerned that I cannot read from this file and then write to a new one, and I have no idea how to read and write effectively on the same file. The only things that should change are the values at the end of each line (2.66668e+06 and 2.19141e-16). Those will normally differ from line to line; they are all the same here for easier understanding. I am currently reading in the file (ifstream only) and can get to the point where I need to rewrite it, but I don't know how to easily change the position where I am writing, how to overwrite rather than insert, or how to handle similar issues of writing to an existing file.

Any advice is appreciated, short examples especially so!

3 Comments
  • I think you need to look into tellp and the other associated links at the bottom of the page if I understand you correctly. Commented Sep 12, 2013 at 22:31
  • Note that while you can modify the file in place, many (most?) modern file storage implementations will write the data back in a different physical storage block and substitute that into the file's block collection in place of the old one. Commented Sep 12, 2013 at 22:39
  • Really the only complicated part of this is figuring out the size of your header information. Once you have that, you can seek back and forth by (record number * record size + header length). Commented Sep 12, 2013 at 22:39
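
The offset arithmetic in the last comment can be sketched as follows. This is only a minimal illustration, assuming every record occupies the same number of bytes (42 for the sample lines above, counting a single '\n' terminator) and that the stream is opened in binary mode so byte offsets are not altered by newline translation. The names measure_first_line and record_offset are hypothetical helpers, not part of any library.

```cpp
#include <fstream>
#include <string>

// Length in bytes of the first line, including its newline.
// Assumes the header is exactly one line; returns -1 if the read fails.
std::streamoff measure_first_line(std::istream& in) {
    in.clear();
    in.seekg(0);
    std::string line;
    std::getline(in, line);
    return static_cast<std::streamoff>(in.tellg());
}

// Byte offset of record `index` (0-based) when fixed-size records
// follow a header of `header_len` bytes.
std::streamoff record_offset(std::streamoff header_len,
                             std::streamoff record_size,
                             std::streamoff index) {
    return header_len + index * record_size;
}
```

For the sample file, the line "* voxel 0 0 0 1 1 1 3" plus its newline is 22 bytes, so record 1 (the first Cap line) would start at byte 22 + 1 * 42 = 64. If header lines vary in length from block to block, the header length would have to be measured per block rather than once.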

2 Answers

1

If the records you are modifying are always the same size for both the "new" and the "old" data, rewriting into the same file is no problem. It only becomes an issue if you write data whose length differs from the "old" data, because bytes cannot be inserted into or removed from the middle of a file without rewriting everything after them.

Just open the file with fstream f("somename.ext", ios::out|ios::in), and use f.seekg() and f.seekp() as required to go to the relevant place in the file (you can use tellp and tellg to figure out where you currently are).
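
A minimal sketch of that approach is shown here, under a few assumptions that are not stated in the question: the value is always the last space-separated field on the line, new values are formatted with "%.5e" (which matches the sample values such as 2.66668e+06), and the file is opened in binary mode so that offsets are plain byte counts. The function name overwrite_value is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Overwrite the trailing numeric field of the line that starts at byte
// offset `line_start`. Returns false, and leaves the file untouched, if the
// newly formatted value would not have the same width as the old one.
bool overwrite_value(std::fstream& f, std::streamoff line_start, double new_value) {
    f.clear();
    f.seekg(line_start);
    std::string line;
    if (!std::getline(f, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files

    std::string::size_type space = line.rfind(' ');
    if (space == std::string::npos) return false;
    std::size_t width = line.size() - (space + 1);  // width of the old value

    char text[64];
    int n = std::snprintf(text, sizeof text, "%.5e", new_value);
    if (n < 0 || static_cast<std::size_t>(n) != width) return false;

    f.clear();  // reading the last line may have set eofbit
    f.seekp(line_start + static_cast<std::streamoff>(space + 1));
    f.write(text, n);
    f.flush();
    return f.good();
}
```

The stream would be opened as std::fstream f("somename.ext", std::ios::in | std::ios::out | std::ios::binary). The width check matters: a negative value or a three-digit exponent formats to a longer string than the sample values, and writing it would run over the newline and into the next record.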

1

To expand on what Mats Petersson said: with files that size and like-sized writes, you would be well served by memory-mapped files. Otherwise, your next best bet is a buffer-list scheme, which doesn't necessarily have to be much more complicated.

4 Comments

Why? All he has to do is modify a single value in-place on disk. It's just a matter of calculating the offset.
Whether a memory-mapped file is beneficial really depends on the read/write pattern. If the file is read sequentially, the only benefit of a memory-mapped file is that it may be a little faster because the data is copied fewer times. But text files are quite awkward to handle as memory-mapped files, since parsing then requires using stringstream and finding the start and end of each record, etc. Using fstream to read the data with >> and write it with << is much simpler. If performance is very bad, then it's perhaps worth investigating ...
As I interpreted OP, he essentially has (a) a separate transaction file indicating what records he has to change; (b) this file which consists of a single line of (something) and a bunch of fixed length records in text format. Memory mapping seems both more complicated and unnecessary for this. Probably the best bet, if possible, is to sort the transaction file so it is more or less a sequential pass through the file without seeking all over the place.
Thank you guys for the posts. Sorry it took so long to get back to you; with the weekend here I got preoccupied. Your posts were very beneficial. I am commenting here because I am kinda confused: I don't know exactly what you mean by a memory-mapped file or how I would go about using one. This project as a whole is pretty lenient as far as what I can choose to do, but unfortunately I have little experience with files of this magnitude. If there are any articles or posts you could link to regarding memory-mapped files, whether I use them or not, I would love to learn more!
