I want to process a long SVG file, but when I read it into a buffer and then print the buffer, a lot of extra text appears beyond the end of the file's contents. I discovered that this extra text is a copy of data from just before the end of the file. The file ends with the closing svg tag, so it is easy to see where the real data stops; in a regular text file this would be less obvious. An image application or browser that tries to open such output gets confused by the trailing text.

Eventually I discovered that for some reason the function ftell() returns a file length that is too large. Here is a simplified version of my function:

    int datReadFileToBuf(char* fn, BYTE** buf)
    {
        int fileLen = 0;
        if (*buf != NULL) return -1;    // ERROR: The buffer should be NULL.
        FILE* fp = NULL;
        if (fopen_s(&fp, fn, "r") != 0) return -2;
        fseek(fp, 0, SEEK_END);
        fileLen = ftell(fp);
        rewind(fp);
        if ((*buf = (BYTE*)calloc(fileLen, 1)) == NULL) return -3;
        size_t sizetLen = fread(*buf, 1, fileLen, fp);
        // fileLen  == 483553
        // sizetLen == 481976
        fclose(fp);
        // return fileLen;      // Bad
        return sizetLen;        // Good
    }

Negative return values indicate errors. fn is the filename, and buf is the buffer, which is declared as BYTE* datInBuf = NULL in main() and passed to datReadFileToBuf() together with fn. When I return sizetLen as the length of the buffer, the rest of the program works because the remainder of the buffer is ignored; returning fileLen causes problems because it is larger.

This seems to be a problem with long text files. I have not checked if this happens with binary files. I've searched for problems with ftell() online, but found no explanation, so would be grateful to have some information on this issue. Incidentally, I'm using Visual Studio 2022 on a Windows 10 platform. The workaround I'm using fixes the problem, but there might be a better way.

  • I have no idea why fseek()/ftell() gets taught to get the size of a file. C11 7.21.9.4p2: "For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read." Commented Oct 3, 2023 at 15:05
  • Binary stream? C11 7.21.9.2p3: "... A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END." Footnote 268: "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream ..." Commented Oct 3, 2023 at 15:08
  • On Windows, use _fileno() and _filelength() to get the size of the underlying file given a FILE pointer. Or skip stdio and use Win32 file functions including GetFileSizeEx() and ReadFile(). Commented Oct 3, 2023 at 16:38
  • The return type for ftell() is long, not int -- which, in addition to the other reasons, is one more reason you may get incorrect results. See man 3 fseek. Or on windows ftell, _ftelli64 Commented Oct 3, 2023 at 16:59
  • Does this answer your question? Is there a way to get size of a file on Windows using C? Commented Oct 3, 2023 at 17:10

1 Answer

Open the file in the mode "rb". Otherwise fread() on Windows translates \r\n to \n and returns fewer bytes than the real file size.
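A minimal sketch of the binary-mode read, in case it helps. The helper name read_file_binary is hypothetical, and it uses plain fopen() rather than fopen_s() so it is not tied to one compiler. It still uses fseek()/ftell() to size the buffer, which works in practice on Windows and POSIX but is not guaranteed by the C standard for binary streams, so the count returned by fread() is treated as the real length.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd buffer.
   Returns the number of bytes actually read, or -1 on failure.
   The caller frees *out. */
long read_file_binary(const char *path, unsigned char **out)
{
    /* "rb" disables the \r\n -> \n translation done in text mode. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    /* Assumption: SEEK_END is meaningful for this binary stream. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long size = ftell(fp);              /* ftell() returns long, not int */
    if (size < 0) { fclose(fp); return -1; }
    rewind(fp);

    /* One extra byte so the buffer can be NUL-terminated. */
    unsigned char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(fp); return -1; }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    buf[got] = '\0';
    *out = buf;
    return (long)got;                   /* trust fread(), not ftell() */
}
```

Returning the fread() count keeps the caller safe even if the two numbers ever disagree, which is what the workaround in the question was already doing.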

3 Comments

Stuff like this is why PNG has \r\n in its header. If the OS performs linefeed normalization (because the file was opened in the wrong mode), the header will be broken ;)
And another reason modeling a line-ending on typewriter operations was a bad idea...
Many thanks, I opened the file using "rb" and it works. Both fileLen and sizetLen in the code above are now 483553 bytes. The SVG file is "car.svg" in the link dev.w3.org/SVG/tools/svgweb/samples/svg-files. However, each LF (code 10) in the file is preceded by a CR (code 13), and these are read into my buffer. Normally the only two control codes I deal with are NUL and LF, and reading a CR causes problems with my application. The easy solution is to write a loop removing all bytes with a value of 13 from the buffer. Christopher
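The CR-removal loop mentioned in the last comment could look something like the sketch below. The function name strip_cr is hypothetical. It compacts the buffer in place and returns the new length, and it drops every byte with value 13, not only those that are followed by LF.

```c
#include <stddef.h>

/* Hypothetical helper: removes every CR (13) byte from buf in place.
   Returns the new length; bytes past that length are left untouched. */
size_t strip_cr(unsigned char *buf, size_t len)
{
    size_t w = 0;                       /* write index */
    for (size_t r = 0; r < len; r++) {  /* read index */
        if (buf[r] != '\r')
            buf[w++] = buf[r];
    }
    return w;
}
```

After the call, the returned value should replace the old length everywhere; if the buffer is treated as a C string, the caller also needs to write a NUL at the new end.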
