2

I'm currently trying to read the full contents of a file on Windows, using C's fread function. This function requires the size of the buffer that is being read into to be passed as an argument. And because I want the whole file to be read, I need to pass in the size of the file in bytes.

I've tried getting the size of a file on Windows through the Win32 API, more specifically using GetFileSizeEx. The snippet below is from an existing Stack Overflow answer.

    __int64 GetFileSize(const char* name)
    {
        HANDLE hFile = CreateFile(name, GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            return -1; // error condition, could call GetLastError to find out more

        LARGE_INTEGER size;
        if (!GetFileSizeEx(hFile, &size))
        {
            CloseHandle(hFile);
            return -1; // error condition, could call GetLastError to find out more
        }

        CloseHandle(hFile);
        return size.QuadPart;
    }

The returned size from this function is bigger than the actual file size. After executing the following code block

    FILE* file = fopen(path, "r");
    long size = (long)GetFileSize(path);
    char* buffer = new char[size + 1];
    fread(buffer, 1, size, file);
    buffer[size] = '\0';

the buffer contains garbage bytes at the end of it. I've checked by hand, and the returned size is surely bigger than the actual size in bytes.

I've tried the other methods described in the same Stack Overflow answer linked above, but they all result in garbage bytes at the end of the buffer.

11
  • generic C you seek to the end and get the file position yes? Commented Jul 26, 2020 at 14:16
  • @old_timer I've tried that also but that seems to be non standard, and it also returns a bigger size than the actual size. Even with "r" and "rb" reading modes. Commented Jul 26, 2020 at 14:18
  • Both GetFileSize as well as seeking to the end are supposed to give you an accurate file size. How did you determine your "actual file size"? Maybe that method is wrong. Commented Jul 26, 2020 at 14:21
  • You say "contains garbage bytes at the end". Is that garbage read or garbage from the buffer? Did you check that? Besides new char[size] is C++, not C. Commented Jul 26, 2020 at 14:24
  • What is your baseline for file size? Commented Jul 26, 2020 at 14:27

4 Answers

4

FILE* file = fopen(path, "r"); should be FILE* file = fopen(path, "rb"); If you want the number of bytes read to match the file size, open the file in binary mode.

On Windows, reading a file in text mode converts each "\r\n" sequence to "\n", so fread returns fewer bytes than the size on disk. The rest of the buffer is never written, which is why it appears to contain garbage at the end.
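To make the arithmetic concrete, here is a minimal sketch that counts how many characters a text-mode read would deliver for a given run of raw bytes. The helper name text_mode_length is hypothetical, and it assumes that "\r\n" to "\n" translation is the only thing text mode does; it ignores other text-mode behavior such as Ctrl-Z handling.

```c
#include <stddef.h>

/* Hypothetical helper: number of characters a text-mode read would deliver
   for `n` raw bytes, ASSUMING only "\r\n" -> "\n" translation is applied. */
size_t text_mode_length(const char *raw, size_t n)
{
    size_t delivered = 0;
    for (size_t i = 0; i < n; i++) {
        /* A '\r' that is directly followed by '\n' is dropped. */
        if (raw[i] == '\r' && i + 1 < n && raw[i + 1] == '\n')
            continue;
        delivered++;
    }
    return delivered;
}
```

For the 10 raw bytes "one\r\ntwo\r\n" this returns 8, so a buffer sized and terminated using the on-disk size would have 2 unwritten bytes before the terminator.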


1 Comment

For some reason using binary mode wasn't working before but now it is. I'm hoping it was an issue on my end and not with the function. Thanks!
1

The usual way to get the file size on any system, using only C standard library functions, is with fseek() and ftell():

    #include <stdio.h>

    long get_file_len(char *filename)
    {
        long int size = 0;
        FILE *fp = fopen(filename, "rb");
        if (!fp)
            return 0;
        fseek(fp, 0, SEEK_END); // move file pointer to end of file
        size = ftell(fp);
        fclose(fp);
        return size;
    }

As a variant, you can also use lseek():

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <fcntl.h>

    long get_file_len(char *filename)
    {
        long int size = 0;
        int f_read = open(filename, O_RDONLY);
        if (f_read == -1)
            return 0;
        size = lseek(f_read, 0, SEEK_END); // move file pointer to end of file
        close(f_read);
        return size;
    }

2 Comments

The variant with lseek() is not correct. lseek() requires a file descriptor, not a FILE * pointer. size = lseek(fileno(fp), 0, SEEK_END); should work. But in POSIX, fstat() is more convenient for a file descriptor.
@dimich thanks. For some reason I hadn't updated the whole procedure. It's fixed now.
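The fstat() approach mentioned in the comment could look roughly like the sketch below. It is POSIX-only, and the helper name file_size_fstat, the long long return type, and the convention of returning -1 on error are assumptions made for illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size of the regular file `path` in bytes,
   or -1 if it cannot be opened, queried, or is not a regular file. */
long long file_size_fstat(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    long long size = -1;
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode))
        size = (long long)st.st_size;   /* st_size is an off_t */

    close(fd);
    return size;
}
```

Unlike the seek-based versions, this does not move any file position, and it reports failure with -1 so that an empty file (size 0) can be told apart from an error.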
0

You should open the file in binary mode and use fseek and ftell to get the file size; that is the portable way. It also avoids the Windows text mode conversions.

    FILE* file = fopen(path, "rb");
    fseek(file, 0, SEEK_END);      // move to 0 bytes from the end
    long size = ftell(file);       // get the size (position at end)
    rewind(file);                  // same as fseek(file, 0, SEEK_SET): back to the beginning
    char* buffer = new char[size + 1];
    size_t bytes_read = fread(buffer, 1, size, file);  // fread returns a size_t
    buffer[bytes_read] = 0;        // terminate after the bytes actually read
    if (bytes_read != (size_t)size)
    {
        // check errors (feof)
    }
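Since the question is tagged as C but uses new, here is a plain C sketch of the same idea with malloc and basic error checks. The function name read_whole_file and its out_len parameter are hypothetical; the key point it illustrates is that the terminator goes after the number of bytes fread actually returned, not after the queried size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of `path` into a NUL-terminated heap buffer.
   Returns NULL on failure. On success, stores the number of bytes read in
   *out_len; the caller releases the buffer with free(). */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no "\r\n" translation */
    if (!fp)
        return NULL;

    char *buf = NULL;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long size = ftell(fp);      /* position at end == size in bytes */
        if (size >= 0) {
            rewind(fp);
            buf = malloc((size_t)size + 1);
        }
        if (buf) {
            size_t got = fread(buf, 1, (size_t)size, fp);
            if (ferror(fp)) {
                free(buf);
                buf = NULL;
            } else {
                buf[got] = '\0';    /* terminate after what was really read */
                *out_len = got;
            }
        }
    }

    fclose(fp);
    return buf;
}
```

A caller would write something like char *text = read_whole_file(path, &len); and check for NULL before using text.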


0

If using stdio routines and FILE pointers instead of Win32 functions and HANDLEs, you can use _filelength() or _filelengthi64() to get the size of an opened file on Windows.

Demonstration program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <io.h> // For _filelength()/_filelengthi64()

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "Usage: %s filename\n", argv[0]);
            return EXIT_FAILURE;
        }

        FILE *fp = fopen(argv[1], "rb");
        if (!fp) {
            fprintf(stderr, "Unable to open '%s' for reading!\n", argv[1]);
            return EXIT_FAILURE;
        }

        int fd = _fileno(fp); // Get the file descriptor from the file pointer
        long len = _filelength(fd);
        printf("File '%s' is %ld bytes long.\n", argv[1], len);
        fclose(fp);
        return 0;
    }

Example:

    $ .\filesize filesize.c
    File 'filesize.c' is 544 bytes long.

The often-seen fseek() and ftell() method isn't well supported on text streams by the C standard (C11 7.21.9.4p2: "For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read."), and it is especially hopeless on files opened in text mode on Windows.

From the Microsoft documentation on fseek():

For streams opened in text mode, fseek and _fseeki64 have limited use, because carriage return-line feed translations can cause fseek and _fseeki64 to produce unexpected results.

among other issues (the presence of a Ctrl-Z or a BOM, for example). Because of these, any time you're interested in the total size of the file, you likely want to open it in binary mode on Windows, no matter how you're getting the size.
