6

I have a file, say myfile. Using Rust, I would like to open myfile, and read bytes N to M into a Vec, say myvec. What is the most idiomatic way to do so? Naively, I thought of using bytes(), then skip, take and collect, but that sounds so inefficient.

  • Use seek() on File to skip to wherever you want to start reading and then read_exact() to read exactly the amount you want. Commented Aug 7, 2021 at 16:55
  • Cool! But, I’d like to get the data into a Vec. Should I preallocate one full of zeros? That sounds wasteful, no? Commented Aug 7, 2021 at 17:03
  • vec![0; 1024] will heap-allocate a zeroed buffer of 1024 bytes in a single call to the allocator, can't get any faster than that. Commented Aug 7, 2021 at 17:11
  • You're correct in assuming .bytes().skip(a).take(b).map(|r| r.unwrap()).collect::<Vec<_>>() will be slow. It can be 200 or more times slower than the .seek() & .read_exact() approach, and slower still depending on the number of bytes skipped and taken. Commented Aug 7, 2021 at 22:34

2 Answers 2

11

The most idiomatic (to my knowledge) and relatively efficient way:

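    use std::fs::File;
    use std::io::{Read, Seek, SeekFrom};

    let start = 10; // offset of the first byte to read
    let count = 10; // number of bytes to read
    let mut f = File::open("/etc/passwd")?;
    f.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0; count];
    f.read_exact(&mut buf)?;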
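The snippet above can be wrapped into a small helper that takes the question's "bytes N to M" directly. This is only a sketch: `read_range` is a hypothetical name, and treating the range as half-open (`start..end`) is an assumption, since the question does not say whether M is inclusive.

```rust
use std::convert::TryFrom; // already in the prelude since edition 2021
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Reads the bytes in the half-open range `start..end` of the file at `path`.
/// (`read_range` is a hypothetical helper, not a std API.)
fn read_range(path: impl AsRef<Path>, start: u64, end: u64) -> io::Result<Vec<u8>> {
    // Reject inverted ranges up front instead of underflowing.
    let len = end
        .checked_sub(start)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "end is before start"))?;
    // The length must fit in usize to size the Vec.
    let len = usize::try_from(len)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "range too large"))?;

    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(start))?;

    // One zeroed allocation; read_exact then fills it completely or
    // fails with ErrorKind::UnexpectedEof if the file is too short.
    let mut buf = vec![0u8; len];
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

Calling `read_range("myfile", n, m)?` would then return the requested bytes as a `Vec<u8>`, and a range that runs past the end of the file surfaces as an error rather than a silently shorter result.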

You indicated in the comments that you were concerned about the overhead of zeroing the memory before reading into it. Indeed there is a nonzero cost to this, but it's usually negligible compared to the I/O operations needed to read from a file, and the advantage is that your code remains 100% sound. But for educational purposes only, I tried to come up with an approach that avoids the zeroing.

Unfortunately, even with unsafe code, we cannot safely pass an uninitialized buffer to read_exact because of this paragraph in the documentation (emphasis mine):

No guarantees are provided about the contents of buf when this function is called, so implementations cannot rely on any property of the contents of buf being true. It is recommended that implementations only write data to buf instead of reading its contents.

So it's technically legal for File::read_exact to read from the provided buffer, which means we cannot legally pass uninitialized data here (using MaybeUninit).
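If the explicit zero-fill is still a concern, one option that stays entirely in safe code is to reserve capacity and let `Read::take` plus `read_to_end` append into the Vec. The sketch below uses a hypothetical helper name, `read_range_no_prefill`. How the standard library handles the spare capacity internally is an implementation detail, so whether this is measurably faster is an assumption that would need benchmarking.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Reads `count` bytes starting at `start` without pre-filling the buffer.
/// (`read_range_no_prefill` is a hypothetical name used for illustration.)
fn read_range_no_prefill(
    path: impl AsRef<Path>,
    start: u64,
    count: usize,
) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(start))?;

    // Reserve capacity only; the Vec's length stays 0 until bytes are appended.
    let mut buf = Vec::with_capacity(count);
    // `take` caps the reader at `count` bytes; `read_to_end` appends to `buf`.
    let read = file.take(count as u64).read_to_end(&mut buf)?;

    // Unlike read_exact, read_to_end stops quietly at EOF, so check the length.
    if read < count {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "file ended before `count` bytes were read",
        ));
    }
    Ok(buf)
}
```

The explicit length check is a design choice that mirrors `read_exact`; dropping it would instead return whatever bytes were available before the end of the file.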


3 Comments

About your unsafe code: a comment on similar code in the std source says: "This creates a (mut) reference to a slice of uninitialized integers, which is undefined behavior. Only the standard library gets to soundly "ignore" this...". There is, however, an unstable read_initializer feature that may make this reliably possible in the future.
@rodrigo Thanks! In particular, the from_raw_parts_mut requirement that "data must point to len consecutive properly initialized values of type T" is being violated here. Do you know of a way to do this without UB?
I guess you could do: let mut v = Vec::with_capacity(count); r.take(count).read_to_end(&mut v);, but I don't know whether the difference would be noticeable, or whether it might even be slower. Anyway, IME, zeroing of buffers never has as much performance impact as people expect.
3

The existing answer works, but it reads the entire block that you're after into a Vec in memory. If the block you're reading is huge, or you have no use for it in memory, you ideally want an io::Read that you can copy straight into another file or pass to another API.

If your source implements Read + Seek then you can seek to the start position and then use Read::take to only read for a specific number of bytes.

    use std::{fs::File, io::{self, Read, Seek, SeekFrom}};

    let start = 20;
    let length = 100;

    let mut input = File::open("input.bin")?;

    // Seek to the start position
    input.seek(SeekFrom::Start(start))?;

    // Create a reader with a fixed length
    let mut chunk = input.take(length);

    let mut output = File::create("output.bin")?;

    // Copy the chunk into the output file
    io::copy(&mut chunk, &mut output)?;
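The same seek-then-take pattern works for any `Read + Seek` source, not only `File`. As a rough sketch, it can be written as a generic function; `copy_range` is a hypothetical name, and returning the number of bytes copied is a design choice made here for illustration.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Copies up to `length` bytes starting at `start` from `src` into `dst`,
/// returning the number of bytes actually copied.
/// (`copy_range` is a hypothetical helper name.)
fn copy_range<R, W>(src: &mut R, dst: &mut W, start: u64, length: u64) -> io::Result<u64>
where
    R: Read + Seek,
    W: Write,
{
    src.seek(SeekFrom::Start(start))?;
    // `by_ref` borrows the reader, so the caller can keep using `src` afterwards.
    let mut chunk = src.by_ref().take(length);
    // io::copy streams the data across without collecting it into a Vec;
    // a short source yields fewer bytes rather than an error.
    io::copy(&mut chunk, dst)
}
```

With an in-memory source such as `std::io::Cursor::new(data)` and a `Vec<u8>` as the destination, this is easy to try out without touching the file system. Because a short source is not an error here, callers who need exactly `length` bytes should compare the returned count against it.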
