I have a file, say myfile. Using Rust, I would like to open myfile, and read bytes N to M into a Vec, say myvec. What is the most idiomatic way to do so? Naively, I thought of using bytes(), then skip, take and collect, but that sounds so inefficient.
2 Answers
The most idiomatic (to my knowledge) and relatively efficient way:
    use std::fs::File;
    use std::io::{Read, Seek, SeekFrom};

    let start = 10;
    let count = 10;

    let mut f = File::open("/etc/passwd")?;
    // Jump to the first byte of the range
    f.seek(SeekFrom::Start(start))?;
    // Allocate a zeroed buffer of exactly `count` bytes and fill it
    let mut buf = vec![0; count];
    f.read_exact(&mut buf)?;

You indicated in the comments that you were concerned about the overhead of zeroing the memory before reading into it. There is indeed a nonzero cost to this, but it is usually negligible compared to the I/O operations needed to read from a file, and the advantage is that your code remains 100% sound. For educational purposes only, I tried to come up with an approach that avoids the zeroing.
Unfortunately, even with unsafe code, we cannot safely pass an uninitialized buffer to read_exact because of this paragraph in the documentation (emphasis mine):
No guarantees are provided about the contents of buf when this function is called, implementations cannot rely on any property of the contents of buf being true. It is recommended that implementations only write data to buf instead of reading its contents.
So it's technically legal for File::read_exact to read from the provided buffer, which means we cannot legally pass uninitialized data here (using MaybeUninit).
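One sound way to skip the explicit zeroing, along the lines of the take and read_to_end suggestion in the comments, is to let the standard library fill a Vec that was only given capacity. The following is a minimal sketch, not a definitive implementation: read_exact_to_vec is a hypothetical helper name, and whether this is measurably faster than a zeroed buffer is not guaranteed. Unlike read_exact, read_to_end does not fail on a short read, so the length has to be checked by hand.

```rust
use std::io::{self, Read};

/// Reads exactly `count` bytes from `reader` into a fresh Vec without
/// zeroing a buffer in user code. `read_exact_to_vec` is a hypothetical
/// helper name used for illustration only.
fn read_exact_to_vec<R: Read>(reader: &mut R, count: usize) -> io::Result<Vec<u8>> {
    // Reserve space up front; the Vec's length is still 0 at this point.
    let mut buf = Vec::with_capacity(count);
    // `take` caps the read at `count` bytes, and `by_ref` borrows the reader
    // so the caller can keep using it afterwards.
    let read = reader.by_ref().take(count as u64).read_to_end(&mut buf)?;
    // `read_to_end` stops quietly at end of input, so detect short reads here.
    if read < count {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "source ended before the requested number of bytes",
        ));
    }
    Ok(buf)
}
```

With a File, call f.seek(SeekFrom::Start(start))? first and then pass &mut f to the helper.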
3 Comments
[...] read_initializer feature that may make this reliably possible in the future.
[...] from_raw_parts_mut requirement that "data must point to len consecutive properly initialized values of type T" is being violated here. Do you know of a way to do this without UB?
[...] let mut v = Vec::with_capacity(count); r.take(count).read_to_end(&mut v);, but I don't know if it will be noticeable or even worse. Anyway, IME, zeroing of buffers never has as much performance impact as people expect.

The existing answer works, but it reads the entire block that you're after into a Vec in memory. If the block you're reading out is huge or you have no use for it in memory, you ideally need an io::Read which you can copy straight into another file or pass into another API.
If your source implements Read + Seek then you can seek to the start position and then use Read::take to only read for a specific number of bytes.
    use std::{fs::File, io::{self, Read, Seek, SeekFrom}};

    let start = 20;
    let length = 100;

    let mut input = File::open("input.bin")?;

    // Seek to the start position
    input.seek(SeekFrom::Start(start))?;

    // Create a reader with a fixed length
    let mut chunk = input.take(length);

    let mut output = File::create("output.bin")?;

    // Copy the chunk into the output file
    io::copy(&mut chunk, &mut output)?;
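The same idea can be written generically for any source that implements Read + Seek, as described above. This is a rough sketch under that assumption; copy_range is a hypothetical name, not a standard library function. It uses by_ref so that the source is only borrowed and remains usable after the copy, whereas calling take directly on a File moves it.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Copies up to `length` bytes, starting at offset `start`, from `src` into
/// `dst` and returns how many bytes were actually copied.
/// `copy_range` is a hypothetical helper name used for illustration only.
fn copy_range<R, W>(src: &mut R, dst: &mut W, start: u64, length: u64) -> io::Result<u64>
where
    R: Read + Seek,
    W: Write,
{
    // Seek to the start position
    src.seek(SeekFrom::Start(start))?;
    // Borrow the source and cap it at `length` bytes
    let mut chunk = src.by_ref().take(length);
    // Stream the chunk into the destination without collecting it into a Vec
    io::copy(&mut chunk, dst)
}
```

If the source ends before `length` bytes are available, the returned count is smaller than `length` rather than an error, so callers that need an exact size should compare the two.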
seek() on File to skip to wherever you want to start reading and then read_exact() to read exactly the amount you want.
vec![0; 1024] will heap-allocate a zeroed buffer of 1024 bytes in a single call to the allocator, can't get any faster than that.
.bytes().skip(a).take(b).map(|r| r.unwrap()).collect::<Vec<_>>() will be slow. It can be 200 or more times slower than the .seek() & .read_exact() approach, or even much slower, depending on the number of bytes skipped and taken.
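To make the comparison in the last comment concrete, here is a small sketch of both variants side by side. The function names are hypothetical, and it makes no claim about the exact slowdown. The naive version has to pull every byte before the range through the iterator, while the seek-based version jumps straight to the offset. The naive version also returns fewer bytes without an error if the source is too short.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Naive variant: iterates byte by byte, discarding the first `start` bytes.
/// Collecting into io::Result propagates the first read error instead of
/// unwrapping each byte. Hypothetical name, for illustration only.
fn read_range_naive<R: Read>(reader: R, start: usize, count: usize) -> io::Result<Vec<u8>> {
    reader.bytes().skip(start).take(count).collect()
}

/// Seek-based variant: one seek, one zeroed allocation, then read_exact.
/// Hypothetical name, for illustration only.
fn read_range_seek<R: Read + Seek>(
    reader: &mut R,
    start: u64,
    count: usize,
) -> io::Result<Vec<u8>> {
    reader.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0u8; count];
    // Fails with ErrorKind::UnexpectedEof if fewer than `count` bytes remain.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```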