7

I have something that is Read; currently it's a File. I want to read a number of bytes from it that is only known at runtime (length prefix in a binary data structure).

So I tried this:

    let mut vec = Vec::with_capacity(length);
    let count = file.read(vec.as_mut_slice()).unwrap();

but count is zero because vec.as_mut_slice().len() is zero as well.

[0u8;length] of course doesn't work because the size must be known at compile time.

I wanted to do

    let mut vec = Vec::with_capacity(length);
    let count = file.take(length).read_to_end(vec).unwrap();

but take's receiver parameter is a T and I only have &mut T (and I'm not really sure why it's needed anyway).

I guess I can replace File with BufReader and dance around with fill_buf and consume which sounds complicated enough but I still wonder: Have I overlooked something?

3 Answers

6

Like the Iterator adaptors, the IO adaptors take self by value to be as efficient as possible. Also like the Iterator adaptors, a mutable reference to a Read is also a Read.

To solve your problem, you just need Read::by_ref:

    use std::io::Read;
    use std::fs::File;

    fn main() {
        let mut file = File::open("/etc/hosts").unwrap();
        let length = 5;

        let mut vec = Vec::with_capacity(length);
        file.by_ref().take(length as u64).read_to_end(&mut vec).unwrap();

        let mut the_rest = Vec::new();
        file.read_to_end(&mut the_rest).unwrap();
    }
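For the case in the question, where only a `&mut T` is available, the same idea can be wrapped in a function. This is a minimal sketch, not the only way to do it: the helper name `read_prefix` is made up for illustration, and a `Cursor` stands in for the `File` so the example runs without touching the file system.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: reads `length` bytes from a borrowed reader,
/// failing if the input ends early.
fn read_prefix<R: Read>(reader: &mut R, length: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(length);
    // `by_ref()` gives `take` a `&mut R` to consume, so `reader` stays usable.
    let read = reader.by_ref().take(length as u64).read_to_end(&mut buf)?;
    if read < length {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "input ended early"));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Assumption: an in-memory Cursor replaces the File from the answer.
    let mut input = Cursor::new(b"hello world".to_vec());
    let head = read_prefix(&mut input, 5)?;

    // The reader was only borrowed, so the remaining bytes are still there.
    let mut rest = Vec::new();
    input.read_to_end(&mut rest)?;
    println!("head = {:?}, rest = {:?}", head, rest);
    Ok(())
}
```

Because `take` only ever sees the borrowed reader, the caller keeps ownership and can continue reading after the length-prefixed chunk.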

3 Comments

I found out I don't even have to call by_ref(). Rust seems to do it automatically on a &File, it also doesn't seem to be necessary to be mutable. This confuses me greatly! Is by_ref() some sort of magic method?
@musiKk No, it is not magic, it's actually extremely simple, its implementation is one line and the body is one keyword. The fact that a reference works seems... incorrect to me. I might have to ask my own question!
Don't sweat it, I'm done with one of my own. ;)
3

1. Fill-this-vector version

Your first solution is close to working. You identified the problem but did not try to solve it! The problem is that whatever the capacity of the vector, it is still empty (vec.len() == 0), so the slice you pass to read() has length zero. Instead, you could fill it with zeroed elements:

let mut vec = vec![0u8; length]; 

The following full code works:

    use std::fs::File;
    use std::io::Read;

    fn main() {
        let mut file = File::open("/usr/share/dict/words").unwrap();
        let length: usize = 100;
        // `as_mut_slice()` needed `#![feature(convert)]` as of 2015-07-19;
        // it is stable in current Rust, so no feature attribute is required.
        let mut vec = vec![0u8; length];
        let count = file.read(vec.as_mut_slice()).unwrap();
        println!("read {} bytes.", count);
        println!("vec = {:?}", vec);
    }

Of course, you still have to check whether count == length, and read more data into the buffer if that's not the case.
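One way that check-and-retry could look is sketched below. The helper name `read_until_full` is hypothetical, and a `Cursor` is used in place of the file so the snippet is self-contained; treat it as an illustration of the loop rather than a finished implementation.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: calls `read` repeatedly until `buf` is full or the
/// input ends. Returns how many bytes were actually written into `buf`.
fn read_until_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break,       // end of input before the buffer was full
            Ok(n) => filled += n, // partial read: keep going
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

fn main() -> io::Result<()> {
    // Assumption: in-memory data stands in for the file.
    let mut input = Cursor::new(b"abc".to_vec());
    let mut buf = vec![0u8; 5];
    let count = read_until_full(&mut input, &mut buf)?;
    println!("read {} of {} bytes", count, buf.len());
    Ok(())
}
```

The caller still decides what a short count means; for a length-prefixed format it is usually an error.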


2. Iterator version

Your second solution is better because you won't have to check how many bytes have been read, and you won't have to re-read in case count != length. You need to use the bytes() function on the Read trait (implemented by File). This transforms the file into a stream of bytes (i.e. an iterator). Because errors can still happen, you don't get an Iterator<Item = u8> but an Iterator<Item = io::Result<u8>>. Hence you need to deal with failures explicitly within the iterator. We're going to use unwrap() here for simplicity:

    use std::fs::File;
    use std::io::Read;

    fn main() {
        let file = File::open("/usr/share/dict/words").unwrap();
        let length: usize = 100;
        let vec: Vec<u8> = file
            .bytes()
            .take(length)
            .map(|r: Result<u8, _>| r.unwrap()) // or deal explicitly with failure!
            .collect();
        println!("vec = {:?}", vec);
    }
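If you would rather not unwrap inside the iterator, one option is to collect into an `io::Result<Vec<u8>>`, which stops at the first error. The sketch below assumes a hypothetical helper named `read_bytes` and uses a `Cursor` as a stand-in for the file.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: collects up to `length` bytes, returning the first
/// I/O error instead of panicking.
fn read_bytes<R: Read>(reader: R, length: usize) -> io::Result<Vec<u8>> {
    // `collect` can build a `Result<Vec<_>, _>` from an iterator of `Result`s.
    reader.bytes().take(length).collect()
}

fn main() -> io::Result<()> {
    // Assumption: in-memory data stands in for the file.
    let mut input = Cursor::new(vec![1u8, 2, 3, 4]);
    // Passing `&mut input` works because a mutable reference to a Read is a Read.
    let vec = read_bytes(&mut input, 2)?;
    println!("vec = {:?}", vec);
    Ok(())
}
```

Note that this version returns a shorter vector when the input ends early, so a length check may still be needed.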

6 Comments

I have thought about something like that but isn't that rather inefficient? It looks like allocation of the Vec changes from O(1)ish to O(n).
Sure, but you're going O(n) anyway when filling the vector, so it's a matter of 2n vs. n, which is not that bad. I would be more worried about the check count == length which is avoided in the 2nd version (see edit).
I guess the question boiled down to "how do I create a Vec with a particular length". When I think about it, higher-level languages that shield the developer from this probably do this behind the scenes, too. Otherwise I'm aware of error handling and checking the number of bytes actually read. I just left it out for brevity. Thanks. :)
Your solution with bytes() actually has the same problem as my take(): It consumes the file but I only have a borrowed mutable reference to the file so this doesn't work. - Ok, I have no idea what's going on. It does work with a reference to the file but not if I have a reference to a struct that contains the file.
Please use vec![0u8; length] to build the vector, it's shorter, performs better, and should be the idiom.
1

You can always use a bit of unsafe to create a vector of uninitialized memory:

    let mut vec: Vec<u8> = Vec::with_capacity(length);
    unsafe { vec.set_len(length); }
    let count = file.read(vec.as_mut_slice()).unwrap();

This way, vec.len() will be set to its capacity, and all bytes in it will be uninitialized (likely zeros, but possibly some garbage), so you avoid zeroing the memory. Be careful, though: the documentation for Read::read states that calling it with an uninitialized buffer is not safe and can lead to undefined behavior, so treat this as a last resort. vec![0u8; length] is the safe default.

Note that the read() method on Read is not guaranteed to fill the whole slice. It may return after reading fewer bytes than the slice length. There are several RFCs on adding methods to fill this gap, for example, this one.
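The standard library now has `Read::read_exact`, which fills the whole slice or returns an error. A minimal sketch of using it for a length-prefixed read follows; the helper name `read_exactly` is made up for illustration, and a `Cursor` replaces the file so the example is self-contained.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: reads exactly `length` bytes into a zeroed buffer.
fn read_exactly<R: Read>(reader: &mut R, length: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; length];
    // Fails with ErrorKind::UnexpectedEof if the input ends too early.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Assumption: in-memory data stands in for the file.
    let mut input = Cursor::new(b"hello world".to_vec());
    let head = read_exactly(&mut input, 5)?;
    println!("head = {:?}", head);
    Ok(())
}
```

This keeps the buffer initialized and moves the "did I get everything?" check into the library call.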

1 Comment

This is very interesting. For now I went with file.take(length).read_to_end(vec). I had a problem with a move out of a borrowed struct that I now managed to fix. That's why I dismissed this solution in my question.
