34

Is there a way I can read a structure directly from a file in Rust? My code is:

    use std::fs::File;

    struct Configuration {
        item1: u8,
        item2: u16,
        item3: i32,
        item4: [char; 8],
    }

    fn main() {
        let file = File::open("config_file").unwrap();

        let mut config: Configuration;
        // How to read struct from file?
    }

How would I read my configuration directly into config from the file? Is this even possible?

4
  • 1
    Which format does your file have? The correct answer depends quite strongly on the actual data representation in the file. Commented Aug 20, 2014 at 16:59
  • 3
    @VladimirMatveev Binary format, I don't want to read from the file and copy to my struct; I want to use my struct as a buffer to read the file with. Commented Aug 20, 2014 at 17:02
  • Ah, I understand now what you need. You can't do it without some unsafe code. I'll try to write a proof of concept now. Commented Aug 20, 2014 at 17:03
  • This crate seems to do exactly what you want: github.com/TyOverby/bincode Commented May 2, 2015 at 3:40

3 Answers

21

Here you go:

    use std::io::Read;
    use std::mem;
    use std::slice;

    #[repr(C, packed)]
    #[derive(Debug, Copy, Clone)]
    struct Configuration {
        item1: u8,
        item2: u16,
        item3: i32,
        item4: [char; 8],
    }

    const CONFIG_DATA: &[u8] = &[
        0xfd, // u8
        0xb4, 0x50, // u16
        0x45, 0xcd, 0x3c, 0x15, // i32
        0x71, 0x3c, 0x87, 0xff, // char
        0xe8, 0x5d, 0x20, 0xe7, // char
        0x5f, 0x38, 0x05, 0x4a, // char
        0xc4, 0x58, 0x8f, 0xdc, // char
        0x67, 0x1d, 0xb4, 0x64, // char
        0xf2, 0xc5, 0x2c, 0x15, // char
        0xd8, 0x9a, 0xae, 0x23, // char
        0x7d, 0xce, 0x4b, 0xeb, // char
    ];

    fn main() {
        let mut buffer = CONFIG_DATA;

        let mut config: Configuration = unsafe { mem::zeroed() };

        let config_size = mem::size_of::<Configuration>();
        unsafe {
            let config_slice = slice::from_raw_parts_mut(&mut config as *mut _ as *mut u8, config_size);
            // `read_exact()` comes from `Read` impl for `&[u8]`
            buffer.read_exact(config_slice).unwrap();
        }

        println!("Read structure: {:#?}", config);
    }

Try it here (Updated for Rust 1.38)

You need to be careful, however, as unsafe code is, well, unsafe. After the slice::from_raw_parts_mut() invocation, there exist two mutable handles to the same data at the same time, which is a violation of Rust aliasing rules. Therefore you would want to keep the mutable slice created out of a structure for the shortest possible time. I also assume that you know about endianness issues - the code above is by no means portable, and will return different results if compiled and run on different kinds of machines (ARM vs x86, for example).
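To illustrate the endianness point, here is a minimal sketch that uses only the standard library and decodes the first three fields with an explicit byte order, so the result does not depend on the machine it runs on. The `Order` enum and the `decode_fields` function are hypothetical names made up for this example, and the 7-byte packed layout is an assumption taken from the sample data above.

```rust
/// Byte order to assume for the multi-byte fields (hypothetical helper type).
#[derive(Clone, Copy)]
enum Order {
    Little,
    Big,
}

/// Decodes `item1`, `item2` and `item3` from 7 packed bytes.
/// The same bytes give different values depending on the byte order chosen.
fn decode_fields(bytes: &[u8; 7], order: Order) -> (u8, u16, i32) {
    let item1 = bytes[0];
    let raw2 = [bytes[1], bytes[2]];
    let raw3 = [bytes[3], bytes[4], bytes[5], bytes[6]];
    match order {
        Order::Little => (item1, u16::from_le_bytes(raw2), i32::from_le_bytes(raw3)),
        Order::Big => (item1, u16::from_be_bytes(raw2), i32::from_be_bytes(raw3)),
    }
}

// use
// decode_fields(&[0xfd, 0xb4, 0x50, 0x45, 0xcd, 0x3c, 0x15], Order::Little)
```

Reading the struct through a raw byte slice behaves like whichever of these two branches matches the host CPU, which is why the raw approach is not portable.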

If you can choose the format and you want a compact binary one, consider using bincode. Otherwise, if you need e.g. to parse some pre-defined binary structure, byteorder crate is the way to go.


7 Comments

Yeah, I'm aware of endianness issues, but it's just a quick tool I'm writing which will run on about 3 computers.
@A.B., this, I believe. It is now located here.
I went with mem::uninitialized as opposed to mem::zeroed in the end. There doesn't seem to be much point in initializing the memory to 0 if it's going to be overwritten anyway.
This gives me a "warning, this warning will become an error" message: github.com/rust-lang/rust/issues/46043
While the general outline of this code is good, this specific instance violates Rust's safety. The values for the character data are not valid and exceed the currently supported boundaries of characters.
15

As Vladimir Matveev mentions, using the byteorder crate is often the best solution. This way, you account for endianness issues, don't have to deal with any unsafe code, or worry about alignment or padding:

    use byteorder::{LittleEndian, ReadBytesExt}; // 1.2.7
    use std::{
        fs::File,
        io::{self, Read},
    };

    struct Configuration {
        item1: u8,
        item2: u16,
        item3: i32,
    }

    impl Configuration {
        fn from_reader(mut rdr: impl Read) -> io::Result<Self> {
            let item1 = rdr.read_u8()?;
            let item2 = rdr.read_u16::<LittleEndian>()?;
            let item3 = rdr.read_i32::<LittleEndian>()?;

            Ok(Configuration {
                item1,
                item2,
                item3,
            })
        }
    }

    fn main() {
        let file = File::open("/dev/random").unwrap();

        let config = Configuration::from_reader(file);
        // How to read struct from file?
    }

I've ignored the [char; 8] for a few reasons:

  1. Rust's char is a 32-bit type and it's unclear if your file has actual Unicode code points or C-style 8-bit values.
  2. You can't easily parse an array with byteorder, you have to parse N values and then build the array yourself.
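For the second point, "parse N values and then build the array yourself" could look roughly like the sketch below. It assumes the file stores each character as a 32-bit little-endian Unicode code point, which the question does not confirm. It also uses the standard library's `u32::from_le_bytes` in place of byteorder so that it stands alone, and `read_char_array` is a made-up name.

```rust
use std::io::{self, Read};

/// Reads eight little-endian u32 values and converts each one to a `char`.
/// Values that are not valid Unicode scalar values are reported as
/// `InvalidData` instead of being turned into an invalid `char`.
fn read_char_array(mut rdr: impl Read) -> io::Result<[char; 8]> {
    let mut out = ['\0'; 8];
    for slot in out.iter_mut() {
        let mut raw = [0u8; 4];
        rdr.read_exact(&mut raw)?;
        let code = u32::from_le_bytes(raw);
        *slot = char::from_u32(code).ok_or_else(|| {
            io::Error::new(
                io::ErrorKind::InvalidData,
                format!("invalid code point {:#x}", code),
            )
        })?;
    }
    Ok(out)
}

// use
// let item4 = read_char_array(&mut file)?;
```

Going through `char::from_u32` means a corrupt file produces an error rather than a `char` outside the valid range.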

4 Comments

I suppose these read_u8 and other read_X calls may invoke a system call. So it may not be very efficient. Can we read a whole structure in a certain endianness instead of small portions of integer types?
@VictorPolevoy that is the job of a buffered reader to fix. See What's the de-facto way of reading and writing files in Rust 1.x?, starting at "Buffered I/O". But yes, you can unsafely take any random blob of bytes and convert it to any given type. That's the point of the other two answers here.
What if I want to read a 10 GB file? The performance penalty will be high. Using from_raw_parts is the only way IMO.
@mishmashru I don't immediately see why this would have lower performance than from_raw_parts. This isn't something you need to have an opinion about. Write both and benchmark it — then you will know for sure.
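As a rough illustration of the buffered-reader point from these comments, the sketch below wraps a reader in `std::io::BufReader` and counts how often the underlying source is actually asked for data. `CountingReader` and `read_header` are hypothetical helpers written for this example. `CountingReader` stands in for a file, where each `read` call would typically be a system call.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical wrapper that counts calls to the underlying `read`.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

/// Reads a u8, a u16 and an i32 (little-endian) with three small reads.
fn read_header(mut rdr: impl Read) -> io::Result<(u8, u16, i32)> {
    let mut b1 = [0u8; 1];
    rdr.read_exact(&mut b1)?;
    let mut b2 = [0u8; 2];
    rdr.read_exact(&mut b2)?;
    let mut b4 = [0u8; 4];
    rdr.read_exact(&mut b4)?;
    Ok((b1[0], u16::from_le_bytes(b2), i32::from_le_bytes(b4)))
}

// use
// let mut buffered = BufReader::new(CountingReader { inner: file, reads: 0 });
// let header = read_header(&mut buffered)?;
// println!("underlying reads: {}", buffered.get_ref().reads);
```

With the `BufReader` in place, the small reads are served from its in-memory buffer, so field-by-field parsing need not mean one system call per field. Whether that makes it as fast as `from_raw_parts` for a given workload is still something to benchmark.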
5

The following code does not take into account any endianness or padding issues and is intended to be used with POD types. struct Configuration should be safe in this case.


Here is a function that can read a struct (of a POD type) from a file:

    use std::io::{self, Read};
    use std::slice;

    fn read_struct<T, R: Read>(mut read: R) -> io::Result<T> {
        let num_bytes = ::std::mem::size_of::<T>();
        unsafe {
            let mut s = ::std::mem::uninitialized();
            let buffer = slice::from_raw_parts_mut(&mut s as *mut T as *mut u8, num_bytes);
            match read.read_exact(buffer) {
                Ok(()) => Ok(s),
                Err(e) => {
                    ::std::mem::forget(s);
                    Err(e)
                }
            }
        }
    }

    // use
    // read_struct::<Configuration>(reader)

If you want to read a sequence of structs from a file, you can execute read_struct multiple times or read all the file at once:

    use std::fs::{self, File};
    use std::io::BufReader;
    use std::path::Path;

    fn read_structs<T, P: AsRef<Path>>(path: P) -> io::Result<Vec<T>> {
        let path = path.as_ref();
        let struct_size = ::std::mem::size_of::<T>();
        let file_len = fs::metadata(path)?.len() as usize;
        let num_structs = file_len / struct_size;
        // Only read whole structs, so the slice never extends past the
        // capacity of the vector when the file length is not a multiple
        // of the struct size.
        let num_bytes = num_structs * struct_size;
        let mut reader = BufReader::new(File::open(path)?);
        let mut r = Vec::<T>::with_capacity(num_structs);
        unsafe {
            let buffer = slice::from_raw_parts_mut(r.as_mut_ptr() as *mut u8, num_bytes);
            reader.read_exact(buffer)?;
            r.set_len(num_structs);
        }
        Ok(r)
    }

    // use
    // read_structs::<StructName, _>("path/to/file")
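For comparison, a sequence of records can also be decoded without any unsafe code by reading the file into a byte buffer and parsing fixed-size chunks. This is only a sketch under assumptions: the `Record` type, the 7-byte unpadded layout and the little-endian byte order are invented for the example and are not taken from the question's file format.

```rust
/// Hypothetical record: 1 + 2 + 4 bytes, little-endian, no padding.
#[derive(Debug, PartialEq)]
struct Record {
    item1: u8,
    item2: u16,
    item3: i32,
}

const RECORD_SIZE: usize = 7;

/// Parses as many whole records as `bytes` contains.
/// Trailing bytes that do not fill a record are ignored.
fn parse_records(bytes: &[u8]) -> Vec<Record> {
    bytes
        .chunks_exact(RECORD_SIZE)
        .map(|c| Record {
            item1: c[0],
            item2: u16::from_le_bytes([c[1], c[2]]),
            item3: i32::from_le_bytes([c[3], c[4], c[5], c[6]]),
        })
        .collect()
}

// use
// let records = parse_records(&std::fs::read("path/to/file")?);
```

This copies each field once, but it has no alignment, padding or invalid-value hazards, and its output is the same on every platform.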

5 Comments

Why ::std::mem... instead of std::mem? Is there any difference?
A path starting with :: is absolute. Using an absolute path ensures that the code will compile if the function is put in a module. Search for "absolute" in doc.rust-lang.org/book/crates-and-modules.html to learn more.
Why is ::std::mem::forget needed here? Doesn't it indicate a memory leak?
@Knight To prevent the destructor from running on s (s is uninitialized). This is one use case described in the forget documentation.
While this answer alludes to the underlying problem, it improperly uses unsafe Rust. The proposed function can introduce memory unsafety in safe Rust code. One example shows it causing a segfault. This code should not be used.
