Looking at a very simple program, how to figure out where the CPU is spent:
use mmap::*; use crc::{crc32, Hasher32}; use std::cmp::min; use std::env::args; use std::fs::File; use std::fs::metadata; use std::os::unix::io::AsRawFd; use std::slice::from_raw_parts; use std::time::Instant; fn main() -> std::io::Result<()> { let filename = args().nth(1).unwrap(); let t0 = Instant::now(); let file = File::open(String::from(&filename[..]))?; let fd = file.as_raw_fd(); let fs = metadata(filename)?; let sz = fs.len() as usize; let mut offset: usize = 0; let mut c32 = crc32::Digest::new(crc32::IEEE); while offset < sz { let rem = sz - offset; let rem = min(rem, 1024 * 1024); let map = MemoryMap::new(rem, &[MapOption::MapFd(fd), MapOption::MapReadable, MapOption::MapOffset(offset)]). map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?; let buf = map.data(); c32.write(unsafe { from_raw_parts(buf, rem) }); offset += rem; } println!("{:08x} in {:.3}", c32.sum32(), t0.elapsed().as_secs()); Ok(()) } This is intended to memory-map the file provided on command line, and compute the CRC32 of it. I am not looking for other implementations that may achieve that, because my intention is to practice interactions with libc functions.
The program appears to operate correctly, but consumes much more time and CPU than equivalent Go or Java program I wrote, even though I think it is compiled with optimisations:
Cargo.toml:
[profile.dev] opt-level = 3 [package] name = "mmap" version = "0.1.0" authors = ["Gee"] edition = "2018" # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] mmap = "~0.1.1" crc = "^1.0.0" Mac OSX, SSD, executing Go and Rust programs in various order to eliminate the impact of "cold" file buffers and "warm" buffers - produces about 4 seconds cold for Go, and less than a second warm run for Go, but always 30+ seconds for Rust. Tried cargo build --release, too. CPU utilisation by Rust program is substantially higher, too (eg 25% for Go, 80+% for Rust). I would expect CPU utilisation to come mostly from CRC-32 computation and copying the file contents between some buffers. The difference is then in something that causes Rust to do extra work here.
Go program scanning a 1GB XML file using the same method, syscall.Mmap, mapping 1MB at a time:
$ ../a ~/demo/dest.xml a772d8c4 in 3.978 Cargo run:
$ cargo run ~/demo/dest.xml Finished dev [optimized + debuginfo] target(s) in 0.13s Running `target/debug/mmap /Users/ololo/demo/dest.xml` a772d8c4 in 33 In the comments there was a request to show a Java program. Here's the Go program that can be seen to read that file in 3.9 seconds:
package main import( "fmt" "hash/crc32" "os" "syscall" "time" ) func main() { t0 := time.Now() fn := os.Args[1] fd, err := os.Open(fn) if err != nil { fmt.Printf("Couldn't open %s", err) return } defer fd.Close() fi, err := fd.Stat() if err != nil { fmt.Printf("Couldn't stat: %s", err) return } sz := fi.Size() cksum := crc32.NewIEEE() var off int64 off = 0 for sz > off { rem := 1024 * 1024 if sz - off < int64(rem) { rem = int(sz - off) } data, err := syscall.Mmap(int(fd.Fd()), off, rem, syscall.PROT_READ, syscall.MAP_SHARED) if err != nil { fmt.Printf("Couldn't mmap at %d", off) return } off += int64(rem) n, err := cksum.Write(data) if err != nil || n != len(data) { fmt.Printf("Somehow could not cksum %d", len(data)) return } syscall.Munmap(data) } fmt.Printf("%x in %.3f\n", cksum.Sum32(), time.Now().Sub(t0).Seconds()) }
cargo run --release; 2. Your abuse ofunsafewhen you could've done without robbed the compiler of a few valuable optimizations even with--release; 3. You didn't actually show us the java code, so we can't even tell you whether you benchmarked the same thing.ptrto aslice- but more importantly, how to confirm, or how to get around not usingunsafe?; 3. I thought of keeping the question short and focused on how to understand where Rust spends time. I've updated the question with the Go program that can be seen to run in 4 seconds, and consume 25% CPU.mmapcrate after all: ``` a772d8c4 in 37.119 Time breakdown: mmap: 32.859 .data(): 0.000 from_raw_parts: 0.000 crc32.write: 4.233 ```