
I want to read single bytes as fast as possible from a file into a D2 application. The application needs the data byte by byte, so reading larger blocks of data is not an option at the interface to the reader.

For this I created some trivial implementations in C++, Java, D2 at: https://github.com/gizmomogwai/performance.

As you can see, I tried plain reads, buffering in the application code, and memory-mapped files. For my use case the memory-mapped solution worked best, but the strange thing is that D2 is slower than Java. I would have hoped for D2 to land between C++ and Java (the C++ code is compiled with -O3 -g, the D2 code with -O -release).
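
The memory-mapped D variant is conceptually just walking the mapped slice byte by byte; here is a minimal sketch using std.mmfile (illustrative only, not the exact code from the repository):

import std.mmfile;

size_t readBytesMapped() {
    size_t count = 0;
    for (int i = 0; i < 10; i++) {
        auto mmf = new MmFile("/tmp/shop_with_ids.pb");  // map the whole file
        auto data = cast(ubyte[]) mmf[];                 // view the mapping as a byte slice
        foreach (b; data) {
            ++count;                                     // consume one byte at a time
        }
    }
    return count;
}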

So please tell me what I am doing wrong here and how to speed up the D2 implementation.

To give you an idea of the use case, here is a C++ implementation:

#include <cassert>
#include <cstdio>
#include <string>

class StdioFileReader {
private:
    FILE* fFile;
    static const size_t BUFFER_SIZE = 1024;
    unsigned char fBuffer[BUFFER_SIZE];
    unsigned char* fBufferPtr;
    unsigned char* fBufferEnd;

public:
    StdioFileReader(std::string s)
        : fFile(fopen(s.c_str(), "rb")), fBufferPtr(fBuffer), fBufferEnd(fBuffer) {
        assert(fFile);
    }

    ~StdioFileReader() {
        fclose(fFile);
    }

    // Returns the next byte, or -1 at end of file.
    int read() {
        bool finished = fBufferPtr == fBufferEnd;
        if (finished) {
            finished = fillBuffer();
            if (finished) {
                return -1;
            }
        }
        return *fBufferPtr++;
    }

private:
    // Refills the buffer; returns true when the file is exhausted.
    bool fillBuffer() {
        size_t l = fread(fBuffer, 1, BUFFER_SIZE, fFile);
        fBufferPtr = fBuffer;
        fBufferEnd = fBufferPtr + l;
        return l == 0;
    }
};

size_t readBytes() {
    size_t res = 0;
    for (int i = 0; i < 10; i++) {
        StdioFileReader r("/tmp/shop_with_ids.pb");
        int read = r.read();
        while (read != -1) {
            ++res;
            read = r.read();
        }
    }
    return res;
}

which is much faster than the "same" solution in D:

import std.c.stdio;

struct FileReader {
    private FILE* fFile;
    private static const BUFFER_SIZE = 8192;
    private ubyte[BUFFER_SIZE] fBuffer;
    private ubyte* fBufferPtr;
    private ubyte* fBufferEnd;

    public this(string fn) {
        fFile = std.c.stdio.fopen("/tmp/shop_with_ids.pb", "rb");
        fBufferPtr = fBuffer.ptr;
        fBufferEnd = fBuffer.ptr;
    }

    // Writes the next byte to *targetBuffer; returns 1 on success, 0 at end of file.
    public int read(ubyte* targetBuffer) {
        auto finished = fBufferPtr == fBufferEnd;
        if (finished) {
            finished = fillBuffer();
            if (finished) {
                return 0;
            }
        }
        *targetBuffer = *fBufferPtr++;
        return 1;
    }

    // Refills the buffer; returns true when the file is exhausted.
    private bool fillBuffer() {
        fBufferPtr = fBuffer.ptr;
        auto l = std.c.stdio.fread(fBufferPtr, 1, BUFFER_SIZE, fFile);
        fBufferEnd = fBufferPtr + l;
        return l == 0;
    }
}

size_t readBytes() {
    size_t count = 0;
    for (int i = 0; i < 10; i++) {
        auto reader = FileReader("/tmp/shop_with_ids.pb");
        ubyte[1] buffer;
        ubyte* p = buffer.ptr;
        auto c = reader.read(p);
        while (1 == c) {
            ++count;
            c = reader.read(p);
        }
    }
    return count;
}
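
The benchmark driver is essentially a timed call to readBytes; a minimal harness along these lines (a sketch, not the exact code from the repository, using std.datetime's StopWatch) is enough to reproduce the setup:

import std.datetime : StopWatch;
import std.stdio : writeln;

void main() {
    StopWatch sw;
    sw.start();
    auto total = readBytes();   // the function shown above: ten passes over the file
    sw.stop();
    writeln(total, " bytes read in ", sw.peek().msecs, " ms");
}
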
  • I have done some other unrelated coding in D and Java (math-intensive computations), and it turns out Java is marginally faster in my tests. I guess you should not expect Java to be that much slower nowadays; the JIT compiler is very good at optimization. Commented Aug 26, 2011 at 10:40
  • Yeah, you are right. I don't expect Java to be much slower than C++ (which it still is in my demo example using the default JIT), but my point is that D is even slower. I hoped D would be on par with C++. Commented Aug 26, 2011 at 11:05
  • Yes, I did that as well when I converted a Java algorithm to D a few months ago. I think they have some quirks to fix in the code optimization. Or maybe the GC is just really bad and slow, so try turning that off? Commented Aug 26, 2011 at 16:23
  • @Paxinum: Actually I tried this already (and it did not make a difference). The algorithm (if you can call this small program such) does not produce much garbage. Commented Aug 26, 2011 at 16:45
  • Added llvm-c++ to the benchmarks ... just for information! Commented Aug 30, 2011 at 15:04

2 Answers


It's very likely because of fread. Nobody guarantees that it does the same thing in D as in C -- you're very likely using a different CRT altogether (unless you're using the Digital Mars C++ compiler?).

That means the library could be doing things like synchronization, etc. which slow things down. The only way you can know is to force D to use the same library as C, by telling the linker to link to the same libraries.

Until you can do that, you're comparing apples to oranges. If that's not possible, then call the OS directly from both, and then compare the results -- that way you're guaranteed that the underlying call is the same for both.
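
For example, on POSIX a buffered wrapper around the raw read(2) system call could look roughly like this (a sketch only; names and error handling are illustrative, but it bypasses the CRT so both languages hit the same underlying call):

import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close, read;
import std.string : toStringz;

struct OsFileReader {
    private int fFd = -1;
    private ubyte[8192] fBuffer;
    private ubyte* fBufferPtr;
    private ubyte* fBufferEnd;

    this(string fn) {
        fFd = open(fn.toStringz(), O_RDONLY);
        fBufferPtr = fBuffer.ptr;
        fBufferEnd = fBuffer.ptr;
    }

    ~this() {
        if (fFd >= 0) close(fFd);   // single-owner sketch; no copy handling
    }

    // Writes the next byte to *target; returns 1 on success, 0 at end of file or on error.
    int readByte(ubyte* target) {
        if (fBufferPtr == fBufferEnd) {
            auto l = read(fFd, fBuffer.ptr, fBuffer.length);
            if (l <= 0) return 0;            // EOF or error: no byte delivered
            fBufferPtr = fBuffer.ptr;
            fBufferEnd = fBuffer.ptr + l;
        }
        *target = *fBufferPtr++;
        return 1;
    }
}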


12 Comments

You are totally right. I don't know whether the implementations of fread are similar or not. But my question is how to implement the functionality in D2 as fast as in Java or even in C++.
@Gizmomogwai: Right, but the implication of this question is that D is inherently slow. There's a big difference between a language that's inherently slow because its design fundamentally requires a lot of overhead and a language where one small area is slow because it's not well-optimized yet.
@Gizmomogwai: To implement it as fast as in Java or as in C++, you just have to do whatever they're doing -- which probably means that you should create your own buffered wrapper around the native OS call (ReadFile on Windows) and use that, then see how it goes. That will tell you whether it's a language problem or a library problem.
@dsimcha: Thanks for your comment. Actually I don't think that D2 is inherently slower than Java or C++; that's the reason why I started with this microbenchmark. There are some discussions in the D2 newsgroups about a new stream interface (digitalmars.com/d/archives/digitalmars/D/…)
@Mehrdad: I think I am almost at your proposed solution with readbytes2.d by using import std.c.stdio, which, as I understand it, is a very thin layer on top of the C standard library. Also, I am not comparing against native Windows calls, because the C++ solution also uses the stdlib.

What happens if you use the std.stdio module:

import std.stdio;

struct FileReader {
    private File fFile;
    private enum BUFFER_SIZE = 8192;            // why not enum?
    private ubyte[BUFFER_SIZE] fBuffer = void;  // avoid (costly) initialization to 0
    private ubyte[] buff;

    public this(string fn) {
        fFile = File("/tmp/shop_with_ids.pb", "rb");
    }

    /+
    public ~this() {
        // you really should have been doing this if you used std.c.stdio.fopen,
        // but it's unnecessary for std.stdio's File (it's ref counted)
        fFile.close();
    }
    +/

    public int read(out ubyte targetBuffer) {
        auto finished = buff.length == 0;
        if (finished) {
            finished = fillBuffer();
            if (finished) {
                return 0;
            }
        }
        targetBuffer = buff[0];
        buff = buff[1 .. $];
        return 1;
    }

    private bool fillBuffer() {
        if (!fFile.isOpen()) return false;
        buff = fFile.rawRead(fBuffer[]);
        return buff.length > 0;
    }
}

size_t readBytes() {
    size_t count = 0;
    for (int i = 0; i < 10; i++) {
        auto reader = FileReader("/tmp/shop_with_ids.pb");
        ubyte buffer;
        auto c = reader.read(buffer);
        while (1 == c) {
            ++count;
            c = reader.read(buffer);
        }
    }
    return count;
}

If you want a true speed comparison you should compile with -O -release -inline (this disables debug checks, mostly array out-of-bounds checks, and optimizes and inlines what it can), and of course build the C++ solution with similar flags.

1 Comment

Thanks for your comments. Actually I compiled the D2 code with -O -release (-inline was slower for all of my examples). I corrected your program (fillBuffer should return buff.length == 0) and my benchmark yielded 600 ms on my machine (compared to 80 ms for the C++ memory-mapped solution). See the table on the GitHub page.
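
For reference, the corrected fillBuffer from that comment would look roughly like this (the isOpen branch is adjusted to report "finished" as well, which the comment does not spell out; the rest of the answer's code stays the same):

private bool fillBuffer() {
    if (!fFile.isOpen()) return true;   // no open file means we are finished
    buff = fFile.rawRead(fBuffer[]);
    return buff.length == 0;            // finished when rawRead returned nothing
}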
