Cursor.each should work in batches #804

Swatinem · 2012-12-14T11:59:11Z

Currently, Cursor.each uses a really inefficient loop: process.nextTick -> nextObject() -> recurse into itself.

This patch avoids that behavior and instead loops over each batch directly, providing a speedup of up to 2x depending on workload.
I’ve run make test and it’s passing without issues.
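
To illustrate the difference between the two loop styles, here is a minimal sketch. It is not the driver's code: `FakeCursor`, `fetchBatch`, `eachPerItem` and `eachBatched` are hypothetical names, and the "server" is simulated with an in-memory array.

```javascript
// Hypothetical in-memory stand-in for a driver cursor; not the real mongodb Cursor.
class FakeCursor {
  constructor(docs, batchSize) {
    this.docs = docs.slice();   // documents still "on the server"
    this.batchSize = batchSize;
    this.buffer = [];           // documents of the current batch not yet consumed
  }
  // Simulates an asynchronous fetch of the next batch.
  fetchBatch(callback) {
    setImmediate(() => {
      this.buffer = this.docs.splice(0, this.batchSize);
      callback(null, this.buffer.length);
    });
  }
  // Returns one document at a time, fetching a new batch when the buffer is empty.
  nextObject(callback) {
    if (this.buffer.length > 0) return callback(null, this.buffer.shift());
    this.fetchBatch((err, n) => {
      if (err) return callback(err);
      callback(null, n > 0 ? this.buffer.shift() : null);
    });
  }
}

// Per-item style: one process.nextTick and one nextObject() call per document.
function eachPerItem(cursor, callback) {
  process.nextTick(() => {
    cursor.nextObject((err, doc) => {
      if (err) return callback(err);
      if (doc === null) return callback(null, null); // null marks the end
      callback(null, doc);
      eachPerItem(cursor, callback);
    });
  });
}

// Batched style: drain the buffered batch in a plain loop, then fetch more.
function eachBatched(cursor, callback) {
  while (cursor.buffer.length > 0) {
    callback(null, cursor.buffer.shift());
  }
  cursor.fetchBatch((err, n) => {
    if (err) return callback(err);
    if (n === 0) return callback(null, null); // null marks the end
    eachBatched(cursor, callback);
  });
}
```

Both functions deliver the same documents in the same order. The batched version only goes back to the scheduler once per batch instead of once per document.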

The benchmarks I have done are available at Swatinem/mongobench. I can observe the following speedups:

(jitter graph is in log scale)

The patched version however uses more memory:

I might need some help fixing that.

christkv · 2012-12-14T12:40:11Z

Higher memory usage would be normal in this case as you're keeping around more items in each stack frame. process.nextTick with nextObject means the stack frame is tiny for each. But if you have a couple of thousand objects in each process.nextTick, GC will have to do more work collecting them, as the last graph shows pretty well. More memory usage per se is not a problem, but it might be worth benchmarking it with higher concurrency (like 50-100 clients running cursors on the same size of data) to see what the memory impact is. I have been thinking about looking into using trampolining instead of process.nextTick but have had no luck getting it to work.

By the way, process.nextTick is not necessarily a bad thing. Think of it as yielding to the event loop so it can schedule other concurrent in-flight work.
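
A minimal sketch of that yielding idea, assuming a hypothetical helper `processInChunks` that is not part of the driver. It uses `setImmediate` rather than `process.nextTick`, because in Node.js 0.10 and later `process.nextTick` callbacks run before the event loop continues, whereas `setImmediate` lets other queued work run first.

```javascript
// Hypothetical helper: handle `items` in chunks and yield to the event loop
// between chunks so other queued work can run.
function processInChunks(items, chunkSize, handleItem, done) {
  let index = 0;
  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      setImmediate(runChunk); // yield before the next chunk
    } else {
      done(); // called synchronously if everything fit into the first chunk
    }
  }
  runChunk();
}
```

A larger `chunkSize` means fewer trips through the scheduler but longer stretches during which other clients have to wait, which is the trade-off discussed above.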

Swatinem · 2012-12-15T03:57:16Z

I have updated my benchmark to run with concurrency. You can also grab the code here to run the tests yourself.

A few more pretty graphs with concurrency=100

The memory numbers are now actually better with my patch. I also noticed that I’m using Cursor.toArray() instead of .each(), which surely adds some memory overhead.

Regarding process.nextTick:
nodejs/node-v0.x-archive@8546d18 got me a little worried. It seems like nextTick will be completely overhauled for 0.10.
However, I haven’t tried it, so I can’t say whether it really is an issue for this workload.

Swatinem · 2012-12-15T11:41:41Z

Since my use case involves Cursor.toArray, why not use batches there too instead of delegating to .each…
With some minor array optimizations, I get a small speedup as well.

And with concurrency = 100
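
The batched toArray idea can be sketched roughly as follows. This is an illustration and not the code from the patch: `toArrayBatched` and its `fetchBatch(cb)` parameter are hypothetical, and `fetchBatch` is assumed to be asynchronous and to call `cb(err, docs)` with an empty array once the cursor is exhausted.

```javascript
// Hypothetical sketch: collect whole batches into the result array instead of
// going through a per-document each() callback.
function toArrayBatched(fetchBatch, callback) {
  const items = [];
  function next() {
    fetchBatch((err, docs) => {
      if (err) return callback(err);
      if (docs.length === 0) return callback(null, items); // cursor exhausted
      for (let i = 0; i < docs.length; i++) {
        items.push(docs[i]);
      }
      next(); // assumes fetchBatch is async, so the stack does not grow
    });
  }
  next();
}
```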

christkv · 2012-12-17T11:34:17Z

I can't seem to get the code you provided to run; I assume it's using some stuff in your environment that I cannot see. Can you provide some simple instructions on how to run it and generate the graphs? I'm interested in playing with it a bit before making a decision on the pull request. So far it looks good, but I need to test with varying sizes of documents being returned as well as concurrency.

Swatinem · 2012-12-17T17:15:37Z

I’ve added a README: Swatinem/mongobench@cd43bb0

Should be straightforward: just copy/symlink mongodb into $directory, run the nodejs script, and save the output to a .tsv file.
Then you need R to create the graphs.

christkv · 2012-12-23T16:20:28Z

Had to back out this check-in due to problems with replica sets when it was in; will investigate it again after Christmas.

Cursor.each should work in batches

8a88cae

Cursor.toArray should use batches too + minor optimizations

3dee6b7

christkv merged commit 3dee6b7 into mongodb:master Dec 19, 2012
