feifangit/MongoDB-GridFS-test


Purpose

MongoDB GridFS comes with some natural advantages, such as scalability (sharding) and high availability (replica sets). But because it stores each file as a series of chunks in database documents, some performance loss is inevitable.

I tested 3 different deployments (each using a different MongoDB driver) that read from GridFS, and compared the results with a classic Nginx configuration serving files from disk.

Contributors

Fan Fei (feifan.pub@gmail.com)

Neil Chen (neil.chen.nj@gmail.com)

Configurations

1, Nginx

location /files/ { alias /home/ubuntu/; } 

open_file_cache was kept off during the test.

2, Nginx_GridFS

It's an Nginx plugin based on the MongoDB C driver. https://github.com/mdirolf/nginx-gridfs

Compile code & install

I made a quick install script in this repo; run it with sudo. After Nginx is ready, modify the configuration file at /usr/local/nginx/conf/nginx.conf (if you didn't change the path).

Configuration

location /gridfs/{ gridfs test1 type=string field=filename; } 

Use /usr/local/nginx/sbin/nginx to start Nginx, and run it with the parameter -s reload whenever you change the configuration file again.

3, Python

library version

  • Flask 0.10.1
  • Gevent 1.0.0
  • Gunicorn 0.18.0
  • pymongo 2.6.3

run application

cd flaskapp/
sudo chmod +x runflask.sh
bash runflask.sh

The script runflask.sh starts Gunicorn in gevent worker mode. The Gunicorn configuration file is in this repo.
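For orientation, a Flask view that streams a file out of GridFS might look roughly like the sketch below. This is not the code in flaskapp/. The route /gridfs/<filename>, the helper names iter_chunks and create_app, the read size, and the MongoDB URL are all assumptions made for illustration; only the database name test1 comes from this test setup.

```python
def iter_chunks(fileobj, chunk_size=255 * 1024):
    """Yield successive blocks from a file-like object until it is exhausted."""
    while True:
        block = fileobj.read(chunk_size)
        if not block:
            break
        yield block


def create_app(mongo_url="mongodb://localhost:27017", db_name="test1"):
    """Build a Flask app that serves GridFS files (hypothetical layout)."""
    # Imported lazily so iter_chunks can be used without Flask/pymongo installed.
    import gridfs
    from flask import Flask, Response, abort
    from pymongo import MongoClient

    app = Flask(__name__)
    fs = gridfs.GridFS(MongoClient(mongo_url)[db_name])

    @app.route("/gridfs/<filename>")
    def download(filename):
        try:
            # Newest stored version of the file with this name.
            grid_out = fs.get_last_version(filename)
        except gridfs.errors.NoFile:
            abort(404)
        return Response(
            iter_chunks(grid_out),
            mimetype=grid_out.content_type or "application/octet-stream",
            headers={"Content-Length": str(grid_out.length)},
        )

    return app
```

Streaming through a generator keeps the whole file from being loaded into memory at once, but every block still passes through the Python process.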

4, Node.js

library version

  • Node.js 0.10.4
  • Express 3.4.7
  • mongodb(driver) 1.3.23

run application

cd nodejsapp/
sudo chmod +x runnodejs.sh
bash runnodejs.sh

Test

Test items:

  1. file served by Nginx directly
  2. file served by Nginx_gridFS + GridFS
  3. file served by Flask + pymongo + gevent + GridFS
  4. file served by Node.js + GridFS

Files for downloading:

Run the script insert_file_gridfs.py on the MongoDB server to insert test files of the following sizes into database test1 (pymongo is required):

  • 1KB
  • 100KB
  • 1MB
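The script itself is not reproduced here. As a rough sketch of what such an insert step could look like with pymongo's gridfs module: the function names, the ".bin" file naming, the random file content, and the MongoDB URL below are assumptions, not taken from insert_file_gridfs.py.

```python
import os

# File sizes from the list above, in bytes.
SIZES = {"1KB": 1024, "100KB": 100 * 1024, "1MB": 1024 * 1024}


def make_payload(size):
    """Return `size` random bytes to use as file content."""
    return os.urandom(size)


def insert_test_files(mongo_url="mongodb://localhost:27017", db_name="test1"):
    """Store one file per size in GridFS and return {filename: file id}."""
    # Imported lazily so the helpers above work without pymongo installed.
    import gridfs
    from pymongo import MongoClient

    fs = gridfs.GridFS(MongoClient(mongo_url)[db_name])
    ids = {}
    for label, size in SIZES.items():
        filename = label + ".bin"  # hypothetical naming scheme
        ids[filename] = fs.put(make_payload(size), filename=filename)
    return ids
```

Calling insert_test_files() against a running MongoDB would leave three files in test1 that can then be fetched by filename.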

Test Environment

2 servers:

  • MongoDB+Application/Nginx
  • tester(Apache ab/JMeter)

hardware:

Concurrency

100 concurrent requests, total 500 requests.

ab -c 100 -n 500 ... 
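To collect the numbers from several runs, the ab report can be parsed with a small helper. The sketch below assumes ab is on the PATH and that its report uses the usual "Requests per second" and "Time per request" lines; the function names are made up for illustration, and the target URL is left to the caller.

```python
import re
import subprocess

# Patterns for the summary lines of an ab report.
_PATTERNS = {
    "requests_per_second": r"Requests per second:\s+([\d.]+)",
    "time_per_request_ms": r"Time per request:\s+([\d.]+) \[ms\] \(mean\)",
    "time_per_request_concurrent_ms": (
        r"Time per request:\s+([\d.]+) \[ms\] "
        r"\(mean, across all concurrent requests\)"
    ),
}


def parse_ab_report(text):
    """Extract the summary metrics found in an ab report as floats."""
    metrics = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            metrics[key] = float(match.group(1))
    return metrics


def run_ab(url, concurrency=100, total=500):
    """Run ab against `url` (supplied by the caller) and parse its report."""
    result = subprocess.run(
        ["ab", "-c", str(concurrency), "-n", str(total), url],
        capture_output=True, text=True, check=True,
    )
    return parse_ab_report(result.stdout)
```

The defaults mirror the 100 concurrent / 500 total requests used in this test.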

Result

Throughput

Time per request (download)

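| File size | Nginx + hard drive | Nginx + GridFS plugin | Python (pymongo + gevent) | Node.js |
|-----------|--------------------|-----------------------|---------------------------|---------|
| 1KB       | 0.174              | 1.124                 | 1.982                     | 1.679   |
| 100KB     | 1.014              | 1.572                 | 3.103                     | 3.708   |
| 1MB       | 9.582              | 9.567                 | 15.973                    | 18.317  |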
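The table is easier to read as slowdown factors relative to plain Nginx. The short sketch below just divides each time by the Nginx time for the same file size; the dictionary keys are labels chosen here, and the numbers are copied from the table above.

```python
# Time per request from the table above, keyed by file size.
RESULTS = {
    "1KB":   {"nginx": 0.174, "nginx_gridfs": 1.124, "python": 1.982, "nodejs": 1.679},
    "100KB": {"nginx": 1.014, "nginx_gridfs": 1.572, "python": 3.103, "nodejs": 3.708},
    "1MB":   {"nginx": 9.582, "nginx_gridfs": 9.567, "python": 15.973, "nodejs": 18.317},
}


def slowdown_vs_nginx(results):
    """Return each setup's time divided by the Nginx time for that file size."""
    return {
        size: {
            name: row[name] / row["nginx"]
            for name in row
            if name != "nginx"
        }
        for size, row in results.items()
    }


for size, ratios in slowdown_vs_nginx(RESULTS).items():
    print(size, {name: round(value, 2) for name, value in ratios.items()})
```

For the 1KB file the GridFS-backed setups come out at roughly 6.5x (nginx_gridfs), 11.4x (Python) and 9.6x (Node.js) the Nginx time. At 1MB, nginx_gridfs is about level with Nginx, while Python and Node.js are around 1.7x and 1.9x.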

The Apache ab reports are in the folder testresult.

Server load

Server load was monitored with the command: vmstat 2

Nginx: [vmstat screenshot]

Nginx_gridfs: [vmstat screenshot]

gevent+pymongo: [vmstat screenshot]

Node.js: [vmstat screenshot]
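If the vmstat output is captured as text rather than as screenshots, it can be summarized with a few lines of Python. This is only a sketch: it assumes the usual Linux vmstat layout with a column-name line beginning with "r" and including "us" and "sy", and the function names are invented for illustration.

```python
def parse_vmstat(text):
    """Parse `vmstat` output into a list of {column: int} rows."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name line: r b swpd free ... us sy id wa
            header = fields
        elif (header and len(fields) == len(header)
              and all(f.isdigit() for f in fields)):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def mean_cpu_busy(rows):
    """Average of user + system CPU percentage over the samples."""
    if not rows:
        return 0.0
    return sum(r["us"] + r["sy"] for r in rows) / len(rows)
```

Feeding each setup's captured output through parse_vmstat and mean_cpu_busy gives one CPU figure per setup that can be compared side by side.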

Conclusion

  • Files served by Nginx directly
      • Without doubt the most efficient option, in both performance and server load.
      • Supports caching. In real-world deployments, the open_file_cache directive should be tuned for better performance.
      • Notably, it is the only setup tested that supports pausing and resuming a download (HTTP range support).

  • For the other 3 test items, files are stored in MongoDB but served through different drivers.
      • Serving static files from the application is not an appropriate choice: it uses too much CPU and performs poorly.
      • nginx_gridfs (MongoDB C driver): download requests are handled at the Nginx level, which sits in front of the web application in most deployments, so the web application can focus on dynamic content instead of static content.
      • nginx_gridfs performed best compared with the applications written in scripting languages.
      • The performance difference between Nginx and nginx_gridfs shrinks as file size increases, but the server load cannot be ignored.
      • pymongo and the Node.js driver: it's a draw. Serving static files from the application should be avoided in production.

Advantages of GridFS

  • Putting files in the database makes static content management much easier. There is no need to keep files on disk consistent with their metadata in the database.
  • The scalability and HA advantages that come with MongoDB

Drawbacks of GridFS

  • Poor performance
  • Downloads cannot be resumed after a pause or interruption
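Whether a given endpoint can resume downloads can be checked by sending a byte-range request and looking for a 206 Partial Content answer. The sketch below uses only the standard library; the function names are made up here, and the URL to probe is left to the caller.

```python
import urllib.error
import urllib.request


def is_range_response(status, headers):
    """True if a response looks like a valid answer to a Range request."""
    return status == 206 and "Content-Range" in headers


def supports_range(url, timeout=10):
    """Ask for the first byte of `url` and report whether the server honored it."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_range_response(response.status, response.headers)
    except urllib.error.HTTPError:
        # For example 416 Range Not Satisfiable, or the file is missing.
        return False
```

A server that ignores the Range header replies with 200 and the full body, which this check reports as False.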

When should I use MongoDB GridFS

I can imagine few use cases for it, especially in a performance-sensitive system, but I might try it in some prototype projects.

Here is the answer from the official MongoDB website; I hope it helps. http://docs.mongodb.org/manual/faq/developers/#faq-developers-when-to-use-gridfs
