TFMongoDB is a C++ implemented dataset op for google’s tensorflow that allows you to connect to your MongoDatabase natively. Hence you can access your mongodb stored documents more efficiently.
Currently only MacOS X is supported while Tensorflow >= 1.5 is required.
In order to use the dataset you need to install it with pip:
pip install tfmongodb To use the tfmongodb you first have to install mongoc (http://mongoc.org/libmongoc/current/installing.html) and mongocxx (http://mongocxx.org/mongocxx-v3/installation/) libraries (follow the official manual for your distribution).
Afterwards you clone the repo:
git clone https://github.com/svenboesiger/tfmongodb.git Change to the directory:
cd tfmongodb Create a virtualenv called "venv_tf", switch to the directory, initialize and install tensorflow:
virtualenv venv_tf cd venv_tf/ source bin/activate pip install tensorflow Create the makefile:
cd .. cmake . Compile and link the library:
make Create the pip package:
cd dist/ ./build_wheel.sh Install the wheel:
cd .. cd dist pip install TF<Version> TFMongoDB can be accessed through the MongoDBDataset:
dataset = MongoDBDataset(<database_name>, <collection_name>) example:
from tfmongodb import MongoDBDataset from tensorflow.python.framework import ops from tensorflow.python.data.ops import iterator_ops import tensorflow as tf CSV_TYPES = [[""], [""], [0]] def _parse_line(line): fields = tf.decode_csv(line, record_defaults=CSV_TYPES) return fields dataset = MongoDBDataset("eccounting", "creditors") dataset = dataset.map(_parse_line) repeat_dataset2 = dataset.repeat() batch_dataset = repeat_dataset2.batch(20) iterator = iterator_ops.Iterator.from_structure(batch_dataset.output_types) #init_op = iterator.make_initializer(dataset) init_batch_op = iterator.make_initializer(batch_dataset) get_next = iterator.get_next() with tf.Session() as sess: sess.run(init_batch_op, feed_dict={}) for i in range(5): print(sess.run(get_next))