
I have a MongoDB document with quite a large embedded array:

name : "my-dataset" data : [ {country : "A", province: "B", year : 1990, value: 200} ... 150 000 more ] 

Let us say I want to return data objects where country == "A".

  1. What is the proper way of doing this, for example via Node.js?

  2. Given 150 000 entries with 200 matches, approximately how long should the query take?

  3. Would it be better (performance- and structure-wise) to store the data objects as individual documents, with the name as a property of each document?

  4. Would it be more efficient to use MySQL for this?

  • Okay, taking a breath here: which question are you actually asking? Are you saying a "document" has 150,000 embedded array elements? That is probably not a good design. Try to carefully explain what you are trying to do, and phrase it as a single question. The solution is most likely a different structure, but proper answers can only address your needs if you explain them properly. BTW, just bolding text does not get you better answers. Commented Sep 24, 2014 at 9:03
  • Re 4: in my experience, performance compared with MySQL is mostly equivalent for simple queries like "find me one record by this ID", and somewhat better with MySQL for more complex SQL queries with grouping etc. (the aggregation framework in Mongo). This was for a rather large amount of data (0.5 TB). What seems more important is that MySQL required three times less disk space to hold exactly the same data, which indirectly may be why it is faster (less disk reading). Commented Sep 24, 2014 at 10:09
  • explain() is the best way to analyze query performance in MongoDB. Find more details at docs.mongodb.org/manual/reference/method/cursor.explain/… Commented Sep 24, 2014 at 13:23
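For example, in the shell (a minimal sketch; the collection name is an assumption):

```javascript
// Show the query plan, including which index (if any) the query uses.
db.datasets.find({ name: "my-dataset" }).explain();
```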

2 Answers


A) Just find them with a query.
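For example, with the official Node.js driver (a minimal sketch; the connection string, database, and collection names are assumptions, and note that an $elemMatch projection returns only the first matching array element):

```javascript
const { MongoClient } = require("mongodb");

async function main() {
  // Connection string and names below are assumptions, not from the question.
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("test");

  // NB: $elemMatch in a projection returns only the FIRST matching array
  // element; to get all ~200 matches, use the aggregation pipeline from
  // the other answer instead.
  const doc = await db.collection("datasets").findOne(
    { name: "my-dataset", "data.country": "A" },
    { projection: { name: 1, data: { $elemMatch: { country: "A" } } } }
  );
  console.log(doc);

  await client.close();
}

main().catch(console.error);
```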

B) If the compound index { name: 1, "data.country": 1 } is built, the query should be fast. But because you store all the data in one array, the $unwind operator has to be used to pull out the matching entries, so the query could be slow.
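Building that index in the shell would look like this (a sketch; the collection name is an assumption):

```javascript
// Multikey compound index over the dataset name and the embedded country field.
db.datasets.createIndex({ name: 1, "data.country": 1 });
```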

C) It will be better. If you store the data like:

{country : "A", province: "B", year : 1990, value: 200, name:"my-dataset"} {country : "B", province: "B", year : 1990, value: 200, name:"my-dataset"} ... 

With a compound index { name: 1, country: 1 }, the query time should be under 10 ms.
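In shell terms, the index and query for this flattened design might look like this (a sketch; the collection name is an assumption):

```javascript
// One document per data point; the compound index covers the query predicate.
db.entries.createIndex({ name: 1, country: 1 });

// Only the ~200 matching documents are touched, so this stays fast.
db.entries.find({ name: "my-dataset", country: "A" });
```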

D) For a performance comparison, see "MySQL vs MongoDB 1000 reads".



1. You can use the MongoDB aggregation framework:

```
db.collection.aggregate([
    { $match: { name: "my-dataset" } },
    { $unwind: "$data" },
    { $match: { "data.country": "A" } }
])
```

This will return a document for each data entry where the country is "A". If you want to regroup the datasets, add a $group stage:

```
db.collection.aggregate([
    { $match: { name: "my-dataset" } },
    { $unwind: "$data" },
    { $match: { "data.country": "A" } },
    { $group: { _id: "$_id", data: { $addToSet: "$data" } } }
])
```

(I didn't test it on a proper dataset, so it might be buggy.)
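From Node.js, the same pipeline could be run like this (a sketch reusing an already-connected `db` handle inside an async function, as in the other answer's example; the collection name is a placeholder):

```javascript
// Same pipeline via the Node.js driver; yields one document per matching entry.
const docs = await db.collection("collection").aggregate([
  { $match: { name: "my-dataset" } },
  { $unwind: "$data" },
  { $match: { "data.country": "A" } }
]).toArray();
console.log(docs.length); // ~200 matches expected
```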

2. 150,000 subdocuments is still not a lot for MongoDB, so if you're only querying one dataset it should be pretty fast (on the order of milliseconds).

3. As long as you are sure that your document stays below 16 MB (the maximum BSON document size), which is hard to guarantee here, it should be fine. But the queries would be simpler if you stored your data as documents with the dataset name as a property, and that is generally better for performance.
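To see how close a document is to that limit, the legacy mongo shell has a built-in helper (a sketch; the collection name is an assumption):

```javascript
// Size in bytes of the stored document; the BSON hard limit is 16 * 1024 * 1024.
Object.bsonsize(db.datasets.findOne({ name: "my-dataset" }));
```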
