How to best manage multilingual information in documents in JSON/MongoDB/Mongoose?

Question

I have to manage multilingual information in JSON documents in a MongoDB/Mongoose context, and I'm unsure which format is best from a performance point of view.

I'm currently considering two formats (data is left empty):

    // English translation at the top level; all the other languages are
    // kept in a translation array
    [
      {
        "_id": "",
        "title": "",
        "description": "",
        "image": "",
        "slug": "",
        "content": "",
        "translation": [
          {
            "language": "",
            "title": "",
            "description": "",
            "content": ""
          }
        ]
      }
    ]

and:

    // The English translation is moved from the header into the array,
    // together with the translations for all the other languages
    [
      {
        "_id": "",
        "image": "",
        "slug": "",
        "translation": [
          {
            "language": "",
            "title": "",
            "description": "",
            "content": ""
          }
        ]
      }
    ]

Which of the two approaches would be better from a performance point of view? Are there any better formats to use?

"Pattern" doesn't mean what you think it means. "Which one performs better" is a question you are asking too early in your design process, but "the one with less data" is the obvious answer.
– Robert Harvey, Commented May 12, 2021 at 13:51
In any case, the best way to evaluate performance is to measure it, after you've identified an actual performance problem in your own environment. Software developers are not good at guessing.
– Robert Harvey, Commented May 12, 2021 at 13:53
@RobertHarvey Indeed, objective measurements are required to address such performance issues. However, planning to measure later should not rule out a little inexpensive thinking ahead that could prevent a lot of refactoring. Moreover, if the system is new and there are no representative datasets yet, conclusions drawn from measurements could even be misleading (e.g. if EN is always first in the array in case 2, the results will differ from those with EN at a random position or always at the end; of course this would only be noticeable with huge volumes ;-) )
– Christophe, Commented May 14, 2021 at 7:40
@Christophe: Sure, but not "Is this JSON faster or that one?" It's almost certainly micro-optimization.
– Robert Harvey, Commented May 14, 2021 at 13:03

Christophe · Accepted Answer · 2021-05-12 10:13:49Z

Comparison of both formats

I understand that you have documents with an English version of the content, and potentially versions of the same content in several other languages:

The first format has the advantage of always having an English version. The fact that it's English is implicit. The drawback is that English has to be handled differently from the other languages, which means extra code.
The second format is more flexible: all the languages are handled equally. You get English exactly like the other languages, and it is up to you to fall back to English if the preferred language is not available. (Depending on your needs, you may consider marking the original language in the "header".)
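To illustrate the difference in code, here is a minimal sketch in plain JavaScript (no Mongoose involved). The function names, the helper `pickText` and the sample values are made up for the example; only the field names come from the question.

```javascript
// Hypothetical sample documents; the values are invented for illustration.
const format1Doc = {
  title: "Hello",
  description: "A greeting",
  content: "Hello, world",
  translation: [
    { language: "DE", title: "Hallo", description: "Ein Gruß", content: "Hallo, Welt" },
  ],
};

const format2Doc = {
  translation: [
    { language: "EN", title: "Hello", description: "A greeting", content: "Hello, world" },
    { language: "DE", title: "Hallo", description: "Ein Gruß", content: "Hallo, Welt" },
  ],
};

// Keep only the translatable fields of an object.
function pickText({ title, description, content }) {
  return { title, description, content };
}

// Format 1: English lives at the top level, so it needs its own branch.
function getTextFormat1(doc, language) {
  if (language === "EN") {
    return pickText(doc);
  }
  const entry = doc.translation.find((t) => t.language === language);
  return entry ? pickText(entry) : undefined;
}

// Format 2: every language, English included, takes the same code path.
function getTextFormat2(doc, language) {
  const entry = doc.translation.find((t) => t.language === language);
  return entry ? pickText(entry) : undefined;
}
```

Both functions return `undefined` when the language is missing; what to do in that case (for instance falling back to English) is a separate decision.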

From a performance point of view, there won't be a significant difference. With MongoDB you'd load the whole document anyway, so it's the same amount of data (perhaps one extra "language": "EN", but this seems really marginal). And with MongoDB's internal use of BSON that won't really make a difference.

A third way?

What could impact performance is the array: you'll have to iterate through the successive elements and test each one for the right language. This is true for both formats. In the second format, you can put English as the first element to avoid any relative performance impact compared to the first format.

If you need to select by language frequently, you may however consider replacing the array with an object, where each language is a member name and the value is the translated/localized content:

    [
      {
        "_id": "",
        "image": "",
        "slug": "",
        "translation": {
          "EN": {
            "title": "",
            "description": "",
            "content": ""
          },
          "DE": {
            "title": "",
            "description": "",
            "content": ""
          }
          …
        }
      }
    ]

While this may seem less intuitive, you'd benefit from optimized access to the members/languages (dictionary-level performance for members vs. sequential array access, see for example here for JS).

In the end, whatever my advice, so many factors affect performance that you'll need to do some profiling/benchmarking to validate these hypotheses.
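As a rough sketch of the keyed layout in plain JavaScript: `toKeyedTranslations` and `getTranslationByKey` are hypothetical helper names, and the sample values are invented. The first helper converts the array form into the object form, the second does the direct lookup.

```javascript
// Hypothetical helper: turn the array form into an object keyed by language.
function toKeyedTranslations(translationArray) {
  return Object.fromEntries(
    translationArray.map(({ language, ...fields }) => [language, fields])
  );
}

// Direct lookup by language code, with no iteration over the other languages.
// The own-property check avoids picking up inherited keys such as "toString".
function getTranslationByKey(doc, language) {
  const present = Object.prototype.hasOwnProperty.call(doc.translation, language);
  return present ? doc.translation[language] : undefined;
}

// Invented sample data in the array form (second format of the question).
const arrayDoc = {
  slug: "hello",
  translation: [
    { language: "EN", title: "Hello", description: "A greeting", content: "Hello, world" },
    { language: "DE", title: "Hallo", description: "Ein Gruß", content: "Hallo, Welt" },
  ],
};

const keyedDoc = {
  ...arrayDoc,
  translation: toKeyedTranslations(arrayDoc.translation),
};

console.log(getTranslationByKey(keyedDoc, "DE").title); // prints "Hallo"
```

Whether the direct lookup is measurably faster than scanning a short array is exactly the kind of hypothesis that should be checked with a benchmark on realistic data.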

gnasher729 · Accepted Answer · 2021-05-12 07:59:37Z

It seems you forgot something... Like which language you translated to.

A common approach is to make the whole document a dictionary with the language as key. You extract the language that you want, plus possibly a fallback language in case one language isn't complete, and don't look at the other languages any further.

So your code would be: Read item from wanted language. If not present read from fallback language. If not present handle error.
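Those three steps could look roughly like this in plain JavaScript. It is only a sketch: `readItem`, its parameters and the sample data are made up for illustration, and treating an empty string as "not present" is an assumption.

```javascript
// Read one item (field) from the wanted language, then from the fallback
// language, and throw if neither has it.
function readItem(translations, field, wanted, fallback) {
  for (const language of [wanted, fallback]) {
    // Own-property check so that inherited keys never count as a language.
    if (!Object.prototype.hasOwnProperty.call(translations, language)) {
      continue;
    }
    const value = translations[language][field];
    // Assumption: an empty or missing value means "not translated yet".
    if (value !== undefined && value !== "") {
      return { language, value };
    }
  }
  throw new Error(`No "${field}" in "${wanted}" or fallback "${fallback}"`);
}

// Invented sample data: the German entry is incomplete.
const translations = {
  EN: { title: "Hello", description: "A greeting" },
  DE: { title: "Hallo" },
};

const title = readItem(translations, "title", "DE", "EN");
const description = readItem(translations, "description", "DE", "EN");
// title comes from "DE"; description falls back to "EN".
```

Returning the language together with the value lets the caller know when a fallback was used, for example to set the right `lang` attribute when rendering.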
