
I have to manage multilingual information in JSON documents in a MongoDB/Mongoose context, and I'm unsure which format is best with performance in mind.

I'm currently considering two formats (data is left empty):

    // English translation at the top level; all the other languages are
    // kept in a translation array
    [
      {
        "_id": "",
        "title": "",
        "description": "",
        "image": "",
        "slug": "",
        "content": "",
        "translation": [
          {
            "language": "",
            "title": "",
            "description": "",
            "content": ""
          }
        ]
      }
    ]

and:

    // The English translation is moved from the header into the array,
    // together with the translations for all the other languages
    [
      {
        "_id": "",
        "image": "",
        "slug": "",
        "translation": [
          {
            "language": "",
            "title": "",
            "description": "",
            "content": ""
          }
        ]
      }
    ]

Which of the two approaches would be better from a performance point of view? Are there any better format patterns to use?

  • "Pattern" doesn't mean what you think it means. "Which one performs better" is a question you are asking too early in your design process, but "the one with less data" is the obvious answer. Commented May 12, 2021 at 13:51
  • In any case, the best way to evaluate performance is to measure it, after you've identified an actual performance problem in your own environment. Software developers are not good at guessing. Commented May 12, 2021 at 13:53
  • @RobertHarvey Indeed, objective measurements are required to address such performance issues. However, the intention to measure should not rule out a little inexpensive thinking ahead that could prevent a lot of refactoring later. Moreover, if the system is new and there are no representative datasets yet, the conclusions drawn from measurements could even be misleading (e.g. if the EN language is always first in the array in case 2, this will lead to different results than if it were at a random position or always at the end; it would of course be noticed only with huge volumes ;-) ) Commented May 14, 2021 at 7:40
  • @Christophe: Sure, but not "Is this JSON faster or that one?" It's almost certainly micro-optimization. Commented May 14, 2021 at 13:03

2 Answers

Comparison of both formats

I understand that you have documents with an English version of the content, and potentially several other language versions of the same content:

  • The first format has the advantage of always having an English version. The fact that it’s English is implicit. The drawback is that English has to be handled differently from the other languages, which means extra coding.

  • The second format is more flexible: all the languages are handled equally. You can get English exactly like the other languages, and it is up to you to fall back to English if the preferred language is not available. (Depending on your needs, you may consider marking the original language in the “header”.)
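To make the difference concrete, here is a minimal sketch of the lookup each format requires. The function names `getTextFormat1` and `getTextFormat2`, the sample documents and the "EN"/"DE" language codes are assumptions made up for illustration; only the field names come from the question.

```javascript
// Hypothetical sample documents; field names follow the question, values are invented.
const docFormat1 = {
  _id: "1", image: "a.png", slug: "hello",
  title: "Hello", description: "A greeting", content: "Hello, world",
  translation: [
    { language: "DE", title: "Hallo", description: "Ein Gruß", content: "Hallo, Welt" },
  ],
};

const docFormat2 = {
  _id: "1", image: "a.png", slug: "hello",
  translation: [
    { language: "EN", title: "Hello", description: "A greeting", content: "Hello, world" },
    { language: "DE", title: "Hallo", description: "Ein Gruß", content: "Hallo, Welt" },
  ],
};

// Format 1: English lives at the top level, so it needs its own branch.
function getTextFormat1(doc, language) {
  const english = { title: doc.title, description: doc.description, content: doc.content };
  if (language === "EN") return english;
  const match = doc.translation.find((t) => t.language === language);
  if (!match) return english; // implicit fallback to the top-level English text
  const { title, description, content } = match;
  return { title, description, content };
}

// Format 2: every language is looked up the same way; the fallback is explicit.
function getTextFormat2(doc, language, fallback = "EN") {
  const pick = (lang) => doc.translation.find((t) => t.language === lang);
  const match = pick(language) ?? pick(fallback);
  if (!match) return undefined; // neither the wanted nor the fallback language exists
  const { title, description, content } = match;
  return { title, description, content };
}
```

In this sketch the second function has no language-specific branch, which is the flexibility mentioned above; the first one never returns `undefined` because English is always present.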

From a performance point of view, there won’t be a significant difference. With MongoDB you’d typically load the whole document anyway, so it’s the same amount of data (perhaps one extra “language”: “EN”, but this seems really marginal). And since MongoDB uses BSON internally, that won’t really make a difference.

A third way?

What could impact performance is the array: you’ll have to iterate through the successive elements and test whether each one is the right language. This is true for both formats. With the second format, you can put English as the first element to avoid any relative performance impact compared to the first format.

If you need to select by language frequently, you may however consider replacing the array with an object, where each language is a member name and the value is the translated/localized content:

    [
      {
        "_id": "",
        "image": "",
        "slug": "",
        "translation": {
          "EN": {
            "title": "",
            "description": "",
            "content": ""
          },
          "DE": {
            "title": "",
            "description": "",
            "content": ""
          }
          …
        }
      }
    ]

While this may seem less intuitive, you’d benefit from optimized access to the members/languages (dictionary-level performance for members versus sequential array access; see for example here for JS).
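As a rough sketch of what this looks like on the application side, the snippet below converts the array form into the object form and then reads a language by key. The helper name `toLanguageMap` and the sample values are hypothetical; it only illustrates the in-memory JavaScript access, not how MongoDB itself handles the document.

```javascript
// Convert the array form into an object keyed by language code.
// Each entry keeps its text fields; the "language" field becomes the key.
function toLanguageMap(translations) {
  return Object.fromEntries(
    translations.map(({ language, ...texts }) => [language, texts])
  );
}

// Hypothetical sample data in the array form from the question.
const translationArray = [
  { language: "EN", title: "Hello", description: "A greeting", content: "Hello, world" },
  { language: "DE", title: "Hallo", description: "Ein Gruß", content: "Hallo, Welt" },
];

const translation = toLanguageMap(translationArray);

// Keyed access: no loop and no comparison against a "language" field.
const german = translation["DE"];
const french = translation["FR"]; // undefined when the language is missing
```

One trade-off worth noting: with the object form, the set of languages is no longer a list of values but a set of member names, so listing the available languages means calling something like `Object.keys(translation)`.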

In the end, whatever my advice, so many factors affect performance that you’ll need to do some profiling/benchmarking to validate these hypotheses.
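A very small benchmark along these lines could look like the sketch below. Everything in it is an assumption for illustration: the helper names `makeData` and `timeIt`, the synthetic language codes "L0" to "L49", and the iteration count. It only times the in-memory lookup in Node.js, so it says nothing about MongoDB query, storage or network costs, which may well dominate in practice.

```javascript
// Build synthetic data: `count` hypothetical language codes "L0", "L1", ...
function makeData(count) {
  const asArray = [];
  const asObject = {};
  for (let i = 0; i < count; i++) {
    const entry = { title: `t${i}`, description: `d${i}`, content: `c${i}` };
    asArray.push({ language: `L${i}`, ...entry });
    asObject[`L${i}`] = entry;
  }
  return { asArray, asObject };
}

// Time `iterations` calls of `lookup` and return the elapsed milliseconds.
function timeIt(lookup, iterations) {
  let found = 0; // count hits so the lookups are actually used
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    if (lookup() !== undefined) found++;
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { elapsedMs, found };
}

const { asArray, asObject } = makeData(50);
const wanted = "L49"; // worst case for the array: the last element

const arrayResult = timeIt(() => asArray.find((t) => t.language === wanted), 100000);
const objectResult = timeIt(() => asObject[wanted], 100000);

console.log(`array:  ${arrayResult.elapsedMs.toFixed(2)} ms`);
console.log(`object: ${objectResult.elapsedMs.toFixed(2)} ms`);
```

Micro-benchmarks like this are easily distorted by JIT warm-up and by unrepresentative data, so the numbers should be read as a sanity check at most, and repeated with realistic documents and language counts.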


It seems you forgot something... Like which language you translated to.

What is common is to make the whole document a dictionary with the language as key. So you extract the language that you want, plus possibly a fallback language in case one language isn't complete, and don't look at the other languages any further.

So your code would be: Read item from wanted language. If not present read from fallback language. If not present handle error.
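The three steps above could be sketched as follows. The function name `readItem`, the sample document and its values are hypothetical, and treating an empty string as "not present" is an assumption you may not want.

```javascript
// Hypothetical document: the language code is the top-level key.
const article = {
  EN: { title: "Hello", description: "A greeting", content: "Hello, world" },
  DE: { title: "Hallo" }, // incomplete translation: no description or content
};

// Read one item from the wanted language, then from the fallback language,
// and signal an error if neither has it.
function readItem(doc, field, wanted, fallback) {
  for (const language of [wanted, fallback]) {
    const value = doc[language]?.[field];
    if (value !== undefined && value !== "") return value;
  }
  throw new Error(`No "${field}" available in ${wanted} or ${fallback}`);
}

const title = readItem(article, "title", "DE", "EN");     // taken from DE
const content = readItem(article, "content", "DE", "EN"); // falls back to EN
```

Here the fallback is applied per item, so a partially translated document mixes languages rather than failing; falling back per document instead would be an equally valid choice.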
