Comparison of both formats
I understand that you have documents that contain an English version of the content and potentially several other language versions of the same content:
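Something along these lines, I assume (the field and language names are purely illustrative, not taken from your actual schema):

    // First format (assumed): English implicit at the top level,
    // other languages in a translations array
    {
        "_id": "",
        "image": "",
        "slug": "",
        "title": "",
        "description": "",
        "content": "",
        "translation": [
            { "language": "DE", "title": "", "description": "", "content": "" }
        ]
    }

    // Second format (assumed): every language, including English,
    // is an element of the array, marked by a "language" field
    {
        "_id": "",
        "image": "",
        "slug": "",
        "translation": [
            { "language": "EN", "title": "", "description": "", "content": "" },
            { "language": "DE", "title": "", "description": "", "content": "" }
        ]
    }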
The first format has the advantage of always having an English version; the fact that it’s English is implicit. The drawback is that it requires handling English differently from the other languages, which means extra coding.
The second format is more flexible: all languages are handled equally. You can retrieve English exactly like any other language, and it is up to you to fall back to English if the preferred language is not available. (Depending on your needs, you may consider marking the original language in the “header”.)
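In application code, that fallback is short; a minimal sketch in JavaScript, assuming the array field is called `translation` as in the sketches above:

    // Return the requested language if present, otherwise fall back to English
    function pickTranslation(doc, preferred) {
        return doc.translation.find(t => t.language === preferred)
            || doc.translation.find(t => t.language === "EN");
    }

    // Usage: pickTranslation(doc, "DE") -> the German entry, or the English one if missing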
From a performance point of view, there won’t be a significant difference. With MongoDB you’d transfer the whole document anyway, so it’s the same amount of data (perhaps one “language”: “EN” entry more, but that is really marginal). And with MongoDB’s internal use of BSON, that won’t really make a difference.
A third way?
What could impact performance is the array: you have to iterate through its elements and test whether each one is the right language. This is true for both formats. With the second format, you can put English as the first element to avoid any relative performance penalty compared to the first format (see also the query sketch below).
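For completeness, you can also let MongoDB do that test for you and return only the matching element, for example with a positional projection (the collection name `documents` and the field name `translation` are placeholders):

    // Return only the first array element whose language matches,
    // instead of shipping the whole translations array
    db.documents.find(
        { "translation.language": "DE" },
        { "slug": 1, "image": 1, "translation.$": 1 }
    )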
If you need to select by language frequently, you may however consider replacing the array with an object, where each language is a member name and the value is the translated/localized content:
    [
        {
            "_id": "",
            "image": "",
            "slug": "",
            "translation": {
                "EN": { "title": "", "description": "", "content": "" },
                "DE": { "title": "", "description": "", "content": "" },
                …
            }
        }
    ]
While this may seem less intuitive, you’d benefit from optimized access to the members/languages (dictionary-level performance for member lookup vs. sequential array access; see for example here for JS).
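With that structure, selecting a language server-side also becomes a plain dot-notation projection rather than an array match (collection name, field names and the slug value are again placeholders):

    // Fetch only the German content, plus English as a fallback
    db.documents.find(
        { "slug": "some-slug" },
        { "slug": 1, "image": 1, "translation.DE": 1, "translation.EN": 1 }
    )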
In the end, whatever my advice, there are so many factors affecting performance that you’ll need to do some profiling/benchmarking to validate your hypotheses.