22

So here's my scenario:

I'm working on Localization for a project of mine, and typically I would go about doing this in the C# code, however I want to do this in SQL a bit more since I am trying to buff up my SQL a bit.

Environment: SQL Server 2014 Standard, C# (.NET 4.5.1)

Note: the programming language itself should be irrelevant, I'm only including it for completeness.

So I sort-of accomplished what I wanted, but not to the extent I wanted. It's been a while (at least a year) since I have done any SQL JOINs except basic ones, and this is quite a complex JOIN.

Here is a diagramme of the relevant tables of the database. (There are plenty more, but not necessary for this portion.)

Database Diagramme

All relationships described in the image are complete in the database - the PK and FK constraints are all setup and operating. None of the columns described are nullable. All the tables have the schema dbo.

Now, I have a query which almost does what I want: that is, given ANY Id of SupportCategories and ANY Id of Languages, it will return either:

If there is a right-proper translation for that language for that string (I.e. StringKeyId -> StringKeys.Id exists, and in LanguageStringTranslations StringKeyId, LanguageId, and StringTranslationId combination exists, then it loads StringTranslations.Text for that StringTranslationId.

If the LanguageStringTranslations StringKeyId, LanguageId, and StringTranslationId combination did NOT exist, then it loads the StringKeys.Name value. The Languages.Id is a given integer.

My query, be it a mess, is as follows:

SELECT CASE WHEN T.x IS NOT NULL THEN T.x ELSE (SELECT CASE WHEN dbo.StringTranslations.Text IS NULL THEN dbo.StringKeys.Name ELSE dbo.StringTranslations.Text END AS Result FROM dbo.SupportCategories INNER JOIN dbo.StringKeys ON dbo.SupportCategories.StringKeyId = dbo.StringKeys.Id INNER JOIN dbo.LanguageStringTranslations ON dbo.StringKeys.Id = dbo.LanguageStringTranslations.StringKeyId INNER JOIN dbo.StringTranslations ON dbo.StringTranslations.Id = dbo.LanguageStringTranslations.StringTranslationId WHERE dbo.LanguageStringTranslations.LanguageId = 38 AND dbo.SupportCategories.Id = 0) END AS Result FROM (SELECT (SELECT CASE WHEN dbo.StringTranslations.Text IS NULL THEN dbo.StringKeys.Name ELSE dbo.StringTranslations.Text END AS Result FROM dbo.SupportCategories INNER JOIN dbo.StringKeys ON dbo.SupportCategories.StringKeyId = dbo.StringKeys.Id INNER JOIN dbo.LanguageStringTranslations ON dbo.StringKeys.Id = dbo.LanguageStringTranslations.StringKeyId INNER JOIN dbo.StringTranslations ON dbo.StringTranslations.Id = dbo.LanguageStringTranslations.StringTranslationId WHERE dbo.LanguageStringTranslations.LanguageId = 5 AND dbo.SupportCategories.Id = 0) AS x) AS T 

The problem is that it is not capable of providing me ALL of the SupportCategories and their respective StringTranslations.Text if it exists, OR their StringKeys.Name if it didn't exist. It is perfect at providing any one of them, but not at all. Basically, it's to enforce that if a language does not have a translation for a specific key, then the default is to use StringKeys.Name which is of StringKeys.DefaultLanguageId translation. (Ideally, it would not even do that, but instead load the translation for StringKeys.DefaultLanguageId, which I can do myself if pointed in the right direction for the rest of the query.)

I've spent a LOT of time on this, and I know if I were to just write it in C# (like I usually do) it would be done by now. I want to do this in SQL, and I'm having trouble getting the output I like.

The only caveat, is I want to limit the number of actual queries applied. All columns are indexed and such as I like them for now, and without real stress-testing I cannot index them further.

Edit: Another note, I'm trying to keep the database as normalized as possible, so I don't want to duplicate things if I can avoid it.

Example Data

Source

dbo.SupportCategories (Entirety):

Id StringKeyId 0 0 1 1 2 2 

dbo.Languages (185 records, only showing two for examples):

Id Abbreviation Family Name Native 38 en Indo-European English English 48 fr Indo-European French français, langue française 

dbo.LanguagesStringTranslations (Entirety):

StringKeyId LanguageId StringTranslationId 0 38 0 1 38 1 2 38 2 3 38 3 4 38 4 5 38 5 6 38 6 7 38 7 1 48 8 -- added as example 

dbo.StringKeys (Entirety):

Id Name DefaultLanguageId 0 Billing 38 1 API 38 2 Sales 38 3 Open 38 4 Waiting for Customer 38 5 Waiting for Support 38 6 Work in Progress 38 7 Completed 38 

dbo.StringTranslations (Entirety):

Id Text 0 Billing 1 API 2 Sales 3 Open 4 Waiting for Customer 5 Waiting for Support 6 Work in Progress 7 Completed 8 Les APIs -- added as example 

Current Output

Given the exact query below, it outputs:

Result Billing 

Desired Output

Ideally, I would like to be able to omit the specific SupportCategories.Id, and get all of them, as so (regardless if language 38 English was used, or 48 French, or ANY other language at the moment):

Id Result 0 Billing 1 API 2 Sales 

Additional Example

Given I were to add a localization for French (I.e. add 1 48 8 to LanguageStringTranslations), the output would change to (note: this is example only, obviously I would add a localized string to StringTranslations) (updated with French example):

Result Les APIs 

Additional Desired Output

Given the example above, the following output would be desired (updated with French example):

Id Result 0 Billing 1 Les APIs 2 Sales 

(Yes, I know technically that's wrong from a consistency standpoint, but it's what would be desired in the situation.)

Edit:

Small updated, I did change the structure of the dbo.Languages table, and drop the Id (int) column from it, and replace it with Abbreviation (which is now renamed to Id, and all relative Foreign-Keys and and relationships updated). From a technical standpoint, this is a more appropriate setup in my opinion due to the fact that the table is limited to ISO 639-1 codes, which are unique to begin with.

Tl;dr

So: the question, how could I modify this query to return everything from SupportCategories and then return either StringTranslations.Text for that StringKeys.Id, Languages.Id combination, or the StringKeys.Name if it did NOT exist?

My initial thought, is that I could somehow cast the current query to another temporary type as another subquery, and wrap this query in yet another SELECT statement and select the two fields I want (SupportCategories.Id and Result).

If I don't find anything, I'll just do the standard method I typically use which is to load all the SupportCategories into my C# project, and then with it run the query I have above manually against each SupportCategories.Id.

Thanks for any and all suggestions/comments/critique.

Also, I apologize for it being absurdly long, I just don't want any ambiguity. I'm often on StackOverflow and see questions that lack substance, didn't wish to make that mistake here.

0

2 Answers 2

18

Here is the first approach I came up with:

DECLARE @ChosenLanguage INT = 48; SELECT sc.Id, Result = MAX(COALESCE( CASE WHEN lst.LanguageId = @ChosenLanguage THEN st.Text END, CASE WHEN lst.LanguageId = sk.DefaultLanguageId THEN st.Text END) ) FROM dbo.SupportCategories AS sc INNER JOIN dbo.StringKeys AS sk ON sc.StringKeyId = sk.Id LEFT OUTER JOIN dbo.LanguageStringTranslations AS lst ON sk.Id = lst.StringKeyId AND lst.LanguageId IN (sk.DefaultLanguageId, @ChosenLanguage) LEFT OUTER JOIN dbo.StringTranslations AS st ON st.Id = lst.StringTranslationId --WHERE sc.Id = 1 GROUP BY sc.Id ORDER BY sc.Id; 

Basically, get the potential strings that match the chosen language and get all the default strings, then aggregate so you only pick one per Id - prioritize on the chosen language, then take the default as a fallback.

You can probably do similar things with UNION/EXCEPT but I suspect this will almost always lead to multiple scans against the same objects.

0
14

An alternative solution that avoids the IN and the grouping in Aaron's answer:

DECLARE @SelectedLanguageId integer = 48; SELECT SC.Id, SC.StringKeyId, Result = CASE -- No localization available WHEN LST.StringTranslationId IS NULL THEN SK.Name ELSE ( -- Localized string SELECT ST.[Text] FROM dbo.StringTranslations AS ST WHERE ST.Id = LST.StringTranslationId ) END FROM dbo.SupportCategories AS SC JOIN dbo.StringKeys AS SK ON SK.Id = SC.StringKeyId LEFT JOIN dbo.LanguageStringTranslations AS LST WITH (FORCESEEK) -- Only for low row count in sample data ON LST.StringKeyId = SK.Id AND LST.LanguageId = @SelectedLanguageId; 

As noted, the FORCESEEK hint is only required to get the most efficient-looking plan due to the low cardinality of the LanguageStringTranslations table with the sample data provided. With more rows, the optimizer will choose an index seek naturally.

The execution plan itself has an interesting feature:

Execution Plan

The Pass Through property on the last outer join means that a lookup to the StringTranslations table is only performed if a row was previously found in the LanguageStringTranslations table. Otherwise, the inner side of this join is skipped completely for the current row.

Table DDL

CREATE TABLE dbo.Languages ( Id integer NOT NULL, Abbreviation char(2) NOT NULL, Family nvarchar(96) NOT NULL, Name nvarchar(96) NOT NULL, [Native] nvarchar(96) NOT NULL, CONSTRAINT PK_dbo_Languages PRIMARY KEY CLUSTERED (Id) ); CREATE TABLE dbo.StringTranslations ( Id bigint NOT NULL, [Text] nvarchar(128) NOT NULL, CONSTRAINT PK_dbo_StringTranslations PRIMARY KEY CLUSTERED (Id) ); CREATE TABLE dbo.StringKeys ( Id bigint NOT NULL, Name varchar(64) NOT NULL, DefaultLanguageId integer NOT NULL, CONSTRAINT PK_dbo_StringKeys PRIMARY KEY CLUSTERED (Id), CONSTRAINT FK_dbo_StringKeys_DefaultLanguageId FOREIGN KEY (DefaultLanguageId) REFERENCES dbo.Languages (Id) ); CREATE TABLE dbo.SupportCategories ( Id integer NOT NULL, StringKeyId bigint NOT NULL, CONSTRAINT PK_dbo_SupportCategories PRIMARY KEY CLUSTERED (Id), CONSTRAINT FK_dbo_SupportCategories FOREIGN KEY (StringKeyId) REFERENCES dbo.StringKeys (Id) ); CREATE TABLE dbo.LanguageStringTranslations ( StringKeyId bigint NOT NULL, LanguageId integer NOT NULL, StringTranslationId bigint NOT NULL, CONSTRAINT PK_dbo_LanguageStringTranslations PRIMARY KEY CLUSTERED (StringKeyId, LanguageId, StringTranslationId), CONSTRAINT FK_dbo_LanguageStringTranslations_StringKeyId FOREIGN KEY (StringKeyId) REFERENCES dbo.StringKeys (Id), CONSTRAINT FK_dbo_LanguageStringTranslations_LanguageId FOREIGN KEY (LanguageId) REFERENCES dbo.Languages (Id), CONSTRAINT FK_dbo_LanguageStringTranslations_StringTranslationId FOREIGN KEY (StringTranslationId) REFERENCES dbo.StringTranslations (Id) ); 

Sample Data

INSERT dbo.Languages (Id, Abbreviation, Family, Name, [Native]) VALUES (38, 'en', N'Indo-European', N'English', N'English'), (48, 'fr', N'Indo-European', N'French', N'français, langue française'); INSERT dbo.StringTranslations (Id, [Text]) VALUES (0, N'Billing'), (1, N'API'), (2, N'Sales'), (3, N'Open'), (4, N'Waiting for Customer'), (5, N'Waiting for Support'), (6, N'Work in Progress'), (7, N'Completed'), (8, N'Les APIs'); -- added as example INSERT dbo.StringKeys (Id, Name, DefaultLanguageId) VALUES (0, 'Billing', 38), (1, 'API', 38), (2, 'Sales', 38), (3, 'Open', 38), (4, 'Waiting for Customer', 38), (5, 'Waiting for Support', 38), (6, 'Work in Progress', 38), (7, 'Completed', 38); INSERT dbo.SupportCategories (Id, StringKeyId) VALUES (0, 0), (1, 1), (2, 2); INSERT dbo.LanguageStringTranslations (StringKeyId, LanguageId, StringTranslationId) VALUES (0, 38, 0), (1, 38, 1), (2, 38, 2), (3, 38, 3), (4, 38, 4), (5, 38, 5), (6, 38, 6), (7, 38, 7), (1, 48, 8); -- added as example 
0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.