Skip to main content
6 of 10
added 237 characters in body
Erwin Brandstetter
  • 186.6k
  • 28
  • 465
  • 639

A better way to write this query?

I have the following schema for my database:

CREATE TABLE "PulledTexts" ( "Id" serial PRIMARY KEY, "BaseText" TEXT, "CleanText" TEXT ); CREATE TABLE "UniqueWords" ( "Id" serial PRIMARY KEY, "WordText" TEXT ); CREATE TABLE "WordTexts" ( "Id" serial PRIMARY KEY, "TextIdId" INTEGER REFERENCES "PulledTexts", "WordIdId" INTEGER REFERENCES "UniqueWords" ); CREATE INDEX "IX_WordTexts_TextIdId" ON "WordTexts" ("TextIdId"); CREATE INDEX "IX_WordTexts_WordIdId" ON "WordTexts" ("WordIdId"); 

Some sample data:

INSERT INTO public."PulledTexts" ("Id", "BaseText", "CleanText") VALUES (1, 'automate business audit', null), (2, 'aduit trial', null), (3, 'trial', null), (4, 'audit', null), (5, 'fresh report', null), (6, 'fresh audit', null), (7, 'automate this script', null), (8, 'im trying here', null), (9, 'automate this business', null), (10, 'lateral', null); INSERT INTO public."UniqueWords" ("Id", "WordText") VALUES (1, 'trial'), (2, 'audit'), (3, 'creation'), (4, 'business'), (5, 'automate'); INSERT INTO public."WordTexts" ("Id", "TextIdId", "WordIdId") VALUES (1, 1, 2), (2, 1, 4), (3, 1, 5), (4, 2, 1), (5, 3, 1), (6, 4, 2), (7, 6, 2), (8, 7, 5), (9, 9, 4), (10, 9, 5); 

The database itself is created through Entity framework migration for now.

I would like to know if there's a better, specifically more performance efficient way of writing this query because the WordTexts table will contain hundreds of thousands of records and eventually millions. I am also open to going a NoSql route if that would be more efficient for these kinds of queries.

SELECT * FROM "PulledTexts" WHERE "Id" IN ( SELECT "TextIdId" FROM "WordTexts" AS "wordTexts" LEFT JOIN "UniqueWords" AS "wordTexts.WordId" ON "wordTexts"."WordIdId" = "wordTexts.WordId"."Id" WHERE "wordTexts.WordId"."WordText" = 'example word' OR "TextIdId" IN ( SELECT "TextIdId" and1 from "WordTexts" AS "wordTexts" LEFT JOIN "UniqueWords" AS "wordTexts.WordId" ON "wordTexts"."WordIdId" = "wordTexts.WordId"."Id" where "wordTexts.WordId"."WordText" = 'example word2' EXCEPT SELECT "TextIdId" and2 from "WordTexts" AS "wordTexts" LEFT JOIN "UniqueWords" AS "wordTexts.WordId" ON "wordTexts"."WordIdId" = "wordTexts.WordId"."Id" WHERE "wordTexts.WordId"."WordText" = 'example word3' ) ); 

The reason I'm curious about this is that I read somewhere that subqueries can cause big performance hits depending on the query and I just wanted to be sure about my query.

I have barely any experience with databases and especially nosql databases so any guidance would be much appreciated.