SQL Server full-text search performance with additional conditions

Question

We have a performance problem with SQL Server (2008 R2) full-text search. When the query has additional WHERE conditions alongside the full-text search condition, it becomes too slow.

Here is my simplified query:

    SELECT *
    FROM Calls C
    WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
      AND CONTAINS(CustomerText, '("efendim")')

The Calls table's primary key is CallId (int, clustered index), and the table is also indexed by CallTime. We have 16,000,000 rows, and CustomerText is about 10 KB per row.

The execution plan shows that it first finds the full-text search result set and then joins it to the Calls table by CallId. Because of that, the more rows the full-text result set has, the slower the query gets (over a minute).

This is the execution plan:

[Image: execution plan]

When I run the WHERE conditions separately, it returns 360,000 rows for the CallTime condition:

SELECT COUNT(*) FROM Calls C WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')

and 1,200,000 rows for the CONTAINS condition:

SELECT COUNT(*) FROM Calls C WHERE CONTAINS(AgentText, '("efendim")')

What can I do to improve the performance of my query?

Have you tried updating statistics and rerunning the query? Also try using CONTAINSTABLE to see if it makes a difference in execution time.
– Kin Shah, Commented Sep 23, 2013 at 16:44

wBob · Accepted Answer · 2013-10-26 15:42:40Z

Consider using a hint to remove nested loops as a choice for the optimizer, e.g.:

    -- Hint means Nested Loop join will never be chosen, but still leaves the Optimizer some choices
    SELECT *
    FROM Calls C
    WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
      AND CONTAINS(CustomerText, '("warning")')
    OPTION ( MERGE JOIN, HASH JOIN )

Be aware of the potential negative impact of this hint: some of your queries with lower row counts may actually perform better with nested loops, so test before implementing.
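One way to do that testing is to compare elapsed time and logical reads with and without the hint. The following is only a sketch, not part of the original answer: it reuses the table and column names from the question, assumes the table lives in the dbo schema, and is meant to be run on a test system.

```sql
-- Report CPU/elapsed time and logical reads for each statement
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Baseline: the optimizer is free to pick nested loops
SELECT *
FROM dbo.Calls AS C
WHERE C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00'
  AND CONTAINS(CustomerText, '"warning"');

-- Same query with nested loops ruled out by the hint
SELECT *
FROM dbo.Calls AS C
WHERE C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00'
  AND CONTAINS(CustomerText, '"warning"')
OPTION (MERGE JOIN, HASH JOIN);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Repeating the comparison with a rarer search term (a small full-text result set) would show whether the hint hurts the low-rowcount case.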

To make better use of your non-clustered index you could try a rewrite. I experimented with the options below, i.e. more prescriptive queries, but still found the above query quicker. YMMV.

    -- CONTAINSTABLE; as already suggested by others
    SELECT *
    FROM dbo.Calls C
        INNER JOIN CONTAINSTABLE(dbo.Calls, CustomerText, '("warning")') ft ON c.callId = ft.[KEY]
    WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')

    -- This query is more likely to use the non-clustered index as it is covering for CallTime
    -- and callId only (not SELECT *); you could then join back to main table
    SELECT callId
    FROM Calls C
    WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
    INTERSECT
    SELECT callId
    FROM Calls C
    WHERE CONTAINS(CustomerText, '("warning")')

    -- Or explicitly persist the results to a temp table; this may be slower on occasions than the
    -- 'all-in-one' queries but would probably make for more consistent results
    IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp
    CREATE TABLE #tmp ( callId INT PRIMARY KEY )
    GO

    INSERT INTO #tmp
    SELECT callId
    FROM Calls C
    WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')

    SELECT *
    FROM Calls C
    WHERE CONTAINS(CustomerText, 'warning')
      AND EXISTS ( SELECT * FROM #tmp t WHERE c.callId = t.callId )
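The INTERSECT variant returns only callId values. Joining back to the main table, as mentioned in the comment on that query, might look like the sketch below. The derived-table alias k is hypothetical, and whether this beats the hinted query would need to be measured.

```sql
-- Resolve the matching keys first, then fetch the full rows by clustered key
SELECT C.*
FROM dbo.Calls AS C
    INNER JOIN (
        SELECT callId
        FROM dbo.Calls
        WHERE CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00'
        INTERSECT
        SELECT callId
        FROM dbo.Calls
        WHERE CONTAINS(CustomerText, '("warning")')
    ) AS k ON C.callId = k.callId;
```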

RLF · Accepted Answer · 2013-09-25 12:42:23Z

Kin made a couple of suggestions which should be helpful.

First: Make sure that your statistics for the CallTime index are up to date.
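A minimal sketch of refreshing those statistics, assuming the index is named IDX_Calls_CallTime (the name used in the example further down this answer; the real name may differ) and the table is in the dbo schema:

```sql
-- Rebuild the statistics for the CallTime index by reading every index row
-- (FULLSCAN is the most accurate option and the most expensive one)
UPDATE STATISTICS dbo.Calls IDX_Calls_CallTime WITH FULLSCAN;
```

On a busy system a sampled update (omitting WITH FULLSCAN) may be a more practical first step.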

Your plan shows that the time filter is being made by seeks to the clustered index. So, for some reason the CallTime index is not being used. What is the definition of that index? If it is a multicolumn index, be sure that the most specific column is first. Example:

    IDX_Calls_CallTime
    NOT: CallID, CallTime
    USE: CallTime, CallID
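As a sketch only, an index with CallTime as the leading column could be defined as follows. The index name follows the example above and the dbo schema is an assumption. Because CallId is the clustered key, SQL Server already carries it in every non-clustered index row, so listing it explicitly is optional.

```sql
-- Non-clustered index led by CallTime; the clustered key (CallId) is
-- stored in each index row automatically as the row locator
CREATE NONCLUSTERED INDEX IDX_Calls_CallTime
    ON dbo.Calls (CallTime);
```

This statement fails if an index with that name already exists, so check the current definition first.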

Second: Consider using CONTAINSTABLE.

I doubt that you will ever want to return 1,200,000 rows for "efendim". By using CONTAINSTABLE and ranking you can set the top number of rows that you want, thus reducing the number of full-text results to process to 10, 100, 1000, or whatever fits your needs.
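A sketch of that idea uses the optional top_n_by_rank argument of CONTAINSTABLE. The limit of 1000 is an arbitrary illustration. Note that the cut-off is applied to the full-text matches before the CallTime filter, so fewer than 1000 rows (possibly none) may survive the date range.

```sql
-- Keep only the 1000 highest-ranked full-text matches, then filter by date
SELECT C.*, ft.[RANK]
FROM dbo.Calls AS C
    INNER JOIN CONTAINSTABLE(dbo.Calls, CustomerText, '"efendim"', 1000) AS ft
        ON C.CallId = ft.[KEY]
WHERE C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00'
ORDER BY ft.[RANK] DESC;
```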

The clustered index seeks are not caused by a missing index; the full-text search returned document IDs, and it now needs to find those IDs in the table. So point #1 is not valid in my opinion.
– jitbit, Commented Jun 5, 2019 at 18:49
#2 is also not applicable IMO. The top 1000 rows ranked by FTS are not guaranteed to have a CallTime in the allowed range.
– Hp93, Commented Aug 10, 2020 at 11:31

Arka Bhattacharjee · Accepted Answer · 2013-09-23 13:53:39Z

Don't use * (asterisk) for COUNT or any other operation unless you want all column values, because using the asterisk (*) takes too much time to execute. List the names of the columns whose values you want.

For the COUNT function, use a particular column that has a value in every row.


Actually, I broke my main query down into two example queries to emphasize the record counts I'm dealing with. COUNT(*) is not part of my problem.
– Cankut, Commented Sep 23, 2013 at 14:04
And, of course, SELECT COUNT(*), which returns a single column, is quite a different operation from SELECT *, which returns every column in the table.
– RLF, Commented Sep 23, 2013 at 17:02

Béranger · Accepted Answer · 2014-03-14 08:13:32Z

You can use CONTAINSTABLE. It is recommended for better performance.

Here is the documentation: http://technet.microsoft.com/fr-fr/library/ms189760%28v=sql.100%29.aspx

    SELECT select_list
    FROM table AS FT_TBL
        INNER JOIN CONTAINSTABLE(table, column, contains_search_condition) AS KEY_TBL
            ON FT_TBL.unique_key_column = KEY_TBL.[KEY]
